Tuesday, July 22, 2014

Dusk at a small lake, Lassen Volcanic National Park, California
Nurse's Song

When the voices of children are heard on the green,
And laughing is heard on the hill,
My heart is at rest within my breast,
And everything else is still.

‘Then come home, my children, the sun is gone down,
And the dews of night arise;
Come, come leave off play, and let us away
Till the morning appears in the skies.’

‘No, no, let us play, for it is yet day,
And we cannot go to sleep;
Besides, in the sky the little birds fly,
And the hills are all cover'd with sheep.’

‘Well, well, go and play till the light fades away,
And then go home to bed.’
The little ones leapèd, and shoutèd, and laugh'd
And all the hills echoed.

Sunday, July 20, 2014

Margaret Hamilton, Mathematician and Computer Scientist, Played Key Role in Space Program



Hamilton as Lead Apollo Flight Software Designer
(Link) Wikipedia
(Link) NASA Office of Logic Design
Margaret Hamilton (born 1938) is an American former NASA scientist, and founder and CEO of software development company Hamilton Technologies, Inc. At NASA she was Director of the Software Engineering Division of the MIT Instrumentation Laboratory, later the Charles Stark Draper Laboratory, which played a key role in the success of the Apollo space program.

NASA Research

At NASA Hamilton was responsible for helping pioneer the Apollo on-board guidance software required to navigate to and from the Moon and to land on it, along with its multiple variations used on numerous missions (including the subsequent Skylab). She gained hands-on experience at a time when computer science courses and the discipline of software engineering did not yet exist.

In the process, she produced innovations in the fields of system design and software development, enterprise and process modelling, preventative systems design, development paradigm, formal systems (and software) modelling languages, system-oriented objects for systems modelling and development, automated life-cycle environments, methods for maximizing software reliability and reuse, domain analysis, correctness by built-in language properties, open-architecture techniques for robust systems, full life-cycle automation, quality assurance, seamless integration (including systems to software), distributed processing systems, error detection and recovery techniques, man/machine interface systems, operating systems, end-to-end testing techniques, and life-cycle management techniques.

These in turn led her to develop concepts of asynchronous software, priority scheduling, and man-in-the-loop decision capability, which became the foundation for modern, ultra-reliable software design.

Apollo 11

Preventing an abort of the Apollo 11 mission has been attributed to her work. Just three minutes before the lunar lander reached the Moon's surface, several computer alarms were triggered. The cause of the alarms was an overload of incoming data to the Apollo Guidance Computer (AGC). Due to its robust architecture, the computer was able to keep running: the Apollo onboard flight software was developed using an asynchronous executive, so that higher-priority jobs (e.g., those important for landing) could interrupt lower-priority jobs. A 2005 re-analysis concluded that a hardware design error in the rendezvous radar provided the computer with faulty information even while in standby mode.

Margaret Hamilton, on the design of the Apollo 11 Guidance Computer software:
". . . the computer was being asked to perform all of its normal functions for landing while receiving an extra load of spurious data which used up 15% of its time. The computer (or rather the software in it) was smart enough to recognize that it was being asked to perform more tasks than it should be performing. It then sent out an alarm, which meant to the astronaut, I'm overloaded with more tasks than I should be doing at this time and I'm going to keep only the more important tasks; i.e., the ones needed for landing ... Actually, the computer was programmed to do more than recognize error conditions. A complete set of recovery programs was incorporated into the software. The software's action, in this case, was to eliminate lower priority tasks and re-establish the more important ones ... If the computer hadn't recognized this problem and taken recovery action, I doubt if Apollo 11 would have been the successful [M]oon landing it was."
—Margaret Hamilton, Letter to Datamation, March 1, 1971
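The load-shedding behavior Hamilton describes can be sketched in a few lines. This is a toy illustration only, not the actual AGC executive (which was written in AGC assembly); the job names, priorities, and capacity are invented:

```python
import heapq

CAPACITY = 3  # max jobs our toy "computer" can hold at once

def schedule(jobs, capacity=CAPACITY):
    """Keep only the `capacity` highest-priority jobs (lower number =
    more important), shedding the rest -- the same idea as the AGC
    dropping low-priority work during the 1202 alarms."""
    queue, shed = [], []
    for priority, name in jobs:
        heapq.heappush(queue, (priority, name))
        if len(queue) > capacity:
            # Evict the lowest-priority job (largest priority number).
            worst = max(queue)
            queue.remove(worst)
            heapq.heapify(queue)
            shed.append(worst[1])
    return [name for _, name in sorted(queue)], shed

kept, shed = schedule([
    (1, "guidance"),         # critical for landing
    (2, "engine-control"),
    (3, "display-update"),
    (9, "rendezvous-radar"), # the spurious extra load
])
# kept -> ["guidance", "engine-control", "display-update"]
# shed -> ["rendezvous-radar"]
```

The key design choice, as in the real executive, is that overload is handled by priority rather than by order of arrival: the spurious radar job is shed no matter when it shows up.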
As of February 2010, Hamilton's activities included serving as founder and CEO of Hamilton Technologies, Inc., a business built around the Universal Systems Language (USL), which is in turn based on her Development Before The Fact (DBTF) paradigm for systems and software design.
Margaret Hamilton has published 130 papers, proceedings and reports concerned with the 60 projects and 6 major programs in which she has been involved.

A selection:
  • M. Hamilton, S. Zeldin (1976) "Higher order software—A methodology for defining software" IEEE Transactions on Software Engineering, vol. SE-2, no. 1, Mar. 1976.
  • M. Hamilton (1994), “Inside Development Before the Fact,” cover story, Editorial Supplement, 8ES-24ES. Electronic Design, Apr. 1994.
  • M. Hamilton, W.R. Hackler (2004), Deeply Integrated Guidance Navigation Unit (DI-GNU) Common Software Architecture Principles (revised Dec. 29, 2004), DAAAE30-02-D-1020 and DAAB07-98-D-H502/0180, Picatinny Arsenal, NJ, 2003-2004.
  • M. Hamilton and W.R. Hackler (2007), “Universal Systems Language for Preventative Systems Engineering,” Proc. 5th Ann. Conf. Systems Eng. Res. (CSER), Stevens Institute of Technology, Mar. 2007, paper #36.

Forty-Five Years Ago Today

David Woods, video editor
(his books)

Apollo 11 Lunar Surface Journal, The First Lunar Landing
Corrected Transcript and Commentary Copyright © 1995 by Eric M. Jones
All rights reserved.

Blog notes:

The first moon landing was forty-five years ago today. As a child, I watched many of the moon landings and was fascinated by the details of the radio communication. In this video, you can hear the challenges the Apollo 11 mission faced with radio communication and with different kinds of background noise. (Some of it is also engine noise. Today, with radio and telephony, you rarely hear radio noise because it is filtered out using digital and analog filters and other noise- and error-rejection schemes.)

For most of the descent, the Eagle lunar lander is controlled by the flight computer, the Apollo Guidance Computer (AGC), which among other things, is fed data by a radar system which can "see" the lunar surface. 

The full transcript of the lunar descent is available at the NASA site Apollo 11 Lunar Surface Journal, The First Lunar Landing (see above for the link).

The video starts at 102:32:35 into the transcript.

At about 5:17 into the video, or 102:38:04 into the transcript, you can hear Aldrin say "we got good lock on", meaning that the radar and AGC have acquired information about the lunar surface and are flying under computer control with the radar information.

The "P" annotations on the video refer to the computer programs that are being run during the descent.  The AGC programs were:

P63 - Landing maneuver braking program
P64 - Landing maneuver approach phase
P65 - Landing phase - auto
P66 - Rate of descent landing
P67 - Manual landing phase

From the Apollo Guidance Computer wiki page:

At about 5:40 in the video, "Buzz Aldrin gave the Apollo Guidance Computer (AGC) the command '1668' which instructed it to calculate and display DELTAH (the difference between altitude sensed by the radar and the computed altitude). This added an additional 10% to the processor work load, causing executive overflow and a '1202' alarm. After being given the 'GO' from Houston, Aldrin entered '1668' again and another '1202' alarm occurred. When reporting the second alarm, Aldrin added the comment 'It appears to come up when we have a 1668 up.'"

"Luckily for Apollo 11, the AGC software had been designed with priority scheduling. Just as it had been designed to do, the software automatically recovered, deleting lower priority tasks including the '1668' display task, to complete its critical guidance and control tasks."

"The problem was not a programming error in the AGC, nor was it pilot error. It was a peripheral hardware design bug that was already known and documented by Apollo 5 engineers. However because the problem had only occurred once during testing they concluded that it was safer to fly with the existing hardware that they had already tested, than to fly with a newer but largely untested radar system. In the actual hardware, the position of the rendezvous radar was encoded with synchros excited by a different source of 800 Hz AC than the one used by the computer as a timing reference. The two 800 Hz sources were frequency locked but not phase locked, and the small random phase variations made it appear as though the antenna was rapidly "dithering" in position even though it was completely stationary. These phantom movements generated the rapid series of AGC cycle steals."

Again at about 9:30 into the video, there are low-priority alarms, '1201' and '1202', which are again overridden by the AGC. The AGC continues to control the flight profile, processing high-priority tasks and ignoring low-priority tasks.

At 10:30 into the video, Armstrong takes the AGC out of the computer-controlled P64 approach-phase program and into P66, a manually controlled rate-of-descent landing. He pitches the Eagle forward to maintain speed, in order to fly across the West Crater boulder field. Once across the boulder field, at about 11:05, he pitches back to slow down.

They land with 20 seconds of fuel remaining.

Friday, July 18, 2014

Aids conference says 100 researchers may have been on board crashed plane

Friday, July 18th, 2014

"As many as 100 of the world’s leading HIV/Aids researchers and advocates may have been on the Malaysia Airlines flight that crashed in Ukraine, in what has been described as a “devastating” blow to efforts to tackle the virus."

(read more)

Thursday, July 17, 2014


C.P. Cavafy
Collected Poems.
Translated by Edmund Keeley and Philip Sherrard.
Edited by George Savidis. Revised Edition.
Princeton University Press, 1992

As you set out for Ithaka
hope the voyage is a long one,
full of adventure, full of discovery.
Laistrygonians and Cyclops,
angry Poseidon—don’t be afraid of them:
you’ll never find things like that on your way
as long as you keep your thoughts raised high,
as long as a rare excitement
stirs your spirit and your body.
Laistrygonians and Cyclops,
wild Poseidon—you won’t encounter them
unless you bring them along inside your soul,
unless your soul sets them up in front of you.

Hope the voyage is a long one.
May there be many a summer morning when,
with what pleasure, what joy,
you come into harbors seen for the first time;
may you stop at Phoenician trading stations
to buy fine things,
mother of pearl and coral, amber and ebony,
sensual perfume of every kind—
as many sensual perfumes as you can;
and may you visit many Egyptian cities
to gather stores of knowledge from their scholars.

Keep Ithaka always in your mind.
Arriving there is what you are destined for.
But do not hurry the journey at all.
Better if it lasts for years,
so you are old by the time you reach the island,
wealthy with all you have gained on the way,
not expecting Ithaka to make you rich.

Ithaka gave you the marvelous journey.
Without her you would not have set out.
She has nothing left to give you now.

And if you find her poor, Ithaka won’t have fooled you.
Wise as you will have become, so full of experience,
you will have understood by then what these Ithakas mean.

Ocean Beach

Sea Fever

John Masefield
(Link) Poetry Foundation

I must go down to the seas again, to the lonely sea and the sky,
And all I ask is a tall ship and a star to steer her by;
And the wheel’s kick and the wind’s song and the white sail’s shaking,
And a grey mist on the sea’s face, and a grey dawn breaking,

I must go down to the seas again, for the call of the running tide
Is a wild call and a clear call that may not be denied;
And all I ask is a windy day with the white clouds flying,
And the flung spray and the blown spume, and the sea-gulls crying.

I must go down to the seas again, to the vagrant gypsy life,
To the gull’s way and the whale’s way where the wind’s like a whetted knife;
And all I ask is a merry yarn from a laughing fellow-rover,
And quiet sleep and a sweet dream when the long trick’s over.

Wednesday, July 16, 2014

Ocean Beach

Posterior predictive checks to quantify lack-of-fit in ADMIXTURE models of latent population structure

David Mimno, David M Blei, Barbara E Engelhardt
submitted 30 Jun 2014


Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of a statistical model. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the model estimates for four qualitatively different population genetic data sets: the POPRES European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.



We have developed posterior predictive checks (PPCs) for analyzing genomic data sets with the admixture model. We have demonstrated that the PPC (estimating the posterior predictive distribution and checking the likelihood of the true observed data under this distribution) gives a valuable perspective on genetic data beyond statistical inference of model parameters. In the research literature, fitted admixture models are often accompanied by a 'just so' story to explain the inferred parameters and how they are reflective of ancestral truth [13]. The model may suggest these hypotheses, but only conditioned on the model being a good fit for the observed data. PPCs check this assumption of good fit, giving weight to the hypotheses by confirming that the underlying assumptions do not oversimplify the existing structure in the observed data. In this paper, we developed PPCs for the admixture model, designing biological discrepancy functions to quantify the effect of the model assumptions on interpreting and using the estimated parameters for downstream analyses.
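The basic PPC recipe the authors describe (draw from the posterior, simulate replicated data, compare a discrepancy statistic against the observed data) can be sketched on a deliberately tiny model. Everything here is invented for illustration: a single SNP's minor-allele indicators under a Beta-Binomial model with a flat prior, and sample variance as the discrepancy. The paper's discrepancy functions are far richer than this:

```python
import random
import statistics

random.seed(0)

# Observed minor-allele indicators for one SNP across 10 individuals
# (made-up data).
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def ppc_pvalue(data, draws=2000):
    """Posterior predictive p-value for the variance discrepancy under a
    Beta(1, 1)-Bernoulli model. Values near 0 or 1 signal misfit."""
    n, k = len(data), sum(data)
    obs_disc = statistics.pvariance(data)  # discrepancy on observed data
    exceed = 0
    for _ in range(draws):
        # Posterior draw: p | data ~ Beta(k + 1, n - k + 1).
        p = random.betavariate(k + 1, n - k + 1)
        # Replicated data set of the same size from the fitted model.
        rep = [1 if random.random() < p else 0 for _ in range(n)]
        if statistics.pvariance(rep) >= obs_disc:
            exceed += 1
    return exceed / draws

pval = ppc_pvalue(data)
```

A middling p-value says the model reproduces this aspect of the data; an extreme one says the discrepancy function has found structure the model misses, which is exactly the signal the authors use to target model extensions.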

Statistical modeling of genetic data requires us to balance the complexity of the model with its capacity to capture the data at hand. As examples of limitations, we may not have enough data to support an overly complex model, or the model class that we want to fit may be too complex given our computational constraints. Thus, we support the iterative practice of fitting the simplest model (i.e., the one we fit here), checking whether a higher resolution model is needed, and then improving the model only in the ways that result in more reliable interpretations of the results. PPCs can drive this process of targeted model development, pointing us towards enriched Bayesian admixture models along gradients that quantifiably improve their performance for the exploratory tasks that matter. With this practice in mind, we revisit the PPCs described above and discuss how we might enrich the simple admixture model to address its misspecified assumptions.

Many population studies have applied admixture models to explore and quantify genetic variation between individuals within and across ancestral populations [13,45,46]; these analyses may benefit from the inter-individual PPC. For studies where this PPC indicates misfit, prior work has adapted the admixture model to control admixture LD by explicitly modeling haplotype blocks for each ancestral population instead of modeling each SNP separately [30]. In particular, the SNP-specific ancestry assignment z variables for each individual are modeled by a Markov chain, where the probability of transitioning to a different ancestral population from one position to the next has an exponential distribution. This specifies a Poisson process describing the length of haplotype blocks across the chromosome, with global rate parameter r.
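The block-length model in that paragraph (exponential gaps between ancestry switches, i.e., switch points laid down by a Poisson process with rate r) is easy to simulate. The rate and chromosome length below are invented for illustration:

```python
import random

random.seed(1)

def ancestry_blocks(r=0.01, length=1000.0):
    """Return ancestry switch-point positions along a chromosome.
    Gaps between switches are Exponential(r), so switch points form a
    Poisson process with rate r and blocks have mean length 1/r."""
    switches, pos = [], 0.0
    while True:
        pos += random.expovariate(r)  # exponential gap to next switch
        if pos >= length:
            return switches
        switches.append(pos)

blocks = ancestry_blocks()
# With r = 0.01 over length 1000, expect about 10 switches on average.
```

This is only the prior over block boundaries; in the actual model the Markov chain over the z variables also has to decide which ancestral population each block is assigned to.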

Many studies have noted that background LD may lead to phantom ancestral populations [37]; applying admixture models to genomic data that contain background LD may find the SNP autocorrelation PPC useful. After identifying model misspecification using our background LD discrepancy function, we could extend the admixture model to explicitly capture background LD. Above we described a Markov model on the z variables. It assumes that, conditional on ancestral population assignment, genotypes are independent. Extending this idea, SABER [47] implements a Markov hidden Markov (MHMM) model to capture both haplotype blocks and background LD by adding a Markov chain across the population-specific allele frequencies in beta.  Others have further extended this model in various ways, including estimating recombination events explicitly in the MHMM [48].

Methods and statistics have been proposed to evaluate the proper number of latent ancestral populations, often motivated by FST [6, 49]; additionally, nonparametric Bayesian models estimate the posterior probability for each K [50, 51]. We propose a PPC with the FST discrepancy for general use in evaluating appropriate ranges of the number of ancestral populations for a specific study. A simple adaptation of the model to correct for a failure of this PPC is to change the number of ancestral populations K (Figure S3).

There are also explicit model adaptations that will affect the FST of the inferred ancestral populations.  For example, one can build hierarchical models that allow the sharing of allele frequencies across populations for some SNPs; this was implemented in the structure 2.0 model, which includes a hierarchical component to allow similar allele frequencies across ancestral populations (the so-called F model) [30]. A second example is from the topic model literature (similar models applied to modeling text documents), where the ancestral populations are captured in a tree-structured hierarchy [52, 53].  In the corresponding admixture models, the root node would include SNPs that have shared allele frequencies across all ancestral populations; at the leaves, the population-specific allele frequencies would include SNPs that have a frequency in that population that is different than the frequency in all other ancestral populations (referred to as ancestry informative markers [20]).

Previous population studies have explored and interpreted the population-specific SNP frequencies estimated by admixture models [54-56]; almost all applications of this admixture model have used MAP estimates of ancestry assignments to determine the proportion of admixture in individuals [14, 20]. The average entropy PPC will check model misspecification for ancestry assignment, and has implications for interpreting estimates of SNP frequencies. To adapt the model to this misspecification, the hyperparameters for the Dirichlet-distributed allele-specific ancestry assignments may be changed. (We and others set alpha = 1 [6], giving equal weight to all possible contributions across ancestries for each SNP.) In particular, we might give higher weight to admixture proportions near 0 and 1 by setting alpha < 1 for studies where we expect low levels of admixture (e.g., the HapMap data). The equivalent change for the hyperparameters of the population-specific allele frequency parameters would encourage allele frequency spectra that more closely match what we find in natural populations [57]. Another relevant model adaptation would be to modify the distribution of a SNP to be not Bernoulli but instead Poisson [58], normal [59], or something more sophisticated [60, 61]. We emphasize that, though these extensions seem reasonable, the PPC with this discrepancy found little need to modify the admixture model assumptions in our current studies. The exception to this point is the ASW study, although we hypothesize that correcting for background LD as suggested above will address this misspecification.
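The effect of setting alpha < 1 is easy to see by simulation: a sparse Dirichlet pushes admixture proportions toward the corners of the simplex (near 0 and 1). The alphas and sample sizes below are illustrative only; Python's standard library has no Dirichlet sampler, so one is built from gamma draws:

```python
import random

random.seed(2)

def dirichlet(alpha, k=2):
    """Draw from a symmetric Dirichlet(alpha) over k components,
    via normalized Gamma(alpha, 1) draws."""
    g = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def mean_max_component(alpha, n=5000, k=2):
    """Average largest admixture proportion; closer to 1 means the
    prior favors mostly-unadmixed individuals."""
    return sum(max(dirichlet(alpha, k)) for _ in range(n)) / n

sparse = mean_max_component(0.1)  # alpha < 1: mass near 0 and 1
flat = mean_max_component(1.0)    # alpha = 1: uniform over the simplex
```

With K = 2, the flat prior gives an average largest component of about 0.75, while alpha = 0.1 pushes it well above 0.9, which is the "low levels of admixture" behavior the discussion describes.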

We believe that all model-based methods to control for population stratification in association mapping will benefit from application of the mapping PPC, including linear mixed models and non-generative methods such as EIGENSTRAT [2, 62]. Failure of the association mapping PPC indicates that the estimates of population structure are insufficient to correct for the confounding latent structure in the individuals. There are many directions to consider for mitigating this type of model misspecification. As examples, one may use larger numbers of estimated principal components or ancestral populations, use alternative approaches to specifying the latent structure variables, or correct for structure estimated on local regions of the genome. This same discrepancy function (replacing z with the estimated random effect from linear mixed models) would be useful in quantifying model misspecification for these alternative methods for association mapping in the presence of confounding population structure [63-65].

Applied statisticians develop models to capture the biological complexity of their data. To form hypotheses from these models, however, we need assurances that the data can support them. PPCs provide a simple mechanism to quantify when a model is sufficient or when it needs additional structure to support downstream analysis. While we have focused on the admixture model, the PPC methodology applies to any probabilistic model of data. For example, we believe there could be a substantial role for PPCs in evaluating demographic models. As we continue to collect complex genomic data, we continue to develop complex models to explain them. Equally important to building our repertoire of statistical models for analyzing genomic data is to build our repertoire of ways to check those analyses.

Tuesday, July 15, 2014

Arlington Man of the Channel Islands

    National Park Service

     "Arlington Springs Man lived at the end of the Pleistocene when the
     four northern Channel Islands were all still united together as one
     mega-island, and the climate was much cooler than today. The evidence
     that people had arrived on that island by 13,000 years ago demonstrates
     that watercraft were in use along the California coast at that early date
     and lends support for a theory that the earliest peoples to enter the
     Western Hemisphere may have migrated along the Pacific coast from
     Siberia and Alaska using boats. Recent radiocarbon dating by Dr. Larry
     Agenbroad of pygmy mammoth fossils from Santa Rosa Island suggests
     that the last of these unique mammals may have been present on the
     island at the time the first humans arrived."