In the first Peculiar Books Reviewed we discussed David A. Mindell's
delightful book "Digital Apollo" and, in particular, took from the book this
lesson: a technical system which puts the human component in a position of
supremacy over the machine is more capable of achieving the aim of the system
than one which holds humans in a subservient position. That is, ignoring any
moral or dystopian considerations, putting people in a role in which they are
made to serve machines creates worse outcomes in terms of what has been
built. Mindell puts this as the difference between "engineering" and "pilot"
mentalities, the former being in favor of full automation–think Wernher von
Braun's desire to have mere passengers aboard a clockwork spacecraft–and the
latter in favor of manual control of the craft. The "pilot" mentality works
fine in systems that are relatively simple, but as complexity increases the
human ability to cope with the banal demands of operations falls off: we can
only do so much. The "engineering" mentality succeeds up until the system
encounters a situation not expected by the engineers and the mechanism, being
unable to cope, falls back to human operators who may or may not be paying
attention at the time of the crisis or capable of adapting the mostly
automated system to their immediate needs.

This idea, that the role of people in a complex system–spacecraft,
software-only, industrial, etc.–can be considered in purely technical terms, is
important enough that I'm going to spend this review and the next elaborating
on it. There's a moral argument to be made as well, as hinted at in the
review of Francis Spufford's "Backroom Boys", but the time is not ripe yet.

"At a little after nine central standard time on the night of Monday, April
13, 1970, there was, high in the western sky, a tiny flare of light that in
some respects resembled a star exploding far away in our galaxy."[1]

Thus begins Henry S. F. Cooper, Jr.'s "Thirteen: The Apollo Flight That
Failed", one of the best technical explanations of a catastrophic failure and
its resolution ever written. This "tiny flare of light" was a rapidly
expanding cloud of frozen oxygen coming from the now seriously damaged
Service Module (SM). A failure of Oxygen Tank No. 2 ("Later, in describing
what happened, NASA engineers avoided using the word 'explosion;' they
preferred the more delicate and less dramatic term 'tank failure'…"[2])
damaged the shared line between the two primary oxygen tanks and the three
fuel cells. Immediately after the failure, two of the three fuel cells began
producing reduced amounts of electricity as the set of reactant valves which
fed them was jostled shut, permanently, by the force of the failure. Another
valve, meant to isolate Oxygen Tank No. 1 from No. 2, failed because of the
same mechanical jarring and was left in an open position. Over the next two
hours, both tanks vented into space, pushing the craft off course and ruining
the Service Module.

The subsequent flight, which Cooper so expertly lays out, was a "ground show",
in the words of the astronauts themselves.[3] The usual operation of a
flight is a delicate balance between the on-board astronauts–in physical
possession of the craft and able to manipulate it–and the flight
controllers, receiving constant telemetry from the craft, thinking through
consequences and making recommendations. Cooper describes this by saying,
"Astronauts are more like officers aboard a large ship… (and) there were
about as many astronauts and flight controllers as there are officers aboard
a big vessel (…) In fact, one of the controllers, the Flight Director, in
some respects might have been regarded as the real skipper of the
spacecraft…"[4] An Apollo craft could have operated independently of any
ground crew, but only in planned-for situations. Post-failure, it became the
flight controllers' task to find a plan to land the astronauts safely and the
crew's job to carry this out.

Plan they did. With the Service Module ruined, it was abandoned and the crew
began to use the Lunar Module (LM) as a lifeboat, an eventuality never

"Aside from some tests a year earlier (…) no one had ever experimented to
see how long the LM could keep men alive–the first thing one needs to know
about a lifeboat."[5]

Almost entirely through luck, the LM was equipped sufficiently to make the
trip back survivable and possible. Cooper was likely unaware, but as Mindell
pointed out, the LM and the Command Module (CM) had duplicates of the same
computer, meaning that the LM computer, not being a special-purpose
lunar-landing device, could make rocket burns to return the craft to Earth.
The rigging of various internal systems (made famous in the Apollo 13 film:
the CO2 scrubbers were incompatible between modules and had to be adapted),
careful rationing of electricity, and continuous drift from a landing flight
path kept Mission Control busy creating and testing new flight checklists.

Cooper's real interest is the people involved in this story and their
interplay through the crisis. Astronauts rushed in to man simulators to test
flight controller theories about rocket firings; computer teams kept the
telemetry-gathering systems, flight projection calculators and the CMS/LMS
Integrator, which "would insure that the instructions for the two modules
dovetailed–that there were no conflicts between them"[6], humming. Cooper
is telling the story of a complex organization manning a damaged complex
system, with human lives at risk. Implicit in all of this are the machines
these people are using: tools being adapted to new situations and the
spacecraft being repurposed in ways never intended.

In a basic sense, the Apollo spacecraft was a couple of habitable tin cans,
some rockets and two computers to control said rockets. The computer was
'programmed' by calling up subroutines and feeding in input parameters, all
augmented by feedback from the pilot. Normal flight operations dictated the
call-up of subroutines and the parameters input, with a feedback loop
dictated by real-time telemetry from the craft and astronauts' expert
opinions. The Apollo computer could neither demand nor decide; it was instructed.
To deal with this 'limitation' NASA was forced to invest in training of all
flight staff and to ensure that the craft could be flexibly programmed by the
astronauts. This, of course, meant that the craft and crew were not
rigidly locked into a fixed plan but could use their human understanding to
change course (literally, in this case) as reason dictated.

In documenting the catastrophic failure of Apollo 13, Cooper has likewise
documented the exquisite working of a complex organization in a position of
mastery over a complex system. These human-oriented complex systems are
arranged to take our instructions, to guide but not command. In a crisis,
this proves invaluable: we humans may apply our intelligence to the problem
at hand and use the machine as just another tool in the solution, keeping in
mind, of course, the limitations of the machine but never once struggling to
bend it to our informed wills. We may also choose to opt out of the tool's
use. Only Jim Lovell, commander of the Apollo 13 mission, intended to make
use of the LM's ability to automatically land itself. He never got the
chance, of course, but there's something telling in the notion that every
astronaut who did land on the Moon–all comfortable with and pleased by the
craft's theoretical abilities–chose to go down manually.

As a society, we're building more and more purely automatic complex systems.
In the best case they take no input from humans and function only insofar as
the system's engineers were able to imagine failures. In the worst case, they
demand input from humans but do so within the limited confines of the system
engineers' imagination, implicitly invalidating any and all expert opinion of
the human component. Such systems are brittle. Such systems are, indeed, not
maintainable in the long term: the world changes and knowledge of their
operation is lost, as none of the humans involved in the system were ever
truly responsible for understanding its mechanism.

What Cooper has done is craft an engaging story about a very nearly fatal
six-day trip round the Moon in a faulty craft. What he has also done is to give a
vision of the effective interplay between human and machine in a way which
enhances the overall capability of the people involved, extending their
strengths and making up for their weaknesses. This is the valuable
contribution of Cooper's book: a rough blueprint, seen through a particular
accident, for complex systems that must be tolerant of faults in the
fulfillment of the designer's aims. Machine-oriented systems are fine, maybe
even less onerous to run in the average case, but in failure scenarios
seriously bad things happen.

More on that, next review.

[1] Henry S. F. Cooper, Jr., Thirteen: The Apollo Flight That Failed (Dial Press, 1972), 3.
[2] Cooper, Thirteen, 21.
[3] Cooper, Thirteen, 68.
[4] Cooper, Thirteen, 6.
[5] Cooper, Thirteen, 50.
[6] Cooper, Thirteen, 143-144.