Monday, November 26, 2007

History's Most (In)Famous Software Failures


1. July 22, 1962 - Mariner 1 space probe

A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.



2. 1982 - Soviet gas pipeline
Operatives working for the Central Intelligence Agency allegedly plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline. The Soviets had obtained the system as part of a wide-ranging effort to covertly purchase or steal sensitive U.S. technology. The CIA reportedly found out about the program and decided to make it backfire with equipment that would pass Soviet inspection and then fail once in operation. The resulting event is reportedly the largest non-nuclear explosion in the planet's history.


3. Therac-25 (1985-1987)
Six people were overexposed during radiation treatments for cancer by Canada's Therac-25 radiation therapy machine. Three of these patients were believed to have died from the overdoses. The root cause was a lack of quality assurance, which led to an over-complex, inadequately tested, under-documented system, and subsequently to a failure to take adequate corrective action. (Pooley & Stevens, 1999)

4. 1988 - Buffer overflow in Berkeley Unix finger daemon
The first internet worm (the so-called Morris Worm) infects between 2,000 and 6,000 computers in less than a day by taking advantage of a buffer overflow. The vulnerable code calls gets(), a routine in the standard input/output library that reads a line of text from standard input; the finger daemon uses it to read requests arriving over the network. Unfortunately, gets() has no provision to limit its input, and an overly large input allows the worm to take over any machine to which it can connect.
Programmers respond by attempting to stamp out the gets() function in working code, but it stays in the C programming language's standard input/output library for decades; the C11 standard finally removes it.
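The usual remedy for this class of bug is to make every read state how much input it will accept. A minimal sketch in Python (the original code was C; the function name `read_line_bounded` and the 512-byte limit are illustrative assumptions, not taken from the finger daemon):

```python
import io

MAX_LINE = 512  # assumed limit, chosen only for illustration


def read_line_bounded(stream, limit=MAX_LINE):
    """Read one line from a binary stream, refusing lines longer than `limit` bytes.

    Unlike C's gets(), the caller says up front how much input it will accept.
    """
    # readline(n) returns at most n bytes, so ask for one extra byte:
    # that is enough to notice a line that runs past the limit.
    data = stream.readline(limit + 1)
    if len(data) > limit and not data.endswith(b"\n"):
        raise ValueError("input line exceeds %d bytes" % limit)
    return data.rstrip(b"\r\n")
```

A short request such as `io.BytesIO(b"alice\r\n")` is returned as `b"alice"`, while a 600-byte line is rejected instead of being copied past the end of a buffer.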

5. 1988-1996 - Kerberos Random Number Generator
The authors of the Kerberos security system neglect to properly "seed" the program's random number generator with a truly random seed. As a result, for eight years it is possible to trivially break into any computer that relies on Kerberos for authentication. It is unknown if this bug was ever actually exploited.
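Why a poorly seeded generator is fatal can be shown with a toy example. This is not Kerberos code: the names `weak_key`, `recover_seed` and `strong_key`, and the 16-bit seed space, are assumptions made for illustration. If the seed comes from a small, guessable set, an attacker can simply try every seed until the generated key matches.

```python
import random
import secrets


def weak_key(seed):
    """Derive a 64-bit key from a PRNG seeded with a guessable value."""
    return random.Random(seed).getrandbits(64)


def recover_seed(key, seed_space=range(2 ** 16)):
    """Brute-force the seed by trying every value in a small seed space."""
    for seed in seed_space:
        if weak_key(seed) == key:
            return seed
    return None


def strong_key():
    """Draw 64 bits from the operating system's secure random source instead."""
    return secrets.randbits(64)
```

With only 65,536 candidate seeds, `recover_seed(weak_key(12345))` finds a matching seed almost immediately; a key from `strong_key()` offers no such shortcut.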

6. January 15, 1990 - AT&T Network Outage
A bug in a new release of the software that controls AT&T's #4ESS long distance switches causes these mammoth computers to crash when they receive a specific message from one of their neighboring machines -- a message that the neighbors send out when they recover from a crash.
One day a switch in New York crashes and reboots, causing its neighboring switches to crash, then their neighbors' neighbors, and so on. Soon, 114 switches are crashing and rebooting every six seconds, leaving an estimated 60,000 people without long distance service for nine hours. The fix: engineers load the previous software release.

7. 1993 - Intel Pentium floating point divide
A silicon error causes Intel's highly promoted Pentium chip to make mistakes when dividing floating-point numbers that occur within a specific range. For example, dividing 4195835.0/3145727.0 yields 1.33374 instead of 1.33382, an error of 0.006 percent. Although the bug affects few users, it becomes a public relations nightmare. With an estimated 3 million to 5 million defective chips in circulation, at first Intel only offers to replace Pentium chips for consumers who can prove that they need high accuracy; eventually the company relents and agrees to replace the chips for anyone who complains. The bug ultimately costs Intel $475 million.
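The size of the error quoted above can be checked with a few lines of arithmetic. The sketch below takes the flawed result as the rounded figure 1.33374 given in the text, so the percentage it prints is approximate.

```python
numerator = 4195835.0
denominator = 3145727.0

correct = numerator / denominator  # result from a correctly working divider
flawed = 1.33374                   # flawed result as quoted above (rounded)

relative_error_percent = abs(correct - flawed) / correct * 100

print(f"correct result : {correct:.5f}")
print(f"relative error : {relative_error_percent:.3f} percent")
```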

8. London Ambulance System (1992)
A succession of software engineering failures, especially in project management, caused two failures of the ambulance dispatch system in London, England. The repair cost was estimated at £9m, but it is believed that people died who would not have died if ambulances had reached them as promptly as they would have done without the failures.

9. Denver baggage handling system
The Denver airport baggage handling system was so complex (involving 300 computers) that the development overrun prevented the airport from opening on time. Fixing the incredibly buggy system required an additional 50% of the original budget - nearly $200m.

10. Taurus (1993)
Taurus, the planned automated transaction settlement system for the London Stock Exchange, was canceled after 5 years of failed development. Losses are estimated at £75m for the project and £450m to customers. (Pooley & Stevens, 1999)

11. 1995/1996 -- The Ping of Death
A lack of sanity checks and error handling in the IP fragmentation reassembly code makes it possible to crash a wide variety of operating systems by sending a malformed "ping" packet from anywhere on the internet. Most obviously affected are computers running Windows, which lock up and display the so-called "blue screen of death" when they receive these packets. But the attack also affects many Macintosh and Unix systems.
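The missing sanity check is small. An IPv4 datagram can be at most 65,535 bytes long, and each fragment carries a 13-bit offset counted in 8-byte units, so a fragment near the maximum offset can claim to end beyond that limit. A minimal sketch of the check, assuming a 20-byte header without options (the function name `fragment_fits` is illustrative, not taken from any real network stack):

```python
MAX_IP_DATAGRAM = 65535  # largest total length the 16-bit IPv4 length field can express


def fragment_fits(offset_units, payload_len, header_len=20):
    """Return True if a fragment ends inside a legal reassembled datagram.

    offset_units is the 13-bit fragment offset field, counted in 8-byte units.
    """
    end = header_len + offset_units * 8 + payload_len
    return end <= MAX_IP_DATAGRAM
```

A reassembly routine that drops any fragment for which this returns False never copies data past the end of a 65,535-byte buffer.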


12. Ariane 5 (1996)
The Ariane 5 rocket exploded on its maiden flight on June 4, 1996 because the navigation package was inherited from the Ariane 4 without proper testing.



The new rocket flew faster, resulting in larger values of some variables in the navigation software. Shortly after launch, an attempt to convert a 64-bit floating-point number into a 16-bit integer generated an overflow. The error was caught, but the code that caught it elected to shut down the subsystem. The rocket veered off course and exploded. It was unfortunate that the code that failed generated inertial reference information useful only before lift-off; had it been turned off at the moment of launch, there would have been no trouble. (Kernighan, 1999)
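The narrowing conversion at the heart of this failure can be sketched as follows. The flight software was not written in Python, and the function names here are illustrative assumptions; the point is only that a 64-bit float can hold values far outside the range of a 16-bit signed integer, so the conversion needs an explicit policy for out-of-range input.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value):
    """Convert a float to a 16-bit signed integer, raising if it cannot fit."""
    if value != value or not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return int(value)


def to_int16_saturating(value):
    """Convert a float to a 16-bit signed integer, clamping out-of-range values."""
    if value != value:  # NaN has no meaningful integer value
        raise ValueError("cannot convert NaN")
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

Which policy is right depends on the system: raising forces the caller to handle the error, while clamping keeps the computation running with a limited value. Neither is claimed here to be what the Ariane software should have done.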

13. E-mail buffer overflow (1998)
Several e-mail systems suffer from a buffer overflow error when extremely long e-mail addresses are received. The code that receives the addresses does not check their length, so the internal buffers overflow and the applications crash. Hostile hackers use this fault to trick the computer into running a malicious program instead.

14. USS Yorktown (1997)
A crew member of the guided-missile cruiser USS Yorktown mistakenly entered a zero for a data value, which resulted in a division by zero. The error cascaded and eventually shut down the ship's propulsion system. The ship was dead in the water for several hours because a program didn't check for valid input. (reported in Scientific American, November 1998)
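Checking the value where it is used is enough to turn a cascading failure into a rejected entry. A minimal sketch, with a hypothetical function name and parameters that do not come from the ship's software:

```python
def rate_per_unit(total, count):
    """Divide total by count, rejecting a zero or negative count up front."""
    if count <= 0:
        raise ValueError(f"count must be positive, got {count!r}")
    return total / count
```

The caller can then report the bad entry to the operator and keep the previous valid value, rather than letting an unhandled division error spread to other parts of the system.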



15. Mars Climate Orbiter (September 23rd, 1999)

The $125 million Mars Climate Orbiter is assumed lost by officials at NASA. The loss of the orbiter is attributed to a failure of NASA's systems engineering process. The process did not specify the system of measurement to be used on the project. As a result, one of the development teams used Imperial units while the other used metric units. When parameters from one module were passed to another during an orbit navigation correction, no conversion was performed, resulting in the loss of the craft. http://mars.jpl.nasa.gov/msp98/orbiter/
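One way to make this kind of mismatch hard to miss is to have every value carry its unit, and to convert at the boundary between modules. The sketch below uses impulse as the example quantity, in pound-force seconds and newton-seconds; the class name `Impulse` and the unit labels are assumptions made for illustration, not part of the mission software.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # newton-seconds per pound-force second


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self):
        """Return the value in newton-seconds, converting if necessary."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")
```

A module that expects metric input calls `to_newton_seconds()` on whatever it receives, so a value produced in Imperial units is either converted or rejected, never silently misread.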


16. November 2000 - National Cancer Institute, Panama City
In a series of accidents, therapy planning software created by Multidata Systems International, a U.S. firm, miscalculates the proper dosage of radiation for patients undergoing radiation therapy.
Multidata's software allows a radiation therapist to draw on a computer screen the placement of metal shields called "blocks" designed to protect healthy tissue from the radiation. But the software will only allow technicians to use four shielding blocks, and the Panamanian doctors wish to use five.
The doctors discover that they can trick the software by drawing all five blocks as a single large block with a hole in the middle. What the doctors don't realize is that the Multidata software gives different answers in this configuration depending on how the hole is drawn: draw it in one direction and the correct dose is calculated, draw in another direction and the software recommends twice the necessary exposure.
At least eight patients die, while another 20 receive overdoses likely to cause significant health problems. The physicians, who were legally required to double-check the computer's calculations by hand, are indicted for murder.
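How can the direction in which a shape is drawn change a numerical result? The account above does not describe the algorithm inside the Multidata software, so the following is not a reconstruction of it. It is only a general illustration of one common way geometry code becomes direction-sensitive: the shoelace formula for the area of a polygon returns a signed value whose sign depends on whether the outline was traced clockwise or counter-clockwise.

```python
def signed_area(points):
    """Shoelace formula: positive for counter-clockwise outlines, negative for clockwise."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def area(points):
    """Orientation-independent area: the same answer for either drawing direction."""
    return abs(signed_area(points))
```

For a 2 by 2 square traced as `[(0, 0), (2, 0), (2, 2), (0, 2)]`, `signed_area` gives 4.0, and tracing the same square in the opposite order gives -4.0. Code that adds or subtracts such values without normalizing the direction first will produce different totals for identical shapes.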
