How the Bell System Missed the Internet 4

by Colin Berkshire

“The Bell System Advanced Communications System Fails…”

So, development progressed on the AT&T ACS system. This was the system destined to become what the internet is today. It was to be a massive packet switching system able to serve tens of millions of users.

The project grew from 50 to 100 persons, and then to 200 persons. (Eventually more than 1,000 engineers would work on this doomed project. In the end, the project was so doomed that people would strike any reference to it from their resumes.) It was about 100 people at the time that I joined and about 250 people when I left the project.

The 1B5 wing of Holmdel Bell Labs was bustling. A prototype central office was being built, and mini-computers and line frames were being installed. Pre-production parts were being mass produced. Everybody was under the illusion that we were just 90 days from going live with ACS. And I had free rein with the entire system after-hours. Because I was in acceptance-testing (the final sign-off) I could talk to any department, and review any part of the system. It was a lot of fun seeing the big picture.

I was armed with a very unusual testing tool for the time: an Apple II computer. As far as I know I had the only one at Bell Labs. People regarded it as a toy. But with it I could record connections, play back scripts, and measure response times. All of this was written in Applesoft BASIC…an unauthorized programming language. What was really good was that it gave me extra hands and allowed me to play with the timing of sending out characters. It ended up being the tool that brought the entire system down.

So the day finally arrived when we integrated all of the individual systems. Each of the separately tested parts came together into a fully working, functional system. And this is when it all started to unravel.

The way the Bell System planned to handle the traffic was through the use of micro-processors on line cards. Eight to 16 circuits could come into a single line card, powered by the equivalent of a Z-80 chip that was more powerful than my Apple II. These line cards would talk to other cards across an interconnection of high-speed time-space-time digital buses. It was all much like a traditional central office. This was, in principle, a very good approach because it eliminated switching traffic through a slow and costly mini-computer like the ARPANET IMP. It was those mini-computers that made packet switching so unaffordable.

Because Bell Labs knew that software updates were going to be frequent, the line cards didn’t have their programs in ROM. They only had a boot-loader in ROM and the program was downloaded from a mini-computer (a DEC PDP 11/34). Now, this was an extraordinarily clever idea at the time. Most phone systems with smart line cards had their programs fixed in ROM. That booted up fast, but was inflexible. Even the Apple II had its main program fixed in ROM. ROM was just what you did back then because it was inexpensive and non-volatile. Boot times were nearly instantaneous. So, to dynamically bootload line cards was clever, innovative, and unorthodox.

It took a few minutes to load the programs into a line card’s memory. A PDP 11/34 could do a few at a time. But therein was the first problem. When 1,000 line cards all needed to boot up at the same moment, it overwhelmed the 11/34. Even at only a minute each, that is 1,000 minutes. Even if the 11/34 could download 16 of them at a time, it was still more than an hour for the system to start up, and several hours at a few minutes per card! Worse, the boot process would fail about 50% of the time. When too many cards needed booting, the PDP 11/34 itself would overload and crash. This was the beginning of the end.
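To put rough numbers on that boot-time arithmetic, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function name, the three-minute load time, and the simple retry model (every failed load is retried until it succeeds) are assumptions of this sketch, not details of how the ACS software actually worked.

```python
import math


def boot_minutes(cards, minutes_per_load, parallel_loads, success_rate=1.0):
    """Rough wall-clock minutes to get every line card booted once.

    Assumes (hypothetically) that loads run in fixed-size batches and that
    a failed load is simply retried, so the expected number of load
    attempts is cards / success_rate.
    """
    expected_loads = cards / success_rate
    batches = math.ceil(expected_loads / parallel_loads)
    return batches * minutes_per_load


# One card at a time, one minute per load.
print(boot_minutes(1000, 1, 1))        # 1000
# Sixteen cards at a time, one minute per load.
print(boot_minutes(1000, 1, 16))       # 63
# Sixteen at a time, assuming three minutes per load.
print(boot_minutes(1000, 3, 16))       # 189
# Same, but with half of all load attempts failing.
print(boot_minutes(1000, 3, 16, 0.5))  # 375
```

Even under these generous assumptions, a cold start runs from about an hour to more than six hours, and that ignores the 11/34 itself crashing under the load.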

The line cards turned out to be unstable, and when they hit a bug they would crash. The result was that 8 to 16 customers would be taken out of service while the reboot happened. Worse, the line cards were crashing on average every 15 minutes. The result was that the entire system never could come up and get into service. The crashes stemmed from a series of hardware and software flaws. The line card used a new Western Electric processor chip and new WeCo memory, and the specification for the main data bus was changing regularly because there was a desire to mirror the design of the 5ESS, which was in parallel development in a faraway laboratory. A disaster was looming.

The schedule kept slipping. We were perpetually 90 days away from going live, and the project had mushroomed to more than 250 people and was growing daily.

Finally, the word came down that there would be an executive presentation of the system. It was to be working and demonstrated to top-level AT&T executives. There was panic. The system couldn’t boot up, it couldn’t work, it crashed frequently, and everybody knew it didn’t work. Nobody would tell the executives the truth. There was too much on the line. (It was eerily like 2008, when the US banks that were too big to fail had billions of dollars thrown at them to keep them alive. This ACS project was too big to fail, and money and people were just thrown at the project.)

Quietly and privately, the decision was made within Bell Labs to buy more time and to fake the executive presentation. Yes: fraud.

In the next post we will watch a bureaucracy step into action and try to save itself. But not all ends up being what it appears to be. Remember: This project is doomed in the end…
