“Demonstrating the ACS system to the AT&T executives…and apparent success…”
I have been discussing ACS, the Bell System version of the Internet. The BBN ARPANet was too small and couldn’t scale up into a national data network. So the Bell System was building a truly massive data network called ACS (Advanced Communications System). It was the late 1970s and data growth was killing the POTS network. Voice lines would be used 6 minutes an hour, while data lines would be used 60 minutes an hour. A giant 1ESS could handle 100,000 subscribers for voice, but only 5,000 subscribers who were heavy data users. It was a disaster that was starting to hit every major city.
So the largest private development project in the history of mankind (billions of dollars in today’s dollars) was underway at Bell Labs, funded by AT&T Long Lines and championed by the Steve Jobs of the era, a man named Billy Oliver.
I was in the acceptance testing group. Officially I was there to do testing of the system. Unofficially, I was to report back to Billy Oliver on how the system was going. ACS was one of the most prestigious applied-engineering projects you could hope to be on at Bell Laboratories.
The problem was that when the pieces of the ACS system were put together, things didn’t go very well. Everything worked well individually, but as a system it had serious problems. It took hours for the system to boot up, and then it would crash about every 15 minutes. Often when it crashed, the failed line cards wouldn’t even be able to report back that they had died, so they just sat there, out of service. The system overheated. It drew too much power. It was buggy. And it lacked core functionality.
Back in the 1970s, there was little software engineering experience with massively parallel computer architectures. The Mythical Man-Month (http://en.wikipedia.org/wiki/The_Mythical_Man-Month) was relatively new and not widely known among engineering managers. So as the development in Bell Labs started to get into trouble and fall behind schedule, massive numbers of people were added to the project. This lowered the average knowledge level on the project, and it dramatically increased the communications complexity of the organization. (Just like networks: two phones require one circuit, three phones require three circuits, and five phones require 10 circuits.) The ACS project had too many new people and serious engineering problems.
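The phone-circuit analogy is the standard count of distinct pairs among n endpoints, n(n-1)/2. Here is a minimal sketch of that arithmetic; the function name pairwise_links and the team sizes are illustrative only and are not taken from the ACS project.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n endpoints: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Hypothetical sizes, chosen only to show how quickly the count grows.
for size in (2, 3, 10, 100):
    print(f"{size:>4} endpoints -> {pairwise_links(size):>5} possible links")
```

Two endpoints need one link and three need three, but a hundred need 4,950. The number of possible conversations grows roughly with the square of the head count, which is why adding people to a late project can slow it down.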
The day finally came when an executive presentation was a couple of weeks away. The execs wanted to see how this system that was costing hundreds of millions of dollars was coming. They wanted to see it work. It was perpetually 90 days from being completed, and they were tired of it.
Being in system test, I had the opportunity to test the system at night. After the engineers had gone home it was all mine. And what I learned was not good news. It really took a day to boot the system up, and it would only remain in service for about 15 minutes. It lacked most of the functionality that was in the specification. (While the functionality was marked as completed and ready for testing, often it would not even exist in code.)
During the few moments that the system was up and running you could place a “call” from one line to another. The call took 30 seconds to establish when the system was idle and you were the only user on it. When you typed on one end, the characters didn’t come out the other end until about 30 seconds later. If you typed too quickly the system would crash.
Needless to say, the upcoming executive presentation wasn’t going to go well.
I reported back to my real employer, AT&T Long Lines, that ACS was a disaster. It could not be delivered on time. Most of the functionality was missing. The system was unstable. It took a day to boot up and stayed up for 15 minutes. It took 30 seconds to establish a connection, and 30 seconds to send characters from one terminal to the other sitting right next to the one you typed on.
My white paper report made it up the chain of command and eventually got to Billy Oliver with the “buck slip” note that my report was so inconsistent with every other Bell Labs progress report that it could not be taken as credible. The word came back to me that I was to quit writing such “opinionated” things and stick to engineering.
I sent the note back that the system did not work, and could never work. That message went up the AT&T chain of command and over to Bell Labs, which then welcomed the executives to judge for themselves at the upcoming demonstration of the live system.
I continued to test the system and take notes. On the day of the executive demonstration I was asked to attend a meeting…in another building.
The demonstration happened before the eyes of a wary AT&T Long Lines senior management team. Tipped off that there might be some problems, the delegation was beefed up and attentive.
The demonstration of ACS went well. Calls were set up between various terminals. Connections were established and torn down, and response times were relatively good. All of the executives left relieved and the decision was made to start manufacturing hardware for the full-scale Chicago ACS central office.
The following day I was called into my district manager’s office at AT&T Long Lines. I was asked to sit down. And then my district manager was very quiet for a very long time. He sighed, and he hesitated. But then in his quiet way he told me how much I had embarrassed him and hurt my own credibility. The demonstration at Bell Labs had gone well. The AT&T executives saw it working with their own eyes. My “doomsday” report had cost his department credibility and had possibly cost me my career. I left that meeting having been scolded pretty soundly.
The next day I went back to Bell Labs as usual. There was a collective sigh of relief and nobody was very busy doing anything. They had worked really hard preparing for the executive demonstration and it was a decompression day.
After people went home, I decided to do some testing…
I restarted the system and the first thing I noticed was that only a dozen line cards were enabled. About 99% of the line cards were simply disconnected from the main bus.
I went to one of the test terminals and tried to set up a connection to the one next to it. Voila! A fast and responsive connection. It was a miracle!
As I was torture-testing the connection I looked up…and saw that the line cards had crashed. The LED lights indicated that they were downloading. But my connection was still alive! It was still fast and responsive even though none of the line cards were online at the moment. That was odd.
So, I just powered the entire system down. My connection was still alive! Even with the ACS system powered down it was still working. Now, that was curious!
In my next post I discuss how the great Bell Laboratories managed to build a working ACS system in just two weeks!