The IT Dream Team that Fixed HealthCare.gov...and the #1 Lesson Learned
Steven Brill has an excellent cover story in this week’s Time magazine on the makeshift team of Silicon Valley IT experts who, metaphorically, parachuted into Washington and fixed the Obamacare website. Among its rather shocking revelations about HealthCare.gov: one month before the site launched, a White House director of communications told a reporter that “everything has been tested and is working perfectly”; the launch plan included no “start small” trials; and the system had no real-time dashboards for monitoring performance. In its first month, the site went down entirely twice, once for 40 hours and once for 37. It had, according to Brill, an “astonishingly high” click-through error rate and almost no meaningful ability to scale.
The good news is that the site was salvageable, at least when the salvaging was done by an elite group of Silicon Valley tech wizards. Led by Google’s chief site reliability engineer, Mikey Dickerson, this ad hoc team worked days, nights and weekends, and in a little over two months turned a failed website into a fast service able to handle up to 100,000 simultaneous users with almost no click-through errors. Theirs is a heroic high-tech tale.
In my home state of Oregon, we have had our own version of HealthCare.gov...only worse. Cover Oregon is the state program that was supposed to interface with the national Obamacare system through a home-built website. Despite spending more than $200 million on that website, Cover Oregon has had no online presence since the program’s launch in October. It now appears likely that the original site cannot be fixed and that the State of Oregon will have to start over. As with HealthCare.gov, the leader of Cover Oregon’s website project told the media one month before launch that he expected, in effect, a glorious and completely successful launch.
These two failed healthcare software projects had one thing in common. Neither followed one of the first rules of Internet systems: think big, but start small.
Both projects (apparently) followed the old waterfall method of software development: build the entire system in one long stretch, drop it over the edge onto millions of users, and hope it works.
The agile development model, of course, takes a decidedly different approach: short, highly focused “sprints,” with each module tested immediately after it is built.
But if government projects are to start getting better results for our IT bucks, more than a move to agile methodology is required. Government agencies at every level should also try to leverage existing services that already work, especially services available from the cloud. Even then, they should start with small pilots involving modest numbers of users, and plan for a feedback and optimization phase that fine-tunes the service to fit program needs as closely as possible. Only then should the service scale, quickly but carefully, with the absolute expectation that there will be hiccups along the way. The goal is hiccups only, no massive failures.
There are many other lessons to be learned from the failures of HealthCare.gov and Cover Oregon. But for my money, “think big, but start small” is the most important.