Chris Lintott first met Kevin Schawinski in the summer of 2007 at the astrophysics department of the University of Oxford. Lintott had just finished a PhD at University College London on star formation in galaxies. He was also something of a minor celebrity in the astronomy community: he was one of the presenters of the BBC's astronomy programme The Sky at Night alongside Sir Patrick Moore, and had written a popular science book called Bang!: The Complete History of the Universe with Moore and Brian May, the Queen guitarist and astrophysicist. "I went to give a seminar talk as part of a job interview," Lintott recalls. "And this guy in a suit jumped up and started having a go at me because I hadn't checked my galaxy data properly. I thought it was some lecturer who I'd pissed off, but it turned out to be Kevin [Schawinski], who was a student at the time."
Most galaxies come in one of two shapes: elliptical or spiral. Elliptical galaxies range from spherical to a flattened rugby-ball shape. Spirals, like the Milky Way, have a central bulge of stars surrounded by a thin disk in which the stars trace a spiral pattern of "arms". A galaxy's shape is an imprint of its history and of how it has interacted with other galaxies over billions of years of evolution. Yet why galaxies take these shapes, and how the two forms relate to one another, remains a mystery to astronomers. For a long time, astronomers assumed that spirals were young galaxies, rich in stellar nurseries where new stars were being formed. These regions emit hot, blue light. Elliptical galaxies, on the other hand, were thought to be predominantly old and replete with dying stars, which are cooler and therefore appear red. Schawinski was working on a theory that contradicted this paradigm. To prove it, he needed to find elliptical galaxies with blue regions, where star formation was taking place.
At the time, astronomers relied on computer algorithms to filter datasets of galaxy images. The biggest bank of such images came from the Sloan Digital Sky Survey, which contained more than two million astronomical objects, nearly a million of them galaxies, all photographed by an automated telescope in New Mexico with a 2.5-metre mirror. The problem was that, while computers could easily filter galaxies by colour, no algorithm could reliably pick out galaxies by shape. "It's really hard to teach a computer a pattern-recognition task like this," says Schawinski, currently a professor in astronomy at the Swiss Federal Institute of Technology in Zurich. "It took computer scientists a decade to [teach a computer] to tell human faces apart, something every child can do the moment they open their eyes." The only way to prove his theory, Schawinski decided, was to look at each galaxy image, one by one.
Schawinski did exactly that for a week, working 12 hours a day. He would go to his office in the morning, click through images of galaxies while listening to music, break for lunch, and continue until late in the evening. "When I attended Chris's seminar, I had just spent a week looking through fifty thousand galaxies," says Schawinski.
When Lintott moved to Oxford, he and Schawinski started debating how to classify datasets containing millions of images. They weren't the only ones. "Kate Land, one of my colleagues, was intrigued by a recent paper which claimed most galaxies were rotating around a common axis," Lintott says. "Which is indeed puzzling because the expectation was that these axes would be totally random." Land needed more data, which required looking at the rotation of tens of thousands of galaxies. "Out of the blue she asked me if I thought that, if they put a laptop with galaxy images in the middle of a pub, would people classify them?" Lintott recalls.
Nasa had recently launched a project called Stardust@home, which had recruited about 20,000 online volunteers to identify tracks left by interstellar dust in samples returned by its Stardust comet mission. "We thought that if people are going to look at dust tracks, then surely they'll look at galaxies," says Lintott. Once they had decided to go ahead with the project, they built a website within days. The homepage displayed an image of a galaxy from the dataset, and volunteers were asked whether it was a spiral or an elliptical. If it was a spiral, they were asked whether they could discern the direction of its arms and of its rotation. There were also options for stars, unknown objects and overlapping galaxies.
The site, called Galaxy Zoo, launched on July 11, 2007. "We thought we would get at least some amateur astronomers," Lintott says. "I was planning to go to the British Astronomical Society, give a talk and get at least 50 of their members to classify some galaxies for us." Within 24 hours of its launch, Galaxy Zoo was receiving 60,000 classifications per hour. "The cable we were using melted and we were offline for a while," Schawinski says. "The project nearly died there." After ten days, users from all over the world had submitted eight million classifications. By November, every galaxy had been seen by an average of 40 people. Galaxy Zoo users weren't just classifying galactic shapes; they were making unexpected discoveries. Barely a month after launch, Dutch schoolteacher Hanny van Arkel spotted a strange green blob that turned out to be a never-before-seen astronomical object. Christened Hanny's Voorwerp ("voorwerp" means "object" in Dutch), it remains the subject of intense scientific scrutiny. Later that year, a team of volunteers compiled evidence for a new type of small, compact galaxy, which they named Green Pea galaxies after their appearance in the survey images.
"When we did a survey of our volunteers we found out they weren't astronomers," Lintott says. "They weren't even huge science fans and weren't that interested in making new discoveries. The majority said they just wanted to make a contribution." With Galaxy Zoo, Schawinski and Lintott had built a powerful pattern-recognition machine made up entirely of people. Collectively, the volunteers could process data incredibly quickly and accurately, with their classifications aggregated through a democratic statistical process; individually, they could make serendipitous discoveries, a fundamental component of scientific enquiry. With robotic telescopes spewing out terabytes of images every year, the pair had found an answer to big data in a big crowd of volunteers. Since Galaxy Zoo's first discoveries, this pioneering approach to crowdsourcing science has gained a strong following, not only among the general public but also within the scientific community. Today, there are hundreds of crowdsourcing projects with a wide variety of scientific goals, from identifying cancer cells in biological tissue to building nanoscale machines out of DNA. These endeavours have produced breakthroughs, such as Schawinski and Lintott's findings on star formation, that have been published in the most prestigious scientific journals. The biggest breakthrough, however, is not any individual discovery but the method itself. Crowdsourcing science is a reinvention of the scientific method: a powerful new way of making discoveries and solving problems that might otherwise have remained undiscovered and unsolved.
At around the time Lintott and his team were developing Galaxy Zoo, two computer scientists at the University of Washington in Seattle, Seth Cooper and Adrien Treuille, were trying to use online crowds to solve a problem in biochemistry called protein folding.
A protein is a chain of smaller molecules called amino acids. Its three-dimensional shape determines how it interacts with other proteins and, consequently, its function in the cell. A protein typically folds into a single native structure, and finding that structure is a notoriously difficult problem: a given chain of amino acids could, in principle, be folded into an astronomical number of three-dimensional shapes. Biochemists know the amino-acid sequences of thousands of proteins but struggle to work out how they fold into the three-dimensional structures found in nature.
Cooper and Treuille's lab had previously developed an algorithm that attempted to predict these structures. The algorithm, named Rosetta, required a lot of computing power, so it was adapted to run as a screensaver that online volunteers could install. The screensaver, called Rosetta@home, required no input from volunteers, so Cooper and Treuille had been brought in to turn it into a game. "With the screensaver, users could see the protein and how the computer was trying to fold it, but they couldn't interact with it," Cooper says. "We wanted to combine that computer power with human problem-solving."
Cooper and Treuille were the only computer scientists in their lab, and they knew nothing about protein folding. "In some sense, we were forced to look at this very esoteric and abstract problem through the eyes of a child," Cooper says. "Biochemists often tell you that a protein looks right or wrong. It seemed that with enough training you can gain an intuition about how a protein folds. There are certain configurations that a computer never samples, but a person can just look at it and say, 'that's it'. That was the seed of the idea."
The game, called Foldit, was released in May 2008. Players start with a partially folded protein structure produced by the Rosetta algorithm and manipulate it by clicking, pulling and dragging amino acids until they arrive at its most stable shape. The algorithm calculates how stable the structure is; the more stable, the higher the score.
"When we first trialled the game with the biochemists, they weren't particularly excited," Cooper says. "But then we added a leaderboard, where you could see each other's names and respective scores. After that, we had to shut down the game for a while because it was bringing all science to a halt."
Foldit turned one of biochemistry's hardest problems into a game that can be won by scoring points. Over the past five years, more than 350,000 people have played Foldit, and these players have consistently folded proteins better than the best algorithms. "Most of these players didn't have a background in biochemistry and they were beating some of the biochemists who were playing the game," Cooper says. "They also discovered an algorithm similar to one that the scientists had been developing. It was more efficient than any previously published algorithm."