Chris Lintott first met Kevin
Schawinski in the summer of 2007 at the astrophysics department of the
University of Oxford. Lintott had just finished a PhD at University College
London on star formation in galaxies. He was also something of a minor
celebrity in the astronomy community: he was one of the presenters of the BBC's
astronomy programme The Sky at Night alongside Sir Patrick Moore, and had
written a popular science book
called Bang!: The Complete History of the Universe with Moore and Brian May,
the Queen guitarist and astrophysicist. "I went to give a seminar talk as
part of a job interview," Lintott recalls. "And this guy in a suit
jumped up and started having a go at me because I hadn't checked my galaxy data
properly. I thought it was some lecturer who I'd pissed off, but it turned out
to be Kevin [Schawinski], who was a student at the time."
Most galaxies come in two
shapes: elliptical or spiral. Elliptical galaxies can have a range of shapes,
from perfectly spherical to a flattened rugby-ball shape. Spirals, like the
Milky Way, have a central bulge of stars surrounded by a thin disk of stars shaped
in a spiral pattern known as "arms". The shape of a galaxy is an
imprint of its history and how it has interacted with other galaxies over
billions of years of evolution. It is a mystery to astronomers why galaxies have
these shapes and how the two geometries relate to one another. For a long
time, astronomers assumed that spirals were young galaxies, with an abundance
of stellar nurseries, where new stars were being formed. These regions
typically emitted hot, blue radiation. Elliptical galaxies, on the other hand,
were thought to be predominantly old, replete with dying stars, which are cooler,
and therefore have a red colour. Schawinski was working on a theory which
contradicted this paradigm. To prove it, he needed to find elliptical galaxies
with blue regions, where star formation was taking place.
At the time, astronomers relied
on computer algorithms to filter datasets of images of galaxies. The biggest
bank of such images came from the Sloan Digital Sky Survey,
which contained more than two million astronomical objects, nearly a million of
which were galaxies, and had been taken by an automated robotic telescope in
New Mexico with a two-metre mirror. The problem was that, while computers could
easily filter galaxies based on their colour, it was impossible for an
algorithm to pick out galaxies based on their shape. "It's really hard to
teach a computer a pattern-recognition task like this," says Schawinski,
currently a professor in astronomy at the Swiss Federal Institute of Technology
in Zurich. "It took computer scientists a decade to [teach a computer] to
tell human faces apart, something every child can do the moment they open their
eyes." The only way to prove his theory, Schawinski decided, was to look
at each galaxy image, one by one.
Schawinski did it for a week,
working 12 hours every day. He would go to his office in the morning, click
through images of galaxies while listening to music, break for lunch, and
continue until late in the evening. "When I attended Chris's seminar, I
had just spent a week looking through fifty thousand galaxies," says
Schawinski.
When Lintott moved to Oxford,
he and Schawinski started debating the problem of how to classify datasets with
millions of images. They weren't the only ones. "Kate Land, one of my
colleagues, was intrigued about a recent paper which claimed most galaxies were
rotating around a common axis," Lintott says. "Which is indeed
puzzling because the expectation was that these axes would be totally
random." Land needed more data, which required looking at the rotation of
tens of thousands of galaxies. "Out of the blue she asked me if I thought
that, if they put a laptop with galaxy images in the middle of a pub, would
people classify them?" Lintott recalls.
At the time, Nasa had launched
a project called Stardust@home, which had recruited about 20,000 online
volunteers to identify tracks made by interstellar dust in samples from a
comet. "We thought that if people are going to look at dust tracks, then
surely they'll look at galaxies," says Lintott. Once it was decided they
would go ahead with the project, they built a website within days. The homepage
displayed the image of a galaxy from the dataset. For each image, the
volunteers were asked if the galaxy was a spiral or elliptical. If a spiral, they
were asked if they could discern the direction of its arms and the direction of
its rotation. There were also options for stars, unknown objects and
overlapping galaxies.
The site, called Galaxy Zoo,
launched on July 11, 2007. "We thought we would get at least some amateur
astronomers," Lintott says. "I was planning to go to the British
Astronomical Society, give a talk and get at least 50 of their members to
classify some galaxies for us." Within 24 hours of its launch, Galaxy Zoo
was receiving 60,000 classifications per hour. "The cable we were using
melted and we were offline for a while," Schawinski says. "The
project nearly died there." After ten days, users from all over the world
had submitted eight million classifications. By November, every galaxy had been
seen by an average of 40 people. Galaxy Zoo users weren't just classifying
galactic shapes, they were making unexpected discoveries. Barely a month after
launch, Dutch schoolteacher Hanny van Arkel discovered a strange green cluster
that turned out to be a never-before-seen astronomical object. Christened
Hanny's Voorwerp ("voorwerp" means "object" in Dutch), it
remains the subject of intense scientific scrutiny. Later that year, a team of volunteers
compiled evidence for a new type of galaxy -- blue and compact -- which they
named Pea galaxies.
"When we did a survey of
our volunteers we found out they weren't astronomers," Lintott says.
"They weren't even huge science fans and weren't that interested in making
new discoveries. The majority said they just wanted to make a
contribution." With Galaxy Zoo, Schawinski and Lintott developed a
powerful pattern-recognition machine, composed entirely of people who could not
only process data incredibly quickly and accurately -- aggregating the results
via a democratic statistical process -- but also enable individual
serendipitous discoveries, a fundamental component of scientific enquiry. With
robotic telescopes spewing terabytes of images every year, they found an answer
to big data in a big crowd of volunteers. Since Galaxy Zoo's first discoveries,
this pioneering approach of crowdsourcing science has gained a strong following
not only with the general public but also within the scientific community.
Today, there are hundreds of crowdsourcing projects involving a variety of
scientific goals, from identifying cancer cells in biological tissues to
building nanoscale machines using DNA. These endeavours have resulted in
breakthroughs, such as Schawinski and Lintott's discoveries on the subject of
star formation, that have merited publication in the most respected scientific
journals. The biggest breakthrough, however, is not the scientific discoveries
per se, but the method itself. Crowdsourcing science is a reinvention of the
scientific method, a powerful new way of making discoveries and solving
problems that might otherwise have remained undiscovered and unsolved.
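The article does not say which statistical method Galaxy Zoo used to combine its volunteers' answers. As a rough illustration only, the idea of turning roughly 40 independent classifications of one galaxy into a single label can be sketched as a majority vote with an agreement threshold. The function name `aggregate_votes` and the 0.8 threshold are hypothetical choices made for this sketch, not details from the project.

```python
from collections import Counter


def aggregate_votes(votes, min_agreement=0.8):
    """Combine volunteer labels for one galaxy into a consensus label.

    Returns (label, fraction), where fraction is the share of votes that
    went to the most popular label. The label is None when there are no
    votes or when agreement falls below min_agreement (an assumed cutoff).
    """
    if not votes:
        return None, 0.0
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    fraction = top_count / len(votes)
    if fraction < min_agreement:
        # Too much disagreement: leave the galaxy unclassified.
        return None, fraction
    return label, fraction


# Hypothetical example: 40 volunteers look at the same image.
votes = ["spiral"] * 34 + ["elliptical"] * 4 + ["star"] * 2
print(aggregate_votes(votes))  # ('spiral', 0.85)
```

Galaxies on which volunteers disagree come back unlabelled here, so they can be set aside for a closer look instead of being forced into one category.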
At around the time Lintott and
his team were developing Galaxy Zoo, two computer scientists at the University
of Washington in Seattle, Seth Cooper and Adrien Treuille, were trying to use
online crowds to solve a problem in biochemistry called protein folding.
A protein is a chain of smaller
molecules called amino acids. Its three-dimensional shape determines how it
interacts with other proteins and, consequently, its function in the cell.
Each protein typically adopts just one stable structure, and finding that structure is a
notoriously difficult problem: for a given chain of amino acids, there are
millions of ways in which it can be folded into a three-dimensional shape.
Biochemists know thousands of sequences of amino acids but struggle to find how
they fold into the three-dimensional structures that are found in nature.
Cooper and Treuille's lab had
previously developed an algorithm which attempted to predict these structures.
The algorithm, named Rosetta, required a lot of computer power, so it was
adapted to run as a screensaver that online volunteers could install. The
screensaver, called Rosetta@home, required no input from volunteers, so Cooper
and Treuille had been brought in to turn it into a game. "With the
screensaver, users could see the protein and how the computer was trying to
fold it, but they couldn't interact with it," Cooper says. "We wanted
to combine that computer power with human problem-solving."
Cooper and Treuille were the
only computer scientists in their lab. They also knew nothing about protein folding.
"In some sense, we were forced to look at this very esoteric and abstract
problem through the eyes of a child," Cooper says. "Biochemists often
tell you that a protein looks right or wrong. It seemed that with enough
training you can gain an intuition about how a protein folds. There are certain
configurations that a computer never samples, but a person can just look at it
and say, 'that's it'. That was the seed of the idea."
The game, called Foldit, was
released in May 2008. Players start with a partially folded protein structure
generated by the Rosetta algorithm, and have to manipulate it by clicking,
pulling and dragging amino acids until they arrive
at its most stable shape. The algorithm calculates how stable the structure is;
the more stable, the higher the score.
"When we first trialled
the game with the biochemists, they weren't particularly excited," Cooper
says. "But then we added a leaderboard, where you could see each other's
names and respective scores. After that, we had to shut down the game for a
while because it was bringing all science to a halt."
Foldit turned the goal of
solving one of biochemistry's hardest problems into a game that can be won by
scoring points. Over the past five years, more than 350,000 people have played
Foldit; these players have been able to consistently fold proteins better than
the best algorithms. "Most of these players didn't have a background in
biochemistry and they were beating some of the biochemists who were playing the
game," Cooper says. "They also discovered an algorithm similar to one
that the scientists had been developing. It was more efficient than any
previously published algorithms."