The Explosive Power of Trial, Error and Self-Play


In 1997 an IBM computer called Deep Blue famously defeated the world chess champion, Russian Garry Kasparov.  It was seen as a seminal victory for a machine over the human brain.  Deep Blue had been programmed by chess experts who gave it all the tactics and strategies known in chess.  It was developed over years and played many games against people.  Each time it lost, the experts would reprogram Deep Blue to overcome the weaknesses they had found in its play.  Its great strength was that it could calculate millions of possible continuations for each position in the game; it is said that Deep Blue could evaluate 200 million positions a second.  Eventually, with the help of its programmers, it became better than the world champion.

The next big challenge was for a computer to beat a human champion at Go – a game with many more possibilities and subtleties than chess.  How could a computer be programmed with the intuition of a Go grandmaster?  For nearly 20 years after Deep Blue's victory no computer could win at Go, but then a man called Demis Hassabis took a very different approach.

Hassabis was born in 1976 to a Greek Cypriot father and a Chinese Singaporean mother.  He grew up in London and became a chess prodigy as a child, captaining the English junior chess team.  He went on to study computer science at Cambridge University.  After graduating he founded Elixir Studios, an independent games developer which launched several moderately successful games.  Hassabis then completed a doctorate in cognitive neuroscience at University College London before founding an artificial intelligence company, DeepMind, in 2010.  In 2014 Google bought DeepMind for around $500m – at the time its biggest European acquisition.

DeepMind developed a series of leading AI programs.  Instead of being programmed with all the tricks, techniques and nuances of the game of Go, its program AlphaGo Zero started with nothing but the basic rules.  It then played itself millions of times and progressively learnt to become a better and better player.  In 2017, after just a few days of continually playing against itself, it had mastered the game and could beat the world's strongest human Go players.  Its successor, AlphaZero, learnt chess, shogi and Go in the same way.  These programs were trained solely by self-play.
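The idea of learning a game from nothing but its rules can be shown on a toy scale.  The sketch below is purely illustrative – it is not DeepMind's actual algorithm – and uses a made-up take-away game: two players alternately remove one or two stones from a pile, and whoever takes the last stone wins.  Starting with no strategy at all, the program plays against itself thousands of times, updating a value table from wins and losses, and discovers the optimal play (always leave your opponent a multiple of three stones).

```python
import random

random.seed(0)  # for reproducibility of this illustration

def train(episodes=20000, pile=10, alpha=0.1, eps=0.2):
    """Self-play learning on the take-1-or-2 stones game."""
    Q = {}  # Q[(stones, action)] = estimated value for the player to move
    for _ in range(episodes):
        stones, history = pile, []
        while stones > 0:
            actions = [a for a in (1, 2) if a <= stones]
            if random.random() < eps:          # sometimes explore
                a = random.choice(actions)
            else:                              # otherwise exploit what it knows
                a = max(actions, key=lambda x: Q.get((stones, x), 0.0))
            history.append((stones, a))
            stones -= a
        ret = 1.0  # the player who took the last stone wins
        for state, action in reversed(history):
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (ret - old)
            ret = -ret  # zero-sum game: flip perspective each move back
    return Q

def best_move(Q, stones):
    return max([a for a in (1, 2) if a <= stones],
               key=lambda a: Q.get((stones, a), 0.0))

Q = train()
print(best_move(Q, 4))  # learns to take 1, leaving a multiple of 3
print(best_move(Q, 5))  # learns to take 2
```

Nobody tells the program the multiple-of-three rule; it emerges from trial, error and self-play alone, which is the point of the AlphaGo Zero approach.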

DeepMind's AI techniques have since been applied in many other fields.  Given enough medical data, a similar learning system can work out accurate diagnoses for illnesses, diseases and medical conditions.  Detecting cancer in breast scans is difficult because the rules are not completely clear and doctors rely partly on intuition.  If you give an AI program 100,000 scans that were later found to show cancer and 100,000 that were healthy, it teaches itself to become more accurate at diagnosis than human practitioners.
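Learning from labelled examples can also be sketched in miniature.  The following is a deliberately simplified illustration, not a medical system: the "scans" are synthetic, each reduced to two made-up numeric features, and a basic perceptron learns a dividing line between the cancerous and healthy groups from the labels alone.

```python
import random

random.seed(1)  # for reproducibility of this illustration

def make_scan(cancer):
    # hypothetical features, e.g. lesion density and tissue irregularity
    base = (2.0, 2.0) if cancer else (0.0, 0.0)
    return [base[0] + random.gauss(0, 0.5), base[1] + random.gauss(0, 0.5)]

# synthetic labelled data: 1 = cancerous scan, 0 = healthy scan
labelled = ([(make_scan(True), 1) for _ in range(1000)] +
            [(make_scan(False), 0) for _ in range(1000)])
random.shuffle(labelled)

w, b = [0.0, 0.0], 0.0  # perceptron weights and bias
for _ in range(10):  # training passes over the examples
    for x, y in labelled:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = y - pred  # 0 when correct; +/-1 nudges the boundary
        w[0] += 0.1 * err * x[0]
        w[1] += 0.1 * err * x[1]
        b += 0.1 * err

def diagnose(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

accuracy = sum(diagnose(x) == y for x, y in labelled) / len(labelled)
print(round(accuracy, 2))  # well separated synthetic groups, so near 1.0
```

Real diagnostic systems use far richer models and vastly more data, but the principle is the same: the program is never told what cancer looks like – it infers the boundary from labelled examples.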

A disturbing consequence of this approach is that although humans set up the system to teach itself, they cannot then explain exactly how it works.  The inner workings of the AI program remain hidden.

Children learn by being taught.  They also learn through play, through experience and through trial and error.  AI systems can use self-play and millions of trials to accelerate that kind of learning and become incredibly smart.

