MACHINES ARE NOT REALLY LEARNING
Bill Gates? Talking about AI again?
Let’s recall a back-to-the-future comment he made to college students in 2004:
“If you invent a breakthrough in artificial intelligence, so machines can
learn, that is worth 10 Microsofts.” At the financial website The Motley
Fool, Rex Moore exhumed Gates’
fifteen-year-old quote, newsworthy because it trumpets the already endlessly
broadcast trope about the success of machine learning on the internet. Says
Moore, “Fast-forward to today, and of course someone has figured it out. This
special kind of artificial intelligence is called machine learning.”
Exciting. Except for one thing.
Machine learning has been around since AI’s inception in the 1950s. Alan Turing, who essentially
inaugurated “AI” (though he didn’t coin the phrase) in his much-read 1950
paper “Computing
Machinery and Intelligence,” speculated that computer programs might
improve their performance (output) by crunching more and more data (input). He
called them “unorganized machines,” a not-so-subtle admission that organized
machines—that is, computer programs—do only what they’re programmed to do.
At issue was his now well-known
conversational test of intelligence, the eponymous Turing
test . How to pass it? Said Turing: Figure out how to program
computers to learn from experience, like people. They had to be a bit more
disorganized, like British geniuses. In 1943, earlier even than Turing,
neuroscientist Warren McCulloch and logician Walter Pitts proposed simple
input-output models of neurons. In the next decade, Frank Rosenblatt at Cornell
generalized the model to adjust over time, to “learn.” The model entities were
called “perceptrons.”
Researchers likened them to neurons,
which they weren’t, though they did mimic human neurons’ behavior in a simple
sense. Multiple incoming signals, meeting a threshold, resulted in a discrete
output. They “fired” after excitation, more or less the way brain cells fire across synapses. Only, they didn’t reproduce the fantastic complexity of real neurons. And
they didn’t really do much computation, either. Call them inspiring.
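To make the threshold idea concrete, here is a minimal sketch in Python. The language, the hand-picked weights, and the AND example are my own illustration, not anything from Rosenblatt’s hardware: several inputs arrive, and the unit “fires” a discrete output only if their weighted sum clears a threshold.

```python
# Illustrative sketch of a single threshold unit (a perceptron-style neuron).
# The weights and threshold are set by hand here; Rosenblatt's contribution was
# a procedure for adjusting the weights from examples, i.e., "learning."

def threshold_unit(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs meets the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# With these hand-picked numbers the unit computes logical AND:
# it fires only when both inputs are "on."
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", threshold_unit([a, b], weights=[1.0, 1.0], threshold=1.5))
```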
Still, perceptrons were the world’s first neural networks, so they take their place in the annals of
computer science. Simple networks with one layer, they could be made to perform
simple logical operations by outputting discrete values after meeting a
mathematical threshold set for their input. Cool.
Not really. The late MIT AI pioneer Marvin Minsky later pointed out, in a devastating critique, how powerless the little perceptrons were for addressing any interesting problems in AI. So-called connectionist approaches to AI (“connectionism” being the old term for this neural-network style of machine learning) were
effectively abandoned in favor of logical and rule-based approaches (which also
failed, but later). Turing worried that machines would be too rigid. Too
organized. But the early machine learning attempts were hopeless.
Fast forward not to 2019 but to 1986. Machine learning research received a major shot of steroids. Error feedback was introduced into artificial neural networks. No more lone perceptrons. Backpropagation meant that the input to a neural network propagates forward through new, hidden layers in more complicated networks. The network’s error, the gap between its output and the right answer, then “propagates back” through the network, adjusting the weights until they settle on an optimal value (actually, a local optimal value, but this is TMI). Backpropagation-powered Artificial Neural Networks (ANNs) could learn, in a sense.
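For the curious, here is a toy sketch of the idea in Python with NumPy, my own illustration rather than anyone’s historical code: a tiny network with one hidden layer learns XOR, a problem a lone perceptron cannot solve. Watch the direction of travel: the input flows forward, the error flows back and nudges the weights.

```python
# Toy backpropagation demo: a 2-3-1 network learns XOR.
# Purely illustrative; real systems use libraries, many layers, and far more data.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 3))   # input layer -> hidden layer
W2 = rng.normal(size=(3, 1))   # hidden layer -> output
learning_rate = 1.0

for step in range(10000):
    # Forward pass: the input propagates through the hidden layer.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Backward pass: the error propagates back, adjusting the weights.
    error = output - y
    grad_output = error * output * (1 - output)
    grad_hidden = (grad_output @ W2.T) * hidden * (1 - hidden)
    W2 -= learning_rate * hidden.T @ grad_output
    W1 -= learning_rate * X.T @ grad_hidden

print(np.round(output, 2))  # typically settles near [0, 1, 1, 0]
```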
American (‘merica!) psychologists
showed how the new systems could simulate one aspect of a toddler’s education
by, say, forming the past tenses of English verbs like start, walk, go, or run. Simply adding “ed” won’t cut it (unless you’re fine with “goed” or “runned”),
so the system had to ferret out the irregular endings and then perform well
enough on the task to convince researchers that it converged on the rule, which
it more or less did. Machine learning was resuscitated, brought back to life,
released from detention. Only, as promising as backpropagation seemed, the
modern era of machine learning hadn’t arrived by the late 1980s.
It arrived shortly after Bill Gates’
comments to college kids in 2004. No coincidence, it showed up when the web
took off, on the heels of terabytes of user-generated content from so-called
web 2.0 companies like Google and Facebook. “Big data” entered the lexicon as
volumes of unstructured (i.e., text and image) data powered machine learning approaches to AI. AI itself was just then experiencing another of its notorious winters, this one following the 2001 NASDAQ crash, the bursting of the “dot-com bubble,” or simply “the debacle.” No one trusted the web after 2001 (almost no one), because it had
failed to deliver on promises, tanked on Wall Street, and was still stuck in
what we now snigger and sniff at, the “web 1.0” ideas that launched web sites
for companies and tried to sell advertising—we are so different now!
Turns out, anyway, that with loads of data, machine learning can perform admirably on well-defined tasks like spam
detection and somewhat admirably on other tasks like language translation
(Google Translate is still worse than you might assume). It can also
personalize content in newsfeeds, or recommendations on Netflix, Spotify, Amazon, and the like. Image recognition is getting better. Cars, especially in
Silicon Valley, can even sort of drive themselves. All thanks to big data and
machine learning.
The learning algorithms in vogue
today are still ANNs. They are now convolutional, “convoluted” in at least a wordplay nod to Turing’s disorganization. But they are organized: typically, layers stacked upon other layers. No matter. Fast forward, along with our Motley Fool reporter, to 2019, and we have arrived at Deep Learning. We can trace a line from the
lowly perceptron (poor thing) to backpropagation to Deep Learning. This is
progress, no doubt, but probably not in the sense intended by Rex Moore’s bold
font.
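If you want a feel for “layers stacked upon layers,” here is a sketch of a small convolutional network, assuming the PyTorch library is available; the shapes (28-by-28 grayscale images, ten output classes) are arbitrary choices of mine for illustration.

```python
# Sketch of "deep": layers stacked upon layers, a small convolutional network
# of the kind loosely described in the text. Assumes PyTorch is installed.
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # another layer stacked on top
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # ten output classes
)

fake_images = torch.randn(4, 1, 28, 28)           # a batch of four fake images
print(model(fake_images).shape)                   # torch.Size([4, 10])
```

Stack more of these layers and you have, roughly, the “deep” in Deep Learning.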
Take a deep breath and look at the best AI around. Chances are it’s Deep Learning, true. Chances are, too,
that it does some one thing. It recognizes human faces, say, but not fish or
bicycles. It personalizes your newsfeeds for you, say, but you keep wondering
why you’re seeing all the same stuff (or maybe not, if you’re online too much).
It drives your car, if you’re rich and friends with Elon Musk anyway, but you
deliberately avoid pondering what would happen if the system goes into
collision avoidance mode when a deer walks onto the road, followed by a much
smaller toddler on a tricycle.
What gives? Letdown alert. Spoiler
coming. Deep Learning could be called Narrow Learning, if the redundancy didn’t discourage it, because machine learning is always narrow. It is always what pundits dismissively call Narrow AI. For that matter, “learning” for a
machine isn’t learning for you or me. We (what’s-the-word?) forget, for
instance, which should make one suspicious that machines that learn and can’t
forget are ever really learning. Also, we tend to learn by incorporating
information into a deeper web of knowledge that sits front and center in the
mystery of mind. Knowledge is different from information because it gets linked
together into a general picture of the world.
General. Not narrow. Get off the
computer. Go outside (I’ll join you). Go talk to a neighbor or a friend, or pop
over to the coffee shop and chat up the barista. You’ve just done something
that Deep Learning can’t do. Worse, it can’t even learn to do it, because
that’s not a narrow, well-defined problem to solve. (Now pat yourself on the
back. You’re amazing!)
Bill Gates could have been
channeling Deep Learning, anticipating it just up ahead, when he spoke in 2004.
More likely, he was talking about machines that can really learn. That would be
worth 10 Microsofts (time makes fools of us all; he should have said “Googles”). But we don’t have learning systems that really learn. Wait: check
that. We have seven billion of them, give or take. Our question is not how to
make more out of computer code, but how to make computer code do something
more, for us.