Tuesday, September 3, 2019

MACHINES ARE NOT REALLY LEARNING


Bill Gates? Talking about AI again? Let’s recall a back-to-the-future comment he made to college students in 2004: “If you invent a breakthrough in artificial intelligence, so machines can learn, that is worth 10 Microsofts.” At the financial website The Motley Fool, Rex Moore exhumed Gates’ fifteen-year-old quote, newsworthy because it trumpets the already endlessly broadcast trope about the success of machine learning on the internet. Says Moore, “Fast-forward to today, and of course someone has figured it out. This special kind of artificial intelligence is called machine learning.”

Exciting. Except for one thing. Machine learning has been around since AI’s inception in the 1950s. Alan Turing, who essentially inaugurated “AI” (though he didn’t coin the phrase) in his much-read 1950 paper “Computing Machinery and Intelligence,” speculated that computer programs might improve their performance (output) by crunching more and more data (input). He called them “unorganized machines,” a not-so-subtle admission that organized machines—that is, computer programs—do only what they’re programmed to do.
At issue was his now well-known conversational test of intelligence, the eponymous Turing test. How to pass it? Said Turing: Figure out how to program computers to learn from experience, like people. They had to be a bit more disorganized, like British geniuses. In 1943, earlier even than Turing, neuroscientist Warren McCulloch and logician Walter Pitts proposed simple input-output models of neurons. In the next decade, Frank Rosenblatt at Cornell generalized the model to adjust over time, to “learn.” The model entities were called “perceptrons.”

Researchers likened them to neurons, which they weren’t, though they did mimic human neurons’ behavior in a simple sense. Multiple incoming signals, meeting a threshold, resulted in a discrete output. They “fired” after enough excitation, more or less like brain cells. Only, they didn’t reproduce the fantastic complexity of real neurons. And they didn’t really do much computation, either. Call them inspiring.

Still, perceptrons were the world’s first neural networks, so they take their place in the annals of computer science. Simple networks with one layer, they could be made to perform simple logical operations by outputting discrete values after meeting a mathematical threshold set for their input. Cool.
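In code, the whole idea fits in a few lines. Here’s a minimal sketch, with weights and a threshold picked by hand for illustration (nothing here comes from Rosenblatt’s actual models): sum the weighted inputs, compare against the threshold, emit a 1 or a 0.

```python
# A single "perceptron": weighted inputs, a threshold, a discrete output.
# The weights and threshold are illustrative choices, not Rosenblatt's.

def perceptron(inputs, weights, threshold):
    """Fire (output 1) only if the weighted sum of inputs clears the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# With unit weights and a threshold of 2, the unit computes logical AND:
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights=[1, 1], threshold=2))

# Drop the threshold to 1 and the very same unit computes logical OR.
```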

Not really. The late MIT AI pioneer Marvin Minsky later pointed out, in a devastating critique, how powerless the little perceptrons were for addressing any interesting problems in AI. So-called connectionist (the older term for this style of machine learning) approaches to AI were effectively abandoned in favor of logical and rule-based approaches (which also failed, but later). Turing worried that machines would be too rigid. Too organized. But the early machine learning attempts were hopeless.

Fast forward not to 2019 but to 1986. Machine learning research received a major shot of steroids. Error feedback was introduced into artificial neural networks. No more lone perceptrons. Backpropagation meant that input to a neural network propagates forward through new, hidden layers in more complicated networks; the error at the output then “propagates back” through the network, adjusting the learning weights, until training settles on an optimal value (actually, a local optimum, but this is TMI). Backpropagation-powered Artificial Neural Networks (ANNs) could learn, in a sense.
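To see the trick without the steroids, here is a toy sketch, assuming nothing beyond NumPy: a network with one hidden layer learns XOR by passing inputs forward, passing the output error backward, and nudging the weights each time. The layer size, learning rate, and iteration count are arbitrary picks for illustration, not anyone’s published recipe.

```python
# Toy backpropagation: forward pass, then the error "propagates back,"
# adjusting the weights by gradient descent. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input  -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(20_000):
    # Forward: the input flows through the hidden layer to the output.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward: the output error flows back through the layers, and each
    # weight is nudged in the direction that shrinks that error.
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(output.round(2))  # typically ends up near [[0], [1], [1], [0]]
```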

American (‘merica!) psychologists showed how the new systems could simulate one aspect of a toddler’s education by, say, forming the past tenses of English verbs, like start, walk, run, or go. Simply adding “ed” won’t cut it (unless you’re fine with “runned” or “goed”), so the system had to ferret out the irregular endings and then perform well enough on the task to convince researchers that it converged on the rule, which it more or less did. Machine learning was resuscitated, brought back to life, released from detention. Only, as promising as backpropagation seemed, the modern era of machine learning hadn’t arrived by the late 1980s.
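For flavor, here’s a toy stand-in, emphatically not the psychologists’ actual model: a scikit-learn classifier sees a handful of verbs labeled regular or irregular (the verb list is a made-up sample) and has to pick up the pattern from spelling alone.

```python
# Toy version of the past-tense task: label verbs "regular" (just add -ed)
# or "irregular" from character patterns. The verb list is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

verbs = ["walk", "start", "jump", "talk", "play", "call",
         "run", "go", "sing", "ring", "swim", "begin"]
labels = ["regular"] * 6 + ["irregular"] * 6

model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),  # substrings as features
    LogisticRegression(max_iter=1000),
)
model.fit(verbs, labels)

# The model judges by surface spelling alone, so unseen verbs tend to get
# whatever label their look-alikes had: overgeneralization, "runned" and all.
print(model.predict(["laugh", "bring", "sting", "climb"]))
```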

The modern era arrived shortly after Bill Gates’ comments to college kids in 2004. No coincidence, it showed up when the web took off, on the heels of terabytes of user-generated content from so-called web 2.0 companies like Google and Facebook. “Big data” entered the lexicon as volumes of unstructured (i.e., text and image) data powered machine learning approaches to AI. AI itself was just then experiencing another of its notorious winters after the 2001 NASDAQ crash, known as the bursting of the “dot-com bubble.” Or just as the debacle. No one trusted the web after 2001 (almost no one), because it had failed to deliver on promises, tanked on Wall Street, and was still stuck in what we now snigger and sniff at, the “web 1.0” ideas that launched web sites for companies and tried to sell advertising—we are so different now!

Turns out, anyway, that with loads of data, machine learning can perform admirably on well-defined tasks like spam detection and somewhat admirably on other tasks like language translation (Google Translate is still worse than you might assume). It can also personalize content in newsfeeds, or recommendations on Netflix, Spotify, Amazon, and the like. Image recognition is getting better. Cars, especially in Silicon Valley, can even sort of drive themselves. All thanks to big data and machine learning.
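Spam detection is the canonical well-defined task, and the canonical sketch takes about a dozen lines with scikit-learn. The four messages below are invented for illustration; real filters train on millions of them.

```python
# A minimal spam filter sketch: turn messages into weighted word features,
# fit a Naive Bayes classifier, predict "spam" or "ham" for new messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "WIN a FREE prize, click now",
    "Lowest prices on meds, limited offer",
    "Are we still on for lunch tomorrow?",
    "Here are the notes from today's meeting",
]
labels = ["spam", "spam", "ham", "ham"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

print(classifier.predict(["FREE offer, click to claim your prize"]))  # likely "spam"
print(classifier.predict(["Can you send the meeting notes?"]))        # likely "ham"
```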

The learning algorithms in vogue today are still ANNs. They are now convolutional, or “convoluted,” in at least a wordplay nod to Turing’s disorganization. But they are organized: typically, layers stacked upon other layers. No matter. Fast forward, along with our Motley Fool reporter, to 2019, and we have arrived at Deep Learning. We can trace a line from the lowly perceptron (poor thing) to backpropagation to Deep Learning. This is progress, no doubt, but probably not in the sense intended by Rex Moore’s bold font.
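Here is what “layers stacked upon other layers” looks like in practice, sketched with PyTorch and arbitrary layer sizes of my choosing. Convoluted wordplay aside, nothing could be more organized: a fixed stack, pixels in at one end, class scores out the other.

```python
# A small stack of convolutional layers, purely for illustration.
import torch
from torch import nn

deep_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # layer 1
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # layer 2, stacked on 1
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # layer 3, stacked on 2
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 10),  # e.g. ten output classes
)

# One fake 32x32 RGB image in, ten class scores out.
scores = deep_net(torch.randn(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 10])
```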

Take a deep breath and look at the best AI around. Chances are it’s Deep Learning, true. Chances are, too, that it does some one thing. It recognizes human faces, say, but not fish or bicycles. It personalizes your newsfeeds for you, say, but you keep wondering why you’re seeing all the same stuff (or maybe not, if you’re online too much). It drives your car, if you’re rich and friends with Elon Musk anyway, but you deliberately avoid pondering what would happen if the system went into collision-avoidance mode when a deer walked onto the road, followed by a much smaller toddler on a tricycle.

What gives? Letdown alert. Spoiler coming. Deep Learning could be called Narrow Learning, if the label weren’t redundant, because machine learning is always narrow. It is always what pundits dismissively call Narrow AI. For that matter, “learning” for a machine isn’t learning for you or me. We (what’s-the-word?) forget, for instance, which should make one suspicious that machines that learn and can’t forget are ever really learning. Also, we tend to learn by incorporating information into a deeper web of knowledge that sits front and center in the mystery of mind. Knowledge is different from information because it gets linked together into a general picture of the world.

General. Not narrow. Get off the computer. Go outside (I’ll join you). Go talk to a neighbor or a friend, or pop over to the coffee shop and chat up the barista. You’ve just done something that Deep Learning can’t do. Worse, it can’t even learn to do it, because that’s not a narrow, well-defined problem to solve. (Now pat yourself on the back. You’re amazing!)

Bill Gates could have been channeling Deep Learning, anticipating it just up ahead, when he spoke in 2004. More likely, he was talking about machines that can really learn. That would be worth 10 Microsofts (time makes fools of us all. He should have said “Googles”). But we don’t have learning systems that really learn. Wait: check that. We have seven billion of them, give or take. Our question is not how to make more out of computer code, but how to make computer code do something more, for us.