Monday, August 21, 2017

Is Science Broken?

Or is it self-correcting?

Lisa Larson-Walker
Two years ago this month, news of the replication crisis reached the front page of the New York Times. “Psychology’s Fears Confirmed: Rechecked Studies Don’t Hold Up,” read the A1 headline on the morning of Aug. 28, 2015. The journal Science had just published a landmark effort to reproduce the findings of 100 recent peer-reviewed psychology experiments, and just 39 of those replications succeeded. This dispiriting result, the Times reported, “confirmed the worst fears of scientists who have long worried that the field needed a strong correction.”
Daniel Engber is a columnist for Slate
In a matter of hours, news of this massive scientific failure drifted into right-wing media. “So many people in the country have lost faith in so many institutions they used to trust,” Rush Limbaugh told his 13 million listeners in a bloviating monologue on the mendacity of the elites and the rise of Donald Trump. Now, the radio host explained, the American people had cause to turn on science, too. “What you can assume here, safely so, is that the vast majority of what you hear—if you hear, ‘from the journal Science,’ ‘from the journal [sic] Psychology Today’—it’s all bogus,” he said. “What has been exposed here is that science is no different than anything else in politics. It is totally determined by money. Scientific results can be purchased.”
With that, another of scientists’ biggest fears was confirmed: that any discovery of major problems in their field would end up being used against them. They’d worried that front-page coverage of the “replication crisis” would give Limbaugh types ammunition to knock them off their pedestal, and a fresh excuse to flush their carefully collected data into the sea of politics and ideology. Now it looked as though that fear, too, had been borne out. “The game is rigged,” Limbaugh concluded that day in 2015. “Everything’s been so corrupted, science especially, by politics.”
Could Limbaugh’s rant be taken as a cautionary tale for science journalism—an example of what happens when reporters catastrophize the replication crisis? A new book, The Oxford Handbook of the Science of Science Communication, lays out this case. News stories about problematic research often serve as chum for anti-science trolls, argue professors Joe Hilgard of Illinois State University and Kathleen Hall Jamieson of the University of Pennsylvania in a thoughtful chapter titled “Science as ‘Broken’ Versus Science as ‘Self-Correcting.’ ” The risk is most acute when journalists recklessly suggest that science, as a whole, has somehow gone off the rails. When they employ a “science is broken” frame, they end up causing “reputational harm to science” and contributing to a dangerous and misleading “news climate” that can be “mined by those interested in attacking scientific findings they consider ideologically uncongenial.”
According to Hilgard and Jamieson, science isn’t really broken and reporters oughtn’t say it is. They argue that scandals in the field show the ways that science works: Whenever there’s a problem, it self-corrects. That’s the frame they recommend to journalists, “science is self-correcting,” and the deeper truth they’d like to see expressed. When it comes to science, they believe, a crisis is a sign of strength.
Let me frame my thoughts on this as clearly as I can: I think Hilgard and Jamieson are wrong. Science is broken, at least by any useful definition of the word. Self-correction doesn’t always happen, and science journalists mustn’t be afraid to spell that out.
I’ll admit this conversation strikes a nerve as I’ve been working in the “broken science” frame for a long time now. In my year and a half on the replication beat for Slate, the phrase science is broken has appeared in the headlines of two of my stories. Another headline claimed that “cancer research is broken,” while a fourth announced, with reference to psychology, that “everything is crumbling.” I could say that reporters rarely write their own headlines—indeed, those phrases originated with my editors—and that I don’t believe I’ve ever put the B-word in the body of a replication piece. But I stand behind the framing nonetheless.
In the last few years we’ve learned that science sometimes fails to work the way it should. Suggesting it might be “broken” is not the same as saying it’s in a state of utter, irreversible decrepitude—that every published finding is a lie, or that every field of research is in crisis. Rather, it suggests a dawning sense that things have gotten wonky in a widespread way. It says our vaunted engine of discovery is sputtering and that it’s time we brought it in for repairs.
It was Robert Hooke, four centuries ago, who first described the scientific method in this way, as a sort of engine for “directing the mind in the search after Philosophical Truths.” By the 1700s, scientists had gained a bit more modesty; they admitted that their machine sometimes got things wrong. A new idea took hold of science as a self-correcting enterprise that converges, sometimes sloppily, on the truth. One natural philosopher likened research to the act of doing long division: With each step, the remainder shrinks a little more, and the field naturally inches ever closer to the correct answer. This theory of the scientific method—as less an engine than a self-driving car that glides toward knowledge—was most famously asserted by the philosopher Charles Sanders Peirce. “This marvelous, self-correcting property of Reason … belongs to every sort of science,” he said in 1898.
The marvelous property has since been wielded, on occasion, as a magic wand to wave away egregious missteps. Whenever a researcher is outed as a fraud, it’s inevitable that some science poobah will describe the mere fact of the miscreant’s discovery as a victory for “self-correction.” Here’s how Hilgard and Jamieson deploy the frame in reference to the 2014 case of a Japanese stem-cell researcher whose paper was retracted after she’d been found guilty of manipulating data: “If critique and self-correction are hallmarks of the scientific enterprise, then instances in which scientists detect and address flaws constitute evidence of success, not failure, because they demonstrate the underlying protective mechanisms of science at work.”
In the stem-cell case, self-correcting science did appear to work as advertised: Problems in the paper were discovered by attentive colleagues shortly after it appeared in print. But the recent history of science fraud suggests that many more examples come to light not quickly and not via any standard self-corrective mechanism—e.g., peer review or unsuccessful replications—but rather at a long delay and through the more conventional means of whistleblowing. That’s how Diederik Stapel, a notorious fabulist with 58 retracted papers in social psychology, was discovered in 2011. The fact that Stapel’s brazen fraud had not been caught (or self-corrected) earlier made his case a seminal event in the current replication crisis. Why had no one noticed, in strictly scientific terms, all the false effects that he’d slipped into the literature?
Isolated fraud has never been the substance of the crisis, though. In the years since Stapelgate, piles of perfectly ethical research papers have been perched on precarious data. It turned out that industry-standard methods of designing, analyzing, and reporting on experiments could yield seemingly impossible results—the existence of ESP, for example, or an ability to time-travel. By the time Rush Limbaugh started yammering about “bogus science” in the summer of 2015, psychologists, doctors, and researchers in several other disciplines already had the inkling that a startling proportion of their fields’ discoveries could be little more than statistical noise.
In fact, the week before the New York Times put the replication crisis on A1, science journalist Christie Aschwanden laid out these facts in great detail in a wonderful article and interactive for FiveThirtyEight. Her piece runs through the many biases, errors, and inefficiencies of modern scientific practice that allow false findings to infiltrate the literature. Researchers can hack their way to spurious conclusions, and they’re incentivized to hide negative results. Journal editors ignore replication failures, and they’re often slow to fix mistakes.
Aschwanden’s piece could be thought of as a thorough brief for the argument that science is, indeed, a shit show—that its self-corrective mechanisms have fallen into disrepair. Yet her reporting reaches the opposite conclusion. “Science isn’t broken, nor is it untrustworthy,” she writes. “It’s just more difficult than most of us realize.” The problem, says Aschwanden, is that we expect too much of science; we act like it’s an engine for discovery, when it’s just a means of moving, herky-jerky, down the long and curvy road to truth. If science looks to be a mess, she says, that’s because it’s messy work. It’s “a process of becoming less wrong over time,” she explained in a subsequent, equally optimistic piece with the headline, “Failure Is Moving Science Forward: The replication crisis is a sign that science is working.”
I haven’t seen a better set of write-ups in the “science is self-correcting” mode. (Aschwanden’s original piece richly deserves the multiple awards it has received.) Even so, this framing has always struck me as bizarre. It’s as if we’d noticed that the engine in our car was on fire, and then concluded that the vehicle must be running fine, because otherwise how would we have ever seen the smoke billowing out from underneath the hood? To put it another way: If the replication crisis is a sign that science isn’t broken, then what does “broken” even mean?
It may be true that, eventually, science self-corrects. (At the very least it’s impossible to falsify that claim.) But the more relevant question is, how quickly does science self-correct? Are bad ideas and wrong results stamped out within a year or two, or do they last for generations? How many hours must we squander in the lab in pursuit of empty theories? How many research grants are wasted? What proportion of our scientists’ careers will be frittered away on the trail of nothing much at all?
It’s tempting to assume that self-correction is a force of nature and that every faulty fact will molder and decay. But the replication crisis shows there is no half-life for our bungling. In practice, we must always act to fix mistakes—and that action often gets delayed beyond all reason. That’s what it means to say that science is “broken”: It’s not that all of science is a sham but that it’s not self-correcting fast enough.
For example, it’s been known for half a century that psychology studies tend to be too small. In a 1962 paper, statistician Jacob Cohen showed that psychologists rarely used as many subjects as they should, and that they were “non-rational” in their approach to choosing sample sizes. All this underpowered work was therefore “wasteful of research effort,” he said. His critique would be repeated and expanded many times after, including in famous work by luminaries of the field. Yet nothing changed for decades; the culture and convention never budged. A meta-analysis published last fall concluded that, up through 2011, studies in psychology were exactly as weak, statistically, as they’d been in 1960. “Power has not improved despite repeated demonstrations of the necessity of increasing power,” the authors wrote.
Anecdotally, it seems there’s been a bit of movement in the last few years—sample sizes may at last be inching up. (Several other vital fixes have also begun to spread, including study pre-registration and data sharing.) But the change in practice—if in fact there’s been a change in practice—only came after researchers spent more than 50 years mindlessly repeating the same mistakes. It took Diederik Stapel’s fraud, and a paper “proving” ESP, and the growing sense that science is in crisis to make them “self-correct.” First the scientists had to figure out that Cohen’s quibble wasn’t just some technicality, but rather that it pointed to a deep dysfunction in their field. That is to say, they had to grapple with the naked fact that psychology was broken.
Even now, efforts to self-correct psychology have been slow and controversial. Not everyone agrees the field needs substantial fixing, especially among the older tenured generation. That’s why it worries me when I hear scientists say that stories on the replication crisis should be framed a certain way. Conservatives in science naturally prefer the “self-correcting” frame, since it implies protection of the status quo, and greater deference to authority. When we talk about brokenness, we make it harder to pretend that everything’s going to be OK.