In recent years, exposés (articles and studies) have shown pharmaceutical companies to be deeply embedded in the culture of medical research, heavily biasing results. Pharmaceutical companies love it when studies support the drug they want to sell and are not happy with studies that indicate otherwise. The New York Times carried an article reporting findings that surgical interventions for breast cancer may have been too hasty, and that many women have undergone painful treatments they didn’t need. Surgeons love it when someone publishes a new procedure that they may perform on a patient, and get extremely uncomfortable when research implies that no procedure may help and that the surgeon is better off doing nothing. This conflict of interest in medicine is already common knowledge, and ethics committees are doing their best to address it.
However, the problem that Anjali, Imran and Pavan are up against runs much deeper. It can be traced to an unknown time and place, when a hero named Karthik, enamored of his own strength and prowess, challenged his quiet and humble brother, Ganesh, to a game he thought would help establish his superiority. Karthik bet that he could circumambulate the world quicker than Ganesh could, with their parents as judges. No sooner had Ganesh accepted the challenge than Karthik jumped onto his vehicle and zoomed off, setting a new record for going around the world three times. Confident that this new record could not be beaten, he waited for Ganesh to do his thing. Ganesh, being pot-bellied, got up slowly, quietly went around his parents three times, and declared, “I’m done.” The stunned Karthik asked him, “What do you mean?” To which Ganesh responded, “My parents are my world, and I have just circumambulated my world.”
India being India, Karthik and Ganesh are now treated as gods, with lots of followers. While Ganesh’s simple demonstration set the record straight for Karthik, unfortunately, the same is not true for their followers in the scientific realm. Karthik is the god of a group of scientists who are labeled “frequentists,” who ban the notion of subjectivity or beliefs, and limit their existential understanding to data in the external world. Ganesh is the god of a much smaller group of scientists who are labeled “Bayesians,” who readily incorporate beliefs into their existential understanding and acknowledge that learning does not happen in a vacuum, but in the context of what we already believe. The Bayesian worldview allows for the fact that two people, coming from entirely different belief systems, may look at the same event and draw entirely different inferences. The Bayesians and the frequentists are not at war. No, that war was won a long time ago, and the conclusion was not just that frequentism is a failed philosophy, but that frequentism isn’t even a philosophy: it is a problematic and narrow method that misguides more than it guides. And yet, strangely, the followers of Karthik continue practicing it, in major fields that touch us very intimately.
To get a little deeper, we shall examine the main method in the frequentist’s arsenal, called statistical hypothesis testing, which assumes the following: we know nothing about our universe, and all events that we see are unrelated to each other. Under these strong assumptions, when we aggregate enough trials of what we shall call “random” events, the average of a quantity of interest will follow a bell-shaped curve (called a “normal” distribution). This curve allows us to calculate the chance of seeing a result below a certain level. We then conduct our experiment and gather our data, and based on the sample average and given our expectation of a bell-shaped curve, we check whether the chance of getting an average at least this extreme is below 5%. If so, we then claim that the result is statistically significant at the 95% level, which is the standard for most journal papers. By now, many readers might be fooled into thinking all of this sounds rational and sophisticated. An example by applied mathematician John D. Cook illustrates the fundamental problem. By the 95% logic, out of 1000 pre-publication studies of a treatment that is ineffective, we would expect 50 to erroneously show a significant effect, and 950 to show no significance. The culture of clinical studies is such that journals will not publish results that are not significant. Thus, only the 50 erroneous results will see the light of day, and random noise will be elevated to scientific truth. As the standard increases, matters get worse. At the 99% level, only 10 erroneous results showing an effect will trump the 990 results showing none, and they will be even more trusted as scientific fact.
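For readers who like to see the arithmetic, Cook's example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every study tests a treatment with no real effect, the p-value of such a study is taken to be uniformly distributed between 0 and 1, and journals publish only significant results. The function names are invented for this illustration.

```python
import random


def expected_counts(n_studies=1000, alpha=0.05):
    """Expected (significant, not significant) counts for an ineffective treatment."""
    false_positives = round(n_studies * alpha)
    return false_positives, n_studies - false_positives


def simulate_null_studies(n_studies=1000, alpha=0.05, seed=0):
    """Simulate one batch of studies of an ineffective treatment.

    Assumption: under a true null hypothesis the p-value is uniform on [0, 1],
    so each study comes out "significant" with probability alpha.
    """
    rng = random.Random(seed)
    significant = sum(rng.random() < alpha for _ in range(n_studies))
    return significant, n_studies - significant


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        published, unpublished = expected_counts(1000, alpha)
        print(f"alpha={alpha}: expect {published} published, {unpublished} unpublished")
    print("one simulated batch:", simulate_null_studies())
```

If only the significant studies reach print, then under these assumptions every published study of this treatment is a false positive, whichever threshold is used.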
The reader may wonder why this problem cannot be solved by also accepting papers that do not claim statistical significance. While that will help, the issue is more fundamental – we live in a world where we know something about everything. To repudiate our existing knowledge and feign ignorance is pseudoscientific and amounts to pretending to be in a position where our mind is the most open, while ignoring that we learn the most when we know nothing. Consider someone who claims to be able to read your mind. If you take the traditional scientific approach and believe you know nothing at all on this topic, then a few positive trials should start to swing your judgment quickly toward believing the mind-reader. However, if you start with a skeptical position, then a few positive trials will not be enough – you will need many consecutive positive results before you start changing your position. In other words, we learn depending on where we stand. A position of total ignorance is also a position, one that results in great importance being given to the results of fewer studies, and should thus only be chosen if we truly have no prior knowledge. The underlying value here is truth – we must start with the whole truth about our prior position if we are interested in arriving at practical results of our scientific inquiry that we can trust.
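The mind-reader example can be made concrete with a small sketch of Bayes' rule. The numbers here are assumptions chosen purely for illustration: a genuine mind-reader is taken to be right every time, a guesser is right half the time, the "ignorant" observer starts at a 50% belief, and the skeptic starts at 0.1%. The function names are hypothetical.

```python
def posterior_after_hits(prior, hits, p_hit_if_genuine=1.0, p_hit_if_guessing=0.5):
    """Belief that the mind-reader is genuine after `hits` consecutive correct readings.

    Applies Bayes' rule with two competing explanations: genuine ability
    versus lucky guessing. The default hit probabilities are illustrative.
    """
    weight_genuine = prior * (p_hit_if_genuine ** hits)
    weight_guessing = (1.0 - prior) * (p_hit_if_guessing ** hits)
    return weight_genuine / (weight_genuine + weight_guessing)


def hits_needed(prior, threshold=0.95, max_hits=1000):
    """Smallest number of consecutive hits that lifts belief to `threshold`."""
    for hits in range(max_hits + 1):
        if posterior_after_hits(prior, hits) >= threshold:
            return hits
    return None  # belief never reaches the threshold within max_hits


if __name__ == "__main__":
    for label, prior in (("ignorant", 0.5), ("skeptic", 0.001)):
        print(label, "needs", hits_needed(prior), "consecutive hits")
```

With these assumed numbers, the observer who feigns ignorance is persuaded after a handful of hits, while the skeptic needs roughly three times as many. The same evidence moves the two of them very differently, which is the point of the paragraph above.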
In January 2011, Newsweek carried an article titled “Why Almost Everything You Hear About Medicine Is Wrong,” ringing the first death knell in the popular media of clinical medical science as we’ve known it for the last four decades. The article focused on the work of Prof. John Ioannidis (currently at Stanford University), who published a paper titled “Why Most Published Research Findings Are False,” in which he shows analytically, using a Bayesian approach, how the positive predictive value of a study (the chance that a theory is true given that a study says it’s true) decreases based on a variety of practical factors that are usually ignored by the scientific community, leading to the amplification of random noise as scientific theory. He also emphasizes what my professor would say in a folksy way: “when you have lots of data, even the village idiot can tell you what’s going on.” There is no value added by statistical hypothesis testing when the sample size is large. The whole promise of statistics was to aid our intuition when the sample size is small, and this is where statistical hypothesis testing fails spectacularly, by forcing unrealistic assumptions that result in misleading conclusions.
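The idea of positive predictive value can be sketched as a short calculation. This is a minimal illustration, not a reproduction of the paper's full analysis: it assumes that a field tests some ratio of true to false relationships (the pre-study odds), that a test flags a false relationship with probability alpha and a true one with probability equal to its power, and that a fraction `bias` of results that would otherwise be negative get reported as positive anyway. The function and parameter names are my own.

```python
def positive_predictive_value(pre_study_odds, alpha=0.05, power=0.8, bias=0.0):
    """Chance that a claimed finding is true, given that a study reports it.

    pre_study_odds: assumed ratio of true to false relationships being tested.
    alpha: probability that a false relationship is declared significant.
    power: probability that a true relationship is declared significant.
    bias: assumed fraction of would-be negative results reported as positive.
    """
    if pre_study_odds < 0:
        raise ValueError("pre_study_odds must be non-negative")
    # Rates at which true and false relationships end up as positive claims.
    true_positive_rate = power + bias * (1.0 - power)
    false_positive_rate = alpha + bias * (1.0 - alpha)
    # Counts are expressed per one false relationship tested.
    true_claims = pre_study_odds * true_positive_rate
    false_claims = false_positive_rate
    return true_claims / (true_claims + false_claims)


if __name__ == "__main__":
    for odds in (1.0, 0.1, 0.01):
        print(f"pre-study odds {odds}: PPV = {positive_predictive_value(odds):.2f}")
    print(f"odds 0.1 with bias 0.3: PPV = {positive_predictive_value(0.1, bias=0.3):.2f}")
```

Under these assumptions, the predictive value of a "significant" result falls quickly as the pre-study odds shrink or as bias grows, even though the significance threshold itself never changes.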
In dramatic style, Prof. Ioannidis published a paper in the Journal of the American Medical Association (JAMA) titled “Contradicted and Initially Stronger Effects in Highly Cited Clinical Research,” which showed that 32% of the most highly cited studies in the field had exaggerated results. In other words, the high priests of the religion were wrong once every three times. What hope then for the rest of us? He makes many practical recommendations, some of which include taking a holistic approach, looking at the totality of evidence in the entire community, registering all studies being conducted and committing to share the results no matter what they are, avoiding pharmaceutical company-driven research on good side-effects of existing drugs (these studies have been particularly prone to amplifying random noise as positive results due to vested interests) and giving up the obsession with statistical significance. He further encourages the Bayesian approach with the following recommendation: “Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship.” This is in essence a strong affirmation of the Bayesian approach: telling the whole truth about where we stand right now, which will help determine how we learn. Although Prof. Ioannidis is not very optimistic about these major changes coming easily, it is incumbent upon the scientists of our time to reflect on the implosion of medical science and ask some hard questions.
One important question that Ioannidis, and the scientific community at large, do not explicitly tackle is: how do we come up with a hypothesis? To push us to the limits of our knowledge, I find the “astrologer question” useful. If an Indian medical astrologer tells me that I have an inflamed liver, and on performing an ultrasound, I find that to be the case, what hypothesis should I form? I have only my own experience, in one situation in which the astrologer’s prediction proved uncannily and unexpectedly correct. Suppose I set up a hypothesis test by checking all medical predictions made by this astrologer and counting how many turn out to be correct. Suppose further that a majority are not accurate. Should I reject astrology altogether? Or should I then try to discover what might explain my experience? Some methods from the social sciences, like ethnography, allow induction of hypotheses from the best available data. The notion of hypothesis induction is taboo in the world of classical statistics, which treats the use of data to form hypotheses as tampering. As we attempt a transition from being followers of Karthik to becoming followers of Ganesh, the question of hypothesis formation merits some pause. Being a Bayesian may be necessary for practical scientific thinking, but it may not be sufficient.
As legendary scholar Abraham Maslow (of Maslow’s Hierarchy of Needs fame) points out in his book, The Psychology of Science, the current methods of the “scientific orthodoxy” are the methods of physics and astronomy that were born of the Industrial Revolution. During that time, a mechanistic worldview was imposed upon every pursuit, from education to business to the sciences. While this may have yielded some results in the inanimate world, reducing humans to chemicals, mechanistic particles or single numbers is problematic and unlikely to yield practical insights into a holistic being that is far more than the sum of its parts. Methods in medicine that maintain a holistic approach to understanding humans and their needs have long been around in the East, but are yet to be accepted by Western scientific orthodoxy.
All of this can be very disturbing for Anjali, Imran and Pavan to accept, even though the scientific community has accepted most of this criticism, largely out of public view. They may rightfully question our challenge of Karthik’s religion, and, ignoring the evidence already cited, ask for more real-world evidence that Karthik’s religion is a false one, and that Ganesh’s religion might be a better one. To answer this, we need to take a journey into other fields that have used similar methods. A primary example is statistical finance, whose mess we are still cleaning up wherever the methods were used, after a stupendous collapse that was starkly predicted by the gutsy options trader Nassim Taleb, author of Fooled by Randomness and The Black Swan. In an article in the New Yorker, Jonah Lehrer covers, among other things, the crisis in the field of psychology, which has been a heavy user of statistical hypothesis testing. Theories in that field have shown a “decline effect,” where once a theory is established, it becomes harder and harder to replicate, ultimately becoming next to impossible. Not quite an example just yet, but one to watch out for would be climate science, where we can expect to see an implosion in the next few years, since the field relies heavily on statistical hypothesis testing.
On testing out Ganesh’s religion in medical science, although much work remains to be done, Prof. Stephen Schneider, a rare Bayesian climatologist from Stanford University, found himself facing off with the medical establishment when he developed a rare form of cancer with a stark prognosis. Using his understanding of the Bayesian philosophy, he fought the system and prevented himself from becoming another statistical casualty. In the process, he helped develop a new protocol for cancer treatment, survived eight years longer than his doctors thought he would, and left behind a book called “The Patient from Hell,” which points out the fallacies of evidence-based medicine (a fancier name for classical statistical science), advocating instead for a Bayesian outlook. The main argument is that an individual is not a statistical average – therefore, when trying treatments, we should not be dogmatic about requiring statistical data (especially when there are none for rare illnesses) – we must be willing to combine our beliefs with contextual observations to make sensible inferences. The book is remarkable in that it is written not just for doctors and patients facing dreadful diseases like cancer, but for what Prof. Schneider called “patient advocates,” perhaps labeling a new profession that applies the Bayesian philosophy to find practical treatment solutions for each individual patient, customized to their context.
Even after all this evidence, Anjali desperately clings to IVF for hope. A pioneer of the technique, Dr. Sami David, said in a CBS interview that IVF has “gone amok,” with countless women who have treatable causes of infertility opting instead for IVF, because it is pushed aggressively by clinics. People on both sides of the table are either not interested or don’t have the time to get to the root cause of the problem. Dr. Geeta Nargund, head of reproductive medicine at St. George’s Hospital, said in a Daily Mail interview, “Women are going around from clinic to clinic and receiving different doses of these drugs but there is no sound scientific evidence to show that it will help improve their chances of conceiving.” Imran swallows hard when a little online research reveals a number of studies that have failed to show any link between vitamins and longer life or better health. While absence of evidence is not evidence of absence, one wonders what science led the vitamin manufacturers to start their business (was it statistical hypothesis testing?). Pavan, meanwhile, still refuses to question his nutritionist’s source of information on ghee, although Ayurvedic practitioners consider it therapeutic when used in moderation, and the studies that have vilified ghee have involved statistical hypothesis testing.
Still not convinced, the trio ask, if clinical medical science has been using erroneous methods, how have we made important advances in our understanding using these methods? A better question would be whether there exists a third factor that explains both scientific progress and the results of statistical hypothesis testing. Perhaps that factor is simply the quality of observation of the phenomenon under study by the scientist; long before statistical methods, scientists have been keen observers of nature, discovering astounding truths by committed and powerful observation.
Note: While Anjali, Imran and Pavan are fictional characters, their attitudes are not and represent conversations I’ve had with different people over the years.
Acknowledgments: Thanks to Dr. Thomas Seyller, Michael Silberman and Francisco Ramos-Stierle for helpful feedback during the development of this piece.