What Counts as Legitimate Scientific Research on Prayer?

This post was published on the now-closed HuffPost Contributor platform.

I am pleased to see the lively exchange of comments generated by my recent post, "Testing Prayer: Can Science Prove the Healing Power of Prayer?" Many people have raised difficult and important questions that I have been thinking about for years. These questions fall roughly into six categories: scientific methods, evidence, alternative explanations, other studies showing null results, theology, and presuppositions.

My book, Testing Prayer: Science and Healing, is the outcome of my efforts -- in collaboration with a team of biomedical and clinical researchers -- to work through these questions over the past eight years. The book traces the history of why empirical research on prayer tends to be controversial (as responses to my blog post illustrate), argues that -- despite inherent difficulties -- there is reason to pursue such research, and suggests how researchers might go about it. I invite those who raised objections to my previous post to read the book as a basis for more in-depth discussion, since it responds directly to many of the comments. In this post, I want to give some very brief responses that draw on the fuller treatment in the book.

My basic response is this: Prayer effects should be subject to the same standards as other research. This means that standards should not be lower, nor should they be higher. Anyone who argues for lower standards risks confirmation bias, and anyone who argues for higher standards (than for other empirical research) ends up advocating for double standards, thus betraying their metaphysical presuppositions. It is important to distinguish observable effects from a separate question of what causes those effects. In a properly done study, an effect is an effect, and a null result is a null result, regardless of prior beliefs and desired outcomes.


Could a placebo effect be responsible for at least some of the outcomes observed after prayer? Absolutely. But a more interesting question is how far placebo effects extend and, relatedly, where prayer effects fall relative to known placebos. To say it is all placebo and then dismiss further inquiry is about as useful as saying that physics is all quantum mechanics anyway, and we know it exists, so why bother studying it? Some would argue that if prayer doesn't work for everyone all the time, then it must not yield valid effects. But many medicines are not 100 percent effective, and their effectiveness may even vary across genetic groups; that doesn't mean the medicines are useless. The point is that if any effect can lead to significant improvements in certain health problems, such as hearing and vision impairments, in at least some populations, then it should be of interest as a potential therapy regardless of cause. It is easy enough to suppose that prayer practitioners are wrong in their assumption that a deity is responsible for healing, and natural explanations may be found. Nevertheless, merely asserting the existence of natural mechanisms and then burying one's head in the sand without engaging the data doesn't tell us anything new.

Since some criticisms of my earlier post focus primarily on clinical and statistical methods -- although this is only one of the four "cameras" the book uses to examine prayer -- I'll say more about these methods. The article "Study of the Therapeutic Effects of Proximal Intercessory Prayer (STEPP) on Auditory and Visual Impairments in Rural Mozambique" was published in the peer-reviewed Southern Medical Journal; the results and responses to critics are discussed at length in the book.

The Mozambique (and Brazil) studies used a widely accepted within-subjects design as the most efficient way for a preliminary study to test whether any effects exist -- before attempting to isolate mechanisms, placebo or otherwise. Within-subjects designs (as opposed to between-subjects designs) do not use a separate control group; each subject is measured before and after the intervention and serves as his or her own baseline. The study did control for potential confounds from other relevant factors. There is a long tradition of using within-subjects designs for psychophysical studies, including studies of vision and hearing, even with relatively small numbers of subjects. The results of these studies have been (and continue to be) published in well-respected journals, such as the flagship journal Science. If the STEPP results are invalid for using a within-subjects design, then so are thousands of other published studies that use similar methods, unless one applies an indefensible double standard. Now that preliminary research has suggested the existence of an effect, it would be appropriate to use a between-subjects design, one that does use a separate control group and certain types of blinding, in developing more refined protocols.
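
For readers who want to see what a within-subjects comparison looks like in practice, here is a minimal sketch in Python. It is an illustration only and is not the STEPP analysis: the function name `paired_t` is my own, the paired t statistic is one common way to analyze before-and-after measurements rather than necessarily the method used in the published study, and the scores are hypothetical numbers invented for the example.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a before/after comparison.

    Each subject contributes one 'before' and one 'after' measurement, so the
    test works on the per-subject differences and needs no separate control group.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 subjects")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical scores for four subjects (NOT data from any study).
before = [40.0, 55.0, 62.0, 48.0]
after = [43.0, 57.0, 66.0, 49.0]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Because only the differences within each subject enter the calculation, the large variation between subjects (40 versus 62 in the made-up baseline scores) does not dilute the comparison, which is why such designs are efficient for preliminary studies.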

Even with the relatively small sample size of the STEPP study, the effects in individual subjects were large enough, and the effects across the study populations consistent enough, for the results to reach statistical significance for both hearing and vision improvement following prayer. A smaller sample size does not mean that the results cannot be generalized; it simply means that a larger effect size is required to reach statistical significance. Can the significant improvements in visual acuity and hearing thresholds really be attributed to a mechanism acting via prayer? Correlation does not equal causation, but the real question is whether the degree of improvement after proximal intercessory prayer (PIP), without medical treatment, is greater than the degree of recovery that occurs spontaneously or through other means. Past studies show that hypnosis and suggestion can yield tiny improvements in visual acuity, but the improvements we measured were orders of magnitude larger, to the point that many individual cases could be considered "black swan" events. We have also ruled out ambient noise, practice effects, Hawthorne effects, demand effects, and holdback effects as potential confounds. Nor are the results anomalous -- they were replicated months later in a separate country. What remains is a set of strong effects that invite further investigation and explanation.
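
The relationship between sample size and the effect size needed for significance can be made concrete with a short sketch. It assumes a paired t-test at the conventional two-sided 5 percent level, which is my choice for illustration and not necessarily the test used in the STEPP study. The sample sizes 10, 30, and 100 are arbitrary examples, not the STEPP sample sizes, and the function name `min_detectable_effect` is hypothetical. The critical values are copied from a standard Student's t table.

```python
import math

# Two-sided 5% critical values of Student's t from a standard table,
# keyed by degrees of freedom (n - 1 for a paired test).
T_CRIT_05 = {9: 2.262, 29: 2.045, 99: 1.984}


def min_detectable_effect(n):
    """Smallest standardized effect (mean change / SD of the changes) that
    reaches two-sided p < 0.05 in a paired t-test with n subjects.

    Follows from t = effect * sqrt(n): significance requires t >= t_crit.
    Only the sample sizes listed in T_CRIT_05 are supported in this sketch.
    """
    return T_CRIT_05[n - 1] / math.sqrt(n)


for n in (10, 30, 100):
    print(f"n = {n:3d}: effect size must be at least {min_detectable_effect(n):.2f}")
```

Under these assumptions, a study with 10 subjects needs a standardized effect of roughly 0.72 to reach significance, while a study with 100 subjects needs only about 0.20. A small study that does reach significance has therefore observed a comparatively large effect.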
