Written with Helena C. Kraemer, Ph.D.
Two years ago this month, APA announced the start of field trials that would subject proposed diagnostic criteria for the future DSM-5 to rigorous, empirically sound evaluation across diverse clinical settings. And now, as the first comprehensive analyses of that effort are published, what's clear is just how well the field trials did their job.
The point of testing the preliminary criteria in real-world environments was never to rubber-stamp a specific outcome. Quite the contrary: we looked to the trials to provide crucial information about which diagnoses and definitions worked best in clinicians' hands -- and which missed the mark. We selected disorders of high clinical and public health importance, disorders with major proposed changes, and newly proposed disorders, and we always expected issues to surface. Indeed, had they not, there would have been legitimate questions about the quality of the field trials' design and sensitivity.
We ultimately tested the criteria for 23 disorders. The question we asked was a straightforward one: In the hands of regular clinicians, assessing typically symptomatic patients no differently than they would during everyday practice, what is the chance that a second, equally expert diagnosis will agree with the first, making a particular diagnosis reliable? A reliability of 1 means that the two diagnoses will always agree; a reliability of 0 means that any agreement between them is no better than chance. In general, the reliability indicates how much the agreement between the two diagnoses exceeds what chance alone would produce. Before the field trials began, we defined four categories for evaluating reliability along this scale from 1 (reliable) to 0 (unreliable): very good, good, questionable, and unacceptable.
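The chance-corrected agreement described above is a statistic in the kappa family. As a rough illustration (not the field trials' actual methodology, which used a test-retest design), here is a minimal sketch of Cohen's kappa, the simplest such measure, for two raters; the function name and the example data are hypothetical:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters:
    kappa = (p_observed - p_chance) / (1 - p_chance)."""
    n = len(ratings_a)
    # Proportion of cases where the two diagnoses agree outright.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's diagnosis rates.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - chance) / (1 - chance)

# Hypothetical data: 1 = diagnosis made, 0 = not made, for 10 patients.
first = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
second = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(round(cohens_kappa(first, second), 2))  # prints 0.58
```

Note how the chance correction matters: the two raters here agree on 8 of 10 patients (raw agreement 0.8), yet the kappa is only about 0.58, because with both raters diagnosing 60% of patients, a good deal of agreement would occur by chance alone.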
For the majority of disorders, the reliability of their criteria was as good as, if not better than, that of medical diagnosis in general -- results that reflected the extraordinary work done throughout the lengthy DSM-5 development process. Yet while the strong findings were welcome validation, the less positive findings were equally instructive. We'll focus on examples of both.
Fourteen diagnoses ranked in the top categories of "good" or "very good" reliability, among them criteria for schizophrenia, attention-deficit/hyperactivity disorder, and post-traumatic stress disorder, as well as for new entries such as somatic symptom disorder and autism spectrum disorder. The results for the latter were gratifying given the concerns of advocates and parents that many children could be adversely affected, and we hope they now feel reassured.
On the other side of the line were three diagnoses that fell into the category of "unacceptable" reliability; each has since undergone substantial revision or is no longer proposed for inclusion. That leaves six additional diagnoses, which finished in the "questionable" range: acceptable, but with low reliability. Several have already been revised or, in the case of attenuated psychosis syndrome, recommended to be moved to the section of the manual that stipulates further study is needed.
Still, some DSM-5 detractors have spotlighted the six as indicative of flaws in the field trials, especially because this group included major depressive disorder and generalized anxiety disorder, two of the most commonly diagnosed conditions. The opposite is closer to the truth. Rather than discrediting the field trials, the outcome here reveals the critical value of how the trials were constructed and conducted and how we are moving forward.
Ironically, both major depressive disorder and generalized anxiety disorder were tested not because they were being modified for the next manual, but because they were remaining relatively unchanged and could serve as reference disorders from the DSM-IV trials. But as part of that process two decades ago, patients were carefully screened, and participating clinicians received special training and explicit direction on how to perform evaluations. In contrast, the DSM-5 field trials accepted patients as they came and asked clinicians to work as they usually did -- to mirror the circumstances in which most diagnosing takes place.
We believe the DSM-5 results represent the truer picture of the difficulty clinicians may have in reliably diagnosing both conditions, either because they often occur with other conditions or because they are accompanied by symptoms that can fluctuate greatly. Regardless of why, we acknowledge that the relatively low reliability of major depressive disorder and generalized anxiety disorder is a concern for clinical decision-making. Strategies need to be developed to address the problem as the manual evolves into a living document that incorporates revisions and additions as research and clinical practices advance. The good news is that we're now inherently better prepared for this challenge; the DSM-5 field trials have laid the groundwork for how such strategies and future changes should be judged.
The field trials were a massive undertaking involving hundreds of mental health professionals and about 3,500 patients in settings nationwide. We are deeply appreciative of the time they gave and the effort they expended, which helped us learn which proposed diagnostic criteria were ready for prime time and which needed to be reconsidered or rejected. Even as the clock winds down, the field trial results continue to shape our final recommendations for DSM-5.
When the next manual is presented in December for review by the APA Board of Trustees, thanks to the field trials, it will be ready.
Helena C. Kraemer, Ph.D. is a member of the DSM-5 Task Force and the chief methodologist of the DSM-5 field trials, as well as emeritus faculty at Stanford University.
For more by David J. Kupfer, M.D., click here.