The Canadian Journal of Psychiatry / La Revue Canadienne de Psychiatrie
2023, Vol. 68(9) 639–648. © The Author(s) 2023. Article reuse guidelines: sagepub.com/journals-permissions. DOI: 10.1177/07067437231154993. TheCJP.ca | LaRCP.ca
Objective: This study aimed to provide a general overview of mental health randomized controlled trials (RCTs) and summarize the temporal trends in terms of the number of studies, median sample sizes, and median effect sizes using data collected from the Cochrane Database of Systematic Reviews (CDSR).
Methods: Using data collected from the CDSR, temporal trends in the number of studies, median sample sizes, and median effect sizes were compared between two broad categories of interventions, pharmacological RCTs (ph-RCTs) and non-pharmacological RCTs (nph-RCTs), in conjunction with major mental disorder categories.
Results: Chronologically, the number of mental health RCTs reported in publications increased exponentially from 1955 to 2020. While ph-RCTs comprised the majority of mental health RCTs in the earlier years, the proportion of nph-RCTs increased more quickly over time and markedly exceeded that of ph-RCTs after 2010. The median sample size for all 6,652 mental health RCTs was 61, with 61 for ph-RCTs and 60 for nph-RCTs. Over time, the median fluctuated, but an increasing trend was observed over the past 60+ years. The median effect size, measured by Pearson's r, across all RCTs was 0.18, and nph-RCTs (0.19) had a larger median effect size than ph-RCTs (0.16). Over the years, nph-RCTs generally had larger median effect sizes than ph-RCTs. Differences in the median effect sizes among the categories of mental disorders were also noted. Schizophrenia had the most RCTs, with a median Pearson's r value of 0.17. Mood disorders had the second largest number of RCTs, with a median Pearson's r value of 0.15. Neurotic/stress-related disorders had the third largest number of RCTs and the highest median Pearson's r, 0.23.
Conclusions: This study provides meaningful information and fills a knowledge gap regarding mental health RCTs.
Keywords: effect size, mental health, non-pharmacological interventions, pharmacological interventions, randomized controlled trial
Since the introduction of randomized controlled trials (RCTs) in 1948,1 tens of thousands of such trials have been conducted. The ability of RCTs to control for confounding factors, reduce bias, and elucidate the direction of causation enables a robust evaluation of the efficacy and effectiveness of interventions. RCTs are regarded as the gold standard in the medical/health field and play a vital role in the development of evidence-based treatment interventions.2,4
A key issue in any scientific field is replicability, which gives us confidence in the findings before any form of application or generalization. In particular, the mental health field faces a replication problem, as highlighted by the Reproducibility Project.5 In this project, 270 psychologists from around the world collaborated to replicate 100 previous studies. The findings were as follows: the mean effect size of the replication effects was only half the magnitude of the original effects; 97% of the original studies had significant results, but only 36% of the replications did; and only 47% of the original effect sizes fell within the 95% confidence interval of the replication effect sizes. While the scientists on the project did not receive any funding, replicating these 100 studies was nonetheless very costly.5 We believe that a more thorough overview is warranted to facilitate the future evaluation of replicability in this field.
Compared to RCTs in other medical fields, wherein the majority are concerned with pharmacological/pharmaceutical interventions, mental health RCTs are special in a few respects, as they are more complex in nature.6 The treatment outcomes usually depend on the behaviors of the therapists, clinicians, and those who receive the intervention; most of the interventions involve a range of outcomes in multiple domains; and there is often a need to adapt the intervention to different contexts and settings.6 It is also very common for patients with mental disorders to have multiple diagnoses7 and to display challenging and complicated symptoms that may not be compatible with the tightly controlled demands of most clinical trials. All these characteristics have impeded the generalization of RCTs in the mental health field and have elicited arguments regarding the suitability of RCTs for evaluating complex mental health interventions.8 Accordingly, advocates have called for more RCTs in the domain of mental health with bigger samples, simpler and fewer outcome measures, and more "life-like realist" designs (more pragmatic trials such as stepped-wedge and patient-preference designs).8,9 Meanwhile, interventions that do not use pharmacological drugs, termed non-pharmacological interventions in this study, such as psychotherapy, behavioral therapy, social therapy, psychoeducation, care processes, and others,10 have gained popularity in mental health research. These non-pharmacological interventions are often relational in nature and rely heavily on the clinician's ability to navigate interpersonal relationships and actively engage trial participants in treatment.11
Debates regarding the efficacy of pharmacotherapy versus psychosocial therapy are ongoing. It has been reported that the utilization of psychotherapy declined remarkably in recent years at least in the outpatient treatment of mental disorders.12,13 The reasons for the decline were complex, including financial reasons such as incentives provided by the pharmaceutical industry in promoting the use of pharmacotherapy, and ideological reasons (e.g., people believed pharmacotherapy was a biological treatment and more scientifically valid than psychotherapies).13 However, several meta-analyses have provided evidence that psychotherapies are generally not less efficacious than pharmacotherapy for some particular mental disorders.14,15 A recent study showed that psychotherapy had a biological basis and probably engaged neural mechanisms similar to pharmacotherapy.16
Given the vast difference between the two broad types of interventions, pharmacological RCTs (ph-RCTs) versus non-pharmacological RCTs (nph-RCTs), an overview of mental health RCTs comparing the two types of interventions would provide meaningful information and fill the knowledge gap regarding the differences between ph-RCTs and nph-RCTs in the efficacy and other properties of the trials. In this study, we provide a general overview of mental health RCTs and summarize the temporal trends in terms of the number of studies, median sample sizes, and median effect sizes using data collected from the Cochrane Database of Systematic Reviews (CDSR). Comparisons of these properties between ph-RCTs and nph-RCTs are conducted in conjunction with the major mental disorder categories.
Systematic reviews and meta-analyses aim to identify, evaluate, and summarize the findings of all relevant individual studies on health-related issues and are generally considered to provide the best evidence to support practices.17 Cochrane is widely known for the strict methodological and reporting standards it has developed for the conduct of systematic reviews.18 Cochrane systematic reviews and meta-analyses are regarded as the "gold standard" for high-quality information and are widely used to inform healthcare policy and practice.19 Many meta-studies have been performed with CDSR data because it provides a well-organized XML data file per review that contains the data for the primary studies included in the meta-analysis.20,22 Through the R script developed by Schwab (https://github.com/schw4b/cochrane), we downloaded, imported, and parsed the data for the selected systematic reviews from the CDSR (CAMH has an institutional subscription to the Cochrane Library [accession number: 1008138765]). The parsed data contain detailed information for each study cited in the meta-analysis of the review, including the Cochrane ID of the review, the study title, author names, publication year, outcome measures, and sample sizes for the intervention and control. The data also contain study characteristics for identifying whether the study was an RCT and whether the key outcome measures were about efficacy or feasibility.21,22
In terms of selection criteria, we selected reviews that (a) were categorized into the “Intervention” review type (for reviews of studies involving an intervention, such as a drug, a surgical procedure, a medical device, psychotherapy, a care model, etc.), (b) were in the field of Psychiatry and Mental Health (CDSR topics of Mental Health; Developmental, Psychosocial, and Learning Problems; Neurology – Dementia and Cognition; Neurology – Delirium), and (c) were published in CDSR before December 31, 2021.
The data collection was divided into two branches. In Branch 1, study-level data for the studies cited by the selected reviews were collected automatically using the R script described above. The study characteristics collected in this branch included design type (RCT vs. non-RCT), publication year, sample sizes, and information on various types of outcome measures. In Branch 2, review-level data for the selected systematic reviews were collected manually and independently by two researchers on (a) the type of intervention (pharmacological, non-pharmacological, or both) that the review evaluated and (b) the category of the mental disorder that the review focused on, using the International Classification of Diseases 10th Revision (ICD-10) system. Pharmacological interventions refer to interventions that use medications, whereas non-pharmacological interventions are those that do not involve the use of medication, including but not limited to surgical procedures, medical devices, psychotherapy, behavioral therapy, social therapy, psychoeducation, care processes, and others. The type of intervention for a review was categorized as "both" when the review summarized the intervention effects from both pharmacological and non-pharmacological interventions. Discrepancies between the two researchers were discussed with an additional researcher and resolved when unanimous decisions were made. With respect to mental disorder categories, there are 11 major classes of mental disorders in the ICD-10 system. As some disease classes had low frequency in the data of this study, the mental disorders targeted by the RCTs were regrouped into five broad categories: schizophrenia and other psychotic disorders, mood disorders, neurotic/stress-related disorders, organic mental disorders, and others (including all other mental illnesses not in the aforementioned four classes, as well as mental health-related conditions not covered by the ICD-10 classification system). Each review was categorized by reading the abstract and, when the abstract proved insufficient, the full text; this categorization was performed independently by the two researchers. Discrepancies were resolved by a three-person panel, as for the categorization of intervention type.
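To make the regrouping step concrete, the short R sketch below maps ICD-10 blocks onto the five broad categories used in this study; the block labels, the function name regroup_disorder, and the fallback to "Others" are illustrative assumptions, not the authors' actual coding script.

# Hypothetical lookup: ICD-10 block -> broad category used in this study.
icd10_to_broad <- c(
  "F20-F29" = "Schizophrenia and other psychotic disorders",
  "F30-F39" = "Mood disorders",
  "F40-F48" = "Neurotic/stress-related disorders",
  "F00-F09" = "Organic mental disorders"
)

# Anything outside the four named classes falls into the residual "Others" group.
regroup_disorder <- function(icd10_block) {
  broad <- unname(icd10_to_broad[icd10_block])
  ifelse(is.na(broad), "Others", broad)
}

regroup_disorder(c("F30-F39", "F50-F59", "F20-F29"))
#> "Mood disorders" "Others" "Schizophrenia and other psychotic disorders"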
Data collected from Branch 1 were then merged with data from Branch 2 via the Cochrane ID; hence, each study carried additional information on "disease category" and "intervention type." For studies in a review categorized as "both (pharmacological and non-pharmacological)," the study-level data were further investigated to obtain the "intervention type" at the study level. Non-RCT studies and those that contained only feasibility outcomes were excluded. Furthermore, only the primary or first-reported efficacy outcome from each RCT was included in the analysis.21,22
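A minimal R sketch of this assembly step is shown below; the file names and column names (cochrane_id, design, has_efficacy_outcome, and so on) are hypothetical placeholders rather than the study's actual data dictionary.

# Hypothetical file and column names; this mirrors the workflow described above
# (merge by Cochrane ID, then exclude non-RCTs, feasibility-only studies, and
# records missing key fields).
studies <- read.csv("branch1_study_level.csv")   # one row per primary study
reviews <- read.csv("branch2_review_level.csv")  # one row per Cochrane review

# Attach review-level "disease_category" and "intervention_type" to each study
merged <- merge(studies, reviews, by = "cochrane_id")

# Keep RCTs with an efficacy outcome, a publication year, and a dichotomizable
# intervention type
rcts <- subset(
  merged,
  design == "RCT" &
    has_efficacy_outcome &
    !is.na(publication_year) &
    intervention_type %in% c("pharmacological", "non-pharmacological")
)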
In the primary studies, effect sizes from the RCTs were reported using various statistics, including the standardized mean difference (SMD) or mean difference (MD) for continuous measures, and the relative risk (RR) or odds ratio (OR) for dichotomous measures. In this study, SMD and MD were unified using the Hedges' g SMD statistic23 for RCTs with continuous outcome measures, and RRs were converted to ORs for RCTs with dichotomous outcomes.21 All the aforementioned calculations were conducted in R using the R package "metafor."24 Depending on the scales of the mental health outcome measures, a negative SMD or an OR of less than 1 may represent either improved/favorable or deteriorated/non-favorable effects. The absolute values of the SMDs were taken, and ORs below 1 were inverted (e.g., an OR of 0.5 becomes 2.0).20 The conversion was performed so that positive SMDs and larger ORs indicated improvement/favorable outcomes. We acknowledge this as a limitation of the study, as a few of the conversions may have mistaken the direction of the effect. To compare across the RCTs, ORs were then converted to SMDs with the R package "effectsize" (with the formula SMD = ln(OR) × √3/π); SMDs were further transformed to the correlation coefficient Pearson's r with the R package "effectsize" (with the formula r = SMD/√(SMD² + 4)), because Pearson's r is bounded, well known, and more readily interpretable.5,21 SMDs of 0.2, 0.5, and 0.8, or Pearson's r values of 0.1, 0.3, and 0.5, were considered small, medium, and large effect sizes, respectively.
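As a worked illustration of these conversions, the sketch below writes out the standard formulas (Hedges' g, the OR from a 2×2 table, OR-to-SMD, and SMD-to-r). It is a stand-in for the authors' metafor/effectsize-based code, and all example numbers are made up.

# Illustrative re-implementation of the effect size harmonization using the
# standard formulas stated above.

# Hedges' g (bias-corrected SMD) from per-arm summary statistics
hedges_g <- function(m1, sd1, n1, m2, sd2, n2) {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  d  <- (m1 - m2) / sp
  j  <- 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
  j * d
}

# Odds ratio from a 2x2 table (events vs. non-events per arm)
odds_ratio <- function(events1, n1, events2, n2) {
  (events1 * (n2 - events2)) / ((n1 - events1) * events2)
}

or_to_smd <- function(or) log(or) * sqrt(3) / pi  # SMD = ln(OR) * sqrt(3)/pi
smd_to_r  <- function(d)  d / sqrt(d^2 + 4)       # assumes equal group sizes

# Continuous outcome: |SMD| -> r
g <- abs(hedges_g(m1 = 12, sd1 = 5, n1 = 30, m2 = 10, sd2 = 5, n2 = 30))
smd_to_r(g)

# Dichotomous outcome: OR (inverted if < 1) -> SMD -> r
or <- odds_ratio(events1 = 20, n1 = 30, events2 = 12, n2 = 30)
or <- if (or < 1) 1 / or else or                  # e.g., an OR of 0.5 becomes 2.0
smd_to_r(or_to_smd(or))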
As of December 31, 2021, there were 824 (after removing duplicates) systematic reviews in the CDSR on topics of Mental Health, Developmental, Psychosocial and Learning Problems, Neurology – Dementia and Cognition, and Neurology – Delirium. Altogether, 168 reviews were eliminated due to withdrawal, non-relevance to mental health (e.g., developmental problems), or no meta-analysis data included in the review, with 656 mental health Cochrane systematic reviews remaining.
Study-level data were downloaded from the CDSR for 9,446 primary studies included in the 656 reviews. In total, 2,520 primary studies were eliminated because they were not identified as RCTs or did not have efficacy outcomes. Additional RCTs were eliminated because they lacked information critical for the data analysis, such as a publication year, an intervention type that could be dichotomized, or the data needed to calculate an SMD or OR. The final dataset included 6,652 RCTs in mental health and psychiatry (refer to Figure 1).
Chronologically, the number of mental health RCTs reported in publications increased almost exponentially from 1955 to 2020. In five-year intervals (with the exceptions of 1955–1964, for which ten-year data were pooled due to the relatively small number of studies, and 2015–2020, for which six-year data were pooled), the counts rose from 67 for 1965–1969 to 1,479 for 2005–2009. The drop in the number of studies in the last ten years, 1,065 for 2010–2014 and 613 for 2015–2020, might be due to a delay in reporting. While the numbers of both ph-RCTs and nph-RCTs increased, the contrast in the proportions of ph-RCTs and nph-RCTs is worth mentioning. During 1965–1969, the proportion of nph-RCTs (10.4%) was much lower than that of ph-RCTs (89.6%). The rate of increase of nph-RCTs was greater than that of ph-RCTs. During 2005–2009, the proportion of nph-RCTs (48.4%) was almost even with that of ph-RCTs (51.6%). During 2015–2020, nph-RCTs represented the majority of mental health RCTs, with a proportion of almost 80% (refer to Figure 2 and Supplementary Table 1).
Overall, the proportion of ph-RCTs versus nph-RCTs within each category of mental disorders differed widely. The proportion of ph-RCTs was more than double that of nph-RCTs for RCTs targeting schizophrenia (71.5% vs. 28.5%) and mood disorders (69.7% vs. 30.3%). Conversely, the proportions were 33.8% for ph-RCTs and 66.2% for nph-RCTs for neurotic/stress-related disorders, and 18.3% versus 81.7% for RCTs of all other mental disorders (refer to Figure 3 and Supplementary Table 1).
The median sample size for all 6,652 mental health RCTs was 61 (interquartile range [IQR]: 36–131), with 61 for ph-RCTs (IQR: 37–145.75) and 60 for nph-RCTs (IQR: 33–121.25). Over time, the median fluctuated, but with an overall increasing trend over the past 60+ years, from less than 50 before the 1990s to around 70 after the 2000s (refer to Figure 4(a) and Supplementary Table 2). Comparing the sample size over time between the ph-RCTs and nph-RCTs, the ph-RCTs had larger median sample sizes than the nph-RCTs in general. The larger median for nph-RCTs before 1975 could be due to a bias because of the relatively small number of nph-RCTs compared to ph-RCTs.
There were 3,154 RCTs (47.4%) using dichotomous outcome measures, with a median OR of 1.88 (IQR: 1.30–3.59). The remaining 3,498 RCTs used continuous outcome measures, with a median SMD of 0.37 (IQR: 0.17–0.73) (refer to Table 1). To compare the two types of outcome measures, ORs were further harmonized to SMDs, and all the harmonized SMDs were converted to Pearson's r. The median effect size of all 6,652 mental health RCTs was 0.36 (IQR: 0.16–0.72) for SMD and 0.18 (IQR: 0.08–0.34) for Pearson's r (refer to Table 1).
Associating the effect sizes with sample sizes, we found that the smaller-scale RCTs (sample size <61) had larger median effect sizes (Pearson’s r: 0.23, IQR: 0.10–0.41) than those of the larger-scale RCTs (sample size ≥ 61, median Pearson’s r: 0.14, IQR: 0.06–0.26) (refer to Table 1).
Over time, in general, the mental health RCTs reported in the earlier years (before 1995) had larger median effect sizes than those in the later years, with the median of Pearson’s r being above 0.2 for the RCTs reported before 1995 (except for the years 1980–1984), and below 0.2 for those reported after 1995 (refer to Table 1).
Differences in the median effect sizes among RCTs grouped by the categories of targeted mental disorders were also noted. Schizophrenia had the most RCTs, with a median Pearson's r effect size of 0.17. Mood disorders had the second largest number of RCTs, with a median Pearson's r effect size of 0.15. Neurotic/stress-related disorders had the third largest number of RCTs and the highest median Pearson's r effect size, 0.23 (refer to Table 1).
Comparing the effect sizes by intervention types, the median of Pearson’s r was smaller for ph-RCTs (0.16, IQR: 0.07–0.32) in comparison to nph-RCTs (0.19, IQR: 0.08–0.36) (refer to Table 1). It should be noted that 64.3% of the ph-RCTs chose dichotomous measures and 35.7% of the ph-RCTs used continuous outcome measures. Among nph-RCTs, only 30.0% used dichotomous measures, while 70.0% chose a continuous outcome measure (refer to Table 1).
Further analyses were performed to compare the temporal trends in effect sizes between ph-RCTs and nph-RCTs (refer to Figure 4(b) and Supplemental Table 3). Over the past 60+ years, nph-RCTs had a larger median effect size than ph-RCTs most of the time. Some exceptions were noted; for example, during 1970–1974, nph-RCTs had a smaller median effect size than ph-RCTs, and during 2010–2014 and 2015–2020, the median effect sizes were at the same level for the two intervention types.
We next compared the effect sizes between ph-RCTs and nph-RCTs within the major mental disorder categories (refer to Figure 3 and Supplementary Table 4). The nph-RCTs had a greater median Pearson's r than the ph-RCTs for the three mental disorder categories with the most RCTs: schizophrenia, mood disorders, and neurotic/stress-related disorders. In contrast, ph-RCTs had larger median effect sizes than nph-RCTs for organic mental disorders and other mental disorders (refer to Figure 3 and Supplementary Table 4).
Originally designed for testing the efficacy of medications, RCTs have been widely used in all medical fields, including mental health and psychiatry. The complexity of mental disorders and the absence of objective (e.g., biological) diagnostic tests and intervention outcome measures make mental health RCTs special. In this study, we provided an overview of mental health RCTs with regard to the number of RCTs, sample size, and effect size, and further examined differences by year of publication, intervention type, and mental disorder category. Due to the heterogeneity in interventions and disease categories, we did not attempt to combine data with meta-analytic approaches but instead presented descriptive data on the distributions of sample sizes and effect sizes.
With the data collected from the CDSR for 6,652 mental health RCTs, we showed that over time, the number of mental health RCTs increased exponentially from 1965 to 2009, reaching a peak in 2005–2009. The decline in recent years (2010–2020) may be due to a delay in reporting. We believe that this overall increasing trend can mainly be attributed to two factors: a steady increase in government funding for the mental health field (for instance, the NIH budget increased from $10 billion to $40 billion between 1995 and 2020)25 and the growing popularity of the RCT as a research design. Since the publication of the first RCT (the Streptomycin in Tuberculosis trial) in the BMJ,1 more than half a million RCTs have been published globally.26 The RCT has become the preferred method in the health sciences for deciding whether a treatment or other health intervention works better than alternative treatments. As this study focused only on RCTs, we also observed this phenomenon.
We showed that in the earlier years, ph-RCTs comprised the majority of mental health RCTs. However, the proportion of nph-RCTs increased more quickly over time, from 10.4% during 1965–1969 to roughly equal to that of ph-RCTs during 2005–2009, and then markedly exceeded it after 2010. This upturn in proportion indicates a burgeoning acceptance of nph-RCTs and may suggest the rising popularity of non-pharmacological interventions in mental health research over time.
The median effect size of the mental health RCTs overall was 0.18 as measured with Pearson's r, which is small according to Cohen's benchmarks.27 The frequent use of Treatment As Usual as the control group in mental health trials may contribute to the observed lower effect sizes.28 The median effect size of nph-RCTs (0.19) was slightly larger than that of ph-RCTs (0.16). As discussed by Huhn,14 there are fundamental differences in the methodology of psychotherapy and drug trials that might affect the effect sizes of mental health RCTs. We speculate that these differences may exist, in general, between ph-RCTs and nph-RCTs. First, small-study biases are more common in nph-RCTs than in their pharmaceutical counterparts, which generally leads to greater effect sizes.29 The median sample sizes were 61 for ph-RCTs and 60 for nph-RCTs. Over time, there was a general trend for ph-RCTs to have larger median sample sizes than nph-RCTs. The difference in sample sizes may also result from imbalanced funding, as nph-RCTs, unlike ph-RCTs, are not sponsored by powerful pharmaceutical companies.14 Second, nph-RCTs were often implemented by the inventors of the interventions, and the therapists in the trials were often well-trained experts, which might positively influence the effect size.30 Third, blinding is difficult or impossible to apply in most nph-RCTs.31
Some limitations of this study must be considered when interpreting the results. First, data were collected from a single source, the CDSR. The CDSR is well known as a leading journal and database for systematic reviews and meta-analyses of healthcare trials, and several publications have used a methodology similar to ours to extract clinical trial data from the CDSR.20,22 While we believe that our data are representative, they are by no means exhaustive. Second, the intervention type and disease category data for the RCTs were collected at the review level rather than the study level; hence, these data depend on the correct documentation and accuracy of the systematic reviews and meta-analyses. Third, there is a delay from the completion of a trial to its publication, and then to the collection of its data by a systematic review and the publication of that review; hence, the temporal trends in recent years may not reflect the real world of mental health RCTs. Fourth, although we aimed to provide a general overview of mental health RCTs, the heterogeneity and breadth of nph-RCTs for mental health conditions may make comparisons of effect sizes with ph-RCTs problematic. Future studies may need to differentiate among nph-RCTs (e.g., brain stimulation, psychotherapies, other psychosocial interventions, or process-of-care interventions).
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Wei Wang https://orcid.org/0000-0002-3078-5545
Supplemental material for this article is available online.
1Biostatistics Core, Centre for Addiction and Mental Health, Toronto, ON, Canada
2Center for Complex Interventions, Centre for Addiction and Mental Health, Toronto, ON, Canada
3College of Public Health, University of South Florida, Tampa, FL, USA
Corresponding author: Wei Wang, Biostatistics Core, Centre for Addiction and Mental Health, 1001 Queen Street West, Toronto, ON, Canada. Email: wei.wang@camh.ca