The 4Kscore® Test (OPKO Diagnostics, Woburn, MA), also known in the scientific literature as the four-kallikrein panel, is a commercially available, prebiopsy blood test that predicts a patient’s risk of high-grade prostate cancer (PCa) should a biopsy be performed. The 4Kscore Test incorporates four kallikrein protein measurements and clinical information consisting of age, digital rectal examination (DRE) result, and history of a prior negative biopsy result. The four kallikrein biomarkers are three different isoforms of prostate-specific antigen (PSA; total, free, and intact PSA) and human kallikrein-related peptidase 2 (hK2). The 4Kscore Test algorithm combines the four kallikrein measurements with the clinical information to provide the patient’s individual risk, or probability, of high-grade PCa. Men with either a suspicious PSA level or DRE result use the information provided by the 4Kscore Test for shared decision making with their physicians regarding whether to proceed with a prostate biopsy.
The 4Kscore Test algorithm was originally developed and validated in retrospective studies of serum samples collected from cohorts that were part of the European Randomized Study of Screening for Prostate Cancer (ERSPC).1 This algorithm (referred to in this study as the ERSPC algorithm) was later revised to the contemporary 4Kscore Test algorithm to account for fundamental changes in the standard of care that occurred after the ERSPC samples were collected. These changes (the transition from sextant to 10-core or higher prostate biopsy, and changes in the pathologic definition of high-grade PCa) affected both the detection rate and the grading of PCa.2 The revised algorithm was validated in a prospective US validation study,3 a 26-center study that demonstrated an area under the curve (AUC) of 0.82 for discriminating the presence of high-grade PCa, with nearly perfect calibration between the risk predicted by the 4Kscore Test and the actual prostate biopsy results. Subsequent US4 and European5,6 validation studies demonstrated similarly high AUC results.
The 4Kscore Test has been included in the National Comprehensive Cancer Network Prostate Cancer Early Detection guidelines since 2015.7 In a retrospective clinical utility study, Konety and colleagues8 demonstrated that the use of the 4Kscore Test in actual clinical practice resulted in a 65% biopsy rate reduction in a patient cohort referred for suspicion of PCa by conventional screening methods. The study also showed that physicians and patients were more inclined to proceed with a prostate biopsy for those patients with the higher 4Kscore Test results (≥20%) and less inclined to biopsy men with low 4Kscore Test results (<7.5%). Stattin and associates9 found that men with an elevated PSA level and a low 4Kscore Test (<7.5%) have a very low 20-year risk of developing metastatic PCa, and thus could safely avoid a prostate biopsy. Voigt and colleagues10 showed that reducing unnecessary prostate biopsies based on the 4Kscore Test could result in fewer negative biopsy results, fewer diagnoses of low-grade PCa, and reduced treatment or active surveillance of men with low-grade cancer, while still maintaining a high overall detection rate for high-grade PCa. These findings demonstrate that implementation of the 4Kscore Test can provide both improvement in patient care and substantial savings in healthcare costs.
The 4Kscore Test has been the subject of numerous clinical studies in Europe and the United States. Herein we report a systematic review and meta-analysis to evaluate the performance of the 4Kscore Test in the prebiopsy setting across all eligible clinical validation studies and to evaluate the heterogeneity of the 4Kscore Test performance across these studies.
This meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.11 We carried out a systematic literature search on PubMed for all articles using the search terms “kallikrein panel and prostate cancer”; “4Kscore Test”; or “four kallikrein and prostate cancer.” The search results included all studies published through June 30, 2017. Studies that fulfilled all of the following criteria were included in the meta-analysis: (1) case-control or cohort studies, (2) a prespecified 4Kscore Test algorithm used as the diagnostic test, and (3) the AUC for high-grade PCa upon biopsy was reported. One additional 366-patient clinical study, a prospective, multicenter validation of the 4Kscore Test, was identified from a recent academic conference presentation and was also included in the meta-analysis.4 The results of the literature search were confirmed by one of the academic researchers involved with the test to ensure that the systematic review was inclusive of all eligible 4Kscore Test clinical studies (A. Vickers, personal communication).
We then conducted a meta-analysis of the eligible publications using the AUC for high-grade PCa reported in each study. Each study reported AUCs from a model that included DRE, a model that did not include DRE, or both (Table 1). The 4Kscore Test as applied in current clinical practice accepts a DRE status of “positive,” “negative,” or “not available.” Therefore, for studies that reported AUCs from both models, we chose the AUC from the model that included DRE. Standard errors were calculated from the 95% CIs reported for the AUCs in the eligible studies. We calculated the pooled AUC using both a fixed-effects model and a random-effects model, and assessed the heterogeneity among the studies. Sensitivity analyses were performed by repeating the calculation of the pooled AUC and heterogeneity (1) on the prespecified subgroup of studies using the contemporary 4Kscore Test algorithm, (2) on all studies and on the subgroup of contemporary 4Kscore Test algorithm studies after excluding one study with obviously outlying AUC results, and (3) after excluding the only study not yet published as a full article.
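As an illustration of the calculations described above, the following minimal Python sketch back-calculates a standard error from a reported 95% CI and pools AUCs under a fixed-effects (inverse-variance) model and a random-effects model. It assumes symmetric, normal-approximation CIs and uses the DerSimonian-Laird estimator for the between-study variance; the text does not state which random-effects estimator was used, so that choice, like the function names `se_from_ci` and `pool_auc`, is an assumption. The only study value taken from this article is the outlier AUC of 0.72 (95% CI, 0.67-0.77); the other inputs are hypothetical placeholders, not the data in Table 1.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def se_from_ci(lower, upper, z=Z95):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2.0 * z)


def pool_auc(aucs, ses):
    """Pool study AUCs with inverse-variance weights.

    Returns the fixed-effects estimate, Cochran's Q, the
    DerSimonian-Laird between-study variance (tau2), and the
    random-effects estimate, each with its standard error.
    """
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed-effects mean.
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))
    df = len(aucs) - 1

    # DerSimonian-Laird estimate of between-study variance (assumed estimator).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random_eff = sum(wi * a for wi, a in zip(w_re, aucs)) / sum_w_re
    se_random = math.sqrt(1.0 / sum_w_re)

    return {
        "fixed": fixed, "se_fixed": se_fixed,
        "random": random_eff, "se_random": se_random,
        "q": q, "df": df, "tau2": tau2,
    }


if __name__ == "__main__":
    # First entry: outlier study reported in the text (AUC 0.72; 95% CI 0.67-0.77).
    # Remaining entries are HYPOTHETICAL placeholders for illustration only.
    reported = [(0.72, 0.67, 0.77), (0.82, 0.79, 0.85), (0.81, 0.77, 0.85)]
    aucs = [a for a, _, _ in reported]
    ses = [se_from_ci(lo, hi) for _, lo, hi in reported]
    result = pool_auc(aucs, ses)
    for key, value in result.items():
        print(f"{key}: {value:.4f}")
```

The Q statistic would be compared against a chi-square distribution with `df` degrees of freedom to obtain a heterogeneity p value; that step is omitted here to keep the sketch free of third-party dependencies.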
The keyword literature search identified 60 articles. After removing duplicates, 45 articles remained. Of these 45 articles, 27 were excluded because they were not clinical study articles. Seven clinical study articles did not meet the prespecified meta-analysis criteria and were therefore excluded (Figure 1), including the first 4Kscore Test clinical study, published in 2008.12 That study reported on the development of the ERSPC 4Kscore Test algorithm but was not a validation study. Ultimately, 11 eligible articles were retrieved from the literature search and 1 additional clinical study was identified from another source.4 These 12 studies included a total of 16,927 patients from the United States and 5 different European countries; among them, 11,134 patients were involved in the validation of a prespecified 4Kscore Test algorithm (Table 1).
Of the 12 studies, 5 studies used the 4Kscore Test models that included the DRE information, 2 studies used 4Kscore Test models that did not include the DRE information, and 5 studies included both types of models (Table 1). For the five studies with two models, the differences in AUC between the two models were very small, and we used the AUC of the model including DRE information in the meta-analysis.
The pooled AUCs and heterogeneity analyses for all studies and subgroups are summarized in Table 2; the pooled AUC was between 0.80 and 0.82 for all studies and subgroups. Specifically, the pooled AUC of the 4Kscore Test across all 12 clinical validation studies was 0.81 (fixed effects 95% CI, 0.80-0.83; random effects 95% CI, 0.79-0.83). The heterogeneity across studies was significant (p = 0.001). One study that utilized the STHLM2 cohort13 appeared to be an outlier, with a much lower AUC (0.72; 95% CI, 0.67-0.77) than the other 11 studies (AUC range, 0.78-0.87). Exclusion of this study eliminated the significant heterogeneity (p = 0.08) and led to an AUC of 0.82 (fixed effects 95% CI, 0.81-0.83; random effects 95% CI, 0.80-0.84). The ERSPC and contemporary 4Kscore Test studies were conducted over a 20-year timeframe on very different patient populations; nevertheless, after excluding this single study,13 the remaining clinical studies showed highly consistent AUCs (Figure 2).
Because of the differences between the ERSPC and contemporary 4Kscore Test algorithms, we conducted a further prespecified subgroup analysis of the six studies (n = 5019) that used the contemporary 4Kscore Test algorithm. The pooled AUC of these newer studies was not significantly changed compared with the pooled AUC for all 12 studies (fixed-effects AUC = 0.81, 95% CI, 0.79-0.83; random-effects AUC = 0.80, 95% CI, 0.76-0.84). However, heterogeneity in this subgroup was significant (Cochran Q test p = 0.001), driven by the study by Nordström and coworkers,13 which stood out as an outlier; again, excluding this study eliminated the significant heterogeneity (p = 0.21) and yielded an AUC of 0.82 (95% CI, 0.80-0.84 for both fixed-effects and random-effects models).
At the time of this publication, one study published by Punnen and coworkers4 had only appeared as a conference abstract. When this study was removed, there was no impact on pooled AUC or the heterogeneity conclusions (Table 2).
The 4Kscore Test has been validated in multiple European studies and in the prospective US validation study (Table 1), and has been the subject of two meta-analyses14,15 and several reviews.1,16-19 Our systematic meta-analysis is unique in that it encompasses all studies that were included in previous reviews or meta-analyses, in addition to new studies through 2017. Our meta-analysis is fundamentally different from that of Vickers and coworkers14 in terms of the scope of the patient population and the evaluation method. Their study focused on patients with PSA in the 10- to 25-ng/mL range and patients with abnormal results on DRE, whereas our study covers all patients who are subject to prostate biopsy. The study by Vickers and coworkers14 also used individual patient data for the meta-analysis, whereas our reported results are based on published data from individual studies, summarized using both fixed-effects and random-effects models.
The meta-analysis reported here is also fundamentally different from that of Russo and associates,15 whose conclusions on high-grade PCa were impacted by multiple deficiencies.20 Our analysis included two studies published before 2016 that Russo and associates15 did not include, and the methodology used in this meta-analysis is substantially different from theirs. Instead of calculating pooled sensitivity and specificity based on arbitrarily assigned cutoff points, we evaluated the performance of the 4Kscore Test by calculating the pooled AUC from the AUC and its 95% CI reported in each study. This strategy is preferred because the 4Kscore Test gives a continuous risk score from <1% to >95% that allows the physician and patient to act according to their own desired risk threshold. The predictive accuracy of the 4Kscore Test is therefore best evaluated by its AUC rather than by the sensitivity and specificity at an arbitrary cutoff point.
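To make the distinction concrete, the short Python sketch below contrasts the AUC, which summarizes discrimination over all possible thresholds, with sensitivity and specificity, which change with the chosen cutoff. The AUC is computed as the proportion of case-control pairs in which the case has the higher score (ties count as one half). The risk scores are hypothetical values invented for illustration, and the function names `auc` and `sens_spec` are likewise assumptions; the 7.5% and 20% cutoffs are borrowed from the low and higher result categories mentioned earlier in this article.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sens = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
    spec = sum(s < cutoff for s in scores_neg) / len(scores_neg)
    return sens, spec


if __name__ == "__main__":
    # HYPOTHETICAL risk scores (%), not patient data from any study.
    high_grade = [35, 22, 18, 9, 41]
    no_high_grade = [3, 6, 12, 5, 25, 8]

    print("AUC:", round(auc(high_grade, no_high_grade), 3))
    for cutoff in (7.5, 20):
        sens, spec = sens_spec(high_grade, no_high_grade, cutoff)
        print(f"cutoff {cutoff}%: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With a single set of scores, the AUC is one number, whereas each cutoff yields a different sensitivity and specificity pair. This is why pooling sensitivity and specificity requires choosing a cutoff, while pooling the AUC does not.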
A single published study of 531 men drawn from the 26,712-participant STHLM2 cohort was found to be an outlier in our meta-analysis.13 This study concluded that there was no difference between the performance of the 4Kscore Test and another test, the Prostate Health Index (phi). However, our statistical analyses raise concerns about this conclusion.
The results of our meta-analysis demonstrate reliable discrimination of the 4Kscore Test for high-grade PCa across multiple cohorts in the United States and in Europe (Table 1). The three prospective studies using the contemporary 4Kscore Test algorithm in routine laboratory service yielded highly consistent AUCs. The most recent study,4 which was performed in a cohort with a majority of African-American subjects, also resulted in an AUC consistent with the pooled AUC.
The pooled AUC of the 4Kscore Test for discrimination of high-grade PCa is above 0.80 for all the eligible studies and all subgroups in this meta-analysis. Despite the presence of one outlier study, the 4Kscore Test performance is highly consistent across the remaining 11 clinical validation studies involving over 10,000 subjects.
This research was sponsored by OPKO Diagnostics, Woburn, MA.