To determine the accuracy of using one versus two reviewers to screen titles and abstracts when progressing records to full-text screening, in 3 reviews of the effectiveness of interventions for chronic primary low back pain. Secondary objectives were to compute inter-rater reliability, describe misclassified records and reviewer performance across reviews, and conduct sensitivity analyses limited to English-language records and to falsely excluded records.

One reviewer screened titles and abstracts using standardized eligibility criteria, and the results were compared with consensus screening by two reviewers. We computed sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with 95% confidence intervals, using the two-reviewer consensus as the reference. We calculated inter-rater reliability, the proportion of misclassified citations, and the reasons for misclassification. We conducted sensitivity analyses restricted to English-language records.

The sensitivity of one reviewer ranged from 48.8% to 66.3%, and the specificity ranged from 88.0% to 93.3%. The PPV ranged from 40.6% to 51.8%, and the NPV from 93.6% to 95%. Inter-rater reliability ranged from 0.39 to 0.50. Between 5.0% and 6.3% of records were misclassified by the single reviewer as false negatives. Misclassifications were primarily related to the assessment of relevant interventions and comparators, such as whether the intervention could be isolated. In the sensitivity analysis, screening English-language records only, rather than records in all languages, improved sensitivity and PPV, with no change in specificity or NPV.

In rapid reviews of the literature, using a single reviewer to screen titles and abstracts may lead to the exclusion of eligible records.
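The accuracy measures described above can be computed from a 2×2 table that cross-classifies the single reviewer's include/exclude decisions against the two-reviewer consensus. The sketch below is illustrative only, under stated assumptions: the abstract does not say which confidence-interval method was used, so the Wilson score interval is an assumption, and computing Cohen's kappa with the single reviewer and the consensus as the two raters is also an assumption. `ScreeningTable`, the function names, and the counts in the usage example are hypothetical, not data from the study. It also shows why the false-negative share (false negatives divided by all screened records) is a different quantity from 1 minus sensitivity (false negatives divided by eligible records), which is how 5.0% to 6.3% false negatives can coexist with sensitivities of 48.8% to 66.3%.

```python
import math
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """Hypothetical 2x2 table: single-reviewer decisions vs. two-reviewer consensus."""
    tp: int  # included by the single reviewer and by consensus
    fp: int  # included by the single reviewer, excluded by consensus
    fn: int  # excluded by the single reviewer, included by consensus
    tn: int  # excluded by both

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total == 0:
        return (math.nan, math.nan)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (centre - half, centre + half)


def proportion(successes: int, total: int) -> tuple[float, float, float]:
    """Point estimate with its 95% Wilson interval: (estimate, lower, upper)."""
    est = successes / total if total else math.nan
    return (est, *wilson_ci(successes, total))


def accuracy_measures(t: ScreeningTable) -> dict[str, tuple[float, float, float]]:
    return {
        "sensitivity": proportion(t.tp, t.tp + t.fn),  # eligible records kept
        "specificity": proportion(t.tn, t.tn + t.fp),  # ineligible records excluded
        "ppv": proportion(t.tp, t.tp + t.fp),          # kept records that are eligible
        "npv": proportion(t.tn, t.tn + t.fn),          # excluded records that are ineligible
    }


def cohens_kappa(t: ScreeningTable) -> float:
    """Cohen's kappa, treating single reviewer and consensus as two raters (assumption)."""
    p_obs = (t.tp + t.tn) / t.n
    # Chance agreement from each rater's marginal include and exclude rates.
    p_exp = ((t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)) / t.n**2
    return (p_obs - p_exp) / (1 - p_exp)


def false_negative_share(t: ScreeningTable) -> float:
    """Eligible records wrongly excluded, as a share of ALL screened records."""
    return t.fn / t.n


if __name__ == "__main__":
    table = ScreeningTable(tp=50, fp=60, fn=40, tn=850)  # hypothetical counts
    for name, (est, lo, hi) in accuracy_measures(table).items():
        print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
    print(f"kappa: {cohens_kappa(table):.2f}")
    print(f"false-negative share: {false_negative_share(table):.1%}")
```

With these hypothetical counts, sensitivity is 50/90 while only 40 of 1,000 screened records are false negatives, so the two figures answer different questions and both are worth reporting.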
We caution against using kappa alone as an indicator of screening quality, because it is influenced by classification imbalance, and suggest also reporting accuracy measures to describe the potential for differences between reviewers' screening classifications.

This study investigated whether one reviewer can accurately screen research articles for inclusion in a systematic review, compared with the usual approach of having two people do the screening. This was tested in three reviews of common treatments for chronic primary low back pain. The single reviewer who screened titles and abstracts was likely to miss articles that two reviewers identified as relevant. However, the single reviewer was good at correctly excluding irrelevant articles. Between about 5% and 6% of all screened articles were relevant but incorrectly excluded by the single reviewer. Most mistakes happened when the single reviewer was uncertain whether a treatment was eligible. Limiting screening to English-language articles slightly improved screening accuracy, but it did not eliminate the risk of missing relevant research. Because artificial intelligence was used to translate Chinese studies into English, further research on the usefulness of this approach is warranted. In summary, restricting screening to one reviewer may save time, but it increases the probability that important evidence will be overlooked. Researchers should be cautious about relying on a single reviewer and should use additional quality assurance to limit bias.
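The caution against relying on kappa alone can be illustrated with two hypothetical 2×2 tables in which the single reviewer has identical sensitivity (0.80) and specificity (0.90) against the consensus, but the share of truly eligible records differs (50% vs 10%). The counts are invented for illustration, and Cohen's kappa is assumed as the agreement statistic; the sketch only shows that kappa shifts with prevalence even when accuracy does not.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for two raters making include/exclude decisions."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Expected chance agreement from the marginal include/exclude rates.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


def sens_spec(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity and specificity of rater 1 against rater 2 as the reference."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical tables: same sensitivity (0.80) and specificity (0.90),
# but 50% vs 10% of the 200 records are eligible according to consensus.
balanced = dict(tp=80, fp=10, fn=20, tn=90)
imbalanced = dict(tp=16, fp=18, fn=4, tn=162)

for name, t in [("balanced", balanced), ("imbalanced", imbalanced)]:
    sens, spec = sens_spec(**t)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} "
          f"kappa={cohens_kappa(**t):.2f}")
# balanced: sensitivity=0.80 specificity=0.90 kappa=0.70
# imbalanced: sensitivity=0.80 specificity=0.90 kappa=0.53
```

Observed agreement actually rises in the imbalanced table (0.89 vs 0.85) while kappa falls, because chance agreement on the dominant "exclude" class is high. Reporting sensitivity, specificity, PPV, and NPV alongside kappa makes that distinction visible.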
PubMed · 2026-06-06