Search - ResearchTracker

Awareness, Experiences, and Attitudes Toward Preprints Among Medical Academics: Convergent Mixed Methods Study.

Preprints (scientific manuscripts shared publicly prior to formal peer review) are gaining momentum across academic disciplines. However, their adoption in clinical and biomedical sciences remains limited, particularly in countries where traditional publishing norms prevail. Editorial ambiguity and a lack of national policy further complicate their use. This study aimed to assess the awareness, experiences, and attitudes of medical academics at Marmara University School of Medicine toward preprints and to explore the editorial landscape through both journal editor feedback and a review of journal-level preprint policies. A cross-sectional survey was conducted with 103 medical faculty members. The questionnaire included demographic questions, Likert scale items, and multiple-choice items assessing knowledge, familiarity, and attitudes toward preprints, as well as open-ended items to explore concerns. A "preprint test score" (0-4) was developed to quantify objective knowledge. Subgroup analyses were conducted by age (<40 vs ≥40 y) and academic discipline (basic vs clinical sciences). Additionally, all responses to open-ended questions from journal editors and 118 biomedical journals were manually reviewed for their stated stance on preprints and article processing charges (APCs). A convergent mixed methods design was used, combining a structured survey, thematic analysis of open-ended responses and editorial feedback, and a document-based review of biomedical journal policies. Only 42.9% (n=34) of participants reported familiarity with the concept of preprints, and 13% (n=10) had previously published on a preprint server. Misconceptions about ethics, peer review, and compatibility with journal policies were common. Subgroup analysis revealed that older participants scored higher on the "preprint test" (mean 2.20, SD 1.31 vs mean 1.97, SD 1.60) and had more experience with preprint publishing (1/40, 2.5% of younger participants; 7/29, 24.1% of older participants). Further, younger academics expressed less openness toward future use (n=7, 17.5% in the younger group; n=8, 27.6% in the older group). Clinical faculty were generally more hesitant than basic science faculty, although both groups raised concerns about the academic recognition of preprints. Editorial responses reflected a mix of cautious endorsement and skepticism. Among the 118 biomedical journals reviewed, most lacked clear preprint policies, while a small number either explicitly prohibited or permitted them. There is limited awareness of, and cautious engagement with, preprints among medical academics and editors in Türkiye. Generational and discipline-based differences further influence knowledge and attitudes. The lack of clear editorial guidance from biomedical journals may reinforce academic uncertainty. Tailored educational initiatives, transparent journal policies, and institutional support will be essential to foster a more open and inclusive scientific publishing environment.

The Performance of DeepSeek R1 and Gemini 3 in Complex Medical Scenarios: Comparative Study.

PubMed | 2026-04-27 | Authors: Bajwa M, Hoyt R, Knight D

Generative artificial intelligence models, especially reasoning large language models (LLMs), are gaining adoption in health care for diagnostic decision support and medical education. DeepSeek R1 is a reasoning LLM that generates extended chain-of-thought explanations to make its decision-making process more explicit. Traditional medical benchmarks often lack complexity and authenticity, motivating the adoption of scenario-rich datasets, such as the Massive Multitask Language Understanding Pro (MMLU-Pro) professional medicine subset, which provides multispecialty clinical vignettes for reasoning-centric evaluation. The objective of this study is to assess the diagnostic accuracy, reasoning quality, reasoning transparency, and practical usability of DeepSeek R1 and Gemini 3 Pro across closed- and open-ended clinical scenarios, with the intention of guiding their prospective application in practical clinical education and training. This evaluation was conducted by analyzing 162 diverse medical scenarios (both closed- and open-ended) from the MMLU-Pro health subset. In a 2-phase, dual-model evaluation, DeepSeek R1 and Gemini 3 Pro were applied to 162 matched clinical vignettes from the MMLU-Pro professional medicine subset spanning 21 specialties. Closed-ended (multiple-choice) and open-ended prompts were constructed for the same scenarios, and model outputs were coded for accuracy, reasoning steps, and citation behavior; descriptive statistics and the McNemar test were used to compare performance across formats. DeepSeek R1 achieved an accuracy of 86.4% (140/162 scenarios) on closed-ended tasks and 80.9% (131/162) on open-ended questions across 162 clinical scenarios, indicating modest attenuation of performance when answer cues were removed. Gemini 3 Pro demonstrated 90.7% (147/162) closed-ended and 88.9% (144/162) open-ended accuracy on the same scenarios, showing a similar pattern of decreased performance without answer options. Error analysis indicated that incorrect answers typically involved longer reasoning chains, suggesting overthinking. In a structured review of open-ended responses, DeepSeek R1 produced an average of 18.7 (range 0-52) references per case, with 5.2 unrelated references and 13.1 (range 3-67) reasoning steps, whereas Gemini 3 Pro averaged 22.5 (range 12-50) references, 1.9 (range 0-8) unrelated references, and 4.4 (range 1-10) reasoning steps per case. DeepSeek R1 demonstrated moderate-to-excellent accuracy and reasoning in evaluating both closed- and open-ended medical scenarios. In parallel, Gemini 3 Pro showed broadly comparable but distinct performance and reasoning patterns. While the closed-ended format may inflate accuracy due to cueing, the open-ended evaluation yielded richer insights into the fidelity of reasoning. Side-by-side evaluation of two large reasoning models highlights the importance of format, specialty, and citation behavior when considering clinical and educational use. Continued validation across a wider range of specialties and real-world contexts will enhance the models' trustworthiness for diagnostic and teaching applications.
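
The accuracy figures in the abstract above can be recomputed from the reported counts, and the paired comparison across formats can be sketched with an exact McNemar test. The Python below is a minimal illustration, not the authors' analysis code. The function names `accuracy_pct` and `mcnemar_exact` are hypothetical, and the discordant-pair split used in the last lines is invented for demonstration: the abstract implies only a net difference of 9 scenarios for DeepSeek R1 (140 vs 131 correct), not how many scenarios flipped in each direction.

```python
from math import comb

# (correct, total) counts as reported in the abstract.
reported = {
    ("DeepSeek R1", "closed-ended"): (140, 162),
    ("DeepSeek R1", "open-ended"): (131, 162),
    ("Gemini 3 Pro", "closed-ended"): (147, 162),
    ("Gemini 3 Pro", "open-ended"): (144, 162),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: scenarios answered correctly in the first format only
    c: scenarios answered correctly in the second format only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


for (model, fmt), (correct, total) in reported.items():
    print(f"{model}, {fmt}: {accuracy_pct(correct, total)}%")

# HYPOTHETICAL split of discordant pairs (not from the study): 12 scenarios
# correct only in the closed-ended format, 3 correct only in the open-ended
# format. Only the net difference of 9 is implied by the abstract.
print("illustrative p-value:", mcnemar_exact(12, 3))
```

The test depends only on the discordant pairs, so the reported marginal accuracies alone are not enough to reproduce the study's P values; the per-scenario results would be needed.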

Search results: JMIRx Med

Awareness, Experiences, and Attitudes Toward Preprints Among Medical Academics: Convergent Mixed Methods Study.

The Performance of DeepSeek R1 and Gemini 3 in Complex Medical Scenarios: Comparative Study.

Development of a Conversational Artificial Intelligence-Based Web Application for Medical Consultations: Prototype Study.

COVID-19 Pneumonia Diagnosis Using Medical Images: Deep Learning-Based Transfer Learning Approach.

Automating Individualized Notification of Drug Recalls to Patients: Complex Challenges and Qualitative Evaluation.

Development and Assessment of a Point-of-Care Application (Genomic Medicine Guidance) for Heritable Thoracic Aortic Disease.

Investigating the Variable Component of the Systematic Error, a Neglected Error Parameter: Theoretical Reevaluation Study.

Associations Between IT Job Stressors and Anxiety, Depression, and Stress: Cross-Sectional Study.

Assessing the Limitations of Large Language Models in Clinical Practice Guideline-Concordant Treatment Decision-Making on Real-World Data: Retrospective Study.

Data Obfuscation Through Latent Space Projection for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection.

Use of a Specialist Telephone Consultation Line for Long COVID in Primary Care in British Columbia: Retrospective Descriptive Quality Improvement Study.

Effects of Interventions for the Prevention and Management of Maternal Anemia in the Advent of the COVID-19 Pandemic: Systematic Review and Meta-Analysis.

Impact of a Point-of-Care Ultrasound Training Program on the Management of Patients With Acute Respiratory or Circulatory Failure by In-Training Emergency Department Residents (IMPULSE): Before-and-After Implementation Study.

Improved Alzheimer Disease Diagnosis With a Machine Learning Approach and Neuroimaging: Case Study Development.

Applications of Indocyanine Green in Breast Cancer for Sentinel Lymph Node Mapping: Protocol for a Scoping Review.

Telerehabilitation in the management of urinary incontinence in women: a narrative review.

Economic Evaluations of Algorithm-Enabled Remote Monitoring of Adults With Cardiac Implantable Electronic Devices: Protocol for a Systematic Review.

Interpreting the Estimand Framework From a Causal Inference Perspective.

Rapidly Benchmarking Large Language Models for Diagnosing Comorbid Patients: Comparative Study Leveraging the LLM-as-a-Judge Method.

Chaotic and Stochastic Components in an Influenza Surveillance Series: Nonlinear Dynamics and Predictive Modeling Study.