Academic Paper

[Evaluation of Chinese artificial intelligence large language models in oral mucosal disease consultation].

Source: PubMed · Published: 2026-03-09 · Authors: Zhang HT, Sun WX, Li XY, Dan HX, Chen QM

Zhonghua kou qiang yi xue za zhi = Zhonghua kouqiang yixue zazhi = Chinese journal of stomatology

Abstract

Objective: To investigate the current application status and potential of artificial intelligence (AI) large language models (LLMs) in oral mucosal disease consultation.

Methods: A questionnaire survey was conducted among patients attending the Department of Oral Medicine, West China Hospital of Stomatology, Sichuan University, in November 2025 to assess their use of AI for oral mucosal disease-related consultations and to compare the factors influencing AI usage behavior and satisfaction. Nine clinical questions reflecting the concerns of patients with oral leukoplakia (OLK), covering etiology, symptoms, treatment, care, and prognosis, were standardized and submitted to six mainstream LLM platforms (Kimi, ByteDance Doubao, Tongyi Qianwen, Wenxin Yiyan, Tencent Hunyuan, and DeepSeek). Ten oral medicine specialists quantitatively scored the responses for accuracy, clarity, relevance, completeness, and practicality using the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. The readability of the responses was assessed with the Alpha Readability Chinese (ARC) tool.

Results: A total of 200 patients with oral mucosal diseases were included. Only 37.5% (75/200) had ever used AI for related consultations. AI usage was significantly correlated with younger age and higher education level (P<0.001). Only 40.0% (30/75) of users were relatively satisfied with current AI consultations, and only 21.3% (16/75) would adopt AI's treatment or care suggestions. However, 96.0% (72/75) expressed willingness to continue using AI for future consultations. Based on the QAMAI total scores for the nine typical OLK-related clinical questions, DeepSeek (25.4 points) and Tencent Hunyuan (25.3 points) performed best and were rated "very good quality", while the other models were rated "good quality". All models scored relatively low on the "sources and references" dimension. ARC readability analysis indicated that ByteDance Doubao had the best readability (weighted total score 0.511), while DeepSeek and Tencent Hunyuan had relatively poor readability (0.358 and 0.369, respectively).

Conclusions: Current usage rates of and satisfaction with AI consultation among patients with oral mucosal diseases need improvement, but willingness to use it in the future is strong. The six mainstream Chinese LLMs differ significantly in professional information quality and text readability for OLK consultation, and they generally lack reliable evidence-based support. Improving the overall quality of AI-generated responses is therefore key to realizing their clinical application value.

