Search - ResearchTracker

Conversational Agents (CAs, chatbots) are systems that can interact with users through natural human dialogue. While much of the research on CAs for sexual health has focused on adult populations, the insights from such research may not apply to CAs for youth. The study aimed to comprehensively evaluate the state-of-the-art research on sexual health CAs for youth. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we synthesized peer-reviewed studies specific to sexual health CAs designed for youth over the past 14 years. We found that most sexual health CAs were designed to adopt the persona of health professionals to provide general sexual and reproductive health information for youth. Text was the primary communication mode in all sexual health CAs, with half supporting multimedia output. Many sexual health CAs employed rule-based techniques to deliver pre-written expert knowledge on sexual health; yet most sexual health CAs did not have safety features in place. While youth appreciated access to non-judgmental and confidential conversations about sexual health topics, they perceived current sexual health CAs pro

WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women's Health Topics

arXiv, 2026-03-11. Authors: Sneha Maurya, Pragya Saboo, Girish Kumar

Large language models are increasingly used for medical guidance, but women's health remains under-evaluated in benchmark design. We present the Women's Health Benchmark (WHBench), a targeted evaluation suite of 47 expert-crafted scenarios across 10 women's health topics, designed to expose clinically meaningful failure modes including outdated guidelines, unsafe omissions, dosing errors, and equity-related blind spots. We evaluate 22 models using a 23-criterion rubric spanning clinical accuracy, completeness, safety, communication quality, instruction following, equity, uncertainty handling, and guideline adherence, with safety-weighted penalties and server-side score recalculation. Across 3,102 attempted responses (3,100 scored), no model mean performance exceeds 75 percent; the best model reaches 72.1 percent. Even top models show low fully correct rates and substantial variation in harm rates. Inter-rater reliability is moderate at the response label level but high for model ranking, supporting WHBench utility for comparative system evaluation while highlighting the need for expert oversight in clinical deployment. WHBench provides a public, failure-mode-aware benchmark to trac

Search results: Current women's health reports

Current Trends and Future Directions for Sexual Health Conversational Agents (CAs) for Youth: A Scoping Review

WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women's Health Topics

A Women's Health Benchmark for Large Language Models

The opportunities and risks of large language models in mental health

Reducing Large Language Model Safety Risks in Women's Health using Semantic Entropy

Exploring the Relationship Between COVID-19 Induced Economic Downturn and Women's Nutritional Health Disparities

The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems

Predicting pregnancy using large-scale data from a women's health tracking mobile application

Privacy and Security of Women's Reproductive Health Apps in a Changing Legal Landscape

Optimal Control for Remote Patient Monitoring with Multidimensional Health States

Analysing Health Misinformation with Advanced Centrality Metrics in Online Social Networks

Hearing Health in Home Healthcare: Leveraging LLMs for Illness Scoring and ALMs for Vocal Biomarker Extraction

Global Public Health Surveillance using Media Reports: Redesigning GPHIN

Technology in Association With Mental Health: Meta-ethnography

Redesigning Electronic Health Record Systems to Support Developing Countries

LLM on FHIR -- Demystifying Health Records

Vaccine Hesitancy on YouTube: a Competition between Health and Politics

The Impact of Medicaid Coverage on Mental Health, Why Insurance Makes People Happier in OHIE: by Spending Less or by Spending More?

"Who Has the Time?": Understanding Receptivity to Health Chatbots among Underserved Women in India

Semiparametric time to event models in the presence of error-prone, self-reported outcomes - With application to the women's health initiative