Currently, there is no standardized care for neuropalliative outpatients in Germany. Those who require specialized palliative care often receive it from a Specialized Outpatient Palliative Care (SOPC) team or hospice, but without regular access to neurological expertise. The TANNE project developed guidance on a cost-effective telemedical link between neuropalliative expertise (a Neuropalliative Telemedicine Center) and existing comprehensive palliative care structures such as SOPC teams and hospices. Video consultations enable a joint assessment of symptoms and targeted therapy. The TANNE project aims to ensure that patients with neurological diseases or neurological symptoms in the palliative phase receive professional neurological/neuropalliative care at home or in hospices in a resource-efficient manner. A prospective, partly cluster-randomized, two-arm intervention study with a delayed-start design was conducted between May 2021 and June 2023. The intervention group received video consultations whenever neurological problems occurred (event). The control group continued to receive treatment as usual. The primary endpoint was the change in symptom burden (iPOS, Integrated Palliative Outcome Scale) measured intra-individually before and after an event. Secondary endpoints included patients' general well-being and patients' and professionals' satisfaction with treatment. A total of 32 teams participated, recruiting 114 patients and registering 77 events. The primary endpoint showed a reduction in symptom burden of 2.6 (±4.15) points after teleconsultation, compared to 1.3 (±8.36) points in the control group (not statistically significant at the 5% level). This reduction was more pronounced in the 'Psychological and Practical Problems' subscale. High satisfaction scores with treatment and care were found in the intervention groups among patients and professionals. 
The teleconsultation evaluated in the TANNE project represents a form of interaction between neuropalliative expertise and specialized palliative care (SOPC team, hospice) that has not existed in this form and scope before. Because the number of cases was insufficient, in combination with teams declining to participate and additional withdrawals, the effect on the primary endpoint could not be demonstrated at a statistically significant level. Nevertheless, the results provide reliable points of reference for further research on supporting decision-making processes within SOPC and hospice teams in neurological cases through targeted teleconsultation services.
Quality of life (QoL) questionnaires are used in many disease areas to measure the burden that a disease causes for patients, which helps provide insights into disease impact, identify unmet medical needs, and inform patient-centered drug development and value assessment for treatments. The collection of these data imposes a significant burden on patients and considerable effort on health care personnel, thus incurring high costs for the health care system. Given that patients share detailed information about their condition and treatment experiences on social media and patient forums, an important research question is to what extent information about QoL can be obtained from patients' online forum posts to potentially complement information obtained from questionnaires. This study aimed to assess how much QoL information can be gained from the analysis of posts by patients in online health care communities and whether this information is rich enough to estimate individual patients' QoL based on their posts. We conducted this feasibility study in the context of breast cancer as it is the most prevalent cancer in the female population. We recruited 134 female patients diagnosed with breast cancer on the Inspire patient online forum, who voluntarily participated in our feasibility study. They filled in the EORTC (European Organisation for Research and Treatment of Cancer) QLQ-C30 and QLQ-BR23 questionnaires, consisting of 30 general questions and 23 additional breast cancer-specific questions, and provided consent to analyze their posts and comments on the online forum (756 posts and 19,478 comments). Posts were coded manually to identify parts of the text providing answers to 1 of the aforementioned 53 questions. The data annotation yielded a substantial agreement (mean Fleiss κ of 0.5, SD 0.28). Overall, we found answers in the coded data for 50 out of 53 EORTC QLQ-C30 and QLQ-BR23 questions. 
The information coded in the posts reliably predicted the answers given in the questionnaires (F1-score=0.7), with even better results when grouping similar questions (F1-score=0.8 for fine-grained and 0.9 for coarse-grained grouping). The 5 questions that were most frequently answered on the basis of the coded posts were "Did you feel ill or unwell?" (304 of 2683 annotated posts and comments), "Did you worry?" (105 posts and comments), "Have you had pain?" (104 posts and comments), "Did you feel tense?" (85 posts and comments), and "Were you limited in doing either your work or other daily activities?" (77 posts and comments). Our feasibility study shows that there is valuable QoL-related information in posts of online patient communities, which can potentially serve as an innovative low-burden QoL monitoring approach. Future research should consider how these insights can be used to complement existing QoL instruments and whether the process of extracting QoL-related information can be automated.
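The F1-scores quoted above combine precision and recall into a single number. The abstract does not say how the score was averaged across questions or answer categories, so the following is only a minimal sketch of the basic definition; the function name and the counts are hypothetical and are not taken from the study.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 when undefined)."""
    predicted = true_positives + false_positives   # everything the coding predicted
    actual = true_positives + false_negatives      # everything the questionnaire reported
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formula.
example_f1 = f1_score(true_positives=70, false_positives=30, false_negatives=30)
print(round(example_f1, 2))
```

With equal precision and recall, as in these made-up counts, the F1-score equals both of them; it drops below their arithmetic mean as soon as the two diverge.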
Many Nigerian adolescents lack knowledge about ideal oral hygiene practices, which has contributed to the high prevalence of poor oral health among them. Delivering oral hygiene education using innovative methods, such as board games that combine fun with learning, would help increase their understanding of and adherence to these practices. A board game operates on the principle that knowledge is acquired and retained through repetition and interaction with peers. This paper highlights the development of a culturally tailored board game based on the Health Belief Model (HBM) and validated for promoting oral hygiene among adolescents. The objective was to report how a board game on oral hygiene education for adolescents was developed and validated in southwestern Nigeria. A Research and Development (R&D) framework, incorporating Design-Based Research (DBR) principles, was used to develop a board game containing oral hygiene messages. The messages were adapted from the World Health Organisation's (WHO) Promoting Oral Health in Africa manual, based on the HBM constructs, and tailored to fit the African context. Over a period of three months, the development of the oral hygiene education board game involved five community oral health professionals, a paediatric dentist, and a psychologist specialising in adolescent health from the University of Ibadan. Students of the Faculty of Dentistry of the University of Ibadan, a graphic designer, and an artist also contributed to the project. The board game was developed in English, the official language of Nigeria. In the validation of this tool, the ease of use, duration of play, number of players, and relevance to this age bracket's daily activities were the main considerations. A 20-by-20-inch stainless-steel-framed board game was developed, with an acrylic surface containing 100 small boxes, featuring black-themed oral hygiene graphical illustrations and oral hygiene messages inserted in some boxes. 
In addition, 10 cards of size 8.5 cm by 5.4 cm containing oral hygiene questions on one side and the answers on the other side, as well as five colour-coded laminated player identification cards, were also created. Two dice and a plastic cup for throwing the dice were procured. The oral hygiene messages, questions and answers focus on enhancing adolescents' knowledge, attitudes and practices regarding optimal oral hygiene measures in Southwestern Nigeria. Oral hygiene messages, questions and answers were modified accordingly to ensure they were age appropriate and effective for promoting oral hygiene education through a board game. The board game was designed to be colourful to increase its appeal and encourage play. The development of the board game was informed by the need for context-specific, age-appropriate tools to enhance oral hygiene education among adolescents. The design stages integrated culturally relevant content, simple language, and familiar visual elements to improve accessibility and relatability. Interactive components were incorporated to promote peer-to-peer learning and active engagement. The board game was structured for ease of implementation in school-based and community settings. While not yet evaluated through formal intervention, its design features suggest potential to support improved oral health awareness and behaviour among adolescents, particularly in low and middle-income contexts.
Pharmacotherapy and prescribing are core skills for physicians, and all medical graduates must master the basics. From a medical education perspective, it is important to understand factors that help students attain sufficient skills for safe and effective prescribing. In this study, we evaluated autograded pharmacotherapeutic quizzes for practice and summative assessments in two undergraduate clinical courses and explored medical students' views on educational components they considered helpful in learning to treat patients with medications. Pharmacotherapeutic quizzes were implemented in two steps across two course instances for two clinical courses (Psychiatry and Neurology, seventh semester in the medical programme at the University of Gothenburg, Gothenburg, Sweden). In step I, voluntary practice quizzes and a summative assessment test were introduced. In step II, clinical contexts for the quiz questions were provided, and the summative test was expanded. The students' achieved level of knowledge post-course was investigated before and after each step, using an anonymous voluntary knowledge evaluation test including 20 case-based single best answer (SBA) questions. Based on free-text replies to a concurrent questionnaire on students' views on learning pharmacotherapy, a manifest content analysis was performed, guided by the research question "What in their education do medical students consider important in enabling them to treat patients with medications?" Meaning units were extracted, and emergent categories and themes identified. In total, 274 out of 404 course participants took the knowledge evaluation test and completed the questionnaire (response rate: 68%; 56% women; 66% ≤24 years old). Compared with pre-quiz results (median correct answers out of 20 SBA questions = 10 (lower to upper quartile 9-13)), no difference was seen after step I (11 (8-13) correct answers; P = 0.88) but a clear improvement was seen after step II (14 (12-16); P < 0.0001). 
In the qualitative analysis, four themes emerged: Curriculum, Clinical placement, Theoretical teaching and Student responsibility. The second theme, including the categories Preparation, Participation and Performance, was particularly prominent. Elaborated quizzes about medications, for practice and summative assessment, may increase pharmacotherapeutic knowledge in medical students. The four emergent themes regarding what students consider important can guide future course developments.
Objectives: We assessed whether survey mode influenced reporting of sexual behaviors and psychosocial factors among men who have sex with men (MSM) in Kenya. Methods: In cross-sectional analysis of baseline data from 493 MSM in Kisumu and Nairobi enrolled in a prospective cohort study, participants were randomized 1:1 to Computer-Assisted Personal Interview (CAPI) or Audio Computer-Assisted Self-Interview (ACASI). We compared responses across survey modes using Poisson regression with robust variance, adjusting for socio-demographics. Results: In both sites, CAPI users more frequently reported sex with a female partner. In Kisumu, CAPI users were less likely to report receptive anal intercourse. In Nairobi, CAPI users were less likely to report food insecurity, transactional sex, and STI symptoms. Conclusion: While most responses were similar across modes, ACASI prompted higher reporting of sensitive behaviors, highlighting added value for capturing stigmatized and sensitive information. Offering both methods may enhance data quality and respect participant preferences. Plain Language Summary: Different ways of asking survey questions affect the answers given for sensitive and stigmatized behaviors. We wanted to know if the way surveys are given changes how men who have sex with men answer sensitive questions about their lives and behaviors. We surveyed 493 men, who were randomly assigned to either answer questions face-to-face with an interviewer (CAPI) or privately on a computer with audio support (ACASI). We found that men answering face-to-face were more likely to report having sex with women, but less likely to report certain behaviors and challenges, such as receptive anal sex, food insecurity, transactional sex, and STI symptoms. Overall, most answers were similar, but the computer-based method encouraged more reporting of sensitive and stigmatized behaviors. 
We recommend offering both methods to improve data quality and give participants a choice in how they share information.
ChatGPT is one of the most advanced large language models. We aimed to examine the accuracy of ChatGPT-4o in solving radiobiology computational problems. Two board-certified radiation oncologists created a problem set consisting of 30 questions. We used the OpenAI API to query ChatGPT (model: GPT-4o) and generate corresponding answers. Answers were graded using a 3-score system. We conducted subgroup analyses for no prompts (zero-shot learning) and different prompts, questions with or without an alpha-beta ratio, and question categories. ChatGPT correctly answered approximately 60% of questions without any prompting strategies. While ChatGPT demonstrates stable performance in structured calculation problems, particularly those involving alpha-beta ratios, it still exhibits notable limitations in handling multi-step reasoning and clinical decision-making tasks. This result highlights the need for integrating professional tools and refining prompting strategies to enhance its practical utility.
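As an illustration of the kind of structured alpha-beta calculation mentioned above, the standard linear-quadratic formulas for biologically effective dose, BED = n·d·(1 + d/(α/β)), and the equivalent dose in 2 Gy fractions, EQD2 = BED/(1 + 2/(α/β)), can be sketched as follows. The abstract does not list its 30 questions, so the function names and the fractionation scheme below are assumptions chosen for illustration only.

```python
def bed(n_fractions: int, dose_per_fraction_gy: float, alpha_beta_gy: float) -> float:
    """Biologically effective dose (Gy) under the linear-quadratic model."""
    if n_fractions <= 0 or dose_per_fraction_gy <= 0 or alpha_beta_gy <= 0:
        raise ValueError("fractions, dose per fraction, and alpha/beta must be positive")
    total_dose = n_fractions * dose_per_fraction_gy
    return total_dose * (1 + dose_per_fraction_gy / alpha_beta_gy)


def eqd2(n_fractions: int, dose_per_fraction_gy: float, alpha_beta_gy: float) -> float:
    """Equivalent total dose (Gy) if delivered in 2 Gy fractions."""
    return bed(n_fractions, dose_per_fraction_gy, alpha_beta_gy) / (1 + 2 / alpha_beta_gy)


# Hypothetical schedule: 30 fractions of 2 Gy, with alpha/beta = 10 Gy and 3 Gy.
bed_ab10 = bed(30, 2.0, 10.0)
bed_ab3 = bed(30, 2.0, 3.0)
eqd2_ab10 = eqd2(30, 2.0, 10.0)
print(bed_ab10, bed_ab3, eqd2_ab10)
```

A schedule that already uses 2 Gy fractions has an EQD2 equal to its physical dose, which makes a convenient sanity check for any implementation.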
The complexity and rapidly evolving nature of critical patient care in Intensive Care Units underscore the importance of the accuracy and timeliness of nursing decisions, further highlighting the significance of nursing education. This study aims to examine the accuracy of four generative artificial intelligence tools (ChatGPT 5.0 Plus, ChatGPT 5.0, DeepSeek, and Google Gemini) in answering multiple-choice questions related to the intensive care nursing exam, a fundamental area in nursing education. In the study, the ChatGPT 5.0 Plus, ChatGPT 5.0, DeepSeek, and Google Gemini models were evaluated using a test data set consisting of 55 questions. The questions were classified according to their difficulty levels as easy (n = 16), medium (n = 17), and difficult (n = 22). The models' correct response rates and standard or unique correct/incorrect response distributions were examined. Computer-assisted statistical analysis used the Chi-square, one-way ANOVA, and Post-hoc Tukey tests. The study was reported according to STROBE. According to the study results, the success rates of all models were similar for easy and medium-level questions (70-82%), and the difference between them was not statistically significant (p > 0.05). Under difficult questions, however, the performance of the models diverged significantly, with Google Gemini achieving the highest success rate at 77.27% and DeepSeek showing the lowest performance at 45.45%. The chi-square analysis revealed no statistically significant difference in the correct/incorrect distribution among the models (χ²=3.69; p = 0.296), but at the observational level, Google Gemini had a higher number of unique correct answers (n = 6) compared to the other models. ChatGPT 5.0 was found to have no unique errors. 
In conclusion, while AI models generally showed similar levels of success in intensive care nursing exam questions, Google Gemini demonstrated superior performance in difficult questions, and DeepSeek showed the lowest level of success among the models. The study provides an essential comparative framework regarding the usability of AI-based learning and assessment tools in nursing education. It offers guidance for the future development of AI-based educational technologies.
In contemporary organizations, representatives of different generations meet. Many publications have been written about the work style, expectations, and needs of representatives of previous generations, but we notice a certain gap concerning the youngest participants of the labor market, i.e., Generation Alpha. The aim of the publication was to find answers to the questions about the requirements for remuneration and well-being at work of young people in Poland, whether these two groups of expectations are causally related, and what role the sense of agency of representatives of Generation Alpha plays in this cause-and-effect system. We surveyed 446 respondents aged 19-22, selected on a quota basis. The results of the study confirmed that the expectations of the youngest players in the labor market go in several directions: first, toward earning "good" money and, second, toward well-being at work. We confirmed that the expectation of young people regarding satisfactory remuneration is related to the expectation of (and implicitly also the search for) an employer who will ensure well-being at work. However, it was not possible to directly confirm the mediating role of a high sense of agency in the relationship between expectations about pay and well-being at work.
Life cycle assessment (LCA) is a comprehensive methodological framework for evaluating environmental impacts of products and processes throughout their entire life cycle. It can be used for many applications, such as identifying environmental impact hotspots of a product system, comparing solutions and products, or marketing. The LCA process is divided into four phases: (1) goal and scope definition, (2) inventory analysis, (3) impact assessment, and (4) interpretation. The goal and scope definition answers why and how the LCA will be conducted. The collected inventory forms the basis for the following impact assessment. In the end, the results are interpreted, and conclusions are drawn. In the evolving algae sector that moves from pilot-scale trials to complex biorefinery concepts, the role of LCAs is becoming ever more important as a key decision-support tool. However, for reliable, comparable, and useful LCAs, more common practices are needed. This chapter describes how to conduct an LCA for both micro- and macroalgae products while providing examples and describing common environmental impact hotspots.
Artificial intelligence (AI) platforms are becoming increasingly popular as resources for equine information. However, these platforms generate responses from a wide range of sources and do not always distinguish between fact and opinion. The objective of this study was to assess the accuracy and quality of AI-generated answers to equine-related questions. Researchers hypothesized that AI platforms could answer basic equine questions effectively but would perform poorly on complex topics or questions. Forty questions were written covering general horse care, facilities management, nutrition, genetics, and reproduction. Each question was categorized by difficulty level: beginner, intermediate, advanced, or trending. Three AI platforms were tested: ChatGPT (CGPT), Microsoft Copilot (MicCP), and ExtensionBot (ExtBot). Responses were scored for accuracy, relevance, thoroughness, and source quality (5 points each; total 20). Data were analyzed using PROC GLM in SAS (v. 9.4). Total score was affected by level (P = 0.002). Intermediate questions had the highest total score (15.95 ± 1.99). Accuracy was affected by platform (P < 0.001), level (P < 0.001), and topic (P = 0.015). CGPT (4.18 ± 0.93) and MicCP (4.08 ± 0.83) outperformed ExtBot (3.26 ± 1.21). Relevance was affected by platform (P = 0.042) and level (P < 0.001). Thoroughness was affected by platform (P < 0.001). Source quality differed by platform (P = 0.037). AI platforms could be resources; currently they fall short of the knowledge that Equine Extension Specialists can offer. AI platforms had difficulty addressing complex topics and demonstrated inconsistent performance across criteria.
Clinical Informatics is a wide-ranging field that engages with nearly every aspect of clinical care that is documented in the electronic health record (EHR). While studies from the informatics literature have been gradually introducing more sophisticated machine learning and artificial intelligence (AI) techniques into clinical settings, the explosive growth of Large Language Models (LLMs) has enticed both entrepreneurs and clinicians to rapidly introduce LLMs into the Emergency Department (ED). Clinical Informaticists possess a deep understanding of both the clinical significance and underlying architecture of clinical data. Misunderstanding how data is represented can pose significant hazards for clinical care, research, and AI systems. Despite the seemingly high performance of LLMs on some clinical measures, evidence for their ability to reason clinically is lacking, and they often provide confident, false answers. Emergency Physicians (EPs) who are board-certified in Clinical Informatics could be a natural constituency to help integrate these technologies safely into the ED. However, there are very few EPs with this board certification, due to high demand, few training programs, and a lack of visibility of the subspecialty. LLMs and other AI systems are likely to play a growing role within the ED as technology improves and hospitals partner with commercial vendors. Working EPs need to have a strong understanding of the potential benefits and limitations of these technologies, and EPs with training in Informatics will play an essential role. Increasing exposure to Clinical Informatics within Emergency Medicine residencies and supporting EPs to go into Informatics fellowships is paramount.
A number of articles have heralded the use of artificial intelligence (AI) agents to serve as a replacement for human psychotherapists. Despite the rapid advancements in the use of both rule-based and generative AI programs in the recent past, an overall review shows only small impacts on certain mental health symptoms, particularly depression, and then only in the short-term. Significant strides forward, both in terms of technology and the development of answers to ethical questions regarding AI's use in psychotherapy, must be seen before the use of such systems becomes widespread or regularly recommended to replace human mental health clinicians.
This study aimed to examine 8th-grade students' views on the concepts of nanotechnology and nanoscience through the use of the Metaverse in science courses. The study group sample consists of five students from both the before- and after-experience groups, all of whom are in 8th grade. This study employed a qualitative research method with a case study design. Observation, interview, and document analysis were used as data collection tools. Necessary measures have been taken to ensure the validity and reliability of the research within its scope. The data were analyzed using a content analysis approach. As a result of the interviews, data were collected and analyzed. As a result of the textual examinations, code, category, and theme were determined. The findings were presented in categories through tables, and the participants' answers were included in direct quotations. Upon reviewing the literature, it becomes apparent that most studies in nanotechnology and nanoscience are conducted for informational purposes, typically presented as presentations or reports. Given the limited availability of nanotechnology and metaverse education, the study was divided into two groups: a before-experience group and an after-experience group. As a result of the survey, 8th-grade students experience the metaverse and have future expectations for nanotechnology and nanoscience. Their cognitive and affective interests have increased, as evidenced by their questioning why these applications cannot be applied to all courses and by their correct expression of the concepts. At the same time, it has been concluded that using rich materials to concretize abstract concepts, such as nanotechnology, facilitates their teaching. The study provides qualitative evidence that Metaverse-based instruction can enhance both cognitive and affective dimensions of science learning, offering design implications for integrating immersive technologies into middle school curricula to teach abstract concepts.
Consumer understanding of ultra-processed foods (UPFs) is poor, and no consensus definition exists. This study examines how young adults in the United States (US) define UPF and their ability to differentiate UPF from non-UPF of varying nutritional quality (NQ). In a mixed-methods survey of young adults (18-39 years) living in the US for ≥1 year, respondents defined UPF, identified whether 24 foods were UPF or not using images with front and back of package information, and answered demographic questions. Foods were categorised using NOVA for processing and Food Compass for NQ. They included a high NQ non-UPF, low NQ non-UPF, high NQ UPF, and low NQ UPF item from six food groups: fruits, vegetables, dairy, grains, protein, and snacks/sweets. Concepts used to define UPF were reported as number of respondents mentioning each in their definition. A score of correct answers out of 24 was calculated. The sample of 422 adults, mean age 26.0±6.7 years, was predominantly white (82%), female (74%), and from the Northeast (82%). Thirty concepts were identified to define UPF. The top concepts were food containing additives, preservatives, colours/dyes, or natural or artificial flavours (N = 105), containing non-natural/artificial ingredients or food (N = 98), being highly processed/processed in multiple steps (N = 95), being altered, manipulated, or modified (N = 87), and having low nutritional value/nutrients removed (N = 75). The mean score was 16.0±3.6 (67%) foods. These results suggest limited consensus on how young adults define UPF. Studies in more diverse populations are needed, but consumers may benefit from a clear definition of UPF.
Six years apart, we studied the knowledge of nurses and nursing assistants (NAs) working in a liver transplant department regarding immunosuppressive and anti-infective drugs in this discipline. For 2019, we report the initial and acquired knowledge of 63 nurses and 34 NAs. The training began and ended with the same 13-question questionnaire. Given the turnover of caregivers, six years later, in 2025, we reassessed the level of knowledge using 17 questions. In 2019, before the training, for 12 questions, the percentage of participants who gave a correct answer was less than 50%, and for 9 questions, it was less than 30%. After the training, the percentage of correct answers was greater than 75% for 11 questions and greater than 90% for 8 questions. In 2025, twenty-seven nurses and nine NAs participated in this mapping exercise. For nine questions, the percentage of participants who provided a correct answer was less than 50%, and for seven questions, it was less than 30%. Regarding the distribution of scores obtained out of 17 questions, the highest scores obtained by a nurse and an NA were 16 and 9, respectively. The lowest scores obtained by a nurse and an NA were 4 and 2, respectively. Nurses and NAs working in a transplant department should receive training on medications specific to this discipline.
Y.A. Bassiouny, D.M.R. Dakhly, Y.A. Bayoumi, N.A. Salaheldin, H.A. Gouda, and A.A. Hassan, "Randomized Trial of Combined Cabergoline And Coasting in Preventing Ovarian Hyperstimulation Syndrome During In Vitro Fertilization/intracytoplasmic Sperm Injection Cycles," International Journal of Gynecology & Obstetrics 140, no. 2 (2018): 217-222, https://doi.org/10.1002/ijgo.12360. This retraction has been issued for the above article, published online on 21 October 2017 in Wiley Online Library (wileyonlinelibrary.com), by agreement between the journal Editor-in-Chief, Michael Geary; and John Wiley & Sons Ltd. A third party expressed concerns about the randomization allocation process, noting that the reported methodology does not appear to reflect the standards of a randomized clinical trial. When asked for clarification, the authors provided their study data and explanation. However, the editorial team and publisher did not feel that the answers and data alleviated the concerns with the methodology. While the work may have been undertaken in good faith, it does not meet the minimum standards to constitute a randomized study. As a result, the data and conclusions are considered unreliable, and therefore the article must be retracted.
Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Susceptibility was provider-dependent: Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response sub-analysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.
Self-reported data are subject to reporting biases, including social desirability bias. List randomization is one method that can help mitigate the impact of such biases. Here, we examined the utility of list randomization among women of reproductive age living with HIV in sub-Saharan Africa. In the Family Planning and Antiretroviral Therapy study, participants were randomized to answer five blocks of true/false statements via either direct or list response. Each block contained three non-sensitive statements and one sensitive statement related to either condom use or HIV disclosure. For each sensitive statement, we calculated the prevalence difference (PD) comparing list response to direct response overall and stratified by socioeconomic status. The PD for four of the sensitive statements was negligible. However, always using a condom was reported by 53.1% at list response visits versus 34.7% at direct response visits (PD: 18.5%; 95% CI: 6.2%, 30.7%), a difference that was attenuated among those with higher socioeconomic status. In this setting, list randomization did not meaningfully change the estimated prevalences for most questions, except for one question, which unexpectedly produced a higher estimate for a positive behavior. Examining this method in other settings and populations is warranted.
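The prevalence difference with its 95% CI can be illustrated with a simple Wald interval for two independent proportions. This is only a sketch of that final comparison step: the abstract does not report group sizes or the estimator actually used (list-response prevalences are typically derived from item counts, and visits may be correlated within participants), so the function name and the counts below are hypothetical.

```python
import math


def prevalence_difference(events_a: int, n_a: int, events_b: int, n_b: int,
                          z: float = 1.96) -> tuple[float, float, float]:
    """Return (PD, lower, upper) for group A minus group B using a Wald interval."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = events_a / n_a
    p_b = events_b / n_b
    pd_estimate = p_a - p_b
    # Standard error of a difference between two independent proportions.
    standard_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return pd_estimate, pd_estimate - z * standard_error, pd_estimate + z * standard_error


# Hypothetical counts: 50 of 100 in one arm versus 30 of 100 in the other.
pd_estimate, ci_lower, ci_upper = prevalence_difference(50, 100, 30, 100)
print(f"PD = {pd_estimate:.3f} (95% CI {ci_lower:.3f}, {ci_upper:.3f})")
```

An interval that excludes zero, as in the condom-use statement reported above, indicates that the two response modes yielded detectably different prevalence estimates.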
Emergency medicine (EM) residency curricula are designed to prepare future physicians for independent practice. Although the Accreditation Council for Graduate Medical Education requires that EM residents have prehospital experiences, very few programs augment this experience with a dedicated resident response vehicle. There are minimal data demonstrating the utility of such an approach. Our residency program staffs a dedicated response vehicle with a PGY-2/3 resident 24/7/365 to respond to high-acuity emergency medical services (EMS) calls. Additionally, from 0800 to 2300, the on-duty resident provides on-line medical control (OLMC) for the county. Each resident averages one 24-hour shift per 4-week EM block. The purpose of this study is to describe the prehospital educational experiences and curricular contributions that this program provides. We used a retrospective observational study design of administrative patient care records over a 5-year period. The primary outcomes were the number of unique encounters and patient experiences per resident per cohort year. The secondary outcomes included characterization of the prehospital experiences among all residents: physician role, patient age-group and sex, problem type, scene location, and procedures. Descriptive statistics were computed to quantify the number, type, and characteristics of the prehospital encounters. Ninety unique resident users were identified in the charting system. The mean number of encounters per resident was grouped by graduation year and spanned from 28.7 (SD 15) for 2018 to 79.2 (SD 49.2) for 2022, with a range of 2 to 222 encounters per resident documented. Over the study period, our residents managed 1313 out-of-hospital cardiac arrests (34 pediatric), 1048 refusals, 596 death pronouncements, 172 critical trauma patients, and answered 2053 complex OLMC consults. 
This study quantified the prehospital experiences of our senior EM residents with the addition of a physician response vehicle to our longitudinal EMS curriculum. This has allowed our residents to gain valuable first-hand exposure to out-of-hospital adult and pediatric cardiac arrests, refusals of care, altered mental status, and respiratory emergencies, in addition to prehospital scenarios not likely to be seen within the hospital walls, including motor vehicle collisions with entrapment and mass casualty incidents.
Large Language Models (LLMs) have demonstrated strong performance in medical question-answering tasks, highlighting their potential for clinical decision support and medical education. However, their effectiveness in subspecialty areas such as nephrology remains underexplored. In this study, we assess the performance of open-source LLMs in answering multiple-choice questions from the Nephrology Self-Assessment Program (NephSAP) to better understand their capabilities and limitations within this specialized clinical domain. We evaluated five open-source LLMs: PodGPT, a podcast-pretrained model focused on STEMM disciplines; Llama 3.2-11B; Mistral-7B-Instruct-v0.2; Falcon3-10B-Instruct; and Gemma-2-9B-it. Each model was tested on its ability to answer multiple-choice questions derived from the NephSAP. Model performance was quantified using accuracy, defined as the proportion of correctly answered questions. In addition, the quality of the models' explanatory responses was assessed using several natural language processing (NLP) metrics: Bilingual Evaluation Understudy (BLEU), Word Error Rate (WER), cosine similarity, and Flesch-Kincaid Grade Level (FKGL). For qualitative analysis, three board-certified nephrologists reviewed 40 randomly selected model responses to identify factual and clinical reasoning errors, with performance summarized as average error ratios based on the proportion of error-associated words per response. Among the evaluated models, PodGPT achieved the highest accuracy (64.77%), whereas Llama showed the lowest performance with an accuracy of 45.08%. Qualitative analysis showed that PodGPT had the lowest factual error rate (0.017), while Llama and Falcon achieved the lowest reasoning error rates (0.038).
This study highlights the importance of STEMM-based training to enhance the reasoning capabilities and reliability of LLMs in clinical contexts, supporting the development of more effective AI-driven decision-support tools in nephrology and other medical specialties.
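Two of the metrics named in this abstract, accuracy and Word Error Rate, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `accuracy` and `word_error_rate`, the whitespace tokenization, and the toy inputs are hypothetical, and the study's actual evaluation pipeline is not described here.

```python
def accuracy(predicted, gold):
    """Proportion of questions whose predicted choice matches the answer key."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Tokenizes on whitespace (an assumption) and counts substitutions,
    insertions, and deletions with equal cost.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Toy inputs, not data from the study.
print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
print(word_error_rate("the cat sat", "the dog sat"))
```

WER can exceed 1 when the hypothesis is much longer than the reference, so it is better read as an error ratio than as a bounded percentage.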