In high-stakes postgraduate examinations, the cognitive complexity of assessment items is central to evaluating advanced clinical reasoning and decision-making competencies. Alignment between examination content, cognitive demand, and competency-based educational frameworks is essential for assessment validity. This study evaluated the cognitive structure of pediatric dentistry questions in the Turkish Dental Specialty Examination (DUS) using Bloom's revised taxonomy and examined their alignment with curricular expectations. A retrospective cross-sectional analysis was conducted on 127 officially released pediatric dentistry questions administered between 2012 and 2021. Each item was independently classified according to Bloom's revised cognitive levels. Curriculum relevance and scientific accuracy were rated using a 5-point Likert scale. Inter-rater reliability was assessed using weighted Cohen's kappa. Associations between cognitive level and curriculum relevance were analyzed, and temporal trends across examination years were explored. Questions were predominantly concentrated at the Understand and Apply levels, with fewer items categorized at the Analyze level. No questions were classified at the Evaluate or Create levels. Although lower- and higher-order cognitive skills appeared proportionally balanced when dichotomized, higher-order items largely reflected procedural application rather than advanced analytical or evaluative reasoning. No significant temporal progression toward greater cognitive complexity was observed. Curriculum relevance ratings were high overall but showed no significant association with cognitive level. This high-stakes specialty examination predominantly assesses lower- and intermediate-level cognitive processes, with limited representation of advanced higher-order thinking. 
The findings indicate potential blueprint misalignment with postgraduate competency expectations and underscore the need for deliberate integration of higher cognitive-level items to strengthen assessment validity.
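The weighted Cohen's kappa used for inter-rater reliability can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `weighted_kappa`, the choice of linear weights, and the example ratings are assumptions made for the sketch. Only the six ordered levels of Bloom's revised taxonomy come from the framework named in the abstract.

```python
from collections import Counter

# Ordered cognitive levels of Bloom's revised taxonomy.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters over ordered categories.

    Disagreement between categories i and j is |i - j| / (k - 1) for linear
    weights, or that value squared for quadratic weights.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint and marginal counts of the two raters' classifications.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * c for (i, j), c in joint.items()) / n
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Hypothetical ratings for five items (illustrative only, not study data).
rater_1 = ["Understand", "Apply", "Apply", "Analyze", "Remember"]
rater_2 = ["Understand", "Apply", "Understand", "Analyze", "Remember"]
print(round(weighted_kappa(rater_1, rater_2, BLOOM_LEVELS), 3))
```

Weighting matters here because the levels are ordered: a disagreement between Understand and Apply is penalised less than one between Remember and Analyze.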
This forum essay calls for greater sociological attention to the theoretical and empirical study of attitudes about the families and care of older adults living with Alzheimer's disease and related dementias (ADRD; dementia). Investigating these attitudes can help expand our understanding not only of the social experience of older adults with dementia, but also of family members and caregivers, as dementia is often highly stigmatized, memory loss changes relationships, and relationship dynamics influence care provision and inequalities. Attitudes and norms function at multiple levels - individual, family, and societal - and have large-scale consequences for social systems and inequality in an aging and increasingly diverse United States, where a growing number of older adults have dementia and family caregiving is normative. We briefly highlight demographic trends and interdisciplinary developments that underscore the urgency of and advantages to addressing these attitudes in sociology specifically. We conclude with a call to action and recommendations for scholars seeking to pursue related research within four relevant subfields within sociology: families, aging in the life course, stratification (race, gender, class), and medical sociology.
Clonal hematopoiesis of indeterminate potential (CHIP) is the clonal expansion of somatically mutated hematopoietic stem cells (HSCs) in the bone marrow. CHIP mutations are relatively common in multiple myeloma (MM) and have been identified as potential biomarkers for poorer survival outcomes. MM is a hematological malignancy that, despite treatment advances, remains aggressive and incurable for many patients. The potential impact of CHIP mutations on the outcomes of MM treatments has been the topic of several recent studies, yet both the magnitude of CHIP's negative effects on treatment and disease progression and the mechanisms by which it exerts them remain to be fully elucidated. Evidence suggests that CHIP mutations may contribute to inferior survival and poorer treatment tolerance, as well as to greater treatment toxicity and related frailty. In this review, we synthesize and discuss the available literature to provide an updated understanding of the complex role that CHIP plays in altering the MM microenvironment; the resulting impact on standard MM treatments, autologous stem cell transplant (ASCT), and B-cell maturation antigen (BCMA)-targeted therapy/CAR-T; and the important role of immunomodulatory drug (IMiD) maintenance therapy in clinical outcomes.
Gambling is a major public health issue that increasingly affects adolescents globally and is worsened in Nigeria by weak enforcement of betting laws, among other factors. The burden of gambling and its health effects among Nigerian adolescents is not well understood. Hence, this study assessed the prevalence of gambling, as well as the association between gambling and other health-related factors, among male adolescents in Osun State, Nigeria. This descriptive, cross-sectional study used a multistage sampling technique and was conducted among 517 male senior secondary school students attending ten randomly selected schools. Health-related factors were measured using the Kessler Psychological Distress Scale and the Jenkins Sleep Scale, while alcohol and drug risk was assessed using the CRAFFT screening tool. The multivariable logistic regression model adjusted for age, father's occupation, parental and peer gambling, mother's educational attainment, access to betting, smartphone ownership, sleep disturbance, anxiety, and substance use. The lifetime prevalence of gambling among male adolescents in Osun State, Nigeria, was 40%. Significant associations were found between gambling and anxiety (p < 0.001) as well as substance use (p < 0.001). Respondents aged 15-17 years had 1.7 times higher odds of gambling in the past year compared to those aged 12-14 years (AOR: 1.7, 95% CI: 1.02-2.8, p = 0.042). Similarly, those aged 18-19 years had four times higher odds of gambling compared with the 12-14-year-olds (AOR: 4.0, 95% CI: 1.4-11.6, p = 0.007). Adolescents with parents who gamble had significantly higher odds of gambling (AOR: 7.0, 95% CI: 3.2-15.2, p < 0.001), as did those with gambling friends (AOR: 2.0, 95% CI: 1.2-3.5, p = 0.007).
Access to betting shops (AOR: 2.1, 95% CI: 1.3-3.4, p = 0.003), having a smartphone (AOR: 2.1, 95% CI: 1.0-4.2, p = 0.042), frequent sleep disturbances (AOR: 3.1, 95% CI: 1.4-6.9, p = 0.007), and substance use (AOR: 4.9, 95% CI: 2.3-10.6, p < 0.001) increased the odds of gambling in the past year. Participants with anxiety symptoms also had significantly higher odds of gambling in the past year (AOR: 5.3, 95% CI: 2.3-12.4, p < 0.001). Gambling among adolescents was associated with increased anxiety and substance use. Parental and peer influences were also key factors in gambling engagement. Addressing adolescent gambling effectively requires a multi-faceted strategy, including parental education and involvement, peer-led prevention programs, restricted access to gambling platforms, and strict enforcement of gambling laws.
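As a reading aid for the adjusted odds ratios above, the sketch below shows how a reported AOR and its 95% CI relate to a Wald z statistic and a two-sided p-value. It assumes the intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state. The function name `wald_from_ci` is hypothetical, and because the published figures are rounded, the recomputed p-values are only approximate.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate log-odds, standard error, z, and two-sided p.

    Assumes a Wald-type 95% CI: exp(log_or +/- z_crit * se).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p


# Values as reported in the abstract (rounded in the source).
for label, aor, lo, hi in [
    ("parental gambling", 7.0, 3.2, 15.2),
    ("age 15-17 vs 12-14", 1.7, 1.02, 2.8),
]:
    log_or, se, z, p = wald_from_ci(aor, lo, hi)
    print(f"{label}: log(AOR)={log_or:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

An interval whose lower bound sits just above 1 (as for the 15-17 age group) corresponds to a p-value just under 0.05, which is consistent with the reported p = 0.042.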
Antimicrobial stewardship in the intensive care unit (ICU) setting is difficult because of diagnostic uncertainty and the perceived high risk of poor outcomes when treatment is delayed or inappropriate. Although novel diagnostics and other strategies have been proposed to improve antimicrobial use, their clinical effectiveness in real-world settings has been suboptimal. We designed a critical interpretative synthesis of the literature, which allows quantitative and qualitative studies to be combined, to revise and critique concepts used in antimicrobial stewardship efforts in the ICU setting. We searched the literature in duplicate using a sensitive strategy, identified the main concepts and strategies, and then developed a main theme and conceptual framework. After screening 41,192 titles and abstracts and reviewing 1,335 full-text manuscripts, we selected 29 main manuscripts for this synthesis. We found that classical concepts, such as the use of broad-spectrum antibiotics followed by de-escalation and the use of biomarkers of infection and novel diagnostics, have face validity and are supported by efficacy studies but carry a high risk of being ineffective in real-world settings. We argue that this discrepancy is due to cognitive biases in antimicrobial decision-making in the ICU setting, including risk-aversion behavior, diagnostic momentum, premature closure, therapeutic momentum, hyperbolic discounting, commission bias, and anchoring bias, among others, which drive intensivists towards overdiagnosis and overtreatment of infection. Future stewardship efforts in the ICU setting need to incorporate the cognitive theory of decision-making alongside traditional stewardship interventions.
This article introduces the issue of the International Journal of Bioethics and Ethics of Science that deals with the gift and utilization of body parts and substances of human origin in human health care. In this introduction, I emphasize in particular the idea that care ethics, in the variant developed by Paul Ricoeur, provides a framework that is particularly suitable for addressing the ethical stakes associated with the field of therapeutic activities so delineated. I also emphasize, on the basis of two relevant practical cases, the fundamental importance of the organization of the health care system, both factually and in terms of individual and collective responsibility. Finally, I devote a third section to a synthetic presentation of the contributions to this volume. These are structured around three major interdependent types of stakes: the epistemological stakes relating to the basic distinction between body and mind; the ethical stakes centered, notably, on the norm of non-remuneration; and the political stakes shaped, in particular, by the norm of self-sufficiency and its tension with the norm of non-remuneration.
Programmatic assessment offers a system-level approach to evaluating students' competence by integrating multiple low-stakes assessments, longitudinal evidence and expert judgement. Although widely adopted across several health education disciplines in Australia, radiography education providers have not implemented programmatic assessment at a programme or course level. This paper proposes a radiography-specific programmatic assessment framework. The objective is to translate core programmatic assessment principles into curriculum design strategies that strengthen feedback, improve the defensibility of decisions and enhance national workforce readiness. The paper outlines key purposes of programmatic assessment in undergraduate radiography education including supporting learning, strengthening feedback mechanisms, tracking developmental progress and enabling defensible decisions grounded in longitudinal evidence. Critical design considerations include aligning assessments with a capability framework, generating evidence across diverse clinical contexts, prioritising narrative feedback and using portfolios as central evidence repositories. The analysis highlights the importance of competence committees for high-stakes decisions and the need to support shared assessment practices across varied clinical placement environments. The proposed radiography model integrates six components: capability framework, evidence generation, evidence aggregation, interpretation, decision-making and system learning. This model addresses radiography's multimodality workflow, training variation across sites and accreditation requirements for fairness, transparency and systematic monitoring. Programmatic assessment offers a coherent approach to strengthening radiography education by supporting clearer insight into learner development and ensuring consistent evidence of capability achievement across clinical environments. 
When adapted to radiography's multimodality practice and evolving workforce demands, programmatic assessment enhances readiness for independent practice and supports continuous curriculum improvement. Programmatic assessment provides a coherent framework for evaluating diagnostic radiography students’ professional capability by integrating longitudinal, narrative-rich evidence across clinical and simulated learning environments. Aligning assessment design with the Medical Radiation Practice Board of Australia (MRPBA) Professional Capabilities enables transparent, defensible progression decisions that evidence accreditation requirements while supporting learner development. Effective implementation of programmatic assessment in radiography depends on deliberate system design, including balanced assessment stakes, structured portfolios, assessor calibration and collective decision-making through competence committees.
High-stakes examinations, such as those used for board certification, must be valid and fair across demographic groups. The American Board of Emergency Medicine (ABEM) developed a structured process for bias and fairness assessment to identify and refine potentially biased examination items. ABEM implemented a three-phase innovation: (1) statistical flagging of potentially biased items using differential item functioning (DIF) analysis; (2) expert panel qualitative review; and (3) holistic content review by the editorial team. Over an 8-year period, 3736 items were analyzed. DIF flagged 597 items (16.0%) for review. The expert Bias and Fairness Panel recommended deletion of 62 (10.4% of flagged items) due to construct-irrelevant bias, most often related to racial bias (53.2% of items recommended for deletion), followed by regional jargon or practice variation (43.5%). The process has been adopted consistently and is being extended to new examination formats. A structured, theory-informed bias and fairness assessment process can reduce construct-irrelevant variance in high-stakes learner assessments. This can serve as a replicable model for other certifying bodies and medical educators seeking to enhance their approach to assessment.
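The proportions reported above can be checked with a few lines of arithmetic. In this sketch the totals (3736, 597, 62) come from the abstract, but the per-category counts of 33 and 27 are assumptions: they are back-calculated from the reported 53.2% and 43.5% and are not stated in the source. The helper name `pct` is hypothetical.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


items_analyzed = 3736      # reported: items analyzed over the 8-year period
items_flagged = 597        # reported: items flagged by DIF analysis
items_deleted = 62         # reported: items recommended for deletion

# Assumed counts, inferred from the reported percentages (not in the source).
racial_bias_items = round(0.532 * items_deleted)       # 33
regional_jargon_items = round(0.435 * items_deleted)   # 27

print(pct(items_flagged, items_analyzed))         # share of items flagged
print(pct(items_deleted, items_flagged))          # share of flagged items deleted
print(pct(racial_bias_items, items_deleted))      # share of deletions: racial bias
print(pct(regional_jargon_items, items_deleted))  # share of deletions: jargon/practice
```

Note that the two deletion percentages use the 62 deleted items as the denominator, not the 597 flagged items.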
In 2019, Ethiopia introduced the National Medical Licensing Examination (NMLE) to standardize medical competence, enhance accountability in education, and ensure patient safety. The effectiveness of such high-stakes exams is significantly influenced by the perceptions of key stakeholders, including medical students, faculty, and school deans. This study investigates their views on the relevance and impact of the NMLE. A qualitative descriptive study was conducted with a stratified sample of public and private medical schools. We conducted eight key informant interviews (KIIs) with deans and eleven focus group discussions (FGDs): five with medical faculty and six with graduating medical students. Interviews and FGDs were audio-recorded, transcribed verbatim, and analysed thematically. Data were coded in MAXQDA24 using pre-identified themes, with open coding to capture newly identified themes. Three main themes and eighteen subthemes were identified. The NMLE was generally recognized as an important tool for establishing minimum competencies, standardizing medical education, and building public trust, particularly among faculty and deans. However, students expressed concerns about redundancy with other assessments and about sole reliance on knowledge-based assessment. Faculty and deans acknowledged the exam's role in quality assurance and institutional benchmarking but highlighted design flaws and a lack of practical assessments. While faculty expressed trust in the exam's intent, students expressed scepticism due to perceived imposition and transparency issues. The exam encouraged learning and prompted curricular changes, but also resulted in significant anxiety, stress, and delayed entry into the workforce. Students felt their institutions provided inadequate support.
There was strong consensus among study participants on the need to reform the exam by adding a practical component, applying a stepwise assessment model, reducing redundancy, and improving transparency. While the NMLE is recognised for promoting educational quality, ensuring minimum competence and safeguarding patient safety, it is considered inadequate for fully assessing physician competence and redundant with other assessments. The findings highlight gaps between policy intent and implementation, which require collaborative dialogue among stakeholders to co-create meaningful improvements in the assessment and ensure effective policy implementation.
Value-based decision-making engages brain-wide motivational, cognitive, and motor processes. Yet, information integration and gating that culminate in immediate decisions upon salient events likely occur within small neural nuclei and cortical layers at the mesoscale not resolved with conventional human neuroimaging. Using submillimeter-resolution 7 T functional MRI with acquisition-matched anatomical references and a lottery choice task incorporating salient superhigh stakes, we dissociated mesoscale operations spanning a brainstem-prefrontal-striatal pathway during choice and outcome processing. The locus coeruleus, caudate, and prefrontal cortex showed enhanced activity during superhigh-stake choices, while the substantia nigra/ventral tegmental area and nucleus accumbens additionally distinguished gains from losses. In contrast, gray-matter bridges between caudate and putamen were associated with faster responses. Laminar analyses revealed deeper prefrontal layers predominating during choice selection and superficial layers during outcome evaluation. Here, we show a mesoscale framework integrating brainstem modulation, striatal gating, and laminar cortical computation in human decision-making upon salient events.
Occupational Application: This study found that vendors were involved in nearly one-third of all observed flow disruptions during orthopedic surgery, with a disproportionate share linked to coordination issues and protocol failures, including breaches of the sterile field. While vendors provide critical technical expertise on equipment and implants, their involvement can unintentionally blur role boundaries and disrupt team coordination in high-stakes environments. For ergonomics and human factors practitioners, these findings underscore the importance of designing systems that support clearer role delineation, structured integration of non-clinical participants, and improved communication protocols in surgical teams. Practical applications include developing vendor orientation programs, establishing explicit boundaries on clinical versus technical responsibilities, and training OR staff to effectively leverage vendor expertise without over-reliance. Addressing these challenges can improve team resilience, reduce safety risks, and optimize workflow efficiency in surgical and other complex, multidisciplinary work settings. Background: Surgery demands coordination, yet flow disruptions (FDs), interruptions that divert attention, are common and can undermine safety. Among the contributors to FDs is the surgical vendor, an external representative who provides expertise on the prosthetic device being implanted. Although vendors are valuable resources, their presence in the operating room (OR) has also been associated with safety risks. Methods: This study examines the nature and frequency of vendor-related FDs during orthopedic surgery. Trained human factors observers were embedded in orthopedic ORs and systematically documented FDs in real time. Disruptions were subsequently categorized using the RIPCHORD-TWA taxonomy and analyzed to quantify vendor involvement. Results: Of 1,387 observed FDs, vendors were involved in 425 (31%).
Despite being one of several OR participants, vendors accounted for a disproportionate share of protocol-related failures, including 13 of 31 (42%) observed breaches of the sterile field and other procedural deviations. Conclusion: Vendors provide essential technical knowledge while also representing a significant source of disruption. These findings highlight the need for clearer role delineation, structured integration of vendors into surgical teams, and enhanced training for both vendors and OR staff to minimize inappropriate task delegation. Addressing these issues through structured integration, role delineation, and team-centered process redesign can enhance human-system performance and occupational safety in high-stakes surgical environments.
Public health emergencies such as pandemics, natural disasters, and epidemics may require rapid, high-stakes decisions often made by elected officials with limited public health training. Artificial intelligence (AI) holds significant promise to enhance the quality, transparency, and timeliness of governmental decision-making during such crises. This paper examines the potential of AI as a decision-support tool for elected officials while identifying key technical, logistical, ethical, and policy challenges. Technical considerations include model accuracy, data representativeness, and privacy protection, while ethical imperatives center on fairness, transparency, and accountability to prevent amplification of existing health disparities. The paper further explores workforce development needs, emphasizing AI literacy and cross-sector collaboration to enable informed use of AI insights. This viewpoint presents a novel AI Decision Support Lifecycle framework specifically designed for governmental public health emergency response, mapping six phases from problem definition through post-emergency evaluation. We provide stakeholder-specific recommendations for model developers, health agencies, and elected officials, and illustrate practical application through a detailed case example and use cases. Drawing on empirical evidence regarding digital health technologies and AI governance, we emphasize that technology deployment alone is insufficient. Successful implementation requires complementary investments in organizational capacity, data infrastructure, workforce training, community engagement, and continuous evaluation. AI integration also requires robust governance frameworks, continuous model evaluation, and alignment with existing crisis management structures. Policy recommendations highlight the importance of ethical AI frameworks, risk assessments, and public engagement to foster trust. 
Ultimately, AI can strengthen public health decision-making if developed and implemented responsibly within transparent and equitable systems.
Policy Points: Researchers investigate how recent elections in the United States have influenced mental health, especially among political- and policy-based election losers. The previous two presidential elections worsened the self-reported mental health of Americans on average. Likely partisan election losers and those who had the most to lose in terms of health policy were even more likely to have their mental health affected by the results of elections. As American politics has become increasingly polarized and the perceived stakes of elections have loomed larger in recent years, elections have become a source of worsening mental health for Americans. Politics is increasingly important to many Americans. Yet little is known about how the increasing centrality of politics affects Americans' mental health. This work aimed to evaluate how recent polarized elections have influenced Americans' mental health. To investigate this question, we compared online search interest in politically related mental health issues and self-reported mental health data. Analyses explored changes before and after election days in 2020 and 2024. The two outcome variables were aggregate Google search interest in politics-related mental health issues and individual responses to the following item from the Behavioral Risk Factor Surveillance System (BRFSS): "Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?" With BRFSS, we compared differential changes for likely Democrats and Republicans using multiple proxy measures and for those with health policy interest in the election. The 2020 and 2024 presidential elections substantially increased interest in politics-related mental health issues online.
The 2020 election led to just under 0.2 additional days of poor mental health (P < .05), and the 2024 election led to just under 0.5 additional days of poorer mental health (P < .05). Likely losing partisans and those who stood to lose out from Trump's reelection in terms of health policy were found to drive most of this relationship, with just under 1 full additional day of poorer mental health for each group. The stakes of elections in this polarized era of American politics are worsening the mental health of Americans. Additional resources may be necessary to allow therapists and clinicians to navigate additional care-seeking surrounding and following elections.
Self-presentation theory suggests people strategically adjust trait displays to meet evaluative goals, meaning faking can sometimes enhance the link between scores and real-world performance. We tested this in a large-scale military selection field experiment (N = 1,133) by manipulating the salience of self-presentation motives during personality assessment. We examined how varying the salience of self-presentation affects personality trait levels, convergent validity with low-stakes scores, and the ability to predict performance and career outcomes. Participants completed the same personality inventory under low, moderate, or high self-presentation. Despite trait score inflation, convergent validity with low-stakes benchmarks remained largely equivalent, and predictive validity was preserved or even enhanced under high-salience conditions. Notably, traits such as conscientiousness and extraversion showed stronger predictive utility when self-presentation motives were made explicit. These findings challenge the common view that response distortion inherently undermines test validity and instead suggest that motivated self-presentation may reflect context-relevant trait expression.
Background: Pediatric neurosurgery is practiced in a complex, demanding and high-stakes environment. Consequential and high-impact decisions are made while undertaking delicate operations on very young and vulnerable patients. Summary: Team dynamics in pediatric neurosurgery (how the team communicates, functions under stress, adapts and supports each member) are the prime determinant of success in this high-stakes milieu. Key messages: Building an effective team in pediatric neurosurgery requires focused vision and mission alignment, effective communication infrastructure, inspirational and transformative leadership, as well as efficient mechanisms for conflict management and burnout prevention.
Informal caregivers are widely recognised as part of the 'unit of care' in palliative care, yet this recognition has rarely been translated into clearly specified ethical obligations. The present paper argues that informal caregivers are not merely instrumentally relevant to patient-centred care but are independent moral stakeholders whose vulnerability grounds direct obligations of support. The analysis demonstrates that ethical obligations towards caregivers cannot be justified solely by reference to patient welfare. While many such duties are best understood as prima facie obligations, some reach the level of threshold obligations where caregivers' fundamental interests (such as autonomy, integrity or protection from serious harm) are at stake. The paper argues that the provision of support to informal caregivers should be regarded not as a discretionary component of good palliative care but as a threshold requirement of ethically grounded palliative practice, with implications for clinical decision-making and institutional responsibility.
Traditional anthropometric methods for personalising equipment in high-stakes professions are often costly, time-consuming, and lack scalability. This study proposes and validates a low-cost, human-centered framework that integrates machine learning with usability evaluation to address this problem. The framework consisted of two stages: first, applying clustering algorithms to anthropometric data to establish a data-driven sizing model; and second, developing a smartphone-based prototype to validate the framework's real-world applicability. A comprehensive evaluation with 20 university students and a supplementary validation with 6 active Air Force Academy students demonstrated the framework's success, achieving an average System Usability Scale (SUS) score of 83 (82.5) and a total Questionnaire for User Interaction Satisfaction (QUIS) score of 209.55 (187.33). The data model was also validated, with key anthropometric variables effectively stratifying complex body types (p < .001). The primary contribution of this study is a generalisable framework for developing user-accepted personalised fitting systems in resource-constrained settings. This study provides a validated, low-cost framework for practitioners in military or high-stakes professions. By leveraging smartphone imaging and machine learning, organisations can rapidly develop user-accepted, data-driven sizing systems that replace outdated manual measurements to enhance personnel safety and operational effectiveness.
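For readers unfamiliar with how a System Usability Scale score such as the reported average of 83 is obtained, the standard SUS scoring rule can be sketched as below. The function name `sus_score` and the example responses are hypothetical and are not data from the study. The scoring rule itself is the standard one: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire on the 0-100 scale.

    `responses` holds ten ratings from 1 to 5, in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for position, rating in enumerate(responses, start=1):
        if position % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5


# Hypothetical participants (illustrative only).
participants = [
    [4, 2, 5, 1, 4, 2, 5, 1, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]
scores = [sus_score(p) for p in participants]
print(scores, sum(scores) / len(scores))
```

A study-level SUS figure is the mean of these per-participant scores.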
Background: Effective invigilation is crucial to the dependability of high-stakes medical exams. In undergraduate medical education, MBBS theory exams involve a significant number of candidates and require meticulous preparation to guarantee operational efficiency, security, and fairness. Despite the significance of invigilation, there is a dearth of empirical information on its effects on academic integrity in the Indian setting. Materials and methods: A prospective, observational, descriptive study was carried out at the Government Medical College in Nagpur during the MBBS theory exams. All registered candidates were included in the study. Structured observation and documentation formats were employed to monitor invigilation operations in real time. The data gathered covered candidate registration and attendance, absenteeism, academic integrity infractions (such as attempts at malpractice and unauthorised use of electronic devices), operational problems (such as documentation errors and procedural disruptions), medical emergencies, and the conduct of each exam session. Descriptive statistics were used to analyse the data, and the results were presented as frequencies and percentages. Results: Of the 214 registered candidates, 210 took the test, an attendance rate of 98.1%. Absenteeism, involving four candidates (1.9%), was the only incident documented. Over the course of seven examination sessions, no instances of detected malpractice, unauthorised electronic device use, documentation errors, procedural disruptions, or medical emergencies were observed. No invigilator interventions were needed, as every session started on time and proceeded smoothly.
Conclusion This study demonstrates how high attendance, seamless examination conduct, and the preservation of academic integrity during MBBS theory exams may be guaranteed by organised invigilation, adequate manpower deployment, and standardised operating standards. Exam governance and quality control in medical education may be improved by routinely recording and analysing invigilation procedures.
Generative artificial intelligence (GenAI) tools are being increasingly applied to teaching and learning in medical education, creating both instructional opportunities and pedagogical challenges. While GenAI offers potential to enhance teaching, assessment, and curriculum design, many medical faculty lack structured guidance on how to integrate these tools ethically and pedagogically within discipline-specific, high-stakes educational contexts. This study aimed to design, implement, and evaluate a faculty development workshop series for the ethical and pedagogical integration of GenAI in medical education teaching. In a mixed methods pilot study, the workshop series "Professional Development in Generative Artificial Intelligence for Pedagogy" was designed, implemented, and evaluated at Weill Cornell Medicine-Qatar, a US medical school in Qatar. The program consisted of five 1-hour synchronous online workshops grounded in Experiential Learning Theory and the Technological Pedagogical Content Knowledge framework. Ten medical faculty from multiple disciplines participated. Quantitative data were collected through an online preintervention survey, an online postintervention survey with open-ended questions, and an online 2-week follow-up survey. Surveys consisted of 5-point Likert scale items capturing perceptions of workshop quality, confidence, and intended application. Qualitative data included full workshop transcripts, facilitator theoretical notes, and facilitator memos. Descriptive statistics summarized quantitative findings, while qualitative data were analyzed using a combination of deductive and inductive coding, alongside narrative analysis. Findings were integrated to generate convergent interpretations. Qualitative analysis of workshop transcripts suggested evolving engagement with GenAI, with participants describing movement from exploratory use toward more intentional pedagogical application.
Postintervention survey results indicated high satisfaction with program content, organization, relevance, and overall quality. Two-week follow-up survey responses (n=5) suggested increased self-reported confidence in applying GenAI tools, and perceived shifts in how participants conceptualized teaching with GenAI. Faculty described intended strategies for integrating GenAI into lesson planning, assessment design, visualization of learning materials, and case-based instruction, while emphasizing the importance of human oversight, critical appraisal, and ethical judgment. Findings highlighted the perceived value of hands-on experimentation, reflective discussion, and adaptive facilitation in supporting early faculty engagement. This pilot study provides early evidence that an experiential, theory-informed, and adaptively facilitated faculty development workshop series may support medical faculty in developing self-reported confidence, awareness, and initial strategies for responsible GenAI integration. Findings are exploratory and limited by a small sample size, a single institution, and reliance on self-reported data. Nevertheless, the Professional Development in Generative Artificial Intelligence for Pedagogy workshop series presents a flexible and theory-informed faculty development approach that may inform future faculty development initiatives in medical education as GenAI technologies continue to evolve.
Non-determinism in deep learning algorithm design and implementation leads to performance variation, meaning model performance is not a single value, but rather a distribution. These model performance distributions are underexplored despite their impact on robustness. We investigate the robustness of deep learning performance to sources of non-determinism, specifically focusing on how performance distributions differ across various architectures and tasks. We conducted 186 experiments on state-of-the-art image classification (ResNet, ViT) and time series forecasting (Autoformer, iTransformer, NLinear, TSMixer) architectures. Each experiment was run 100 times with different random seeds to generate performance distributions, resulting in 18,600 runs. Robustness was quantified using metrics for spread, symmetry, and tail risk. Performance distributions are frequently non-Gaussian, particularly in time series forecasting. Model size does not systematically affect robustness: larger image classification models show fewer outliers but not lower spread, while smaller time series models show lower spread but more extreme underperformers. Training duration does not scale linearly; early stopping effectively balances performance and robustness. Mean performance does not predict robustness: time series forecasting shows moderate correlation while image classification shows none. Time series models produce nearly three times more underperforming outliers than image classification models, indicating substantially higher tail risk. Tail risk poses serious concerns for Trustworthy AI in high-stakes applications. Models performing well on average may exhibit long tails and extreme outliers revealed only through distributional analysis. Mean performance alone should not guide model selection; assessment of spread, symmetry, and tail risk is essential for reliable model assessment where consistent performance is critical.
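To make the idea of summarising a performance distribution by spread, symmetry, and tail risk concrete, here is a minimal sketch using only the Python standard library. The abstract does not name the specific metrics used, so the choices here are assumptions: standard deviation and interquartile range for spread, moment-based skewness for symmetry, and a count of low-side outliers under Tukey's 1.5 x IQR rule for tail risk. The function name `robustness_summary` and the sample scores are hypothetical, and the sketch assumes a higher-is-better metric such as accuracy; for an error metric, the underperformers would lie above the upper fence instead.

```python
import statistics


def robustness_summary(scores):
    """Summarise one metric value per random seed (higher is better, by assumption)."""
    xs = sorted(scores)
    n = len(xs)
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)  # population standard deviation (spread)

    # Quartiles and interquartile range (a spread measure robust to outliers).
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1

    # Moment-based skewness (symmetry); negative means a longer low-side tail.
    skewness = sum((x - mean) ** 3 for x in xs) / n / std ** 3 if std > 0 else 0.0

    # Tail risk proxy: runs falling below the lower Tukey fence.
    low_fence = q1 - 1.5 * iqr
    low_outliers = [x for x in xs if x < low_fence]

    return {
        "mean": mean,
        "std": std,
        "iqr": iqr,
        "skewness": skewness,
        "low_outliers": low_outliers,
    }


if __name__ == "__main__":
    # Illustrative accuracies from eight hypothetical seeds; not data from the study.
    runs = [0.90, 0.91, 0.92, 0.91, 0.90, 0.92, 0.91, 0.50]
    for name, value in robustness_summary(runs).items():
        print(name, value)
```

In this illustrative sample the single collapsed run pulls the mean down only modestly but shows up clearly as a low-side outlier and in the negative skewness, which is the kind of behaviour a mean-only comparison would hide.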