Various studies show that many students enter higher education with insufficient performance in reading comprehension, which affects their disciplinary learning. Educational neuroscience suggests that guided, multimodal instruction in reading strategies can enhance the processes associated with reading comprehension. In this context, the importance of developing structured and culturally relevant interventions in higher education is emphasized. This study assessed the effectiveness of teaching reading comprehension strategies to first-year college students enrolled in various degree programs at a university in southern Chile. This research employed a quantitative approach with a non-experimental, longitudinal, and descriptive design, using non-probabilistic convenience sampling, which included 404 students majoring in Psychology, Law, Obstetrics, Nursing, Medical Technology, and Nutrition and Dietetics. The LECTUM 7 Form A assessment tool was used to measure reading comprehension levels at two points in time: before and after a four-month intervention. During this intervention, short digital modules were provided to teach and practice reading comprehension strategies. The results showed that, at the pre-test, only 51% of the students demonstrated average, high, or very high reading comprehension. After the intervention, this percentage increased to 75%—a 23% improvement compared to the pre-test (p = 0.001). Likewise, an increase was observed in the number of students who achieved normal, high, and very high reading comprehension performance, suggesting a positive effect of the explicit and systematic teaching of reading comprehension strategies in the context of higher education. These findings confirm the need to strengthen reading comprehension in the university context, with a focus on teaching strategies that enhance this skill and support students in their academic journey.
According to the American College of Radiology (ACR) Technical Standard Guidelines, the recommended illuminance level for reading rooms is between 25 and 75 lux. However, these guidelines are based on studies completed in 2007-2009, which predate significant advancements in display technology. No recent study has re-evaluated these lighting conditions, and more specifically, no prior studies have examined the effects of ambient lighting on the detection of intracranial hemorrhage on non-contrast brain CT. This study aimed to determine whether higher ambient lighting levels (> 400 lux) significantly affect the diagnostic accuracy of board-certified neuroradiologists in detecting intracranial hemorrhage on brain CT, compared with standard low-light reading room conditions (20-70 lux). Forty-two standard-of-care non-contrast head CT scans (axial acquisitions with coronal/sagittal reformats) from a tertiary-care hospital were selected by two independent neuroradiologists. These neuroradiologists established the presence or absence of intracranial hemorrhage. Four additional board-certified neuroradiologists independently interpreted all cases twice: first in a brightly lit environment (> 400 lux, measured at desktop level) and, after a > 6-month washout period to minimize recognition memory bias, in a standard reading room (20-70 lux). Interpretations were performed on calibrated Barco MDNC-3421 displays using the institutional Fuji PACS. Sensitivity, specificity, and accuracy were calculated for each condition and compared using paired two-tailed t-tests with Bonferroni correction for multiple comparisons. No statistically significant differences were found between bright and dark conditions for sensitivity (0.974 vs. 0.974, p = 1.00), specificity (0.978 vs. 1.0, p = 0.49 adjusted), or overall accuracy (0.976 vs. 0.988, p = 1.00 adjusted).
Increased ambient lighting levels (> 400 lux) do not significantly affect the ability of board-certified neuroradiologists to accurately detect intracranial hemorrhage on modern high-resolution displays. These findings suggest that current guidelines should be re-evaluated with recent improvements in technology. Larger multi-center studies are warranted to confirm these results across additional pathologies and modalities.
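The sensitivity, specificity, and accuracy values reported above are standard confusion-table ratios, and a Bonferroni adjustment multiplies each raw p-value by the number of comparisons (capped at 1). The following is a minimal Python sketch of those calculations only. The function names and the example counts are hypothetical illustrations, not the study's per-reader data, and the paired t-test itself is not reproduced here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-table counts."""
    sensitivity = tp / (tp + fn)          # true positives among all positives
    specificity = tn / (tn + fp)          # true negatives among all negatives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical counts for a single reader over 42 cases (not taken from the study).
sens, spec, acc = diagnostic_metrics(tp=19, fp=1, tn=22, fn=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

# Hypothetical raw p-value adjusted for three comparisons.
print(bonferroni(0.2, 3))
```

With only four readers, per-reader estimates like these feed a very small paired sample, which is one reason the abstract calls for larger multi-center studies.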
The non-selective processing hypothesis for bilinguals posits that during the processing of the target language, the lexical information of the non-target language is concurrently activated. Nonetheless, how bilinguals represent lexical information in both languages, and the temporal dynamics of that representation, remain unclear. Here, we utilized fMRI in Experiment 1 and EEG in Experiment 2, combined with representational similarity analysis (RSA), to explore the neural representation of lexical access in both the target and non-target languages during L2 word reading and the modulatory effects of processing demands. Together, the results of the two experiments revealed that the lexical information of L2 was represented widely and earlier during L2 lexical processing, whereas the lexical information of L1 was represented widely and earlier during L2 semantic processing. These findings provide spatiotemporally integrated evidence for the bilingual non-selective access hypothesis, indicating that non-selective processing occurs only under conditions of high semantic processing demands.
The purpose was to examine how variations in AI output design affect radiologists' performance in interpreting chest X-rays. Eight readers interpreted 80 COVID-19 chest images under five AI conditions in this retrospective study: no feedback, one-word summary, graph, heatmap, and heatmap + graph. Reader accuracy and eye-tracking data were analyzed to assess diagnostic performance and efficiency. Performance data were analyzed using a generalized mixed model nested for cases within readers assuming a binary distribution and with sandwich estimation; eye-tracking data were analyzed with analysis of variance. Baseline accuracy for detecting COVID-19 without AI was high and remained largely consistent across all AI designs. Fewer than 1% of decisions changed from correct to incorrect (true positive → false negative; true negative → false positive) with AI, while approximately 1% of decisions improved (false negative → true positive; false positive → true negative). More complex AI displays, such as the combined heatmap + graph, were associated with longer interpretation times and increased gaze shifts between the clinical image and AI outputs. Providing well-designed AI output can increase diagnosis accuracy and visual search of chest images. Simpler displays may support faster decision-making, whereas complex visualizations could impose additional cognitive demands to process the additional information. However, accuracy improvements likely outweigh modest increases in viewing time. Optimizing the presentation of AI information is essential to integrate human expertise effectively and create a synergistic human-AI partnership in clinical imaging, where the human remains the ultimate decision-maker.
Developmental dyslexia (DD) is associated with multiple cognitive deficits. This study examined whether digital training of specific cognitive components is feasible and effective, whether parental involvement increases training intensity, and whether a multi-component intervention yields specific and cumulative benefits for reading fluency, spelling, and reading comprehension. We developed a multi-component digital intervention targeting visual-attentional (VA), auditory-phonological (AP), and cross-modal processes. In a pre-registered multiple single-case (within-subject) intervention study (ClinicalTrials.gov: NCT04028310), 144 children with DD aged 8 to 13 years participated. The first two training components (VA vs. AP) were counterbalanced across participants. Each training phase lasted 2 months, with home-based practice 5 days per week for 15 min per session under parental supervision. Before and after the digital intervention phases, children received conventional remediation once per week for 2 months, serving as a within-subject baseline. Training effects were component-specific: VA training led to greater improvements in visual-attentional skills than AP training, whereas AP training yielded greater gains in phoneme awareness than VA training. Across the full three-stage program, cumulative improvements were observed in reading fluency, spelling, and reading comprehension that exceeded gains during conventional remediation alone. Starting with VA training resulted in larger reading fluency gains than starting with AP training. These findings provide a proof of concept that a scalable, home-based, multi-component digital intervention can induce specific cognitive improvements and cumulative gains in reading and spelling skills in children with DD, while remaining feasible for large-scale implementation.
To describe long-term visual outcomes, functional vision, and safety following implantation of the MINI WELL non-diffractive extended depth-of-focus (EDoF) intraocular lens over a two-year follow-up period. This monocentric, retrospective, observational study included patients who underwent cataract surgery with binocular implantation of the MINI WELL non-diffractive EDoF intraocular lens. Visual outcomes were assessed postoperatively over 24 months. Evaluations included monocular and binocular uncorrected and corrected visual acuity at distance, intermediate, and near; defocus curve assessment under photopic conditions; contrast sensitivity; binocular reading performance using Radner Reading Charts; and patient-reported outcomes measured with the Visual Function Questionnaire VF-11R. Safety outcomes included adverse events, secondary surgical interventions, and patient-reported photic phenomena. Twenty-three patients (46 eyes) completed the two-year follow-up and were included in the analysis. At 24 months, monocular and binocular visual acuity was maintained across all tested distances. Binocular uncorrected distance visual acuity of 20/32 Snellen or better was achieved in all patients. Intermediate and near visual acuity were consistent with common daily visual tasks. The binocular defocus curve demonstrated a smooth decline in visual acuity across intermediate defocus levels without distinct focal gaps. Mean binocular reading speed at near distance met established criteria for fluent reading. Contrast sensitivity remained within manufacturer-defined normative ranges across spatial frequencies. Patient-reported outcomes indicated minimal difficulty in vision-dependent daily activities. No intraoperative or postoperative adverse events, secondary surgical interventions, or clinically relevant late complications were reported. Mild photic phenomena were infrequently reported. 
At two years, implantation of the MINI WELL non-diffractive EDoF intraocular lens was associated with maintained visual outcomes, functional intermediate and near vision, and a favorable safety profile. These findings support the long-term stability of visual performance with this EDoF intraocular lens. Larger prospective and comparative studies are warranted to confirm these observations. ClinicalTrials.gov identifier NCT04801992.
Objective This study aimed to assess the quality and readability of online information available to patients on Google regarding Gilmore's groin. Methods This descriptive cross-sectional study evaluated webpages identified through Google searches using the terms "sports hernia", "athletic pubalgia", "Gilmore's groin", "sportsman's hernia", and "hockey hernia". The first page of results for each search term was screened. Duplicate links, non-functioning pages, and irrelevant results were excluded. Unique webpages meeting the eligibility criteria were analysed. Readability was assessed using the Gunning Fog Index (GFI), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) score. Each webpage was further evaluated for source type, intended audience, presence of relevant media, inclusion of key clinical information, and quality using the Journal of the American Medical Association (JAMA) benchmark criteria. Descriptive statistics were used to summarise the findings. Results A total of 26 unique webpages were included. Hospital or clinic websites accounted for 13 (50%) webpages, and 16 (62%) were primarily directed toward patients. Relevant images were present in 11 (42%) webpages and relevant videos in three (11.5%). Information on cause and symptoms was provided in 26 (100%) webpages, investigations in 22 (85%), treatment in 25 (96%), and prognosis in 15 (58%). With respect to JAMA benchmarks, authorship was reported in 16 (61.5%) webpages, attribution in 13 (50%), disclosure in 21 (81%), and currency in 17 (65%). Mean readability scores were 11.5 for GFI, 9.9 for FKGL, and 43.5 for FRE, indicating that the material was generally written above the recommended reading level for patient education resources. Conclusion Online patient information on Gilmore's groin is widely available but is typically written at a reading level that is too advanced for the general public. 
Improving readability while maintaining accuracy may enhance patient understanding, support shared decision-making, and improve access to health information.
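The three readability indices named above are simple functions of word, sentence, and syllable counts. The sketch below uses the standard published formulas for Flesch Reading Ease, Flesch-Kincaid Grade Level, and the Gunning Fog Index. The input counts are hypothetical, and the sketch assumes the counts are already available: it does not show syllable counting or the identification of "complex" words (three or more syllables), which real tools handle with heuristics.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning Fog Index: complex_words are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Hypothetical counts for one webpage excerpt (not data from the study).
words, sentences, syllables, complex_words = 100, 5, 150, 10

print(round(flesch_reading_ease(words, sentences, syllables), 3))   # 59.635
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))  # 9.91
print(round(gunning_fog(words, sentences, complex_words), 1))       # 12.0
```

Both grade-level indices rise with longer sentences and longer words, while Flesch Reading Ease falls, which is why the reported means (GFI 11.5, FKGL 9.9, FRE 43.5) point in the same direction.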
A network of left frontal and temporal brain areas supports language comprehension and production, implementing computations related to word retrieval and combinatorial linguistic processing. Here, we ask: to what extent are responses to language in this language network stable across task contexts, and how does this stability compare to task sensitivity in the domain-general multiple demand (MD) network? Participants (n = 52) read sentences and nonword lists under six task conditions, including passive reading, reading with a memory probe after each stimulus, and reading and answering questions that require deep semantic engagement. The sentences > nonwords contrast isolated the same set of language-responsive voxels across all tasks; the locations of those voxels were participant-specific, highlighting the value of individual-specific functional localization. We, therefore, conclude that language localization is robust to task variation. We then examined the magnitudes and fine-grained activation patterns in these language-responsive voxels (the language network) and in the domain-general MD network, to test whether task demands modulate linguistic computations and/or recruit a distinct brain system. The language network responded robustly to sentences across all tasks, with somewhat higher responses to semantically engaging tasks. In contrast, the MD network responded to both sentences and nonwords in the presence of a task, which warrants caution when using language paradigms that include task demands, as such paradigms engage two independent networks. A multivariate analysis further revealed that stimulus information is more easily decodable in the language network, whereas task information is more decodable in the MD network. 
These results suggest that the language and MD networks perform complementary functions during task-driven language comprehension, with the language network primarily extracting information from linguistic input and the MD network determining the appropriate response to the task.
Existing studies show a reliable association between literacy and mental health problems, such as anxiety and depressive symptoms. Much of this research has been conducted with primary school-age children, with less research focusing on adolescents and older adults. The current study included a sample of children (N = 478; Mage = 10.1, SDage = 1.9), adolescents (N = 438; Mage = 15.6, SDage = 1.7), and older adults (N = 111; Mage = 68.5, SDage = 6.0) who completed measures of self-reported literacy difficulties, anxiety (reading anxiety, social anxiety, generalised anxiety, panic, separation anxiety, and obsessive-compulsive symptoms) and depressive symptoms. Analyses included partial correlations and associations compared across age groups. We found significant and moderate-to-strong correlations between literacy and reading anxiety for children (parent and self-report), adolescents (parent and self-report), and older adults. There were moderate and significant associations between literacy and social anxiety for children (self-report) and older adults, but not for adolescents. There were moderate and significant associations between literacy and depressive symptoms for children (self-report). These results show associations between literacy and reading anxiety symptoms at various life stages, including childhood, adolescence, and older adulthood, and between literacy and depressive symptoms for children - highlighting the need for pathways of care to support individuals of all ages.
The increasing use of artificial intelligence (AI) in news production raises important questions about how audiences perceive and respond to AI-generated journalism. This preregistered survey experiment (N = 599, German-speaking Switzerland) examines (i) perceptions of article quality (measured as credibility, readability, and expertise) across news excerpts that were human-written, AI-assisted, or fully AI-generated, and (ii) self-reported intentions to engage following disclosure of AI involvement. Participants rated two short news excerpts before learning how they had been produced. Articles across all conditions were evaluated similarly in perceived quality. After disclosure, participants in the AI-assisted and AI-generated conditions reported a higher willingness to continue reading their assigned articles compared to the control group, but future willingness to read AI-generated news did not differ across conditions. Overall, the findings suggest that readers assess AI-generated and human-written news comparably in quality, while disclosure of AI use can momentarily increase curiosity or interest without yet changing longer-term reading intentions.
Pneumonia virus of mice (PVM), the mouse homolog to respiratory syncytial virus (RSV), is increasingly used as a surrogate model to study pneumovirus pathogenesis in a more natural pathogen-host relationship. Two major strains of PVM, strain 15 and J3666, are currently used in laboratories, with preferences for either one or the other based on the well-documented isolation history of strain 15, or the suggested higher virulence of strain J3666. Using conventional and long-read sequencing, we found that the PVM strain J3666 represents two distinct virus populations, which are defined by the sequence and structure of the G and SH genes encoding the putative attachment and small hydrophobic proteins, in addition to further nucleotide polymorphisms. Specifically, a nucleotide polymorphism at position 65 in the G gene results in either an upstream open reading frame (uORF) preceding the main ORF in frame, or an extension of the major G ORF by 18 codons. The impact of the different forms of the J3666-G genes on PVM was examined by generating recombinant PVMs differing exclusively in the distinctive 5' portion of the respective G gene. This revealed that the population with an extended main G ORF was more virulent than the population with a G gene containing a uORF or the parental virus. The presence of a uORF was associated with decreased expression levels of G, whereas the virus with the extended G ORF appeared to express slightly increased levels of G, which suggests that expression levels of G may modulate virulence. The pneumonia virus of mice strain J3666 is considered a more virulent and more suitable model for severe lower respiratory tract infections. The organization of the gene for the attachment protein G is reported to contain a small upstream open reading frame (uORF) preceding the main G ORF in frame. The translated G protein is predicted to comprise 396 amino acids.
We report that this virus strain may be a mixture of two different populations, each with differing virulence. The more virulent population encodes a G protein of potentially 414 amino acids instead of a small uORF. The usage of the first start codon in this G gene organization remains to be determined. Importantly, this organization of the G gene is in line with that of several newly identified pneumoviruses, i.e., canine and swine pneumoviruses. These viruses may comprise a distinct group within the Pneumoviridae family.
As patients increasingly seek medical information online, artificial intelligence (AI) chatbots like NIPRGPT, the most widely available AI tool for Department of Defense (DOD) computer users, offer a novel resource for addressing queries about femoroacetabular impingement (FAI). To date, there have not been any studies evaluating NIPRGPT responses to orthopedic medical questions. The primary objective of this study was to evaluate the accuracy, comprehensiveness, and readability of NIPRGPT's responses to common FAI-related questions. Twelve frequently asked questions (FAQs) regarding FAI were selected from a curated list and posed to NIPRGPT. The accuracy and adequacy of the responses were graded by a panel of board-certified surgeons as excellent (not requiring clarification), satisfactory (requiring minimal clarification), satisfactory (requiring moderate clarification), or unsatisfactory (requiring substantial clarification). Additionally, readability was assessed using the Flesch-Kincaid readability score. Of the 12 responses, four (33.3%) were excellent, requiring no clarification, seven (58.3%) were satisfactory, requiring minimal clarification, and one (8.3%) was satisfactory, requiring moderate clarification. No responses were deemed unsatisfactory. The average quality score was 3.38/4.0. However, the average Flesch-Kincaid readability score corresponded to a grade level of 19.6, indicating a reading level suited for postgraduate or specialized academic backgrounds. Interobserver agreement was low, with a Krippendorff's alpha of 0.046. NIPRGPT provides answers to FAQs about FAI that are generally accurate and reliable. However, the responses are generated at a complexity level far exceeding the recommended reading level for patient education. While a potentially useful adjunct in military healthcare settings where access may be limited, clinicians must be aware of the high literacy demand placed on patients using this tool.
This study aimed to investigate the medication literacy status of family caregivers of older people with traumatic hip fractures and to analyze the influencing factors. The primary caregivers of older people with traumatic hip fractures admitted to our hospital from January 2021 to December 2023 were selected as the study participants. General information was collected using the electronic medical record system or by interviewing family members. The Chinese Medication Literacy Measurement (ChMLM) was used to assess the medication literacy status of the study subjects. Univariate and multivariate linear regression analyses were conducted to identify the influencing factors of medication literacy among family caregivers of older people with traumatic hip fractures. Spearman correlation analysis was used to assess the relationships between variables. The mean total ChMLM score of 204 participants was 10.80 ± 2.95. Multivariate linear regression showed that age, education level, monthly household income, occupation, number of medications used by the patient, length of hospital stay, and reading medication instructions were independent influencing factors (P < 0.05), explaining 45.1% of the variance. Spearman correlation analysis revealed that age < 45 years, education ≥ junior high school, monthly income ≥ ¥5000, employed status, patient's medications < 5 types, hospital stay ≥ 5 days, and reading medication instructions were positively correlated with higher ChMLM scores (all P < 0.05). The medication literacy status of family caregivers of older people with traumatic hip fractures is poor. Health education should be strengthened based on risk factors to improve the home medication use of patients. These findings suggest that clinical health education should target at-risk caregivers to enhance medication literacy and improve patient safety and outcomes.
Randomized controlled trials (RCTs) play a central role in assessing the benefits and harms of interventions. Incomplete reporting in RCT publications can compromise the verifiability and usefulness of RCTs. SPIRIT and CONSORT reporting guidelines aim to improve the completeness of RCT protocols and results publications, respectively. However, many RCTs are not reported completely. Checking manuscripts automatically could help authors improve the completeness of reports prior to publication. We previously annotated SPIRIT-CONSORT-TM, a corpus of 200 articles (comprising 100 protocol-results publication pairs) using 83 checklist items drawn from SPIRIT 2013 and CONSORT 2010. We also trained machine learning models to automatically assess reporting at the item level. Each checklist item can include multiple constituent elements (i.e., specific details required for that item), and an item might be considered fully reported when all of its elements are present. However, prior work does not explicitly capture or evaluate reporting at the element level. To address this gap, we extended SPIRIT-CONSORT-TM by incorporating element-level annotations and using them to assess reporting completeness (SPIRIT-CONSORT-ELM). We formulated element-level assessment as a machine reading comprehension task, operationalized through 119 questions, where each question targets a specific reporting element within a checklist item. Using the 200 articles included in SPIRIT-CONSORT-TM, two annotators independently answered 119 questions for 50 articles (25 protocol-results pairs) and resolved any discrepancies through discussion; the remaining 150 articles (75 protocol-results pairs) were assessed by a single annotator. We then developed an automated pipeline for element-level assessment using SPIRIT-CONSORT-ELM. 
The pipeline first applies a PubMedBERT-based model to identify sentences containing item-level reporting information, then it uses a generative large language model (LLM; GPT-5) with chain-of-thought reasoning to answer element-level questions based on the retrieved evidence. Agreement between the two annotators was high (Gwet's AC1: 0.782) and our pipeline achieved high accuracy in identifying element-level reporting evidence (F1: 0.822, Gwet's AC1: 0.796). Ablation studies indicate that chain-of-thought reasoning and the inclusion of illustrative in-context examples modestly improve LLM performance on the machine reading comprehension task. SPIRIT-CONSORT-ELM provides a benchmark for evaluating reporting guideline completeness at the element level, enabling assessment of RCT transparency beyond the simple presence or absence of checklist items and is publicly available at https://osf.io/kznx4/ . The automated pipeline establishes a robust baseline for assessing RCT reporting and demonstrates potential as a practical aid for authors, reviewers, and editors to identify and address gaps in completeness and transparency of RCT reports.
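The agreement and accuracy figures above rely on two statistics: Gwet's AC1 and the F1 score. The sketch below shows how each can be computed for two raters answering yes/no questions. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and the example ratings are hypothetical, and the AC1 implementation covers only the two-rater, nominal-category case.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters over a fixed list of nominal categories."""
    n = len(ratings_a)
    q = len(categories)
    # Observed agreement: share of items on which the two raters match.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Average prevalence of each category across both raters.
    counts = Counter(ratings_a) + Counter(ratings_b)
    prevalence = [counts[k] / (2 * n) for k in categories]
    # Chance agreement as defined for AC1.
    p_chance = sum(p * (1 - p) for p in prevalence) / (q - 1)
    return (p_obs - p_chance) / (1 - p_chance)


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical yes(1)/no(0) answers from two annotators on ten questions.
annotator_1 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
annotator_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

print(round(gwet_ac1(annotator_1, annotator_2, categories=[0, 1]), 4))  # 0.6552
print(f1_score(tp=8, fp=2, fn=2))
```

AC1 is often preferred over Cohen's kappa when one answer dominates, because its chance-agreement term stays small when prevalence is skewed.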
Teacher feedback has been argued to enhance students' learning. Recent studies, however, have shown mixed findings. Some studies reported positive, others negligible, and still others a negative impact of teachers' feedback on students' learning achievement. To clarify these ambiguities, research is needed to understand the motivational mechanisms through which teacher feedback shapes students' learning. Through the lens of the Situated Expectancy-Value Theory (SEVT), this study examined whether and how expectancy and value beliefs mediated the relationship between teacher feedback and learning performance. We conducted a structural equation modelling analysis, linking teacher feedback, expectancy, value, and students' performance, using data from 569,311 students from 80 countries. An inconsistency was identified between the direct and indirect effects of teacher feedback on reading performance. Although the direct relationship between teacher feedback and reading performance was negative, the indirect effect demonstrated that teacher feedback might partially enhance performance by boosting students' expectancy and value beliefs. Our findings shed light on the motivational mechanisms by which teacher feedback shapes academic performance. Feedback is adaptive insofar as it can be leveraged to enhance students' expectancy and value beliefs, thereby suggesting a potential pathway for the positive indirect effect of feedback on achievement. For teacher feedback to be truly effective, it needs to target the enhancement of students' motivation, particularly helping students believe that they can succeed, to recognise the importance of what they are learning, and to derive joy from the learning process.
Physical education combines embodied practice with cognitive learning, yet generative artificial intelligence is increasingly used as an academic support tool. This cross-sectional study investigated associations between self-reported generative AI use frequency, reported VARK learning preference patterns, attitudes toward AI, and Technology Acceptance Model beliefs among undergraduate students in PE classes. A cross-sectional survey was conducted with 1,084 full-time PE undergraduates (age 19-28 years). Participants completed PE-adapted instruments assessing reported learning preference patterns using the VARK framework, technology acceptance beliefs based on the Technology Acceptance Model, attitudes toward generative AI, and self-reported generative AI use frequency. The TAM and VARK instruments were adapted to PE-relevant academic and movement-related tasks. Data were analyzed using descriptive statistics, reliability testing, ANOVA, and multiple regression models. Students reported moderate acceptance of generative AI and positive attitudes toward its academic use. Kinesthetic learning remained the dominant preference, consistent with the movement-based nature of PE. However, higher self-reported AI usage frequency was associated with higher visual and reading-writing scores and lower kinesthetic scores (β = .34-.40; β = -.39, p < .001). Significant differences emerged across learning-style groups for all TAM and AI attitude dimensions (p < .001, η² = .086-.142). Regression analysis showed that perceived usefulness, ease of use, and positive attitudes toward AI were the strongest predictors of AI usage, explaining 48% of variance. Frequent AI engagement was associated with greater concentration in reported VARK score patterns (β = -.62, p < .001).
In this cross-sectional sample, self-reported generative AI use was associated with differences in reported learning preference patterns and with stronger technology acceptance beliefs among higher education physical education students. While kinesthetic scores remained highest overall, more frequent AI use was associated with higher visual and reading-writing scores and lower kinesthetic scores. These findings are associative, not causal, and do not show changes in stable learning preference patterns. Future research should examine whether pedagogically grounded AI integration can support diverse learners, particularly students who report stronger kinesthetic preferences.
To assess and compare the accuracy, readability, and overall performance of large language models (LLMs) in answering questions about functional hypothalamic amenorrhea (FHA) for patients and healthcare professionals. A total of 11 patient-level and 15 clinician-level FHA-related questions were entered separately into four LLMs: ChatGPT 3.5 (free version), ChatGPT 4.0 (updated, paid subscription), Gemini, and OpenEvidence. OpenEvidence was used only for clinician-based questions. Responses were evaluated by three expert reviewers blinded to the LLM used who rated them as accurate and complete, accurate but incomplete, or inaccurate. A fourth reviewer resolved discordant scores. Readability for patient-level questions was assessed using the Flesch Reading Ease Score (FRES) and word count. Lower FRES scores indicate more difficult reading. Accuracy and completeness were compared using odds ratios (95% CI) with ChatGPT 3.5 as the reference model, and differences in readability were analyzed using Friedman's test. LLM performance varied across question types. For patient-level questions, ChatGPT 4.0 achieved the highest accuracy (9 of 11; 82%), followed by ChatGPT 3.5 and Gemini (each 8 of 11; 73%), with no statistically significant differences. Among clinician-level questions, OpenEvidence demonstrated perfect accuracy (15 of 15; 100%), compared with 93% for and 80% for ChatGPT 4.0 and Gemini. Completeness followed similar patterns, with OpenEvidence providing the most complete clinician responses (93%) and ChatGPT 4.0 the most complete patient-level responses (89%). Readability differed significantly among models (p = 0.012), with Gemini producing the most readable patient-level content (median FRES 43.5 [IQR 36.8-53.4]) compared with ChatGPT 3.5 (30.6 [16.8-48.4]) and ChatGPT 4.0 (28.8 [22.1-37.6]). Word counts did not differ significantly (p = 0.39). 
LLMs demonstrated good overall performance in answering FHA-related questions but often provided incorrect or incomplete information. Fine-tuning on field-specific data, prompt engineering, and human-in-the-loop feedback may help improve the accuracy of these models.
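The two summary metrics named in the FHA abstract above, the Flesch Reading Ease Score and odds ratios with 95% confidence intervals, can be illustrated with a minimal Python sketch. The FRES function uses the standard published formula and takes pre-counted words, sentences, and syllables. The abstract does not say how the confidence intervals were computed, so the Wald (log-scale) interval here is an assumption, and the function names `flesch_reading_ease` and `odds_ratio_ci` are hypothetical.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; lower scores mean harder text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def odds_ratio_ci(a_yes: int, a_no: int, ref_yes: int, ref_no: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio of a model versus a reference model, with a Wald interval.

    Assumption: the interval is built on the log-odds scale (Woolf's method);
    the source abstract does not state which method was used.
    """
    if min(a_yes, a_no, ref_yes, ref_no) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a_yes / a_no) / (ref_yes / ref_no)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a_yes + 1 / a_no + 1 / ref_yes + 1 / ref_no)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Patient-level accuracy from the abstract: ChatGPT 4.0 was accurate on 9 of 11
# questions, and the reference model ChatGPT 3.5 on 8 of 11.
or_value, ci_low, ci_high = odds_ratio_ci(9, 2, 8, 3)
print(f"OR = {or_value:.4f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With these counts the odds ratio is 1.6875 and the interval spans 1, which agrees with the abstract's statement that the patient-level differences were not statistically significant. A cell count of zero (for example, 15 of 15 accurate) makes the plain odds ratio undefined, so this sketch raises an error in that case rather than applying a continuity correction.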
Poetry condenses language into minimal forms, evoking emotion, imagery, and aesthetic judgment, yet the neural basis of such evaluations remains poorly understood. We investigated how the brain evaluates two structurally matched but thematically distinct poetic forms: nature-themed Haiku and emotion-themed Senryu. Participants read poems and rated them across five dimensions (aesthetic appeal, vivid imagery, being moved, originality, and creativity) while EEG was recorded. Using multiclass gradient-boosted tree models with SHapley Additive exPlanations, we predicted evaluative ratings from oscillatory neural features across temporal windows and scalp regions. Models outperformed linear baselines and showed limited cross-theme generalization, indicating content-specific neural encoding. Distinct processing patterns emerged: Senryu showed stronger beta-band contributions, whereas Haiku engaged more distributed multifrequency dynamics. Temporal profiles also differed, with Haiku showing sustained engagement across reading and contemplation phases and Senryu showing earlier evaluative resolution during reading. Prestimulus neural activity contributed to prediction of subsequent evaluations, suggesting a role for anticipatory brain states in aesthetic evaluation. Across poems, evaluative dimensions converged on a dominant shared axis that was reliably predicted from neural features. Together, these findings suggest that aesthetic evaluation of poetry reflects an interaction between anticipatory neural states, content-specific oscillatory dynamics, and dimension-specific processes organized around a shared evaluative axis. This work establishes poetry as a tractable model system for studying how the brain constructs meaning and value from minimal linguistic input.
The aim of this study was to examine the relationship between temporal processing and literacy performance in children with hearing aids. The study included a total of 45 children, comprising 21 with typical hearing and 24 who used hearing aids, all of whom were attending the second, third, or fourth grades of primary school. Temporal processing skills were evaluated using the Frequency Pattern Test and the Duration Pattern Test (DPT), whereas reading and writing abilities were assessed through the Literacy Assessment Battery. Children using hearing aids demonstrated statistically significantly lower performance in the DPT and in the test assessing writing skills compared to their typically hearing peers. A positive, significant correlation was found between temporal processing skills and literacy skills. Moreover, temporal processing and literacy performance were observed to be negatively correlated with the age at which the child's first amplification was provided and positively correlated with the duration of auditory rehabilitation. This study found that in children using hearing aids, performance on temporal pattern tests was significantly correlated with reading and writing skills. Furthermore, these findings suggest that early amplification and consistent auditory rehabilitation may be correlated with better temporal processing and literacy skills.