Search results: debugging

20 results found


Measuring and mitigating debugging effectiveness decay in code language models.

PubMed, 2025-12-18. Authors: Adnan M, Kuhn CCN

The effectiveness of AI debugging follows a predictable exponential decay pattern; most models lose 60-80% of their debugging capability within just 2-3 attempts, despite iterative debugging being a critical capability for practical code generation systems. We introduce the Debugging Decay Index (DDI), a mathematical framework that quantifies when debugging becomes ineffective and predicts intervention points. Our strategic fresh start approach shifts from exploitation to exploration at strategic points in the debugging process, demonstrating that well-timed interventions can rescue the effectiveness of debugging. DDI reveals a fundamental limitation in current AI self-debugging and provides the first systematic metric to gauge LLM-based code generation.
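The decay claim in the abstract above can be made concrete with a little arithmetic. The sketch below is not the paper's DDI formula, which the abstract does not give. It assumes a plain exponential model, effectiveness(t) = exp(-rate * t), and the function names are illustrative only. It computes the decay rate implied by "losing 60-80% of capability within 2-3 attempts".

```python
import math

def decay_rate(fraction_lost: float, attempts: float) -> float:
    """Rate r such that exp(-r * attempts) equals the remaining fraction.

    Assumes a plain exponential model; this is an illustration,
    not the DDI definition from the paper.
    """
    remaining = 1.0 - fraction_lost
    if not 0.0 < remaining < 1.0:
        raise ValueError("fraction_lost must be strictly between 0 and 1")
    return -math.log(remaining) / attempts

def remaining_after(rate: float, attempts: float) -> float:
    """Fraction of the initial effectiveness left after a number of attempts."""
    return math.exp(-rate * attempts)

# Bounds quoted in the abstract: 60% lost in 2 attempts, 80% lost in 3.
slow = decay_rate(0.60, 2)
fast = decay_rate(0.80, 3)
print(f"implied decay rate per attempt: {slow:.3f} to {fast:.3f}")
```

Under this assumed model, a rate near 0.5 per attempt means each additional attempt keeps roughly 60% of the previous attempt's effectiveness.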

ggplotAgent: a self-debugging multi-modal agent for robust and reproducible scientific visualization.

PubMed, 2026-01-01. Authors: Wang Z, Yin Y, Wang J

Creating publication-quality visualizations is essential for bioinformatics but remains a bottleneck for researchers with limited coding expertise. While Large Language Models (LLMs) are proficient at generating code, they often fail in practice due to library dependencies, dataset mismatches, or syntax errors. These issues require manual intervention, slowing data interpretation. We present ggplotAgent, a novel multi-modal, self-debugging artificial intelligence agent that automates publication-ready ggplot2 visualizations. It features a dual-layered framework that resolves code execution errors and uses a vision-enabled agent to verify aesthetic correctness. In benchmarks against the DeepSeek-V3 model, ggplotAgent achieved a 100% code executability rate (versus 85%) and a "Publication-Ready" score of 1.9 (versus 0.7). Surprisingly, it showcased the ability to act as an expert collaborator by intelligently enhancing plots beyond the user's literal prompt, achieving a positive Insight Score of +0.3 compared with the baseline (-0.05). These results demonstrate its ability to reliably produce accurate, high-quality visualizations directly from natural language. ggplotAgent is freely accessible as a public web application at https://ggplotagent.databio1.com/ and an offline Streamlit app. The source code is available on GitHub at https://github.com/charlin90/ggplotAgent. This software is distributed under the MIT License.

Bioinformatics advances


QTL mapping, breeding, and debugging Saccharomyces cerevisiae strains through Reiterated Mass Selection and backcrosSing (ReMaSSing).

PubMed, 2026-02-12. Authors: de Bem LS, Barreto JA, de Souza DT

Producing second-generation ethanol from lignocellulosic hydrolysates (LCHs) poses significant challenges for Saccharomyces cerevisiae due to the presence of fermentation inhibitors. Quantitative trait loci (QTL) mapping of stress-tolerant S. cerevisiae strains is important for identifying adaptive alleles that can enhance yeast fermentation of LCHs. However, the QTL mapping process is labor-intensive, requiring the screening of numerous recombinants and repeated crossings to improve mapping resolution. We developed Reiterated Mass Selection and backcrosSing (ReMaSSing) to facilitate the identification of adaptive alleles through QTL mapping and to enhance LCH tolerance in yeast strains. ReMaSSing was applied to populations obtained by crossing the stress-resistant yeast PE-2_H4 with the laboratory strain S288C. Using alternative protocols, we selected haploid or diploid populations with dominant markers, enriching millions of segregants carrying adaptive alleles by propagating them in standard or LCH-supplemented media. The enriched pools were then bulk backcrossed with S288C, and germination of millions of spores generated new recombinant populations for subsequent selection cycles. After five rounds of ReMaSSing, whole-genome sequencing and QTL mapping identified key alleles associated with LCH tolerance, linked to VPS70, CAT5, GCY1, UBP2, MKT1/SAL1, HAP1, and PHO84, which influence growth and mitochondrial function in S288C. Mutations in IRA1 and HTA1, unique to our S288C strain, were also mapped, highlighting ReMaSSing's ability to "debug" the S288C background, i.e., to purge detrimental variants through selection. Allele swapping and competition assays confirmed that the identified QTL improved LCH tolerance and growth, with strains combining adaptive alleles performing over 20% better than the parental S288C. 
Finally, applying ReMaSSing to breed an LCH-tolerant yeast with a xylose-consuming strain produced recombinants with improved fermentation of xylose-enriched LCH. ReMaSSing offers a practical protocol for generating QTL mapping populations to identify adaptive alleles in tolerant strains and correct genetic defects in inferior ones. Notably, recombinant populations and clones derived from ReMaSSing outperformed both parental strains in LCH tolerance and growth. Furthermore, we applied ReMaSSing to breed strains with enhanced LCH tolerance, efficient xylose catabolism, and robust ethanol production. Together, these results demonstrate that ReMaSSing is a powerful tool for engineering industrial yeast strains that integrate desirable traits from multiple parental backgrounds.

Biotechnology for biofuels and bioproducts

Software belief reliability growth model incorporating change point and imperfect debugging based on uncertain differential equation approach.

PubMed, 2025-12-12. Authors: Sharma N, Kumar V, Khan AA

Many software reliability growth models (SRGMs) have been proposed within the context of probability theory to estimate software reliability, the remaining number of faults, and the optimal release time. The Fault Detection Rate (FDR) may vary because of changes in testing strategies. Due to a lack of knowledge of the software code, the testing team might be unable to rectify the detected faults, thereby introducing new faults during the fault correction process. The debugging process is imperfect due to factors such as human error, insufficient testing, and complex code, resulting in epistemic uncertainty. In this paper, we propose a new software belief reliability growth model (SBRGM) that uses uncertain differential equations to deal with epistemic uncertainty effectively. We incorporate imperfect debugging and a change point based on the approach of belief reliability theory, making this model more accurate than some previously developed models. Model parameters are estimated using the least-squares method in Python 3.10. The change point is calculated through empirical data analysis based on the first principle of derivatives. Three real data sets have been used to validate the proposed model. The proposed approach is more flexible and realistic in dealing with epistemic uncertainty than conventional approaches.

Scientific reports


Large Language Model-Based Interactive Code Generation for Developing a 3D Eye Movement Schematic.

PubMed, 2026-04-01. Authors: Hamasaki I, Kunimi K, Shibata K

Introduction: Understanding the three-dimensional (3D) geometric relationship between extraocular muscles and the globe is essential for strabismus management. Conventional educational tools are static, and existing 3D biomechanical software requires highly specialized skills, making routine clinical use difficult. Furthermore, image-generating artificial intelligence (AI) frequently produces anatomically incorrect outputs (hallucinations). This study aimed to develop a structurally coherent, interactive 3D eye movement schematic as a proof-of-concept, using the coding capabilities of a large language model (LLM). Methods: We used an LLM to generate web-based 3D schematic code (HTML and JavaScript/Three.js) exclusively through natural language dialogue. To prevent anatomical errors, we explicitly defined anatomical parameters based on standard literature (e.g., 12-mm scleral radius) and employed mathematical constraints, including quaternions for rotation and spherical linear interpolation for muscle paths, within the prompts. The generated code was rendered in a web browser, and an iterative process of prompt refinement and debugging was conducted until two board-certified ophthalmologists confirmed the schematic's structural validity. Results: A functional, interactive 3D eye movement schematic was successfully developed. In our 10-trial evaluation, generating an acceptable schematic required an average of 7.4 prompt inputs per session. While complex spatial instructions had a lower success rate (40%) due to AI hallucinations, iterative prompt repetition and specific local debugging instructions resolved these issues. The final schematic provided a structurally coherent representation of the globe, the four rectus muscles, and the annulus of Zinn. It featured a slider interface enabling real-time, kinematic visualizations of eye rotations, muscle deformations, and optic nerve bending without structural failure. 
Conclusions: Translating anatomical descriptions into mathematical spatial logic via LLMs enables the creation of structurally sound 3D medical schematics. This logical spatial construction approach democratizes the development of interactive educational tools. It allows healthcare providers without programming expertise to intuitively generate customizable 3D educational materials for patient consultations and foundational medical education through natural language dialogue.

Cureus
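The entry above mentions spherical linear interpolation (slerp) for muscle paths on a globe with a 12-mm scleral radius. The study's own code is JavaScript/Three.js and is not reproduced in the abstract. The following is only a minimal Python sketch of slerp between two unit direction vectors, with hypothetical function names and made-up endpoint directions, to show how an interpolated path stays on the sphere surface.

```python
import math

SCLERAL_RADIUS_MM = 12.0  # value quoted in the abstract

def slerp(p, q, t):
    """Spherical linear interpolation between unit vectors p and q, 0 <= t <= 1."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    omega = math.acos(dot)            # angle between the two directions
    sin_omega = math.sin(omega)
    if omega < 1e-9:                  # directions coincide: nothing to interpolate
        return tuple(p)
    if sin_omega < 1e-9:              # antipodal directions: the arc is not unique
        raise ValueError("slerp is undefined for opposite directions")
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return tuple(w0 * a + w1 * b for a, b in zip(p, q))

def path_on_globe(p, q, steps):
    """Sample points (in mm) along the great-circle arc from p to q on the globe."""
    return [
        tuple(SCLERAL_RADIUS_MM * c for c in slerp(p, q, i / steps))
        for i in range(steps + 1)
    ]

# Hypothetical endpoints: two perpendicular directions from the globe centre.
for point in path_on_globe((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 4):
    print(tuple(round(c, 3) for c in point))
```

Every sampled point lies exactly 12 mm from the centre, which is the property that keeps an interpolated path wrapped around the globe instead of cutting through it.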

Metacognitive strategy use in GenAI-supported academic reading: a qualitative study of postgraduate students in UK higher education.

PubMed, 2026-01-01. Author: Dai Y

In recent years, generative artificial intelligence (GenAI) tools such as ChatGPT have been increasingly integrated into academic reading in higher education. Although GenAI can support processing complex academic texts, its effective use requires learners to employ metacognitive strategies to avoid uncritical reliance. However, how second language (L2) learners use such strategies in GenAI-supported academic reading remains underexplored. Situated in UK higher education, this qualitative study examines how 12 postgraduate L2 students employ metacognitive strategies when using ChatGPT for English academic reading. Data from interviews and retrospective reflections were thematically analyzed, while chat logs were used as supplementary descriptive evidence. The findings identify five categories of metacognitive strategies, namely planning, monitoring, evaluating, information management, and debugging. While many strategies align with prior academic reading research, others are specific to the GenAI context, particularly debugging practices such as correcting GenAI errors and developing personalized prompt templates. Differences were also observed across learners with varying language proficiency, especially in verification and prompt refinement behaviors. This study contributes by providing a qualitative account of metacognitive regulation in GenAI-supported academic reading and extending metacognitive strategy frameworks to GenAI-mediated learning environments.

Frontiers in psychology


spatiAlytica: Viewer-Grounded Multimodal Agentic System for Interactive Spatial Omics Analysis.

PubMed, 2026-05-04. Authors: Das A, Zhang K, Song J

Spatial transcriptomics and proteomics map tissue architecture and cellular interactions, but analysis remains limited by programming demands and text-centered AI agents that lack viewer grounding and cross-turn context. We present spatiAlytica, a viewer-centric multimodal interactive agentic system embedded in the Napari viewer that enables non-programmer biologists to perform iterative, hypothesis-driven spatial omics analysis via natural language. spatiAlytica couples viewer-state serialization, agentic memory, biological concept-to-data-field mapping, code generation and debugging, Spatial VQA, and grounded interpretation to support an exploratory analysis and interpretive reasoning workflow. We introduce spatiAlyticaBench, a comprehensive benchmark spanning 222 single-turn spatial analytical coding questions, 178 multi-turn sequential workflow questions, and 7,350 image-grounded reasoning questions. spatiAlytica outperformed strong agentic baselines, while using less time and tokens. Case studies across Kaposi's sarcoma, colorectal cancer, and ovarian cancer recapitulated known spatial patterns and uncovered progressive CD8 T-cell dysfunction during KS progression.

bioRxiv : the preprint server for biology


Cross-cultural validation of the Chinese version of the nurses' ethical decision-making around end-of-life care scale: a cross-sectional study.

PubMed, 2026-02-24. Authors: Liu S, Xu D, Wen S

In end-of-life care, nurses’ ability to make sound ethical decisions is critical to safeguarding patients’ dignity and quality of life. However, China still lacks a measurement tool tailored to this specific context for assessing such competence. This study, therefore, aimed to localize the Nurses’ Ethical Decision-Making around End-of-Life Care Scale (NEDM-EOLCS) into Chinese and to examine its psychometric characteristics among Chinese nurses. A cross-sectional study design was conducted. The study was conducted between October 2024 and December 2024. The Chinese version of the NEDM-EOLCS scale was initially developed using the Brislin translation model, cross-cultural debugging, and a pre-survey tailored to Chinese linguistic and cultural contexts. 450 nurses completed the Chinese version of the NEDM-EOLCS scale. Exploratory factor analysis (EFA) was employed to analyse the data from Group 1 (n = 225) in order to elucidate the factor structure, whereas Group 2 data (n = 225) were subjected to confirmatory factor analysis (CFA) to verify the model’s suitability; additionally, convergent validity, discriminant validity, and reliability tests were carried out. A total of 450 nurses participated in the survey. The Scale-level Content Validity Index (S-CVI) was 0.98; EFA extracted three factors and explained 61.816% of the total variance; CFA confirmed that all the goodness-of-fit indices were acceptable. The Cronbach’s alpha of the Chinese version of the NEDM-EOLCS was 0.962, and the retest reliability coefficient was 0.896. The Chinese version of NEDM-EOLCS had 55 items in 3 dimensions. The Chinese version of NEDM-EOLCS is scientifically reasonable and has good reliability and validity. It can be used to investigate Chinese nurses’ ethical decision-making around end-of-life care.

BMC nursing


Leveraging Case-Based Learning Exercises in Pharmacology Courses to Promote AI Readiness Among Student Pharmacists.

PubMed, 2026-03-01. Authors: Munusamy S, Rao S, Rajagopalan V

This study aimed to improve student pharmacists' confidence and metacognitive awareness in using artificial intelligence (AI) tools through case-based learning in pharmacology. Generative AI tools were integrated into pharmacology coursework to foster AI readiness, guiding students to triangulate AI-derived information with evidence-based drug information resources. Second-year student pharmacists (P2) from 3 colleges of pharmacy completed 2 case-based assignments in their pharmacology courses. Upon orientation to effective prompt writing, students gathered AI-generated pharmacological information and verified it using evidence-based drug references. Pre and postintervention surveys assessed students' confidence in using AI tools, whereas a metacognition survey (Metacognitive Awareness Inventory) evaluated their planning, monitoring, debugging, and evaluation skills. Eighty-five students (66%) completed the preintervention survey, and 59 (46%) completed the postintervention survey. Overall confidence in using AI tools significantly increased from 64.0 ± 12.9% to 89.4 ± 3.1%. Metacognition survey results showed that most students (74.6% to 94.9%) planned, monitored, debugged, and evaluated use of AI; notably, 94.5% identified strategies they would reuse. Students identified academic integrity concerns (69.5%), reliability (52.5%), and ethical issues (50.8%) as primary barriers to AI adoption. Students indicated their likelihood to use AI for concept comprehension and generating study guides. They recommended additional AI training in course activities and clear academic integrity guidelines to support AI use in pharmacy education. Leveraging case-based assignments to foster AI competency can effectively help student pharmacists gain confidence and metacognitive skills. Addressing concerns about academic integrity and reliability will be essential for the effective adoption of AI in pharmacy curricula.

American journal of pharmaceutical education

IGTG&R: An Intent Analysis-Guided Unit Test Generation and Refinement Framework.

PubMed, 2026-01-09. Authors: Liu X, Zhang Y

Code coverage-guided unit test generation (CGTG) and large language model-based test generation (LLMTG) are two principal approaches for the generation of unit tests. Each of these approaches has its inherent advantages and drawbacks. Tests generated by CGTG have been shown to exhibit high code coverage and high executability. However, they lack the capacity to comprehend code intent, which results in an inability to identify deviations between code implementation and design intent (i.e., functional defects). Conversely, although LLMTG demonstrates an advantage in terms of code intent analysis, it is generally characterized by low executability and necessitates iterative debugging. In order to enhance the ability of unit test generation to identify functional defects, a novel framework has been proposed, entitled the intent analysis-guided unit test generation and refinement (IGTG&R) model. The IGTG&R model consists of a two-stage process for test generation. In the first stage, we introduce coverage path entropy to enhance CGTG to achieve high executability and code coverage of test cases. The second stage refines the test cases using LLMs to identify functional defects. We quantify and verify the interference of incorrect code implementation on intent analysis through conditional entropy. In order to reduce this interference, the focal method body is excluded from the code context information during intent analysis. Using this two-stage process, IGTG&R achieves a more profound comprehension of the intent of the code and the identification of functional defects. The IGTG&R model has been demonstrated to achieve an identification rate of functional defects ranging from 65% to 89%, with an execution success rate of 100% and a code coverage rate of 75.8%. This indicates that IGTG&R is superior to the CGTG and LLMTG approaches in multiple aspects.

Entropy (Basel, Switzerland)


Reliability and Validity of the Chinese Version of Peer Mental Health Stigmatization Scale Among Adolescents.

PubMed, 2026-05-01. Authors: Hu X, Wang Y, Lu D

Mental health stigma (MHS) presents a significant barrier to help-seeking and adversely affects the quality of life and support for adolescents with mental health difficulties, yet culturally adapted assessment tools for adolescents in China remain scarce. The objectives were to translate the Peer Mental Health Stigmatization Scale (PMHSS) into Chinese and evaluate its psychometric properties, including reliability and criterion validity. The Chinese version of PMHSS (C-PMHSS) was developed through forward and backward translation, synthesis, comparison and cross-cultural debugging. Psychometric properties were evaluated in a stratified cluster sample of 530 adolescents (13-18 years). Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) assessed structural validity, while reliability was tested through Cronbach's alpha and test-retest correlations. Factor analyses confirmed a two-factor negative subscale (57.74% variance; χ²/df = 2.303, RMSEA = 0.071) and a three-factor positive subscale (70.32% variance; χ²/df = 2.143, RMSEA = 0.066). The C-PMHSS demonstrated strong internal consistency (α = 0.83) and test-retest reliability (r = 0.86). Significant MHS variations emerged across age, grade, and sex (p < 0.001). The C-PMHSS demonstrates robust psychometric properties, establishing itself as the first psychometrically validated Chinese instrument for early identification of MHS among adolescents. It holds promise for the early identification of adolescents with elevated mental health stigma and for guiding tailored interventions to reduce MHS and promote adolescent mental wellbeing.

Journal of child and adolescent psychiatric nursing : official publication of the Association of Child and Adolescent Psychiatric Nurses, Inc
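Several scale-validation entries in this listing report internal consistency as Cronbach's alpha. As a reminder of what that statistic measures, here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the tiny data set are made up for illustration and have no connection to any study's data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of per-respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = variance(totals)      # sample variance (n - 1 denominator)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up responses: 2 items answered by 4 respondents.
example = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
print(round(cronbach_alpha(example), 3))
```

Values near 1 indicate that the items vary together across respondents. Real analyses typically rely on a statistics package and report confidence intervals as well.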

Development and Validation of the Intimate Partner Violence Nursing Competency Scale (IPVNCS): A Psychometric Tool to Strengthen Clinical Detection and Intervention.

PubMed, 2026-01-26. Authors: Casero-Benavente D, Mudarra-García N, Charneco-Salguero G

Background: Intimate partner violence (IPV) represents a major public health problem in Europe, with significant physical, psychological, and social consequences. Nurses are often the first professionals capable of detecting early signs of IPV, yet they lack validated instruments to assess their clinical competency in detection, evaluation, documentation, and intervention. This study aimed to develop and validate the Intimate Partner Violence Nursing Competency Scale (IPVNCS), aligned with the Nursing Intervention Classification (NIC 6403). Methods: A cross-sectional psychometric study was conducted among registered nurses in the Community of Madrid. A 30-item Likert-type self-administered instrument (1-5 scale) was developed based on NANDA, NIC 6403, and NOC frameworks. A total of 202 nurses participated. Reliability was assessed through Cronbach's alpha. Construct validity was examined using exploratory factor analysis (EFA) with Promax rotation and confirmatory factor analysis (CFA) using AMOS 26. Ethical approval was obtained (CEU San Pablo, code 843/24/104). Results: After item refinement, 26 items remained across four dimensions: (1) Intervention and Referral, (2) Detection and Assessment, (3) Documentation and Record-keeping, (4) Psychosocial Support. The instrument showed excellent reliability (α = 0.97). KMO was 0.947 and Bartlett's test was significant (p < 0.001). CFA demonstrated satisfactory fit: χ²/df = 2.066, RMSEA = 0.073, CFI = 0.92, TLI = 0.91, NFI = 0.86. The final model adequately represented the latent structure. After debugging, its psychometric properties were significantly improved. Four redundant items were eliminated, achieving internal consistency (α = 0.97), a KMO value of 0.947 and a significant Bartlett's test of sphericity. It showed a better fit: χ²/df = 2.066; Parsimony = 720.736; RMR = 0.0529; RMSEA = 0.073; NFI = 0.860; TLI = 0.910; and CFI = 0.920. 
The final model provides an adequate representation of the latent structure of the data. This study provides initial evidence of construct validity and internal consistency reliability of the IPVNCS. Conclusions: The IPVNCS is a valid and reliable tool to assess nursing competencies for clinical management of IPV. It supports structured evaluation across four core nursing domains, enabling improved educational planning, clinical decision-making, and quality of care for victims. The scale fills a gap in clinical nursing assessment tools and can support protocol development in emergency, primary care, and hospital settings.

Journal of clinical medicine

Symmetry-guided explainable deep learning for colon cancer diagnosis: model benchmarking, cross-validation, statistical analysis, and explainability via ablation studies.

PubMed, 2026-01-01. Authors: Solanki A, Gopani D, Mangla M

Histopathological tissue reveals natural radial and bilateral symmetry in glandular structures, which becomes progressively disrupted during malignant transformation. Leveraging this observation, this work presents a VGG16-based deep learning model enriched with symmetry-aware interpretation for early detection of Colon Adenocarcinoma. Traditional approaches are not straightforward enough and act as "black boxes", diminishing their clinical adoption and acceptance in real-world scenarios. The current work uses the most recent breakthroughs in deep learning on medical imaging and integrates Explainable AI strategies such as LIME, SHAP, and Grad-CAM into the model to interpret how cancer-induced symmetry distortions influence model decisions. The model is evaluated on a balanced dataset of 10,000 histopathological scans, including 5,000 Colon Adenocarcinoma tissue samples and 5,000 Benign Colon Tissue samples. This research aims to shed light on how benign tissues preserve consistent symmetric glandular patterns, while cancerous samples exhibit pronounced asymmetry, irregular boundaries, and disrupted structural repetition. The authors further aim to quantify these differences using lightweight 2D symmetry indices, demonstrating a clear separation between normal and malignant tissues. The work presents a highly precise model for the diagnosis of colon cancer using a VGG16 CNN that achieves an encouraging test accuracy of 99.85%. The model exhibited very high precision, recall, and F1-scores for both classes, normal and cancer, as demonstrated by the classification report. Among the XAI techniques, Grad-CAM demonstrated speed and scalability, making it an appropriate choice for large-scale deployment in healthcare. SHAP, though computationally costly, offered theoretical robustness and great insight. LIME was handy for local interpretability and especially convenient for debugging individual predictions.

Frontiers in artificial intelligence

Rules railroad: Syntax-inspired diagrams for visualizing and understanding rule-based model specifications.

PubMed, 2026-03-01. Authors: Patel RJ, Blinov ML

Rule-based modeling provides a powerful framework for describing and simulating biochemical systems composed of multi-site molecules and multi-molecular species. By encoding molecular interactions as rules rather than enumerating all possible species, this approach naturally accounts for the combinatorial complexity of connectivity within chemical species. Despite these advantages, visualization of such models remains challenging. Existing approaches, such as contact maps, give a high-level overview of possible sites and interactions but lack explicit representation of dynamic processes, while traditional rule cartoons split reactants and products across a reaction arrow, separating molecular context from transformation. We introduce Rules Railroad (RRR) diagrams, a novel diagrammatic representation of rule-based model specification. Each RRR diagram encapsulates a single rule as a continuous flow diagram with embedded actions, including binding, unbinding, and state changes. Inspired by classical railroad (syntax) diagrams used to represent formal grammars, RRR diagrams encode both the structural context and the transformations of a rule in a unified format, more compact compared to a classical visualization approach of presenting a single rule as a reactant-product pair. This integration reduces ambiguity, enhances readability, and provides a systematic, human- and machine-readable visualization of any rule-based system. RRR diagrams are precise, and suitable for debugging, communication, and education.

PLoS computational biology


Adaptive job recrafting of gig workers: Concept, measurement, and validation of its impact on job satisfaction.

PubMed, 2026-02-01. Authors: Lin Q, Sun R, Zhu Q

In the gig economy dominated by algorithmic control, online labor platforms tend to reduce work resources while increasing job demands, leading gig workers to face weakened agency and declining job satisfaction. This study integrates adaptive structuration theory and job crafting theory to propose the concept of adaptive job recrafting for gig workers who perceived algorithmic control, systematically exploring its connotations and impact mechanisms through a multi-stage mixed-methods research design. First, the grounded theory method is applied to construct a four-dimensional model of adaptive job recrafting comprising algorithm task debugging, socio-technical mutual construction, collaborative skill expansion, and identity cognition evolution. Subsequently, a 19-item measurement instrument was developed following rigorous scale development procedures, with reliability and validity confirmed through empirical testing. Finally, based on the Job Demands-Resources model, a longitudinal survey was conducted to empirically test the positive impact mechanism of adaptive job recrafting on job satisfaction, revealing the partial mediating role of work engagement and the moderating role of psychological capital. This study transcends the limitations of traditional job crafting theory by unveiling the adaptive mechanisms of human-technology interaction under algorithmic control. The practical implications provide evidence for platforms to optimize algorithmic design (such as reserving crafting spaces and incorporating performance evaluation) and for gig workers to construct systematic crafting strategies, helping to achieve a dynamic balance between control and empowerment.

Acta psychologica


Illuminating LLM Coding Agents: Visual Analytics for Deeper Understanding and Enhancement.

PubMed, 2026-05-19. Authors: Wang J, Chen Y, Pan M

Coding agents powered by large language models (LLMs) have gained traction for automating code generation through iterative problem-solving with minimal human involvement. Despite the emergence of various frameworks, e.g., LangChain, AutoML, and AIDE, ML scientists still struggle to effectively review and adjust the agents' coding process. The current approach of manually inspecting individual outputs is inefficient, making it difficult to track code evolution, compare coding iterations, and identify improvement opportunities. To address this challenge, we introduce a visual analytics system designed to enhance the examination of coding agent behaviors. Focusing on the AIDE framework, our system supports comparative analysis across three levels: (1) Code-Level Analysis, which reveals how the agent debugs and refines its code over iterations; (2) Process-Level Analysis, which contrasts different solution-seeking processes explored by the agent; and (3) LLM-Level Analysis, which highlights variations in coding behavior across different LLMs. By integrating these perspectives, our system enables ML scientists to gain a structured understanding of agent behaviors, facilitating more effective debugging and prompt engineering. Through case studies using coding agents to tackle popular Kaggle competitions, we demonstrate how our system provides valuable insights into the iterative coding process.

IEEE transactions on visualization and computer graphics


How generative AI is shaping research software development and maintenance at a research-intensive university.

PubMed, 2026-01-01. Authors: Besser SA, Jensen EA, Katz DS

Generative artificial intelligence is spreading rapidly across academic research, yet its role in the development and maintenance of research software remains insufficiently characterized. A six-week, institutional review board-approved, anonymized online survey of faculty and research staff was conducted at a large research-intensive university in late 2024 (n = 251). Branching survey questions distinguished general users of research software from those who create or maintain it. Quantitative associations were examined using chi-square or Fisher's exact tests, and free-text descriptions of generative AI use in software development were analyzed thematically. Overall, 29% of respondents reported using generative AI for at least one research task. Within the subsample of active research software developers, 33% reported using generative AI for software development and 51% indicated continued or planned future use. No statistically significant associations were found for age, recency of highest degree, or external funding. Gender was significantly associated with generative AI use for software development, with higher uptake among men than women (41% versus 15%; χ²(1) = 5.03, p = .025). Reported generative AI uses clustered around four practical roles: generating initial code and queries, supporting debugging and testing, transforming data or commands via natural language prompts, and reducing cognitive burden in repetitive or complex tasks. At a large research-intensive university, generative AI adoption in research software development is already common among active developers and is expected to expand. The observed gender disparity signals a potential equity risk as tool-assisted development becomes normalized. These findings provide an empirical baseline for multi-institution replication and for evaluating how generative AI may reshape the organization and distribution of research software work.

Open Research Europe

Real-time digital hardware trigger system for the Central Detector of the Taishan Antineutrino Observatory.

PubMed | 2026-01-01 | Authors: Wang Y, Cao P, Zhu Y

The Taishan Antineutrino Observatory (TAO) is a high-energy resolution reactor antineutrino experiment designed to measure the fine structure of the reactor antineutrino energy spectrum. It employs silicon photomultipliers (SiPMs) to detect photons produced by secondary particles from antineutrino interactions in a gadolinium-doped liquid scintillator. The physics event rate of TAO is ∼520 Hz. However, the use of 4024 SiPM arrays results in a high dark noise event rate, leading to a total event rate of up to 1 GHz. This presents a significant challenge in the trigger system design: how to accurately and efficiently select rare effective physics events in real time amidst a vast amount of noise. This paper introduces a fully digital hardware trigger system. The system features a flexible, reconfigurable two-level processing architecture, combined with a real-time triggering algorithm based on the multiplicity trigger criterion. The trigger system has been tested with simulation data, and a preliminary joint test with the detector system has been completed. The results of the simulation test with a single module suggest that the trigger system can accurately extract the 1 kHz simulated physics events from the substantial amount of dark noise and upload the triggered data to the DAQ system. In addition, in the preliminary joint test, the trigger system accurately extracted the given effective physics event data while compressing the hit rate of dark noise from 2 MHz to 500 Hz. The trigger system has been successfully installed and deployed at the TAO experimental site. It has undergone integrated debugging with the full-scale detector and Front-End Electronics (FEC), and preliminary data acquisition tests have been completed. The design objectives of the trigger system have been fulfilled, demonstrating its correctness and reliability in practical application scenarios.

The Review of Scientific Instruments
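
The abstract above names a multiplicity trigger criterion but does not describe how it is implemented. As a rough illustration of the general idea only (not the TAO firmware logic), the Python sketch below fires a trigger when the number of distinct channels hit within a sliding coincidence window reaches a threshold. The function name `multiplicity_trigger`, the `(time_ns, channel)` hit format, and the window and threshold values are all hypothetical choices made for this example.

```python
from collections import deque
from typing import Iterable, List, Tuple


def multiplicity_trigger(
    hits: Iterable[Tuple[int, int]],
    window_ns: int,
    threshold: int,
) -> List[int]:
    """Return the times at which a simple multiplicity trigger fires.

    hits      -- (time_ns, channel) pairs, assumed sorted by time (hypothetical format)
    window_ns -- width of the sliding coincidence window
    threshold -- minimum number of distinct channels inside the window
    """
    window: deque = deque()   # hits currently inside the coincidence window
    triggers: List[int] = []
    armed = True              # re-arm only after multiplicity drops below threshold

    for time_ns, channel in hits:
        window.append((time_ns, channel))
        # Drop hits that have fallen out of the window.
        while window and time_ns - window[0][0] > window_ns:
            window.popleft()
        # Count distinct channels, so repeated noise on one channel counts once.
        multiplicity = len({ch for _, ch in window})
        if multiplicity >= threshold:
            if armed:
                triggers.append(time_ns)
                armed = False
        else:
            armed = True
    return triggers


# Isolated hits (dark-noise-like) are ignored; a burst on four channels fires once.
example_hits = [(0, 1), (100, 2), (5000, 3), (5010, 4), (5020, 5), (5030, 6), (9000, 1)]
print(multiplicity_trigger(example_hits, window_ns=50, threshold=3))  # prints [5020]
```

Uncorrelated dark-noise hits rarely coincide on many channels within a short window, while a physics event lights up several channels almost simultaneously, which is why a multiplicity cut of this kind can suppress noise. A hardware implementation would work on clocked digital signals rather than a Python list.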

Bridging awareness and behavior: decoding implicit metacognitive behaviors in AI-assisted programming via fine-grained log analysis.

PubMed | 2026-01-01 | Authors: Hou X, Liu Z, Wang G

While generative artificial intelligence offers transformative potential for programming education, its impact on students' internal cognitive and behavioral patterns remains underexplored. This study aims to address this "black box" issue by investigating how AI-driven interventions influence metacognitive regulation and self-regulated learning. A randomized controlled trial was conducted with 122 Computer Science undergraduates (mean age = 19.6 years; 28.2% female) from a university in China. Participants were assigned to an AI-assisted intervention group (n = 62) or a control group (n = 60) within a Python programming course. Using a customized Jupyter environment, an integrated autonomous AI agent monitored real-time behavioral logs and triggered non-directive, process-oriented prompts based on specific algorithmic thresholds. Data collection integrated fine-grained log analysis with standardized assessments to quantify implicit planning, monitoring, and regulation processes. The AI intervention significantly optimized learning behaviors, facilitating a shift from impulsive "trial-and-error" approaches to deliberate planning and superior debugging precision. These behavioral improvements were accompanied by significant gains in both academic performance and subjective metacognitive awareness compared to the control group. The findings confirm that when designed as a process-oriented scaffold, AI functions as a catalyst for self-regulated learning rather than a passive crutch. This study highlights the role of AI as a psychological scaffold that supports metacognitive regulation, providing an evidence-based blueprint for the design of effective learning environments in educational psychology.

Frontiers in Psychology

ClinPreAI: An Agentic AI System for Early Postpartum Depression Risk Prediction from Multimodal EHR Data.

PubMed | 2026-03-18 | Authors: Palacios D, Aras S, Zhong Y

Postpartum depression (PPD) affects 10-15% of individuals annually, yet early identification and treatment remain challenging. We introduce ClinPreAI, a novel agentic AI system that autonomously designs, implements, and evaluates machine learning solutions for PPD risk prediction using multimodal electronic health record data. We analyzed data from 4,161 pregnant individuals hospitalized prior to delivery for medical or obstetrical complications at Texas Children's Hospital (2012-2025), extracting 27 structured clinical variables and social worker notes. The primary outcome was Edinburgh Postnatal Depression Scale (EPDS) score ≥10 (31.0% prevalence) within 6 months after delivery, indicating clinically significant depressive symptoms. ClinPreAI operates through five specialized modules that iteratively refine predictive models through autonomous experimentation. ClinPreAI demonstrated strong performance across modalities. On structured data, it achieved F1: 0.68 ± 0.03, outperforming traditional AutoML (F1: 0.64 ± 0.02) and commercial solutions (AWS Canvas F1: 0.54-0.55). On multimodal data, ClinPreAI achieved F1: 0.65 ± 0.04, matching custom LLM-XGBoost (F1: 0.65 ± 0.01) and outperforming zero-shot models (Claude Opus F1: 0.51-0.52). This represents the first application of agentic AI to perinatal mental health prediction. Our results demonstrate that autonomous AI agents can democratize sophisticated predictive modeling in clinical settings, which is particularly valuable where domain experts lack ML training. By automating experimentation and debugging, agentic systems lower barriers to developing robust clinical prediction tools while maintaining interpretability.

medRxiv : the preprint server for health sciences