Large Language Models have revolutionized information processing, yet their reliability is severely compromised by faithfulness hallucinations. While current approaches attempt to mitigate this issue through node-level adjustments or coarse suppression, they often overlook the distributed nature of neural information, leading to imprecise interventions. Recognizing that hallucinations propagate through specific forward transmission pathways like an infection, we aim to surgically block this flow using precise structural analysis. Building on this observation, we propose Lancet, a novel framework that achieves precise neural intervention using structural entropy and hallucination difference ratios. Lancet first locates hallucination-prone neurons via gradient-driven contrastive analysis, then maps their propagation pathways by minimizing structural entropy, and finally implements a hierarchical intervention strategy that preserves general model capabilities. Comprehensive evaluations across hallucination benchmark datasets demonstrate that Lancet significantly outperforms state-of-the-art methods, validating the effectiveness of our surgical approach to neural intervention.
A dynamic view of mass assembly is essential for understanding the formation of massive stars and clusters. Interpreting evolutionary diagnostics from Galaxy-wide surveys, however, requires careful control of distance and environmental variations. The G316.8 filament provides an ideal laboratory: a 14-pc nearly linear structure composed of three contiguous subregions with comparable molecular gas reservoirs (~10,000 $M_\odot$ each) but spanning a clear evolutionary sequence from an infrared dark cloud (young) through a massive young stellar object (intermediate) to an HII region (evolved). As part of the Linear filament and nested cluster evolution tomography (LANCET) project, we mapped the full filament with the Atacama Compact Array at 1.3 mm, achieving 0.08 pc resolution over 17.1 pc$^2$. Combined with Herschel and APEX/ArTéMiS data, we derived high-resolution temperature and column-density maps. We quantify structural evolution using dense-fragment statistics, column-density PDFs, and $\Delta$-variance analysis. From young to evolved regions, the maximum fragment mass increases from 8 to 490 $M_\odot$, while the dense-gas mass fraction ($>0.5$ g cm$^{-2}$) rises from 0.4% to 9.
Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency while underestimating it fails the editing task. Notably, each source image may require a different editing strength, and it is costly to search for an appropriate strength via trial-and-error. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation
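The sparse decomposition step described in the CoLan abstract above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an L1-regularized least-squares objective solved with ISTA (iterative soft-thresholding), and the function names, the `lam` penalty, and the three orthonormal "concept" vectors are hypothetical choices made so the result is easy to check by hand.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_decompose(x, concepts, lam=0.05, n_iter=500):
    """ISTA for min_w 0.5 * ||x - sum_j w_j * concepts[j]||^2 + lam * ||w||_1.

    Assumption: `concepts` is a list of equal-length vectors (a hypothetical
    concept dictionary) and `x` is the source representation to explain.
    """
    k, d = len(concepts), len(x)
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(c_i * c_i for c in concepts for c_i in c)
    w = [0.0] * k
    for _ in range(n_iter):
        recon = [sum(w[j] * concepts[j][i] for j in range(k)) for i in range(d)]
        resid = [recon[i] - x[i] for i in range(d)]
        grad = [sum(concepts[j][i] * resid[i] for i in range(d)) for j in range(k)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(k)]
    return w


# Toy dictionary: three hypothetical concepts in a 4-dimensional latent space.
names = ["cat", "dog", "grass"]
concepts = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# A source representation built from "cat" and "grass" only.
source = [0.9, 0.0, 0.4, 0.0]

weights = sparse_decompose(source, concepts, lam=0.05)
presence = {name: round(w, 3) for name, w in zip(names, weights)}
print(presence)  # approximately {'cat': 0.85, 'dog': 0.0, 'grass': 0.35}
```

The recovered coefficients act as per-image estimates of how strongly each concept is present; under this reading, an edit such as "replace cat with dog" would move the estimated "cat" weight onto the "dog" direction rather than applying a fixed, hand-tuned strength. With a real, non-orthogonal dictionary the coefficients would no longer be simple shrunken projections, which is why an iterative solver is used here.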
The Mixture-of-Experts (MoE) technique plays a crucial role in expanding the size of DNN model parameters. However, it faces the challenge of extended all-to-all communication latency during the training process. Existing methods attempt to mitigate this issue by overlapping all-to-all with expert computation. Yet, these methods frequently fall short of achieving sufficient overlap, consequently restricting the potential for performance enhancements. In our study, we extend the scope of this challenge by considering overlap at the broader training graph level. During the forward pass, we enable non-MoE computations to overlap with all-to-all through careful partitioning and pipelining. In the backward pass, we achieve overlap with all-to-all by scheduling gradient weight computations. We implement these techniques in Lancet, a system using compiler-based optimization to automatically enhance MoE model training. Our extensive evaluation reveals that Lancet significantly reduces the time devoted to non-overlapping communication, by as much as 77%. Moreover, it achieves a notable end-to-end speedup of up to 1.3 times compared with state-of-the-art solutions.
We consider the construction of a novel surgical instrument that doubles as a probing device: it supplies a signal to the measuring equipment which, once interpreted, yields useful information about the section quality and the biomaterial properties. We offer some formalized considerations on the feasibility of implementing it for registering different variables. The idea is also extended to the field of micrurgy, which involves microelectrode techniques and in situ registration of local potentials.
Recent advancements in AI alignment techniques have significantly improved the alignment of large language models (LLMs) with static human preferences. However, the dynamic nature of human preferences can render some prior training data outdated or even erroneous, ultimately causing LLMs to deviate from contemporary human preferences and societal norms. Existing methodologies, whether they involve the curation of new data for continual alignment or the manual correction of outdated data for re-alignment, demand costly human resources. To address this challenge, we propose a novel approach, Large Language Model Behavior Correction with Influence Function Recall and Post-Training (LANCET), which requires no human involvement. LANCET consists of two phases: (1) using influence functions to identify the training data that significantly impact undesirable model outputs, and (2) applying an Influence function-driven Bregman Optimization (IBO) technique to adjust the model's behavior based on these influence distributions. Our experiments demonstrate that LANCET effectively and efficiently corrects inappropriate behaviors of LLMs. Furthermore, LANCET can outperform methods that rely on col
Fetal ultrasound is the cornerstone of antenatal care, and accurate recognition of a small set of standard anatomical planes underpins biometry, growth surveillance, and detection of structural anomalies. Deep learning classifiers now match or exceed expert accuracy on curated benchmarks, but most remain opaque and miscalibrated, leaving clinicians without the calibrated confidence or faithful explanations needed for safe decision support. We systematically reviewed 78 studies published between January 1, 2015 and April 30, 2026 that paired automated fetal plane classification with explainability or predictive uncertainty quantification, following PRISMA 2020. Pooled balanced accuracy across six standard planes was 0.93 (95% CI 0.91 to 0.95), but only 19 studies (24%) reported calibration and 14 (18%) reported selective prediction. We propose CALIB-XFUS, a 22-item reporting framework that operationalises calibration, explanation faithfulness, and fairness for regulated fetal ultrasound artificial intelligence. The framework spans six domains: clinical task and indication for use; dataset provenance and representativeness; model and training pipeline; calibration and selective predi
The academic journal zoning system is central to evaluating research talent, funding, and institutions. The CAS journal partition system, one of East Asia's most widely used tools, will cease operation in March 2026, creating a policy gap. Existing alternatives have major limitations: JCR depends on paid databases and excludes conferences; Scimago/CiteScore relies on Elsevier proprietary data; expert-based rankings such as CCF and CORE lack quantitative foundations and update slowly. This paper proposes the General Science Ranking (GSR), a multidimensional bibliometric framework built entirely on open-source data. GSR covers 500 computer science venues (397 journals and 103 conferences) and 500 medical journals using OpenAlex and Semantic Scholar. Scores combine four indicators: field-weighted citation impact (FWCI), two-year impact factor (IF2), five-year h-index (h5), and citation CAGR. For CS conferences lacking citation time-series data, IF2-approx was estimated from calibration on 1.41 million OpenAlex journal papers. Rankings adopt fixed quotas: Q1 (1-50), Q2 (51-100), Q3 (101-200), and Q4 (201+). All code and data are open source. In CS rankings, conferences and journals eac
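Two mechanical pieces of the GSR abstract above, the citation CAGR indicator and the fixed-quota tiering, can be sketched as follows. This is a hedged illustration only: the abstract does not give the weighting of the four indicators, so the sketch takes a precomputed composite score per venue as input, and the function names and venue labels are hypothetical.

```python
def cagr(first_value, last_value, years):
    """Compound annual growth rate between two positive yearly citation counts."""
    if first_value <= 0 or last_value < 0 or years <= 0:
        raise ValueError("need first_value > 0, last_value >= 0, years > 0")
    return (last_value / first_value) ** (1.0 / years) - 1.0


def quota_tier(rank):
    """Map a 1-based rank to the fixed quotas stated in the abstract:
    Q1 (1-50), Q2 (51-100), Q3 (101-200), Q4 (201+)."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank <= 50:
        return "Q1"
    if rank <= 100:
        return "Q2"
    if rank <= 200:
        return "Q3"
    return "Q4"


def rank_venues(composite_scores):
    """Sort venues by descending composite score and attach (rank, tier).

    Assumption: `composite_scores` maps venue name -> already-combined score;
    how FWCI, IF2, h5 and CAGR are weighted is not specified in the abstract.
    """
    ordered = sorted(composite_scores, key=composite_scores.get, reverse=True)
    return {name: (rank, quota_tier(rank)) for rank, name in enumerate(ordered, start=1)}


# Hypothetical venues and scores, for illustration only.
example = rank_venues({"Venue A": 0.91, "Venue B": 0.47, "Venue C": 0.73})
print(example)  # {'Venue A': (1, 'Q1'), 'Venue C': (2, 'Q1'), 'Venue B': (3, 'Q1')}
```

Because the quotas are absolute rank ranges rather than percentiles, the number of Q1 venues stays at 50 regardless of how many venues are ranked; in a list of 500, Q4 therefore holds 300 entries.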
The early detection of esophagogastric junction adenocarcinoma (EGJA) is crucial for improving patient prognosis, yet its current diagnosis is highly operator-dependent. This paper makes a first attempt to develop an artificial intelligence (AI) foundation model-based method for both screening and staging diagnosis of EGJA using endoscopic images. This multicentre cohort and learning study was conducted across seven Chinese hospitals between December 28, 2016 and December 30, 2024. The dataset comprises 12,302 images from 1,546 patients; 8,249 of them were employed for model training, while the remaining were divided into the held-out (112 patients, 914 images), external (230 patients, 1,539 images), and prospective (198 patients, 1,600 images) test sets for evaluation. The proposed model employs DINOv2 (a vision foundation model) and ResNet50 (a convolutional neural network) to extract features of global appearance and local details of endoscopic images for EGJA staging diagnosis. Our model demonstrates satisfactory performance for EGJA staging diagnosis across three test sets, achieving an accuracy of 0.9256, 0.8895, and 0.8956, respectively. In contrast, among represe
According to the Lancet report on the global burden of disease published in October 2020, air pollution is among the five highest risk factors for global health, reducing life expectancy on average by 20 months. This paper describes a data-driven method for establishing causal relationships within and between two multivariate time series data streams derived from wearable sensors: personal exposure to airborne particulate matter of aerodynamic sizes less than 2.5 μm (PM2.5) gathered from the Airspeck monitor worn on the person and continuous respiratory rate (breaths per minute) measured by the Respeck monitor worn as a plaster on the chest. Results are presented for a cohort of 113 asthmatic adolescents using the PCMCI+ algorithm to learn the short-term causal relationships between lags of PM2.5 exposure and respiratory rate. We consider causal effects up to a maximum delay of 8 hours, using data at both 1-minute and 15-minute resolutions in different experiments. For the first time, a personalised exposure-response relationship between PM2.5 exposure and respiratory rate has been demonstrated to exist for short-term effects in asthmatic adolescents during their everyday lives. Our r
Background: Liver diseases present a significant global health challenge and often require costly, invasive diagnostics. Electrocardiography (ECG), a widely available and non-invasive tool, can enable the detection of liver disease by capturing cardiovascular-hepatic interactions. Methods: We trained tree-based machine learning models on ECG features to detect liver diseases using two large datasets: MIMIC-IV-ECG (467,729 patients, 2008-2019) and ECG-View II (775,535 patients, 1994-2013). The task was framed as binary classification, with performance evaluated via the area under the receiver operating characteristic curve (AUROC). To improve interpretability, we applied explainability methods to identify key predictive features. Findings: The models showed strong predictive performance with good generalizability. For example, AUROCs for alcoholic liver disease (K70) were 0.8025 (95% confidence interval (CI), 0.8020-0.8035) internally and 0.7644 (95% CI, 0.7641-0.7649) externally; for hepatic failure (K72), scores were 0.7404 (95% CI, 0.7389-0.7415) and 0.7498 (95% CI, 0.7494-0.7509), respectively. The explainability analysis consistently identified age and prolonged QTc intervals (
Large language models (LLMs) have exhibited remarkable capabilities across various domains and tasks, pushing the boundaries of our knowledge in learning and cognition. The latest model, OpenAI's o1, stands out as the first LLM with an internalized chain-of-thought technique using reinforcement learning strategies. While it has demonstrated surprisingly strong capabilities on various general language tasks, its performance in specialized fields such as medicine remains unknown. To this end, this report provides a comprehensive exploration of o1 on different medical scenarios, examining 3 key aspects: understanding, reasoning, and multilinguality. Specifically, our evaluation encompasses 6 tasks using data from 37 medical datasets, including two newly constructed and more challenging question-answering (QA) tasks based on professional medical quizzes from the New England Journal of Medicine (NEJM) and The Lancet. These datasets offer greater clinical relevance compared to standard medical QA benchmarks such as MedQA, translating more effectively into real-world clinical utility. Our analysis of o1 suggests that the enhanced reasoning ability of LLMs may (significantly) benefit their
Despite its ubiquitous use over the past 150 years, the functions of the current medical needle are facilitated only by mechanical shear and cutting by the needle tip, i.e., the lancet. In this study, we demonstrate how nonlinear ultrasonics (NLU) extends the functionality of the medical needle far beyond its present capability. The NLU actions were found to be localized to the proximity of the needle tip, the SonoLancet, but the effects extend several millimeters from the physical needle boundary. The observed nonlinear phenomena (transient cavitation, fluid streams, translation of micro- and nanoparticles, and atomization) were quantitatively characterized. In the fine-needle biopsy application, the SonoLancet yielded tissue cores with a 3-6x increase in tissue yield across different tissue types compared to the conventional needle biopsy technique using the same 21G needle. In conclusion, the SonoLancet could be of interest to several other medical applications, including drug or gene delivery, cell modulation, and minimally invasive surgical procedures.
Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods: In this multinational retrospective observational study, we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and r
Benchmarking and monitoring urban design and transport features is critical to achieving local and international health and sustainability goals. However, most urban indicator frameworks use coarse spatial scales that only allow between-city comparisons or require expensive, technical, local spatial analyses for within-city comparisons. This study developed a reusable open-source urban indicator computational framework using open data to enable consistent local and global comparative analyses. We demonstrate this framework by calculating spatial indicators - for 25 diverse cities in 19 countries - of urban design and transport features that support health and sustainability. We link these indicators to cities' policy contexts and identify populations living above and below critical thresholds for physical activity through walking. Efforts to broaden participation in crowdsourcing data and to calculate globally consistent indicators are essential for planning evidence-informed urban interventions, monitoring policy impacts, and learning lessons from peer cities to achieve health, equity, and sustainability goals.
It has recently been demonstrated that the use of ultrasound increases the tissue yield in ultrasound-enhanced fine-needle aspiration biopsy (USeFNAB) as compared to conventional fine-needle aspiration biopsy (FNAB). To date, the association between bevel geometry and needle tip action has not been widely explored. In this study, we studied the needle resonance characteristics and deflection magnitude of various needle bevel geometries with varying bevel lengths. With a conventional lancet, having a 3.9 mm long bevel, the tip deflection efficiency in air and water was 220 and 105 micrometres per Watt, respectively. This was higher than that of an axi-symmetric tip with a bevel length of 4 mm, which achieved a deflection efficiency of 180 and 80 micrometres per Watt in air and water, respectively. This study emphasised the importance of the relationship between the flexural modulus of the bevel geometry and the insertion medium and, thus, could provide understanding on approaches to control post-puncture cutting action by modifying the needle bevel geometry, which is essential for the USeFNAB application.
Stroke is the leading cause of death in China (Zhou et al. The Lancet 2019). A dataset from Shanxi Province is used to assign each patient to one of four risk states (low/medium/high/attack) and to provide the state transition tendency through a SHAP DeepExplainer. To improve accuracy on an imbalanced sample set, the Quadratic Interactive Deep Neural Network (QIDNN) model is first proposed, which flexibly selects and appends quadratic interactive features. The experimental results show that the QIDNN model with 7 interactive features achieves a state-of-the-art accuracy of $83.25\%$. Blood pressure, physical inactivity, smoking, weight, and total cholesterol are the five most important features. Then, to achieve high recall on the most urgent state, the attack state, stroke occurrence prediction is added as an auxiliary objective to benefit from multi-objective optimization. The prediction accuracy improved, while the recall of the attack state rose by a relative $24.9\%$ (from $67.93\%$ to $84.83\%$) compared to QIDNN with the same features. The prediction model and analysis tool in this paper not only gave a theoretically optimized prediction method, but also provided
The COVID-19 pandemic is the most significant global crisis since World War II and has affected almost every country on the planet. To control the COVID-19 pandemic outbreak, it is necessary to understand how the virus is transmitted to a susceptible individual and eventually spreads in the community. The primary transmission pathway of COVID-19 is human-to-human transmission through infectious droplets. However, a recent study by Greenhalgh et al. (Lancet 397:1603-1605, 2021) presents 10 scientific reasons supporting the airborne transmission of SARS-CoV-2. In the present study, we introduce a novel mathematical model of COVID-19 that considers the transmission of free viruses in the air in addition to transmission through direct contact with an infected person. The basic reproduction number of the epidemic model is calculated using the next-generation operator method and is found to depend on both the transmission rate of direct contact and that of free virus contact. The local and global stability of the disease-free equilibrium (DFE) is well established. Analytically, it is found that there is a forward bifurcation between the DFE and an endemic equilibrium using center manifold theory. Nex
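The next-generation operator calculation mentioned in the abstract above can be sketched on a deliberately simplified model. The abstract does not give the model equations, so everything here is an assumption: two infectious compartments (infected people and free virus in the air), a susceptible fraction normalized to 1 at the disease-free equilibrium, and hypothetical parameter names (`beta_direct`, `beta_air`, `gamma`, `shed`, `decay`). The basic reproduction number is the spectral radius of F V^{-1}, where F holds new-infection rates and V holds transition rates.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus, from the trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = trace * trace - 4.0 * det
    if disc >= 0:
        root = math.sqrt(disc)
        return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
    return math.sqrt(det)  # complex-conjugate pair: modulus is sqrt(det)


def basic_reproduction_number(beta_direct, beta_air, gamma, shed, decay):
    """R0 = spectral radius of F V^{-1} for the assumed two-compartment model."""
    # F: rate of new infections caused by infected people and by airborne virus.
    F = [[beta_direct, beta_air], [0.0, 0.0]]
    # V: removal of infected people (gamma), shedding into the air (shed),
    # and decay of airborne virus (decay).
    V = [[gamma, 0.0], [-shed, decay]]
    return spectral_radius_2x2(matmul_2x2(F, inverse_2x2(V)))


r0 = basic_reproduction_number(beta_direct=0.3, beta_air=0.1, gamma=0.2, shed=0.5, decay=1.0)
print(round(r0, 6))  # 1.75
```

For this assumed structure the result reduces to the closed form beta_direct / gamma + beta_air * shed / (gamma * decay), which makes the abstract's qualitative statement concrete: R0 has one term for direct contact and one for the airborne route, and the disease-free equilibrium loses stability when their sum exceeds 1.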
The accuracy of published medical research is critical for the scientists, physicians, and patients who rely on these results. But the fundamental belief in the medical literature was called into serious question by a paper suggesting most published medical research is false. Here we adapt estimation methods from the genomics community to the problem of estimating the rate of false positives in the medical literature using reported P-values as the data. We then collect P-values from the abstracts of all 77,430 papers published in The Lancet, The Journal of the American Medical Association, The New England Journal of Medicine, The British Medical Journal, and The American Journal of Epidemiology between 2000 and 2010. We estimate that the overall rate of false positives among reported results is 14% (s.d. 1%), contrary to previous claims. We also find there is not a significant increase in the estimated rate of reported false positive results over time (0.5% more FP per year, P = 0.18) or with respect to journal submissions (0.1% more FP per 100 submissions, P = 0.48). Statistical analysis must allow for false positives in order to make claims on the basis of noisy data. But our ana
Randomised controlled trials (RCTs) aim to assess the impact of one (or more) health interventions relative to other standard interventions. RCTs sometimes use an ordinal outcome, which is an endpoint that comprises multiple, monotonically ordered categories that are not necessarily separated by a quantifiable distance. Ordinal outcomes are appealing in clinical settings as disease states can represent meaningful categories that may be of clinical importance. They can also retain information and increase statistical power compared to dichotomised outcomes. Target parameters for ordinal outcomes in RCTs may vary depending on the nature of the research question, the modelling assumptions, and the expertise of the data analyst. The aim of this scoping review is to systematically describe the use of ordinal outcomes in contemporary RCTs. Specifically, we aim to (i) identify which target parameters are of interest in trials that use an ordinal outcome; (ii) describe how ordinal outcomes are analysed in RCTs to estimate a treatment effect; and (iii) describe whether RCTs that use an ordinal outcome adequately report key methodological aspects specific to the analysis of the outcome. Results