Found 20 results
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
Analyzing large volumes of case law to uncover evolving legal principles, across multiple cases, on a given topic is a demanding task for legal professionals. Structured topical reports provide an effective solution by summarizing key issues, principles, and judgments, enabling comprehensive legal analysis on a particular topic. While prior works have advanced query-based individual case summarization, none have extended to automatically generating multi-case structured reports. To address this, we introduce LexGenie, an automated LLM-based pipeline designed to create structured reports using the entire body of case law on user-specified topics within the European Court of Human Rights jurisdiction. LexGenie retrieves, clusters, and organizes relevant passages by topic to generate a structured outline and cohesive content for each section. Expert evaluation confirms LexGenie's utility in producing structured reports that enhance efficient, scalable legal analysis.
Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports, focusing on IEMs. Using this dataset, we assess various models and prompting strategies, introducing novel approaches such as category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought prompting offers little advantage over standard zero-shot prompting. Category-specific prompting improves alignment with the benchmark. The open-source model Qwen2.5-7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management. We also highlight areas for improvement, such as LLMs' limitations in recogni
Background: A large number of neurology case reports have been published, but it is a challenging task for human medical experts to explore all of these publications. Text mining offers a computational approach to investigate neurology literature and capture meaningful patterns. The overarching goal of this study is to provide a new perspective on case reports of neurological disease and syndrome analysis over the last six decades using text mining. Methods: We extracted diseases and syndromes (DsSs) from more than 65,000 neurology case reports from 66 journals in PubMed over the last six decades from 1955 to 2017. Text mining was applied to reports on the detected DsSs to investigate high-frequency DsSs, categorize them, and explore the linear trends over the 63-year time frame. Results: The text mining methods explored high-frequency neurologic DsSs and their trends and the relationships between them from 1955 to 2017. We detected more than 18,000 unique DsSs and found 10 categories of neurologic DsSs. While the trend analysis showed the increasing trends in the case reports for top-10 high-frequency DsSs, the categories had mixed trends. Conclusion: Our study provided new insigh
Maintaining reliable UI test suites in large-scale enterprise applications is a persistent and costly challenge. We present an industrial case study of a multi-agent autonomous testing system evaluated using anonymized execution data from a production-like enterprise UI testing prototype. The application features several hundred dynamic UI elements per screen. Built on a large language model with LangGraph orchestration, Playwright execution, and a RAG knowledge base, the system evolves from human-directed testing toward high-autonomy feature discovery and test execution: given no explicit test targets, it discovers over 100 testable features across 10 UI screens, dynamically expands coverage by an additional 15--30 features through runtime DOM analysis, and iteratively repairs failing tests without human intervention. We analyzed 300 consecutive autonomous execution reports encompassing 636 individual test-case executions across 10 distinct scenario families. The system achieved a 70% repair convergence rate at the scenario-family level, with a mean of 3.4 repair iterations to convergence. However, only 10% of scenario families succeeded on first attempt, 38% of reports failed to
The QUIJOTE-MFI Northern Hemisphere Wide-Survey has provided maps of the sky above declinations $-30^\circ$ at 11, 13, 17 and 19$\,$GHz. These data are combined with ancillary data to produce Spectral Energy Distributions in intensity in the frequency range 0.4--3\,000$\,$GHz on a sample of 52 candidate compact sources harbouring anomalous microwave emission (AME). We apply a component separation analysis at 1$^\circ$ scale on the full sample from which we identify 44 sources with high AME significance. We explore correlations between different fitted parameters on this last sample. QUIJOTE-MFI data contribute notably to improving the characterisation of the AME spectrum, and its separation from the other components. In particular, ignoring the 10--20\,GHz data produces on average an underestimation of the AME amplitude, and an overestimation of the free-free component. We find an average AME peak frequency of 23.6 $\pm$ 3.6$\,$GHz, about 4$\,$GHz lower than the value reported in previous studies. The strongest correlation is found between the peak flux density of the thermal dust and of the AME component. A mild correlation is found between the AME emissivity ($A_{\rm AME}/\tau_{250}$)
High Bandwidth Memory with Processing-in-Memory (HBM-PIM) offers an opportunity to reduce data movement by executing computation directly inside memory, but current commercial platforms expose limited instruction sets and require specialized software stacks. In this work, we investigate whether HBM-PIM can serve as a backend for ISA-level matrix acceleration, using the RISC-V Attached Matrix Extension (AME) as a semantic reference. We propose a PEP-based execution model that maps AME element-wise and matrix instructions to HBM-PIM micro-kernels and data instructions in memory operations. Unlike state-of-the-art (SoA) HBM-PIM approaches, we introduce a reduction-free outer-product dataflow that enables accumulation entirely within memory despite the lack of native reduction support. Our approach supports end-to-end execution of element-wise operations, GEMV, and GEMM in PIM mode, minimizing host involvement and off-chip transfers. An experimental evaluation on Samsung Aquabolt-XL shows that AME matrix tile multiplication achieves up to 14.9 GFLOP/s (59.4 FLOP/cycle) on a single HBM pseudo-channel.
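The outer-product dataflow mentioned in the abstract above can be illustrated with a minimal host-side sketch. It only shows the general idea that C = A·B can be accumulated as a sum of rank-1 updates, so no separate reduction step across partial results is needed; it is not the authors' PIM micro-kernel, and the function names `gemm_outer_product` and `implied_clock_hz` are hypothetical. The second helper simply divides the two reported throughput figures, assuming both refer to the same measurement, which gives a clock of roughly 0.25 GHz.

```python
def gemm_outer_product(a, b):
    """Compute C = A @ B as a sum of rank-1 outer products.

    a is an m x k matrix and b is a k x n matrix, both as lists of rows.
    Each step p adds column p of A times row p of B directly into C,
    so the result is built by element-wise accumulation only.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * n for _ in range(m)]
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # column p of A
        row = b[p]                         # row p of B
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]  # rank-1 update, no reduction tree
    return c


def implied_clock_hz(flop_per_second, flop_per_cycle):
    """Clock rate implied by a FLOP/s figure and a FLOP/cycle figure."""
    return flop_per_second / flop_per_cycle


if __name__ == "__main__":
    print(gemm_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # Figures taken from the abstract: 14.9 GFLOP/s and 59.4 FLOP/cycle.
    print(implied_clock_hz(14.9e9, 59.4))
```

The contrast is with an inner-product formulation, where each output element needs a dot product, that is, a reduction over k terms.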
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Sub-task 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Su
With the growth of global maritime transportation, energy optimization has become crucial for reducing costs and ensuring operational efficiency. Shaft power is the mechanical power transmitted from the engine to the shaft and directly impacts fuel consumption, making its accurate prediction a paramount step in optimizing vessel performance. Power consumption is highly correlated with ship parameters such as speed and shaft rotation per minute, as well as weather and sea conditions. Frequent access to this operational data can improve prediction accuracy. However, obtaining high-quality sensor data is often infeasible and costly, making alternative sources such as noon reports a viable option. In this paper, we propose a transfer learning-based approach for predicting vessel shaft power, where a model is initially trained on high-frequency data from a vessel and then fine-tuned with low-frequency daily noon reports from other vessels. We tested our approach on sister vessels (identical dimensions and configurations), a similar vessel (slightly larger with a different engine), and a different vessel (distinct dimensions and configurations). The experiments showed that the mean abso
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
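The classification metrics named in the abstract above follow standard definitions, sketched below. The counts in the usage example are made up for illustration and are not data from the study, and averaging per-class F1 scores with equal weight (macro averaging) is an assumption, since the abstract does not say how its average F1 was computed. The function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts for a single class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 scores.

    per_class_counts maps a class label to a (tp, fp, fn) tuple.
    """
    scores = [precision_recall_f1(*counts)[2] for counts in per_class_counts.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for two diagnostic categories.
    counts = {"category_a": (8, 2, 2), "category_b": (5, 0, 5)}
    print(precision_recall_f1(*counts["category_a"]))
    print(macro_f1(counts))
```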
We argue that the special extremal slice inside an AdS black hole is dual to an absolutely maximally entangled (AME) state. We demonstrate this by confirming the $n$-independence of holographic $n$-th Renyi entropies for any bi-partite subsystems. Our result gives an AME state in an infinite-volume system, where the local bond dimension is set by the black hole entropy. In particular, our construction provides concrete support from the gravity side for the emergence of random structures and an infinite-dimensional Hilbert space in recent non-isometric holographic codes.
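As a reminder of why $n$-independence is the relevant diagnostic in the abstract above, the standard definition of the Rényi entropy can be evaluated on a maximally mixed reduced state. This is textbook material rather than the paper's holographic computation; $\rho_A$ and $d_A$ (the reduced density matrix of subsystem $A$ and its Hilbert-space dimension) are notation introduced here for illustration.

```latex
S_n(\rho_A) = \frac{1}{1-n}\,\log \operatorname{Tr}\rho_A^{\,n},
\qquad
\rho_A = \frac{\mathbb{1}}{d_A}
\;\Longrightarrow\;
\operatorname{Tr}\rho_A^{\,n} = d_A \cdot d_A^{-n} = d_A^{\,1-n}
\;\Longrightarrow\;
S_n(\rho_A) = \log d_A \quad \text{for all } n .
```

A flat entanglement spectrum therefore gives the same Rényi entropy for every $n$, which is the behaviour expected of the reduced states of an AME state across bipartitions.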
On-device agents on smartphones increasingly require continuously evolving memory to support personalized, context-aware, and long-term behaviors. To meet both privacy and responsiveness demands, user data is embedded as vectors and stored in a vector database for fast similarity search. However, most existing vector databases target server-class environments. When ported directly to smartphones, two gaps emerge: (G1) a mismatch between mobile SoC constraints and vector-database assumptions, including tight bandwidth budgets, limited on-chip memory, and stricter data type and layout constraints; and (G2) a workload mismatch, because on-device usage resembles a continuously learning memory, in which queries must coexist with frequent inserts, deletions, and ongoing index maintenance. To address these challenges, we propose AME, an on-device Agentic Memory Engine co-designed with modern smartphone SoCs. AME introduces two key techniques: (1) a hardware-aware, high-efficiency matrix pipeline that maximizes compute-unit utilization and exploits multi-level on-chip storage to sustain high throughput; and (2) a hardware- and workload-aware scheduling scheme that coordinates querying, ins
Anomalous Microwave Emission (AME) is a component of diffuse Galactic radiation observed at frequencies in the range $\approx 10$-60 GHz. AME was first detected in 1996 and recognised as an additional component of emission in 1997. Since then, AME has been observed by a range of experiments and in a variety of environments. AME is spatially correlated with far-IR thermal dust emission but cannot be explained by synchrotron or free-free emission mechanisms, and is far in excess of the emission contributed by thermal dust emission with the power-law opacity consistent with the observed emission at sub-mm wavelengths. Polarization observations have shown that AME is very weakly polarized ($\lesssim 1$%). The most natural explanation for AME is rotational emission from ultra-small dust grains ("spinning dust"), first postulated in 1957. Magnetic dipole radiation from thermal fluctuations in the magnetization of magnetic grain materials may also be contributing to the AME, particularly at higher frequencies ($\gtrsim 50$ GHz). AME is also an important foreground for Cosmic Microwave Background analyses. This paper presents a review and the current state-of-play in AME research, which wa
Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT) where analysis is slower than non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their IoT use remains unexplored. We are the first to tackle this problem by proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues of 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues for classifying vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the receiver operating characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the importance of exposing all data during training. Our contributions set the stage for accurately detecting IoT vulnerabilities from issue reports, similar to non-IoT systems.
Screening mammography is high volume, time sensitive, and documentation heavy. Radiologists must translate subtle visual findings into consistent BI-RADS assessments, breast density categories, and structured narrative reports. While recent Vision Language Models (VLMs) enable image-to-text reporting, many rely on closed cloud systems or tightly coupled architectures that limit privacy, reproducibility, and adaptability. We present MammoWise, a local multi-model pipeline that transforms open source VLMs into mammogram report generators and multi-task classifiers. MammoWise supports any Ollama-hosted VLM and mammography dataset, and enables zero-shot, few-shot, and Chain-of-Thought prompting, with optional multimodal Retrieval Augmented Generation (RAG) using a vector database for case-specific context. We evaluate MedGemma, LLaVA-Med, and Qwen2.5-VL on VinDr-Mammo and DMID datasets, assessing report quality (BERTScore, ROUGE-L), BI-RADS classification, breast density, and key findings. Report generation is consistently strong and improves with few-shot prompting and RAG. Classification is feasible but sensitive to model and dataset choice. Parameter-efficient fine-tuning (QLoRA) of
We present a new method to obtain stellar properties for stars exhibiting solar-like oscillations in an easy, fast, and transparent way. The method, called Asteroseismology Made Easy (AME), can determine stellar masses, mean-densities, radii, and surface gravities, as well as estimate ages. Here we present AME as a visual and powerful tool that could be particularly useful in light of the large number of exoplanets being found. AME consists of a set of figures from which the stellar parameters are deduced. These figures are made from a grid of stellar evolutionary models that cover masses ranging from 0.7 Msun to 1.6 Msun in steps of 0.1 Msun and metallicities in the interval -0.3 dex <= [Fe/H] <= +0.3 dex in increments of 0.1 dex. The stellar evolutionary models are computed using the Modules for Experiments in Stellar Astrophysics (MESA) code with simple input physics. We have compared the results from AME with results for three groups of stars: stars with radii determined from interferometry (and measured parallaxes), stars with radii determined from measurements of their parallaxes (and calculated angular diameters), and stars with results based on the m
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to
Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency through automated comprehension and commenting, but cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities. We evaluate parameter-efficient fine-tuned local LLMs for assisting with x86 RE tasks in these settings. Eight open-weight models across the CodeLlama, Qwen2.5-Coder, and CodeGemma series are fine-tuned on a custom curated dataset of 5,981 x86 assembly examples. We evaluate them quantitatively and identify the fine-tuned Qwen2.5-Coder-7B as the top performer, which we name REx86. REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3% over its base model. In a limited user case study (n=43), REx86 significantly enhanced line-level code understanding (p = 0.031) and increased the correct-solve rate from 31% to 53% (p = 0.189), though the latter did not reach statistical significance. Qualitative analysis shows more accur
In 1984 Edward Witten proposed that an extremely dense form of matter composed of up, down, and strange quarks may be stable at zero pressure (Witten, 1984). Massive nuggets of such dense matter, if they exist, may pass through the Earth and be detectable by the seismic signals they generate (de Rujula and Glashow, 1984). With this motivation we investigated over 1 million seismic data reports to the U.S. Geological Survey for the years 1990-1993 not associated with epicentral sources. We report two results: (1) with an average of about 0.16 unassociated reports per minute after data cuts, we found a significant excess over statistical expectation for sets with ten or more reports in ten minutes; and (2) in spite of a very small a priori probability from random reports, we found one set of reports with arrival times and other features appropriate to signals from an epilinear source. This event has the properties predicted for the passage of a nugget of strange quark matter (SQM) through the Earth, although there is no direct confirmation from other phenomenologies.
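To give a feel for the "statistical expectation" in result (1) of the abstract above, the sketch below computes how unlikely ten or more reports in a single ten-minute window would be if unassociated reports arrived independently at the quoted average rate. Modelling the arrivals as a Poisson process is an assumption made here for illustration; the abstract does not describe the authors' actual statistical treatment, and overlapping or sliding windows would need a more careful analysis. The function name `poisson_tail` is hypothetical.

```python
import math


def poisson_tail(mean, k_min):
    """Return P(N >= k_min) for N ~ Poisson(mean)."""
    # Sum the probabilities of 0 .. k_min - 1 events, then take the complement.
    below = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_min)
    )
    return 1.0 - below


if __name__ == "__main__":
    rate_per_minute = 0.16  # average quoted in the abstract
    window_minutes = 10
    mean = rate_per_minute * window_minutes  # expected reports per window
    print(poisson_tail(mean, 10))
```

Under these assumptions the expected count per window is 1.6, so a window with ten or more reports is a rare event for any single window, on the order of a few in a million.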
The Planck 28.5 GHz maps were searched for potential Anomalous Microwave Emission (AME) regions on the scale of $\sim3^{\circ}$ or smaller, and several new regions of interest were selected. Ancillary data at both lower and higher frequencies were used to construct spectral energy distributions (SEDs), which seem to confirm an excess consistent with spinning dust models. Here we present higher resolution observations of two of these new regions with the Arcminute Microkelvin Imager Small Array (AMI SA) between 14 and 18 GHz to test for the presence of a compact ($\sim$10 arcmin or smaller) component. For AME-G107.1+5.2, dominated by the \textsc{Hii} region S140, we find evidence for the characteristic rising spectrum associated with either the spinning dust mechanism for AME or an ultra/hyper-compact \textsc{Hii} region across the AMI frequency band; however, for AME-G173.6+2.8 we find no evidence for AME on scales of $\sim 2-10$ arcmin.