The exponential increase in scientific literature and online information necessitates efficient methods for extracting knowledge from textual data. Natural language processing (NLP) plays a crucial role in addressing this challenge, particularly in text classification tasks. While large language models (LLMs) have achieved remarkable success in NLP, their accuracy can suffer in domain-specific contexts due to specialized vocabulary, unique grammatical structures, and imbalanced data distributions. In this systematic literature review (SLR), we investigate the utilization of pre-trained language models (PLMs) for domain-specific text classification. We systematically review 41 articles published between 2018 and January 2024, adhering to the PRISMA statement (preferred reporting items for systematic reviews and meta-analyses). This review methodology involved rigorous inclusion criteria and a multi-step selection process employing AI-powered tools. We delve into the evolution of text classification techniques and differentiate between traditional and modern approaches. We emphasize transformer-based models and explore the challenges and considerations associated with using LLMs for
Context: Blockchain and AI are increasingly explored to enhance trustworthiness in software engineering (SE), particularly in supporting software evolution tasks. Method: We conducted a systematic literature review (SLR) using a predefined protocol with clear eligibility criteria to ensure transparency, reproducibility, and minimized bias, synthesizing research on blockchain-enabled trust in AI-driven SE tools and processes. Results: Most studies focus on integrating AI in SE, with only 31% explicitly addressing trustworthiness. Our review highlights six recent studies exploring blockchain-based approaches to reinforce reliability, transparency, and accountability in AI-assisted SE tasks. Conclusion: Blockchain enhances trust by ensuring data immutability, model transparency, and lifecycle accountability, including federated learning with blockchain consensus and private data verification. However, inconsistent definitions of trust and limited real-world testing remain major challenges. Future work must develop measurable, reproducible trust frameworks to enable reliable, secure, and compliant AI-driven SE ecosystems, including applications involving large language models.
Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state of the art in systematic visual imagination. We hope that this benchmark will help advance visual systematic c
Systematic polar codes are shown to outperform non-systematic polar codes in terms of the bit-error-rate (BER) performance. However, the mechanism behind the better performance of systematic polar codes is not yet theoretically clear. In this paper, we set the theoretical framework to analyze the performance of systematic polar codes. The exact evaluation of the BER of systematic polar codes conditioned on the BER of non-systematic polar codes involves $2^{NR}$ terms, where $N$ is the code block length and $R$ is the code rate, resulting in a prohibitive number of computations for large block lengths. By analyzing the polar code construction and the successive-cancellation (SC) decoding process, we use a statistical model to quantify the advantage of systematic polar codes over non-systematic polar codes, which we call the systematic gain in this paper. A composite model is proposed to approximate the dominant error cases in the SC decoding process. This composite model divides the errors into independent regions and coupled regions, controlled by a coupling coefficient. Based on this model, the systematic gain can be conveniently calculated. Numerical simulations are provided in the
Software documentation is essential for program comprehension, developer onboarding, code review, and long-term maintenance. Yet producing quality documentation manually is time-consuming and frequently yields incomplete or inconsistent results. Large language models (LLMs) offer a promising solution by automatically generating natural language descriptions from source code, helping developers understand code more efficiently, facilitating maintenance, and supporting downstream activities such as defect localization and commit message generation. However, the effectiveness of LLMs in documentation tasks critically depends on how they are prompted. Properly structured instructions can substantially improve model performance, making prompt engineering (the design of input prompts to guide model behavior) a foundational technique in LLM-based software engineering. Approaches such as few-shot prompting, chain-of-thought reasoning, retrieval-augmented generation, and zero-shot learning show promise for code summarization, yet current research remains fragmented. There is limited understanding of which prompting strategies work best, for which models, and under what conditions. Moreover, e
Fetal ultrasound is the cornerstone of antenatal care, and accurate recognition of a small set of standard anatomical planes underpins biometry, growth surveillance, and detection of structural anomalies. Deep learning classifiers now match or exceed expert accuracy on curated benchmarks, but most remain opaque and miscalibrated, leaving clinicians without the calibrated confidence or faithful explanations needed for safe decision support. We systematically reviewed 78 studies published between January 1, 2015 and April 30, 2026 that paired automated fetal plane classification with explainability or predictive uncertainty quantification, following PRISMA 2020. Pooled balanced accuracy across six standard planes was 0.93 (95% CI 0.91 to 0.95), but only 19 studies (24%) reported calibration and 14 (18%) reported selective prediction. We propose CALIB-XFUS, a 22-item reporting framework that operationalises calibration, explanation faithfulness, and fairness for regulated fetal ultrasound artificial intelligence. The framework spans six domains: clinical task and indication for use; dataset provenance and representativeness; model and training pipeline; calibration and selective predi
New UAV technologies and the NewSpace era are transforming Earth Observation missions and data acquisition. Numerous small platforms generate large data volumes, straining bandwidth and requiring onboard decision-making to transmit high-quality information in time. While Machine Learning allows real-time autonomous processing, FPGAs balance performance with adaptability to mission-specific requirements, enabling onboard deployment. This review systematically analyzes 68 experiments deploying ML models on FPGAs for Remote Sensing applications. We introduce two distinct taxonomies to capture both efficient model architectures and FPGA implementation strategies. For transparency and reproducibility, we follow PRISMA 2020 guidelines and share all data and code at https://github.com/CedricLeon/Survey_RS-ML-FPGA.
We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blueprints. While LLMs demonstrate remarkable capabilities across diverse tasks, they often fail to maintain adherence to complex, use-case-specific instructions during multi-turn conversations, presenting challenges for business-critical applications. ARQs address this limitation by guiding LLMs through systematic reasoning steps with targeted queries that reinstate critical instructions and facilitate intermediate reasoning throughout the completion process. In extensive testing within Parlant, our framework for reliable customer-facing agents in which ARQs were born out of necessity, they achieved a 90.2% success rate across 87 test scenarios, outperforming both Chain-of-Thought reasoning (86.1%) and direct response generation (81.5%). ARQs showed particular strength in addressing persistent failure modes like guideline re-application and hallucination prevention. Our analysis also revealed that ARQs can potentially be more computationally efficient than free-form reasoning when carefu
Given the increasing demands in computer programming education and the rapid advancement of large language models (LLMs), LLMs play a critical role in programming education. This study provides a systematic review of selected empirical studies on LLMs in computer programming education, published from 2023 to March 2024. The data for this review were collected from Web of Science (SCI/SSCI), SCOPUS, and EBSCOhost databases, as well as three conference proceedings specialized in computer programming education. In total, 42 studies met the selection criteria and were reviewed using methods including bibliometric analysis, thematic analysis, and structural topic modeling. This study offers an overview of the current state of LLMs in computer programming education research. It outlines LLMs' applications, benefits, limitations, concerns, and implications for future research and practices, establishing connections between LLMs and their practical use in computer programming education. This review also provides examples and valuable insights for instructional designers, instructors, and learners. Additionally, a conceptual framework is proposed to guide education practitioners in integra
This paper focuses on the problem of evolving Boolean functions of odd sizes with high nonlinearity, a property of cryptographic relevance. Despite its simple formulation, this problem turns out to be remarkably difficult. We perform a systematic evaluation by considering three solution encodings and four problem instances, analyzing how well different types of evolutionary algorithms behave in finding a maximally nonlinear Boolean function. Our results show that genetic programming generally outperforms other evolutionary algorithms, although it falls short of the best-known results achieved by ad-hoc heuristics. Interestingly, by adding local search and restricting the space to rotation symmetric Boolean functions, we show that a genetic algorithm with the bitstring encoding manages to evolve a $9$-variable Boolean function with nonlinearity 241.
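For readers unfamiliar with the fitness measure involved, the following is a minimal sketch of how the nonlinearity of a Boolean function can be computed from its truth table, using the standard definition $2^{n-1} - \frac{1}{2}\max_a |W_f(a)|$ with $W_f$ the Walsh-Hadamard spectrum. It is an illustration only, not the evaluation code used in the paper; the function names `walsh_hadamard` and `nonlinearity` are hypothetical, and the truth table is assumed to be a list of 0/1 values indexed by the integer encoding of the input.

```python
def walsh_hadamard(truth_table):
    """Walsh-Hadamard spectrum of a Boolean function given as a 0/1 truth table.

    Hypothetical helper for illustration; uses the in-place fast transform.
    """
    size = len(truth_table)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    # Map outputs 0/1 to signs +1/-1.
    spectrum = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < size:
        # Butterfly pass combining entries that differ in one input bit.
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                a, b = spectrum[i], spectrum[i + step]
                spectrum[i], spectrum[i + step] = a + b, a - b
        step *= 2
    return spectrum


def nonlinearity(truth_table):
    """Hamming distance from the function to the nearest affine function."""
    spectrum = walsh_hadamard(truth_table)
    return len(truth_table) // 2 - max(abs(value) for value in spectrum) // 2


# Example: the 3-variable majority function, indexed by the integer x2 x1 x0.
majority3 = [0, 0, 0, 1, 0, 1, 1, 1]
print(nonlinearity(majority3))
```

An evolutionary algorithm with the bitstring encoding could use such a function directly as its fitness, since the genotype is the truth table itself; for a $9$-variable function the table has $2^9 = 512$ entries.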
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review focused specifically on the integration of foundation models in mobile service robotics. We analyze how recent advances in foundation models address these core challenges through language-conditioned control, multimodal sensor fusion, uncertainty-aware reaso
The Industry 5.0 transition highlights EU efforts to design intelligent devices that can work alongside humans to enhance human capabilities. This vision aligns with user preferences, and users' need to feel safe while collaborating with such systems takes priority. This demands a human-centric research vision and requires a societal and educational shift in how we perceive technological advancements. To better understand this perspective, we conducted a systematic literature review focusing on understanding how trust and trustworthiness can be key aspects of supporting this move towards Industry 5.0. This review aims to overview the most common methodologies and measurements and collect insights about barriers and facilitators for fostering trustworthy HRI. After a rigorous quality assessment following the Systematic Reviews and Meta-Analyses guidelines, using rigorous inclusion criteria and screening by at least two reviewers, 34 articles were included in the review. The findings underscore the significance of trust and safety as foundational elements for promoting secure and trustworthy human-machine cooperation. They confirm that almost 30% of the reviewed articles do not present a defi
Background: Evidence synthesis facilitates evidence-based medicine. This task becomes increasingly difficult to accomplish without applying computational solutions, since the medical literature grows at astonishing rates. Objective: This study evaluates an information retrieval-driven workflow, CASMA, to enhance the efficiency, transparency, and reproducibility of systematic reviews. Endometriosis recurrence serves as the ideal case due to its complex and ambiguous literature. Methods: The hybrid approach integrates PRISMA guidelines with fuzzy matching and regular expressions (regex) to facilitate semi-automated deduplication and filtering of records before manual screening. The workflow synthesised evidence from randomised controlled trials on the efficacy of a subclass of gonadotropin-releasing hormone agonists (GnRH-a). A modified splitting method addressed unit-of-analysis errors in multi-arm trials. Results: The workflow sharply reduced the screening workload, taking only 11 days to fetch and filter 33,444 records. Seven eligible RCTs were synthesised (841 patients). The pooled random-effects model yielded a Risk Ratio (RR) of $0.64$ ($95\%$ CI $0.48$ to $0.86$), demonstrating a $3
Diffuse Reflectance Spectroscopy has demonstrated a strong aptitude for identifying and differentiating biological tissues. However, the broadband and smooth nature of these signals requires algorithmic processing, as they are often difficult for the human eye to distinguish. The implementation of machine learning models for this task has demonstrated high diagnostic accuracy and led to a wide range of proposed methodologies for applications in various illnesses and conditions. In this systematic review, we summarise the state of the art of these applications, highlight current gaps in research and identify future directions. This review was conducted in accordance with the PRISMA guidelines. In total, 77 studies were retrieved and analysed in depth. It is concluded that diffuse reflectance spectroscopy and machine learning have strong potential for tissue differentiation in clinical applications, but more rigorous sample stratification in tandem with in-vivo validation and explainable algorithm development is required going forward.
Data valuation and data monetisation are complex subjects but essential to most organisations today. Unfortunately, they still lack standard procedures and frameworks for organisations to follow. In this survey, we introduce the reader to the concepts by providing the definitions and the background required to better understand data, monetisation strategies, and finally metrics and KPIs used in these strategies. We have conducted a systematic literature review on metrics and KPIs used in data valuation and monetisation, in every aspect of an organisation's business, and by a variety of stakeholders. We provide an expansive list of such metrics and KPIs with 162 references. We then categorise all the metrics and KPIs found into a large taxonomy, following the Balanced Scorecard (BSC) approach with further subclustering to cover every aspect of an organisation's business. This taxonomy will help every level of data management understand the complex landscape of the domain. We also discuss the difficulty in creating a standard framework for data valuation and data monetisation and the major challenges the domain is currently facing.
Relative distances between a high-redshift sample of Type Ia supernovae (SNe~Ia), anchored to a low-redshift sample, have been instrumental in drawing insights on the nature of the dark energy driving the accelerated expansion of the universe. A combination (hereafter called SBC) of the SNe~Ia with baryon acoustic oscillations (BAO) from the Dark Energy Spectroscopic Instrument (DESI) and the cosmic microwave background (CMB) recently indicated deviations from the standard interpretation of dark energy as a cosmological constant. In this paper, we analyse various systematic uncertainties in the distance measurement of SNe~Ia and their impact on the inferred dark energy properties in the canonical Chevallier-Polarski-Linder (CPL) model. We model systematic effects like photometric calibration, progenitor and dust evolution, and uncertainty in the galactic extinction law. We find that all the dominant systematic errors shift the dark energy inference towards the DESI 2024 results from an underlying $\Lambda$CDM cosmology. A small change in the calibration, and a change in the Milky Way dust, can give rise to systematic-driven shifts on $w_0$-$w_a$ constraints, comparable to the deviation rep
Accurate, non-invasive flow measurement is imperative for efficient water resource management and leak detection in distribution systems. Despite the advent of diverse external sensing technologies, a paucity of consolidated evidence exists regarding their comparative performance, energy efficiency, and applicability in varied operational contexts. The document delineates the protocol for a systematic literature review (SLR) that aims to identify, evaluate, and synthesize the extant evidence on non-invasive flow monitoring techniques for piped networks. Adhering to the Kitchenham methodology, the review will investigate the accuracy, precision, and energy consumption of prevailing solutions, such as ultrasonic and accelerometer-based systems. The analysis will also assess the impact of signal processing and machine learning (ML) algorithms on enhancing system capabilities. The objective of this study is to map the state-of-the-art, identify key research gaps, and provide an empirical foundation to direct future research toward operational deployment.
This paper systematically reviews the research progress and application prospects of machine learning technologies in the field of polymer materials. Currently, machine learning methods are developing rapidly in polymer material research; although they have significantly accelerated material prediction and design, their complexity has also caused difficulties in understanding and application for researchers in traditional fields. In response to the above issues, this paper first analyzes the inherent challenges in the research and development of polymer materials, including structural complexity and the limitations of traditional trial-and-error methods. To address these problems, it focuses on introducing key basic technologies such as molecular descriptors and feature representation, data standardization and cleaning, and records a number of high-quality polymer databases. Subsequently, it elaborates on the key role of machine learning in polymer property prediction and material design, covering the specific applications of algorithms such as traditional machine learning, deep learning, and transfer learning; further, it deeply expounds on data-driven design strategies, such as r
Game theory is a foundational framework for analyzing strategic interactions, and its intersection with large language models (LLMs) is a rapidly growing field. However, existing surveys mainly focus narrowly on using game theory to evaluate LLM behavior. This paper provides the first comprehensive survey of the bidirectional relationship between Game Theory and LLMs. We propose a novel taxonomy that categorizes the research in this intersection into four distinct perspectives: (1) evaluating LLMs in game-based scenarios; (2) improving LLMs using game-theoretic concepts for better interpretability and alignment; (3) modeling the competitive landscape of LLM development and its societal impact; and (4) leveraging LLMs to advance game models and to solve corresponding game theory problems. Furthermore, we identify key challenges and outline future research directions. By systematically investigating this interdisciplinary landscape, our survey highlights the mutual influence of game theory and LLMs, fostering progress at the intersection of these fields.
Outliers have been widely observed in Large Language Models (LLMs), significantly impacting model performance and posing challenges for model compression. Understanding the functionality and formation mechanisms of these outliers is critically important. Existing works, however, largely focus on reducing the impact of outliers from an algorithmic perspective, lacking an in-depth investigation into their causes and roles. In this work, we provide a detailed analysis of the formation process, underlying causes, and functions of outliers in LLMs. We define and categorize three types of outliers (activation outliers, weight outliers, and attention outliers) and analyze their distributions across different dimensions, uncovering inherent connections between their occurrences and their ultimate influence on the attention mechanism. Based on these observations, we hypothesize and explore the mechanisms by which these outliers arise and function, demonstrating through theoretical derivations and experiments that they emerge due to the self-attention mechanism's softmax operation. These outliers act as implicit context-aware scaling factors within the attention mechanism. As these outliers st