As part of the process of resolving issues submitted by users via bug reports, Android developers attempt to reproduce and observe the failures described by the bug report. Due to the low quality of bug reports and the complexity of modern apps, the reproduction process is non-trivial and time-consuming. Therefore, automatic approaches that can help reproduce Android bug reports are greatly needed. However, current approaches to help developers automatically reproduce bug reports are only able to handle limited forms of natural language text and struggle to successfully reproduce failures for which the initial bug report had missing or imprecise steps. In this paper, we introduce a new fully automated Android bug report reproduction approach that addresses these limitations. Our approach accomplishes this by leveraging natural language processing techniques to more holistically and accurately analyze the natural language in Android bug reports and by designing new techniques, based on reinforcement learning, to guide the search for successful reproducing steps. We conducted an empirical evaluation of our approach on 77 real-world bug reports. Our approach achieved 67% precision and 77% re
Occurrence reporting is a commonly used method in safety management systems to obtain insight into the prevalence of hazards and accident scenarios. In support of safety data analysis, reports are often categorized according to a taxonomy. However, the processing of the reports can require significant effort from safety analysts, and a common problem is interrater variability in labeling processes. Also, in some cases, reports are not processed according to a taxonomy, or the taxonomy does not fully cover the contents of the documents. This paper explores various Natural Language Processing (NLP) methods to support the analysis of aviation safety occurrence reports. In particular, the problems studied are the automatic labeling of reports using a classification model, extracting the latent topics in a collection of texts using a topic model and the automatic generation of probable cause texts. Experimental results showed that (i) under the right conditions the labeling of occurrence reports can be effectively automated with a transformer-based classifier, (ii) topic modeling can be useful for finding the topics present in a collection of reports, and (iii) using a summarization model
In exterior calculus on smooth manifolds, the exterior derivative and wedge product are natural with respect to smooth maps between manifolds, that is, these operations commute with pullback. In discrete exterior calculus (DEC), simplicial cochains play the role of discrete forms, the coboundary operator serves as the discrete exterior derivative, and the antisymmetrized cup product provides a discrete wedge product. We show that these discrete operations in DEC are natural with respect to abstract simplicial maps. A second contribution is a new averaging interpretation of the discrete wedge product in DEC. We also show that this wedge product is the same as Wilson's cochain product defined using Whitney and de Rham maps.
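To make the naturality claim concrete for the coboundary operator, the following is a minimal Python sketch, not the paper's code. It assumes (our simplification) that each complex is a single full simplex on its vertex set, stores a k-cochain as a dict from sorted vertex tuples to values, and defines the pullback along a simplicial map f by (f*α)(σ) = α(f(σ)), taken to be zero when f collapses σ. The function names `coboundary` and `pullback` are illustrative, and the wedge product is not modeled.

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of the permutation that sorts `seq` (entries assumed distinct)."""
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign


def evaluate(cochain, simplex):
    """Value of a cochain on an oriented simplex given as an ordered vertex tuple."""
    if len(set(simplex)) < len(simplex):
        return 0  # degenerate (collapsed) simplex
    return perm_sign(simplex) * cochain.get(tuple(sorted(simplex)), 0)


def simplices(vertices, k):
    """All k-simplices of the full simplex on `vertices`, as sorted tuples."""
    return list(combinations(sorted(vertices), k + 1))


def coboundary(cochain, vertices, k):
    """Discrete exterior derivative: (d a)(v0..v_{k+1}) = sum_i (-1)^i a(i-th face)."""
    return {
        s: sum((-1) ** i * evaluate(cochain, s[:i] + s[i + 1:]) for i in range(len(s)))
        for s in simplices(vertices, k + 1)
    }


def pullback(cochain, vertex_map, vertices, k):
    """Pullback of a k-cochain along a simplicial map given by its vertex map."""
    return {
        s: evaluate(cochain, tuple(vertex_map[v] for v in s))
        for s in simplices(vertices, k)
    }


# Collapse the triangle [0, 1, 2] onto the edge [a, b] and compare d(f* alpha) with f*(d alpha).
K, L = [0, 1, 2], ["a", "b"]
f = {0: "a", 1: "b", 2: "b"}
alpha = {("a",): 2.0, ("b",): 5.0}
print(coboundary(pullback(alpha, f, K, 0), K, 0) == pullback(coboundary(alpha, L, 0), f, K, 1))
```

In this example both sides assign 3.0 to the edges [0, 1] and [0, 2] and 0 to the collapsed edge [1, 2].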
We study an infinite countable iteration of the natural product between ordinals. We present an "effective" way to compute this countable natural product: in the nontrivial cases, the result depends only on the natural sum of the degrees of the factors, where the degree of a nonzero ordinal is the largest exponent in its Cantor normal form representation. Thus we are able to lift former results about infinitary sums to infinitary products. Finally, we provide an order-theoretical characterization of the infinite natural product; this characterization merges, in a nontrivial way, a theorem by Carruth describing the natural product of two ordinals and a known description of the ordinal product of a possibly infinite sequence of ordinals.
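For intuition about the finite case, the sketch below restricts to ordinals below ω^ω, so every Cantor normal form exponent is a natural number and the natural sum of two exponents is ordinary addition. Writing an ordinal as a dict {exponent: coefficient}, the natural (Hessenberg) product multiplies Cantor normal forms like polynomials in ω, which shows directly that the degree of a finite natural product is the natural sum of the degrees. This does not implement the paper's countable product; the representation and function names are our own assumptions.

```python
from collections import defaultdict


def natural_sum(x, y):
    """Hessenberg natural sum: add coefficients exponent by exponent."""
    out = defaultdict(int)
    for e, c in list(x.items()) + list(y.items()):
        out[e] += c
    return dict(out)


def natural_product(x, y):
    """Hessenberg natural product of ordinals below omega**omega.

    (sum_i w^a_i * c_i) (x) (sum_j w^b_j * d_j) = sum_ij w^(a_i (+) b_j) * c_i * d_j,
    where, for finite exponents, a_i (+) b_j is ordinary addition.
    """
    out = defaultdict(int)
    for a, c in x.items():
        for b, d in y.items():
            out[a + b] += c * d
    return dict(out)


def degree(x):
    """Largest exponent in the Cantor normal form of a nonzero ordinal."""
    return max(e for e, c in x.items() if c)


def show(x):
    """Render in Cantor normal form, largest exponent first (w stands for omega)."""
    terms = []
    for e in sorted(x, reverse=True):
        if e == 0:
            terms.append(str(x[e]))
        else:
            base = "w" if e == 1 else f"w^{e}"
            terms.append(base if x[e] == 1 else f"{base}*{x[e]}")
    return " + ".join(terms)


omega_plus_1 = {1: 1, 0: 1}
p = natural_product(omega_plus_1, omega_plus_1)
print(show(p), degree(p))  # w^2 + w*2 + 1 2
```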
Orthogonal Gradient Descent (OGD) has emerged as a powerful method for continual learning. However, its Euclidean projections do not leverage the underlying information-geometric structure of the problem, which can lead to suboptimal convergence in learning tasks. To address this, we propose incorporating the natural gradient into OGD and present ONG (Orthogonal Natural Gradient Descent). ONG preconditions each new task-specific gradient with an efficient EKFAC approximation of the inverse Fisher information matrix, yielding updates that follow the steepest descent direction under a Riemannian metric. To preserve performance on previously learned tasks, ONG projects these natural gradients onto the orthogonal complement of prior tasks' natural gradients. We provide an initial theoretical justification for this procedure, introduce the Orthogonal Natural Gradient Descent (ONG) algorithm, and present preliminary results on the Permuted and Rotated MNIST benchmarks. Our preliminary results, however, indicate that a naive combination of natural gradients and orthogonal projections has potential issues. This finding has motivated continued future work focused on robustly reconc
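A minimal NumPy sketch of the precondition-then-project step described above, under simplifying assumptions of our own: an explicit damped Fisher matrix solved directly stands in for the EKFAC approximation, stored natural gradients are kept as an orthonormal basis built by Gram-Schmidt, and projection uses the Euclidean inner product. The damping value, function names, and toy data are hypothetical. This is the straightforward combination; the abstract reports that such a naive combination can have issues.

```python
import numpy as np


def natural_gradient(grad, fisher, damping=1e-3):
    """Precondition a gradient: solve (F + damping*I) x = g.

    Assumption: an explicit Fisher matrix stands in for the EKFAC approximation.
    """
    return np.linalg.solve(fisher + damping * np.eye(grad.shape[0]), grad)


def project_out(v, basis):
    """Project v onto the orthogonal complement of an orthonormal basis."""
    for u in basis:
        v = v - np.dot(v, u) * u
    return v


def store_direction(v, basis, tol=1e-10):
    """Add a prior task's natural gradient to the orthonormal basis (Gram-Schmidt)."""
    w = project_out(v, basis)
    norm = np.linalg.norm(w)
    if norm > tol:
        basis.append(w / norm)
    return basis


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
fisher = A @ A.T + np.eye(5)  # symmetric positive definite toy Fisher matrix
basis = []
store_direction(natural_gradient(rng.normal(size=5), fisher), basis)  # task 1
update = project_out(natural_gradient(rng.normal(size=5), fisher), basis)  # task 2
print(abs(np.dot(update, basis[0])) < 1e-10)  # update is orthogonal to task 1's direction
```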
A classical t-tensor product expander is a natural way of formalising correlated walks of t particles on a regular expander graph. A quantum t-tensor product expander is a completely positive, trace-preserving map that is a straightforward analogue of a classical t-tensor product expander. Interest in these maps arises from the fact that iterating a quantum t-tensor product expander gives us a unitary t-design, which has many applications to quantum computation and information. We show that the zigzag product of a high-dimensional quantum expander (i.e., t = 1) of moderate degree with a moderate-dimensional quantum t-tensor product expander of low degree gives us a high-dimensional quantum t-tensor product expander of low degree. Previously, such a result was known only for quantum expanders, i.e., t = 1. Using the zigzag product, we give efficient constructions of quantum t-tensor product expanders in dimension D where t = polylog(D). We then show how replacing the zigzag product by the generalised zigzag product leads to almost-Ramanujan quantum tensor product expanders, i.e., ones with a near-optimal, almost quadratic tradeoff between the degree and the second largest singular value. Both the
This position paper proposes a conceptual framework for the design of Natural Language Generation (NLG) systems that follow efficient and effective production strategies in order to achieve complex communicative goals. In this general framework, efficiency is characterised as the parsimonious regulation of production and comprehension costs while effectiveness is measured with respect to task-oriented and contextually grounded communicative goals. We provide concrete suggestions for the estimation of goals, costs, and utility via modern statistical methods, demonstrating applications of our framework to the classic pragmatic task of visually grounded referential games and to abstractive text summarisation, two popular generation tasks with real-world applications. In sum, we advocate for the development of NLG systems that learn to make pragmatic production decisions from experience, by reasoning about goals, costs, and utility in a human-like way.
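As an illustration of a speaker reasoning about goals, costs, and utility in a referential game, here is a sketch of the standard Rational Speech Acts (RSA) model. RSA is a well-known formalization of this idea but is not necessarily the framework the paper proposes. The speaker's utility is the literal listener's log-probability of the intended referent minus an utterance cost; the toy objects, lexicon, cost values, and rationality parameter `alpha` are assumptions.

```python
import math

OBJECTS = ["blue square", "blue circle", "green square"]
UTTERANCES = ["blue", "green", "square", "circle"]


def literal(u, o):
    """Truth-conditional semantics: u is true of o if u names one of o's features."""
    return u in o.split()


def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}


def literal_listener(u):
    """L0(o | u) proportional to [[u]](o) under a uniform prior over objects."""
    return normalize({o: 1.0 if literal(u, o) else 0.0 for o in OBJECTS})


def speaker(o, alpha=1.0, cost=None):
    """S1(u | o) proportional to exp(alpha * (log L0(o | u) - cost(u)))."""
    cost = cost or {}
    scores = {}
    for u in UTTERANCES:
        p = literal_listener(u)[o]
        if p > 0:  # false utterances get zero probability
            scores[u] = math.exp(alpha * (math.log(p) - cost.get(u, 0.0)))
    return normalize(scores)


def pragmatic_listener(u, alpha=1.0, cost=None):
    """L1(o | u) proportional to S1(u | o) under a uniform prior."""
    return normalize({o: speaker(o, alpha, cost).get(u, 0.0) for o in OBJECTS})


print(round(pragmatic_listener("blue")["blue square"], 3))  # 0.6
```

Hearing "blue", the pragmatic listener favors the blue square, because a speaker who meant the blue circle would more likely have said the unambiguous "circle"; raising the cost of "circle" weakens that inference.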
Product states, unentangled tensor products of single qubits, are a ubiquitous ansatz in quantum computation, including for state-of-the-art Hamiltonian approximation algorithms. A natural question is whether we should expect to efficiently solve product state problems on any interesting families of Hamiltonians. We completely classify the complexity of finding minimum-energy product states for Hamiltonians defined by any fixed set of allowed 2-qubit interactions. Our results follow a line of work classifying the complexity of solving Hamiltonian problems and classical constraint satisfaction problems based on the allowed constraints. We prove that estimating the minimum energy of a product state is in P if and only if all allowed interactions are 1-local, and NP-complete otherwise. Equivalently, any family of non-trivial two-body interactions generates Hamiltonians with NP-complete product-state problems. Our hardness constructions only require coupling strengths of constant magnitude. A crucial component of our proofs is a collection of hardness results for a new variant of the Vector Max-Cut problem, which should be of independent interest. Our definition involves sums of distan
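To see why product-state problems turn into problems about unit vectors: a single-qubit pure state with Bloch vector r has expectation r_a for the Pauli operator σ^a, and for a product state the expectation of σ^a_i σ^b_j is r_{i,a} r_{j,b}. The energy of a product state under a 2-local Hamiltonian H = Σ_(i,j) Σ_(a,b) M^(ij)_ab σ^a_i σ^b_j is therefore Σ r_i^T M^(ij) r_j. For example, with M = I (Heisenberg couplings) the energy is Σ r_i · r_j = Σ (1 - |r_i - r_j|^2 / 2), so minimizing it means maximizing summed squared distances between unit vectors. The NumPy sketch below checks this Bloch-vector formula against a brute-force state-vector computation. It only evaluates energies; it is not the paper's reduction or hardness construction, and the helper names are ours.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # X
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # Y
    np.array([[1, 0], [0, -1]], dtype=complex),    # Z
]


def bloch_state(theta, phi):
    """Single-qubit pure state cos(t/2)|0> + e^{i phi} sin(t/2)|1> and its Bloch vector."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    r = np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])
    return psi, r


def energy_from_bloch(terms, rs):
    """Product-state energy: sum over terms (i, j, M) of r_i^T M r_j."""
    return sum(rs[i] @ M @ rs[j] for i, j, M in terms)


def energy_from_statevector(terms, psis):
    """Brute-force <psi|H|psi> for the full product state (small n only)."""
    n = len(psis)
    state = psis[0]
    for p in psis[1:]:
        state = np.kron(state, p)
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i, j, M in terms:
        for a in range(3):
            for b in range(3):
                ops = [np.eye(2, dtype=complex)] * n
                ops[i], ops[j] = PAULIS[a], PAULIS[b]
                term = ops[0]
                for op in ops[1:]:
                    term = np.kron(term, op)
                H += M[a, b] * term
    return float(np.real(np.vdot(state, H @ state)))


# Heisenberg couplings (M = identity) on a 3-qubit path 0-1-2.
terms = [(0, 1, np.eye(3)), (1, 2, np.eye(3))]
angles = [(0.3, 1.1), (2.0, -0.4), (1.2, 2.5)]
psis, rs = zip(*(bloch_state(t, p) for t, p in angles))
print(np.isclose(energy_from_bloch(terms, rs), energy_from_statevector(terms, psis)))
```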
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models except Mistral and DeepseekR1-7B attained average F1 scores above 0.93. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
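The abstract does not state how the per-model "average F1" is averaged. A common choice is macro-averaging: compute precision, recall, and F1 separately for each diagnostic category, then take the unweighted mean. A minimal sketch under that assumption; the category labels below are hypothetical toy data, not data from the study.

```python
def per_class_metrics(y_true, y_pred):
    """Precision, recall, and F1 for each class, from paired label lists."""
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    m = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in m.values()) / len(m)


# Hypothetical toy labels for three diagnostic categories A, B, C.
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```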
Bug reports are a popular target for natural language processing (NLP). However, bug reports often contain artifacts such as code snippets, log outputs, and stack traces. These artifacts not only inflate the bug reports with noise but often pose a real problem for the NLP approach at hand and have to be removed. In this paper, we present a machine-learning-based approach, implemented in Python, that classifies bug report content at the line level as either natural language or artifacts. We show how data from GitHub issue trackers can be used for automated training set generation, and present a custom preprocessing approach for bug reports. Our model achieves 0.95 ROC-AUC and 0.93 F1 on our manually annotated validation set, and classifies 10k lines in 0.72 seconds. We cross-evaluated our model against an external dataset and an external R model for the same task. The Python implementation of our model and our datasets are made publicly available under an open source license.
In recent years, with rising consumer demand, fresh products have gained increasing attention, leading to rapid growth in the fresh food market. However, due to their perishable nature and sensitivity to storage conditions, fresh products are vulnerable to damage during transportation. Improper handling, excessive transit times, and physical impacts can result in significant losses. As a result, enhancing the efficiency of fresh product distribution while maintaining quality has become critical to the further development of the fresh food industry. Using the Y supermarket chain as a case study, this paper investigates the logistics of fresh product distribution, identifying current challenges and inefficiencies. Through literature review, expert interviews, and comparative analysis, the study offers strategic recommendations for optimizing fresh product delivery routes to improve distribution efficiency and product quality.
Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT) systems, where analysis is slower than for non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their use for IoT remains unexplored. We are the first to tackle this problem, proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues in 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues to classify vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the receiver operating characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the importance of exposing all data during training. Our contributions set the stage for accurately detecting IoT vulnerabilities from issue reports, similar to non-IoT systems.
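For readers interpreting the AUC above: the ROC AUC equals the probability that a randomly chosen positive (vulnerability-indicating) issue receives a higher score than a randomly chosen negative one, with ties counted as one half, so 0.5 corresponds to random ranking. A small pairwise sketch of that identity; the scores are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores: 5 of the 6 positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise loop is quadratic in the number of issues; for large evaluation sets, a rank-based computation gives the same value more efficiently.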
We define a two-variable $p$-adic Asai $L$-function for a finite-slope family of Hilbert modular forms over a real quadratic field (with one component of the weight, and the cyclotomic twist variable, varying independently); and a two-variable ``twisted triple product'' $L$-function, interpolating the central $L$-value of the tensor product of such a family with a family of elliptic modular forms. The former construction generalizes a construction due to Grossi, Zerbes and the second author for ordinary families; the latter is a counterpart of the twisted triple product $L$-function of arXiv:2401.13230, but differs in that it interpolates classical $L$-values in a different range of weights, in which the dominant weight comes from the Hilbert modular form. Our construction relies on a ``nearly-overconvergent'' version of higher Coleman theory for Hilbert modular surfaces.
This report details our methodology and results for the Multilingual E-commerce Search Competition. The task is to assess the relevance between user queries and product items in a multilingual context in order to improve recommendation performance on e-commerce platforms. Leveraging Large Language Models (LLMs) and their capabilities on other tasks, our data-centric method achieved the highest score among all solutions in the competition. The final leaderboard is published at https://alibaba-international-cikm2025.github.io. The source code for our project is published at https://github.com/nhtlongcs/e-commerce-product-search.
With the growth of global maritime transportation, energy optimization has become crucial for reducing costs and ensuring operational efficiency. Shaft power is the mechanical power transmitted from the engine to the shaft and directly impacts fuel consumption, making its accurate prediction a paramount step in optimizing vessel performance. Power consumption is highly correlated with ship parameters such as speed and shaft rotations per minute, as well as weather and sea conditions. Frequent access to this operational data can improve prediction accuracy. However, obtaining high-quality sensor data is often infeasible and costly, making alternative sources such as noon reports a viable option. In this paper, we propose a transfer learning-based approach for predicting vessels' shaft power, where a model is initially trained on high-frequency data from a vessel and then fine-tuned with low-frequency daily noon reports from other vessels. We tested our approach on sister vessels (identical dimensions and configurations), a similar vessel (slightly larger with a different engine), and a different vessel (distinct dimensions and configurations). The experiments showed that the mean abso
We show that for a C*-algebra A and a discrete group G with an action of G on A, the reduced crossed product C*-algebra possesses a natural generalization of the convolution product, which we suggest should be named the Hadamard product. We show that this product has a natural Stinespring representation, and we lift some known results on block Schur products to this setting; we also show that the block Schur product is a special case of the Hadamard product in a crossed product algebra.
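For orientation, the block Schur product mentioned above multiplies two block matrices block by block: for n × n arrays of k × k blocks A = [A_ij] and B = [B_ij], the (i, j) block of the result is the ordinary matrix product A_ij B_ij, and with 1 × 1 blocks it reduces to the entrywise Schur (Hadamard) product of matrices. The NumPy sketch below covers only this finite-dimensional matrix case as we understand the term; the crossed-product construction from the paper is not modeled, and the function name is ours.

```python
import numpy as np


def block_schur(A, B, k):
    """Block Schur product of two (n*k) x (n*k) matrices seen as n x n arrays of k x k blocks."""
    if A.shape != B.shape or A.shape[0] != A.shape[1] or A.shape[0] % k:
        raise ValueError("expected equal square matrices with size divisible by k")
    n = A.shape[0] // k
    out = np.zeros(A.shape, dtype=np.result_type(A, B))
    for i in range(n):
        for j in range(n):
            rows, cols = slice(i * k, (i + 1) * k), slice(j * k, (j + 1) * k)
            out[rows, cols] = A[rows, cols] @ B[rows, cols]  # blockwise matrix product
    return out


rng = np.random.default_rng(1)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(np.allclose(block_schur(A, B, 1), A * B))  # 1x1 blocks give the entrywise product
```

At the other extreme, a single block (k equal to the matrix size) gives the ordinary matrix product.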
In this paper, we prove a dihedral extremality and rigidity theorem for a large class of codimension zero submanifolds with polyhedral boundary in warped product manifolds. We remark that the spaces considered in this paper are not necessarily warped product manifolds themselves. In particular, the results of this paper are applicable to submanifolds (of warped product manifolds) with faces that are neither orthogonal nor parallel to the radial direction of the warped product metric. Generally speaking, the dihedral rigidity results require the leaf of the underlying warped space to have positive Ricci curvature and the warping function to be strictly log-concave. Nevertheless, we prove a dihedral rigidity theorem for a large class of hyperbolic polyhedra, where the leaf of the underlying warped product space is flat and the warping function is not strictly log-concave.
Modern neural models capture rich priors and have complementary knowledge over shared data domains, e.g., images and videos. Integrating diverse knowledge from multiple sources -- including visual generative models, visual language models, and sources with human-crafted knowledge such as graphics engines and physics simulators -- remains under-explored. We propose a Product of Experts (PoE) framework that performs inference-time knowledge composition from heterogeneous models. This training-free approach samples from the product distribution across experts via Annealed Importance Sampling (AIS). Our framework shows practical benefits in image and video synthesis tasks, yielding better controllability than monolithic methods and additionally providing flexible user interfaces for specifying visual generation goals.
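To illustrate the sampling step named above, here is a toy NumPy sketch of Annealed Importance Sampling from a product of experts. Everything concrete is an assumption for illustration: the two "experts" are 1-D Gaussians rather than neural models, the initial distribution is N(0, 3^2), the annealing schedule is linear, and transitions are random-walk Metropolis moves. Because the product of N(0, 1) and N(2, 1) is proportional to N(1, 1/2), the weighted sample mean should come out close to 1.

```python
import numpy as np


def log_gauss(x, mu, sigma):
    """Unnormalized Gaussian log-density (constants cancel in self-normalized weights)."""
    return -0.5 * ((x - mu) / sigma) ** 2


def ais_product_of_experts(experts, n_particles=5000, n_steps=50, prior_sigma=3.0,
                           mh_steps=5, step=0.5, seed=0):
    """Weighted samples from p(x) proportional to prod_k expert_k(x) via AIS.

    Intermediate targets: pi_b(x) ~ prior(x)^(1 - b) * prod_k expert_k(x)^b, b from 0 to 1.
    """
    rng = np.random.default_rng(seed)

    def log_prior(x):
        return log_gauss(x, 0.0, prior_sigma)

    def log_product(x):
        return sum(log_gauss(x, mu, s) for mu, s in experts)

    def log_pi(x, b):
        return (1.0 - b) * log_prior(x) + b * log_product(x)

    x = rng.normal(0.0, prior_sigma, size=n_particles)  # exact samples from pi_0
    log_w = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * (log_product(x) - log_prior(x))  # incremental AIS weight
        for _ in range(mh_steps):  # Metropolis moves that leave pi_b invariant
            prop = x + step * rng.normal(size=n_particles)
            accept = np.log(rng.uniform(size=n_particles)) < log_pi(prop, b) - log_pi(x, b)
            x = np.where(accept, prop, x)
    w = np.exp(log_w - log_w.max())
    return x, w / w.sum()


# Two toy experts, N(0, 1) and N(2, 1); their product is proportional to N(1, 1/2).
x, w = ais_product_of_experts([(0.0, 1.0), (2.0, 1.0)])
print(np.sum(w * x))  # close to 1.0
```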
The aim of this note is to introduce a notion of dynamical entropy, which we call infinite-product entropy, for probability measures on the countably infinite Cartesian product of an arbitrary measurable space with itself. The idea behind the definition is that any infinite product space may be considered as a type of dynamical object. In a previous note, we applied a similar idea in topological dynamics to define a notion of dynamical entropy for arbitrary subsets of infinite products of compact topological spaces. We consider some basic properties of infinite-product entropy, e.g., shift invariance, convexity, subadditivity with respect to products of probability measures, and its behavior under dilation and restriction. We show that for a translation-invariant probability measure, the infinite-product entropy coincides with the usual entropy of the shift transformation. We consider some basic examples and computations. We also consider a variational inequality relating infinite-product entropy and the topological entropy of subsets of infinite product spaces.
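For comparison, the usual shift entropy of a translation-invariant measure on sequences over a finite alphabet can be computed as the limit of H_n / n, where H_n is the Shannon entropy of the joint law of the first n coordinates (the partition by the first coordinate generates for the shift). The sketch below computes H_n exactly for a stationary two-state Markov measure, for which H_n = H(π) + (n - 1) h with h = -Σ π_i P_ij log P_ij, so H_n / n decreases to h. It illustrates only this classical quantity, not the infinite-product entropy itself; the transition matrix is a toy assumption.

```python
import math
from itertools import product


def shannon(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def block_entropy(pi, P, n):
    """Entropy of the law of the first n coordinates of a stationary Markov measure."""
    probs = []
    for word in product(range(len(pi)), repeat=n):
        p = pi[word[0]]
        for a, b in zip(word, word[1:]):
            p *= P[a][b]
        probs.append(p)
    return shannon(probs)


def entropy_rate(pi, P):
    """h = sum_i pi_i * H(P[i]), the entropy of the shift for this measure."""
    return sum(pi[i] * shannon(P[i]) for i in range(len(pi)))


P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]  # stationary, since 0.8 * 0.1 == 0.2 * 0.4
for n in (1, 2, 4, 8):
    print(n, block_entropy(pi, P, n) / n)  # decreases toward the entropy rate
print("rate", entropy_rate(pi, P))
```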
Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk management strategies. As the volume of CTI reports continues to surge, the demand for automated tools to streamline report generation becomes increasingly apparent. While Natural Language Processing techniques have shown potential in handling text data, they often struggle to address the complexity of diverse data sources and their intricate interrelationships. Moreover, established paradigms like STIX have emerged as de facto standards within the CTI community, emphasizing the formal categorization of entities and relations to facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic Generation of Intelligence Reports), a transformative Natural Language Generation tool specifically designed to address the pressing challenges in the realm of CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports from formal representations of entity graphs. AGIR uses a two-stage pipeline that combines the advantages of template-based approaches with the capabilities of Large Language Models such as