As large language models (LLMs) are more frequently used in retrieval-augmented generation pipelines, it is increasingly relevant to study their behavior under knowledge conflicts. Thus far, the role of the source of the retrieved information has gone unexamined. We address this gap with a novel framework to investigate how source preferences affect LLM resolution of inter-context knowledge conflicts in English, motivated by interdisciplinary research on credibility. By using synthetic sources, we study preferences for different types of sources without inheriting the biases of specific real-world sources. With a comprehensive, tightly-controlled evaluation of 13 open-weight LLMs, we find that LLMs prefer institutionally-corroborated information (e.g., government or newspaper sources) over information from people and social media. However, these source preferences can be reversed by simply repeating information from less credible sources. To mitigate repetition effects and maintain consistent preferences, we propose a novel method that reduces repetition bias by up to 79.2%, while also maintaining at least 72.5% of original preferences. We release all data and code to encourage future research.
Audio synthesis has broad applications in multimedia. Recent advancements have made it possible to generate relevant audio from inputs describing an audio scene, such as images or text. However, the immersiveness and expressiveness of the generation remain limited. One possible problem is that existing methods rely solely on the global scene and overlook details of local sounding objects (i.e., sound sources). To address this issue, we propose a Sound Source-Aware Audio (SS2A) generator. SS2A is able to locally perceive multimodal sound sources from a scene with visual detection and cross-modality translation. It then contrastively learns a Cross-Modal Sound Source (CMSS) Manifold to semantically disambiguate each source. Finally, we attentively mix their CMSS semantics into a rich audio representation, from which a pretrained audio generator outputs the sound. To model the CMSS manifold, we curate a novel single-sound-source visual-audio dataset, VGGS3, from VGGSound. We also design a Sound Source Matching Score to clearly measure localized audio relevance. Owing to the effectiveness of explicit sound source modeling, SS2A achieves state-of-the-art performance in extensive image-to-audio generation experiments.
We present starkiller, an open-source Python package for forward-modeling flux retrieval from integral field unit spectrograph (IFU) datacubes. Starkiller simultaneously provides stellar spectral classification, relative velocity, and line-of-sight extinction for all sources in a catalog, alongside a source-subtracted datacube. It performs synthetic difference imaging by simulating all catalog sources in the field of view, using the catalog for positions and fluxes to scale stellar models, independent of the datacube. This differencing method is particularly powerful for subtracting both point sources and trailed or even streaked sources from extended astronomical objects. We demonstrate starkiller's effectiveness on VLT/MUSE observations of extended sources in dense stellar fields, including comets, asteroids, and nebulae. We also show that starkiller can treat satellite-impacted VLT/MUSE observations. The package could be applied to tasks as varied as dust extinction in clusters and stellar variability; the stellar modeling using Gaia fluxes is provided as a standalone function. The techniques can be expanded to imagers and to other IFUs.
Private randomness is a fundamental resource for cryptography, security proofs, and information processing. Quantum devices offer a unique advantage by amplifying weak randomness sources in regimes unattainable by classical means. A central theoretical model for such sources is the Santha-Vazirani (SV) model, yet identifying natural processes that satisfy this model remains a major challenge. Here we take three steps toward addressing this problem. First, we introduce an axiomatic framework for quantifying weak randomness, providing a unified basis for estimating an SV-type source. Second, we develop SVTest, a general-purpose software tool for estimating the SV parameter of an arbitrary data sequence. Third, we apply this framework to both engineered and natural sources. Using data from a self-certifying commercial quantum random number generator with guaranteed min-entropy as a benchmark, we validate the accuracy and limitations of our estimation method. We then analyze geophysical signals associated with seismic activity and find that, depending on the discretization, both earthquakes and local seismic noise can exhibit SV-type randomness. Our results indicate that geophysical phenomena may serve as natural sources of SV-type randomness.
We define and investigate source-modality monitoring -- the ability of multimodal models to track and communicate the input source from which pieces of information originate. We consider source-modality monitoring as an instance of the more general binding problem, and evaluate the extent to which models exploit syntactic vs. semantic signals to bind words like "image" in a user-provided prompt to specific components of their input and context (i.e., actual images). Across experiments spanning 11 vision-language models (VLMs) performing target-modality information retrieval tasks, we find that both syntactic and semantic signals play an important role, but that the latter tend to outweigh the former when modalities are highly distinct distributionally. We discuss the implications of these findings for model robustness, and in the context of increasingly multimodal agentic systems.
In this paper, we investigate Source-free Open-partial Domain Adaptation (SF-OPDA), which addresses situations where both domain and category shifts exist between the source and target domains. Under the SF-OPDA setting, which aims to address data privacy concerns, the model can no longer access source data during target adaptation. We propose a novel training scheme to learn an (n+1)-way classifier that predicts the n source classes and the unknown class, where only samples of known source categories are available for training. Furthermore, for target adaptation, we simply adopt weighted entropy minimization to adapt the source-pretrained model to the unlabeled target domain without source data. In experiments, we show that our simple method surpasses current OPDA approaches that demand source data during adaptation. When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art OPDA method by 2.5%, 7.2%, and 13% on Office-31, Office-Home, and VisDA, respectively.
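The weighted entropy minimization objective mentioned above can be sketched in a few lines; the particular weighting scheme below (generic per-sample weights over the (n+1)-way softmax output) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def weighted_entropy(probs, weights):
    """Weighted entropy over a batch of (n+1)-way predictions.

    probs:   (B, n+1) softmax outputs of the source-pretrained classifier
    weights: (B,) per-sample weights (e.g., down-weighting likely-unknown
             samples; the actual weighting rule is a modeling choice)

    Minimizing this value over the unlabeled target data sharpens the
    model's predictions without requiring any source data.
    """
    eps = 1e-12
    ent = -np.sum(probs * np.log(probs + eps), axis=1)  # per-sample entropy
    return np.sum(weights * ent) / np.sum(weights)
```

A uniform prediction contributes entropy log(n+1), while a confident one-hot prediction contributes zero, so the objective pushes target samples toward confident class assignments.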
In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task based on Dirac likelihood functions. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the source separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.
Recent gravitational wave detections from black hole mergers have underscored the critical role that black hole perturbation theory and the Teukolsky equation play in understanding the behaviour of black holes. The separable nature of the Teukolsky equation has long been leveraged to study the vacuum linear Teukolsky equation; however, as theory and measurements advance, solving the sourced Teukolsky equation is becoming a frontier of research. In particular, second-order calculations, such as in quasi-normal mode and self-force problems, have extended sources. This paper presents a novel method for analytically separating the Teukolsky equation's source, aimed at improving efficiency. Separating the source is a non-trivial problem due to the angular and radial mixing of generic quantities in Kerr spacetime. We provide a proof-of-concept demonstration of our method and show that it is accurate, separating the Teukolsky source produced by the stress-energy tensor of an ideal gas cloud surrounding a Kerr black hole. The detailed application of our method is provided in an accompanying \textit{Mathematica} notebook. Our approach opens up a new avenue for accurate black hole perturbation theory.
We detect a highly significant excess of X-ray (2RXS) and radio (NVSS, GMRT, VLSSr) catalog sources when stacked around MCXC galaxy clusters and groups, narrowly confined within $\lesssim100\mathrm{\,kpc}$ of the $\sim2.4 R_{500}$ virial shock radius (inferred from previous continuum stacking), with similar X-ray ($\sim4\sigma$ for $443$ clusters) and radio ($\sim4\sigma$ for $485$ clusters) characteristics ($>5\sigma$ joint). The excess sources show $10-100$ kpc scales, $L_X(0.1-2.4\mbox{ keV})\simeq10^{42-43}\mathrm{\,erg\,s^{-1}}$ or $\nu L_\nu(\nu=1.4\mbox{ GHz}) \simeq 10^{40-41}\mathrm{\,erg\,s^{-1}}$ luminosities, and a preferentially radial radio polarization. The narrow localization and properties of the excess identify these sources not as AGN, often invoked speculatively for excess X-ray sources at cluster outskirts, but rather as infalling gaseous clumps interacting with the virial shock, probably galactic halos and possibly outflow remnants. The local excess of such discrete, radio-to-$\gamma$-ray sources around an object can probe its virial shock also at high redshifts and sub-cluster scales.
Existing research on source tracing of audio deepfake systems has focused primarily on the closed-set scenario, while studies that evaluate open-set performance are limited to a small number of unseen systems. Given the large number of emerging audio deepfake systems, robust open-set source tracing is critical. We leverage the protocol of the Interspeech 2025 special session on source tracing to evaluate methods for improving open-set source tracing performance. We introduce a novel adaptation of the energy score for out-of-distribution (OOD) detection: softmax energy (SME). We find that replacing the typical temperature-scaled energy score with SME provides a relative average improvement of 31% in the standard FPR95 measure (false positive rate at a true positive rate of 95%). We further explore SME-guided training as well as copy synthesis, codec, and reverberation augmentations, yielding an FPR95 of 8.3%.
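For context, the conventional temperature-scaled energy score that SME replaces, and the FPR95 metric used to compare them, can be sketched as follows; SME's exact definition is given in the paper and is not reproduced here.

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Standard temperature-scaled energy score for OOD detection:
    E(x) = -T * logsumexp(logits / T).  Lower energy suggests a more
    in-distribution (i.e., seen deepfake system) sample.  (The paper's
    softmax-energy variant modifies this baseline.)"""
    z = np.asarray(logits) / T
    m = z.max(axis=-1, keepdims=True)               # for numerical stability
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def fpr_at_95_tpr(scores_id, scores_ood):
    """FPR95: fraction of OOD samples accepted at the threshold where 95%
    of in-distribution samples are accepted (lower score = more ID)."""
    thr = np.percentile(scores_id, 95)              # 95% ID acceptance
    return float(np.mean(np.asarray(scores_ood) <= thr))
```

Lower FPR95 is better: it means fewer unseen systems slip past the detector while keeping 95% of known-system samples correctly accepted.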
Music Source Restoration (MSR) targets the recovery of original, unprocessed instrument stems from fully mixed and mastered audio, where production effects and distribution artifacts violate common linear-mixture assumptions. This technical report presents the CP-JKU team's system for the MSR ICASSP Challenge 2025. Our approach decomposes MSR into separation and restoration. First, a single BandSplit-RoFormer separator predicts eight stems plus an auxiliary "other" stem, and is trained with a three-stage curriculum that progresses from 4-stem warm-start fine-tuning (with LoRA) to 8-stem extension via head expansion. Second, we apply a HiFi++ GAN waveform restorer trained as a generalist and then specialized into eight instrument-specific experts.
Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the data privacy and storage concerns of typical unsupervised domain adaptation methods, making it especially relevant in the context of intelligent vehicles. It uses a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift or fully leverage the information in the target data. In this paper, we propose an end-to-end source-free domain adaptation method for semantic segmentation via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, to address the domain shift in the source model's predictions on the target domain, we put forward an importance-aware mechanism for the biased target prediction probability distribution.
Ambisonics is a scene-based spatial audio format that has several useful features compared to object-based formats, such as efficient whole-scene rotation and versatility. However, it does not provide direct access to the individual source signals, so these have to be separated from the mixture when required. Typically, this is done with linear spherical harmonics (SH) beamforming. In this paper, we explore deep-learning-based source separation on static Ambisonics mixtures. In contrast to most source separation approaches, which separate a fixed number of sources of specific sound types, we focus on separating arbitrary sounds from specific directions. Specifically, we propose three operating modes that combine a source separation neural network with SH beamforming: refinement, implicit, and mixed mode. We show that a neural network can implicitly associate conditioning directions with the spatial information contained in the Ambisonics scene to extract specific sources. We evaluate the performance of the three proposed approaches and compare them to SH beamforming on musical mixtures generated with the musdb18 dataset, as well as on mixtures generated with the FUSS dataset.
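As a reference point, the linear SH beamforming baseline can be sketched for a first-order Ambisonics signal; the ACN channel ordering, SN3D normalization, and cardioid weighting below are assumptions of this sketch rather than details from the paper.

```python
import numpy as np

def foa_beam(b_format, azimuth, elevation):
    """Basic linear beamformer on a first-order Ambisonics (FOA) mixture.

    b_format: (4, T) array of FOA channels in ACN order [W, Y, Z, X]
              with SN3D normalization (assumptions of this sketch).
    Returns a cardioid-like beam signal steered at (azimuth, elevation),
    i.e., a fixed linear combination of the SH channels.
    """
    w, y, z, x = b_format
    # Real spherical harmonics of order <= 1 at the steering direction.
    sy = np.sin(azimuth) * np.cos(elevation)
    sz = np.sin(elevation)
    sx = np.cos(azimuth) * np.cos(elevation)
    # Cardioid weighting: unit gain on-axis, a null in the opposite direction.
    return 0.5 * (w + sy * y + sz * z + sx * x)
```

A plane wave encoded from the steering direction passes with unit gain, while one from the opposite direction is cancelled, which is exactly the spatial selectivity the learned separators above refine or replace.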
In this paper, we introduce a novel numerical method for reconstructing the trajectory of a moving point source in three-dimensional space, where both the emission moment and the spatial location of the point source are unknown. Our approach relies solely on measuring the time of arrival at five or seven properly chosen observation points. By exploiting the distinctive geometric configuration of these observation points, we establish the uniqueness of the trajectory and emission moment of the point source through rigorous mathematical proofs. Moreover, we analyze the stability of the proposed method. Its effectiveness is also verified by numerical experiments.
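The arrival-time model underlying this setup can be illustrated with a generic nonlinear least-squares solver. The Gauss-Newton sketch below is not the paper's method (which establishes uniqueness for specially chosen observation points); it only shows how the unknown position and emission moment enter the arrival times.

```python
import numpy as np

def locate_source(obs, times, c=1.0, iters=50):
    """Recover a source position p and emission time t0 from arrival times
    t_i = t0 + |p - x_i| / c at observation points x_i, via Gauss-Newton
    on the nonlinear least-squares residual (illustrative solver only).

    obs:   (N, 3) observation points, N >= 5
    times: (N,)   measured arrival times
    """
    p = obs.mean(axis=0)                # initial guess: observer centroid
    t0 = times.min() - 1.0              # crude initial emission time
    for _ in range(iters):
        d = np.linalg.norm(obs - p, axis=1)
        r = t0 + d / c - times          # residuals of the arrival-time model
        # Jacobian w.r.t. (p, t0): d r_i / dp = (p - x_i)/(c d_i), d r_i / dt0 = 1
        J = np.hstack([(p - obs) / (c * d[:, None]), np.ones((len(obs), 1))])
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p, t0 = p + step[:3], t0 + step[3]
    return p, t0
```

With five well-spread observers and noise-free times, the four unknowns (three coordinates plus the emission moment) are overdetermined, mirroring the uniqueness question the paper answers rigorously.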
We aim to evaluate the possibility of improving the ICRS realization, starting from the ICRF2 catalogue, by investigating the coordinate time series of radio sources observed by VLBI between 1979 and 2016. Sources with long observational histories are selected as candidates, and least squares fits with special handling of the weights are performed to derive the linear drifts of the source coordinates. The sources are then sorted based on the normalized linear drift (i) over the whole sky and (ii) in four homolographic areas divided by declination. The axial stability of the reference system and the sky distribution defined by the selected sources are evaluated, and these serve as the criteria for the final source lists. With our improved source selection scheme, two groups of sources are proposed and considered suitable for defining a more stable and homogeneous celestial reference system compared to the current ICRF2. The final lists contain 323 and 294 sources, respectively, and the global rotation of the axes derived from the apparent motion of the sources is about two times better than for the ICRF2.
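The weighted least squares fit for the linear drift of a coordinate time series can be sketched as follows; the paper's "special handling of the weights" is not reproduced, and this minimal version simply weights each epoch by the inverse variance.

```python
import numpy as np

def linear_drift(epochs, coords, sigmas):
    """Weighted least-squares linear fit to a source coordinate time series.

    epochs: (N,) observation epochs (e.g., in years)
    coords: (N,) coordinate offsets at each epoch
    sigmas: (N,) per-epoch formal uncertainties

    Returns (offset, drift, drift_uncertainty).  The drift estimate is the
    slope used to rank candidate defining sources by apparent motion.
    """
    w = 1.0 / np.asarray(sigmas) ** 2
    A = np.vstack([np.ones_like(epochs), epochs]).T   # design matrix [1, t]
    N = A.T @ (w[:, None] * A)                        # weighted normal matrix
    sol = np.linalg.solve(N, A.T @ (w * coords))
    cov = np.linalg.inv(N)                            # parameter covariance
    return sol[0], sol[1], np.sqrt(cov[1, 1])
```

Sorting sources by the normalized drift (slope divided by its uncertainty) then yields the stability ranking described above.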
Two key contributions presented in this paper are: i) a method for building a dataset containing source code features extracted from source files taken from Open Source Software (OSS) and associated bug reports, and ii) a predictive model for estimating the defectiveness of a given source code file. These artifacts can be useful for building tools and techniques in several automated software engineering areas, such as bug localization, code review and recommendation, and program repair. To achieve our goal, we first extract coding style information (e.g., related to programming language constructs used in the source code) for source code files present on GitHub. Then the information available in the bug reports (if any) associated with these source code files is extracted. The fetched unstructured (or semi-structured) information is then transformed into a structured knowledge base. We considered more than 30,400 source code files from 20 different GitHub repositories, with about 14,950 associated bug reports across 4 bug tracking portals. The source code files considered are written in four programming languages (viz., C, C++, Java, and Python) and belong to different types of applications.
Quantum wells (QWs) are of great importance in optoelectronic devices such as LEDs and lasers, where they serve as the emissive layers. Simulating quantum particles in different QW topologies, such as rectangular finite potential wells, multiple potential wells, and triangular biased potential well heterojunctions, enables faster modeling, theoretical characterization, and more. QVNTVS performs energy level and wavefunction calculations, as well as recombination probability, transition energy, and optical emission computations, quickly and accurately. In contrast to existing simulators, QVNTVS is an open-source project and can produce solutions for niche problems such as potential wells under an electric field, heterojunctions, recombination, and transition matrices. QVNTVS simulates QWs by solving the time-independent Schrödinger equation for different potential profiles in a discretized space using the finite-difference method, and computes the properties of the device from the information extracted from the solution. The results align with analytical calculations and experimental data.
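The core finite-difference step described above can be sketched in one dimension; this illustrative version uses natural units and Dirichlet boundaries, and is not QVNTVS's actual implementation, whose unit system and boundary handling may differ.

```python
import numpy as np

def solve_tise(potential, dx, n_states=3, hbar=1.0, m=1.0):
    """Finite-difference solution of the 1-D time-independent Schroedinger
    equation on a uniform grid (a minimal sketch of the method QWs
    simulators of this kind use).

    potential: (N,) potential energy sampled at grid points of spacing dx
    Returns the n_states lowest energies and grid-normalized wavefunctions.
    """
    N = len(potential)
    t = hbar**2 / (2.0 * m * dx**2)
    # Tridiagonal Hamiltonian: kinetic term from the 3-point Laplacian
    # plus the potential on the diagonal; Dirichlet (hard-wall) boundaries.
    H = (np.diag(2.0 * t + potential)
         - t * np.eye(N, k=1) - t * np.eye(N, k=-1))
    E, psi = np.linalg.eigh(H)
    psi = psi[:, :n_states] / np.sqrt(dx)   # normalize so sum |psi|^2 dx = 1
    return E[:n_states], psi
```

Feeding in a tilted potential (V(x) = eFx inside the well) reproduces the biased-well case mentioned above, since only the diagonal of the Hamiltonian changes.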
The extended source effect on microlensing magnification is non-negligible and must be taken into account in an analysis of microlensing. However, evaluating the extended source magnification is numerically expensive because it involves a two-dimensional integral over the source profile. Various studies have developed methods to reduce this integral to a one-dimensional integral or an integral-free form, but these adopt approximations or depend on the exact form of the source profile, e.g. a disk or a linear/quadratic limb-darkening profile. In this paper, we develop a new method to evaluate the extended source magnification based on the fast Fourier transform (FFT), which adopts no approximations and is applicable to any source profile. Our implementation of the FFT-based method enables evaluation of the extended source magnification as fast as $\sim1$ msec (CPU time on a laptop) and guarantees an accuracy better than 0.3%. The FFT-based method can be used for template fitting to the huge data sets of light curves from existing and upcoming surveys.
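The central idea, that the extended-source magnification is the point-source magnification convolved with the source profile, can be sketched with a plain FFT convolution; grid construction, zero-padding, and the accuracy control from the paper are omitted in this minimal version.

```python
import numpy as np

def extended_source_mag(mag_map, profile):
    """Extended-source magnification as the 2-D (circular) convolution of a
    point-source magnification map with a normalized source brightness
    profile, evaluated with FFTs via the convolution theorem.

    mag_map: (N, N) point-source magnification sampled on a uniform grid
    profile: (N, N) source profile on the same grid (normalized internally)
    """
    prof = profile / profile.sum()          # unit total flux
    # Convolution theorem: conv(A, S) = IFFT( FFT(A) * FFT(S) )
    conv = np.fft.ifft2(np.fft.fft2(mag_map) * np.fft.fft2(prof))
    return np.real(conv)
```

Because the FFT pair costs O(N^2 log N) regardless of the profile shape, the same routine handles disk, limb-darkened, or arbitrary profiles, which is the generality claimed above.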
While Unsupervised Domain Adaptation (UDA), in which labeled data are available only from source domains, has been actively studied in recent years, most algorithms and theoretical results focus on Single-source Unsupervised Domain Adaptation (SUDA). In practical scenarios, however, labeled data can typically be collected from multiple diverse sources, which might differ not only from the target domain but also from each other. Thus, domain adapters from multiple sources should not be modeled in the same way. Recent deep-learning-based Multi-source Unsupervised Domain Adaptation (MUDA) algorithms focus on extracting common domain-invariant representations for all domains by aligning the distributions of all pairs of source and target domains in a common feature space. However, it is often very hard to extract the same domain-invariant representations for all domains in MUDA. In addition, these methods match distributions without considering the domain-specific decision boundaries between classes. To solve these problems, we propose a new framework with two alignment stages for MUDA, which separately aligns the distributions of each pair of source and target domains.
Free/Open Source Software (FOSS) enables large-scale reuse of preexisting software components. The main drawback is increased complexity in software supply chain management. A common approach to tame such complexity is automated open source compliance, which consists in automating the verification of adherence to various open source management best practices regarding license obligation fulfillment, vulnerability tracking, software composition analysis, and nearby concerns. We consider the problem of auditing a source code base to determine which of its parts have been published before, which is an important building block of automated open source compliance toolchains. Indeed, if source code allegedly developed in house is recognized as having been previously published elsewhere, alerts should be raised to investigate where it comes from and whether this entails that additional obligations must be fulfilled before product shipment. We propose an efficient approach for prior publication identification that relies on a knowledge base of known source code artifacts linked together in a global Merkle directed acyclic graph and a dedicated discovery protocol. We introduce swh-scanner, a source code scanner implementing this approach in practice.
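The leaves of such a Merkle DAG are content-addressed file identifiers, so the basic prior-publication check reduces to hashing local files and looking the hashes up in the knowledge base. The sketch below computes a SWHID-style content identifier (Git's blob hashing convention, as used by Software Heritage) and runs a toy scan against an in-memory set; the real discovery protocol queries the archive incrementally rather than a local set.

```python
import hashlib
import pathlib

def swhid_content(path):
    """SWHID-style content identifier ("swh:1:cnt:<sha1_git>") for a file,
    using Git's blob convention: sha1 of b"blob <len>\\0" + file bytes."""
    data = pathlib.Path(path).read_bytes()
    digest = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
    return f"swh:1:cnt:{digest}"

def scan(root, known_ids):
    """Toy prior-publication scan: flag every file under root whose content
    identifier already appears in the knowledge base of known artifacts."""
    return {p: swhid_content(p) in known_ids
            for p in pathlib.Path(root).rglob("*") if p.is_file()}
```

A file flagged True has byte-identical content already archived, which is exactly the condition that should trigger a compliance alert for allegedly in-house code.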