In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of the DKL loss, we identify two areas for improvement. First, we address the limitation of the KL loss in scenarios like knowledge distillation by breaking its asymmetric optimization property and adopting a smoother weight function. This modification effectively alleviates convergence challenges in optimization, particularly for classes with high predicted scores in soft labels. Second, we introduce class-wise global information into KL/DKL to reduce bias arising from individual samples. With these two enhancements, we derive the Generalized Kullback-Leibler (GKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100, ImageNet, and vision-language datasets, focusing on adversarial training and knowledge distillation tasks. Specifically, we achieve new state-of-the-art adversarial robustness on the public leaderboard
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
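As a rough illustration of the quantity being optimized, the sketch below computes the Fréchet distance between two Gaussians fitted to feature vectors. It is not the FD-loss procedure from the abstract: it assumes diagonal covariances (a simplification that avoids a matrix square root), it does not model the decoupling of population size from batch size, and the function names and toy feature values are hypothetical.

```python
import math

def mean_and_var(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var

def frechet_distance_diag(mean_a, var_a, mean_b, var_b):
    """Frechet distance between two Gaussians, ASSUMING diagonal covariances.

    The general formula is
        ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    For diagonal S_a, S_b the trace term reduces to
        sum_d (sqrt(var_a[d]) - sqrt(var_b[d]))^2.
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    cov_term = sum((math.sqrt(va) - math.sqrt(vb)) ** 2
                   for va, vb in zip(var_a, var_b))
    return mean_term + cov_term

# Toy features (made up for illustration): the second set is the first set
# shifted by 1.0 along the first dimension, so only the mean term changes.
real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shifted = [[x + 1.0, y] for x, y in real]

mean_r, var_r = mean_and_var(real)
mean_s, var_s = mean_and_var(shifted)
print(frechet_distance_diag(mean_r, var_r, mean_s, var_s))
```

With full covariance matrices the trace term needs a matrix square root, which is why practical FID implementations rely on a linear-algebra library rather than the per-dimension shortcut used here.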
Low-light image enhancement (LLIE) aims to improve the visual quality of images captured under poor lighting conditions. In supervised LLIE research, there exists a significant yet often overlooked inconsistency between the overall brightness of an enhanced image and its ground truth counterpart, referred to as brightness mismatch in this study. Brightness mismatch negatively impacts supervised LLIE models by misleading model training. However, this issue is largely neglected in current research. In this context, we propose the GT-mean loss, a simple yet effective loss function that directly models the mean values of images from a probabilistic perspective. The GT-mean loss is flexible, as it extends existing supervised LLIE loss functions into the GT-mean form with minimal additional computational cost. Extensive experiments demonstrate that incorporating the GT-mean loss yields consistent performance improvements across various methods and datasets.
We investigate the measurement incompatibility of continuous-variable systems with infinite-dimensional Hilbert spaces under the influence of pure losses, a fundamental noise source in quantum optics, and a significant challenge for long-distance quantum communication. We show that loss channels with transmissivities less than $1/n$ make any set of $n$ measurements compatible. Additionally, we design a set of measurements that remains incompatible even under extreme losses, where the number of measurements in the set increases with the amount of loss. These measurements rely on on-off photodetectors and linear optics, making them feasible for implementation under realistic laboratory conditions. Furthermore, we demonstrate that no loss channel can break the incompatibility of all measurements. As a result, quantum steering remains achievable in the presence of pure loss.
Heterogeneous and monolithic integration of the versatile low-loss silicon nitride platform with low-temperature materials such as silicon electronics and photonics, III-V compound semiconductors, lithium niobate, organics, and glasses has been inhibited by the need for high-temperature annealing as well as the need for different process flows for thin and thick waveguides. New techniques are needed to maintain the state-of-the-art losses, nonlinear properties, and CMOS-compatible processes while enabling this next generation of 3D silicon nitride integration. We report a significant advance in silicon nitride integrated photonics, demonstrating the lowest losses to date for an anneal-free process at a maximum temperature of 250 C, with the same deuterated silane based fabrication flow, for nitride and oxide, for an order of magnitude range in nitride thickness without requiring stress mitigation or polishing. We report record low losses for anneal-free nitride core and oxide cladding, enabling 1.77 dB/m loss and 14.9 million Q for 80 nm nitride core waveguides, more than half an order of magnitude lower loss than previously reported 270 C processes, and 8.66 dB/m loss and 4.03 milli
Monocular 3D face reconstruction is a widespread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case, carefully designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image sp
We study transmission stabilization of optical solitons against emission of radiation in nonlinear optical waveguides in the presence of weak linear gain-loss, cubic loss, and the collisional Raman frequency shift. We first show how the collisional Raman frequency shift perturbation arises in three different physical setups. We then show by numerical simulations with a perturbed nonlinear Schrödinger (NLS) model that transmission in waveguides with weak frequency-independent linear gain is unstable. The radiative instability is stronger than the radiative instabilities that were observed in earlier studies for soliton transmission in the presence of weak linear gain, cubic loss, and various frequency-shifting physical mechanisms. In particular, the Fourier spectrum of the radiation is significantly more spiky and broadband than the radiation's Fourier spectra in earlier studies. Moreover, we demonstrate by numerical simulations with another perturbed NLS model that transmission in waveguides with weak frequency-dependent linear gain-loss, cubic loss, and the collisional Raman frequency shift is stable. Despite the stronger radiative instability in the corresponding waveguide setup
We analyze loss development in NAIC Schedule P loss triangles using functional data analysis methods. Adopting the functional viewpoint, our dataset comprises 3300+ curves of incremental loss ratios (ILR) of workers' compensation lines over 24 accident years. Relying on functional data depth, we first study similarities and differences in development patterns based on company-specific covariates, as well as identify anomalous ILR curves. The exploratory findings motivate the probabilistic forecasting framework developed in the second half of the paper. We propose a functional model to complete partially developed ILR curves based on partial least squares regression of PCA scores. Coupling the above with functional bootstrapping allows us to quantify future ILR uncertainty jointly across all future lags. We demonstrate that our method has much better probabilistic scores relative to Chain Ladder and in particular can provide accurate functional predictive intervals.
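For context on the baseline named above, here is a minimal sketch of the classic volume-weighted Chain Ladder method applied to a cumulative loss triangle. It illustrates only the benchmark, not the functional model proposed in the abstract. The triangle values are made up for illustration and the function names are hypothetical.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative loss triangle.

    `triangle` is a list of rows (one per accident year); each row lists the
    cumulative losses observed so far, so later accident years are shorter.
    """
    n_lags = max(len(row) for row in triangle)
    factors = []
    for j in range(n_lags - 1):
        # Use only accident years observed at both lag j and lag j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def complete_triangle(triangle):
    """Project every accident year to the last lag using the factors."""
    factors = development_factors(triangle)
    completed = []
    for row in triangle:
        full = list(row)
        for j in range(len(row) - 1, len(factors)):
            full.append(full[-1] * factors[j])
        completed.append(full)
    return completed

# Toy cumulative triangle (illustrative numbers only).
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
print(development_factors(triangle))
print(complete_triangle(triangle))
```

Chain Ladder yields a single point projection per accident year, which is the contrast with the jointly quantified uncertainty across future lags that the abstract describes.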
Point cloud completion networks are conventionally trained to minimize the disparities between the completed point cloud and the ground-truth counterpart. However, an incomplete object-level point cloud can have multiple valid completion solutions when it is examined in isolation. This one-to-many mapping issue can cause contradictory supervision signals to the network because the loss function may produce different values for identical input-output pairs of the network. In many cases, this issue could adversely affect the network optimization process. In this work, we propose to enhance the conventional learning objective using a novel completion consistency loss to mitigate the one-to-many mapping problem. Specifically, the proposed consistency loss ensures that a point cloud completion network generates a coherent completion solution for incomplete objects originating from the same source point cloud. Experimental results across multiple well-established datasets and benchmarks demonstrate that the proposed completion consistency loss has an excellent capability to enhance the completion performance of various existing networks without any modification to the design of the networks. Th
In recent years, the shortcomings of Bayesian posteriors as inferential devices have received increased attention. A popular strategy for fixing them has been to instead target a Gibbs measure based on losses that connect a parameter of interest to observed data. However, existing theory for such inference procedures assumes these losses are analytically available, while in many situations these losses must be stochastically estimated using pseudo-observations. In such cases, we show that when standard Markov Chain Monte Carlo algorithms are used to produce posterior samples, the resulting posterior exhibits strong dependence on the number of pseudo-observations: unless the number of pseudo-observations diverges sufficiently fast, the resulting posterior will concentrate very slowly. However, we show that in many situations it is feasible to alleviate this dependence entirely using a modified piecewise deterministic Markov process (PDMP) sampler, and we formally and empirically show that these samplers produce posterior draws that have no dependence on the number of pseudo-observations used to estimate the loss within the Gibbs measure. We apply our results to three examples that fea
The study of dependence between random variables under external influences is a challenging problem in multivariate analysis. We address this by proposing a novel semi-parametric approach for conditional copula models using Bayesian additive regression trees (BART) models. BART is becoming a popular approach in statistical modelling due to its simple ensemble type formulation complemented by its ability to provide inferential insights. Although BART allows us to model complex functional relationships, it tends to suffer from overfitting. In this article, we exploit a loss-based prior for the tree topology that is designed to reduce the tree complexity. In addition, we propose a novel adaptive Reversible Jump Markov Chain Monte Carlo algorithm that is ergodic in nature and requires very few assumptions allowing us to model complex and non-smooth likelihood functions with ease. Moreover, we show that our method can efficiently recover the true tree structure and approximate a complex conditional copula parameter, and that our adaptive routine can explore the true likelihood region under a sub-optimal proposal variance. Lastly, we provide case studies concerning the effect of gross do
Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that are easier to train, and that well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
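The abstract names "filter normalization" without spelling it out. The sketch below reflects one common reading of the idea, stated here as an assumption: draw a random Gaussian direction with the same shape as the weights, then rescale each filter of the direction to have the same norm as the corresponding filter of the trained weights. Filters are represented as flat lists, and the names and toy weights are hypothetical.

```python
import math
import random

def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))

def filter_normalized_direction(filters, rng):
    """Random direction whose per-filter norms match those of `filters`.

    ASSUMPTION: each element of `filters` is one filter, flattened to a list.
    """
    direction = []
    for filt in filters:
        d = [rng.gauss(0.0, 1.0) for _ in filt]
        # Rescale so that ||d|| equals ||filt|| (small epsilon avoids 0/0).
        scale = l2_norm(filt) / (l2_norm(d) + 1e-10)
        direction.append([scale * x for x in d])
    return direction

rng = random.Random(0)
weights = [[3.0, 4.0], [0.1, 0.0, 0.0]]  # toy "filters", illustrative only
direction = filter_normalized_direction(weights, rng)
for w, d in zip(weights, direction):
    print(l2_norm(w), l2_norm(d))
```

A one-dimensional slice of the loss can then be plotted by evaluating the loss at `weights + alpha * direction` for a range of `alpha`. Matching per-filter norms is what makes plots from differently scaled networks comparable.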
Dielectric loss is known to limit state-of-the-art superconducting qubit lifetimes. Recent experiments imply upper bounds on bulk dielectric loss tangents on the order of $100$ parts-per-billion, but because these inferences are drawn from fully fabricated devices with many loss channels, they do not definitively implicate or exonerate the dielectric. To resolve this ambiguity, we have devised a measurement method capable of separating and resolving bulk dielectric loss with a sensitivity at the level of $5$ parts per billion. The method, which we call the dielectric dipper, involves the in-situ insertion of a dielectric sample into a high-quality microwave cavity mode. Smoothly varying the sample's participation in the cavity mode enables a differential measurement of the sample's dielectric loss tangent. The dielectric dipper can probe the low-power behavior of dielectrics at cryogenic temperatures, and does so without the need for any lithographic process, enabling controlled comparisons of substrate materials and processing techniques. We demonstrate the method with measurements of EFG sapphire, from which we infer a bulk loss tangent of $62(7) \times 10^{-9}$ and a substrate-a
In the field of Natural Language Processing, there are many tasks that can be tackled effectively using the cross-entropy (CE) loss function. However, the task of dialog generation poses unique challenges for CE loss. This is because CE loss assumes that, for any given input, the only possible output is the one available as the ground truth in the training dataset. But, in dialog generation, there can be multiple valid responses (for a given context) that not only have different surface forms but can also be semantically different. Furthermore, CE loss computation for the dialog generation task does not take the input context into consideration and, hence, it grades the response irrespective of the context. To grade the generated response for qualities like relevance, engagingness, etc., the loss function should depend on both the context and the generated response. To address these limitations, this paper proposes CORAL, a novel loss function based on a reinforcement learning (RL) view of the dialog generation task with a reward function that estimates human preference for generated responses while considering both the context and the response. Furthermore, to overcome challenges
Statisticians often face the choice between using probability models or a paradigm defined by minimising a loss function. Both approaches are useful and, if the loss can be re-cast into a proper probability model, there are many tools to decide which model or loss is more appropriate for the observed data, in the sense of explaining the data's nature. However, when the loss leads to an improper model, there are no principled ways to guide this choice. We address this task by combining the Hyvärinen score, which naturally targets infinitesimal relative probabilities, and general Bayesian updating, which provides a unifying framework for inference on losses and models. Specifically, we propose the H-score, a general Bayesian selection criterion, and prove that it consistently selects the (possibly improper) model closest to the data-generating truth in Fisher's divergence. We also prove that an associated H-posterior consistently learns optimal hyper-parameters featuring in loss functions, including a challenging tempering parameter in generalised Bayesian inference. As salient examples, we consider robust regression and non-parametric density estimation where popular loss functions de
In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification. In contrast to the commonly used loss functions for speech enhancement such as the L2 loss, the VoiceID loss is based on the feedback from a speaker verification model to generate a ratio mask. The generated ratio mask is multiplied pointwise with the original spectrogram to filter out unnecessary components for speaker verification. In the experiments, we observed that the enhancement network, after training with the VoiceID loss, is able to ignore a substantial amount of time-frequency bins, such as those dominated by noise, for verification. The resulting model consistently improves the speaker verification system on both clean and noisy conditions.
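The masking step described above (a ratio mask multiplied pointwise with the spectrogram) can be sketched as follows. This shows only the filtering operation. In the abstract the mask is produced by a trained enhancement network, whereas here it is hard-coded, and both the function name and the toy values are hypothetical.

```python
def apply_ratio_mask(spectrogram, mask):
    """Pointwise product of a spectrogram and a ratio mask of the same shape.

    Both arguments are lists of rows (frequency bins x time frames).
    ASSUMPTION: mask values lie in [0, 1], so 0 drops a bin and 1 keeps it.
    """
    if len(spectrogram) != len(mask):
        raise ValueError("spectrogram and mask must have the same shape")
    masked = []
    for spec_row, mask_row in zip(spectrogram, mask):
        if len(spec_row) != len(mask_row):
            raise ValueError("spectrogram and mask must have the same shape")
        masked.append([s * m for s, m in zip(spec_row, mask_row)])
    return masked

# Toy 2x2 magnitude spectrogram and mask (illustrative values only).
spectrogram = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
print(apply_ratio_mask(spectrogram, mask))
```

Bins with a mask value near zero are effectively ignored downstream, consistent with the observation that the trained network learns to discard time-frequency bins dominated by noise.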
We present a theory of expected utility with state-dependent linear utility functions for monetary returns that incorporates the possibility of loss aversion. Our results relate to first-order stochastic dominance, mean-preserving spread, increasing-concave linear utility profiles, and risk aversion. As an application of the expected utility theory developed here, we analyze the contract that a monopolist would offer in an insurance market that allows for partial coverage of loss.
There is no such thing as a perfect dataset. In some datasets, deep neural networks discover underlying heuristics that allow them to take shortcuts in the learning process, resulting in poor generalization capability. Instead of using standard cross-entropy, we explore whether a modulated version of cross-entropy called focal loss can constrain the model so as not to use heuristics and improve generalization performance. Our experiments in natural language inference show that focal loss has a regularizing impact on the learning process, increasing accuracy on out-of-distribution data, but slightly decreasing performance on in-distribution data. Despite the improved out-of-distribution performance, we demonstrate the shortcomings of focal loss and its inferiority in comparison to the performance of methods such as unbiased focal loss and self-debiasing ensembles.
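To make the "modulated version of cross-entropy" concrete, the sketch below implements the standard focal loss form, $-(1 - p_t)^{\gamma} \log p_t$, for a single example, where $p_t$ is the softmax probability of the true class. The choice $\gamma = 2$ and the example logits are assumptions for illustration. The abstract does not state the hyperparameters used in the experiments.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy for one example with an integer class label."""
    return -math.log(softmax(logits)[target])

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    gamma = 0 recovers plain cross-entropy; larger gamma down-weights
    examples the model already classifies confidently.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = [4.0, 0.0, 0.0]  # confident, correct prediction for class 0
hard = [0.0, 1.0, 1.0]  # class 0 is the label but gets low probability
for name, logits in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(logits, 0), focal_loss(logits, 0))
```

Because the modulating factor shrinks the loss far more on the easy example than on the hard one, training signal shifts toward examples that a shortcut heuristic gets wrong. This is the regularizing effect the abstract reports.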
We present the IEEE-IS2 2024 Music Packet Loss Concealment Challenge. We begin by detailing the challenge rules, followed by an overview of the provided baseline system, the blind test set, and the evaluation methodology used to determine the final ranking. This inaugural edition aimed to foster collaboration between researchers and practitioners from the fields of signal processing, machine learning, and networked music performance, while also laying the groundwork for future advancements in packet loss concealment for music signals.
Beam loss is a critical issue in high-intensity accelerators, and much effort is expended during both the design and operation phases to minimize the loss and to keep it to manageable levels. As new accelerators become ever more powerful, beam loss becomes even more critical. Linacs for H- ion beams, such as the one at the Oak Ridge Spallation Neutron Source, have many more loss mechanisms compared to H+ (proton) linacs, such as the one being designed for the European Spallation Neutron Source. Interesting H- beam loss mechanisms include residual gas stripping, H+ capture and acceleration, field stripping, black-body radiation and the recently discovered intra-beam stripping mechanism. Beam halo formation, and ion source or RF turn on/off transients, are examples of beam loss mechanisms that are common for both H+ and H- accelerators. Machine protection systems play an important role in limiting the beam loss.