With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method that improves weak experts trained on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called **EnsemW2S**, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empi
We study the uniform-in-time weak propagation of chaos for the consensus-based optimization (CBO) method on a bounded search domain. We apply the methodology for studying the long-time behavior of interacting particle systems developed in the work of Delarue and Tse (arXiv:2104.14973). Our work shows that the weak error has order $O(N^{-1})$ uniformly in time, where $N$ denotes the number of particles. The main strategy behind the proofs is the decomposition of the weak errors using the linearized Fokker-Planck equations and the exponential decay of their Sobolev norms. Consequently, our result leads to the joint convergence, in population size and running time, of the empirical distribution of the CBO particle system to the Dirac-delta distribution at the global minimizer in Wasserstein-type metrics.
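For orientation, the particle system that such results concern can be sketched as follows. This is a minimal illustration of a standard CBO iteration, not the setting analyzed in the abstract: the function name `cbo_minimize`, all parameter values, the component-wise noise, and the use of clipping to keep particles in a bounded box are assumptions made here for the example.

```python
import numpy as np


def cbo_minimize(f, dim=2, n_particles=200, steps=1000, dt=0.01,
                 lam=1.0, sigma=0.5, alpha=30.0, bound=3.0, seed=0):
    """Hypothetical sketch of consensus-based optimization on [-bound, bound]^dim.

    f maps an array of shape (n_particles, dim) to values of shape (n_particles,).
    Returns the final consensus point (a Gibbs-weighted particle average).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, size=(n_particles, dim))

    def consensus_point(points):
        values = f(points)
        # Shift by the minimum before exponentiating for numerical stability.
        weights = np.exp(-alpha * (values - values.min()))
        return (weights[:, None] * points).sum(axis=0) / weights.sum()

    for _ in range(steps):
        diff = x - consensus_point(x)
        noise = rng.standard_normal(x.shape)
        # Euler-Maruyama step: drift toward the consensus point plus
        # component-wise noise that vanishes as particles reach consensus.
        x = x - lam * diff * dt + sigma * diff * np.sqrt(dt) * noise
        # Assumption: clipping is a crude stand-in for a bounded domain.
        x = np.clip(x, -bound, bound)
    return consensus_point(x)


if __name__ == "__main__":
    shifted_quadratic = lambda x: np.sum((x - 1.0) ** 2, axis=1)
    print(cbo_minimize(shifted_quadratic))
```

The empirical distribution of the rows of `x` is the object whose convergence in $N$ and in time the abstract quantifies; the sketch only shows the dynamics, not the error analysis.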
We study the weak convergence of a generic tamed Euler-Maruyama scheme for kinetic stochastic differential equations (SDEs) with integrable drifts. We show that the marginal density of the considered scheme converges at rate 1/2 to the corresponding marginal density of the SDE. The convergence rate is independent of the criticality gap, which is new compared to previous results.
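To make the kind of scheme concrete, here is a minimal sketch of a tamed Euler-Maruyama step for a one-dimensional kinetic SDE $dX_t = V_t\,dt$, $dV_t = b(X_t, V_t)\,dt + dW_t$. The abstract does not specify its taming, so the choice $b/(1 + h|b|)$ below is an assumption (one common taming), and the name `tamed_em_kinetic` and all parameters are hypothetical.

```python
import numpy as np


def tamed_em_kinetic(b, x0, v0, T=1.0, n_steps=200, n_paths=2000, seed=0):
    """Hypothetical tamed Euler-Maruyama sketch for dX = V dt, dV = b(X, V) dt + dW.

    b is a vectorized drift taking arrays (x, v) and returning an array.
    Returns the simulated positions and velocities at time T.
    """
    rng = np.random.default_rng(seed)
    h = T / n_steps
    x = np.full(n_paths, float(x0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        drift = b(x, v)
        # Assumed taming: the drift increment is bounded by 1 per step,
        # so a singular drift cannot blow up the scheme.
        tamed = drift / (1.0 + h * np.abs(drift))
        dw = np.sqrt(h) * rng.standard_normal(n_paths)
        # Position uses the old velocity; velocity gets tamed drift plus noise.
        x, v = x + v * h, v + tamed * h + dw
    return x, v


if __name__ == "__main__":
    # Illustrative drift that is large near v = 0 (regularized to avoid 0/0).
    singular = lambda x, v: -np.sign(v) / np.sqrt(np.abs(v) + 1e-12)
    xT, vT = tamed_em_kinetic(singular, x0=0.0, v0=0.5)
    print(xT.mean(), vT.mean())
```

A histogram of `xT` or `vT` gives an empirical marginal density, which is the quantity whose convergence rate the abstract addresses.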
We present a comprehensive study on the spatiotemporal weak measurement of a chiral ultrafast optical pulse. We create a chiral vector wave packet by transmitting an ultrashort laser pulse through a birefringent or magneto-optic medium. Employing time-resolved leakage radiation microscopy, we examine how the real and imaginary components of the weak value parameter ($ε$) influence pulse propagation over time. Our technique allows us to detect and categorize the temporal polarization fluctuation in a $75$ fs pulse with excellent repeatability. The experimental results are consistent with the theoretical predictions.
A precise definition of "weak [quantum] measurements" and "weak value" (of a quantum observable) is offered, and simple finite dimensional examples are given showing that weak values are not unique and therefore probably do not correspond to any physical attribute of the system being "weakly" measured, contrary to impressions given by most of the literature on weak measurements. A possible mathematical error in the seminal paper introducing "weak values" is explicitly identified. A mathematically rigorous argument obtains results similar to, and more general than, the main result of that paper and concludes that even in the infinite-dimensional context of that paper, weak values are not unique. This implies that the "usual" formula for weak values is not universal, but can apply only to specific physical situations. The paper is written in a more pedagogical and informal style than is usual in the research literature in the hope that it might serve as an introduction to weak values.
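For readers new to the topic, the "usual" formula mentioned above is presumably the standard expression $A_w = \langle\psi_f|A|\psi_i\rangle / \langle\psi_f|\psi_i\rangle$ for an observable $A$ with pre-selected state $\psi_i$ and post-selected state $\psi_f$; that identification is an assumption, since the abstract does not write the formula out. The sketch below evaluates it for a qubit, with the hypothetical helper `weak_value`, and shows that the result can lie far outside the eigenvalue range of $A$.

```python
import numpy as np


def weak_value(A, psi_i, psi_f):
    """Evaluate <psi_f|A|psi_i> / <psi_f|psi_i> for finite-dimensional states."""
    overlap = np.vdot(psi_f, psi_i)  # vdot conjugates its first argument
    if abs(overlap) < 1e-12:
        raise ValueError("pre- and post-selected states are orthogonal")
    return np.vdot(psi_f, A @ psi_i) / overlap


# Nearly orthogonal pre- and post-selection for the Pauli operator sigma_z.
theta = np.pi / 4 - 0.05
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
psi_i = np.array([np.cos(theta), np.sin(theta)], dtype=complex)
psi_f = np.array([np.cos(theta), -np.sin(theta)], dtype=complex)

# Here the formula reduces to 1 / cos(2 * theta) = 1 / sin(0.1), about 10,
# although the eigenvalues of sigma_z are only +1 and -1.
w = weak_value(sigma_z, psi_i, psi_f)
print(w)
```

The example only evaluates the formula; whether and when that formula describes the outcome of a weak measurement is exactly what the abstract calls into question.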
Recently, an operator space version of type and cotype, namely type $(p,H)$ and cotype $(q,H)$ of operator spaces for $1\leq p \leq 2\leq q \leq \infty$ and a subquadratic and homogeneous Hilbertian operator space $H$, was introduced and investigated by the author. In this paper we define weak type $(2,H)$ (resp. weak cotype $(2,H)$) of operator spaces, which lies strictly between type $(2,H)$ (resp. cotype $(2,H)$) and type $(p,H)$ for all $1\leq p <2$ (resp. cotype $(q,H)$ for all $2<q \leq \infty$). This is an analogue of weak type 2 and weak cotype 2 in the Banach space case, so we develop analogous equivalent formulations. We also consider weak-$H$ spaces, that is, spaces with weak type $(2,H)$ and weak cotype $(2,H^*)$ simultaneously, and establish corresponding equivalent formulations.
Let $A$ be a commutative ring, and let $\mathfrak{a}$ be a finitely generated ideal in it. It is known that a necessary and sufficient condition for the derived $\mathfrak{a}$-torsion and $\mathfrak{a}$-adic completion functors to be nicely behaved is the weak proregularity of $\mathfrak{a}$. In particular, the MGM Equivalence holds. Because weak proregularity is defined in terms of elements of the ring (specifically, it involves limits of Koszul complexes), it is not suitable for noncommutative ring theory. In this paper we introduce a new condition on a torsion class $T$ in a module category: weak stability. Our first main theorem is that in the commutative case, the ideal $\mathfrak{a}$ is weakly proregular if and only if the corresponding torsion class $T_{\mathfrak{a}}$ is weakly stable. We then study weak stability of torsion classes in module categories over noncommutative rings. There are three main theorems in this context: (1) for a torsion class $T$ that is weakly stable, quasi-compact and finite dimensional, the right derived torsion functor is isomorphic to a left derived tensor functor; (2) the Noncommutative MGM Equivalence, under the same assumptions on $T$; (3) a theorem about symmetric derived torsion for complexes of bimodules. This las
We propose a new generative model of projected cosmic mass density maps inferred from weak gravitational lensing observations of distant galaxies (weak lensing mass maps). We construct the model based on a neural style transfer so that it can transform Gaussian weak lensing mass maps into deeply non-Gaussian counterparts as predicted in ray-tracing lensing simulations. We develop an unpaired image-to-image translation method with Cycle-Consistent Generative Adversarial Networks (Cycle GAN), which learn an efficient mapping from an input domain to a target domain. Our model is designed to offer several advantages: it is trainable without paired simulation data, flexible enough to make the input domain visually meaningful, and expandable to rapidly produce a map with larger sky coverage than the training data without additional learning. Using 10,000 lensing simulations, we find that appropriate labeling of training data based on field variance allows the model to reproduce the correct scatter in summary statistics for weak lensing mass maps. Compared with a popular log-normal model, our model improves in predicting the statistical nature of three-point correlations and local propertie
Metacirculants are a rich source of many families of interesting graphs, and weak metacirculants are generalizations of them. A graph is called a {\em split weak metacirculant} if it has a vertex-transitive split metacyclic automorphism group. In two recent papers, it is shown that a graph of prime power order is a metacirculant if and only if it is a split weak metacirculant. Let $m$ be a positive integer. In this paper, we first give a sufficient condition for the existence of split weak metacirculants of order $m$ which are not metacirculants. This is then used to give a necessary and sufficient condition for the existence of split weak metacirculants of order $n$ which are not metacirculants, where $n$ is a product of two prime powers. As byproducts, we construct infinitely many split weak metacirculant graphs which are not metacirculant graphs, and answer an open question reported in the literature.
We consider the approximation of entropy solutions of nonlinear hyperbolic conservation laws using neural networks. We provide explicit computations that highlight why classical PINNs will not work for discontinuous solutions to nonlinear hyperbolic conservation laws and show that weak (dual) norms of the PDE residual should be used in the loss functional. This approach has been termed "weak PINNs" recently. We suggest some modifications to weak PINNs that make their training easier, which leads to smaller errors with less training, as shown by numerical experiments. Additionally, we extend wPINNs to scalar conservation laws with weak boundary data and to systems of hyperbolic conservation laws. We perform numerical experiments in order to assess the accuracy and efficiency of the extended method.
We investigate disorder-induced localization in metals that break time-reversal and inversion symmetries through their energy dispersion, $ε_{k}\neq ε_{-k}$, but lack Berry phases. In the perturbative regime of disorder, we show that weak localization is suppressed due to a mismatch of the Fermi velocities of left and right movers. To substantiate this analytical result, we perform quench numerics on chains shorter than the Anderson localization length -- the latter computed and verified to be finite using the recursive Green's function method -- and find a sharp rise in the saturation value of the participation ratio due to band asymmetry, indicating a tendency to delocalize. Interestingly, for weak disorder strength $η$, we see a better fit to the scaling behavior $ξ\propto 1/η^{2}$ for asymmetric bands than for conventional symmetric ones.
This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing batch sizes per iteration, which can however be prohibitively expensive. In contrast, our proposed methods involve two stepsizes and require only one additional oracle evaluation per iteration. We show that it is possible to keep one stepsize fixed while only the second stepsize is taken to be diminishing, making the approach interesting even in the monotone setting. Almost sure convergence is established, and we provide a unified analysis for this family of schemes, which contains a nonlinear generalization of the celebrated primal-dual hybrid gradient algorithm.
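The two-stepsize idea can be illustrated with a generic extragradient-style iteration in which the extrapolation stepsize is fixed and only the update stepsize diminishes. This is a sketch under stated assumptions, not the scheme of the abstract, whose exact update is not given here: the function `two_stepsize_extragradient`, the schedule $\alpha_k = 1/\sqrt{k+1}$, the Gaussian noise model, and the bilinear test problem are all choices made for the example.

```python
import numpy as np


def two_stepsize_extragradient(F, z0, gamma=0.5, n_iters=200,
                               noise_std=0.0, seed=0):
    """Hypothetical extragradient-style sketch with a fixed and a diminishing stepsize.

    F is the operator of the variational inequality; noise_std > 0 adds
    Gaussian noise to each oracle call to mimic a stochastic oracle.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    for k in range(n_iters):
        alpha_k = 1.0 / np.sqrt(k + 1.0)  # assumed diminishing schedule
        g = F(z) + noise_std * rng.standard_normal(z.shape)
        z_bar = z - gamma * g  # extrapolation with the fixed stepsize
        g_bar = F(z_bar) + noise_std * rng.standard_normal(z.shape)
        z = z - alpha_k * gamma * g_bar  # update with the diminishing stepsize
    return z


def bilinear_operator(z):
    """Operator (df/dx, -df/dy) of the saddle problem min_x max_y x * y."""
    x, y = z
    return np.array([y, -x])


if __name__ == "__main__":
    print(two_stepsize_extragradient(bilinear_operator, [1.0, 1.0]))
```

On the bilinear problem, plain simultaneous gradient descent-ascent spirals away from the saddle point at the origin, whereas this iteration contracts toward it in the noiseless case for the stepsizes chosen above.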
We show that a weak projective measurement of photon arrival time can be realized by controllable two-photon interference with photons from short-time reference pulses at a polarization beam splitter. The weak value of the projector on the arrival time defined by the reference pulse can be obtained from the coincidence rates conditioned by a specific output measurement. If the weak measurement is followed by a measurement of frequency, the coincidence counts reveal the complete temporal coherence of the single photon wavefunction. Significantly, the weak values of the input state can also be obtained at higher measurement strengths, so that correlations between weak measurements on separate photons can be observed and evaluated without difficulty. The method can thus be used to directly observe the non-classical statistics of time-energy entangled photons.
A weak measurement consists in coupling a system to a probe in such a way that constructive interference generates a large output. So far, only the average output of the probe and its variance have been studied. Here, the characteristic function for the moments of the output is provided. The outputs considered are not limited to the eigenstates of the pointer or of its conjugate variable, so that the results apply to any observable $\Hat{o}$ of the probe. Furthermore, a family of well-behaved complex quantities, the normal weak values, is introduced, in terms of which the statistics of the weak measurement can be described. It is shown that, to a good approximation, the whole statistics of the weak measurement is described by a complex parameter, the weak value, and a real one.
For integers $k\geq 2$, we study two differential operators on harmonic weak Maass forms of weight $2-k$. The operator $ξ_{2-k}$ (resp. $D^{k-1}$) defines a map to the space of weight $k$ cusp forms (resp. weakly holomorphic modular forms). We leverage these operators to study coefficients of harmonic weak Maass forms. Although generic harmonic weak Maass forms are expected to have transcendental coefficients, we show that those forms which are "dual" under $ξ_{2-k}$ to newforms with vanishing Hecke eigenvalues (such as CM forms) have algebraic coefficients. Using regularized inner products, we also characterize the image of $D^{k-1}$.
The weak decays of hyperons and hypernuclei are studied from the chiral symmetry viewpoint. The soft pion relations are useful in understanding the isospin properties of the weak hyperon decays. Recent developments on the short-range part of the $\Lambda N\to NN$ weak transitions give a fairly good account of the weak decays of hypernuclei, though they fail to explain the $n/p$ ratio. The $\pi^+$ decays of light hypernuclei are studied in the soft pion approach. They are related to the $\Delta I=3/2$ amplitudes of the nonmesonic decay.
We define a weak bimonad as a monad T on a monoidal category M with the property that the Eilenberg-Moore category M^T is monoidal and the forgetful functor from M^T to M is separable Frobenius. Whenever M is also Cauchy complete, a simple set of axioms is provided that characterizes the monoidal structure of M^T as a weak lifting of the monoidal structure of M. The relation to bimonads and the relation to weak bimonoids in a braided monoidal category are revealed. We also discuss antipodes, obtaining the notion of weak Hopf monad.
This review forms the Weak Lensing part of the Saas-Fee Advanced Course on Gravitational Lensing. It describes the basics, applications and results of weak lensing. Contents: (1) Introduction (2) The principles of weak gravitational lensing (3) Observational issues and challenges (4) Clusters of galaxies: Introduction, and strong lensing (5) Mass reconstructions from weak lensing (6) Cosmic shear -- lensing by the LSS (7) Large-scale structure lensing: results (8) The mass of, and associated with galaxies (9) Additional issues in cosmic shear (10) Concluding remarks.
We examine scatter and bias in weak lensing selected clusters, employing both an analytic model of dark matter haloes and numerical mock data of weak lensing cluster surveys. We pay special attention to the effects of the diversity of dark matter distributions within clusters. We find that peak heights of the lensing convergence map correlate rather poorly with the virial mass of haloes. The correlation is tighter for the spherical overdensity mass with a higher mean interior density. We examine the dependence of the halo shape on the peak heights, and find that the rms scatter caused by the halo diversity scales linearly with the peak heights, with a proportionality factor of 0.1-0.2. The noise originating from the halo shape is found to be comparable to the source galaxy shape noise and the cosmic shear noise. We find a significant halo orientation bias, i.e., weak lensing selected clusters on average have their major axes aligned with the line-of-sight direction. We compute the orientation bias using an analytic triaxial halo model and obtain results quite consistent with the ray-tracing results. We develop a prescription to analytically compute the number count of weak lensing p
A statistical analysis of optimal universal cloning shows that it is possible to identify an ideal (but non-positive) copying process that faithfully maps all properties of the original Hilbert space onto two separate quantum systems. The joint probabilities for non-commuting measurements on separate clones then correspond to the real parts of the complex joint probabilities observed in weak measurements on a single system, where the measurements on the two clones replace the corresponding sequence of weak measurement and post-selection. The imaginary parts of weak measurement statistics can be obtained by replacing the cloning process with a partial swap operation. A controlled-swap operation combines both processes, making the complete weak measurement statistics accessible as a well-defined contribution to the joint probabilities of fully resolved projective measurements on the two output systems.