20 results found
Yuan 2.0-M32, whose base architecture is similar to that of Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts, of which 2 are active. A new router network, Attention Router, is proposed and adopted for more efficient expert selection, which improves accuracy compared with a model using a classical router network. Yuan 2.0-M32 is trained from scratch on 2000B tokens, and its training computation is only 9.25% of that of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability in coding, math, and various domains of expertise, with only 3.7B active parameters out of 40B in total and 7.4 GFlops of forward computation per token, both of which are only 1/19 of those of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.89 and 95.8, respectively. The models and source code of Yuan 2.0-M32 are released on GitHub.
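To make the "32 experts, 2 active" idea concrete, here is a minimal sketch of a classical top-k router, the baseline the abstract compares against. It is not the Attention Router, whose details the abstract does not give. The function names, the linear router, and the toy experts are assumptions made for illustration only.

```python
import math

NUM_EXPERTS = 32  # number of experts stated in the abstract
TOP_K = 2         # number of active experts stated in the abstract


def top_k_route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k largest logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token, router_rows, experts, k=TOP_K):
    """Route one token vector through the k selected experts and mix the outputs.

    router_rows: one weight row per expert (hypothetical linear router).
    experts: list of callables mapping a token vector to a vector of equal length.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_rows]
    routes = top_k_route(logits, k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_out = experts[index](token)  # only k experts are ever evaluated
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, routes
```

Only the selected experts are called per token, which is why the active parameter count can be far smaller than the total parameter count.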
From the Bretton Woods agreement in 1944 until the present day, the US dollar has been the dominant currency in world trade. However, the rise of the Chinese economy has recently led to the emergence of trade transactions in Chinese yuan. Here, we analyze mathematically how the structure of international trade flows would favor a country trading in either US dollars or Chinese yuan. The computation of the trade currency preference is based on the world trade network built from the 2010-2020 UN Comtrade data. The preference of a country to trade in US dollars or Chinese yuan is determined by two multiplicative factors: the relative weight of the trade volume exchanged by the country with its direct trade partners, and the relative weight of its trade partners in global international trade. The analysis, based on Ising spin interactions on the world trade network, shows that a transition took place between 2010 and the present, and that the majority of the world's countries would now prefer to trade in Chinese yuan if one considers only the world trade network structure.
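The two multiplicative factors described above can be illustrated with a toy spin model. The following is a rough sketch under stated assumptions, not the paper's actual procedure. The spin convention (+1 for US dollar, -1 for Chinese yuan), the fixed spins for the two issuing countries, the deterministic update rule, and all trade volumes are hypothetical.

```python
def trade_preferences(trade, fixed, sweeps=50):
    """Assign each country a spin: +1 (US dollar) or -1 (Chinese yuan).

    trade: dict mapping an unordered country pair (a, b) to a trade volume.
    fixed: dict of countries whose spin never changes (assumed: the issuers).
    Each free country aligns with the sign of its local field, which sums
    partner spins weighted by (share of its own trade with that partner)
    times (that partner's share of total trade).
    """
    countries = sorted({c for pair in trade for c in pair})
    volume = {c: 0.0 for c in countries}
    for (a, b), v in trade.items():
        volume[a] += v
        volume[b] += v
    total = sum(volume.values())

    spins = {c: fixed.get(c, 1) for c in countries}  # free spins start at +1
    for _ in range(sweeps):
        changed = False
        for c in countries:
            if c in fixed:
                continue
            field = 0.0
            for (a, b), v in trade.items():
                if c == a:
                    partner = b
                elif c == b:
                    partner = a
                else:
                    continue
                field += (v / volume[c]) * (volume[partner] / total) * spins[partner]
            new_spin = 1 if field >= 0 else -1
            if new_spin != spins[c]:
                spins[c] = new_spin
                changed = True
        if not changed:  # stop once a full sweep flips nothing
            break
    return spins


# Hypothetical toy network: A trades mostly with US, B mostly with CN.
toy_trade = {("US", "A"): 10, ("CN", "A"): 2, ("CN", "B"): 10,
             ("US", "B"): 1, ("A", "B"): 1}
toy_result = trade_preferences(toy_trade, fixed={"US": 1, "CN": -1})
```

In this toy network, country A settles on the dollar and country B on the yuan, because each follows its dominant, heavily weighted partner.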
This paper reports the first multi-channel joint analysis to identify the properties of the exotic charmonium-like state $T_{c\bar{c}}(4020)$ via the electron-positron annihilation process $e^{+}e^{-}\toπ^{+}T_{c\bar{c}}(4020)^{-}+c.c$. A partial wave analysis is performed simultaneously in three decay channels $T_{c\bar{c}}(4020)^{-}\to {D}^{*0}D^{*-}$, $π^{-}J/ψ$, and $π^{-}h_{c}$, based on data samples taken at $\sqrt{s}=4.395$ and $4.416\,\mathrm{GeV}$ with an integrated luminosity of $1598.9\,\mathrm{pb}^{-1}$ collected with the BESIII detector operating on the BEPCII collider. For the first time, the spin-parity of the $T_{c\bar{c}}(4020)^{-}$ is determined to be $J^{P}=1^{+}$ with a significance of $11.7σ$. Pole positions are extracted on the Riemann sheets with three branch points in the complex energy plane. Furthermore, the relative branching fractions are obtained as $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}J/ψ]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(3.6\pm0.6\pm1.6)\times10^{-3}$ and $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}h_{c}]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(8.9\pm1.3\pm2.3)\times10^{-2}$, where the first uncertainties are stati
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
We search for the reaction channel $e^+ e^- \to ηη\,J/ψ$ in a data sample with center-of-mass energies from 4.226 to 4.950~GeV, which was collected by the BESIII detector operating at the Beijing Electron Positron Collider (BEPCII). The data analysis is performed with two different methods, exclusive and semi-inclusive, which enables a comparison and combination of the results. In only a few cases does one of the methods yield a cross section with a statistical significance of more than $3σ$. Only at 4.750~GeV is the significance of the cross section measurement 8.9$σ$ (observation) with the exclusive analysis and 3.4$σ$ (evidence) with the semi-inclusive analysis. Therefore, the corresponding upper limits of the cross section at the 90% confidence level are determined. The energy-dependent results show clear deviations from the line shape expected from three-body phase space alone. Since the statistical significance for almost all center-of-mass energies is low, the upper limits for the reaction channel $e^+ e^- \to ηη\,J/ψ$ also serve as limits for the existence of a possible isospin partner to the charmonium-like isospin triplet $Z_{\rm c}(3900)$ which decays t
We study a joint routing-assignment optimization problem in which a set of items must be paired one-to-one with a set of placeholders while simultaneously determining a Hamiltonian cycle that visits every node exactly once. Both the assignment and routing decisions are optimized jointly to minimize the total travel cost. In this work, we propose a method to solve this problem using an exact MIP formulation with Gurobi, including cutting-plane subtour elimination. With analysis of the computational complexity and through extensive experiments, we analyze the computational limitations of this approach as the problem size grows and reveal the challenges associated with the need for more efficient algorithms for larger instances. The dataset, formulations, and experimental results provided here can serve as benchmarks for future studies in this research area. GitHub repository: https://github.com/QL-YUAN/Joint-Assignment-Routing-Optimization
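Cutting-plane subtour elimination, mentioned above, relies on a separation step: given an integer candidate solution, find the disjoint cycles and add one cut per cycle. The sketch below shows only that separation step in plain Python. It assumes the candidate is given as a successor map in which every node has exactly one successor and one predecessor. The function names are hypothetical, and passing the cuts to a solver (for example through a Gurobi lazy-constraint callback) is deliberately left out.

```python
def find_subtours(successor):
    """Split a successor map {node: next_node} into its disjoint cycles."""
    unvisited = set(successor)
    tours = []
    while unvisited:
        node = min(unvisited)  # deterministic starting node for each cycle
        tour = []
        while node in unvisited:
            unvisited.remove(node)
            tour.append(node)
            node = successor[node]
        tours.append(tour)
    return tours


def violated_cuts(successor):
    """Return the node sets S that need a subtour elimination cut.

    Each returned set S stands for the constraint
        sum of x[i, j] over i, j in S  <=  |S| - 1.
    An empty list means the candidate is already a single Hamiltonian cycle.
    """
    tours = find_subtours(successor)
    if len(tours) == 1:
        return []
    return [sorted(tour) for tour in tours]
```

In a cutting-plane loop, the solver is re-run after each batch of cuts until `violated_cuts` returns an empty list.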
The indicator matrix plays an important role in machine learning, but optimizing it is an NP-hard problem. We propose a new relaxation of the indicator matrix and prove that this relaxation forms a manifold, which we call the Relaxed Indicator Matrix Manifold (RIM manifold). Based on Riemannian geometry, we develop a Riemannian toolbox for optimization on the RIM manifold. Specifically, we provide several methods of Retraction, including a fast Retraction method to obtain geodesics. We point out that the RIM manifold is a generalization of the doubly stochastic manifold, and that optimization on it is much faster than existing methods on the doubly stochastic manifold: those methods have a complexity of \( \mathcal{O}(n^3) \), whereas RIM manifold optimization is \( \mathcal{O}(n) \) and often yields better results. We conducted extensive experiments, including image denoising, with millions of variables to support our conclusion. We also applied the RIM manifold to Ratio Cut, for which we provide a rigorous convergence proof and achieve clustering results that outperform state-of-the-art methods. Our code is available \href{https://github.com/Yuan-Jinghui/Riemannian-Optimization-on-Relaxed-Indicator-Matrix-Manifold}{here}.
This paper re-organizes Vojta's proof of the Mordell conjecture (i.e. Faltings' theorem) in terms of Arakelov geometry. A new ingredient is to replace an application of Gillet--Soulé's arithmetic Riemann--Roch theorem by that of Yuan's arithmetic Siu inequality.
Vision-Language Models (VLMs) exhibit strong reasoning capabilities, showing promise for end-to-end autonomous driving systems. Chain-of-Thought (CoT), a widely used reasoning strategy for VLMs, faces critical challenges. Existing textual CoT has a large gap between the text semantic space and the trajectory physical space. Although a recent approach uses future images in place of text as the CoT process, it lacks clear planning-oriented objective guidance to generate images with accurate scene evolution. To address these issues, we propose MindDriver, a progressive multimodal reasoning framework that enables VLMs to imitate human-like progressive thinking for autonomous driving. MindDriver comprises semantic understanding, semantic-to-physical space imagination, and physical-space trajectory planning. To achieve aligned reasoning processes in MindDriver, we develop a feedback-guided automatic data annotation pipeline to generate aligned multimodal reasoning training data. Furthermore, we develop a progressive reinforcement fine-tuning method to optimize the alignment through progressive high-level reward-based learning. MindDriver demonstrates superior performance in both nuScenes
Methods based on diffusion backbones have recently revolutionized novel view synthesis (NVS). However, those models require pretrained 2D diffusion checkpoints (e.g., Stable Diffusion) as the basis for geometrical priors. Since such checkpoints require exorbitant amounts of data and compute to train, this greatly limits the scalability of diffusion-based NVS models. We present Next-Scale Autoregression Conditioned by View (ArchonView), a method that significantly exceeds state-of-the-art methods despite being trained from scratch with 3D rendering data only and no 2D pretraining. We achieve this by incorporating both global (pose-augmented semantics) and local (multi-scale hierarchical encodings) conditioning into a backbone based on the next-scale autoregression paradigm. Our model also exhibits robust performance even for difficult camera poses where previous methods fail, and is several times faster in inference speed compared to diffusion. We experimentally verify that performance scales with model and dataset size, and conduct extensive demonstration of our method's synthesis quality across several tasks. Our code is open-sourced at https://github.com/Shiran-Yuan/ArchonView.
Using an $e^+e^-$ sample of $20.3\,\rm fb^{-1}$ collected at the center-of-mass energy $\sqrt{s}=$ 3.773 GeV with the BESIII detector, we report measurements of several four-body hadronic decays of the $D$ mesons. The absolute branching fractions are determined to be ${\mathcal B}(D^0\to K^0_S K^+K^-π^0 )=( 18.4^{+2.6}_{-2.5}\pm 2.4)\times 10^{-5}$, ${\mathcal B}(D^0\to K^0_S K^0_S K^-π^+ )=( 12.9^{+1.7}_{-1.6}\pm 2.5)\times 10^{-5}$, ${\mathcal B}(D^0\to K^0_S K^0_S K^+π^-)=(5.7^{+1.2}_{-1.1}\pm 1.3)\times 10^{-5}$, ${\mathcal B}(D^0\to K^+K^-K^-π^+ )=(17.4^{+1.8}_{-1.7}\pm { 2.2})\times 10^{-5}$, and ${\mathcal B}(D^+\to K^0_S K^+K^-π^+)=(13.8^{+2.4}_{-2.2}\pm 2.5)\times 10^{-5}$. Furthermore, significant $φ$ signals are found in the decay channels involving $K^+K^-$ pair, and the corresponding branching fractions are measured as ${\mathcal B}(D^0\to φK^0_Sπ^0 )=( 22.7^{+5.4}_{-5.1}\pm 3.7)\times 10^{-5}$, ${\mathcal B}(D^0\to φK^-π^+ )=(25.2^{+3.5}_{-3.3}\pm 4.6)\times 10^{-5}$, ${\mathcal B}(D^+\to φK^0_Sπ^+)=(16.5 ^{+6.0}_{-5.3}\pm 2.6 )\times 10^{-5}$. The branching fractions of $D^0\to K^0_S K^+K^-π^0$, $D^0\to φK^0_Sπ^0$, and $D^+\to φK^0_S π^+$ are measured for the first t
We analyze the decay of $η\rightarrow \ell^+\ell^-(\ell=e, μ)$ via $J/ψ\rightarrowγη'$ and $η'\rightarrowπ^+π^-η$, based on (10087 $\pm$ 44) $\times$ 10$^{6}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage rings. The branching fraction of $η\rightarrowμ^+ μ^-$ is measured to be $(5.8 \pm 1.0_{\rm stat} \pm 0.2_{\rm syst}) \times 10^{-6}$, which is consistent with the previous measurements and theoretical expectations. In addition, no significant $η\to e^+e^-$ signal is observed in the $e^+ e^-$ invariant mass spectrum, and an improved upper limit of ${\cal B}(η\to e^+ e^-) < 2.2 \times 10^{-7}$ is set at 90\% confidence level.
The study of the charmed baryons is crucial for investigating the strong and weak interactions in the Standard Model and for gaining insights into the internal structure of baryons. In an $e^+e^-$ experiment the lightest charmed baryon, $Λ_c^+$, can be produced in pairs through the single photon annihilation process. This process can be described by two complex electromagnetic form factors. The presence of a non-zero relative phase between these form factors gives rise to a transverse polarization of the charmed baryon and provides additional constraints on the dynamic parameters in the decays. In this article, we present the first observation of the transverse polarization of $Λ_{c}^{+}$ in the reaction $e^+e^- \to Λ_c^{+}\barΛ_c^-$, based on $6.4~\text{fb}^{-1}$ of $e^{+}e^{-}$ annihilation data collected at center-of-mass energies between $4600$ MeV and $4951$ MeV with the BESIII detector. The decay asymmetry parameters in the decays $Λ_c^+ \to pK_S^0$, $Λπ^+$, $Σ^0π^+$, and $Σ^+π^0$ are simultaneously extracted from the joint angular distributions. From these parameters, both the weak and strong phase shifts are extracted, and several $C\!P$ observables are tested. The obtained
This paper develops a heuristic solver for the joint assignment and routing optimization problem. A review of previous work shows that MIP-based exact solvers can provide efficient solutions only for small to moderate problem sizes, owing to exponentially growing computational complexity. This paper proposes to start from a high-quality initial guess obtained through Hungarian-algorithm-based assignment and a heuristic cycle-merging algorithm. The solution is then improved by a proposed shaking algorithm that refines the assignment and the routing sequence. In addition, the shaking approach enables the Simulated Annealing algorithm to further improve the solution, which is very difficult when it relies purely on random sampling updates of the item and placeholder sequences. Extensive experimental validation against ground truth from the previously shared database shows that the introduced solver is much more efficient than the Gurobi solver, especially for large problems, with a 1000-node-pair problem being solved within 1 min in a Python implementation. The solution accuracy is generally within one percent of the ground truth in the database. Although there are spaces for the pro
Federated Learning (FL) presents a promising avenue for collaborative model training among medical centers, facilitating knowledge exchange without compromising data privacy. However, vanilla FL is prone to server failures and rarely achieves optimal performance on all participating sites because of the heterogeneous data distributions among them. To overcome these challenges, we propose Gossip Contrastive Mutual Learning (GCML), a unified framework to optimize personalized models in a decentralized environment, where the Gossip Protocol is employed for flexible and robust peer-to-peer communication. To enable efficient and reliable knowledge exchange in each communication without global knowledge across all the sites, we introduce deep contrast mutual learning (DCML), a simple yet effective scheme to encourage knowledge transfer between the incoming and local models through collaborative training on local data. By integrating DCML with other efforts to optimize site-specific models by leveraging useful information from peers, we evaluated the performance and efficiency of the proposed method on three publicly available datasets with different segmentation tasks. Our extensive experimental r
Using $e^+e^-$ collision data at 19 center-of-mass energies ranging from $4.396$ to $4.951~\mathrm{GeV}$ corresponding to a total integrated luminosity of $8.86~{\rm fb}^{-1}$ collected by the BESIII detector, the process $e^+e^-\to K^{0}K^-π^+ J/ψ+c.c.$ is observed for the first time, with a statistical significance of $9.4σ$ summing up all the data samples. For this process, the cross section and the upper limit at the $90\%$ confidence level are reported at each of the 19 center-of-mass energies. No statistically significant vector structures are observed in the cross section line shape, nor are any intermediate states of $Kπ$, $K\bar{K}$, $K\bar{K}π$, $KJ/ψ$, $πJ/ψ$, and $KπJ/ψ$ seen at individual energy points or in the combined data sample.
The electromagnetic structure of the nucleon can be determined from the scattering of electrons off a nucleon target. However, to study its axial structure, neutrino beams are required. The results from these experiments should be extrapolated to zero energy-momentum transfers to access the static properties of the nucleon. For baryons with strange quarks, hyperons, the static limit can instead be approached in semi-leptonic decays, which give direct access to the weak magnetism and axial-vector coupling strengths that are inaccessible in electromagnetic interactions. The axial-vector coupling, as well as the weak magnetism coupling and the overall normalization, given by the form factor $f_1$, are being determined with increased precision from the theory of strong interactions using a first-principles formulation on the space--time lattice. Furthermore, the probability of the semi-leptonic hyperon decay is approximately proportional to $|V_{us}|^2\cdot (f_1^2+3g_1^2)$, where $V_{us}$ is the CKM matrix element responsible for the transition between an $s$ and a $u$ quark. Current determinations of $|V_{us}|$ come from kaon decays, but the results are not consistent and could indicate a deviat
A novel technique for measuring strong-phase differences between the decay amplitudes of $D^0$ and $\bar{D}^0$ mesons is introduced. It exploits quantum-correlated $D\bar{D}$ pairs produced by $e^+e^-$ collisions at energies above the $ψ(3770)$ production threshold, where $D\bar{D}$ pairs are produced in both even and odd eigenstates of the charge-conjugation symmetry. Employing this technique, the first determination of a $D^0$-$\bar{D^0}$ relative strong phase is reported with such data samples. The strong-phase difference between $D^0\to K^-π^+$ and $\bar{D}^0\to K^-π^+$ decays, $δ^{D}_{Kπ}$, is measured to be $δ^{D}_{Kπ}=\left(192.8^{+11.0 + 1.9}_{-12.4 -2.4}\right)^\circ$, using a dataset corresponding to an integrated luminosity of 7.13 $\text{fb}^{-1}$ collected at center-of-mass energies between $4.13$ and $4.23 \text{ GeV}$ by the BESIII experiment.
Using $(10087\pm44)\times10^{6}$ $J/ψ$ events collected with the BESIII detector operating at the BEPCII storage ring in $2009$, $2012$, $2018$, and $2019$, we perform a search for the reaction $Ξ^0n\rightarrowΛΛX$, where $X$ denotes any additional final particles. Given the highly suppressed phase space for producing extra pions, the $X$ consists of either nothing or a photon, corresponding to the processes $Ξ^0 n \rightarrow ΛΛ$ and $Ξ^{0}n\rightarrowΛΣ^0\rightarrowΛΛγ$. The $Ξ^0$ comes from the decay of $J/ψ\rightarrowΞ^0\barΞ^0$, while the neutron originates from material of the beam pipe. A signal is observed for the first time with a statistical significance of 6.4$σ$. The cross section for the reaction $Ξ^0+{^9\rm{Be}}\rightarrowΛ+Λ+X$ is measured to be $(43.6\pm10.5_{\text{stat}}\pm11.1_{\text{syst}})$ mb at $P_{Ξ^0}\approx0.818$ GeV/$c$, where the first uncertainty is statistical and the second systematic. No significant $H$-dibaryon signal is observed in the $ΛΛ$ final state.