Search results: yuan

20 results found

Yuan 2.0-M32: Mixture of Experts with Attention Router

arXiv · 2024-05-28 · Authors: Shaohua Wu, Jiangang Luo, Xi Chen

Yuan 2.0-M32, which shares a similar base architecture with Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts, of which 2 are active. A new router network, the Attention Router, is proposed and adopted for more efficient expert selection; it improves accuracy over a model with a classical router network. Yuan 2.0-M32 is trained from scratch on 2000B tokens, and its training computation is only 9.25% of that of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability on coding, math, and various domains of expertise, with only 3.7B active parameters out of 40B in total and 7.4 GFLOPs of forward computation per token, both of which are only 1/19 of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.89 and 95.8 respectively. The models and source code of Yuan 2.0-M32 are released on GitHub.
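
As a rough illustration of what "32 experts of which 2 are active" means, the sketch below implements classical top-2 softmax gating in plain Python. It is not the paper's Attention Router, whose details the abstract does not give. The toy experts, the logits, and the function names `top_k_route` and `moe_forward` are all hypothetical. The last line only checks that 3.7B active parameters out of 40B is a fraction of 9.25%, the same figure the abstract quotes for training computation.

```python
import math

NUM_EXPERTS = 32  # from the abstract
TOP_K = 2         # experts active per token, from the abstract


def top_k_route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Classical top-k gating: softmax over the selected logits only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, logits, experts):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits))


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores: experts 3 and 7 tie for the highest score.
logits = [0.0] * NUM_EXPERTS
logits[3] = 2.0
logits[7] = 2.0

routes = top_k_route(logits)
output = moe_forward(1.0, logits, experts)

# Active share of parameters quoted in the abstract: 3.7B of 40B.
active_fraction = 3.7 / 40
```

Only the two selected experts are evaluated per token, which is why the active parameter count is far below the total.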

Strong counterexamples to a supersaturation question of Ma-Yuan

arXiv · 2026-06-08 · Authors: Wanfang Chen, Long-Tu Yuan

For a graph $F$, let $h_F(n,q)$ be the minimum number of copies of $F$ in an $n$-vertex graph with $\mathrm{ex}(n,F)+q$ edges, where $\mathrm{ex}(n,F)$ is the maximum number of edges in an $n$-vertex $F$-free graph. Let $c(n,F)$ be the minimum number of copies obtained by adding one edge to an extremal $F$-free graph. Mubayi's supersaturation conjecture predicts, under a stability hypothesis, that $h_F(n,q)\ge q\,c(n,F)$. Ma and Yuan recently constructed stable graph counterexamples for every fixed $q\ge4$; they asked whether the one-edge equality $h_F(n,1)=c(n,F)$ might still hold for every graph $F$ containing a cycle. We give a negative answer to their question. For each integer $t\ge6$, let $H_t$ be obtained from the $t$-vertex path by replacing each edge with a $3t$-page book, using disjoint page vertices for different path edges. Then $h_{H_t}(n,1)<c(n,H_t)$ for infinitely many values of $n$. Moreover, by taking $t$ large, the ratio $h_{H_t}(n,1)/c(n,H_t)$ can be made arbitrarily small along infinitely many values of $n$.

Yuan 2.0-M32: Mixture of Experts with Attention Router

Strong counterexamples to a supersaturation question of Ma-Yuan

Quantitativity in the Mordell Conjecture

SP-Mind: An Autonomous Reasoning Agent for Spatial Proteomics Analysis

Observation of an Altered $a_{0}(980)$ Line shape in $D^{+} \rightarrow π^{+}ηη$

Search for the reaction channel $e^+ e^- \to ηη\,J/ψ$ and the isospin partner of the $Z_c(3900)$ at center-of-mass energies $\sqrt{s} = 4.226-4.950$ GeV

Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$

MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving

On Vojta's proof of the Mordell conjecture

Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach

Riemannian Optimization on Relaxed Indicator Matrix Manifold

Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers

Assignment-Routing Optimization with Cutting-Plane Subtour Elimination: Solver and Benchmark Dataset

Measurement of the branching fraction of $η\to μ^+ μ^-$ and search for $η\to e^+ e^-$

Search for a bound state of $Λ_{c}\barΣ_{c}$ near threshold

Assignment-Routing Optimization : Efficient Heuristic Solver with Shaking Algorithm

Study of the Magnetic Dipole Transition of $J/ψ\toγη_c$ via $η_c\to p\bar{p}$

Determination of CKM matrix element and axial vector form factors from weak decays of quantum-entangled strange baryons

Amplitude analysis and branching fraction measurement of the decay $D^0 \to K^0_Sπ^0π^0$

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities