20 results found
Residual connections are central to modern deep learning architectures, enabling the training of very deep networks by mitigating vanishing gradients. Hyper-Connections recently generalized residual connections by introducing multiple connection strengths at different depths, thereby addressing the seesaw effect between gradient vanishing and representation collapse. However, Hyper-Connections increase memory access costs by expanding the width of hidden states. In this paper, we propose Frac-Connections, a novel approach that divides hidden states into multiple parts rather than expanding their width. Frac-Connections retain some of the benefits of Hyper-Connections while reducing memory consumption. To validate their effectiveness, we conduct large-scale experiments on language tasks, the largest being a 7B MoE model trained on up to 3T tokens, demonstrating that Frac-Connections significantly outperform residual connections.
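The memory argument above (expanding the width versus dividing it) can be made concrete by counting how many hidden-state values are carried between layers per token. This is an illustrative sketch only: the helper names, the width `d = 4096`, and the contiguous splitting are assumptions, not details taken from the paper.

```python
# Illustrative sketch (hypothetical helpers, not the authors' code): hidden-state
# values carried between layers per token, for width d and n streams/fractions.
from typing import List


def hyper_state_size(d: int, n: int) -> int:
    # Hyper-Connections: n copies of a width-d hidden state.
    return n * d


def frac_state_size(d: int, n: int) -> int:
    # Frac-Connections: a single width-d hidden state divided into n parts.
    if d % n:
        raise ValueError("d must be divisible by n")
    return n * (d // n)


def split_into_fractions(h: List[float], n: int) -> List[List[float]]:
    """Split h into n equal contiguous parts (contiguity is an assumption)."""
    if len(h) % n:
        raise ValueError("len(h) must be divisible by n")
    m = len(h) // n
    return [h[i * m:(i + 1) * m] for i in range(n)]


print(hyper_state_size(4096, 4), frac_state_size(4096, 4))
print(split_into_fractions([1.0, 2.0, 3.0, 4.0], 2))
```

With `n = 4`, the expanded state holds four times as many values as the divided one, which is the kind of memory-access saving the abstract refers to.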
The theory of connections is at the very core of differential geometry. Discovered by Levi-Civita and Christoffel and later studied by Cartan, Koszul, and others, connections appear in their most general form under the name of Ehresmann connections. An Ehresmann connection consists of a splitting of the tangent bundle of a submersion into the vertical sub-bundle and a given horizontal distribution. In this paper, we generalize Ehresmann connections to a categorical setting called tangent categories. Initially introduced by Rosický in 1984 and later generalized by Cockett and the first author in 2014, tangent categories provide a categorical framework for studying geometry that extends well beyond smooth manifolds, including algebraic geometry and non-commutative geometry. We introduce and study Ehresmann connections in this setting, give various equivalent formulations in terms of full and abstract connections, and prove that they generalize Koszul connections. We also define parallel transport and curvature for such connections, and prove the structural equation and the Bianchi identity for the curvature.
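In the classical smooth setting that the tangent-category definition generalizes, the splitting reads as follows. The notation is standard and not taken from the paper; $HE$ denotes the chosen horizontal distribution.

```latex
\pi\colon E \to M \ \text{a submersion}, \qquad
VE := \ker\!\left(T\pi\colon TE \to TM\right), \qquad
TE = VE \oplus HE .
```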
For Hermitian connections on a Hermitian complex line bundle over a Riemannian manifold $(X,g)$, we can define the "volume", which can be considered the "mirror" of the standard volume of submanifolds. We call its critical points minimal connections. In this paper, (1) we prove monotonicity formulas for minimal connections with respect to some versions of the volume functional under certain conditions on $\dim X$ and the curvature of $g$. These formulas are expected to be important in bubbling analysis. As a corollary, we obtain a vanishing theorem for minimal connections on odd-dimensional Euclidean space. (2) We see that the formal "large radius limit" of the defining equation of minimal connections is that of Yang--Mills connections. We then prove an existence theorem for minimal connections for a "sufficiently large" metric. (3) We can consider deformed Donaldson--Thomas (dDT) connections on $G_2$-manifolds as "mirrors" of calibrated (associative) submanifolds. We show that dDT connections are minimal connections, just as calibrated submanifolds are minimal submanifolds. Using arguments specific to dDT connections, we obtain the stronger monotonicity formulas and vanishin
We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
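To make "adjusting the strength of connections between features at different depths" concrete, here is a simplified sketch of a static hyper-connection around one layer. The hidden state is kept as `n` streams. The streams are mixed into the layer input, and each stream is then updated from the layer output and the old streams. The structure is loosely modeled on the static form described for hyper-connections. The names `alpha_m`, `alpha_r`, `beta` and the toy layer `double` are our own, and this is not the authors' implementation.

```python
# Hypothetical, simplified sketch of a *static* hyper-connection around one
# layer (not the authors' code). H holds n hidden streams of width d.
from typing import Callable, List

Vector = List[float]


def layer_input(H: List[Vector], alpha_m: List[float]) -> Vector:
    """Mix the n streams into one layer input: sum_i alpha_m[i] * H[i]."""
    d = len(H[0])
    return [sum(alpha_m[i] * H[i][k] for i in range(len(H))) for k in range(d)]


def hyper_connection(
    H: List[Vector],
    layer: Callable[[Vector], Vector],
    alpha_m: List[float],
    alpha_r: List[List[float]],
    beta: List[float],
) -> List[Vector]:
    """Return H'[j] = beta[j] * layer(x) + sum_i alpha_r[i][j] * H[i]."""
    n, d = len(H), len(H[0])
    out = layer(layer_input(H, alpha_m))
    return [
        [beta[j] * out[k] + sum(alpha_r[i][j] * H[i][k] for i in range(n))
         for k in range(d)]
        for j in range(n)
    ]


def double(x: Vector) -> Vector:
    # Toy stand-in for an attention or feed-forward block.
    return [2.0 * v for v in x]


# With n = 1 and all strengths equal to 1 this reduces to h + F(h).
print(hyper_connection([[1.0, 2.0]], double, [1.0], [[1.0]], [1.0]))
```

The `n = 1` case with unit strengths recovers an ordinary residual connection. Learning `alpha_m`, `alpha_r` and `beta` is what lets the network reweight, or effectively rearrange, contributions from different depths.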
Endocrine-disrupting chemicals (EDCs) threaten human health, ecosystems, and biodiversity by interfering with hormonal signaling pathways conserved across vertebrates. Traditional in vivo assays are costly and time-consuming, limiting their capacity to screen the growing number of chemicals. To address this, we developed a deep learning-based QSAR model to predict estrogen receptor (ER) binding molecules. Using a curated dataset of 224 compounds and 2,944 molecular descriptors and fingerprints, a deep neural network (DNN) incorporating dropout and batch normalization was trained and validated. The model achieved training and test accuracies of 96.65% and 91.30%, respectively, with an ROC-AUC of 0.81, a precision of 0.82, and a recall of 0.88 for the active class. Molecular docking against estrogen receptor (PDB ID: 5TOA) confirmed that several predicted compounds exhibited binding comparable to Estradiol, sharing key interactions. This model enables rapid screening of potential EDCs, supporting efficient chemical risk assessment and contributing to biodiversity conservation by identifying compounds that may disrupt reproduction and population stability in humans and wildlife.
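The reported accuracy, precision and recall follow the standard confusion-matrix definitions, with precision and recall taken for the active class. Below is a minimal sketch on made-up toy labels (1 = active binder, 0 = inactive). These labels are not the study's data, and the function name is hypothetical.

```python
# Standard classification metrics; precision/recall are for the positive class.
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, plus precision and recall for the `positive` (active) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up toy labels, for illustration only.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Here 2 true positives, 1 false positive and 1 false negative give precision and recall of 2/3 each, and 3 of 5 correct predictions give an accuracy of 0.6.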
We examine the heap of linear connections on anchored vector bundles and Lie algebroids. Naturally, this covers the example of affine connections on a manifold. We present some new interpretations of classical results via this ternary structure of connections. Endomorphisms of linear connections are studied, and their ternary structure, in particular the endomorphism truss, is explicitly presented. We remark that the use of ternary structures in differential geometry is novel and that the endomorphism truss of linear connections provides a concrete geometric example of a truss.
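A sketch, assuming the heap in question is the one that every affine space carries. Because the difference of two linear connections is a tensor, the ternary operation below again satisfies the Leibniz rule (here $\rho$ denotes the anchor; for affine connections, $\rho = \mathrm{id}_{TM}$). It also obeys the Mal'cev identities of a heap.

```latex
[\nabla^1,\nabla^2,\nabla^3] := \nabla^1 - \nabla^2 + \nabla^3, \qquad
[\nabla^1,\nabla^2,\nabla^3]_X (f s) = \rho(X)(f)\, s + f\,[\nabla^1,\nabla^2,\nabla^3]_X s,
\\
[\nabla,\nabla,\nabla'] = \nabla' = [\nabla',\nabla,\nabla].
```

The Leibniz term survives with coefficient $1 - 1 + 1 = 1$, which is why the bracket of three connections is again a connection even though their plain sum is not.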
We find an explicit formula for the gamma vector in terms of the input polynomial in a way that extends it to arbitrary polynomials. More specifically, we find an explicit linear combination of the coefficients of the input polynomial (using Catalan numbers and binomial coefficients) and an expression involving the derivative of the input polynomial. The first expression suggests connections to common Coxeter group/noncrossing partition structures in existing gamma-positivity examples. When the input is the $h$-polynomial of a simplicial complex, this gives an interpretation of the gamma vector as a measure of differences between local and global contributions. We also apply these expressions to connect signs/inequalities of (shifts of) the gamma vector to upper/lower bound conditions on the coefficients of the input polynomial. Finally, we use the shape of the sums behind these estimates, together with connections to intersection numbers, to relate these properties of the gamma vector to algebraic structures (e.g. characteristic classes involved in existing log-concavity and Schur positivity properties).
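For a palindromic polynomial $h(t)=\sum_{i=0}^{d} h_i t^i$, the classical gamma vector is defined by $h(t)=\sum_{i=0}^{\lfloor d/2\rfloor}\gamma_i\, t^i (1+t)^{d-2i}$. The sketch below computes it by peeling off one basis term at a time. Unlike the paper's formula, it handles only this classical palindromic case, and the function name is our own.

```python
# Gamma vector of a palindromic polynomial h(t) = sum_i h[i] t^i, using
# h(t) = sum_i gamma_i * t^i * (1 + t)^(d - 2i).
from math import comb
from typing import List


def gamma_vector(h: List[int]) -> List[int]:
    d = len(h) - 1
    if any(h[i] != h[d - i] for i in range(d + 1)):
        raise ValueError("h must be palindromic")
    rest = list(h)
    gamma = []
    for i in range(d // 2 + 1):
        g = rest[i]  # lowest remaining coefficient fixes gamma_i
        gamma.append(g)
        # Subtract g * t^i * (1 + t)^(d - 2i) from the remainder.
        for j in range(d - 2 * i + 1):
            rest[i + j] -= g * comb(d - 2 * i, j)
    return gamma


# Eulerian polynomial 1 + 11t + 11t^2 + t^3
print(gamma_vector([1, 11, 11, 1]))
```

For the Eulerian polynomial $1 + 11t + 11t^2 + t^3$, this yields $\gamma = (1, 8)$. The result is nonnegative, as expected for a gamma-positive example.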
We study the relation between torsion tensors of principal connections on G-structures and characteristic conic connections on associated cone structures. We formulate sufficient conditions under which the existence of a characteristic conic connection implies the existence of a torsion-free principal connection. We verify these conditions for adjoint varieties of simple Lie algebras, excluding those of type $\textsf{A}_{\ell \neq 2}$ or $\textsf{C}_{\ell}$. As an application, we give a complete classification of the germs of minimal rational curves whose VMRT at a general point is such an adjoint variety: nontrivial ones come from lines on hyperplane sections of certain Grassmannians or minimal rational curves on wonderful group compactifications.
For bounded lattices, we introduce certain Galois connections, called (cyclically) essential, retractable and UC Galois connections, which behave well with respect to concepts of module-theoretic nature involving essentiality. We show that essential retractable Galois connections preserve uniform dimension, whereas essential retractable UC Galois connections induce a bijective correspondence between sets of closed elements. Our results are applied to suitable Galois connections between submodule lattices. Cyclically essential Galois connections unify semi-projective and semi-injective modules, while retractable Galois connections unify retractable and coretractable modules.
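The abstract builds on the basic notion of a Galois connection between posets. The sketch below assumes the monotone convention, $f(x) \le y$ if and only if $x \le g(y)$; some authors use the antitone one instead. It checks that condition exhaustively on a toy example (direct image and preimage along a map) and lists the closed elements, i.e. the fixed points of $g \circ f$. The helper names and the map `phi` are hypothetical, and the module-theoretic refinements (essential, retractable, UC) are not modeled.

```python
# Exhaustive check of the (monotone) Galois connection condition on small posets.
from itertools import chain, combinations
from typing import FrozenSet, Iterable, List


def powerset(s: Iterable) -> List[FrozenSet]:
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_galois_connection(P, Q, leq_p, leq_q, f, g) -> bool:
    """True iff f(x) <= y exactly when x <= g(y), for all x in P, y in Q."""
    return all(leq_q(f(x), y) == leq_p(x, g(y)) for x in P for y in Q)


# Toy example: direct image and preimage along a map phi: A -> B.
A, B = {1, 2, 3}, {"a", "b", "c"}
phi = {1: "a", 2: "a", 3: "b"}


def image(X):
    return frozenset(phi[x] for x in X)


def preimage(Y):
    return frozenset(x for x in A if phi[x] in Y)


def subset(X, Y):
    return X <= Y


P, Q = powerset(A), powerset(B)
print(is_galois_connection(P, Q, subset, subset, image, preimage))

# g o f is a closure operator on P; its fixed points are the closed elements.
closed = [X for X in P if preimage(image(X)) == X]
print(closed)
```

Here the closed subsets of `A` are exactly the unions of fibres of `phi`: $\emptyset$, $\{1,2\}$, $\{3\}$ and $\{1,2,3\}$.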
Around 1923, Elie Cartan introduced affine connections on manifolds and defined the main related concepts: torsion, curvature, holonomy groups. He discussed applications of these concepts in Classical and Relativistic Mechanics; in particular he explained how parallel transport with respect to a connection can be related to the principle of inertia in Galilean Mechanics and, more generally, can be used to model the motion of a particle in a gravitational field. In subsequent papers, Elie Cartan extended these concepts to other types of connections on a manifold: Euclidean, Galilean and Minkowskian connections, which can be considered special types of affine connections, the group of affine transformations of the affine tangent space being replaced by a suitable subgroup; and more generally, conformal and projective connections, associated with a group which is no longer a subgroup of the affine group. Around 1950, Charles Ehresmann introduced connections on a fibre bundle and, when the bundle has a Lie group as structure group, connection forms on the associated principal bundle, with values in the Lie algebra of the structure group. He called Cartan connections the various types of
We develop a new perspective on principal bundles with connection as morphisms from the tangent bundle of the underlying manifold to a classifying dg-Lie groupoid. This groupoid can be identified with a lift of the inner homomorphisms groupoid arising in Ševera's differentiation procedure of Lie quasi-groupoids. Our new perspective readily extends to principal groupoid bundles, but requires an adjustment, an additional datum familiar from higher gauge theory. We show that for Lie groupoids, the additional adjustment data amounts to a Cartan connection. The resulting adjusted connections naturally provide a global formulation of the kinematical data of curved Yang-Mills-Higgs theories as described by Kotov-Strobl (arXiv:1510.07654) and Fischer (arXiv:2104.02175).
This article is an overview of the results obtained in recent years on symplectic connections. We present what is known about preferred connections (critical points of a variational principle). The class of Ricci-type connections (for which the curvature is entirely determined by the Ricci tensor) is described in detail, as well as its far-reaching generalization to special connections. A twistorial construction shows a relation between Ricci-type connections and complex geometry. We give a construction of Ricci-flat symplectic connections. We end by presenting, through an explicit example, an approach to noncommutative symplectic symmetric spaces.
We consider smooth families of Lie groups (group bundles) and connections that are compatible with the group operation. We characterize the space of group connections on a group bundle as an affine space modeled on the vector space of $1$-forms taking cocycle values in the Lie algebra bundle of the aforementioned group bundle. We show that group connections satisfy the Ambrose-Singer theorem and that group bundles can be seen as a particular case of associated bundles, realizing group connections as associated connections. We give a construction of the moduli space of group connections with fixed base and fiber as a space of representations of the fundamental group of the base.
We define and study multiplicative connections in the tangent bundle of a Lie groupoid. Multiplicative connections are linear connections satisfying an appropriate compatibility with the groupoid structure. Our definition is natural in the sense that a linear connection on a Lie groupoid is multiplicative if and only if its torsion is a multiplicative tensor in the sense of Bursztyn-Drummond [5] and its geodesic spray is a multiplicative vector field. We identify the obstruction to the existence of a multiplicative connection. We also discuss the infinitesimal version of multiplicative connections in the tangent bundle, that we call infinitesimally multiplicative (IM) connections and we prove an integration theorem for IM connections. Finally, we present a few toy examples.
We give a twisted Dirac structure on the space of irreducible connections on an SU(n)-bundle over a three-manifold, and a family of twisted Dirac structures on the space of irreducible connections on the trivial SU(n)-bundle over a four-manifold. The twist is described by the Cartan 3-form on the space of connections. It vanishes over the subspace of flat connections, so the spaces of flat connections are endowed with (non-twisted) Dirac structures. The Dirac structure on the space of flat connections over the three-manifold is obtained as the boundary restriction of a corresponding Dirac structure over the four-manifold. We also discuss the action of the group of gauge transformations on these Dirac structures.
The notion of an odd quasi-connection on a supermanifold, which is loosely an affine connection that carries non-zero Grassmann parity, is examined. Their torsion and curvature are defined; however, in general they are not tensors. A special class of such generalised connections, referred to as odd connections in this paper, have torsion and curvature tensors. Part of the structure is an odd involution of the tangent bundle of the supermanifold, and this puts drastic restrictions on the supermanifolds that admit odd connections. In particular, they must have an equal number of even and odd dimensions. Amongst other results, we show that an odd connection is defined, up to an odd tensor field of type $(1,2)$, by an affine connection and an odd endomorphism of the tangent bundle. Thus, the theories of odd connections and of affine connections are not completely separate. As an example relevant to physics, it is shown that $N=1$ super-Minkowski spacetime admits a natural odd connection.
In general relativity, the gravitational potential is represented by the Levi-Civita connection, the only symmetric connection preserving the metric. On a differentiable manifold, a metric identifies with an orthogonal structure, defined as a Lorentz reduction of the frame bundle. The Levi-Civita connection appears as the only symmetric connection preserving the reduction. This paper presents a generalization of this process to other approaches to gravitation: Weyl structures with Weyl connections, teleparallel structures with Weitzenböck connections, and unimodular structures similarly appear as frame bundle reductions, with preserving connections. To each subgroup H of the linear group GL correspond reduced structures, or H-structures. They are subbundles of the frame bundle (with GL as principal group), with H as principal group. A linear connection on a manifold M is a principal connection on the frame bundle. Given a reduction, the corresponding preserving connections on M are the linear connections which preserve it. I also show that the time gauge used in the 3+1 formalism for general relativity similarly appears as the result of a bundle reduction.
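For reference, in local coordinates the unique symmetric, metric-preserving connection is given by the standard Christoffel formula. Symmetry of $\Gamma$ in its lower indices is the torsion-free condition, and $\nabla g = 0$ is the preservation of the metric.

```latex
\Gamma^k_{ij} = \tfrac12\, g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right),
\qquad \Gamma^k_{ij} = \Gamma^k_{ji},
\qquad \nabla g = 0 .
```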
A gauge-invariant notion of a strong connection is presented and characterized. It is then used to justify the way in which a global curvature form is defined. Strong connections are interpreted as those that are induced from the base space of a quantum bundle. Examples of both strong and non-strong connections are provided. In particular, such connections are constructed on a quantum deformation of the fibration $S^2 \to \mathbb{RP}^2$. A certain class of strong $U_q(2)$-connections on a trivial quantum principal bundle is shown to be equivalent to the class of connections on a free module that are compatible with the $q$-dependent Hermitian metric. A particular form of the Yang-Mills action on a trivial $U_q(2)$-bundle is investigated. It is proved to coincide with the Yang-Mills action constructed by A. Connes and M. Rieffel. Furthermore, it is shown that the moduli space of critical points of this action functional is independent of $q$.
Recently the present authors introduced a general class of Finsler connections which leads to a smart representation of connection theory in Finsler geometry and yields a classification of Finsler connections into three classes. Here the properties of one of these classes, namely the Berwald-type connections, which contain the Berwald and Chern (Rund) connections as special cases, are studied. Among other results, it is proved that the hv-curvature of these connections vanishes if and only if the Finsler space is a Berwald space. Some applications of these connections are discussed.
An axiomatic theory of operator connections and operator means was investigated by Kubo and Ando in 1980. A connection is a binary operation on positive operators satisfying monotonicity, the transformer inequality, and joint continuity from above. In this paper, we show that the joint-continuity assumption can be relaxed to certain conditions which are weaker than separate continuity. This provides an easier way to check whether a given binary operation is a connection. Various axiomatic characterizations of connections are obtained. We show that concavity is an important property of a connection by showing that monotonicity can be replaced by concavity or midpoint concavity. Each operator connection induces a unique scalar connection. Moreover, there is an affine order isomorphism between connections and induced connections. This gives a natural viewpoint for defining named means.
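In Kubo–Ando theory, each connection $\sigma$ corresponds to a nonnegative operator monotone function $f(t) = 1\,\sigma\,t$. On positive scalars, the induced connection acts as $a\,\sigma\,b = a\,f(b/a)$. The sketch below is scalar-only and uses four standard examples. The dictionary and function names are our own, and operators (matrices) are not handled.

```python
# Scalar connections induced by Kubo-Ando representing functions f(t) = 1 σ t.
import math
from typing import Callable, Dict

REPRESENTING: Dict[str, Callable[[float], float]] = {
    "arithmetic": lambda t: (1 + t) / 2,    # arithmetic mean
    "geometric": math.sqrt,                 # geometric mean
    "harmonic": lambda t: 2 * t / (1 + t),  # harmonic mean
    "parallel_sum": lambda t: t / (1 + t),  # parallel sum (a connection, not a mean)
}


def scalar_connection(f: Callable[[float], float], a: float, b: float) -> float:
    """Scalar connection induced by f: a σ b = a * f(b / a), for a > 0, b >= 0."""
    if a <= 0 or b < 0:
        raise ValueError("expects a > 0 and b >= 0")
    return a * f(b / a)


for name, f in REPRESENTING.items():
    print(name, scalar_connection(f, 2.0, 6.0))
```

With $a = 2$ and $b = 6$, this gives $4$, $2\sqrt{3}$, $3$ and $1.5$ respectively. These values respect harmonic $\le$ geometric $\le$ arithmetic. The parallel sum has $f(1) = 1/2$ rather than $1$, which is why it is a connection but not a mean.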