Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets showed less similarity between their Wikipedia version and Grokipedia version than other pages. The random subset illustrates that
On October 27th, 2022, Elon Musk purchased Twitter, becoming its new CEO and firing many top executives in the process. Musk listed fewer restrictions on content moderation and removal of spam bots among his goals for the platform. Given findings of prior research on moderation and hate speech in online communities, the promise of less strict content moderation poses the concern that hate will rise on Twitter. We examine the levels of hate speech and prevalence of bots before and after Musk's acquisition of the platform. We find that hate speech rose dramatically upon Musk purchasing Twitter and the prevalence of most types of bots increased, while the prevalence of astroturf bots decreased.
Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/η$, where $η$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward $2/η$ from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition yields a step recurrence with stability boundary $2/η$, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward $2/η$. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding exact forcing of the Hessian eigenvalue with no gap. Setting both gradients of the edge coupling to zero classifies fixed points and period-two orbits; near a fixed point, the problem reduces to a function of the half-amplitude alone, which determines which directions support period-two orbits and on which side of the
We generalize the attention mechanism by viewing it through the lens of Entropic Optimal Transport, revealing that standard attention corresponds to a transport problem regularized by an implicit uniform prior. We introduce Generalized Optimal transport Attention with Trainable priors (GOAT), a new attention mechanism that replaces this naive assumption with a learnable, continuous prior. This prior maintains full compatibility with optimized kernels such as FlashAttention. GOAT also provides an EOT-based explanation of attention sinks and materializes a solution for them, avoiding the representational trade-offs of standard attention. Finally, by absorbing spatial information into the core attention computation, GOAT learns an extrapolatable prior that combines the flexibility of learned positional embeddings with the length generalization of fixed encodings.
We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's near-zero eigenvalues trap residual error in a test-invisible reservoir. Within the signal channel, minibatch SGD ensures that coherent population signal accumulates via fast linear drift, while idiosyncratic memorization is suppressed into a slow, diffusive random walk. We prove generalization survives even when the kernel evolves $\mathcal{O}(1)$ in operator norm, the full feature-learning regime. This theory naturally explains disparate phenomena in deep learning theory, such as benign overfitting, double descent, implicit bias, and grokking. Lastly, we derive an exact population-risk objective from a single training run with no validation data, for any architecture, loss, or optimizer, and prove that it measures precisely the noise in the signal channel. This objective reduces in practice to an SNR preconditioner on top of Adam, adding one state vector at no extra cost; it accelerates grokking by $5 \times
We sketch the proof of an effective equidistribution theorem for one-parameter unipotent subgroups in $S$-arithmetic quotients arising from $\mathbf K$-forms of $\mathrm{SL}_2^{\mathsf n}$ where $\mathbf K$ is a number field. This gives an effective version of equidistribution results of Ratner and Shah with a polynomial rate. The key new phenomenon is the existence of many intermediate groups between the $\mathrm{SL}_2$ containing our unipotent and the ambient group, which introduces potential local and global obstruction to equidistribution. Our approach relies on a Bourgain-type projection theorem in the presence of obstructions, together with a careful analysis of these obstructions.
This technical report presents the construction and analysis of polynomial navigation functions for motion planning in 3-D workspaces populated by spherical and cylindrical obstacles. The workspace is modeled as a bounded spherical region, and obstacles are encoded using smooth polynomial implicit functions. We establish conditions under which the proposed navigation functions admit a unique non-degenerate minimum at the target while avoiding local minima, including in the presence of pairwise intersecting obstacles. Gradient and Hessian analyses are provided, and the theoretical results are validated through numerical simulations in obstacle rich 3-D environments.
Large language models (LLMs) are commonly evaluated on tasks that test their knowledge or reasoning abilities. In this paper, we explore a different type of evaluation: whether an LLM can predict aspects of its own responses. Since LLMs lack the ability to execute themselves, we introduce the Self-Execution Benchmark, which measures a model's ability to anticipate properties of its output, such as whether a question will be difficult for it, whether it will refuse to answer, or what kinds of associations it is likely to produce. Our experiments show that models generally perform poorly on this benchmark, and that increased model size or capability does not consistently lead to better performance. These results suggest a fundamental limitation in how LLMs represent and reason about their own behavior.
We prove an effective equidistribution theorem for semisimple closed orbits on compact adelic quotients. The obtained error depends polynomially on the minimal complexity of intermediate orbits and the complexity of the ambient space. The proof uses dynamical arguments, property $(τ)$, Prasad's volume formula, an effective closing lemma, and a novel effective generation result for subgroups. The latter in turn relies on an effective version of Greenberg's theorem. We apply the above to the problem of establishing a local-global principle for representations of integral quadratic forms, improving the codimension assumptions and providing effective bounds in a theorem of Ellenberg and Venkatesh.
The manipulation of flexible objects such as cables, wires and fresh food items by robot hands forms a special challenge in robot grasp mechanics. This paper considers the steering of flexible linear objects in planar environments by two robot hands. The flexible linear object, modeled as an elastic non-stretchable rod, is manipulated by varying the gripping endpoint positions while keeping equal endpoint tangents. The flexible linear object shape has a closed form solution in terms of the grasp endpoint positions and tangents, called Euler's elastica. This paper obtains the elastica solutions under the optimal control framework, then uses the elastica solutions to obtain closed-form criteria for non self-intersection, stability and obstacle avoidance of the flexible linear object. The new tools are incorporated into a planning scheme for steering flexible linear objects in planar environments populated by sparsely spaced obstacles. The scheme is fully implemented and demonstrated with detailed examples.
We prove a dichotomy regarding the behavior of one-parameter unipotent flows on quotients of semisimple lie groups under time change. We show that if $u^{(1)}_t$ acting on $\mathbf{G}_{1}/Γ_1$ is such a flow it satisfies exactly one of the following: (1) The flow is loosely Kronecker, and hence measurably isomorphic after an appropriate time change to any other loosely Kronecker system. (2) The flow exhibits the following rigid behavior: if the one-parameter unipotent flow $u^{(1)} _ t$ on $\mathbf{G}_1/Γ_1$ is measurably isomorphic after time change to another such flow $u^{(2)} _ t$ on $\mathbf{G}_2/Γ_ 2$, then $\mathbf{G}_1/Γ_1 $ is isomorphic to $\mathbf{G}_2/ Γ_2$ with the isomorphism taking $u^{(1)}_t$ to $u^{(2)}_t$ and moreover the time change is cohomologous to a trivial one up to a renormalization.
This paper describes a method for steering deformable linear objects using two robot hands in environments populated by sparsely spaced obstacles. The approach involves manipulating an elastic inextensible rod by varying the gripping endpoint positions and tangents. Closed form solutions that describe the flexible linear object shape in planar environments, Euler's elastica, are described. The paper uses these solutions to formulate criteria for non self-intersection, stability and obstacle avoidance. These criteria are formulated as constraints in the flexible object six-dimensional configuration space that represents the robot gripping endpoint positions and tangents. In particular, this paper introduces a novel criterion that ensures the flexible object stability during steering. All safety criteria are integrated into a scheme for steering flexible linear objects in planar environments, which is lifted into a steering scheme in three-dimensional environments populated by sparsely spaced obstacles. Experiments with a dual-arm robot demonstrate the method.
We liberate Equilibrium Propagation (EP) from the limit of infinitesimal perturbations by establishing a finite-nudge foundation for local credit assignment. By modeling network states as Gibbs-Boltzmann distributions rather than deterministic points, we prove that the gradient of the difference in Helmholtz free energy between a nudged and free phase is exactly the difference in expected local energy derivatives. This validates the classic Contrastive Hebbian Learning update as an exact gradient estimator for arbitrary finite nudging, requiring neither infinitesimal approximations nor convexity. Furthermore, we derive a generalized EP algorithm based on the path integral of loss-energy covariances, enabling learning with strong error signals that standard infinitesimal approximations cannot support.
We establish effective equidistribution theorems, with a polynomial error rate, for orbits of unipotent subgroups in quotients of quasi-split, almost simple Linear algebraic groups of absolute rank 2. As an application, inspired by the results of Eskin, Margulis and Mozes, we establish quantitative results regarding the distribution of values of an indefinite ternary quadratic form at integer points, giving in particular an effective and quantitative proof of the Oppenheim Conjecture.
The scaled-dot-product attention (SDPA) mechanism is a core component of modern deep learning, but its mathematical form is often motivated by heuristics. This work provides a first-principles justification for SDPA. We first show that the attention forward pass is the exact solution to a degenerate, one-sided Entropic Optimal Transport (EOT) problem, which seeks a distribution that maximizes similarity while being maximally entropic. This optimization perspective has a direct consequence for the backward pass. We prove that the standard gradient computed via backpropagation is mathematically identical to an advantage-based policy gradient, a variance-reduced update rule from reinforcement learning. Crucially, we demonstrate that the EOT formulation of the forward pass induces a specific information geometry on the space of attention distributions. It is this geometry, characterized by the Fisher Information Matrix, that dictates the precise form of the learning gradient, revealing the advantage-based update as a natural consequence of the optimization problem being solved. This unified view reveals SDPA as a principled mechanism where the forward pass performs optimal inference an
Weakly supervised LiDAR semantic segmentation has made significant strides with limited labeled data. However, most existing methods focus on the network training under weak supervision, while efficient annotation strategies remain largely unexplored. To tackle this gap, we implement LiDAR semantic segmentation using scatter image annotation, effectively integrating an efficient annotation strategy with network training. Specifically, we propose employing scatter images to annotate LiDAR point clouds, combining a pre-trained optical flow estimation network with a foundation image segmentation model to rapidly propagate manual annotations into dense labels for both images and point clouds. Moreover, we propose ScatterNet, a network that includes three pivotal strategies to reduce the performance gap caused by such annotations. Firstly, it utilizes dense semantic labels as supervision for the image branch, alleviating the modality imbalance between point clouds and images. Secondly, an intermediate fusion branch is proposed to obtain multimodal texture and structural features. Lastly, a perception consistency loss is introduced to determine which information needs to be fused and whi
Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although various methods have proposed various solutions to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provides an in-depth analysis and evaluation of this issue. We first introduce the Non-Target Divergence Hypothesis (NTDH) to reveal the impact of domain gaps on cross-modal knowledge distillation. Our key finding is that domain gaps between modalities lead to distribution differences in non-target classes, and the smaller these differences, the better the performance of cross-modal knowledge distillation. Subsequently, based on Vapnik-Chervonenkis (VC) theory, we derive the upper and lower bounds of the approximation error for cross-modal knowledge distillation, thereby theoretically validating the NTDH. Finally, experiments on five cross-modal datasets further confirm the validity, generalisability, and applicability of the NTDH.
We prove an effective closing lemma for unipotent flows on quotients of perfect real groups. This is largely motivated by recent developments in effective unipotent dynamics.
We present a new argument in the study of positive entropy measures for higher rank diagonalisable actions. The argument relies on a quantitative form of recurrence along unipotent directions (that are not known to preserve the measure). Using this argument we prove a classification of positive entropy measures for any higher rank action on an irreducible arithmetic quotient of a form of $SL_2$. We also provide an Adelic version of this classification result where no entropy assumption is needed. These results can also be used to prove new results regarding Diophantine approximations of integer multiples of an arbitrary element $α\in\mathbb{R}$.
We show that pair correlation function for the spectrum of a flat 2-dimensional torus satisfying an explicit Diophantine condition agrees with those of a Poisson process with a polynomial error rate. The proof is based on a quantitative equidistribution theorem and tools from geometry of numbers.