Search - ResearchTracker

Suppose $\widehat{\theta}_n$ is a strongly consistent estimator for $\theta_0$ in some i.i.d. situation. Let $N_\varepsilon$ and $Q_\varepsilon$ be, respectively, the last $n$ and the total number of $n$ for which $\widehat{\theta}_n$ is at least $\varepsilon$ away from $\theta_0$. The limit distributions for $\varepsilon^2 N_\varepsilon$ and $\varepsilon^2 Q_\varepsilon$ as $\varepsilon$ goes to zero are obtained under natural and weak conditions. The theory covers both parametric and nonparametric cases, multi-dimensional parameters, and general distance functions. Our results are of probabilistic interest and, on the statistical side, suggest ways in which competing estimators can be compared. In particular, several new optimality properties for the maximum likelihood estimator sequence in parametric families are established. Our results can also be used to construct sequential fixed-volume or shrinking-volume confidence sets, as well as sequential tests with power 1. The paper also includes limit distribution results for the last $n$ and the number of $n$ for which the supremum distance $\|F_n-F\|\ge\varepsilon$, where $F_n$ is the empirical distribution function. Yet other results are reac

Closed-Form Last Layer Optimization

arXiv, 2025-10-06. Authors: Alexandre Galashov, Nathaël Da Costa, Liyuan Xu

Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution for the linear last-layer weights is known in closed form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters and optimizing solely for these parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the method to the setting of stochastic gradient descent by trading off the loss on the current batch against the accumulated information from previous batches. We provide theoretical analyses showing convergence of the method to an optimal solution in the neural tangent kernel regime, as well as quantifying the gains compared to standard SGD in a one-step analysis. Finally, we demonstrate the effectiveness of our approach compared with SGD and Adam on a squared loss in several regression tasks, including neural operators and causal inference.
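The closed-form step mentioned in this abstract can be illustrated with the standard least-squares solution. This is only a sketch under assumed notation, not the paper's own formulation: $\Phi_\theta$ (the matrix of backbone features for backbone parameters $\theta$), $W$ (the last-layer weights) and $Y$ (the regression targets) are symbols introduced here for illustration, and $\Phi_\theta^\top \Phi_\theta$ is assumed to be invertible.

$$
W^*(\theta) = \arg\min_{W} \|\Phi_\theta W - Y\|^2 = (\Phi_\theta^\top \Phi_\theta)^{-1} \Phi_\theta^\top Y,
\qquad
\mathcal{L}(\theta) = \|\Phi_\theta W^*(\theta) - Y\|^2 .
$$

Under these assumptions, treating the last layer as a function of the backbone amounts to minimizing the reduced objective $\mathcal{L}(\theta)$ over $\theta$ alone. The stochastic variant described in the abstract, which weighs the current batch against information from previous batches, is not captured by this full-batch expression.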

Search results: last

On the last time and the number of times an estimator is more than epsilon from its target value

Closed-Form Last Layer Optimization

Agents' Last Exam

The Last Meridian Circles of Pulkovo Observatory

Variational Bayesian Last Layers

Near-existence of bigeodesics in dynamical exponential last passage percolation

Understanding Fermat's Last Theorem's Proofs

Geodesic switches and exceptional times in dynamical Brownian last passage percolation

The Last Success Problem with Samples

Foreign Exchange Markets with Last Look

Fano's Last Fano

The last sunset on mainland Europe

Hawaii is turning ocean plastic and fishing nets into roads

Measuring the Duration of Last Scattering

Pfaffian Schur processes and last passage percolation in a half-quadrant

Holomorphic last multipliers on complex manifolds

Riemann-Hilbert problems for last passage percolation

On the last digit and the last non-zero digit of $n^n$ in base $b$

On the last digits of tetrations of base $2^{k}$ and $5^{k}$

Stochastic analysis for the Dirichlet--Ferguson process