Search - ResearchTracker

In a cross-sectional study, adolescent and young adult females were asked to recall the time of menarche, if they had experienced it. Some respondents recalled the exact date, some recalled only the month or the year of the event, and some could not recall anything. We consider estimating the menarcheal age distribution from these interval-censored data. A complicated interplay between age at event and calendar time, together with the evident fading of memory over time, makes the censoring informative. We propose a model in which the probabilities of the various types of recall depend on the time since menarche. For parametric estimation, we model these probabilities with a multinomial regression function. Establishing consistency and asymptotic normality of the parametric MLE requires some adaptation of the standard asymptotic theory, because the data format varies from case to case. We also provide a non-parametric MLE, propose a computationally simpler approximation, and establish the consistency of both estimators under mild conditions. We study the small-sample performance of the parametric and non-parametric estimators through Monte Carlo simulations. Moreover, we provide a
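To make the recall mechanism concrete, here is a minimal sketch of a multinomial-logit (softmax) model in which the probability of each recall type depends on the time since menarche t. It is an illustration under stated assumptions, not the authors' specification: the four categories follow the abstract, but the choice of "exact" as reference category, the linear-in-t predictor, and the names `recall_probabilities`, `alphas`, `betas` are hypothetical. In the full model t is only partially observed, so these probabilities would enter a likelihood jointly with the menarcheal age distribution; that part is not sketched here.

```python
import math

# Recall types described in the abstract; "exact" is (by assumption) the reference category.
CATEGORIES = ("exact", "month", "year", "none")


def recall_probabilities(t, alphas, betas):
    """Multinomial-logit probabilities of each recall type at time t since menarche.

    alphas, betas: hypothetical intercepts and slopes for the three
    non-reference categories ("month", "year", "none").
    """
    # The linear predictor is fixed at 0 for the reference category.
    etas = [0.0] + [a + b * t for a, b in zip(alphas, betas)]
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return dict(zip(CATEGORIES, (e / total for e in exps)))
```

For example, with `alphas=(-1.0, -2.0, -3.0)` and positive slopes `betas=(0.3, 0.5, 0.6)`, exact recall is the most likely type at t = 0, while the coarser types ("year", "none") dominate at t = 10. Positive slopes are one simple way to encode memory fading with time.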

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

arXiv, 2026-02-15. Authors: Nitay Calderon, Eyal Ben-David, Zorik Gekhman

Standard factuality evaluations of LLMs treat all errors alike, obscuring whether failures arise from missing knowledge (empty shelves) or from limited access to encoded facts (lost keys). We propose a behavioral framework that profiles factual knowledge at the level of facts rather than questions, characterizing each fact by whether it is encoded, and then by how accessible it is: cannot be recalled, can be directly recalled, or can only be recalled with inference-time computation (thinking). To support such profiling, we introduce WikiProfile, a new benchmark constructed via an automated pipeline with a prompted LLM grounded in web search. Across 4 million responses from 13 LLMs, we find that encoding is nearly saturated in frontier models on our benchmark, with GPT-5 and Gemini-3 encoding 95–98% of facts. However, recall remains a major bottleneck: many errors previously attributed to missing knowledge instead stem from failures to access it. These failures are systematic and disproportionately affect long-tail facts and reverse questions. Finally, we show that thinking improves recall and can recover a substantial fraction of failures, indicating that future gains may rely les
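The fact-level taxonomy in this abstract can be written down as a small decision procedure. The sketch below only encodes the category structure the abstract names (not encoded; encoded but not recalled; directly recalled; recalled only with thinking). How the paper decides that a fact is encoded, and how recall attempts are scored, is not described here, so the three boolean inputs and the names `FactProfile` and `profile_fact` are hypothetical.

```python
from enum import Enum


class FactProfile(Enum):
    NOT_ENCODED = "not encoded"  # "empty shelf": the fact is missing
    NOT_RECALLED = "encoded, cannot be recalled"  # encoded, but not accessed even with thinking
    DIRECT_RECALL = "encoded, directly recalled"
    RECALL_WITH_THINKING = "encoded, recalled only with thinking"


def profile_fact(encoded: bool, recalled_directly: bool, recalled_with_thinking: bool) -> FactProfile:
    """Map hypothetical per-fact signals to one profile category.

    Encoding is checked first; accessibility is only meaningful for encoded facts.
    """
    if not encoded:
        return FactProfile.NOT_ENCODED
    if recalled_directly:
        return FactProfile.DIRECT_RECALL
    if recalled_with_thinking:
        return FactProfile.RECALL_WITH_THINKING
    return FactProfile.NOT_RECALLED
```

Profiling at the level of facts rather than individual questions means these signals would be aggregated over several questions about the same fact before calling a function like this; that aggregation step is left out of the sketch.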

Search results: recalled

Estimating menarcheal age distribution from partially recalled data

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Construction of a Neural Network with Temperature-Dependent Recall Patterns

Does Self-Consistency Improve the Recall of Encyclopedic Knowledge?

Semantic Recall for Vector Search

Generative Recall, Dense Reranking: Learning Multi-View Semantic IDs for Efficient Text-to-Video Retrieval

Efficient representations for team and imperfect-recall equilibrium computation

RECALL-MM: A Multimodal Dataset of Consumer Product Recalls for Risk Analysis using Computational Methods and Large Language Models

Language-agnostic, automated assessment of listeners' speech recall using large language models

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

The Impact of Navigation Aids on Search Performance and Object Recall in Wide-Area Augmented Reality

Simplifying imperfect recall games

Decomposing Prediction Mechanisms for In-Context Recall

A Two-Phase Recall-and-Select Framework for Fast Model Selection

The Value of Recall in Extensive-Form Games

Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media

Microstructures and Accuracy of Graph Recall by Large Language Models

Show, Recall, and Tell: Image Captioning with Recall Mechanism

LLM In-Context Recall is Prompt Dependent

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building