Search - ResearchTracker

With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles and abstracts. To help researchers spot these connections, we present PaperWeaver, an enriched paper alerts system that provides contextualized text descriptions of recommended papers based on user-collected papers. PaperWeaver employs a computational method based on Large Language Models (LLMs) to infer users' research interests from their collected papers, extract context-specific aspects of papers, and compare recommended and collected papers on these aspects. Our user study (N=15) showed that participants using PaperWeaver were able to better understand the relevance of recommended papers and triage them more confidently when compared to a baseline that presented the related work sections from recommended papers.

What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema

arXiv, 2026-05-20. Authors: Mahdi Naser Moghadasi, Faezeh Ghaderi

We read twelve well-known LLM agent benchmark papers and recorded, dimension by dimension, what each paper actually says about how its evaluation was run. The motivation came from a familiar frustration: two papers will report results on the same benchmark with the same model name and disagree, and you cannot tell why -- the scaffold, the sampling settings, the subset, or the evaluator version. In many cases the published artifact does not let you answer. This paper is an implementation report on the attempt. We designed a small audit schema (five fields: benchmark identity, harness specification, inference settings, cost reporting, failure breakdown), wrote a scoring codebook with the boundary cases we hit during pilot scoring, applied it to twelve canonical papers (eight agent, four classical static), and recorded what we saw. We score the disclosure of an agent run, not its correctness, and make no claim that disclosure implies a trustworthy result. The mean audit score across the eight agent-benchmark papers is 0.38 (out of 1.0), and across the four classical static benchmarks 0.66; the largest gap is on cost (none of the eight agent benchmark papers disclose inference cost in

Search results: Papers

PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema

What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers

New paper-by-paper classification for Scopus based on references reclassified by the origin of the papers citing them

HCI Papers Cite HCI Papers, Increasingly So

Report on some papers related to the function $\mathop{\mathcal R }(s)$ found by Siegel in Riemann's posthumous papers

What is Visualization for Communication? Analyzing Four Years of VisComm Papers

On three early papers by Herbert Busemann

Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks

Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing

Prediction of highly cited papers

Uncited papers are not unread

Publication and collaboration anomalies in academic papers originating from a paper mill: evidence from a Russia-based paper mill

Uncovering the dynamics of citations of scientific papers

Assessing the Quality of Scientific Papers

Errata on the Calculation of Hot Gas Properties in a Few Li Jiang-Tao's Papers

Discovering seminal works with marker papers

Paper Espresso: From Paper Overload to Research Insight

Paper self-citation: An unexplored phenomenon