Search - ResearchTracker

OCR systems, ranging from classical engines to specialised OCR vision-language models (OCR-VLMs) and frontier multimodal LLMs, report strong results on English and Chinese document benchmarks, yet their behaviour on Indic scripts is largely uncharacterised. We benchmark ten systems on Devanagari (Hindi): classical EasyOCR; open VLMs (Qwen2.5-VL-3B, Qwen3-VL-8B, olmOCR-7B); specialised OCR-VLMs (DeepSeek-OCR, Unlimited-OCR); and frontier closed models (Gemini 2.5 Flash, Claude Opus 4.7, GPT-5.5, Mistral OCR), across four synthetic degradation conditions and 300 real printed scans. We report four findings. First, on clean rendered text all ten cluster within chrF++ 91 to 98, so synthetic text does not separate them. Second, under degradation the specialised OCR-VLMs are the most fragile: DeepSeek-OCR suffers rare but catastrophic repetition failures (outputs up to 71 times the reference length) that wreck its corpus mean even though its median is the best of any system, which is why we report median and catastrophic-rate instead of the mean. Third, on real scans nine of the ten systems collapse (EasyOCR falls from chrF++ 93.6 to 58.3) and the field spreads across a 76-point range, so synth
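The abstract's argument for reporting the median and a catastrophic-failure rate instead of the mean can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the paper's definitions: the per-document score is treated as an unbounded error such as character error rate, a failure is flagged when the output is more than `ratio_threshold` times the reference length, and the numbers are made up.

```python
from statistics import mean, median


def length_ratio(hypothesis: str, reference: str) -> float:
    """Output length divided by reference length (hypothetical helper)."""
    return len(hypothesis) / max(len(reference), 1)


def summarize(errors, ratios, ratio_threshold=3.0):
    """Summarise per-document errors with mean, median, and failure rate.

    ratio_threshold is an illustrative cut-off, not a value from the paper.
    """
    catastrophic = [r > ratio_threshold for r in ratios]
    return {
        "mean": mean(errors),
        "median": median(errors),
        "catastrophic_rate": sum(catastrophic) / len(ratios),
    }


# Made-up data: nine good documents and one runaway repetition failure.
errors = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02, 0.03, 0.02, 0.01, 70.0]
ratios = [1.0] * 9 + [71.0]

summary = summarize(errors, ratios)
print(summary)
```

In this toy data a single runaway output dominates the mean, while the median stays at the typical per-document error and the failure rate records how often the blow-up occurs.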

ECHO: Explainable Co-editing with Human-in-the-loop Operations for Presentation Refinement

arXiv, 2026-05-11. Authors: Yu Fu, Yongqi Kang, Yujia Zhou

Authoring and refining presentation slides is a highly time-consuming core task in academic and business domains. While generative AI tools have lowered the barrier for creating initial drafts, their "black-box, one-way generation" paradigm severely deprives users of fine-grained control. Through a formative study (N=10), we identified "trial-and-error anxiety" and "inconsistent cross-page formatting" as primary bottlenecks in human-AI co-creation. Consequently, we present ECHO, an interactive system based on multimodal intent grounding and explainable operation plans. ECHO enables precise local edits via a "natural language + visual selection" paradigm, utilizing a decoupled "Plan-Confirm-Execute" loop and dynamic memory mechanisms to transform implicit AI intents into highly controllable layout co-creation. To systematically evaluate document refinement, we propose the CoEdit-Eval framework. Objective evaluations across multiple foundation models (e.g., GPT-5, GLM-4.7) demonstrate that while baselines uniformly fail in intent mapping (0% accuracy) and spatial grounding (0% Hit@1), the ECHO architecture boosts Target Hit@1 to 55% to 85% depending on the base model. Furthermore, inte

Search results: GPT-55

Can OCR-VLMs Read Devanagari? A Stress-Test Benchmark and Post-Correction Study

ECHO: Explainable Co-editing with Human-in-the-loop Operations for Presentation Refinement

From Outliers to Errors: Auditing Pali-to-English LLM Translations with Multi-Reference Adjudication

SkillOpt-Lite: Better and Faster Agent Self-evolution via One Line of Vibe

PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting

Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

HealthAgentBench: A Unified Benchmark Suite of Realistic Agentic Healthcare Environments for Challenging Frontier AI Agents

Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

The Daily Dose: Workflow-Integrated Large Language Model Automation for Clinical Summarization and Trial Identification in Radiation Oncology

Deterministic access to global viral sequence data enables robust agentic scientific discovery

R+R: Reassessing Java Security API Misuse in Current LLMs: A Replication on JCA and JSSE APIs with External Security Knowledge

Multilingual Prompt Localization for Agent-as-a-Judge: Language and Backbone Sensitivity in Requirement-Level Evaluation

On a problem on a generalization of Euler's totient function

When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents

Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

Temporal Leakage in Search-Engine Date-Filtered Web Retrieval: A Retrospective Forecasting Case Study

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Multimodal Safety Evaluation in Generative Agent Social Simulations