Search results: Food chemistry

20 results found

ChemPro: A Progressive Chemistry Benchmark for Large Language Models

arXiv

We introduce ChemPro, a progressive benchmark with 4100 natural language question-answer pairs in Chemistry, across 4 coherent sections of difficulty designed to assess the proficiency of Large Language Models (LLMs) in a broad spectrum of general chemistry topics. We include Multiple Choice Questions and Numerical Questions spread across fine-grained information recall, long-horizon reasoning, multi-concept questions, problem-solving with nuanced articulation, and straightforward questions in a balanced ratio, effectively covering Bio-Chemistry, Inorganic-Chemistry, Organic-Chemistry and Physical-Chemistry. ChemPro is carefully designed analogous to a student's academic evaluation for basic to high-school chemistry. A gradual increase in the question difficulty rigorously tests the ability of LLMs to progress from solving basic problems to solving more sophisticated challenges. We evaluate 45+7 state-of-the-art LLMs, spanning both open-source and proprietary variants, and our analysis reveals that while LLMs perform well on basic chemistry questions, their accuracy declines with different types and levels of complexity. These findings highlight the critical limitations of LLMs in

Food4All: An Agentic Framework and Benchmark for Food Resource Navigation with Adaptive User Understanding

arXiv, 2025-10-21. Authors: Yiyang Li, Weixiang Sun, Tianyi Ma

Food assistance referral requires conversational agents to translate underspecified, often noisy help-seeking dialogues into locally valid resource recommendations. We present Food4All, an agentic food-resource referral framework and benchmark grounded in 686 structured Indiana food resources. Food4All couples a food-specific search tool with 300 multi-turn evaluation tasks spanning single food needs, composite cases with access or document constraints, and five non-ideal user interaction traits: unreasonable demands, rambling responses, impatience, incomplete answers, and inconsistent information. We evaluate six Large Language Models (LLMs) on requirement grounding, resource retrieval, final referral correctness, and interaction efficiency. Although the strongest model achieves 96.33% referral accuracy, our diagnostics reveal persistent failures in grounding schedule, eligibility, intake, and document constraints, as well as failures to preserve valid retrieved resources in the final recommendation. Trait-level analysis further shows that different non-ideal behaviors stress different parts of the referral pipeline. Food4All provides a controlled testbed for studying tool-calling

ChemPro: A Progressive Chemistry Benchmark for Large Language Models

Food4All: An Agentic Framework and Benchmark for Food Resource Navigation with Adaptive User Understanding

Extending FKG.in: Towards a Food Claim Traceability Network

ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving

Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images

MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance

Long-Tailed Continual Learning For Visual Food Recognition

Ortho-Para Chemistry of H2CO in the Protoplanetary Disk TW Hya

Building FKG.in: a Knowledge Graph for Indian Food

MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams

Food safety trends across Europe: insights from the 392-million-entry CompreHensive European Food Safety (CHEFS) database

Food Delivery Time Prediction in Indian Cities Using Machine Learning Models

Machine learning and natural language processing models to predict the extent of food processing

Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval

Unlocking The Future of Food Security Through Access to Finance for Sustainable Agribusiness Performance

Peripheral Nervous System Responses to Food Stimuli: Analysis Using Data Science Approaches

Development of an updated, comprehensive food composition database for Australian-grown horticultural commodities

Food Redistribution as Optimization

Evaluation of the use of web technology by government of Sri Lanka to ensure food security for its citizens