Search results: OpenAI

20 results found


OpenAI single-agent LLM architecture reduces computational overhead relative to multi-agent orchestration in a simulated mars rover decision-support benchmark.

PubMed | 2026-01-01 | Author: Sanabria D

Mars rover missions require decision-support systems that can interpret terrain, telemetry, environmental conditions, and mission objectives under delayed communication with Earth. This study evaluates whether multi-agent orchestration improves simulated Mars rover decision support compared with a single-agent baseline. A controlled benchmark of 100 synthetic mission-inspired rover scenarios was evaluated using OpenAI GPT-4o and GPT-5.5, with five repeated runs per scenario and architecture. Model-facing scenario inputs were separated from evaluator-side labels so that expected actions and hazards were reserved for scoring only. Performance was measured using decision accuracy, exact and substring-based semantic hazard F1, hazard error counts, latency, token usage, scenario-level paired statistical comparisons, and GPT-4o specialist-agent ablations. Across the tested OpenAI configurations, the single-agent architecture showed numerical advantages in decision accuracy and hazard-label alignment, but these decision-quality differences were not consistently significant under scenario-level statistical analysis with Holm-Bonferroni adjustment. The only decision-quality metric remaining significant was GPT-5.5 exact hazard F1, although absolute values were very low. The most reliable difference was computational efficiency: the single-agent architecture required substantially lower latency and token usage than the prompt-defined multi-agent orchestration architecture. Multi-agent orchestration generated broader hazard lists, including plausible non-canonical observations, but did not reliably improve aggregate decision accuracy or hazard F1. These findings suggest that, for short-context, tool-less, static decision-support tasks where all relevant context is available in a single input, multi-agent orchestration should be treated as a cost-bearing design choice rather than an assumed improvement. 
The study contributes a reproducible architecture-level benchmark for evaluating when LLM-based orchestration is worth its operational cost in mission-inspired workflows.
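The abstract reports that decision-quality differences were tested with a Holm-Bonferroni adjustment. As a rough illustration of how that step-down correction works, here is a minimal sketch in Python. The function name `holm_bonferroni`, the `alpha` default, and the example p-values are all assumptions made for illustration; none of them come from the study, and this is not the author's analysis code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p_values, reject_flags) in the input order.

    Holm's step-down procedure: sort p-values ascending, multiply the
    k-th smallest (0-based rank k) by (m - k), cap at 1.0, and enforce
    monotonicity with a running maximum.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # An adjusted p-value can never be smaller than one ranked before it.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max

    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    # Hypothetical p-values for four metrics (illustrative only).
    example = [0.01, 0.04, 0.03, 0.005]
    adj, rej = holm_bonferroni(example)
    for raw, a, r in zip(example, adj, rej):
        print(f"raw={raw:.3f} adjusted={a:.3f} reject={r}")
```

With several metrics compared at once, a raw p-value below 0.05 can stop being significant after adjustment, which is consistent with the abstract's observation that most numerical advantages did not remain significant.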

Search results: OpenAI

OpenAI single-agent LLM architecture reduces computational overhead relative to multi-agent orchestration in a simulated mars rover decision-support benchmark.

From API to Action: A Multi-Model Comparison of OpenAI, Anthropic, Google, and Meta LLMs for Clinical Trial Data Extraction.

Can large language models provide high-quality desk review decisions in an orthopaedic surgery journal? A concordance study comparing three AI models to human editorial decisions.

AI-Assisted Pharmaceutical Formulation Design: Comparative Development and Experimental Evaluation of Sustained-Release Lornoxicam Tablets.

Can Motivational Interviewing be delivered using Artificial Intelligence chatbots? Evaluating the capability of GPT-4o.

Using Large Language Models to Generate Retina Patient Education Material: A Comparative Analysis With American Society of Retina Specialists Patient Brochures.

Correct but Incomplete: Limitations of AI-Assisted Decision Support in Rectal Cancer.

Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects.

Application of large language models to the annotation of cell lines and mouse strains in genomics data.

AI-assisted concept mapping enhances orthopaedic nursing students' critical thinking: A quasi-experimental study.

LLMs in Medical Education for Autism Caregivers: A Comparative Evaluation of Accuracy, Readability, Actionability, and Neurodiversity-Affirming Language.

Performance of large language models in electrocardiogram interpretation: A comparative study.

Exploring Attitudes of Primary Caregivers Towards Pediatric Tissue-Based Research using Large Language Models: Insights from Rural and Urban Community Calls and Surveys.

Automated Generation and Human Evaluation of Neurosurgical Board Examination Self-Assessment Questions.

Efficacy of a Large Language Model Data Extraction System in Evidence Reviews for Emerging Infectious Diseases: A Randomized Crossover Trial.

Diagnostic Performance and Workup Efficiency of Large Language Models in Secondary Hypertension: A Blinded Comparative Study.

A No-Code, Guideline-Based Custom GPT Outperforms Cardiologists in Response Quality for Cardiac Amyloidosis.

Benchmark-Based Evaluation of ChatGPT and Gemini in Radiation Oncology: Performance, Limitations, and Challenges for Clinical Interpretation.

Disparities in AI-Based Prior Authorization for Head and Neck Reconstruction: A Large Language Model Analysis.

Double Disadvantage? Wellbeing Among Vietnamese Migrant Parents of Autistic Children.
