Search - ResearchTracker

Thanks to the powerful language comprehension capabilities of Large Language Models (LLMs), existing instruction-based image editing methods have introduced Multimodal Large Language Models (MLLMs) to promote information exchange between instructions and images, ensuring the controllability and flexibility of image editing. However, these frameworks often build a multi-instruction dataset to train the model to handle multiple editing tasks, which is not only time-consuming and labor-intensive but also fails to achieve satisfactory results. In this paper, we present TalkPhoto, a versatile training-free image editing framework that facilitates precise image manipulation through conversational interaction. We instruct the open-source LLM with a specially designed prompt template to analyze user needs after receiving instructions and hierarchically invoke existing advanced editing methods, all without additional training. Moreover, we implement a plug-and-play and efficient invocation of image editing methods, allowing complex and unseen editing tasks to be integrated into the current framework, achieving stable and high-quality editing results. Extensive experiments demonstrate that o

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

arXiv, 2025-12-18. Authors: Tianyuan Qu, Lei Ke, Xiaohang Zhan

Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity. Our project page: https://replan-iv-edit.github.io

Search results: Frontiers in genome editing

TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Beyond Local Edits: Embedding-Virtualized Knowledge for Broader Evaluation and Preservation of Model Editing

Multi-Focus Querying of the Human Genome Information on Desktop and in Virtual Reality: an Evaluation

Agptools: a utility suite for editing genome assemblies

3D-Consistent Multi-View Editing by Correspondence Guidance

PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning

DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Prokaryotic genome editing based on the subtype I-B-Svi CRISPR-Cas system

The Mitochondrial Genome of Cathaya argyrophylla Reaches 18.99 Mb: Analysis of Super-Large Mitochondrial Genomes in Pinaceae

Third Time's the Charm? Image and Video Editing with StyleGAN3

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Template-based eukaryotic genome editing directed by SviCas3

Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models

Identification and characterization of unique to human regulatory sequences in embryonic stem cells reveal associations with transposable elements, distal enhancers, non-coding RNA, and DNA methylation-driven mechanisms of genome editing

InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing

Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing