Search - ResearchTracker

Most existing extreme compression methods fail to achieve an optimal rate-distortion-perception trade-off, as they typically prioritize perceptual fidelity and visual realism over pixel-level accuracy. Consequently, the resulting reconstructions often deviate noticeably from the originals. Ultra-low bitrate image compression is therefore crucial, not only for producing extremely compact representations but also for ensuring that reconstructed images remain semantically coherent and faithful to the source at the pixel level. To this end, we propose SPRDiff, a diffusion-based compression method that fully leverages both semantic and pixel representations, thereby enhancing reconstruction fidelity under ultra-low bitrate constraints. Specifically, we develop a triple-encoder architecture that utilizes high-fidelity features from pretrained distortion-oriented and semantic-oriented encoders to compensate for the limited representations extracted by the frozen VAE encoder, thereby improving latent compression and entropy modeling. To further enhance the reconstruction fidelity of diffusion models, we introduce a distortion-aware reconstruction module with dual feature extraction. Thi

UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

arXiv, 2026-04-20. Authors: Jingwei Yang, Ruoxi Wu, Wei Shen

Style transfer must match a target style while preserving content semantics. DiT-based diffusion models often suffer from content-style entanglement, leading to reference-content leakage and unstable generation. We present UniCSG, a unified framework for content-constrained, style-driven generation in both text-guided and reference-guided settings. UniCSG employs staged training: (i) a latent-space semantic disentanglement stage that combines low-frequency preprocessing with conditioning corruption to encourage content-style separation, and (ii) a latent-space frequency-aware detail reconstruction stage that refines details via multi-scale frequency supervision. We further incorporate pixel-space reward learning to align latent objectives with perceptual quality after decoding. Experiments demonstrate improved content faithfulness, style alignment, and robustness in both settings.

Search results: Pixel-faithful

Exploiting Semantic and Pixel Representations for Ultra-Low Bitrate Image Compression

UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing

Mid-Infrared Single-Photon Compressive Spectroscopy

Segment Anything with Robust Uncertainty-Accuracy Correlation

Pixal3D: Pixel-Aligned 3D Generation from Images

When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers

Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement

Toward Faithful Segmentation Attribution via Benchmarking and Dual-Evidence Fusion

From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution

FG-Portrait: 3D Flow Guided Editable Portrait Animation

An Implementation of the Crack Topology Score with Extensions

Enhancing Concept Localization in CLIP-based Concept Bottleneck Models

Faithful Counterfactual Visual Explanations (FCVE)

TraNCE: Transformative Non-linear Concept Explainer for CNNs

RECODE: Reasoning Through Code Generation for Visual Question Answering

cryoSENSE: Compressive Sensing Enables High-throughput Microscopy with Sparse and Generative Priors on the Protein Cryo-EM Image Manifold

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction