Search - ResearchTracker

Current large language models (LLMs) are trained on massive amounts of text data, primarily from a few dominant languages. Studies suggest that this over-reliance on high-resource languages, such as English, hampers LLM performance in mid- and low-resource languages. To mitigate this problem, we propose to (i) optimize the language distribution by training a small proxy model with the domain-reweighting algorithm DoGE, which we extend to XDoGE for a multilingual setup, and (ii) rescale the data and train a full-size model with the established language weights, either from scratch or in a continual pre-training (CPT) phase. We target six languages with a variety of geographic and intra- and inter-language-family relations, namely English and Spanish (high-resource), Portuguese and Catalan (mid-resource), and Galician and Basque (low-resource). We experiment with Salamandra-2b, which is a promising model for these languages. We investigate the effects of substantial data repetition on minor languages and under-sampling on dominant languages using the IberoBench framework for quantitative evaluation. Finally, we release a new promising IberianLLM-7B-Instruct model centering on Iber

DOGE: Differentiable Bezier Graph Optimization for Road Network Extraction

arXiv · 2025-11-25 · Authors: Jiahui Sun, Junran Lu, Jinhui Yin

Automatic extraction of road networks from aerial imagery is a fundamental task, yet prevailing methods rely on polylines that struggle to model curvilinear geometry. We maintain that road geometry is inherently curve-based and introduce the Bézier Graph, a differentiable parametric curve-based representation. The primary obstacle to this representation is to obtain the difficult-to-construct vector ground-truth (GT). We sidestep this bottleneck by reframing the task as a global optimization problem over the Bézier Graph. Our framework, DOGE, operationalizes this paradigm by learning a parametric Bézier Graph directly from segmentation masks, eliminating the need for curve GT. DOGE holistically optimizes the graph by alternating between two complementary modules: DiffAlign continuously optimizes geometry via differentiable rendering, while TopoAdapt uses discrete operators to refine its topology. Our method sets a new state-of-the-art on the large-scale SpaceNet and CityScale benchmarks, presenting a new paradigm for generating high-fidelity vector maps of road networks. We will release our code and related data.
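As a rough illustration of the curve-based representation the abstract describes, the sketch below evaluates a single Bézier segment from its control points. It assumes a cubic segment (the abstract does not state the curve degree), and the names `cubic_bezier` and `sample_curve` are hypothetical. It does not reproduce DOGE's differentiable rendering or topology operators.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier segment at parameter t in [0, 1].

    Uses the Bernstein polynomial form. The cubic degree is an assumption
    made for this illustration only.
    """
    s = 1.0 - t
    # Bernstein basis weights for degree 3; they sum to 1 for any t.
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_curve(ctrl: Sequence[Point], n: int = 5) -> List[Point]:
    """Sample n evenly spaced parameter values along one segment (n >= 2)."""
    p0, p1, p2, p3 = ctrl
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical control points for one curved road segment.
    segment = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for point in sample_curve(segment, n=5):
        print(point)
```

Because every sampled point is a smooth function of the control points, such a parametrization can in principle be fitted to a segmentation mask by gradient-based optimization, which is the property the abstract relies on.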

Search results: DOGE

XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs

DOGE: Differentiable Bezier Graph Optimization for Road Network Extraction

DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs

DoGE: Domain Reweighting with Generalization Estimation

DOGE: An Extrinsic Orientation and Gyroscope Bias Estimation for Visual-Inertial Odometry Initialization

Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets

What is a good doge? Analyzing the patrician social network of the Republic of Venice

DOGE-Train: Discrete Optimization on GPU with End-to-end Training

The Doge of Wall Street: Analysis and Detection of Pump and Dump Cryptocurrency Manipulations

Large Language Models Can Be a Viable Substitute for Expert Political Surveys When a Shock Disrupts Traditional Measurement Approaches

Dynamic Collateral Control for Permissionless Spot Perpetual Basis Trading

Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

The Cyber Immune System: Harnessing Adversarial Forces for Security Resilience

Retrofitting a two-way peg between blockchains

A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

Measuring Memecoin Fragility

AI-Generated Images for representing Individuals: Navigating the Thin Line Between Care and Bias

Pre-training and Diagnosing Knowledge Base Completion Models

Direct measurement of the scattering cross sections of liquid ortho-deuterium for ultracold neutrons and comparison with model calculations