Search - ResearchTracker

The use of large language models (LLMs) for complex mathematical reasoning is an emerging area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight models, but that their performance decreases on questions with figures and on open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.

LLM Parameters for Math Across Languages: Shared or Separate?

arXiv, 2026-06-16. Authors: Behzad Shomali, Luisa Victor, Tim Selbach

Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that manifests differently by language. We present a cross-lingual mechanistic analysis of mathematical reasoning in LLMs, enabling us to localize and compare model parameters that support mathematical reasoning across languages. We find that the extracted math-associated parameters exhibit partial cross-lingual overlap, with the strongest overlap concentrated in intermediate model layers. We further observe that English consistently produces the largest set of math-relevant parameters, whereas lower-resource languages reveal smaller sets of relevant parameters. These results suggest that math-related behavior in multilingual LLMs is neither fully language-invariant nor fully language-specific, but instead exhibits partial cross-lingual parameter overlap with systematic language-dependent differences.

Search results: math

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

LLM Parameters for Math Across Languages: Shared or Separate?

STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

Real and Complex Analysis: Solutions to Problems in Amer. Math. Monthly, Math. Magazine, College Math. J., Elemente der Math., Crux Math., EMS Newsletter, Math. Gazette

Is math useful?

Measuring Mathematical Problem Solving With the MATH Dataset

Low progress math in a high performing system

DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

Behavioural predictors of math anxiety

Math and Dance: Notes from emerging interaction

SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation

Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors

Corrigendum to "On the geometry of metric measure spaces. I." Acta Math. 196 (2006)

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Vafa-Witten Theory: Invariants, Floer Homologies, Higgs Bundles, a Geometric Langlands Correspondence, and Categorification (String Math 2022 Proceedings)

Corrigendum to "A New Uniqueness Theorem for the Tight C*-algebra of an Inverse Semigroup" [C. R. Math. Acad. Sci. Soc. R. Can. 44 (2022), no. 4, 88--112]

Solving Math Word Problems with Reexamination

Adolf Hurwitz and the Fundamental Theorem of Galois Theorie: The Königsberg Lectures of 1890-1891