Search - ResearchTracker

Weird generalization is a phenomenon in which models fine-tuned on data from a narrow domain (e.g. insecure code) develop surprising traits that manifest even outside that domain (e.g. broad misalignment), which prior work has highlighted as a critical safety concern. Here, we present an extended replication study of key weird generalization results across an expanded suite of models and datasets. We confirm that surprising (and dangerous) traits can emerge under certain circumstances, but we find that weird generalization is exceptionally brittle: it emerges only for specific models on specific datasets, and it vanishes under simple training-time, prompt-based interventions. We find that the most effective interventions provide prompt context that makes the generalized behavior the expected behavior. However, we show that even very generic interventions that do not anticipate specific generalized traits can still be effective in mitigating weird generalization's effects. Our findings thus help clarify the nature of the safety threat that weird generalization poses and point toward an easily implemented set of solutions.

Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models

arXiv, 2025-08-22. Authors: Ke Zhou, Marios Constantinides, Daniele Quercia

Large language models (LLMs) are often trained on data that reflect WEIRD values: Western, Educated, Industrialized, Rich, and Democratic. This raises concerns about cultural bias and fairness. Using responses to the World Values Survey, we evaluated five widely used LLMs: GPT-3.5, GPT-4, Llama-3, BLOOM, and Qwen. We measured how closely these responses aligned with the values of the WEIRD countries and whether they conflicted with human rights principles. To reflect global diversity, we compared the results with the Universal Declaration of Human Rights and three regional charters from Asia, the Middle East, and Africa. Models with lower alignment to WEIRD values, such as BLOOM and Qwen, produced more culturally varied responses but were 2% to 4% more likely to generate outputs that violated human rights, especially regarding gender and equality. For example, some models agreed with the statements "a man who cannot father children is not a real man" and "a husband should always know where his wife is", reflecting harmful gender norms. These findings suggest that as cultural representation in LLMs increases, so does the risk of reproducing discriminatory beliefs. Approaches suc

Search results: Weird

Weird Generalization is Weirdly Brittle

Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models

On the emergence and properties of weird quasiperiodic attractors

LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents

Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images

WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?

Abundance of weird quasiperiodic attractors in piecewise linear discontinuous maps

Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

One Weird Trick to Untie Landin's Knot

Weird $\mathbb R$-Factorizable Groups

Does the Doer Effect Exist Beyond WEIRD Populations? Toward Analytics in Radio and Phone-Based Learning

Why AI Is WEIRD and Should Not Be This Way: Towards AI For Everyone, With Everyone, By Everyone

How WEIRD is Usable Privacy and Security Research? (Extended Version)

Not Only WEIRD but "Uncanny"? A Systematic Review of Diversity in Human-Robot Interaction Research

WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT?

Primitive weird numbers having more than three distinct prime factors

Visualizing the Weird and the Eerie

AutoWeird: Weird Translational Scoring Function Identified by Random Search

Weird Machines as Insecure Compilation

Searching on the boundary of abundance for odd weird numbers