Current methods to identify and classify racist language in text rely on small-n qualitative approaches or large-n approaches focusing exclusively on overt forms of racist discourse. This article provides a step-by-step generalizable guideline to identify and classify different forms of racist discourse in large corpora. In our approach, we start by conceptualizing racism and its different manifestations. We then contextualize these racist manifestations to the time and place of interest, which allows researchers to identify their discursive form. Finally, we apply XLM-RoBERTa (XLM-R), a cross-lingual model for supervised text classification with a cutting-edge contextual understanding of text. We show that XLM-R and XLM-R-Racismo, our pretrained model, outperform other state-of-the-art approaches in classifying racism in large corpora. We illustrate our approach using a corpus of tweets relating to the Ecuadorian indígena community between 2018 and 2021.
Racism is an alarming phenomenon in our country as well as all over the world. Every day we come across racist comments in both our daily and virtual lives. However, we can eradicate this racism from virtual life (such as social media). In this paper, we try to detect such racist comments with NLP and deep learning techniques. We built a novel dataset in the Bengali language, annotated it, and validated the labels. Using deep learning methods, we detect racist text with an accuracy of 87.94\% using an ensemble approach. We applied RNN and LSTM models using BERT embeddings; however, the MCNN-LSTM model performed best among all those models. Finally, an ensemble approach was used to combine all the model results and increase overall performance.
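The abstract above does not say how the ensemble combines the individual models, so the following is only a minimal sketch that assumes simple majority voting over binary labels (1 = racist, 0 = not racist). The function name `majority_vote` and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label list by majority vote.

    `predictions` is a list with one entry per model; each entry is a list
    of labels, one per comment. All lists must have the same length.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of comments")

    combined = []
    for labels in zip(*predictions):  # labels given to one comment by every model
        winner, _count = Counter(labels).most_common(1)[0]
        combined.append(winner)
    return combined


# Toy predictions from three hypothetical models on four comments.
rnn_preds = [1, 0, 1, 0]
lstm_preds = [1, 1, 0, 0]
mcnn_lstm_preds = [0, 1, 1, 0]

ensemble_preds = majority_vote([rnn_preds, lstm_preds, mcnn_lstm_preds])
```

Using an odd number of models with binary labels avoids ties; a weighted vote or averaging of predicted probabilities would be an equally plausible reading of "ensemble".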
Large language models (LLMs) have garnered significant attention for their remarkable performance in a continuously expanding set of natural language processing tasks. However, these models have been shown to harbor inherent societal biases, or stereotypes, which can adversely affect their performance in their many downstream applications. In this paper, we introduce a novel, purely prompt-based approach to uncover hidden stereotypes within any arbitrary LLM. Our approach dynamically generates a knowledge representation of internal stereotypes, enabling the identification of biases encoded within the LLM's internal knowledge. By illuminating the biases present in LLMs and offering a systematic methodology for their analysis, our work contributes to advancing transparency and promoting fairness in natural language processing systems.
Research shows that many like-minded people use popular microblogging websites to post hateful speech against various religions and races. Automatic identification of racist and hate-promoting posts is required for building solutions based on social media intelligence and security informatics. However, keyword-spotting techniques alone cannot accurately identify the intent of a post. In this paper, we address the challenge of ambiguity in such posts by identifying the intent of the author. We conduct our study on the Tumblr microblogging website and develop a cascaded ensemble learning classifier for identifying posts with racist or radicalized intent. We train our model by identifying various semantic, sentiment, and linguistic features from free-form text. Our experimental results show that the proposed approach is effective and that the emotional tone, social tendencies, language cues, and personality traits of a narrative are discriminatory features for identifying the racist intent behind a post.
Transcending the binary categorization of racist and xenophobic texts, this research takes cues from social science theories to develop a four-dimensional category for racism and xenophobia detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of deep learning techniques, this categorical detection enables insights into the nuances of emergent topics reflected in racist and xenophobic expression on Twitter. Moreover, a stage-wise analysis is applied to capture the dynamic changes of the topics across the stages of early development of Covid-19 from a domestic epidemic to an international public health emergency, and later to a global pandemic. The main contributions of this research include, first, a methodological advancement: by bridging state-of-the-art computational methods with a social science perspective, this research provides a meaningful approach for future research to gain insight into the underlying subtlety of racist and xenophobic discussion on digital platforms. Second, by enabling a more accurate comprehension and even prediction of public opinions and actions, this research paves the way for the enactment of effective intervention p
Situated in the global outbreak of COVID-19, our study enriches the discussion concerning emergent racism and xenophobia on social media. With big data extracted from Twitter, we focus on the analysis of negative sentiment reflected in tweets marked with racist hashtags, as racism and xenophobia are more likely to be delivered via negative sentiment. In particular, we propose a stage-based approach to capture how the negative sentiment changes across the three developmental stages of COVID-19, during which it transformed from a domestic epidemic into an international public health emergency and later into a global pandemic. At each stage, sentiment analysis enables us to recognize the negative sentiment in tweets with racist hashtags, and keyword extraction allows for the discovery of themes in the expression of negative sentiment by these tweets. In this public health crisis, the stage-based approach enables us to provide policy suggestions for the enactment of stage-specific intervention strategies to combat racism and xenophobia on social media more effectively.
The (((echo))) symbol -- triple parentheses surrounding a name -- made it to mainstream social networks in early 2016, with the intensification of the U.S. Presidential race. It was used by members of the alt-right, white supremacists and internet trolls to tag people of Jewish heritage -- a modern incarnation of the infamous yellow badge (Judenstern) used in Nazi Germany. Tracking this trending meme, its meaning, and its function has proved elusive because of its semantic ambiguity (e.g., a symbol for a virtual hug). In this paper we report on the construction of an appropriate dataset allowing the reconstruction of networks of racist communities and the way they are embedded in the broader community. We combine natural language processing and structural network analysis to study communities promoting hate. In order to overcome dog-whistling and linguistic ambiguity, we propose a multi-modal neural architecture based on a BERT transformer and a BiLSTM network on the tweet level, while also taking into account the user's ego-network and meta features. Our multi-modal neural architecture outperforms a set of strong baselines. We further show how the use of language and network structure i
Racism remains a persistent societal issue, increasingly amplified by the structure and dynamics of online social networks. In this work, we propose a three-state compartmental model to study the spreading and suppression of racist content, drawing from epidemic-like dynamics and interaction-driven transitions. We analyze the model on fully-connected (homogeneous mixing) networks using a set of coupled differential equations, and on Barabási-Albert (BA) scale-free and Watts-Strogatz (WS) small-world networks through agent-based simulations. The system exhibits three distinct stationary regimes: two racism-free absorbing states and one active phase with persistent racist content. We identify and characterize the phase transitions between these regimes, discuss the role of network topology, and highlight the emergence of absorbing states. Our findings illustrate how statistical physics tools can help uncover the macroscopic consequences of microscopic social interactions in digital environments.
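The abstract above does not give the model's equations, so the following is only a toy mean-field sketch of a generic three-state compartmental model, integrated with a forward Euler step. The state names (neutral, racist, counter), the transition rules, and the rates `beta`, `gamma`, and `delta` are all assumptions made for illustration; they are not the authors' model and should not be expected to reproduce its phase diagram.

```python
def simulate(beta, gamma, delta, n0=0.98, r0=0.01, c0=0.01, dt=0.01, steps=20000):
    """Integrate a hypothetical three-state mean-field model with forward Euler.

    n: fraction of neutral users
    r: fraction of users spreading racist content
    c: fraction of users actively countering it

    Assumed transitions (illustrative only):
      neutral + racist  -> 2 racist   at rate beta   (contagion)
      racist  + counter -> 2 counter  at rate gamma  (interaction-driven suppression)
      racist            -> neutral    at rate delta  (spontaneous recovery)
    """
    n, r, c = n0, r0, c0
    for _ in range(steps):
        dn = -beta * n * r + delta * r
        dr = beta * n * r - gamma * r * c - delta * r
        dc = gamma * r * c
        # The three derivatives sum to zero, so n + r + c stays constant.
        n, r, c = n + dt * dn, r + dt * dr, c + dt * dc
    return n, r, c


# With no contagion (beta = 0) the racist fraction can only decay.
n_end, r_end, c_end = simulate(beta=0.0, gamma=0.5, delta=0.1)
```

On a fully connected network this kind of rate-equation description replaces agent-level simulation; on Barabási-Albert or Watts-Strogatz graphs one would instead update individual nodes according to their neighbors.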
Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically underserved. We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, a linguistically distinct low-resource language with approximately 7.5 million speakers across Albania, Kosovo, North Macedonia, and the diaspora. The dataset contains 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastructure for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs. The dataset will be provided upon request to support safety evaluation, fine-tuning, red-teaming, and guardrail development for Albanian-speaking communities.
Kazakh is underrepresented in resources for evaluating the safety behavior of large language models. We present KZ-SafetyPrompts, a Kazakh prompt dataset for safety evaluation across eleven categories covering common risk areas such as self-harm, violence, child exploitation, sexual content, racist content, radicalization, and regulated goods or illegal activities. The dataset contains 5,717 prompts written natively in Kazakh (Cyrillic), organized by category, with English translations for cross-lingual analysis. Prompts resemble realistic user queries, often in a teen or child style, and are phrased as intent prompts without procedural instructions. We document the writing protocol, labeling procedures (including borderline-case decision rules), and quality-control steps (schema standardization, completeness checks, and deduplication). We also align the categories with widely used safety taxonomies to support integration with existing evaluation pipelines. Baseline results with GPT-4o show an overall refusal rate of 28.2%, varying from 5.5% to 53.8% across categories, indicating that Kazakh prompts expose category-specific safety gaps not captured by English-only evaluation.
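The overall and per-category refusal rates reported above are simple ratios of refused prompts to total prompts. A minimal sketch of that bookkeeping follows, assuming each evaluated prompt has already been reduced to a `(category, refused)` pair; how a response is judged to be a refusal is not specified in the abstract, and the records below are made-up toy data, not results from KZ-SafetyPrompts.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return (overall_rate, per_category_rates) from (category, refused) pairs."""
    if not records:
        raise ValueError("no records to evaluate")

    totals = defaultdict(int)   # prompts seen per category
    refused = defaultdict(int)  # prompts refused per category
    for category, was_refused in records:
        totals[category] += 1
        refused[category] += int(bool(was_refused))

    per_category = {cat: refused[cat] / totals[cat] for cat in totals}
    overall = sum(refused.values()) / sum(totals.values())
    return overall, per_category


# Hypothetical toy records: two categories, two prompts each.
toy_records = [
    ("self-harm", True),
    ("self-harm", False),
    ("violence", False),
    ("violence", False),
]
overall_rate, category_rates = refusal_rates(toy_records)
```

Because categories can differ in size, the overall rate is a prompt-weighted figure and need not equal the plain mean of the per-category rates.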
Online harassment, incitement to violence, racist behavior, and other harmful content on social media can damage social harmony and even break the law. Traditional blocklisting technologies can block malicious users, but this comes at the expense of identity privacy. Anonymous blocklisting has emerged as an effective mechanism to restrict the abuse of freedom of speech while protecting user identity privacy. However, state-of-the-art anonymous blocklisting schemes suffer from either poor dynamism or low efficiency. In this paper, we propose $\mathsf{ShadowBlock}$, an efficient dynamic anonymous blocklisting scheme. Specifically, we utilize a pseudorandom function and a cryptographic accumulator to construct the public blocklist, enabling users to prove anonymously that they are not on the blocklist. To improve verification efficiency, we design an aggregation zero-knowledge proof mechanism that converts multiple verification operations into a single one. In addition, we leverage the accumulator's property to achieve efficient updates of the blocklist, i.e., the original proof can be reused with minimal updates rather than regenerating the entire proof. Experi
This paper presents a characterization of AI-generated images shared on 4chan, examining how this anonymous online community is (mis-)using generative image technologies. Through a methodical data collection process, we gathered 900 images from 4chan's /pol/ (Politically Incorrect) board, which included the label "/mwg/" (memetic warfare general), between April and July 2024, identifying 66 unique AI-generated images. The analysis reveals concerning patterns in the use of this technology, with 69.7% of images including recognizable figures, 28.8% of images containing racist elements, 28.8% featuring anti-Semitic content, and 9.1% incorporating Nazi-related imagery. Overall, we document how users are weaponizing generative AI to create extremist content, political commentary, and memes that often bypass conventional content moderation systems. This research highlights significant implications for platform governance, AI safety mechanisms, and broader societal impacts as generative AI technologies become increasingly accessible. The findings underscore the urgent need for enhanced safeguards in generative AI systems and more effective regulatory frameworks to mitigate potential harms
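As a quick consistency check on the figures above, the reported percentages can be converted back into image counts out of the 66 AI-generated images. This is a sketch of that arithmetic only; the implied counts are derived here by rounding and are not stated in the abstract.

```python
TOTAL_AI_IMAGES = 66  # unique AI-generated images reported in the abstract

# Percentages as reported in the abstract.
reported_percentages = {
    "recognizable figures": 69.7,
    "racist elements": 28.8,
    "anti-Semitic content": 28.8,
    "Nazi-related imagery": 9.1,
}


def implied_count(percentage, total):
    """Nearest whole number of images corresponding to a percentage of `total`."""
    return round(percentage / 100 * total)


implied_counts = {
    label: implied_count(pct, TOTAL_AI_IMAGES)
    for label, pct in reported_percentages.items()
}

# Round-trip: recompute each percentage from its implied count.
recomputed = {
    label: round(count / TOTAL_AI_IMAGES * 100, 1)
    for label, count in implied_counts.items()
}
```

The percentages sum to more than 100%, so the categories overlap: a single image can, for example, both depict a recognizable figure and contain racist elements.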
Data filtering strategies are a crucial component in developing safe Large Language Models (LLMs), since they support the removal of harmful content from pretraining datasets. However, there is a lack of research on the actual impact of these strategies on groups vulnerable to discrimination, and their effectiveness has not yet been systematically addressed. In this paper we present a benchmark study of data filtering strategies for harm reduction, aimed at providing a systematic evaluation of these approaches. We provide an overview of $55$ technical reports of English LMs and LLMs to identify the existing filtering strategies in the literature and implement an experimental setting to test their impact on vulnerable groups. Our results show that the positive impact that these strategies have in reducing harmful content in documents has the side effect of increasing the underrepresentation of groups vulnerable to discrimination in datasets. WARNING: the paper could contain racist, sexist, violent, and generally offensive content.
This article proposes a synthetic theory of socio-epistemic structuration to understand the reproduction of inequality in contemporary societies. I argue that social reality is not only determined by material structures and social networks but is fundamentally shaped by the epistemic frameworks -- ideologies, narratives, and attributions of agency -- that mediate actors' engagement with their environment. The theory integrates findings from critical race theory, network sociology, social capital studies, historical sociology, and analyses of emerging AI agency. I analyze how structures (from the ``racial contract'' to Facebook networks) and epistemic frameworks (from racist ideology to personal culture) mutually reinforce one another, creating resilient yet unequal life trajectories. Using data from large-scale experiments like Moving to Opportunity and from social network analyses, I demonstrate that exposure to diverse environments and social capital is a necessary but insufficient condition for social mobility; epistemic friction, manifested as `friending bias' and persistent cultural frameworks, systematically limits the benefits of such exposure. I conclude that a public and me
Improvements in model construction, including fortified safety guardrails, allow large language models (LLMs) to increasingly pass standard safety checks. However, LLMs sometimes slip into revealing harmful behavior, such as expressing racist viewpoints, during conversations. To analyze this systematically, we introduce CoBia, a suite of lightweight adversarial attacks that allow us to refine the scope of conditions under which LLMs depart from normative or ethical behavior in conversations. CoBia creates a constructed conversation where the model utters a biased claim about a social group. We then evaluate whether the model can recover from the fabricated bias claim and reject biased follow-up questions. We evaluate 11 open-source as well as proprietary LLMs for their outputs related to six socio-demographic categories that are relevant to individual safety and fair treatment, i.e., gender, race, religion, nationality, sexual orientation, and others. Our evaluation is based on established LLM-based bias metrics, and we compare the results against human judgments to scope out the LLMs' reliability and alignment. The results suggest that purposefully constructed conversations reliably
Economics Job Market Rumors (EJMR) is an online forum and clearinghouse for information on the academic job market for economists. It also includes content that is abusive, defamatory, racist, misogynistic, or otherwise "toxic." Almost all of this content is created anonymously by contributors who receive a four-character username when posting on EJMR. Using only publicly available data, we show that the statistical properties of the scheme by which these usernames were generated allow the IP addresses from which most posts were made to be determined with high probability. We recover 47,630 distinct IP addresses of EJMR posters and attribute them to 66.1% of the roughly 7 million posts made over the past 12 years. We geolocate posts and describe aggregated cross-sectional variation -- particularly regarding toxic, misogynistic, and hate speech -- across sub-forums, geographies, institutions, and IP addresses. Our analysis suggests that content on EJMR comes from all echelons of the economics profession, including, but not limited to, its elite institutions.
Against a backdrop of widespread interest in how publics can participate in the design of AI, I argue for a research agenda focused on AI incidents - examples of AI going wrong and sparking controversy - and how they are constructed in online environments. I take up the example of an AI incident from September 2020, when a Twitter user created a 'horrible experiment' to demonstrate the racist bias of Twitter's algorithm for cropping images. This resulted in Twitter not only abandoning its use of that algorithm, but also disavowing its decision to use any algorithm for the task. I argue that AI incidents like this are a significant means for participating in AI systems that require further research. That research agenda, I argue, should focus on how incidents are constructed through networked online behaviours that I refer to as 'networked trouble', where formats for participation enable individuals and algorithms to interact in ways that others - including technology companies - come to know and come to care about. At stake, I argue, is an important mechanism for participating in the design and deployment of AI.
Analyzing large sets of visual media remains a challenging task, particularly in mixed-method studies dealing with problematic information and human subjects. Using AI tools in such analyses risks reifying and exacerbating biases and runs into untenable computational and cost limitations. We therefore adopt geometric computer graphics and vision methods to analyze a large set of images from a problematic information campaign, in conjunction with human-in-the-loop qualitative analysis. We illustrate an effective case of this approach by applying color quantization to the analysis of online hate images at the US-Mexico border, along with a historicist trace of color quantization and skin tone scales, to inform our usage and reclamation of these methodologies from their racist origins. To that end, we scaffold motivations and the need for more researchers to consider the advantages and risks of reclaiming such methodologies in their own work, situated in our case study.
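Color quantization reduces the set of distinct colors in an image to a small palette so that images can be compared by their dominant colors. The abstract above does not say which quantization algorithm is used, so the sketch below assumes the simplest variant, uniform quantization of each 8-bit RGB channel into a fixed number of bins; the function names and the toy pixel values are hypothetical.

```python
from collections import Counter


def quantize_channel(value, levels):
    """Map an 8-bit channel value (0-255) to the center of one of `levels` equal bins."""
    if not 0 <= value <= 255:
        raise ValueError("channel value must be in 0..255")
    if levels < 2:
        raise ValueError("need at least two levels")
    step = 256 / levels
    bin_index = min(int(value // step), levels - 1)
    return int(bin_index * step + step / 2)


def quantize_pixel(pixel, levels):
    """Quantize an (r, g, b) tuple channel by channel."""
    return tuple(quantize_channel(channel, levels) for channel in pixel)


def palette_histogram(pixels, levels):
    """Count how many pixels fall on each quantized color."""
    return Counter(quantize_pixel(pixel, levels) for pixel in pixels)


# Toy "image": two dark pixels and one bright pixel, reduced to 4 levels per channel.
toy_pixels = [(10, 10, 10), (20, 20, 20), (250, 250, 250)]
histogram = palette_histogram(toy_pixels, levels=4)
example = quantize_pixel((0, 100, 255), levels=4)
```

Adaptive methods such as median cut or k-means clustering pick the palette from the image itself and usually preserve more detail; uniform bins are used here only because they are easy to verify by hand.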
Many online hate groups exist to disparage others based on race, gender identity, sex, or other characteristics. The accessibility of these communities allows users to join multiple types of hate groups (e.g., a racist community and a misogynistic community), raising the question of whether users who join additional types of hate communities could be further radicalized compared to users who stay in one type of hate group. However, little is known about the dynamics of joining multiple types of hate groups, nor the effect of these groups on peripatetic users. We develop a new method to classify hate subreddits and the identities they disparage, then apply it to understand better how users come to join different types of hate subreddits. The hate classification technique utilizes human-validated deep learning models to extract the protected identities attacked, if any, across 168 subreddits. We find distinct clusters of subreddits targeting various identities, such as racist subreddits, xenophobic subreddits, and transphobic subreddits. We show that when users become active in their first hate subreddit, they have a high likelihood of becoming active in additional hate subreddits of
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
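To make the aggregation problem above concrete, the sketch below applies two standard social choice constructs, pairwise majority and the Borda count, to three hypothetical annotators who each rank three candidate model outputs. The abstract does not propose either rule specifically; they are used here only to illustrate how diverging input can fail to yield a consistent collective preference.

```python
def prefers_by_majority(rankings, a, b):
    """True if a strict majority of rankings place option `a` above option `b`."""
    a_above_b = sum(r.index(a) < r.index(b) for r in rankings)
    return a_above_b > len(rankings) - a_above_b


def borda_scores(rankings):
    """Borda count: with m options, a ranking gives m-1 points to its top choice, down to 0."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] = scores.get(option, 0) + (m - 1 - position)
    return scores


# Hypothetical rankings of outputs A, B, C by three annotators (best first).
annotator_rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

a_over_b = prefers_by_majority(annotator_rankings, "A", "B")
b_over_c = prefers_by_majority(annotator_rankings, "B", "C")
c_over_a = prefers_by_majority(annotator_rankings, "C", "A")
scores = borda_scores(annotator_rankings)
```

With these rankings, pairwise majority prefers A to B, B to C, and C to A, a cycle known as the Condorcet paradox, while the Borda count scores all three outputs equally. Neither rule produces a single collective winner here, which is the kind of inconsistency the aggregation question has to confront.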