Affective polarization, characterized by increased negativity and hostility towards opposing groups, has become a prominent feature of political discourse worldwide. Our study examines the presence of this type of polarization in a selection of European parliaments in a fully automated manner. Utilizing a comprehensive corpus of parliamentary speeches from the parliaments of six European countries, we employ natural language processing techniques to estimate parliamentarian sentiment. By comparing the levels of negativity conveyed in references to individuals from opposing groups versus one's own, we discover patterns of affectively polarized interactions. The findings demonstrate the existence of consistent affective polarization across all six European parliaments. Although activity correlates with negativity, there is no observed difference in affective polarization between less active and more active members of parliament. Finally, we show that reciprocity is a contributing mechanism in affective polarization between parliamentarians across all six parliaments.
Keeping track of how lawmakers vote is essential for government transparency. While many parliamentary voting records are available online, they are often difficult to interpret, making it challenging to understand legislative behavior across parliaments and predict voting outcomes. Accurate prediction of votes has several potential benefits, from simplifying parliamentary work by filtering out bills with a low chance of passing to refining proposed legislation to increase its likelihood of approval. In this study, we leverage advanced machine learning and data analysis techniques to develop a comprehensive framework for predicting parliamentary voting outcomes across multiple legislatures. We introduce the Voting Prediction Framework (VPF) - a data-driven framework designed to forecast parliamentary voting outcomes at the individual legislator level and for entire bills. VPF consists of three key components: (1) Data Collection - gathering parliamentary voting records from multiple countries using APIs, web crawlers, and structured databases; (2) Parsing and Feature Integration - processing and enriching the data with meaningful features, such as legislator seniority, and content-
In parliamentary elections, parties compete for a limited, typically fixed number of seats. Most parliaments are assembled using apportionment methods that distribute the seats based on the parties' vote counts. Common apportionment methods include divisor sequence methods (like D'Hondt or Sainte-Laguë), the largest-remainder method, and first-past-the-post. In many countries, an electoral threshold is implemented to prevent very small parties from entering the parliament. Further, several countries have apportionment systems that incorporate multiple districts. We study how computationally hard it is to change the election outcome (i.e., to increase or limit the influence of a distinguished party) by convincing a limited number of voters to change their vote. We refer to these bribery-style attacks as \emph{strategic campaigns} and study the corresponding problems in terms of their computational (both classical and parameterized) complexity. We also run extensive experiments on real-world election data and study the effectiveness of optimal campaigns, in particular as opposed to using heuristic bribing strategies and with respect to the influence of the threshold and the influence
Our goal is to learn about the political interests and preferences of the Members of Parliament by mining their parliamentary activity, in order to develop a recommendation/filtering system that, given a stream of documents to be distributed among them, is able to decide which documents should receive each Member of Parliament. We propose to use positive unlabeled learning to tackle this problem, because we only have information about relevant documents (the own interventions of each Member of Parliament in the debates) but not about irrelevant documents, so that we cannot use standard binary classifiers trained with positive and negative examples. We have also developed a new algorithm of this type, which compares favourably with: a) the baseline approach assuming that all the interventions of other Members of Parliament are irrelevant, b) another well-known positive unlabeled learning method and c) an approach based on information retrieval methods that matches documents and legislators' representations. The experiments have been carried out with data from the regional Andalusian Parliament at Spain.
Ensuring legislative accountability in multi-party systems requires quantitative tools that reveal actual voting behavior beyond formal party affiliations. We present a network-based framework for analyzing parliamentary dynamics at multiple scales, capturing coalition structure, group coherence, and individual influence. Applied to over 4 million vote expressions from the Italian Parliament across three government formations (2018-2021), the methodology combines network modularity, voting distance metrics, and betweenness centrality to map the structure of collective decision-making. Using this framework, we show that system-level polarization, as captured by network modularity, varies systematically with coalition structure rather than coalition size. Technical governments display paradoxically lower global polarization despite broader formal support, reflecting structurally mixed voting patterns rather than unified blocs. On polarizing issues such as immigration, network polarization depends strongly on the fragmentation or cohesion of the opposition, even when the governing coalition votes cohesively. Betweenness analysis reveals that mediator roles are highly concentrated, wit
This paper introduces ParlaCAP, a large-scale dataset for analyzing parliamentary agenda setting across Europe, and proposes a cost-effective method for building domain-specific policy topic classifiers. Applying the Comparative Agendas Project (CAP) schema to the multilingual ParlaMint corpus of over 8 million speeches from 28 parliaments of European countries and autonomous regions, we follow a teacher-student framework in which a high-performing large language model (LLM) annotates in-domain training data and a multilingual encoder model is fine-tuned on these annotations for scalable data annotation. We show that this approach produces a classifier tailored to the target domain. Agreement between the LLM and human annotators is comparable to inter-annotator agreement among humans, and the resulting model outperforms existing CAP classifiers trained on manually-annotated but out-of-domain data. In addition to the CAP annotations, the ParlaCAP dataset offers rich speaker and party metadata, as well as sentiment predictions coming from the ParlaSent multilingual transformer model, enabling comparative research on political attention and representation across countries. We illustra
Large Language Models (LLMs) display remarkable capabilities to understand or even produce political discourse but have been found to consistently exhibit a progressive left-leaning bias. At the same time, so-called persona or identity prompts have been shown to produce LLM behavior that aligns with socioeconomic groups with which the base model is not aligned. In this work, we analyze whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, accurately predict the positions of European groups on a diverse set of policies. We evaluate whether predictions are stable in response to counterfactual arguments, different persona prompts, and generation methods. Finally, we find that we can simulate the voting behavior of Members of the European Parliament reasonably well, achieving a weighted F1 score of approximately 0.793. Our persona dataset of politicians in the 2024 European Parliament and our code are available at the following url: https://github.com/dess-mannheim/european_parliament_simulation.
This paper examines the impact of temperature shocks on European Parliament elections. We combine high-resolution climate data with results from parliamentary elections between 1989 and 2019, aggregated at the NUTS-2 regional level. Exploiting exogenous variation in unusually warm and hot days during the months preceding elections, we identify the effect of short-run temperature shocks on voting behaviour. We find that temperature shocks reduce ideological polarisation and increase vote concentration, as voters consolidate around larger, more moderate parties. This aggregated pattern is explained by a gain in support of liberal and, to a lesser extent, social democratic parties, while right-wing parties lose vote share. Consistent with a salience mechanism, complementary analysis of party manifestos shows greater emphasis on climate-related issues in warmer pre-electoral contexts. Overall, our findings indicate that climate shocks can shift party systems toward the centre and weaken political extremes.
This paper presents a new long-form release of the Swiss Parliaments Corpus, converting entire multi-hour Swiss German debate sessions (each aligned with the official session protocols) into high-quality speech-text pairs. Our pipeline starts by transcribing all session audio into Standard German using Whisper Large-v3 under high-compute settings. We then apply a two-step GPT-4o correction process: first, GPT-4o ingests the raw Whisper output alongside the official protocols to refine misrecognitions, mainly named entities. Second, a separate GPT-4o pass evaluates each refined segment for semantic completeness. We filter out any segments whose Predicted BLEU score (derived from Whisper's average token log-probability) and GPT-4o evaluation score fall below a certain threshold. The final corpus contains 801 hours of audio, of which 555 hours pass our quality control. Compared to the original sentence-level SPC release, our long-form dataset achieves a 6-point BLEU improvement, demonstrating the power of combining robust ASR, LLM-based correction, and data-driven filtering for low-resource, domain-specific speech corpora.
Social media platforms face heightened risks during major political events; yet, how platforms adapt their moderation practices in response remains unclear. The Digital Services Act Transparency Database offers an unprecedented opportunity to systematically study content moderation at scale, enabling researchers and policymakers to assess platforms' compliance and effectiveness. Herein, we analyze 1.58 billion self-reported moderation actions taken by eight large social media platforms during an extended period of eight months surrounding the 2024 European Parliament elections. Our findings reveal a lack of adaptation in moderation strategies, as platforms did not exhibit significant changes in their enforcement behaviors surrounding the elections. This raises concerns about whether platforms adapted their moderation practices at all, or if structural limitations of the database concealed possible adjustments. Moreover, we found that noted transparency and accountability issues persist nearly a year after initial concerns were raised. These results highlight the limitations of current self-regulatory approaches and underscore the need for stronger enforcement and data access mechan
The work covers the development and explainability of machine learning models for predicting political leanings through parliamentary transcriptions. We concentrate on the Slovenian parliament and the heated debate on the European migrant crisis, with transcriptions from 2014 to 2020. We develop both classical machine learning and transformer language models to predict the left- or right-leaning of parliamentarians based on their given speeches on the topic of migrants. With both types of models showing great predictive success, we continue with explaining their decisions. Using explainability techniques, we identify keywords and phrases that have the strongest influence in predicting political leanings on the topic, with left-leaning parliamentarians using concepts such as people and unity and speak about refugees, and right-leaning parliamentarians using concepts such as nationality and focus more on illegal migrants. This research is an example that understanding the reasoning behind predictions can not just be beneficial for AI engineers to improve their models, but it can also be helpful as a tool in the qualitative analysis steps in interdisciplinary research.
Brexit, a global pandemic, the Russian invasion of Ukraine, and record inflation - few legislative bodies have faced such a cascade of shocks as the European Parliament did during its 9th term (2019-2024). Using the Bipartite Configuration Model and a set of network statistics, this research explores how multi-polarization was characterized during this term by constructing and analyzing co-voting networks across all legislative subjects and within specific legislative subjects. The results contest binary polarization narratives inherited from US/UK scholarship by uncovering a multi-polar landscape. In many legislative subjects, including "Community policies", "Internal market, single market", and "External relations of the Union", coalitions realign fluidly, forming several voting communities rather than a single left-right divide. Ideological affinity and group memberships, not nationality, emerge as the primary forces that bind or separate Members of the European Parliament, reaffirming the chamber's transnational character. Two quantitative patterns stand out. First, the Greens/EFA and The Left display the highest intragroup cohesion, while governing groups - EPP, S&D, and R
This paper introduces the first functional model of a quantum parliament that is dominated by two parties or coalitions, and may or may not contain independent legislators. We identify a single crucial parameter, aptly named \emph{free will radius}, that can be used as a practical measure of the quantumness of the parties and the parliament as a whole. The free will radius used by the two parties determines the degree of independence that is afforded to the representatives of the parties. Setting the free will radius to zero degrades the quantum parliament to a classical one. On the other hand, setting the free will radius to its maximum value $1$, makes the representatives totally independent. Moreover, we present a quantum circuit with which we simulate in Qiskit the operation of the quantum parliament under various scenarios. The experimental results allow to arrive at some novel and fundamental conclusions that, we believe, provide new insights on the operation and the traits of a possible future quantum parliament. Finally, we propose the novel game of ``Passing the Bill,'' which captures the operation of the quantum parliament and basic options available to the leadership of
Data protection laws and policies have been studied extensively in recent years, but little is known about the parliamentary politics of data protection. This imitation applies even to the European Union (EU) that has taken the global lead in data protection and privacy regulation. For patching this notable gap in existing research, this paper explores the data protection questions raised by the Members of the European Parliament (MEPs) in the Parliament's plenary sessions and the answers given to these by the European Commission. Over a thousand of such questions and answers are covered in a period from 1995 to early 2023. Given computational analysis based on text mining, the results indicate that (a) data protection has been actively debated in the Parliament during the past twenty years. No noticeable longitudinal trends are present; the debates have been relatively constant. As could be expected, (b) the specific data protection laws in the EU have frequently been referenced in these debates, which (c) do not seem to align along conventional political dimensions such as the left-right axis. Furthermore, (d) numerous distinct data protection topics have been debated by the parl
The application of natural language processing on political texts as well as speeches has become increasingly relevant in political sciences due to the ability to analyze large text corpora which cannot be read by a single person. But such text corpora often lack critical meta information, detailing for instance the party, age or constituency of the speaker, that can be used to provide an analysis tailored to more fine-grained research questions. To enable researchers to answer such questions with quantitative approaches such as natural language processing, we provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947-2023, split into a total of 10,806,105 speeches. This data set includes rich meta data in form of information on both reactions from the audience towards the speech as well as information about the speaker's party, their age, their constituency and their party's political alignment, which enables a deeper analysis. We further provide three exploratory analyses, detailing topic shares of different parties throughout time, a descriptive analysis of the development of the age of a
"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans, assuming that their training data includes information on human attitudes and behavior. However, LLM-synthetic samples might exhibit bias, for example due to training data and fine-tuning processes being unrepresentative of diverse contexts. Such biases risk reinforcing existing biases in research, policymaking, and society. Therefore, researchers need to investigate if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction. In this study, we examine to what extent LLM-based predictions of individual public opinion exhibit context-dependent biases by predicting the results of the 2024 European Parliament elections. Prompting three LLMs with individual-level background information of 26,000 eligible European voters, we ask the LLMs to predict each person's voting behavior. By comparing them to the actual results, we show that LLM-based predictions of future voting behavior largely fail, their accuracy is unequally distributed across national and linguistic contexts, and they require detailed attitudinal info
We introduce a dataset on political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We introduce the dataset, provide the reasoning behind some of the choices during its creation, present statistics on the dataset, and, using a simple classifier, some baseline results on predicting political orientation on the left-to-right axis, and on power position identification, i.e., distinguishing between the speeches delivered by governing coalition party members from those of opposition party members.
This paper introduces DeepParliament, a legal domain Benchmark Dataset that gathers bill documents and metadata and performs various bill status classification tasks. The proposed dataset text covers a broad range of bills from 1986 to the present and contains richer information on parliament bill content. Data collection, detailed statistics and analyses are provided in the paper. Moreover, we experimented with different types of models ranging from RNN to pretrained and reported the results. We are proposing two new benchmarks: Binary and Multi-Class Bill Status classification. Models developed for bill documents and relevant supportive tasks may assist Members of Parliament (MPs), presidents, and other legal practitioners. It will help review or prioritise bills, thus speeding up the billing process, improving the quality of decisions and reducing the time consumption in both houses. Considering that the foundation of the country's democracy is Parliament and state legislatures, we anticipate that our research will be an essential addition to the Legal NLP community. This work will be the first to present a Parliament bill prediction task. In order to improve the accessibility o
In this paper we propose a dynamical approach based on the Gorini-Kossakowski-Sudarshan-Lindblad equation for a problem of decision making. More specifically, we consider what was recently called a quantum parliament, asked to approve or not a certain law, and we propose a model of the connections between the various members of the parliament, proposing in particular some special form of the interactions giving rise to a {\em collaborative} or non collaborative behaviour.
The TCPD-IPD dataset is a collection of questions and answers discussed in the Lower House of the Parliament of India during the Question Hour between 1999 and 2019. Although it is difficult to analyze such a huge collection manually, modern text analysis tools can provide a powerful means to navigate it. In this paper, we perform an exploratory analysis of the dataset. In particular, we present insightful corpus-level statistics and a detailed analysis of three subsets of the dataset. In the latter analysis, the focus is on understanding the temporal evolution of topics using a dynamic topic model. We observe that the parliamentary conversation indeed mirrors the political and socio-economic tensions of each period.