Context: Interest in diversity in software development has increased significantly in recent years. Reporting on diversity in software projects can enhance user trust and assist regulators in evaluating adoption. Recent AI directives include clauses that mandate reporting diversity information during development, highlighting the growing interest of public regulators. However, current documentation often neglects diversity in favor of technical features, partly due to a lack of tools for describing and annotating it. Objectives: This work introduces the Software Diversity Card, a structured approach for documenting and sharing diversity-related aspects within software projects. It aims to profile the various teams involved in software development and governance, including user groups in testing and software adaptations for diverse social groups. Methods: We conducted a literature review on diversity and inclusion in software development and analyzed 1,000 top-starred Open Source Software (OSS) repositories on GitHub to identify diversity-related information. Moreover, we present a diversity modeling language, a toolkit for generating cards using it, and a study of its application in two re
When selecting applicants for scholarships, universities, or jobs, practitioners often aim for a diverse cohort of qualified recipients. However, differing articulations, constructs, and notions of diversity prevent decision-makers from operationalising and progressing towards the diversity they all agree is needed. To understand this challenge of translating values into requirements and then into decision support tools (DSTs), we conducted participatory design studies exploring professionals' varied perceptions of diversity and how to build for them. Our results suggest three definitions of diversity: bringing together different perspectives; ensuring representativeness of a base population; and contextualising applications, which we use to create the Diversity Triangle. We experience-prototyped DSTs reflecting each angle of the Diversity Triangle to enhance decision-making around diversity. We find that notions of diversity are highly diverse; efforts to design DSTs for diversity should start by working with organisations to distil 'diversity' into definitions and design requirements.
Recommender systems have made significant strides in various industries, primarily driven by extensive efforts to enhance recommendation accuracy. However, this pursuit of accuracy has inadvertently given rise to echo chamber/filter bubble effects. Especially in industry, these effects can impair users' experiences and prevent users from accessing a wider range of items. One solution is to take diversity into account. However, most existing works focus on users' explicit preferences and rarely explore users' non-interaction preferences. These neglected non-interaction preferences are especially important for broadening users' interests and alleviating echo chamber/filter bubble effects. Therefore, in this paper, we first give two distinct definitions of diversity based on user historical behaviors, i.e., user-explicit diversity (U-diversity) and user-item non-interaction diversity (N-diversity). Then, we propose a succinct and effective method, named Controllable Category Diversity Framework (CCDF), to achieve both high U-diversity and high N-diversity simultaneously. Specifically, CCDF consists of two stages, User-Category Matching and Constrained Item Matching. The User-Categor
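The two stages can be sketched in a few lines under stated assumptions. The sketch assumes that category-level scores for the user are given. Stage one takes the top `k_explicit` categories the user has interacted with and the top `k_non` categories they have not. Stage two fills each chosen category with its highest-scoring items. The knobs, the function names, and the "non-interacted share" used as a rough N-diversity proxy are all hypothetical. CCDF's actual matching and constraint formulation are not reproduced here.

```python
# Hypothetical two-stage sketch loosely following the CCDF outline;
# all names and the k_explicit / k_non knobs are illustrative assumptions.


def user_category_matching(cat_scores, interacted, k_explicit, k_non):
    """Stage 1: top categories the user has interacted with, plus
    top categories the user has never interacted with."""
    ranked = sorted(cat_scores, key=cat_scores.get, reverse=True)
    explicit = [c for c in ranked if c in interacted][:k_explicit]
    unexplored = [c for c in ranked if c not in interacted][:k_non]
    return explicit + unexplored


def constrained_item_matching(item_scores, item_cat, categories, per_cat):
    """Stage 2: fill each selected category with at most per_cat top items."""
    picked = []
    for c in categories:
        items = sorted((i for i in item_scores if item_cat[i] == c),
                       key=item_scores.get, reverse=True)
        picked.extend(items[:per_cat])
    return picked


def non_interacted_share(picked, item_cat, interacted):
    """Illustrative proxy (not the paper's N-diversity): share of picked
    items that come from categories the user never interacted with."""
    return sum(item_cat[i] not in interacted for i in picked) / len(picked)


if __name__ == "__main__":
    cat_scores = {"news": 0.9, "sports": 0.8, "music": 0.7, "cooking": 0.6}
    interacted = {"news", "music"}
    item_scores = {"n1": 0.95, "n2": 0.5, "s1": 0.7, "m1": 0.8, "c1": 0.6}
    item_cat = {"n1": "news", "n2": "news", "s1": "sports",
                "m1": "music", "c1": "cooking"}
    cats = user_category_matching(cat_scores, interacted, k_explicit=1, k_non=2)
    recs = constrained_item_matching(item_scores, item_cat, cats, per_cat=1)
    print(cats, recs, non_interacted_share(recs, item_cat, interacted))
```

Raising `k_non` relative to `k_explicit` shifts the list toward categories the user has not yet seen. This is the kind of controllability the framework's name refers to.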
Measuring the diversity of creative outputs is central to evaluating post-training mode collapse, comparing decoding strategies, and quantifying creative behavior in both AI and human writing. We propose a new approach to measuring diversity using in-context learning, of which the ``Decan'' metric, $D_{Ca_n} = C \times a_n$, is the working instance we evaluate: a per-byte score read off the per-token log-probabilities of a base model $θ$ in a \emph{single forward pass} per permutation, with no embedding model, no reference corpus, and no human labels. This approach is grounded in information theory, uses a language model's in-context learning to detect a wide range of similarities among any number of inputs, and obviates the need to train a special-purpose model. The same pipeline scores AI samples and human-written response sets, with diversity treated as a property of the tuple (responses, prompt, scoring model). On Tevet and Berant's human-grounded McDiv benchmark, $D_{Ca_n}$ reaches an OCA of 0.846 on the McDiv prompt\_gen set, where it performs best; this trails the strongest neural baseline reported in Tevet and Berant (SentBERT, 0.897). On the OLMo-2-7B post-training pipeline, $D_{Ca_n}$ dr
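The intuition behind scoring diversity by in-context predictability can be illustrated without a language model. The sketch below deliberately swaps in a different technique and is not the Decan metric. It replaces a base model's per-token log-probabilities with zlib compressed lengths. A response that is cheap to encode given the responses before it therefore counts as redundant. The function names are hypothetical. Averaging over every permutation is only practical for a handful of responses.

```python
import zlib
from itertools import permutations

# Compression stand-in for in-context predictability (NOT the Decan metric):
# zlib's compressed length plays the role of a language model's log-loss.


def incremental_bits_per_byte(context: bytes, response: bytes) -> float:
    """Extra compressed bits per byte needed to encode `response` after `context`."""
    base = len(zlib.compress(context, 9))
    full = len(zlib.compress(context + response, 9))
    return 8.0 * (full - base) / len(response)


def compression_diversity(responses, sep=b"\n"):
    """Mean incremental cost of each response given those placed before it,
    averaged over all orderings of the response set."""
    perm_scores = []
    for order in permutations(responses):
        context, costs = b"", []
        for r in order:
            chunk = r + sep
            costs.append(incremental_bits_per_byte(context, chunk))
            context += chunk
        perm_scores.append(sum(costs) / len(costs))
    return sum(perm_scores) / len(perm_scores)


if __name__ == "__main__":
    repeated = [b"the cat sat on the mat"] * 3
    diverse = [b"the cat sat on the mat",
               b"quantum flux rises at dawn",
               b"seven purple kites drift by"]
    print(compression_diversity(repeated), compression_diversity(diverse))
```

Repeated responses become almost free to encode once the first copy is in the context, so the repeated set scores lower than the varied one.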
How do labor demand shocks affect workforce diversity in the absence of targeted diversity policies? A conceptual framework illustrates the potential trade-off between the demographic and quality composition of a workforce when there is a positive labor demand shock. Exploiting the German reunification as a natural experiment, we analyze the academic labor market, where nearly all social science professors in East Germany were replaced while STEM faculty remained largely unchanged. Using administrative data and a regional difference-in-differences design, we find increased dispersion in the institutional quality of hires, indicating that the new hires came from less selective departments. At the same time, female representation did not increase despite qualified women in the pipeline. Instead, East German hiring patterns converged to those in West Germany in terms of gender composition. In simulations, we quantify the implied losses: under conservative assumptions, and taking the pipeline of qualified applicants into account, we show that the marginal female hire's quality is approximately half a standard deviation higher than the marginal male hire's quality.
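A regional difference-in-differences design compares the change over time in a treated region with the change in a control region. The canonical 2x2 estimator on cell means is sketched below under stated assumptions. The outcome could be any hire-level or cell-level quantity, for example a dispersion measure of hires' institutional quality. The East/West and pre/post coding is assumed. The toy numbers are invented for illustration and are not taken from the study. The paper's actual specification may include controls and fixed effects that this sketch omits.

```python
import numpy as np


def did_2x2(y, treated, post):
    """Canonical 2x2 difference-in-differences on cell means:
    (treated post - treated pre) - (control post - control pre)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(t, p):
        return y[(treated == t) & (post == p)].mean()

    return (cell_mean(True, True) - cell_mean(True, False)) - (
        cell_mean(False, True) - cell_mean(False, False))


if __name__ == "__main__":
    # Hypothetical toy data: treated = East, post = after reunification.
    y = [1.0, 1.2, 2.0, 2.2, 1.0, 1.1, 4.0, 4.1]
    treated = [0, 0, 0, 0, 1, 1, 1, 1]
    post = [0, 0, 1, 1, 0, 0, 1, 1]
    print(did_2x2(y, treated, post))
```

The control-region change absorbs common time trends, so only the treated region's extra change is attributed to the shock.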
Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires connecting information from multiple sources. This paper introduces Vendi-RAG, a framework based on an iterative process that jointly optimizes retrieval diversity and answer quality. This joint optimization leads to significantly higher accuracy for multi-hop QA tasks. Vendi-RAG leverages the Vendi Score (VS), a flexible similarity-based diversity metric, to promote semantic diversity in document retrieval. It then uses an LLM judge that evaluates candidate answers, generated after a reasoning step, and outputs a score that the retriever uses to balance relevance and diversity among the retrieved documents during each iteration. Experiments on three challenging datasets -- HotpotQA, MuSiQue, and 2WikiMultiHopQA -- demonstrate Vendi-RAG's effectiveness in multi-hop reasoning tasks. The framework achieves significant accuracy improvements over traditional single-step and m
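The Vendi Score of a set of n items is the exponential of the Shannon entropy of the eigenvalues of K/n. Here K is a positive semidefinite similarity matrix with unit diagonal. A minimal sketch of diversity-aware greedy retrieval using it follows. It assumes cosine similarity between document embeddings and a fixed, hypothetical `weight` between relevance and diversity. Vendi-RAG instead adjusts this balance across iterations using the LLM judge's score. The function names are illustrative and are not the framework's API. Note that relevance and the Vendi Score live on different scales, so the weight is instance-dependent.

```python
import numpy as np


def vendi_score(K):
    """exp of the Shannon entropy of the eigenvalues of K / n
    (K: PSD similarity matrix with unit diagonal)."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(np.exp(-np.sum(lam * np.log(lam))))


def select_documents(emb, relevance, k, weight):
    """Greedily add the document maximizing
    weight * relevance + (1 - weight) * VS(selected + candidate)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine similarity
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(emb)):
            if i in selected:
                continue
            idx = selected + [i]
            K = emb[idx] @ emb[idx].T  # Gram matrix: PSD, unit diagonal
            score = weight * relevance[i] + (1 - weight) * vendi_score(K)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical embeddings: docs 0 and 1 are near-duplicates.
    emb = np.array([[1.0, 0.0], [0.999, 0.045], [0.0, 1.0]])
    relevance = [0.9, 0.89, 0.6]
    print(select_documents(emb, relevance, k=2, weight=0.5))
    print(select_documents(emb, relevance, k=2, weight=1.0))
```

With `weight=1.0` the selection is purely relevance-based and returns the two near-duplicates. With `weight=0.5` the diversity term swaps the duplicate for the dissimilar document.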
Modern deep learning science often assumes that neural networks learn from a fixed data distribution. However, many practically important learning problems involve data distributions that change throughout training. How does such non-stationarity impact the inductive biases of deep learning towards models with different structural, generalisation, and safety properties? A fruitful testbed for studying inductive bias is in-context linear regression sequence modelling, where small transformers display strikingly different generalisation patterns depending on the diversity of the (fixed) training task distribution. In this paper, we explore the effect of diversifying the task distribution across training time, finding that such temporal diversity leads to an increased bias towards generalisation over memorisation.
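One way to make the task distribution change across training time is to let the number of distinct regression tasks grow with the training step. The sketch below assumes a fixed bank of weight vectors, a geometric schedule from `m_start` to `m_end` tasks, and Gaussian inputs. The schedule, its direction, and all names are hypothetical illustrations rather than the paper's setup.

```python
import numpy as np


def task_pool_size(step, total_steps, m_start, m_end):
    """Hypothetical geometric schedule for the number of distinct tasks
    available at a given training step (m_start -> m_end)."""
    frac = step / max(total_steps - 1, 1)
    return int(round(m_start * (m_end / m_start) ** frac))


def sample_batch(rng, task_bank, pool_size, batch, n_ctx, noise_std=0.0):
    """Draw in-context linear-regression sequences y = x @ w + noise, where
    each sequence's w is one of the first pool_size tasks in the bank."""
    d = task_bank.shape[1]
    w = task_bank[rng.integers(0, pool_size, size=batch)]   # (batch, d)
    x = rng.standard_normal((batch, n_ctx, d))               # (batch, n_ctx, d)
    y = np.einsum("bnd,bd->bn", x, w)                        # (batch, n_ctx)
    y = y + noise_std * rng.standard_normal(y.shape)
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    total_steps, m_start, m_end, d = 1000, 4, 1024, 8
    task_bank = rng.standard_normal((m_end, d))  # fixed bank of candidate tasks
    for step in (0, 500, 999):
        m = task_pool_size(step, total_steps, m_start, m_end)
        x, y = sample_batch(rng, task_bank, m, batch=16, n_ctx=10)
        print(step, m, x.shape, y.shape)
```

Swapping the schedule (for example, decreasing or cyclic pool sizes) is the only change needed to explore other forms of temporal diversity in this setup.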
Metaheuristics are widely applied for their ability to find high-quality solutions efficiently. The RIME algorithm is a recently proposed physics-based metaheuristic algorithm with certain advantages. However, it suffers from a rapid loss of population diversity during optimization and is prone to falling into local optima, leading to an imbalance between exploitation and exploration. To address these shortcomings of RIME, this paper proposes a modified RIME with covariance learning and diversity enhancement (MRIME-CD). The algorithm applies three strategies to improve its optimization capability. First, a covariance learning strategy is introduced in the soft-rime search stage to increase population diversity and counteract RIME's tendency to over-exploit, through the bootstrapping effect of dominant populations. Second, to moderate the tendency of the RIME population to approach the optimal individual in the early search stage, an average bootstrapping strategy is introduced into the hard-rime puncture mechanism; it guides the population search through the weighted position of the dominant populations, thus enhancing RIME's global search ability in the early stage. Finally, a new sta
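Covariance learning of this kind resembles an estimation-of-distribution step: fit a Gaussian to the better individuals and resample from it. The minimal sketch below targets minimization. It assumes a 50% elite fraction, a small diagonal jitter for numerical stability, and greedy replacement. It is not the MRIME-CD update itself, whose coupling to the soft-rime stage and whose parameters are not given here.

```python
import numpy as np


def covariance_learning_step(rng, population, fitness, elite_frac=0.5, jitter=1e-8):
    """Resample the population from a Gaussian whose mean and covariance are
    estimated from the dominant (lowest-fitness) individuals."""
    n, d = population.shape
    k = max(2, int(elite_frac * n))               # >= 2 points for a covariance
    elite = population[np.argsort(fitness)[:k]]   # dominant sub-population
    mean = elite.mean(axis=0)
    cov = np.atleast_2d(np.cov(elite, rowvar=False)) + jitter * np.eye(d)
    return rng.multivariate_normal(mean, cov, size=n)


def greedy_replace(population, fitness, candidates, objective):
    """Keep each candidate only if it improves on the individual it replaces."""
    cand_fit = np.apply_along_axis(objective, 1, candidates)
    better = cand_fit < fitness
    new_pop = np.where(better[:, None], candidates, population)
    return new_pop, np.where(better, cand_fit, fitness)


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(x ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pop = rng.uniform(-5, 5, size=(30, 5))
    fit = np.apply_along_axis(sphere, 1, pop)
    for _ in range(20):
        cand = covariance_learning_step(rng, pop, fit)
        pop, fit = greedy_replace(pop, fit, cand, sphere)
    print(fit.min())
```

Because the covariance is estimated from the whole elite set rather than from the single best individual, new samples keep spread along promising directions. This spread is the diversity-preserving effect the strategy aims for.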
Phylogenetic diversity is a popular measure for quantifying the biodiversity of a collection $Y$ of species, while phylogenetic diversity indices provide a way to apportion phylogenetic diversity to individual species. Typically, for a given diversity index, the phylogenetic diversity of $Y$ is not equal to the sum of the diversity indices of the species in $Y$. In this paper, we investigate the extent of this difference for two commonly used indices: Fair Proportion and Equal Splits. In particular, we determine the maximum value of this difference in various settings, including when the associated rooted phylogenetic tree is allowed to vary across all rooted phylogenetic trees with the same leaf set and whose edge lengths are constrained by either their total sum or their maximum value.
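To make the gap concrete, the sketch below computes rooted phylogenetic diversity together with the Fair Proportion and Equal Splits indices on a small hypothetical tree. Rooted phylogenetic diversity is assumed here to be the total length of the minimal subtree connecting $Y$ and the root. Fair Proportion shares each edge's length equally among all leaves below it. Equal Splits divides it equally at each branching point on the way down.

```python
# Hypothetical toy rooted tree: parent -> list of (child, edge length).
TREE = {
    "root": [("u", 1.0), ("z", 3.0)],
    "u": [("x", 2.0), ("v", 1.0)],
    "v": [("y", 1.0), ("w", 1.0)],
}


def parent_map(tree):
    """child -> (parent, length of the edge into child)."""
    return {c: (p, length) for p, kids in tree.items() for c, length in kids}


def leaves_below(tree, node):
    if node not in tree:  # a node with no children is a leaf
        return {node}
    return set().union(*(leaves_below(tree, c) for c, _ in tree[node]))


def path_to_root(tree, leaf):
    """Edges (parent, child, length) from `leaf` up to the root."""
    par, edges, node = parent_map(tree), [], leaf
    while node in par:
        p, length = par[node]
        edges.append((p, node, length))
        node = p
    return edges


def phylogenetic_diversity(tree, taxa):
    """Rooted PD: total length of the union of the taxa-to-root paths."""
    edges = {(p, c): length for t in taxa for p, c, length in path_to_root(tree, t)}
    return sum(edges.values())


def fair_proportion(tree, leaf):
    """Each edge's length is shared equally among all leaves below it."""
    return sum(length / len(leaves_below(tree, c))
               for _, c, length in path_to_root(tree, leaf))


def equal_splits(tree, leaf):
    """Each edge's length is split equally at every branching on the way down."""
    total, splits = 0.0, 1
    for p, _, length in path_to_root(tree, leaf):
        total += length / splits
        splits *= len(tree[p])  # out-degree of the parent vertex
    return total


if __name__ == "__main__":
    Y = {"x", "y"}
    pd = phylogenetic_diversity(TREE, Y)
    print(pd - sum(fair_proportion(TREE, s) for s in Y))
    print(pd - sum(equal_splits(TREE, s) for s in Y))
```

On this tree the phylogenetic diversity of {x, y} is 5. The Fair Proportion indices of x and y sum to 25/6 and the Equal Splits indices sum to 4.25. Over the full leaf set, both indices sum to the tree's total edge length, 9.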
Diversity is a commonly known principle in the design of recommender systems, but one whose conceptualization is ambiguous. Through semi-structured interviews, we explore how practitioners at three different public service media organizations in the Netherlands conceptualize diversity within the scope of their recommender systems. We provide an overview of the goals they pursue with diversity in their systems, which aspects are relevant, and how recommendations should be diversified. We show that even within this limited domain, the conceptualization of diversity varies greatly, and argue that a standardized conceptualization is unlikely to be achieved. Instead, we should focus on effectively communicating what diversity means in a particular system, thus allowing for operationalizations of diversity that can express the nuances and requirements of that particular domain.
Latent diffusion models excel at producing high-quality images from text. Yet concerns have been raised about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, extending into richer dimensions such as color diversity. Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We generate multiple vectors in the latent space until we find a set of vectors that meets the desired distance requirements and the required batch size. To evaluate the effectiveness of our diversity methods, we conduct experiments examining various characteristics, including color diversity, the LPIPS metric, and ethnicity/gender representation in images featuring humans. The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. Through the enhancement of image diversity, our approach contributes to the creation of more inclusive and represe
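The search for mutually distant latent vectors can be sketched as greedy rejection sampling. The sketch assumes Gaussian initial latents, Euclidean distance, and a user-chosen `min_dist` threshold. The toy latent shape stands in for the real latent shape of a given model, and the function name is hypothetical. The accepted latents would then serve as initial noise for the text-to-image sampler, which is not shown.

```python
import numpy as np


def sample_diverse_latents(rng, batch_size, shape, min_dist, max_tries=10_000):
    """Greedy rejection sampling: accept a Gaussian latent only if it lies at
    least min_dist (Euclidean) from every latent accepted so far."""
    accepted = []
    for _ in range(max_tries):
        z = rng.standard_normal(shape)
        if all(np.linalg.norm(z - a) >= min_dist for a in accepted):
            accepted.append(z)
            if len(accepted) == batch_size:
                return np.stack(accepted)
    raise RuntimeError("could not find enough mutually distant latents; lower min_dist")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = sample_diverse_latents(rng, batch_size=4, shape=(4, 8, 8), min_dist=20.0)
    print(latents.shape)
```

In high dimensions, distances between independent Gaussian samples concentrate near $\sqrt{2D}$ for $D$ latent entries (about 22.6 for the 256 entries here). For that reason `min_dist` has to be set relative to that scale to have any effect.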
In this work, we study diversity-aware clustering problems where the data points are associated with multiple attributes, resulting in intersecting groups. A clustering solution must ensure that the number of cluster centers chosen from each group lies within the range defined by that group's lower and upper thresholds, while simultaneously minimizing the clustering objective, which can be $k$-median, $k$-means, or $k$-supplier. We study the computational complexity of the proposed problems, offering insights into their NP-hardness, polynomial-time inapproximability, and fixed-parameter intractability. We present parameterized approximation algorithms with approximation ratios $1+ \frac{2}{e} + ε\approx 1.736$, $1+\frac{8}{e} + ε\approx 3.943$, and $5$ for diversity-aware $k$-median, diversity-aware $k$-means and diversity-aware $k$-supplier, respectively. Assuming Gap-ETH, the approximation ratios are tight for the diversity-aware $k$-median and diversity-aware $k$-means problems. Our results imply the same approximation factors for their respective fair variants with disjoint groups -- fair $k$-median, fair $k$-means, and fair $k$-supplier -- with lower bou
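The group constraint and the $k$-median objective can be stated concretely with an exact brute-force search, which is feasible only for tiny instances. This is not the paper's parameterized approximation algorithm. It simply enumerates all $k$-subsets, assuming Euclidean distances and candidate centers drawn from the data points themselves. Group names and bounds in the example are hypothetical.

```python
from itertools import combinations

import numpy as np


def feasible(centers, groups, lower, upper):
    """Check lower[g] <= (number of centers in groups[g]) <= upper[g] for
    every group g; groups may intersect, so one center can count twice."""
    for g, members in groups.items():
        count = sum(c in members for c in centers)
        if not lower[g] <= count <= upper[g]:
            return False
    return True


def kmedian_cost(points, centers):
    """Sum of Euclidean distances from each point to its nearest center."""
    d = np.linalg.norm(points[:, None, :] - points[list(centers)][None, :, :], axis=2)
    return float(d.min(axis=1).sum())


def brute_force_diverse_kmedian(points, k, groups, lower, upper):
    """Exact search over all k-subsets of the points as candidate centers."""
    best = None
    for centers in combinations(range(len(points)), k):
        if feasible(centers, groups, lower, upper):
            cost = kmedian_cost(points, centers)
            if best is None or cost < best[0]:
                best = (cost, centers)
    return best


if __name__ == "__main__":
    points = np.array([[0.0], [1.0], [2.0], [10.0]])
    groups = {"a": {0, 1}, "b": {1, 2}}  # intersecting groups (point 1 in both)
    lower, upper = {"a": 0, "b": 1}, {"a": 0, "b": 1}
    print(brute_force_diverse_kmedian(points, 2, {}, {}, {}))
    print(brute_force_diverse_kmedian(points, 2, groups, lower, upper))
```

Unconstrained, the best centers are points 1 and 3 at cost 2. Forbidding any center from group "a" while requiring exactly one from group "b" forces centers 2 and 3 at cost 3.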
Computing diverse sets of high-quality solutions for a given optimization problem has become an important topic in recent years. In this paper, we introduce a coevolutionary Pareto Diversity Optimization approach, which builds on the success of reformulating a constrained single-objective optimization problem as a bi-objective problem by turning the constraint into an additional objective. Our new Pareto Diversity Optimization approach uses this bi-objective formulation to optimize the problem while also maintaining an additional population of high-quality solutions whose diversity is optimized with respect to a given diversity measure. We show that our standard coevolutionary Pareto Diversity Optimization approach outperforms the recently introduced DIVEA algorithm, which obtains its initial population by generalized diversifying greedy sampling and then improves the diversity of the set of solutions. Furthermore, we study possible improvements of the Pareto Diversity Optimization approach. In particular, we show that the use of inter-population crossover further improves the diversity of the set of solutions.
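The bi-objective reformulation can be illustrated on a small 0/1 knapsack instance. The capacity constraint becomes a second objective (total weight, to be minimized) alongside profit, and the best feasible solution lies on the resulting Pareto front. The instance and function names are hypothetical. The sketch covers only the reformulation and the dominance check, not the coevolutionary diversity population.

```python
from itertools import product


def objectives(x, profits, weights):
    """Bi-objective view of a knapsack solution: (profit to maximize,
    weight to minimize); the capacity constraint becomes an objective."""
    profit = sum(p for p, xi in zip(profits, x) if xi)
    weight = sum(w for w, xi in zip(weights, x) if xi)
    return profit, weight


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and not identical."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_front(solutions, profits, weights):
    scored = {x: objectives(x, profits, weights) for x in solutions}
    return [x for x in solutions
            if not any(dominates(scored[y], scored[x]) for y in solutions)]


def best_feasible(front, profits, weights, capacity):
    """Highest-profit front member that respects the original constraint."""
    ok = [x for x in front if objectives(x, profits, weights)[1] <= capacity]
    return max(ok, key=lambda x: objectives(x, profits, weights)[0])


if __name__ == "__main__":
    profits, weights, capacity = [6, 5, 4], [3, 2, 2], 4
    all_solutions = list(product((0, 1), repeat=3))
    front = pareto_front(all_solutions, profits, weights)
    print(front)
    print(best_feasible(front, profits, weights, capacity))
```

Keeping infeasible but non-dominated solutions (such as the full knapsack) on the front is what lets the search move across the constraint boundary instead of discarding them.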
Measuring diversity accurately is important for many scientific fields, including machine learning (ML), ecology, and chemistry. The Vendi Score was introduced as a generic similarity-based diversity metric that extends the Hill number of order $q=1$ by leveraging ideas from quantum statistical mechanics. Unlike many diversity metrics in ecology, the Vendi Score accounts for similarity and does not require knowledge of the prevalence of the categories in the collection being evaluated. However, the Vendi Score treats each item in a given collection with a level of sensitivity proportional to the item's prevalence. This is undesirable in settings where item prevalence is highly imbalanced. In this paper, we extend the other Hill numbers using similarity to provide flexibility in allocating sensitivity to rare or common items. This leads to a family of diversity metrics -- Vendi scores with different levels of sensitivity -- that can be used in a variety of applications. We study the properties of the scores in a controlled synthetic setting where the ground-truth diversity is known. We then test their utility in improving molecular simulations via Ven
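The Vendi score of order $q$ replaces the Shannon entropy in the original Vendi Score with the Rényi entropy of order $q$ of the eigenvalues $\lambda_i$ of $K/n$. This gives $\mathrm{VS}_q = \left(\sum_i \lambda_i^q\right)^{1/(1-q)}$ for $q \neq 1$, with $q = 1$ and $q = \infty$ taken as limits. Below is a minimal sketch, assuming a positive semidefinite similarity matrix with unit diagonal; the function name is illustrative.

```python
import numpy as np


def vendi_score_q(K, q=1.0):
    """Vendi score of order q: exp of the Renyi entropy of order q of the
    eigenvalues of K / n (K: PSD similarity matrix with unit diagonal)."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]  # discard numerically zero eigenvalues
    if q == 1:
        return float(np.exp(-np.sum(lam * np.log(lam))))  # Shannon limit
    if np.isinf(q):
        return float(1.0 / lam.max())                      # min-entropy limit
    return float(np.sum(lam ** q) ** (1.0 / (1.0 - q)))


if __name__ == "__main__":
    # Three identical items plus one distinct item: an imbalanced collection.
    K = np.zeros((4, 4))
    K[:3, :3] = 1.0
    K[3, 3] = 1.0
    for q in (0, 0.5, 1, 2, np.inf):
        print(q, vendi_score_q(K, q))
```

For this collection the score moves from 2 at $q=0$, where every eigenvalue counts equally, down to $4/3$ at $q=\infty$. Small $q$ thus gives the rare item more weight.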
When using Quality Diversity (QD) optimization to solve hard exploration or deceptive search problems, we assume that diversity is extrinsically valuable. This means that diversity is important to help us reach an objective, but is not an objective in itself. Often, in these domains, practitioners benchmark their QD algorithms against single-objective optimization frameworks. In this paper, we argue that the correct comparison should be made to \emph{multi-objective} optimization frameworks. This is because single-objective optimization frameworks rely on aggregating sub-objectives, which can discard information that is crucial for automatically maintaining diverse populations. In order to facilitate a fair comparison between quality diversity and multi-objective optimization, we present a method that utilizes dimensionality reduction to automatically determine a set of behavioral descriptors for an individual, as well as a set of objectives for an individual to solve. Using the former, one can generate solutions using standard quality diversity optimization techniques, and using the latter, one can generate solutions using standard multi-objective optimization
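Automatically deriving behavioral descriptors by dimensionality reduction could be as simple as running principal component analysis on raw per-individual feature vectors. Below is a minimal sketch that assumes PCA via SVD and a hypothetical MAP-Elites style grid for binning the descriptors. The paper's choice of reduction method, and how it derives objectives, may differ.

```python
import numpy as np


def pca_descriptors(features, n_components=2):
    """Project raw per-individual feature vectors onto their top principal
    components, to serve as automatically derived behavioral descriptors."""
    X = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T


def grid_cell(descriptor, lows, highs, bins):
    """Map a descriptor to a MAP-Elites style grid cell (hypothetical binning)."""
    scaled = (np.asarray(descriptor) - lows) / (highs - lows)
    return tuple(np.clip((scaled * bins).astype(int), 0, bins - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((100, 6)) * np.array([5, 3, 1, 1, 1, 1])
    desc = pca_descriptors(feats)
    lows, highs = desc.min(axis=0), desc.max(axis=0)
    print(desc.shape, grid_cell(desc[0], lows, highs, bins=10))
```

Because the principal components are ordered by explained variance, the first descriptor axis captures the direction along which individuals differ most.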
Quality diversity is a recent family of evolutionary search algorithms that focus on finding several well-performing (quality) yet different (diversity) solutions, with the aim of maintaining an appropriate balance between divergence and convergence during search. While quality diversity has already delivered promising results in complex problems, the capacity of divergent search variants for quality diversity remains largely unexplored. Inspired by the notion of surprise as an effective driver of divergent search and its orthogonal nature to novelty, this paper investigates the impact of the former on quality diversity performance. For that purpose, we introduce three new quality diversity algorithms that employ surprise as a diversity measure, either on its own or combined with novelty, and compare their performance against novelty search with local competition, the state-of-the-art quality diversity algorithm. The algorithms are tested in a robot navigation task across 60 highly deceptive mazes. Our findings suggest that allowing surprise and novelty to operate synergistically for divergence and in combination with local competition leads to quality diversity algorithms of significa
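Novelty and surprise can both be scored from behavior vectors. Novelty is commonly computed as the mean distance to the k nearest neighbors in an archive. Surprise measures deviation from a predicted behavior; here it is sketched with a simple linear extrapolation of cluster centroids from the previous two generations. The centroid matching by index, the values of k, and the mixing weight are assumptions for illustration.

```python
import numpy as np


def novelty(behavior, archive, k=3):
    """Mean distance to the k nearest behaviors in the archive."""
    d = np.linalg.norm(archive - behavior, axis=1)
    return float(np.sort(d)[:k].mean())


def surprise(behavior, prev_centroids, curr_centroids, k=1):
    """Distance to the k nearest predicted behaviors, where predictions
    linearly extrapolate index-matched centroids one generation ahead."""
    predicted = curr_centroids + (curr_centroids - prev_centroids)
    d = np.linalg.norm(predicted - behavior, axis=1)
    return float(np.sort(d)[:k].mean())


def combined_divergence(behavior, archive, prev_c, curr_c, mix=0.5):
    """Hypothetical weighted mix of novelty and surprise."""
    return mix * novelty(behavior, archive) + (1 - mix) * surprise(behavior, prev_c, curr_c)


if __name__ == "__main__":
    archive = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    prev_c, curr_c = np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])
    for b in ([0.5, 0.5], [2.0, 2.0], [10.0, 10.0]):
        b = np.array(b)
        print(b, novelty(b, archive), surprise(b, prev_c, curr_c),
              combined_divergence(b, archive, prev_c, curr_c))
```

A behavior can be novel yet unsurprising, as the point [2, 2] shows: it lies away from the archive but exactly where the population was predicted to move. This is why the two signals are described as orthogonal.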
Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, as it can generate auto-curricula that avoid being exploited. However, conventional open-ended learning algorithms lack widely accepted definitions of diversity, making it hard to construct and evaluate diverse policies. In this work, we summarize previous concepts of diversity and work towards a unified measure of diversity in multi-agent open-ended learning that covers all elements of Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD). At the trajectory distribution level, we re-define BD in the state-action space as the discrepancy between occupancy measures. For the reward dynamics, we propose RD to characterize diversity through the responses of policies when encountering different opponents. We also show that many current diversity measures fall into one of the categories of BD or RD but not both. With this unified diversity
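Both diversity notions can be made concrete in a tabular setting. The sketch below assumes integer-coded state-action pairs and a normalized discounted occupancy measure. It uses total variation distance as a stand-in for the discrepancy used in BD. For RD it uses a simplified pairwise proxy: the distance between two policies' payoff vectors against a fixed opponent pool in Rock-Paper-Scissors. These choices are illustrative, not the paper's exact definitions.

```python
import numpy as np

# Row player's payoff in Rock-Paper-Scissors; rows/cols = rock, paper, scissors.
RPS = np.array([[0, -1, 1],
                [1, 0, -1],
                [-1, 1, 0]])


def occupancy(trajectories, n_sa, gamma=0.9):
    """Empirical normalized discounted state-action occupancy measure;
    each trajectory is a sequence of integer state-action indices."""
    rho = np.zeros(n_sa)
    for traj in trajectories:
        for t, sa in enumerate(traj):
            rho[sa] += gamma ** t
    return rho / rho.sum()


def behavioral_distance(traj_a, traj_b, n_sa, gamma=0.9):
    """Total-variation distance between two occupancy measures."""
    diff = occupancy(traj_a, n_sa, gamma) - occupancy(traj_b, n_sa, gamma)
    return 0.5 * float(np.abs(diff).sum())


def response_vector(policy, opponents, payoff=RPS):
    """Expected payoff of a mixed policy against each opponent in the pool."""
    return np.array([policy @ payoff @ opp for opp in opponents])


def response_distance(pi_a, pi_b, opponents, payoff=RPS):
    """Simplified pairwise RD proxy: distance between response vectors."""
    return float(np.linalg.norm(response_vector(pi_a, opponents, payoff)
                                - response_vector(pi_b, opponents, payoff)))


if __name__ == "__main__":
    rock, paper, scissors = np.eye(3)
    pool = [rock, paper, scissors]
    print(response_vector(rock, pool), response_distance(rock, paper, pool))
    print(behavioral_distance([[0, 1]], [[2, 3]], n_sa=4))
```

The uniform mixed policy earns zero against every opponent in the pool. Pure strategies, by contrast, have distinct response vectors, which is the kind of difference RD is meant to capture.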
Despite the critical role of functional diversity (FD) in understanding ecological systems and processes, its robust quantification remains a significant challenge. A long-held view in the field is that it is not possible to capture its three facets -- functional richness, functional divergence, and functional evenness -- in a single index. This perspective has prompted recent proposals for FD measurement to use three separate indices, one for each facet. Here, we challenge this paradigm by demonstrating that the probability-weighted Vendi Score (pVS), first introduced by Friedman and Dieng (2023), can serve as a powerful functional diversity index that captures all three facets. We adapt pVS to functional ecology by defining it as the exponential of the Rényi entropy of the eigenvalues of the abundance-weighted trait similarity matrix. This formulation allows pVS to be applied at any biological level. It can be defined at the species level, at which most existing FD metrics are defined, and at the individual level to naturally incorporate intraspecific trait variation (ITV) when detailed data are available. We theoretically and empirically demonstrate the robustness of pVS.
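Following the definition above, pVS can be sketched by weighting a trait similarity matrix $K$ with relative abundances $p$ as $\mathrm{diag}(\sqrt{p})\,K\,\mathrm{diag}(\sqrt{p})$. When $K$ has unit diagonal, the eigenvalues of this matrix sum to one, and the score is the exponential of their Rényi entropy. The weighting form is the sketch's assumption. The Gaussian trait kernel and its bandwidth are likewise illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def trait_similarity(traits, bandwidth=1.0):
    """Gaussian-kernel similarity between trait vectors (unit diagonal, PSD)."""
    sq = np.sum((traits[:, None, :] - traits[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * bandwidth ** 2))


def probability_weighted_vendi(K, abundances, q=1.0):
    """exp of the Renyi entropy (order q) of the eigenvalues of
    diag(sqrt p) K diag(sqrt p), with p the relative abundances."""
    p = np.asarray(abundances, dtype=float)
    p = p / p.sum()
    s = np.sqrt(p)
    lam = np.linalg.eigvalsh(s[:, None] * K * s[None, :])
    lam = lam[lam > 1e-12]  # discard numerically zero eigenvalues
    if q == 1:
        return float(np.exp(-np.sum(lam * np.log(lam))))
    return float(np.sum(lam ** q) ** (1.0 / (1.0 - q)))


if __name__ == "__main__":
    # Hypothetical species-level traits (rows) and abundances.
    traits = np.array([[0.0, 1.0], [0.1, 1.1], [3.0, 0.0]])
    abundances = [10, 5, 1]
    K = trait_similarity(traits)
    print(probability_weighted_vendi(K, abundances))
```

With fully distinct traits ($K = I$) the score reduces to the Hill number of the abundances, and with identical traits it equals 1 regardless of abundances. At the individual level, each row would be one individual with equal abundance.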
Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. For both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on boolq, strategyqa, and pubmedqa show that question interpretation diversity consistently leads to better ensemble accuracy compared to model diversity. Furthermore, our analysis of GPT and LLaMa shows that model diversity typically produces results between the best and the worst ensemble members without clear improvement.
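The majority-voting consensus step is the same for both kinds of ensemble. Only the source of the answers differs: several models on one question, or one model on several framings of it. Below is a minimal sketch with a hypothetical tie rule that returns `None`. The toy answers are invented, and no model calls are shown.

```python
from collections import Counter


def majority_vote(answers):
    """Majority answer of an ensemble; returns None on a tie (a hypothetical
    choice; odd ensemble sizes avoid ties for binary questions)."""
    counts = Counter(answers).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def ensemble_accuracy(per_question_answers, gold):
    """Accuracy of majority voting over a list of questions."""
    votes = [majority_vote(a) for a in per_question_answers]
    return sum(v == g for v, g in zip(votes, gold)) / len(gold)


if __name__ == "__main__":
    gold = [True, False]
    # Invented answers: three different models per question ...
    model_diversity = [[True, False, False], [False, False, True]]
    # ... versus one model answering three framings of each question.
    interpretation_diversity = [[True, True, False], [False, True, False]]
    print(ensemble_accuracy(model_diversity, gold),
          ensemble_accuracy(interpretation_diversity, gold))
```

Because the aggregation is identical, any accuracy difference between the two ensembles can be attributed to how the answers were diversified.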
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of
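The Vendi Score on a toy set of skills shows that the similarity function determines which form of diversity is measured. The sketch assumes each skill is summarized by a hypothetical final $(x, y)$ state. It uses a Gaussian kernel restricted to one coordinate as the similarity function. VendiRL's actual skill representation and reward are not reproduced here.

```python
import numpy as np


def vendi_score(K):
    """exp of the Shannon entropy of the eigenvalues of K / n."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]  # discard numerically zero eigenvalues
    return float(np.exp(-np.sum(lam * np.log(lam))))


def rbf_on(dims, bandwidth=0.5):
    """Hypothetical similarity function: Gaussian kernel on chosen dimensions."""
    def sim(features):
        f = features[:, dims]
        sq = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2 * bandwidth ** 2))
    return sim


if __name__ == "__main__":
    # Hypothetical final (x, y) states reached by four skills.
    final_states = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
    print(vendi_score(rbf_on([0])(final_states)))  # diversity in x
    print(vendi_score(rbf_on([1])(final_states)))  # diversity in y
```

The four skills are diverse under the x-based similarity, with a score close to 4. Under the y-based similarity they collapse to a score of 1, since they never differ in y.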