Social recommendation, which seeks to leverage social ties among users to alleviate the sparsity issue of user-item interactions, has emerged as a popular technique for elevating personalized services in recommender systems. Despite being effective, existing social recommendation models are mainly devised for recommending regular items such as blogs, images, and products, and largely fail for community recommendations due to overlooking the unique characteristics of communities. Distinctly, communities are constituted by individuals, who present high dynamicity and relate to rich structural patterns in social networks. To our knowledge, limited research has been devoted to comprehensively exploiting this information for recommending communities. To bridge this gap, this paper presents CASO, a novel and effective model specially designed for social community recommendation. Under the hood, CASO harnesses three carefully-crafted encoders for user embedding, wherein two of them extract community-related global and local structures from the social network via social modularity maximization and social closeness aggregation, while the third one captures user preferences using collaborati
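The abstract names social modularity maximization as the source of the global community structure. As a hedged illustration of the quantity being maximized (not CASO's encoder itself), the sketch below computes Newman's modularity for a hard partition of a small undirected graph. The function name `modularity`, the toy graph, and the partition are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a hard partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2 m)) ** 2
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the summed degree of the nodes in c. Self-loops are not handled.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # e_c
    degree = defaultdict(int)    # d_c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_split = modularity(toy_edges, two_groups)   # 5/14, about 0.357
q_merged = modularity(toy_edges, one_group)   # 0.0
```

Splitting the toy graph along the bridge scores higher than lumping every user into one community, which is the signal a modularity-maximizing encoder would push toward.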
Although beneficial information abounds on social media, the dissemination of harmful information such as so-called ``fake news'' has become a serious issue. Therefore, many researchers have devoted considerable effort to limiting the diffusion of harmful information. A promising approach to limiting the diffusion of such information is link deletion in social networks. Link deletion methods have been shown to be effective in reducing the size of information diffusion cascades generated by synthetic models on a given social network. In this study, we evaluate the effectiveness of link deletion methods by using actual logs of retweet cascades, rather than by using synthetic diffusion models. Our results show that even after deleting 10\%--50\% of links from a social network, the size of cascades after link deletion is estimated to be only 50\% of the original size under the optimistic estimation, which suggests that the effectiveness of the link deletion strategy for suppressing information diffusion is limited. Moreover, our results also show that there is a considerable number of cascades with many seed users, which renders link deletion methods inefficient.
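To make the closing observation about seed users concrete, here is a minimal sketch of one plausible way to estimate the size of a logged cascade after link deletion. It is an assumption, not necessarily the estimator used in the study: retweets that travelled over a deleted link are dropped, and the users still reachable from the seeds are counted. The function name and the toy cascade are hypothetical.

```python
from collections import defaultdict, deque


def surviving_cascade_size(cascade_edges, seeds, deleted_links):
    """Count users still reached when retweets over deleted links are dropped.

    cascade_edges: iterable of (parent, child) pairs from a logged cascade.
    seeds: users who posted independently of any link.
    deleted_links: set of (parent, child) links removed from the network.
    """
    children = defaultdict(list)
    for parent, child in cascade_edges:
        if (parent, child) not in deleted_links:
            children[parent].append(child)

    reached = set(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first walk over the surviving retweet edges
        user = queue.popleft()
        for nxt in children[user]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return len(reached)


# Hypothetical cascade with five users.
cascade = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")]
removed = {("a", "b")}

single_seed = surviving_cascade_size(cascade, ["a"], removed)       # 3 of 5 users remain
two_seeds = surviving_cascade_size(cascade, ["a", "b"], removed)    # all 5 users remain
```

In this toy case the same deleted link removes two users when the cascade has one seed and none when the downstream user is itself a seed, which mirrors why cascades with many seed users blunt link deletion.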
This text provides an introduction to the modern approach to artificiality and simulation in the social sciences. It presents the relationship between complexity and artificiality before introducing the field of artificial societies, which has benefited greatly from the rapid increase in computing power, giving the social sciences formalization and experimentation tools previously available only to the "hard" sciences. It shows that, as "a new way of doing social sciences", artificial societies should undoubtedly contribute to a renewed approach to the study of sociality and should play a significant part in the elaboration of original theories of social phenomena.
Environmental, Social, and Governance (ESG) is a widely used metric that measures the sustainability of a company's practices. Currently, ESG is determined using self-reported corporate filings, which allows companies to portray themselves in an artificially positive light. As a result, ESG evaluation is subjective and inconsistent across raters, giving executives mixed signals on what to improve. This project aims to create a data-driven ESG evaluation system that can provide better guidance and more systematic scores by incorporating social sentiment. Social sentiment allows for more balanced perspectives that directly highlight public opinion, helping companies create more focused and impactful initiatives. To build this, Python web scrapers were developed to collect data from Wikipedia, Twitter, LinkedIn, and Google News for the S&P 500 companies. The data were then cleaned and passed through NLP algorithms to obtain sentiment scores for ESG subcategories. Using these features, machine-learning algorithms were trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities. The Random-Forest model was the strongest model with a mean absolute error of 13.4% an
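The abstract reports model quality as a mean absolute error against reference ratings. As a small worked illustration of that metric only (the scores below are made up and are not the project's data), the sketch averages the absolute gap between predicted and reference ESG scores.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predictions and reference values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one pair of scores is required")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical scores on a 0-100 scale for three companies.
model_scores = [50, 60, 70]
rating_scores = [55, 60, 60]

mae = mean_absolute_error(model_scores, rating_scores)  # (5 + 0 + 10) / 3 = 5.0
```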
The rise of social media has fundamentally transformed how people engage in public discourse and form opinions. While these platforms offer unprecedented opportunities for democratic engagement, they have been implicated in increasing social polarization and the formation of ideological echo chambers. Previous research has primarily relied on observational studies of social media data or theoretical modeling approaches, leaving a significant gap in our understanding of how individuals respond to and are influenced by polarized online environments. Here we present a novel experimental framework for investigating polarization dynamics that allows human users to interact with LLM-based artificial agents in a controlled social network simulation. Through a user study with 122 participants, we demonstrate that this approach can successfully reproduce key characteristics of polarized online discourse while enabling precise manipulation of environmental factors. Our results provide empirical validation of theoretical predictions about online polarization, showing that polarized environments significantly increase perceived emotionality and group identity salience while reducing expressed
In 2016, a network of social media accounts animated by Russian operatives attempted to divert political discourse within the American public around the presidential elections. This was a coordinated effort, part of a complex Russian-led information operation. Utilizing the anonymity and outreach of social media platforms, Russian operatives created an online astroturf in direct contact with regular Americans, promoting Russia's agenda and goals. The elusiveness of this type of adversarial approach rendered security agencies helpless, underscoring the unique challenges this type of intervention presents. Building on existing scholarship on the functions within influence networks on social media, we suggest a new approach to mapping these types of operations. We argue that pretending to be legitimate social actors obliges the network to adhere to social expectations, leaving a social footprint. To test the robustness of this social footprint, we train artificial intelligence to identify it and create a predictive model. We use Twitter data identified as part of the Russian influence network to train the artificial intelligence and to test its predictions. Our model attains 88% pred
Word embeddings are an essential instrument in many NLP tasks. Most available resources are trained on general language from Web corpora or Wikipedia dumps. However, word embeddings for domain-specific language are rare, in particular for the social science domain. Therefore, in this work, we describe the creation and evaluation of word embedding models based on 37,604 open-access social science research papers. In the evaluation, we compare domain-specific and general language models for (i) language coverage, (ii) diversity, and (iii) semantic relationships. We found that the created domain-specific model, even with a relatively small vocabulary size, covers a large part of social science concepts, and that the neighborhoods of these concepts are diverse in comparison to more general models. Across all relation types, we found a more extensive coverage of semantic relationships.
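The language-coverage comparison can be pictured as the share of domain concepts that appear in a model's vocabulary. The sketch below is an assumption about how such a share might be computed, not the paper's evaluation code; the function name, the case-folding choice, and the toy vocabulary are hypothetical.

```python
def vocabulary_coverage(vocabulary, concepts):
    """Fraction of concept terms present in an embedding vocabulary (case-insensitive)."""
    vocab = {term.lower() for term in vocabulary}
    terms = [term.lower() for term in concepts]
    if not terms:
        raise ValueError("concepts must not be empty")
    covered = sum(1 for term in terms if term in vocab)
    return covered / len(terms)


# Hypothetical vocabulary and concept list.
toy_vocabulary = {"survey", "panel", "cohort"}
toy_concepts = ["Survey", "cohort", "anomie", "panel"]

coverage = vocabulary_coverage(toy_vocabulary, toy_concepts)  # 3 of 4 concepts -> 0.75
```

Running the same function against a general-language vocabulary and a domain-specific one would give the two numbers such a comparison sets side by side.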
This study investigates the interconnectivity of firms and Environmental Justice Organizations (EJOs) involved in socio-environmental conflicts worldwide, using data from the Environmental Justice Atlas (EJAtlas). By constructing a multilayer network that links firms, conflicts, and EJOs, the research applies social network analysis to evaluate the simultaneous involvement of these actors across multiple disputes. The projected networks of both firms and EJOs have been analysed by aggregating nodes by category and country to reveal structural differences. Findings reveal a stark contrast between the interconnectedness of firms and EJOs. Multinational corporations form a cohesive global network, enabling them to coordinate strategies and exert influence across regions. Conversely, EJOs are fragmented, often operating in isolated clusters with limited interconnection but forming a robust, decentralized and self-organized global network. The firm network depends strongly on conflict category, whereas the EJO network does not. This structural difference suggests a risk of systemic and structural coordination for firms towards exploitative expans
Social media plays a central role in shaping public opinion and behavior, yet performing experiments on these platforms and, in particular, on feed algorithms is becoming increasingly challenging. This guide offers practical recommendations for researchers developing and deploying field experiments focused on real-time reranking of social media feeds. The article is organized around two contributions. First, we provide an overview of an experimental method using web browser extensions that intercepts and reranks content in real time, enabling naturalistic reranking field experiments. We then describe feed interventions and measurements that this paradigm enables on participants' actual feeds, without requiring the involvement of social media platforms. Second, we offer concrete technical recommendations for intercepting and reranking social media feeds with minimal user-facing delay, and provide an open-source implementation. This document aims to summarize lessons learned in running field experiments on social media, provide concrete implementation details, and foster the ecosystem of independent social media research. Finally, we release the source code that serves as a blueprint
Do different fields of knowledge require different research strategies? A numerical model exploring different virtual knowledge landscapes revealed two diverging optimal search strategies. Trend following is maximized when the popularity of new discoveries determines the number of individuals researching them. This strategy works best when many researchers explore a few large areas of knowledge. In contrast, individuals or small groups of researchers are better at discovering small bits of information in dispersed knowledge landscapes. Bibliometric data of scientific publications showed a continuous bipolar distribution of these strategies, ranging from the natural sciences, with highly cited publications in journals containing a large number of articles, to the social sciences, with rarely cited publications in many journals containing a small number of articles. The natural sciences seem to adapt their research strategies to landscapes with large concentrated knowledge clusters, whereas the social sciences seem to have adapted to search in landscapes with many small isolated knowledge clusters. Similar bipolar distributions were obtained when comparing levels of insularity estimated by indic
The community plays a crucial role in understanding user behavior and network characteristics in social networks. Some users can use multiple social networks at once for a variety of objectives. These users are called overlapping users who bridge different social networks. Detecting communities across multiple social networks is vital for interaction mining, information diffusion, and behavior migration analysis among networks. This paper presents a community detection method based on nonnegative matrix tri-factorization for multiple heterogeneous social networks, which formulates a common consensus matrix to represent the global fused community. Specifically, the proposed method involves creating adjacency matrices based on network structure and content similarity, followed by alignment matrices which distinguish overlapping users in different social networks. With the generated alignment matrices, the method could enhance the fusion degree of the global community by detecting overlapping user communities across networks. The effectiveness of the proposed method is evaluated with new metrics on Twitter, Instagram, and Tumblr datasets. The results of the experiments demonstrate its
Conventional economic and socio-behavioural models assume perfect symmetric access to information and rational behaviour among interacting agents in a social system. However, real-world events and observations appear to contradict such assumptions, leading to the possibility of other, more complex interaction rules existing between such agents. We investigate this possibility by creating two different models for a doctor-patient system. One retains the established assumptions, while the other incorporates principles of reflexivity theory and cognitive social structures. In addition, we utilize a microbial genetic algorithm to optimize the behaviour of the physician and patient agents in both models. The differences in results for the two models suggest that social systems may not always exhibit the behaviour or even accomplish the purpose for which they were designed and that modelling the social and cognitive influences in a social system may capture various ways a social agent balances complementary and competing information signals in making choices.
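For readers unfamiliar with the optimizer named above, the following is a minimal sketch of a microbial genetic algorithm in the steady-state tournament style usually attributed to Harvey: two individuals are picked at random, the loser is "infected" with some of the winner's genes, and the loser is then mutated. The bit-string genome, the count-the-ones fitness, and every parameter value are stand-in assumptions; they do not represent the doctor-patient models.

```python
import random


def microbial_ga(fitness, genome_len, pop_size=20, tournaments=2000,
                 rec_rate=0.5, mut_rate=0.05, seed=0):
    """Steady-state microbial GA over bit-string genomes; returns the fittest genome."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(tournaments):
        a, b = rng.sample(range(pop_size), 2)  # two distinct contestants
        if fitness(pop[a]) >= fitness(pop[b]):
            winner, loser = a, b
        else:
            winner, loser = b, a
        for i in range(genome_len):
            if rng.random() < rec_rate:
                pop[loser][i] = pop[winner][i]  # infection: copy a gene from the winner
            if rng.random() < mut_rate:
                pop[loser][i] ^= 1              # mutation: flip the loser's bit
    return max(pop, key=fitness)


# Stand-in objective: maximize the number of ones in a 16-bit genome.
best_genome = microbial_ga(sum, genome_len=16)
```

Because only the loser of each tournament is overwritten, the best fitness in the population never decreases, which is one reason this variant is popular for small agent-based models.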
In this paper, we address the challenge of discovering hidden nodes in unknown social networks, formulating three types of hidden-node discovery problems, namely, Sybil-node discovery, peripheral-node discovery, and influencer discovery. We tackle these problems by employing a graph exploration framework grounded in machine learning. Leveraging the structure of the subgraph gradually obtained from graph exploration, we construct prediction models to identify target hidden nodes in unknown social graphs. Through empirical investigations of real social graphs, we investigate the efficiency of graph exploration strategies in uncovering hidden nodes. Our results show that our graph exploration strategies discover hidden nodes with an efficiency comparable to that when the graph structure is known. Specifically, the query cost of discovering 10% of the hidden nodes is at most only 1.2 times that when the topology is known, and the query-cost multiplier for discovering 90% of the hidden nodes is at most only 1.4. Furthermore, our results suggest that using node embeddings, which are low-dimensional vector representations of nodes, for hidden-node discovery is a double-edged sword: it is
Nowadays, protecting trust in social sciences also means engaging in open community dialogue, which helps to safeguard robustness and improve efficiency of research methods. The combination of open data, open review and open dialogue may sound simple but implementation in the real world will not be straightforward. However, in view of Begley and Ellis's (2012) statement that, "the scientific process demands the highest standards of quality, ethics and rigour," they are worth implementing. More importantly, they are feasible to work on and likely will help to restore plausibility to social sciences research. Therefore, I feel it likely that the triplet of open data, open review and open dialogue will gradually emerge to become policy requirements regardless of the research funding source.
Social Network Analysis is a way of studying agents embedded in contexts. In about 1998, physicists discovered social networks as representations of complex systems. Small-world and scale-free networks are the paradigmatic models of this Network Science. Relying on various models and mechanisms of socio-cultural processes, an identity model is developed and calibrated in a case study of Social Network Science. This research domain results from the union of Social Network Analysis and Network Science. A unique dataset of 25,760 scholarly articles from one century of research (1916-2012) is created. Clustering this set of publications, five subdomains are detected and analyzed in terms of authorship, citation, and word usage structures and dynamics. The scaling hypothesis of percolation theory is formulated for socio-cultural systems, namely that power-law size distributions like Lotka's, Bradford's, and Zipf's Law mean that the described identity resides at the phase transition between the stability and change of meaning. In this case, it can be diagnosed using bivariate scaling laws and Abbott's heuristic of fractal distinctions. Identities are not dichotomies but dualities of soci
In recent months, the social impact of Artificial Intelligence (AI) has gained considerable public interest, driven by the emergence of Generative AI models, ChatGPT in particular. The rapid development of these models has sparked heated discussions regarding their benefits, limitations, and associated risks. Generative models hold immense promise across multiple domains, such as healthcare, finance, and education, to cite a few, presenting diverse practical applications. Nevertheless, concerns about potential adverse effects have elicited divergent perspectives, ranging from privacy risks to escalating social inequality. This paper adopts a methodology to delve into the societal implications of Generative AI tools, focusing primarily on the case of ChatGPT. It evaluates the potential impact on several social sectors and illustrates the findings of a comprehensive literature review of both positive and negative effects, emerging trends, and areas of opportunity of Generative AI models. This analysis aims to facilitate an in-depth discussion by providing insights that can inspire policy, regulation, and responsible development practices to foster a human-centered AI.
Social divide and polarization have become significant societal issues. To understand the mechanisms behind these phenomena, social media analysis offers research opportunities in computational social science, where developing effective user embedding methods is essential for subsequent analysis. Traditionally, researchers have used predefined network-based user features (e.g., network size, degree, and centrality measures). However, because such measures may not capture the complex characteristics of social media users, in our study we developed a method for embedding users based on a URL domain co-occurrence network. This approach effectively represents social media users involved in competing events such as political campaigns and public health crises. We assessed the method's performance using binary classification tasks and datasets that covered topics associated with the COVID-19 infodemic, such as QAnon, Biden, and Ivermectin, among Twitter users. Our results revealed that user embeddings generated directly from the retweet network and/or based on language performed below expectations, whereas our domain-based embeddings outperformed those methods while reducing computation
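The input to the embedding step described above is a URL domain co-occurrence network. As a hedged sketch of how such a network might be assembled (the paper's exact construction and the later embedding step are not specified here), the code below links two domains whenever the same user shared both and weights the link by the number of such users. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def domain_cooccurrence(user_domains):
    """Weight each unordered domain pair by the number of users who shared both domains."""
    weights = Counter()
    for domains in user_domains.values():
        # Sorting gives every pair a canonical (alphabetical) orientation.
        for pair in combinations(sorted(set(domains)), 2):
            weights[pair] += 1
    return weights


# Hypothetical sharing histories for three users.
shares = {
    "user1": ["a.com", "b.org"],
    "user2": ["a.com", "b.org", "c.net", "a.com"],
    "user3": ["c.net"],
}

network = domain_cooccurrence(shares)
# ("a.com", "b.org") has weight 2; the other two pairs have weight 1.
```

A user could then be represented from the domains they share and those domains' positions in this weighted graph, though how the authors do that is left open here.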
There is widespread emphasis on reform in the teaching of introductory statistics at the college level. Underpinning this reform is a consensus among educators and practitioners that traditional curricular materials and pedagogical strategies have not been effective in promoting statistical literacy, a competency that is becoming increasingly necessary for effective decision-making and evidence-based practice. This paper explains the historical context of, and rationale for, reform-oriented teaching of introductory statistics (at the college level) in the health, social and behavioral sciences (evidence-based disciplines). A firm understanding and appreciation of the basis for change in pedagogical approach is important in order to facilitate commitment to reform, consensus building on appropriate strategies, and adoption and maintenance of best practices. In essence, reform-oriented pedagogy, in this context, is a function of the interaction among content, pedagogy, technology, and assessment. The challenge is to create an appropriate balance among these domains.
Bots have been in the spotlight in many social media studies because they have been observed participating in the manipulation of information and opinions on social media. These studies analyzed the activity and influence of bots in a variety of contexts: elections, protests, health communication and so forth. A prerequisite for these analyses is the identification of bot accounts, which separates them from other social media users. In this work, we propose an ensemble method for bot detection, designing a multi-platform bot detection architecture to handle several problems along the bot detection pipeline: incomplete data input, minimal feature engineering, optimized classifiers for each data field, and elimination of the need for a threshold value for classification determination. With these design decisions, we generalize our bot detection framework across Twitter, Reddit and Instagram. We also perform feature importance analysis, observing that the entropy of names and number of interactions (retweets/shares) are important factors in bot determination. Finally, we apply our multi-platform bot detector to the US 2020 presidential elections to identify and analyze bot activity across multiple
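The feature-importance finding mentions the entropy of names. The abstract does not define that feature, so the sketch below assumes the common character-level Shannon entropy of a username, measured in bits; the function name and the sample names are hypothetical.

```python
import math
from collections import Counter


def name_entropy(name):
    """Character-level Shannon entropy of a string, in bits."""
    if not name:
        return 0.0
    total = len(name)
    counts = Counter(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical account names.
repetitive = name_entropy("aaaa")   # one repeated character -> 0 bits
varied = name_entropy("abcd")       # four equally likely characters -> 2 bits
```

Names with long random-looking character runs score higher under this measure than short repetitive ones, which is one plausible reason such a feature could help separate automated from human accounts.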
Community detection on social media has attracted considerable attention for many years. However, existing methods do not reveal the relations between communities. Communities can form alliances or engage in antagonisms due to various factors, e.g., shared or conflicting goals and values. Uncovering such relations can provide better insights to understand communities and the structure of social media. According to social science findings, the attitudes that members from different communities express towards each other are largely shaped by their community membership. Hence, we hypothesize that inter-community attitudes expressed among users in social media have the potential to reflect their inter-community relations. Therefore, we first validate this hypothesis in the context of social media. Then, inspired by the hypothesis, we develop a framework to detect communities and their relations by jointly modeling users' attitudes and social interactions. We present experimental results using three real-world social media datasets to demonstrate the efficacy of our framework.