We investigated the use of Empirical Mode Decomposition (EMD) combined with Gaussian Mixture Models (GMM), feature engineering and machine learning algorithms to optimize trading decisions. We used five-, two-, and one-year samples of hourly candle data for the GameStop, Tesla, and XRP (Ripple) markets, respectively. Applying a 15-hour rolling window for each market, we collected several features based on a linear model and other classical features to predict the next hour's movement. Subsequently, a GMM filtering approach was used to identify clusters among these markets. For each cluster, we applied the EMD algorithm to extract high, medium, low and trend components from each feature collected. A simple thresholding algorithm was applied to classify market movements based on the percentage change in each market's close price. We then evaluated the performance of various machine learning models, including Random Forests (RF) and XGBoost, in classifying market movements. A naive random selection of trading decisions was used as a benchmark, which assumed equal probabilities for each outcome, and a temporal cross-validation approach was used to test models on 40%, 30%, and 20% of the datas
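The thresholding step described in this abstract can be illustrated with a minimal sketch. The three class names and the 0.5% cutoff are assumptions made for illustration only; the abstract states neither the number of classes nor the threshold value, and `label_moves` is a hypothetical helper name.

```python
from typing import List


def label_moves(closes: List[float], threshold_pct: float = 0.5) -> List[str]:
    """Label each hour-to-hour move from the percentage change in close price.

    threshold_pct is a hypothetical cutoff (in percent); the source gives none.
    """
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        change_pct = (curr - prev) / prev * 100.0
        if change_pct > threshold_pct:
            labels.append("up")
        elif change_pct < -threshold_pct:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


# Made-up close prices, used only to exercise the function.
closes = [100.0, 101.0, 100.9, 99.0]
print(label_moves(closes))  # ['up', 'flat', 'down']
```

With three classes, a naive benchmark that assigns equal probability to each outcome would be expected to be right about one time in three, which is the kind of baseline the abstract compares against.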
The surge of retail investor activity on social media, exemplified by the 2021 GameStop short squeeze, raised questions about the influence of online sentiment on stock prices. This paper explores whether sentiment derived from social media discussions can meaningfully predict stock market movements. We focus on Reddit's r/wallstreetbets and analyze sentiment related to two companies: GameStop (GME) and AMC Entertainment (AMC). To assess sentiment's role, we employ two existing text-based sentiment analysis methods and introduce a third, a ChatGPT-annotated and fine-tuned RoBERTa-based model designed to better interpret the informal language and emojis prevalent in social media discussions. We use correlation and causality metrics to determine these models' predictive power. Surprisingly, our findings suggest that social media sentiment has only a weak correlation with stock prices. At the same time, simpler metrics, such as the volume of comments and Google search trends, exhibit stronger predictive signals. These results highlight the complexity of retail investor behavior and suggest that traditional sentiment analysis may not fully capture the nuances of market-moving online di
In early 2021, the stock prices of GameStop, AMC, Nokia and BlackBerry experienced dramatic increases, triggered by short-squeeze operations that have been largely attributed to Reddit's retail investors. Here, we shed light on the extent and timing of Reddit users' influence on the GameStop short squeeze. Using statistical analysis tools with high temporal resolution, we find that increasing Reddit discussions anticipated high trading volumes. This effect emerged abruptly a few weeks before the event but waned once the community gained widespread visibility through Twitter. Meanwhile, the collective investment of the community, quantified through posts of individual positions, closely mirrored the market capitalization of the stock. This evidence suggests a coordinated action of users in developing a shared financial strategy through social media--targeting GameStop first and other stocks afterward. Overall, our results provide novel insights into the role of Reddit users in the dynamics of the GameStop short squeeze.
Stock markets are impacted by a large variety of factors, including news and discussions among investors about investment opportunities. With the emergence of social media, new opportunities for having financial discussions arose. The market frenzy surrounding GameStop (GME) on the subreddit Wallstreetbets caused financial discussion forums to receive widespread attention, and it was established that Wallstreetbets played a leading role in the stock market movements of GME. Here, we present a new dataset for exploring the effect of social media discussion forums on the stock market. The dataset consists of posts published on various Reddit subreddits concerning the popular meme stocks GameStop (GME), American Multi-Cinema Entertainment Holdings (AMC), and BlackBerry (BB). We document the data collection and processing steps and show that the posts and comments about these meme stocks are related to their market movements.
Nowadays human interactions largely take place on social networks, with online users' behavior often falling into a few general typologies or "social roles". Among these, opinion leaders are of crucial importance as they have the ability to spread an idea or opinion on a large scale across the network, with possible tangible consequences in the real world. In this work we extract and characterize the different social roles of users within the Reddit WallStreetBets community, around the time of the GameStop short squeeze of January 2021 -- when a handful of committed users led the whole community to engage in a large and risky financial operation. We identify the profiles of both average users and of relevant outliers, including opinion leaders, using an iterative, semi-supervised classification algorithm, which allows us to discern the characteristics needed to play a particular social role. The key features of opinion leaders are large risky investments and constant updates on a single stock, which allowed them to attract a large following and, in the case of GameStop, ignite the interest of the community. Finally, we observe a substantial change in the behavior and attitude of us
Spearheaded by retail traders on the website Reddit, the GameStop short squeeze of early 2021 shows that social media embeds information that correlates with market movements. This paper seeks to examine this relationship by using daily frequencies of classified comments and buzzwords as additional factors in a Fama-French three-factor model. Comments are classified using an unsupervised clustering method, whereas past studies have used pretrained models that are not specific to the domains being studied.
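One way to read "additional factors in a Fama-French three-factor model" is as an ordinary least-squares regression of excess returns on the market, size and value factors plus one extra column for a comment-frequency series. The sketch below rests on that assumption. The data are synthetic and noise-free, the coefficient values are invented, and `ols` is a hypothetical helper written in plain Python so the example stays self-contained; in practice a library regression routine would be the usual choice.

```python
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic daily data: intercept, MKT-RF, SMB, HML, and a hypothetical
# comment-frequency factor. None of these numbers come from the paper.
random.seed(0)
n = 250
factors = [[1.0] + [random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
true_betas = [0.1, 1.2, 0.3, -0.2, 0.5]
excess_returns = [sum(b * x for b, x in zip(true_betas, row)) for row in factors]

estimated = ols(factors, excess_returns)
print([round(b, 3) for b in estimated])
```

Because the synthetic returns contain no noise, the regression recovers the invented loadings; with real returns the interesting quantity would be whether the loading on the comment-frequency column is distinguishable from zero.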
A trite yet fundamental question in economics is: What causes large asset price fluctuations? A tenfold rise in the price of GameStop equity, between the 22nd and 28th of January 2021, demonstrated that herding behaviour among retail investors is an important contributing factor. This paper presents a data-driven guide to the forum that started the hype -- WallStreetBets (WSB). Our initial experiments decompose the forum using a large language topic model and network tools. The topic model describes the evolution of the forum over time and shows the persistence of certain topics (such as the market / S&P 500 discussion), and the sporadic interest in others, such as COVID or crude oil. Network analysis allows us to decompose the landscape of retail investors into clusters based on their posting and discussion habits; several large, correlated asset discussion clusters emerge, surrounded by smaller, niche ones. A second set of experiments assesses the impact that WSB discussions have had on the market. We show that forum activity has a Granger-causal relationship with the returns of several assets, some of which are now commonly classified as 'meme stocks', while others have gone
The short squeeze of GameStop (GME) has revealed to the world how retail investors pooling through social media can severely impact financial markets. In this paper, we devise an early warning signal to detect suspicious user activity on social networks that might affect financial market stability. We apply our approach to the subreddit r/WallStreetBets, selecting two meme stocks (GME and AMC) and two non-meme stocks (AAPL and MSFT) as case studies. The alert system is structured in two steps: the first is based on extraordinary activity on the social network, while the second aims at identifying whether the movement seeks to coordinate the users toward a bulk action. We run an event study analysis to see how the financial markets react when the alert system catches social network turmoil. A regression analysis shows the discrepancy between the meme and non-meme stocks in how social networks might affect the trend on the financial market.
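The first step of such an alert system, flagging extraordinary activity, could be approximated by a trailing z-score on daily post counts. This is only an illustrative sketch: the abstract does not say how extraordinary activity is measured, and the function name `activity_alerts`, the 7-day window and the cutoff of 3 standard deviations are all assumptions. The second step (detecting coordination toward a bulk action) is not sketched here.

```python
from statistics import mean, stdev
from typing import List


def activity_alerts(counts: List[int], window: int = 7, z_cutoff: float = 3.0) -> List[int]:
    """Return indices where the count exceeds the trailing-window mean
    by more than z_cutoff sample standard deviations."""
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the z-score is undefined.
        if sigma > 0 and (counts[i] - mu) / sigma > z_cutoff:
            alerts.append(i)
    return alerts


# Made-up daily post counts with one obvious spike at index 8.
daily_posts = [10, 12, 11, 9, 10, 13, 11, 12, 95, 14]
print(activity_alerts(daily_posts))  # [8]
```

Using only the trailing window means a spike inflates the baseline for the following days, so a sustained surge raises fewer repeated alerts than a fixed baseline would.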
Understanding collective decision making at a large scale, and elucidating how community organization and community dynamics shape collective behavior, are at the heart of social science research. In this work we study the behavior of thousands of communities with millions of active members. We define a novel task: predicting which community will undertake an unexpected, large-scale, distributed campaign. To this end, we develop a hybrid model, combining textual cues, community meta-data, and structural properties. We show how this multi-faceted model can accurately predict large-scale collective decision-making in a distributed environment. We demonstrate the applicability of our model through Reddit's r/place, a large-scale online experiment in which millions of users, self-organized in thousands of communities, clashed and collaborated in an effort to realize their agenda. Our hybrid model achieves a high F1 prediction score of 0.826. We find that coarse meta-features are as important for prediction accuracy as fine-grained textual cues, while explicit structural features play a smaller role. Interpreting our model, we provide and support various social insights about the unique
Large pre-trained language models (LPLM) have shown spectacular success when fine-tuned on downstream supervised tasks. Yet, it is known that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper we focus on data distributions that naturally change over time and introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display average performance drops of about 88% (in the best case!) when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time. We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our models display performance drops of only about 40% in the worst cases (2% in the best ones) when predicting the popularity of future posts, while using only about 7% of the total number of parameters of LPLM and providing interpretable representations that offer insight into real-world events, like th
Who actually expresses an intent to buy GameStop shares on Reddit? What convinces people to buy stocks? Are people convinced to support a coordinated plan to adversely impact Wall Street investors? Existing literature on understanding intent has mainly relied on surveys and self-reporting; however, there are limitations to these methodologies. Hence, in this paper, we develop an annotated dataset of communications centered on the GameStop phenomenon to analyze subscribers' intentions and behaviors within the r/WallStreetBets community to buy (or not buy) stocks. Likewise, we curate a dataset to better understand how intent interacts with a user's general support towards the coordinated actions of the community for GameStop. Overall, our dataset can provide insight to social scientists on the persuasive power to buy into social movements online by adopting common language and narrative. WARNING: This paper contains offensive language that commonly appears on Reddit's r/WallStreetBets subreddit.
NASA’s Psyche spacecraft is about to pull off a dramatic close flyby of Mars, skimming just 2,800 miles above the planet to get a powerful gravitational boost on its journey to the mysterious metal-rich asteroid Psyche. The maneuver will save propellant while giving mission scientists a rare chance to test and calibrate the spacecraft’s instruments
Google has revealed its vision for the AI laptop of tomorrow
A new quantum physics study reveals that simply changing a magnetic field over time can unlock entirely new forms of matter that don’t exist under normal conditions. By carefully “driving” materials with timed magnetic shifts, researchers created exotic quantum states that could be far more stable and resistant to errors, one of the biggest challeng
A new quantum-inspired algorithm has cracked a problem so massive that conventional supercomputers struggle to even approach it. Researchers used the method to simulate extraordinarily complex quantum materials known as quasicrystals, opening the door to powerful new quantum devices and ultra-efficient electronics. The work could help scientists de
NASA’s Curiosity rover had an unexpectedly stubborn Mars souvenir after drilling into a rock nicknamed “Atacama”: the entire chunk ripped loose from the ground and stayed stuck to the rover’s drill. Engineers watched as Curiosity shook, vibrated, tilted, and spun the drill over several days in an effort to free the rock, while cameras captured the
Scientists are using sunlight to turn plastic waste into clean fuels like hydrogen, offering a breakthrough solution to both pollution and energy challenges. While still in development, the approach could transform trash into a valuable resource for a low-carbon future
A team at the University of Hong Kong has developed a new “super steel” that can survive the harsh conditions needed to make green hydrogen from seawater. The material uses an unexpected double-protection mechanism that resists corrosion far better than conventional stainless steel. Even more impressive, it could replace costly titanium parts used