It is now commonplace for organizations to pay developers to work on specific open source software (OSS) projects to pursue their business goals. Such paid developers work alongside voluntary contributors, but given the different motivations of these two groups of developers, conflicts may arise that threaten a project's sustainability. This paper presents an empirical study of paid developers and volunteers in Rust, a popular open source programming language project. Rust is a particularly interesting case given considerable concerns about corporate participation. We compare volunteers and paid developers through contribution characteristics and long-term participation, and solicit volunteers' perceptions of paid developers. We find that core paid developers tend to contribute more frequently; commits contributed by one-time paid developers are larger; peripheral paid developers implement more features; and being paid plays a positive role in becoming a long-term contributor. We also find that volunteers do hold some prejudices against paid developers. This study suggests that the dichotomous view of paid vs. volunteer developers is too simplistic and that furt
Prior spending shutoff experiments in search advertising have found that paid ads cannibalize organic traffic. But it is unclear whether the same holds for other high-volume advertising channels such as mobile display advertising. We therefore analyzed a large-scale spending shutoff experiment by a US-based mobile game developer, GameSpace. Contrary to previous findings, we found that paid advertising boosts organic installs rather than cannibalizing them. Specifically, every $100 spent on ads is associated with 37 paid and 3 organic installs. The complementarity between paid ads and organic installs is corroborated by evidence of temporal and cross-platform spillover effects: ad spending today is associated with additional paid and organic installs tomorrow, and impressions on one platform lead to clicks on other platforms. Our findings demonstrate that, due to spillover effects, mobile app install advertising is about 7.5% more effective than paid install metrics alone indicate, suggesting that mobile app developers are under-investing in marketing.
This paper develops a model to evaluate the viability of blockchain markets as the sole venue for price formation. Blockchains clear at discrete intervals called block time, and transactions are executed sequentially according to priority fees paid by traders who compete for queue position. We show that these features undermine the viability of markets. Paid-priority ordering induces endogenous selection, where only traders with sufficiently high valuations participate. The participation cutoff rises with competition, which intensifies with lower information costs or higher liquidity demand. This hinders price discovery and biases prices. It also impairs liquidity: the cutoff concentrates trading among aggressive traders and increases adverse selection that liquidity suppliers absorb in a single clearing round. Although longer block times enhance consensus security, they amplify these effects and can cause markets to shut down.
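The paid-priority ordering described above can be illustrated with a toy clearing rule. This is a minimal sketch under stated assumptions, not the paper's model. A single block collects pending orders and executes them in descending order of priority fee against a fixed stock of liquidity. The `Order` type, arrival-order tie-breaking, and the partial-fill rule are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Order:
    trader: str
    fee: float      # priority fee paid for queue position (hypothetical units)
    quantity: int   # units demanded

def clear_block(pending, liquidity):
    """Execute pending orders sequentially by descending priority fee.

    Assumption: ties are broken by arrival order (index in `pending`), and each
    order is filled as far as the remaining liquidity allows.
    """
    queue = sorted(enumerate(pending), key=lambda item: (-item[1].fee, item[0]))
    fills = {}
    for _, order in queue:
        filled = min(order.quantity, liquidity)
        fills[order.trader] = filled
        liquidity -= filled
    return fills

orders = [Order("a", 0.5, 3), Order("b", 2.0, 4), Order("c", 1.0, 2)]
print(clear_block(orders, liquidity=5))
```

With scarce liquidity, the highest-fee trader is served first and the lowest-fee trader gets nothing. That is a stylized picture of why only high-valuation traders find it worthwhile to participate.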
Climate discourse online plays a crucial role in shaping public understanding of climate change and influencing political and policy outcomes. However, climate communication unfolds across structurally distinct platforms with fundamentally different incentive structures: paid advertising ecosystems incentivize targeted, strategic persuasion, while public social media platforms host largely organic, user-driven discourse. Existing computational studies typically analyze these environments in isolation, limiting our ability to distinguish institutional messaging from public expression. In this work, we present a comparative analysis of climate discourse across paid advertisements on Meta (previously known as Facebook) and public posts on Bluesky from July 2024 to September 2025. We introduce an interpretable, end-to-end thematic discovery and assignment framework that clusters texts by semantic similarity and leverages large language models (LLMs) to generate concise, human-interpretable theme labels. We evaluate the quality of the induced themes against traditional topic modeling baselines using both human judgments and an LLM-based evaluator, and further validate their semantic coh
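The clustering step ("clusters texts by semantic similarity") can be sketched as follows. The abstract does not say which embedding model or clustering algorithm is used, so this sketch makes two assumptions. First, texts have already been embedded as vectors; the vectors below are hypothetical. Second, a simple greedy rule groups them, assigning each vector to the first cluster whose seed has cosine similarity at or above a threshold. The LLM labeling step is out of scope here.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def threshold_cluster(embeddings, threshold=0.8):
    """Greedy single-pass clustering (a stand-in, not the paper's algorithm).

    Each vector joins the first existing cluster whose seed vector is at least
    `threshold` cosine-similar; otherwise it becomes the seed of a new cluster.
    """
    seeds, labels = [], []
    for vec in embeddings:
        for cluster_id, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

# Hypothetical 3-dimensional "text embeddings".
embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1],
              [0.1, 0.9, 0.0], [1.0, 0.0, 0.05]]
print(threshold_cluster(embeddings))
```

In a full pipeline, each resulting cluster's member texts would then be passed to an LLM to produce a short theme label.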
A reader browsing an online article is highly likely to encounter an advertorial, often without realizing it. Advertorials are a relatively new marketing strategy in which advertisements are deliberately designed to resemble the style and tone of editorial content. Despite their appearance, they are paid content intended to promote a product, brand, or service. Studies indicate that advertorials are significantly more effective (81%) and less intrusive than traditional banner ads or pop-ups. Despite ongoing regulatory efforts to ensure clear disclosure of paid content, concerns persist about the deceptive nature of advertorials. Advertorials can mislead readers into believing that they are consuming unbiased editorial content. In doing so, they gain undeserved legitimacy by borrowing the credibility of the publication's design, not to inform or to inspire genuine interest, but to deceive. In this work, we conduct the first large-scale, systematic study of advertorials. We propose a novel automated methodology for detecting advertorials in the wild, and we collect 185K ad URLs over a period of 5 months. We investigate the prevalence of problematic
Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model online to changing environments during inference. Most existing methods focus on exploiting target data while overlooking another crucial source of information: the pre-trained weights, which encode underutilized domain-invariant priors. This paper takes the geometric attributes of pre-trained weights as a starting point and systematically analyzes three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting that it should be preserved during adaptation. Based on this insight, we propose PAID (Pairwise Angular-Invariant Decomposition), a prior-driven CTTA method that decomposes each weight into magnitude and direction and introduces a learnable orthogonal matrix, built from Householder reflections, to globally rotate the directions while preserving the pairwise angular structure. During adaptation, only the magnitudes and the orthogonal matrices are updated. PAID achieves consistent improvements over recent SOTA methods on four widely used CTTA benchmarks, demonst
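The key geometric fact behind the decomposition above is that rotating every weight direction by the same orthogonal matrix leaves all pairwise angles unchanged. The sketch below shows this with plain Python lists. It composes Householder reflections H(v) = I - 2vv^T/(v^Tv) into an orthogonal Q, then rebuilds each row as (rescaled magnitude) × (Q · direction).

This is an illustration under stated assumptions, not the PAID implementation. The weight matrix, reflection vectors, and magnitude scales are hypothetical, and all of them are fixed here rather than learned. Treating each weight row as one direction is also an assumption.

```python
import math

def householder(v):
    """Reflection matrix H = I - 2 v v^T / (v^T v), as nested lists."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]

def orthogonal_from_reflections(vectors):
    """Compose Householder reflections H(v1) H(v2) ... into an orthogonal Q."""
    q = householder(vectors[0])
    for v in vectors[1:]:
        q = matmul(q, householder(v))
    return q

def rotate_weights(weights, q, scales):
    """Split each row into magnitude * direction, rotate the direction by Q,
    rescale the magnitude (positive scales), and recombine."""
    out = []
    for row, s in zip(weights, scales):
        magnitude = math.sqrt(sum(x * x for x in row))
        direction = [x / magnitude for x in row]
        out.append([s * magnitude * x for x in matvec(q, direction)])
    return out

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 3x3 weight matrix; each row is one output unit's weight vector.
W = [[1.0, 2.0, 0.5], [-0.3, 1.0, 2.0], [0.7, -1.2, 0.4]]
# Hypothetical reflection vectors (learnable in the real method; fixed here).
Q = orthogonal_from_reflections([[1.0, 0.2, -0.5], [0.3, -1.0, 0.8]])
W_new = rotate_weights(W, Q, scales=[1.1, 0.9, 1.0])

for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, round(cosine(W[i], W[j]), 6), round(cosine(W_new[i], W_new[j]), 6))
```

The two cosine columns agree because (Qd_i)·(Qd_j) = d_i^T Q^T Q d_j = d_i·d_j for any orthogonal Q. Positive magnitude rescaling does not change angles either. Only the reflection vectors and magnitudes would need to be trained.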
The Android market is a place where developers offer paid and/or free apps to users. Free apps are interesting to users because they can try them immediately without incurring a monetary cost. However, free apps often have limited features and/or contain ads when compared to their paid counterparts. Thus, users may eventually need to pay to get additional features and/or remove ads. While paid apps have clear market values, their ads-supported versions are not entirely free because ads have an impact on performance. In this paper, we first perform an exploratory study of ads-supported and paid apps to understand their differences in terms of implementation and development process. We analyze 40 Android apps and observe that (i) ads-supported apps are preferred by users although paid apps have a better rating, (ii) developers do not usually offer a paid app without a corresponding free version, (iii) ads-supported apps usually have more releases and are released more often than their corresponding paid versions, (iv) there is no clear strategy in the way developers set the prices of paid apps, (v) paid apps do not usually include more functionalities than their corresponding
This study presents estimates of the global expenditure on article processing charges (APCs) paid to six publishers for open access between 2019 and 2023. APCs are fees charged for publishing in some fully open access journals (gold) and in subscription journals to make individual articles open access (hybrid). There is currently no way to systematically track institutional, national or global expenses for open access publishing due to a lack of transparency about APC prices, which articles they are paid for, and who pays them. We therefore curated and used an open dataset of annual APC list prices from Elsevier, Frontiers, MDPI, PLOS, Springer Nature, and Wiley, in combination with the number of open access articles from these publishers indexed by OpenAlex, to estimate that globally a total of \$8.349 billion (\$8.968 billion in 2023 US dollars) was spent on APCs between 2019 and 2023. We estimate that in 2023 MDPI (\$681.6 million), Elsevier (\$582.8 million) and Springer Nature (\$546.6 million) generated the most revenue from APCs. After adjusting for inflation, we also show that annual spending almost tripled from \$910.3 million in 2019 to \$2.538 billion in 2023, that hybrid exceed gol
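The estimation approach above (list prices combined with article counts, then inflation adjustment) can be sketched as a keyed multiply-and-sum. This is a minimal sketch under stated assumptions. Prices and counts are keyed by (journal, year), articles without a recorded list price are skipped, and base-year dollars use the usual price-index ratio. Every journal name, price, count, and index value below is hypothetical and none comes from the study's dataset.

```python
from collections import defaultdict

def estimate_apc_spending(list_prices, article_counts, cpi, base_year):
    """Estimate APC spending per year, in nominal and in base-year dollars.

    list_prices:    {(journal, year): APC list price in nominal USD}
    article_counts: {(journal, year): number of open access articles}
    cpi:            {year: price index} used to convert to base-year dollars
    """
    nominal = defaultdict(float)
    for key, count in article_counts.items():
        price = list_prices.get(key)
        if price is None:
            continue  # no list price recorded: these articles cannot be costed
        nominal[key[1]] += price * count
    real = {year: total * cpi[base_year] / cpi[year] for year, total in nominal.items()}
    return dict(nominal), real

# Hypothetical inputs, for illustration only.
prices = {("J1", 2019): 2000.0, ("J1", 2023): 2500.0, ("J2", 2023): 1500.0}
counts = {("J1", 2019): 100, ("J1", 2023): 120, ("J2", 2023): 200, ("J3", 2023): 50}
cpi = {2019: 100.0, 2023: 120.0}  # hypothetical price-index values

nominal, real = estimate_apc_spending(prices, counts, cpi, base_year=2023)
print(nominal)
print(real)
```

Because list prices are used, a sketch like this yields an estimate of charges at list price. Discounts, waivers, and transformative agreements are not captured.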
We study an online linear regression setting in which the observed feature vectors are corrupted by noise and the learner can pay to reduce the noise level. In practice, this may happen for several reasons: for example, because features can be measured more accurately using more expensive equipment, or because data providers can be incentivized to release less private features. Assuming feature vectors are drawn i.i.d. from a fixed but unknown distribution, we measure the learner's regret against the linear predictor minimizing a notion of loss that combines the prediction error and payment. When the mapping between payments and noise covariance is known, we prove that a regret rate of $\sqrt{T}$ is optimal up to logarithmic factors. When the noise covariance is unknown, we show that the optimal regret rate becomes of order $T^{2/3}$ (again ignoring log factors). Our analysis leverages matrix martingale concentration, showing that the empirical loss uniformly converges to the expected one for all payments and linear predictors.
We initiate a systematic study to help distinguish a special group of online users, called hidden paid posters (termed the "Internet water army" in China), from legitimate users. On the Internet, paid posting represents a new type of online job opportunity. Paid posters get paid for posting comments and new threads or articles on different online communities and websites for hidden purposes, e.g., to influence other people's opinions about certain social events or business markets. Though an interesting strategy in business marketing, paid posters may have a significant negative effect on online communities, since the information they post is usually not trustworthy. When two competing companies hire paid posters to post fake news or negative comments about each other, normal online users may feel overwhelmed and find it difficult to trust any information they acquire from the Internet. In this paper, we thoroughly investigate the behavioral patterns of online paid posters based on real-world trace data. We design and validate a new detection mechanism, using both non-semantic analysis and semantic analysis, to identify potential online paid posters
This note examines the distributional implications of introducing a fast-track queue for accessing a service when agents are heterogeneous in both income and service valuation. Relative to a single free queue, I show that willingness to adopt the priority system is determined solely by income, regardless of service valuation. High-income individuals benefit from the fast-track access, while low-income individuals are worse off and remain in the free line. Middle-income individuals weakly prefer the single free queue; yet, under the priority regime, they pay for fast-track access. Thus, the use of the priority queue does not reveal preferences for the priority system.
We study the problem of linear contextual bandits with paid observations, where at each round the learner selects an action to minimize its loss in a given context, and can then decide to pay a fixed cost to observe the loss of any arm. Building on the Follow-the-Regularized-Leader framework with efficient estimators via Matrix Geometric Resampling, we introduce a computationally efficient Best-of-Both-Worlds (BOBW) algorithm for this problem. We show that it achieves the minimax-optimal regret of $\Theta(T^{2/3})$ in adversarial settings, while guaranteeing poly-logarithmic regret in (corrupted) stochastic regimes. Our approach builds on the framework from \cite{BOBWhardproblems} for designing BOBW algorithms for ``hard problems'', using analysis techniques tailored to the setting we consider.
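Matrix Geometric Resampling (MGR), mentioned above, estimates an inverse covariance without inverting a matrix. It sums truncated products of (I - βxx^T) over freshly drawn contexts. In expectation, this reproduces β Σ_k (I - βΣ)^k, a truncated Neumann series for Σ^{-1}.

The sketch below shows only that estimator, in the 2×2 case. It is not the paper's FTRL/BOBW algorithm. The context distribution, β, and step count are hypothetical. It assumes β‖x‖² ≤ 1 so that the series converges.

```python
import random

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mgr_inverse(sample_context, beta, steps, rng):
    """One Matrix Geometric Resampling estimate of the inverse of E[x x^T] (2x2 case).

    Returns beta*I + beta * sum_{k=1..steps} A_k, where A_k = A_{k-1} (I - beta x_k x_k^T)
    and every x_k is a fresh draw. Assumes beta * |x|^2 <= 1 for every context.
    """
    identity = [[1.0, 0.0], [0.0, 1.0]]
    product = identity
    total = [[beta, 0.0], [0.0, beta]]
    for _ in range(steps):
        x = sample_context(rng)
        factor = [[identity[i][j] - beta * x[i] * x[j] for j in range(2)] for i in range(2)]
        product = mat_mul(product, factor)
        total = [[total[i][j] + beta * product[i][j] for j in range(2)] for i in range(2)]
    return total

def sample_axis(rng):
    """Hypothetical context distribution: e1 or e2 with equal probability, so E[x x^T] = I/2."""
    return rng.choice([(1.0, 0.0), (0.0, 1.0)])

rng = random.Random(0)
runs = [mgr_inverse(sample_axis, beta=0.5, steps=60, rng=rng) for _ in range(1000)]
average = [[sum(r[i][j] for r in runs) / len(runs) for j in range(2)] for i in range(2)]
print(average)  # should be close to the true inverse [[2, 0], [0, 2]]
```

A bandit algorithm would use a single run per round as its (slightly truncated) estimate. Averaging many runs here only makes the expectation visible.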
In the context of paid research studies and clinical trials, budget considerations often require sampling patients from available populations, which imposes inherent constraints. We introduce the R package CDsampling, which, to our knowledge, is the first to integrate optimal design theory within the framework of constrained sampling. The package can find both D-optimal approximate and exact allocations for sampling with or without constraints. Additionally, it provides functions to find constrained uniform sampling as a robust sampling strategy when model information is limited. To demonstrate its efficacy, we provide simulated examples and a real-data example using datasets embedded in the package, and compare them with classical sampling methods. Furthermore, the package revisits theoretical results on the Fisher information matrix for generalized linear models (including the ordinary linear regression model) and multinomial logistic models, offering functions for its computation.
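CDsampling's own functions are not reproduced here. The sketch instead illustrates the D-optimality idea with the classical multiplicative algorithm for an unconstrained D-optimal approximate design. The model is simple linear regression y = β0 + β1·t with unit error variance, and the candidate covariate grid is hypothetical. It uses the Fisher information M(w) = Σ w_i x_i x_i^T with x_i = (1, t_i). It updates w_i ← w_i · d_i / p, where d_i = x_i^T M^{-1} x_i and p = 2 parameters. The weights stay normalized because Σ w_i d_i = tr(I) = p.

```python
def fisher_info(points, weights):
    """Entries (a, b, c) of the 2x2 Fisher information [[a, b], [b, c]] for
    y = b0 + b1*t with unit error variance under design weights."""
    a = sum(weights)
    b = sum(w * t for t, w in zip(points, weights))
    c = sum(w * t * t for t, w in zip(points, weights))
    return a, b, c

def d_optimal_weights(points, iterations=500):
    """Multiplicative algorithm for a D-optimal approximate design (p = 2, no constraints)."""
    p = 2
    weights = [1.0 / len(points)] * len(points)
    for _ in range(iterations):
        a, b, c = fisher_info(points, weights)
        det = a * c - b * b
        # Variance function d(t) = x^T M^{-1} x with x = (1, t) and M^{-1} = [[c, -b], [-b, a]] / det.
        variances = [(c - 2 * b * t + a * t * t) / det for t in points]
        weights = [w * d / p for w, d in zip(weights, variances)]
    return weights

points = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical candidate covariate values
w = d_optimal_weights(points)
print([round(x, 4) for x in w])
print([round(10 * x) for x in w])  # naive exact allocation for n = 10 by rounding
```

For this model the D-optimal approximate design is known to place half the weight on each endpoint, and the iteration recovers that. Rounding n·w is only a naive route to an exact allocation. Proper exact and constrained allocations need dedicated methods.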
We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. The learner incurs a cost equal to a weighted sum of the prediction error and the upfront payments to all experts. We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor that knows the productivity of all experts in advance by at most $\mathcal{O}(K^2(\log T)\sqrt{T})$, where $K$ is the number of experts. To achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order $T^{2/3}$ one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data
Accurate tree detection is of growing importance in applications such as urban planning, forest inventory, and environmental monitoring. In this article, we present an approach to creating tree maps by annotating trees in 3D point clouds. Point cloud representations allow precise identification of tree positions, particularly stem locations, and of tree heights. Our method leverages human computational power through paid crowdsourcing, employing a web tool designed to enable even non-experts to tackle the task effectively. The primary focus of this paper is the web tool's development and strategies for ensuring high-quality tree annotations despite noise in the crowdsourced data. Following our methodology, we achieve quality measures surpassing 90% on various challenging test sets of diverse complexity. We emphasize that our tree map creation process, including initial point cloud collection, can be completed within 1-2 days.
Collaborative machine learning (CML) provides a promising paradigm for democratizing advanced technologies by enabling cost-sharing among participants. However, the potential for rent-seeking behaviors among parties can undermine such collaborations. Contract theory presents a viable solution by rewarding participants with models of varying accuracy based on their contributions. However, unlike monetary compensation, using models as rewards introduces unique challenges, particularly due to the stochastic nature of these rewards when contribution costs are privately held information. This paper formalizes the optimal contracting problem within CML and proposes a transformation that simplifies the non-convex optimization problem into one that can be solved through convex optimization algorithms. We conduct a detailed analysis of the properties that an optimal contract must satisfy when models serve as the rewards, and we explore the potential benefits and welfare implications of these contract-driven CML schemes through numerical experiments.
Open source development includes contributions from both hired and volunteer software developers. Identifying this status is important when considering the transferability of research results to the closed source software industry, which has no volunteer developers. While many studies have taken the employment status of developers into account, this information is often gathered manually due to the lack of accurate automatic methods. In this paper, we present an initial step towards predicting paid and unpaid open source development using machine learning, and compare our results with automatic techniques used in prior work. Relying on source code repository metadata from Mozilla and manually collected employment status, we built a dataset of the most active developers, both volunteers and those hired by Mozilla. We define a set of metrics based on developers' usual commit time patterns and use different classification methods (logistic regression, classification tree, and random forest). The results show that our proposed method identifies paid and unpaid commits with an AUC of 0.75 using random forest, which is higher than the AUC of 0.64 obtained with the best of the previ
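The abstract does not list the exact commit-time metrics, so the features below are hypothetical examples of the kind described. They are the share of a developer's commits made on weekdays during working hours, the share made on weekends, and the share made late at night. The 9:00-17:00 window, the night window, and the assumption that timestamps are already in the developer's local time zone are all illustrative choices. Features like these could then feed a classifier such as the random forest mentioned above.

```python
from datetime import datetime

def commit_time_features(timestamps, work_start=9, work_end=17):
    """Summarize a developer's commit-time pattern (hypothetical feature set).

    timestamps: commit datetimes, assumed to be in the developer's local time zone.
    Returns the share of commits on weekdays within [work_start, work_end),
    the share on weekends, and the share at night (22:00-05:59).
    """
    if not timestamps:
        raise ValueError("need at least one commit")
    n = len(timestamps)
    office = sum(1 for t in timestamps if t.weekday() < 5 and work_start <= t.hour < work_end)
    weekend = sum(1 for t in timestamps if t.weekday() >= 5)
    night = sum(1 for t in timestamps if t.hour >= 22 or t.hour < 6)
    return {"office_hours": office / n, "weekend": weekend / n, "night": night / n}

commits = [
    datetime(2024, 3, 4, 10, 15),  # Monday morning
    datetime(2024, 3, 5, 14, 0),   # Tuesday afternoon
    datetime(2024, 3, 9, 11, 30),  # Saturday
    datetime(2024, 3, 6, 23, 45),  # Wednesday late night
]
print(commit_time_features(commits))
```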
We investigate how users perceive social media account verification, how those perceptions compare to platform practices, and what happens when a gap emerges. We use recent changes in Twitter's verification process as a natural experiment, in which the meaning and types of verification indicators rapidly and significantly shift. The project consists of two components: a user survey and a measurement of verified Twitter accounts. In the survey study, we ask a demographically representative sample of U.S. respondents (n = 299) about social media account verification requirements both in general and for particular platforms. We also ask about experiences with online information sources and digital literacy. More than half of respondents misunderstand Twitter's criteria for blue check account verification, and over 80% of respondents misunderstand Twitter's new gold and gray check verification indicators. Our analysis of survey responses suggests that people who are older or have lower digital literacy may be modestly more likely to misunderstand Twitter verification. In the measurement study, we randomly sample 15 million English-language tweets from October 2022. We obtain account verif
Sequential development of a new product or technology, or natural resource exploration, often progresses through ordered stages with uncertain rewards and requires costly (ex ante) planning to make future stages accessible. We model this process as an ordered Pandora's box problem where a decision-maker first chooses an initial scope, paying a cost that rises with the number of stages made accessible, and may later expand the scope at a marginal adjustment cost. Since the paid planning costs are sunk, the continuation values depend on the state variable ``paid scope''. We prove existence and uniqueness of scope-dependent reservation values, characterize the optimal search strategy as a threshold rule indexed by paid scope, and derive comparative statics. Interactions among three economic forces shape the optimal behavior -- a guarantee effect (a higher current best offer reduces the expected improvement from the next stage and induces earlier stopping), a paid-scope effect (a larger prepaid scope lowers the marginal cost of future access, raises the continuation value, and supports continuation at higher guarantees), and a remaining-horizon effect (fewer stages remaining shrink the
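The threshold structure described above can be illustrated with backward induction on a deliberately simplified version of the problem. This sketch drops the paper's scope choice and sunk planning costs. Instead, opening each next stage costs a hypothetical flat amount, and stage rewards follow hypothetical discrete distributions. It therefore shows only the guarantee effect: a higher current best offer makes stopping more attractive, so the optimal policy is a threshold rule in that offer.

```python
from functools import lru_cache

# Hypothetical ordered stages, each a list of (reward, probability) pairs.
STAGES = [
    [(0.0, 0.5), (10.0, 0.5)],
    [(4.0, 0.5), (12.0, 0.5)],
    [(2.0, 0.5), (20.0, 0.5)],
]
STAGE_COST = 3.0  # hypothetical flat cost of opening one more stage (stand-in for scope costs)

def continuation(k, best):
    """Expected payoff of paying to open stage k, then behaving optimally."""
    return -STAGE_COST + sum(p * value(k + 1, max(best, x)) for x, p in STAGES[k])

@lru_cache(maxsize=None)
def value(k, best):
    """Optimal expected payoff with stages k, k+1, ... unopened and `best` in hand."""
    if k == len(STAGES):
        return best
    return max(best, continuation(k, best))

def should_continue(k, best):
    """Threshold rule: open stage k only if that beats stopping now."""
    return k < len(STAGES) and continuation(k, best) > best

print(value(0, 0.0))  # expected payoff starting with nothing in hand
print([b for b in (0.0, 10.0, 13.0, 15.0) if should_continue(2, b)])
```

At the last stage, continuing is worth -3 + 0.5·b + 10 whenever 2 ≤ b ≤ 20, which exceeds b exactly when b < 14. So 14 is that stage's reservation value in this toy example.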
Internet users have suffered collateral damage in tussles over paid peering between large ISPs and large content providers. To qualify for settlement-free peering, large Internet Service Providers (ISPs) require peers to meet certain requirements. However, the academic literature has not yet established the relationship between these settlement-free peering requirements and the value to each interconnecting network. We first consider the effect of paid peering on broadband prices. We adopt a two-sided market model in which an ISP maximizes profit by setting broadband prices and a paid peering price. Our results show that paid peering fees reduce the premium plan price and increase both the video streaming price and the total price for premium-tier customers who subscribe to video streaming services. We next consider the effect of paid peering on consumer surplus. We find that consumer surplus is a unimodal function of the paid peering fee. The peering price depends critically on the incremental ISP cost per video streaming subscriber; at different costs, it can be negative, zero, or positive. Last, we construct a network cost model. We show that the traffic-sensitive network cost