Information about millions of people is collected for behavioural targeting, a type of marketing that involves tracking people's online behaviour for targeted advertising. It is hotly debated whether data protection law applies to behavioural targeting. Many behavioural targeting companies say that, as long as they do not tie names to data they hold about individuals, they do not process any personal data, and that, therefore, data protection law does not apply to them. European Data Protection Authorities, however, take the view that a company processes personal data if it uses data to single out a person, even if it cannot tie a name to these data. This paper argues that data protection law should indeed apply to behavioural targeting. Companies can often tie a name to nameless data about individuals. Furthermore, behavioural targeting relies on collecting information about individuals, singling out individuals, and targeting ads to individuals. Many privacy risks remain, regardless of whether companies tie a name to the information they hold about a person. A name is merely one of the identifiers that can be tied to data about a person, and it is not even the most practical iden
Digital advertising platforms and publishers sell ad inventory that conveys targeting information, such as demographic, contextual, or behavioral audience segments, to advertisers. While revealing this information improves ad relevance, it can reduce competition and lower auction revenues. To resolve this trade-off, this paper develops a general auction mechanism, the Information-Bundling Position Auction (IBPA) mechanism, that leverages the targeting information to maximize publisher revenue across both search and display advertising environments. The proposed mechanism treats the ad inventory type as the publisher's private information and allocates impressions by comparing advertisers' marginal revenues. We show that IBPA resolves the trade-off between targeting precision and market thickness: publisher revenue is increasing in information granularity and decreasing in disclosure granularity. Moreover, IBPA dominates the generalized second-price (GSP) auction for any distribution of advertiser valuations and under any information or disclosure regime. We also characterize computationally efficient approximations that preserve these guarantees. Using auction-level data from a
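Allocating by marginal revenue rather than by raw bid can change who wins. The sketch below shows only that core idea. It uses the textbook Myerson marginal revenue (virtual value) and assumes each advertiser's valuation is drawn from Uniform[0, high], which gives phi(v) = 2v - high. It is not the IBPA itself: information bundling, ad positions, and payments are omitted. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def marginal_revenue(value: float, high: float) -> float:
    """Myerson marginal revenue (virtual value) for a valuation drawn from
    Uniform[0, high]: phi(v) = v - (1 - F(v)) / f(v) = 2v - high."""
    return 2.0 * value - high


def allocate_impression(
    bids: Dict[str, float], highs: Dict[str, float]
) -> Optional[Tuple[str, float]]:
    """Award the impression to the advertiser with the largest non-negative
    marginal revenue; return None (leave it unsold) if all are negative."""
    best: Optional[Tuple[str, float]] = None
    for name, bid in bids.items():
        mr = marginal_revenue(bid, highs[name])
        if mr >= 0 and (best is None or mr > best[1]):
            best = (name, mr)
    return best


# Advertiser B bids more, but A's marginal revenue (12 - 8 = 4) beats B's (14 - 12 = 2).
print(allocate_impression({"A": 6.0, "B": 7.0}, {"A": 8.0, "B": 12.0}))  # ('A', 4.0)
```

A negative marginal revenue for every bidder leaves the impression unsold. This acts as the reserve-price effect that revenue-maximizing mechanisms rely on.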
In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects, assigning to treatment the students with the highest estimated treatment effects. Next, we evaluate two alternative targeting policies: one targets students with a low predicted probability of renewing financial aid in the absence of the treatment, and the other targets those with a high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult
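The three targeting rules differ only in how students are ranked before the top k are treated. A minimal sketch follows. It assumes that per-student estimates (for example, causal-forest CATEs and predicted no-nudge renewal probabilities) are already available. The `Student` type and `target` function are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Set


class Student(NamedTuple):
    sid: int
    cate_hat: float  # estimated treatment effect of the nudge (e.g., from a causal forest)
    p0_hat: float    # predicted probability of renewing without the nudge


# Sort keys: smaller key = treated first.
RULES: Dict[str, Callable[[Student], float]] = {
    "cate": lambda s: -s.cate_hat,        # largest estimated effect first
    "low_baseline": lambda s: s.p0_hat,   # least likely to renew on their own first
    "high_baseline": lambda s: -s.p0_hat, # most likely to renew on their own first
}


def target(students: List[Student], k: int, rule: str) -> Set[int]:
    """Return the ids of the k students selected under the given rule."""
    if rule not in RULES:
        raise ValueError(f"unknown rule: {rule}")
    return {s.sid for s in sorted(students, key=RULES[rule])[:k]}


pool = [Student(1, 0.05, 0.2), Student(2, 0.10, 0.5), Student(3, 0.01, 0.9), Student(4, 0.08, 0.1)]
for rule in RULES:
    print(rule, sorted(target(pool, 2, rule)))
```

The rules can select quite different students from the same pool. This is why the choice between them has to be evaluated empirically.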
Developments in machine learning and big data allow firms to fully personalize and target their marketing mix. However, data and privacy regulations, such as those in the European Union (GDPR), incorporate a "right to explanation," which is fulfilled when targeting policies are comprehensible to customers. This paper provides a framework for firms to navigate right-to-explanation legislation. First, I construct a class of comprehensible targeting policies, each of which can be represented by a sentence. Second, I show how to optimize over this class of policies to find the profit-maximizing comprehensible policy. I further demonstrate that it is optimal to estimate the comprehensible policy directly from the data, rather than projecting the black-box policy down onto a comprehensible policy. Third, I find the optimal black-box targeting policy and compare it to the optimal comprehensible policy. I then empirically apply my framework using data from a price promotion field experiment conducted by a durable goods retailer. I quantify the cost of explanation, which I define as the difference in expected profits between the optimal black-box and comprehensible targeting policies. Compared to the black box b
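The cost of explanation can be illustrated with a deliberately tiny policy class. It contains one-sentence rules of the form "promote if <feature> >= t" or "promote if <feature> <= t". The sketch makes three assumptions. First, each customer's expected profit lift from the promotion is already estimated. Second, the black-box policy promotes exactly the customers with a positive lift. Third, this one-threshold class stands in for the paper's richer sentence-based class. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (feature values, estimated profit lift if the customer receives the promotion)
Customer = Tuple[Dict[str, float], float]


def black_box_profit(customers: List[Customer]) -> float:
    """Profit of the fully personalized policy: promote whenever lift > 0."""
    return sum(max(lift, 0.0) for _, lift in customers)


def best_one_rule(customers: List[Customer]) -> Tuple[str, float]:
    """Search all one-feature threshold rules and return the most profitable
    one as a readable sentence together with its expected profit."""
    best = ("promote nobody", 0.0)
    for feature in customers[0][0]:
        for t in sorted({x[feature] for x, _ in customers}):
            above = sum(lift for x, lift in customers if x[feature] >= t)
            below = sum(lift for x, lift in customers if x[feature] <= t)
            for sentence, profit in (
                (f"promote if {feature} >= {t:g}", above),
                (f"promote if {feature} <= {t:g}", below),
            ):
                if profit > best[1]:
                    best = (sentence, profit)
    return best


data = [
    ({"tenure": 1, "spend": 50}, 4.0),
    ({"tenure": 3, "spend": 20}, -1.0),
    ({"tenure": 5, "spend": 80}, 3.0),
    ({"tenure": 2, "spend": 90}, -2.0),
]
rule, rule_profit = best_one_rule(data)
print(rule, rule_profit)                     # promote if spend <= 80 6.0
print(black_box_profit(data) - rule_profit)  # cost of explanation: 1.0
```

The rule is fitted directly to the estimated lifts rather than approximating the black-box treat/don't-treat labels. This is in the spirit of the text's point about estimating the comprehensible policy directly.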
Detailed targeting of advertisements has long been one of the core offerings of online platforms. Unfortunately, malicious advertisers have frequently abused such targeting features, with results that range from violating civil rights laws to driving division, polarization, and even social unrest. Platforms have often attempted to mitigate this behavior by removing targeting attributes deemed problematic, such as inferred political leaning, religion, or ethnicity. In this work, we examine the effectiveness of these mitigations by collecting data from political ads placed on Facebook in the lead-up to the 2022 U.S. midterm elections. We show that major political advertisers circumvented these mitigations by targeting proxy attributes: seemingly innocuous targeting criteria that closely correspond to political and racial divides in American society. We introduce novel methods for directly measuring the skew of various targeting criteria to quantify their effectiveness as proxies, and then examine the scale at which those attributes are used. Our findings have crucial implications for the ongoing discussion on the regulation of political advertising and emphasize the urgency for incre
A key challenge for targeted antipoverty programs in developing countries is that policymakers must rely on estimated rather than observed income, which leads to substantial targeting errors. The policy problem is not only to predict income, but to decide how noisy income estimates should be translated into feasible transfers. I formulate this as a statistical decision problem in which a policymaker chooses transfers to minimize a poverty-targeting loss subject to a fixed budget and a no-taxation constraint. I show that the standard plug-in rule, which treats estimated incomes as true, is inadmissible. I develop a nonparametric empirical Bayes targeting rule that assigns transfers using posterior distributions of true poverty gaps. Although the budget and no-taxation constraints make the targeting rule nonsmooth, Bayes regret is governed by the accuracy of the posterior functionals that determine the oracle allocation. In simulations using household survey data from nine African countries, the empirical Bayes rule reaches substantially more poor households and systematically improves poverty reduction relative to plug-in OLS and machine-learning benchmarks.
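The difference between the plug-in rule and a posterior-based rule is easiest to see for a household whose estimated income is just above the poverty line. A sketch follows. As a simplification, it swaps the paper's nonparametric empirical Bayes step for a parametric normal-normal model: true income ~ N(mu, tau2) and estimate = true income + N(0, s2) noise. The expected poverty gap E[max(z - Y, 0)] then has a closed form. The greedy budget allocation is a hypothetical heuristic that respects the budget and the no-taxation constraint (transfers >= 0), not the paper's optimal rule.

```python
import math
from typing import List, Sequence


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def plug_in_gap(y_hat: float, z: float) -> float:
    """Plug-in rule: treat the estimated income as if it were the true income."""
    return max(z - y_hat, 0.0)


def posterior_gap(y_hat: float, z: float, mu: float, tau2: float, s2: float) -> float:
    """E[max(z - Y, 0)] under the normal posterior of true income Y given y_hat."""
    m = (tau2 * y_hat + s2 * mu) / (tau2 + s2)  # posterior mean (shrinks toward mu)
    sd = math.sqrt(tau2 * s2 / (tau2 + s2))     # posterior standard deviation
    d = (z - m) / sd
    return (z - m) * norm_cdf(d) + sd * norm_pdf(d)


def allocate(gaps: Sequence[float], budget: float) -> List[float]:
    """Hypothetical greedy rule: fill the largest expected gaps first, never
    exceeding a household's gap or the budget, and never taxing anyone."""
    transfers = [0.0] * len(gaps)
    for i in sorted(range(len(gaps)), key=lambda i: -gaps[i]):
        t = min(gaps[i], budget)
        transfers[i] = t
        budget -= t
        if budget <= 0:
            break
    return transfers


z, mu, tau2, s2 = 100.0, 120.0, 400.0, 400.0
print(plug_in_gap(105.0, z))                   # 0.0: plug-in says "not poor"
print(posterior_gap(105.0, z, mu, tau2, s2))   # positive: the household may well be poor
```

A household estimated at 105 receives nothing under the plug-in rule. Under the posterior it still carries a positive expected gap, because its true income may lie below the line of 100.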
Major advertising platforms recently increased privacy protections by limiting advertisers' access to individual-level data. Instead of providing access to granular raw data, the platforms only allow a limited number of aggregate queries to a dataset, which is further protected by adding differentially private noise. This paper studies whether and how advertisers can design effective targeting policies within these restrictive privacy-preserving data environments. To achieve this, I develop a probabilistic machine learning method based on Bayesian optimization, which facilitates dynamic data exploration. Since Bayesian optimization was designed to sample points from a function to find its maximum, it cannot be applied as is to aggregate queries or to targeting. Therefore, I introduce two innovations: (i) integral updating of posteriors, which allows the advertiser to select the best regions of the data to query rather than individual points, and (ii) a targeting-aware acquisition function that dynamically selects the most informative regions for the targeting task. I identify the conditions of the dataset and privacy environment that necessitate the use of such a "smart" querying strategy. I apply the s
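The data environment described above can be mimicked with a toy oracle that answers a capped number of aggregate count queries, each perturbed by the Laplace mechanism. A count has sensitivity 1, so the noise scale is 1/epsilon. This sketches only the restricted environment the advertiser queries, not the Bayesian-optimization method itself. The class name and parameters are hypothetical.

```python
import random
from typing import Callable, Iterable, Optional


class PrivateCountOracle:
    """Answers at most `max_queries` differentially private count queries
    over a hidden dataset (hypothetical stand-in for a platform API)."""

    def __init__(self, rows: Iterable[float], epsilon_per_query: float,
                 max_queries: int, seed: Optional[int] = None) -> None:
        self._rows = list(rows)
        self._scale = 1.0 / epsilon_per_query  # Laplace scale b = sensitivity / epsilon
        self._remaining = max_queries
        self._rng = random.Random(seed)

    def count(self, predicate: Callable[[float], bool]) -> float:
        if self._remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self._remaining -= 1
        true_count = sum(1 for r in self._rows if predicate(r))
        # Laplace(0, b) noise: the difference of two Exp(mean b) draws.
        rate = 1.0 / self._scale
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        return true_count + noise


ages = [25, 31, 35, 42, 38, 29]
oracle = PrivateCountOracle(ages, epsilon_per_query=0.5, max_queries=3, seed=1)
print(oracle.count(lambda a: 30 <= a < 40))  # noisy answer around the true count of 3
```

Because each query is aggregated over a region and noisy, a querying strategy has to choose which regions are worth spending the limited budget on.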
This PhD thesis discusses how European law could improve privacy protection in the area of behavioural targeting. Behavioural targeting, also referred to as online profiling, involves monitoring people's online behaviour, and using the collected information to show people individually targeted advertisements. To protect privacy in the area of behavioural targeting, the EU lawmaker mainly relies on the consent requirement for the use of tracking technologies in the e-Privacy Directive, and on general data protection law. With informed consent requirements, the law aims to empower people to make choices in their best interests. But behavioural studies cast doubt on the effectiveness of the empowerment approach as a privacy protection measure. Many people click "I agree" to any statement that is presented to them. Therefore, to mitigate privacy problems such as chilling effects, this study argues for a combined approach of protecting and empowering the individual. Compared to the current approach, the lawmaker should focus more on protecting people. The PhD thesis is a legal study, but it also incorporates insights from other disciplines, such as computer science, behavioural economic
User targeting, the process of selecting targeted users from a pool of candidates for non-expert marketers, has garnered substantial attention with advances in digital marketing. However, existing user targeting methods encounter two significant challenges: (i) poor cross-domain and cross-scenario transferability and generalization, and (ii) insufficient forecastability in real-world applications. These limitations hinder their applicability across diverse industrial scenarios. In this work, we propose FOUND, an industrial-grade, transferable, and forecastable user targeting foundation model. To enhance cross-domain transferability, our framework integrates heterogeneous multi-scenario user data, aligning them with one-sentence targeting demand inputs through contrastive pre-training. For improved forecastability, the text description of each user is derived based on anticipated future behaviors, while user representations are constructed from historical information. Experimental results demonstrate that our approach significantly outperforms existing baselines in cross-domain, real-world user targeting scenarios, showcasing the superior capabilities of FOUND. Moreover, our
Modern treatment targeting methods often rely on estimating a conditional average treatment effect (CATE) using machine learning tools. While effective in identifying who benefits from treatment on the individual level, these approaches typically overlook system-level dynamics that may arise when treatments induce strain on shared capacity. We study the problem of targeting in Markovian systems, where treatment decisions must be made one at a time as units arrive, and early decisions can impact later outcomes through delayed or limited access to resources. We show that optimal policies in such settings compare CATE-like quantities to state-specific thresholds, where each threshold reflects the expected cumulative impact on the system of treating an additional individual in the given state. We propose an algorithm that augments standard CATE estimation with state-level value iteration to estimate these thresholds from observational data. Theoretical results establish consistency and convergence guarantees, and empirical studies demonstrate that our method improves long-run outcomes considerably relative to individual-level CATE targeting rules and generic offline reinforcement learn
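The state-specific thresholds can be made concrete in a toy capacity system. In this system, treating a unit occupies a slot, and occupied slots free up stochastically. The sketch assumes the CATE distribution of arriving units is known and discrete. It runs value iteration over occupancy states and returns, for each state, the threshold W(s) - W(s+1), where W is the discounted continuation value. A unit is treated iff its CATE exceeds that threshold. This is an illustrative model, not the paper's estimator from observational data.

```python
from typing import List, Sequence


def solve_thresholds(
    taus: Sequence[float],
    probs: Sequence[float],
    capacity: int,
    q: float,
    gamma: float = 0.95,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Value iteration for a toy admission problem.

    State s = number of occupied treatment slots (0..capacity). Each period one
    unit arrives whose CATE is taus[j] with probability probs[j]. Treating it
    earns its CATE and occupies a slot; afterwards one occupied slot (if any)
    frees up with probability q. Returns threshold[s] for s < capacity:
    treat iff CATE > threshold[s].
    """
    values = [0.0] * (capacity + 1)

    def continuation(occupied: int, v: List[float]) -> float:
        # Discounted value after the decision, before the next arrival.
        if occupied == 0:
            return gamma * v[0]
        return gamma * (q * v[occupied - 1] + (1 - q) * v[occupied])

    for _ in range(max_iter):
        new_values = []
        for s in range(capacity + 1):
            skip = continuation(s, values)
            if s == capacity:
                new_values.append(skip)  # system full: cannot treat
            else:
                treat = continuation(s + 1, values)
                new_values.append(sum(p * max(t + treat, skip) for t, p in zip(taus, probs)))
        gap = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if gap < tol:
            break

    return [continuation(s, values) - continuation(s + 1, values) for s in range(capacity)]


thresholds = solve_thresholds(taus=[0.1, 0.5, 1.0], probs=[0.5, 0.3, 0.2], capacity=3, q=0.3)
print([round(x, 3) for x in thresholds])
```

With gamma = 0 every threshold collapses to 0, which recovers the myopic rule "treat whenever the CATE is positive". Positive discounting makes the bar depend on how congested the system already is.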
Selective targeting of membranes with a specific receptor profile is an ongoing challenge in targeted drug delivery. We investigate the adsorption of copolymers on a multicomponent receptor-covered surface using grand-canonical Monte Carlo simulations and demonstrate that polymers can be designed to target a particular receptor density profile. To achieve this, the ligand profile on the polymers should match the targeted receptor profile, and the ligand-receptor affinity should be inversely proportional to the ligand profile. While the same can be obtained using multivalent nanoparticles, the entropic effects due to polymer conformations significantly enhance the binding selectivity of multivalent polymers compared to nanoparticles. Surprisingly, the ligand distribution on the polymer plays a crucial role, whereas the persistence length does not. The optimal selectivity to the overall receptor concentration is obtained by the Poisson distribution of ligands (random copolymer), whereas the maximal selectivity to a specific receptor profile is obtained by a defined sequence of grouped alternating ligands (regular copolymer). Interestingly, the regular copolymer can become anti-selec
Behavioral targeting, or online profiling, is a hotly debated topic. Much of the collection of personal information on the Internet is related to behavioral targeting, although research suggests that most people don't want to receive behaviorally targeted advertising. The World Wide Web Consortium is discussing a Do Not Track standard, and regulators worldwide are struggling to come up with answers. This article discusses European law and recent policy developments on behavioral targeting.
This paper introduces a marketing decision framework that optimizes customer targeting by integrating heterogeneous treatment effect estimation with explicit business guardrails. The objective is to maximize revenue and retention while adhering to constraints such as budget, revenue protection, and customer experience. The framework first estimates Conditional Average Treatment Effects (CATE) using uplift learners, then solves a constrained allocation problem to decide whom to target and which offer to deploy. It supports decisions in retention messaging, event rewards, and spend-threshold assignment. Validated through offline simulations and online A/B tests, the approach consistently outperforms propensity and static baselines, offering a reusable playbook for causal targeting at scale.
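The constrained allocation step ("whom to target and which offer to deploy") can be posed as a multiple-choice knapsack. Each customer receives at most one offer, the total cost must fit the budget, and offers whose estimated uplift does not clear a guardrail are never sent. A minimal dynamic-programming sketch follows. It assumes CATE estimates per customer and offer are given and that costs are integers in budget units. The function name and `min_uplift` guardrail are hypothetical.

```python
from typing import List, Optional, Tuple

# (offer name, estimated CATE on revenue, integer cost in budget units)
Offer = Tuple[str, float, int]


def allocate_offers(
    customers: List[List[Offer]], budget: int, min_uplift: float = 0.0
) -> Tuple[float, List[Optional[str]]]:
    """Pick at most one offer per customer to maximize total estimated uplift
    subject to the budget (multiple-choice knapsack solved by DP). Offers whose
    estimated uplift does not exceed min_uplift are never sent."""
    # dp[b] = (best total uplift with spend <= b, offer chosen per customer so far)
    dp: List[Tuple[float, List[Optional[str]]]] = [(0.0, []) for _ in range(budget + 1)]
    for offers in customers:
        new_dp = []
        for b in range(budget + 1):
            best_val, best_plan = dp[b][0], dp[b][1] + [None]  # send nothing
            for name, uplift, cost in offers:
                if uplift > min_uplift and cost <= b:
                    val = dp[b - cost][0] + uplift
                    if val > best_val:
                        best_val, best_plan = val, dp[b - cost][1] + [name]
            new_dp.append((best_val, best_plan))
        dp = new_dp
    return dp[budget]


customers = [
    [("discount", 5.0, 3), ("reward", 3.0, 1)],
    [("discount", 4.0, 3)],
    [("reward", 2.5, 1)],
]
print(allocate_offers(customers, budget=4))  # (7.5, ['discount', None, 'reward'])
```

An exact DP is practical only for modest integer budgets. At scale, a linear-programming relaxation or a Lagrangian (uplift minus lambda times cost) ranking is the usual substitute.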
Many researchers and organizations, such as WHO and UNICEF, have raised awareness of the dangers of advertisements targeted at children. While most existing laws only regulate ads on television that may reach children, lawmakers have been working on extending regulations to online advertising, for example by forbidding (e.g., the DSA) or restricting (e.g., COPPA) profiling-based advertising to children. At first sight, ad platforms such as Google seem to protect children by not allowing advertisers to target their ads to users under 18. However, this paper shows that other targeting features can be exploited to reach children. For example, on YouTube, advertisers can target their ads to users watching a particular video through placement-based targeting, a form of contextual targeting. Hence, advertisers can target children by placing their ads in children-focused videos. Through a series of ad experiments, we show that placement-based targeting is possible on children-focused videos and enables marketing to children. In addition, our ad experiments show that advertisers can use targeting based on profiling (e.g., interest, location, behavior) in combination
This paper introduces a method for detecting inappropriate targeting language in online conversations by integrating crowd and expert annotations with ChatGPT. We focus on English conversation threads from Reddit, examining comments that target individuals or groups. Our approach involves a comprehensive annotation framework that labels a diverse data set for various target categories and specific target words within the conversational context. We perform a comparative analysis of annotations from human experts, crowd annotators, and ChatGPT, revealing strengths and limitations of each method in recognizing both explicit hate speech and subtler discriminatory language. Our findings highlight the significant role of contextual factors in identifying hate speech and uncover new categories of targeting, such as social belief and body image. We also address the challenges and subjective judgments involved in annotation and the limitations of ChatGPT in grasping nuanced language. This study provides insights for improving automated content moderation strategies to enhance online safety and inclusivity.
With the growing popularity of mobile devices, user targeting, which aims to locate users who are interested in specific services effectively and efficiently, has received increasing attention. Most pioneering work on user targeting performs similarity-based expansion from a few active users as seeds and suffers from two major issues: seed users are unavailable for new services, and black-box procedures are unfriendly to marketers. In this paper, we design an Entity Graph Learning (EGL) system that provides explainable user targeting while also addressing the cold-start issue. The EGL system follows a hybrid online-offline architecture to satisfy the requirements of scalability and timeliness. Specifically, in the offline stage, the system focuses on the heavyweight entity graph construction and user entity preference learning, for which we propose a Three-stage Relation Mining Procedure (TRMP) that removes the dependence on expensive seed users. In the online stage, the system performs user targeting in real time based on the entity graph from the offline stage. Since the user targeting
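The online stage can be pictured as scoring users against the entities that describe a new service, with no seed users needed. The matching entities double as the explanation shown to marketers. The sketch assumes the offline stage has already produced user-to-entity preference scores and a relevance weight for each service entity. It does not reproduce TRMP or the entity graph construction, and `target_users` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def target_users(
    service_entities: Dict[str, float],
    user_prefs: Dict[str, Dict[str, float]],
    k: int,
) -> List[Tuple[str, float, List[str]]]:
    """Rank users by preference-weighted overlap with a service's entities.

    service_entities: entity -> relevance of that entity to the new service
    user_prefs: user -> {entity -> learned preference score}
    Returns up to k (user, score, explaining entities) triples.
    """
    scored = []
    for user, prefs in user_prefs.items():
        contrib = {e: w * prefs.get(e, 0.0) for e, w in service_entities.items()}
        score = sum(contrib.values())
        if score > 0:
            # Entities that drove the score, strongest first: the explanation.
            reasons = sorted((e for e, c in contrib.items() if c > 0), key=lambda e: -contrib[e])
            scored.append((user, score, reasons))
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


service = {"travel": 1.0, "insurance": 0.5}
prefs = {
    "u1": {"travel": 0.8},
    "u2": {"insurance": 0.9, "games": 1.0},
    "u3": {"games": 0.7},
}
for user, score, why in target_users(service, prefs, k=2):
    print(user, round(score, 2), why)
```

A user with no preference for any of the service's entities (u3 here) is never selected, and every selected user comes with the entities that justify the choice.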
Machine learning is increasingly used to select which individuals receive limited-resource interventions in domains such as human services, education, and development. However, it is often unclear what quantity models should predict. Policymakers rarely have access to data from a randomized controlled trial (RCT) that would enable accurate estimates of which individuals would benefit more from the intervention, while observational data creates a substantial risk of bias in treatment effect estimates. Practitioners instead commonly use a technique termed "risk-based targeting," in which the model is used only to predict each individual's status quo outcome (an easier, non-causal task). Those with higher predicted risk are offered treatment. There is currently almost no empirical evidence to inform which choices lead to the most effective machine-learning-informed targeting strategies in social domains. In this work, we use data from 5 real-world RCTs in a variety of domains to empirically assess such choices. We find that when treatment effects can be estimated with high accuracy (which we simulate by allowing the model to partially observe outcomes in advance),
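RCT data make it possible to compare targeting rules without a causal model. A standard tool is the inverse-propensity-weighted (IPW) estimate of the mean outcome a policy would achieve. It uses only the units whose randomized assignment happens to agree with the policy, each reweighted by the inverse probability of that assignment. The sketch assumes each unit was treated independently with a known probability. It is a generic estimator, not necessarily the exact evaluation procedure used in the paper, and the names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (predicted status-quo risk, treatment indicator 0/1, observed outcome)
Record = Tuple[float, int, float]


def ipw_policy_value(
    data: Sequence[Record], policy: Callable[[float], int], p_treat: float
) -> float:
    """Estimate the mean outcome if everyone were assigned according to
    `policy`, using RCT data in which each unit was treated with known
    probability p_treat."""
    total = 0.0
    for risk, treated, outcome in data:
        if treated == policy(risk):
            prob = p_treat if treated == 1 else 1.0 - p_treat
            total += outcome / prob
    return total / len(data)


def treat_high_risk(risk: float) -> int:
    """Risk-based targeting: treat units whose predicted risk is at least 0.5."""
    return int(risk >= 0.5)


def treat_nobody(risk: float) -> int:
    return 0


rct = [(0.9, 1, 1.0), (0.9, 0, 0.0), (0.2, 1, 1.0), (0.2, 0, 1.0)]
print(ipw_policy_value(rct, treat_high_risk, p_treat=0.5))  # 1.0
print(ipw_policy_value(rct, treat_nobody, p_treat=0.5))     # 0.5
```

The same estimator scores any candidate rule, whether risk-based or effect-based, on a common scale. This is what makes RCT data valuable for choosing among targeting strategies.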
Proxy means testing (PMT) and community-based targeting (CBT) are two of the leading methods for targeting social assistance in developing countries. In this paper, we present a hybrid targeting method that incorporates CBT's emphasis on local information and preferences with PMT's reliance on verifiable indicators. Specifically, we outline a Bayesian framework for targeting that resembles PMT in that beneficiary selection is based on a weighted sum of sociodemographic characteristics. We nevertheless propose calibrating the weights to preference rankings from community targeting exercises, implying that the weights used by our method reflect how potential beneficiaries themselves substitute sociodemographic features when making targeting decisions. We discuss several practical extensions to the model, including a generalization to multiple rankings per community, an adjustment for elite capture, a method for incorporating auxiliary information on potential beneficiaries, and a dynamic updating procedure. We further provide an empirical illustration using data from Burkina Faso and Indonesia.
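The core calibration idea is to choose PMT weights so that the weighted score reproduces the community's ranking. This can be sketched with a pairwise logistic (Bradley-Terry-style) model: every pair in which household i is ranked above household j should have a higher score w·x_i than w·x_j. This is a plain point-estimate swap-in fitted by gradient ascent with a small ridge penalty, not the paper's Bayesian framework. It omits the extensions listed above (multiple rankings, elite capture, auxiliary information, dynamic updating). The function name and hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence


def fit_weights_from_ranking(
    features: Sequence[Sequence[float]],
    ranking: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
    l2: float = 0.01,
) -> List[float]:
    """Fit PMT weights w so that households ranked as more deserving by the
    community (earlier in `ranking`) receive higher scores w . x."""
    d = len(features[0])
    w = [0.0] * d
    pairs = [(ranking[a], ranking[b])
             for a in range(len(ranking)) for b in range(a + 1, len(ranking))]
    for _ in range(epochs):
        grad = [-l2 * wk for wk in w]  # ridge penalty keeps weights finite
        for i, j in pairs:
            diff = [xi - xj for xi, xj in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))  # P(i ranked above j)
            for k in range(d):
                grad[k] += (1.0 - p) * diff[k]   # gradient of log-likelihood
        w = [wk + lr * g / len(pairs) for wk, g in zip(w, grad)]
    return w


# Features: [household size, owns land (0/1)]; ranking lists the poorest first.
households = [[6, 0], [4, 0], [3, 1], [2, 1]]
weights = fit_weights_from_ranking(households, ranking=[0, 1, 2, 3])
scores = [sum(w * x for w, x in zip(weights, h)) for h in households]
print([round(w, 3) for w in weights])
```

The learned weights reflect how the community trades off household size against land ownership. Scoring new households with them keeps the verifiable-indicator logic of PMT.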
Advertising, long the financial mainstay of the web ecosystem, has become nearly ubiquitous in the world of mobile apps. While ad targeting on the web is fairly well understood, mobile ad targeting is much less studied. In this paper, we use empirical methods to collect a database of over 225,000 ads on 32 simulated devices hosting one of three distinct user profiles. We then analyze how the ads are targeted by correlating ads to potential targeting profiles using Bayes' rule and Pearson's chi-squared test. This enables us to measure the prevalence of different forms of targeting. We find that nearly all ads show the effects of application- and time-based targeting, while we are able to identify location-based targeting in 43% of the ads and user-based targeting in 39%.
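The chi-squared part of the analysis asks whether a given ad appears independently of the user profile. A sketch of Pearson's test of independence on a profile-by-outcome contingency table follows. The counts are hypothetical, and the Bayes'-rule step is not shown. With three profiles and two outcomes (this ad vs. other ads) the test has (3 - 1)(2 - 1) = 2 degrees of freedom. For that case the p-value has the closed form exp(-stat / 2), which avoids needing a statistics library. The helper `p_value_df2` is valid only for df = 2.

```python
import math
from typing import Sequence


def chi_squared_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson's chi-squared statistic for independence of rows and columns."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Rows: user profiles; columns: impressions of this ad vs. all other ads (hypothetical).
counts = [[30, 70], [10, 90], [10, 90]]
stat = chi_squared_statistic(counts)
print(round(stat, 3))  # 19.2
print(p_value_df2(stat))
```

A small p-value indicates that the ad is shown disproportionately to one profile, which is evidence of profile-based targeting for that ad.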
This study provides a formal analysis of the customer targeting problem when the cost for a marketing action depends on the customer response and proposes a framework to estimate the decision variables for campaign profit optimization. Targeting a customer is profitable if the impact and associated profit of the marketing treatment are higher than its cost. Despite the growing literature on uplift models to identify the strongest treatment-responders, no research has investigated optimal targeting when the costs of the treatment are unknown at the time of the targeting decision. Stochastic costs are ubiquitous in direct marketing and customer retention campaigns because marketing incentives are conditioned on a positive customer response. This study makes two contributions to the literature, which are evaluated on an e-commerce coupon targeting campaign. First, we formally analyze the targeting decision problem under response-dependent costs. Profit-optimal targeting requires an estimate of the treatment effect on the customer and an estimate of the customer response probability under treatment. The empirical results demonstrate that the consideration of treatment cost substantiall
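Under response-dependent costs, the incentive (for example, a coupon) is paid only by customers who respond under treatment. Let p1 be the response probability with treatment, p0 the probability without, m the margin, and c the incentive cost. Targeting then gains p1 (m - c) - p0 m in expectation. Equivalently, target iff m (p1 - p0) > c p1. This is why both the treatment effect and the treated response probability are needed. A minimal sketch, assuming these quantities are already estimated per customer; the function names are hypothetical.

```python
def expected_incremental_profit(
    p_treated: float, p_control: float, margin: float, incentive_cost: float
) -> float:
    """Expected profit of targeting minus not targeting, when the incentive
    cost is incurred only if the customer responds (e.g., redeems a coupon)."""
    profit_if_targeted = p_treated * (margin - incentive_cost)
    profit_if_not = p_control * margin
    return profit_if_targeted - profit_if_not


def should_target(
    p_treated: float, p_control: float, margin: float, incentive_cost: float
) -> bool:
    return expected_incremental_profit(p_treated, p_control, margin, incentive_cost) > 0.0


# Both customers have a positive uplift, but only the first is worth a coupon.
print(should_target(0.30, 0.20, margin=50.0, incentive_cost=10.0))  # True  (gain of about 2.0)
print(should_target(0.30, 0.25, margin=50.0, incentive_cost=10.0))  # False (gain of about -0.5)
```

The second customer shows why ranking by uplift alone can mislead. The coupon is also paid by the many customers who would have bought anyway, and here that cost outweighs the small uplift.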