The fundamental trade-off between privacy and utility remains an active area of research. Our contribution is motivated by two observations. First, privacy mechanisms developed for one-time data release cannot straightforwardly be extended to sequential releases. Second, practical databases are likely to be useful to multiple distinct parties. Furthermore, we cannot rule out the possibility of data sharing between parties. With utility in mind, we formulate a privacy-utility trade-off problem to adaptively tackle sequential data requests made by different, potentially colluding entities. We consider both expected distortion and mutual information as measures to quantify utility, and use mutual information to measure privacy. We assume an attack model whereby illicit data sharing, which we call collusion, can occur between data receivers. We develop an adaptive algorithm for data releases that makes use of a modified Blahut-Arimoto algorithm. We show that the resulting data releases are optimal when expected distortion quantifies utility, and locally optimal when mutual information quantifies utility. Finally, we discuss how our findings may extend to applications in machine learni
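The modification the authors make to Blahut-Arimoto is not described here. For reference, the sketch below shows only the standard, unmodified Blahut-Arimoto iteration for one point on a rate-distortion curve. It alternates between updating the release channel p(y|x) and the induced output marginal q(y). The parameter `beta` is the usual Lagrange multiplier that trades rate (mutual information) against expected distortion. The function name and its arguments are illustrative assumptions.

```python
import math

def blahut_arimoto_rd(p_x, dist, beta, iters=200):
    """Standard Blahut-Arimoto for one point on the rate-distortion curve.

    p_x:   source distribution over x (list of floats summing to 1)
    dist:  dist[x][y], distortion between source symbol x and release y
    beta:  Lagrange multiplier (> 0); larger beta favours lower distortion
    Returns (channel p(y|x), rate I(X;Y) in nats, expected distortion).
    """
    nx, ny = len(p_x), len(dist[0])
    q_y = [1.0 / ny] * ny  # output marginal, initialised uniform
    for _ in range(iters):
        # Channel update: p(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = []
        for x in range(nx):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(ny)]
            z = sum(w)
            cond.append([wi / z for wi in w])
        # Marginal update: q(y) = sum_x p(x) p(y|x).
        q_y = [sum(p_x[x] * cond[x][y] for x in range(nx)) for y in range(ny)]
    rate = sum(
        p_x[x] * cond[x][y] * math.log(cond[x][y] / q_y[y])
        for x in range(nx) for y in range(ny) if cond[x][y] > 0
    )
    distortion = sum(
        p_x[x] * cond[x][y] * dist[x][y] for x in range(nx) for y in range(ny)
    )
    return cond, rate, distortion
```

For a uniform binary source under Hamming distortion, the iteration reproduces the known curve R(D) = ln 2 - H_b(D), with D = 1/(1 + e^beta).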
We examine how textual features in earnings press releases predict stock returns on earnings announcement days. Using over 138,000 press releases from 2005 to 2023, we compare traditional bag-of-words and BERT-based embeddings. We find that press release content (soft information) is as informative as the earnings surprise (hard information), with FinBERT yielding the highest predictive power. Combining models improves both the explanatory power and the interpretability of press release content. Stock prices fully reflect the content of press releases at market open. If press releases are leaked, their content offers a predictive advantage. Topic analysis reveals a self-serving bias in managerial narratives. Our framework supports real-time return prediction through the integration of online learning, provides interpretability, and reveals the nuanced role of language in price formation.
Differential Privacy (DP) has emerged as a robust framework for privacy-preserving data releases and has been successfully applied in high-profile cases, such as the 2020 US Census. However, in organizational settings, the use of DP remains largely confined to isolated data releases. This approach restricts the potential of DP to serve as a framework for comprehensive privacy risk management at an organizational level. One might expect the cumulative privacy risk of isolated releases to be assessable through DP's compositional property. In practice, however, individual DP guarantees are frequently tailored to specific releases, which makes it difficult to reason about their interaction or combined impact. At the same time, less tailored DP guarantees compose more easily but offer little insight, because they lead to excessively large privacy budgets that carry little practical meaning. To address these limitations, we present DPolicy, a system designed to manage cumulative privacy risks across multiple data releases using DP. Unlike traditional approaches that treat each release in isolation or rely on a single (global) DP guarantee, our system employs a flexible framework tha
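The composition reasoning mentioned above can be illustrated with basic sequential composition. Under this rule, the worst-case (epsilon, delta) of several releases over the same data is the sum of the individual parameters. The rule is only meaningful when every release protects the same unit, for example one person's records. That condition is exactly what tailored per-release guarantees tend to break. The `Release` type, the release names and the budget values in the sketch are hypothetical; this is not DPolicy's interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:
    name: str           # hypothetical label for one data release
    epsilon: float
    delta: float = 0.0

def basic_composition(releases):
    """Worst-case (epsilon, delta) under basic sequential composition.

    Assumes all releases are computed over the same dataset and protect
    the same privacy unit; otherwise the sum has no clear meaning.
    """
    return sum(r.epsilon for r in releases), sum(r.delta for r in releases)

def within_budget(releases, eps_budget, delta_budget=0.0):
    """Check the composed guarantee against an organisation-wide budget."""
    eps, delta = basic_composition(releases)
    return eps <= eps_budget and delta <= delta_budget

releases = [
    Release("dashboard", 0.5),
    Release("census_table", 1.0, 1e-6),
    Release("ml_export", 2.0, 1e-6),
]
```

Advanced composition theorems give tighter bounds for many small releases. The simple sum above is shown only because its limitations are the easiest to read.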
Public scientific and metrology releases can leak the hidden settings that produced them. We formalize and quantify this risk as a profiled statistical side-channel audit: a release map exposes finite-band statistics of a power spectral density (PSD), a profiled observer learns labeled template spectra under an explicit budget, and a challenge release is drawn from one of two utility-equivalent recipes separated by a protected coordinate. Averaged PSD bins follow a gamma channel, which is replaced by a covariance-weighted log-spectrum channel when the bins are correlated. This yields exact Kullback-Leibler divergences, Chernoff exponents, protected-bit advantage bounds, and finite-training, finite-library, finite-compute, and model-mismatch corrections. Our headline result is a finite-band transport-leakage law: after amplitude and blur are eliminated, the protected acid-transport information obeys $I_{\lambda|\alpha,\beta}(K) = (64/1225)\, w \lambda^{6} K^{9} + O(w \lambda^{8} K^{11})$ for $K\lambda \ll 1$, a ninth-order exponent with a closed-form safe band. A step-by-step protocol turns a measured release into these numbers, and a fixed-seed reproducibility package regenerates every table and figure. We instantiate the a
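The following is a minimal sketch of why a gamma channel gives exact KL divergences. It assumes each averaged bin is the mean of `n_avg` independent exponential periodogram ordinates with mean equal to the true PSD value S. Under that assumption the bin is Gamma(shape `n_avg`, scale S/`n_avg`), and the KL divergence between two recipes depends only on the PSD ratio r = S_p/S_q. Independence across bins is also assumed. The correlated, log-spectrum case is not covered. The last function evaluates only the leading term of the quoted transport-leakage law. The symbol `n_avg` is a hypothetical name chosen to avoid a clash with the law's K.

```python
import math

def gamma_bin_kl(s_p, s_q, n_avg):
    """KL( Gamma(n_avg, s_p/n_avg) || Gamma(n_avg, s_q/n_avg) ) for one bin.

    With equal shapes the general gamma KL reduces to
    n_avg * (r - 1 - ln r), where r = s_p / s_q.
    """
    r = s_p / s_q
    return n_avg * (r - 1.0 - math.log(r))

def band_kl(psd_p, psd_q, n_avg):
    """Total KL over a band, assuming statistically independent bins."""
    return sum(gamma_bin_kl(a, b, n_avg) for a, b in zip(psd_p, psd_q))

def leading_transport_leakage(w, lam, K):
    """Leading term (64/1225) w lam^6 K^9 of the quoted law (valid for K*lam << 1)."""
    return (64.0 / 1225.0) * w * lam**6 * K**9
```

Because the per-bin divergence scales linearly with `n_avg`, averaging more segments makes a fixed PSD difference easier to detect.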
This paper describes the second and third data releases (DR2 and DR3, respectively) from the ongoing United Kingdom Infrared Telescope (UKIRT) Hemisphere Survey (UHS). DR2 consists primarily of the $K$-band portion of the UHS and was released to the public on June 1, 2023. DR3 mainly includes the $H$-band portion of the survey, with a public release scheduled for September 2025. The $H$- and $K$-band data releases complement the previous $J$-band data release (DR1) from 2018. The survey covers approximately 12,700 square degrees between declinations of 0 degrees and $+$60 degrees and achieves median 5$\sigma$ point source sensitivities of 19.0 mag and 18.0 mag (Vega) for $H$ and $K$, respectively. The data releases include images and source catalogs that contain $\sim$581 million $H$-band detections and $\sim$461 million $K$-band detections. DR2 and DR3 also include merged catalogs, created by combining $J$- and $K$-band detections (DR2) and $J$-, $H$-, and $K$-band detections (DR3). The DR2 merged catalog has a total of $\sim$513 million sources, while the DR3 merged catalog contains $\sim$560 million sources.
Market and user characteristics of mobile apps make their release management different from that of proprietary software products and web services. Despite the wealth of information available in user feedback on an app, an in-depth analysis of app releases is difficult because this information is inconsistent and uncertain. To better understand and potentially improve app release processes, we analyze major, minor, and patch releases of apps that follow semantic versioning. In particular, we examine how marketed and non-marketed releases differ. Our results show that, in general, major, minor, and patch releases differ significantly in release cycle duration, nature, and change velocity. We also observe significant differences between marketed and non-marketed mobile app releases in cycle duration, the nature and extent of changes, and the number of opened and closed issues.
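Classifying releases as major, minor or patch under semantic versioning can be done by comparing each version with its predecessor. The sketch below does this and then groups release cycle durations by release type. It assumes that cycle duration means the number of days since the previous release. It also ignores pre-release and build suffixes for brevity, so a transition such as `1.2.3-rc1` to `1.2.3` is rejected rather than handled. The release history is hypothetical.

```python
from collections import defaultdict
from datetime import date

def parse_semver(v):
    """Parse 'MAJOR.MINOR.PATCH'; pre-release/build suffixes are dropped."""
    core = v.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(p) for p in core.split("."))
    return major, minor, patch

def release_type(prev, curr):
    """Classify the version bump from prev to curr."""
    p, c = parse_semver(prev), parse_semver(curr)
    if c <= p:
        raise ValueError(f"{curr} does not follow {prev}")
    if c[0] != p[0]:
        return "major"
    if c[1] != p[1]:
        return "minor"
    return "patch"

def cycle_durations(history):
    """history: (version, release_date) pairs sorted by date.
    Returns {release type: [days since the previous release, ...]}."""
    out = defaultdict(list)
    for (pv, pd), (cv, cd) in zip(history, history[1:]):
        out[release_type(pv, cv)].append((cd - pd).days)
    return dict(out)

history = [  # hypothetical release history of one app
    ("1.0.0", date(2020, 1, 1)),
    ("1.0.1", date(2020, 1, 11)),
    ("1.1.0", date(2020, 2, 10)),
    ("2.0.0", date(2020, 5, 10)),
]
durations = cycle_durations(history)
```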
Prior research primarily examined differentially private continual releases over data streams, where entries are immutable after insertion. However, most data are dynamic and housed in databases. Addressing this literature gap, this article presents a methodology for achieving differential privacy for continual releases in dynamic databases, where entries can be inserted, modified, and deleted. A dynamic database is represented as a changelog, allowing the application of differential privacy techniques for data streams to dynamic databases. To ensure differential privacy in continual releases, this article demonstrates the necessity of constraints on mutations in dynamic databases and proposes two common constraints. Additionally, it explores the differential privacy of two fundamental types of continual releases: Disjoint Continual Releases (DCR) and Sliding-window Continual Releases (SWCR). The article also highlights how DCR and SWCR can benefit from a hierarchical algorithm for better privacy budget utilization. Furthermore, it reveals that the changelog representation can be extended to dynamic entries, achieving local differential privacy for continual releases. Lastly, th
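The changelog idea can be sketched for a counting query. Each insert, update or delete becomes a count delta, which turns the dynamic database into a stream. A hierarchical algorithm can then release running counts. Here the classic binary-tree mechanism is used as a stand-in for the article's algorithm. It adds Laplace noise once per dyadic interval, and each prefix count sums at most log2(T) + 1 noisy nodes. The privacy argument assumes that neighboring changelogs differ in a single delta by at most 1; this is one possible mutation constraint, not necessarily one of the article's two. The operation format and the predicate are hypothetical.

```python
import random

def changelog_deltas(ops, predicate):
    """Map mutations to +/-1 count deltas for rows satisfying predicate.

    ops: ('insert', new) | ('delete', old) | ('update', old, new)
    """
    deltas = []
    for op in ops:
        if op[0] == "insert":
            d = int(predicate(op[1]))
        elif op[0] == "delete":
            d = -int(predicate(op[1]))
        elif op[0] == "update":
            d = int(predicate(op[2])) - int(predicate(op[1]))
        else:
            raise ValueError(f"unknown mutation {op[0]!r}")
        deltas.append(d)
    return deltas

def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def tree_mechanism(deltas, epsilon, rng):
    """Noisy running counts via the binary-tree mechanism (epsilon-DP under
    the stated neighbouring assumption)."""
    levels = max(1, len(deltas).bit_length())  # each delta lies in <= levels nodes
    scale = levels / epsilon
    noisy_nodes = {}  # (level, index) -> noisy sum of one dyadic interval
    releases = []
    for t in range(1, len(deltas) + 1):
        total, start = 0.0, 0
        for j in reversed(range(levels)):  # decompose [0, t) into dyadic blocks
            if (t >> j) & 1:
                key = (j, start >> j)
                if key not in noisy_nodes:
                    true_sum = sum(deltas[start:start + (1 << j)])
                    noisy_nodes[key] = true_sum + laplace(rng, scale)
                total += noisy_nodes[key]
                start += 1 << j
        releases.append(total)
    return releases

ops = [("insert", 5), ("insert", 12), ("update", 12, 3), ("delete", 5), ("insert", 7)]
deltas = changelog_deltas(ops, lambda v: v > 4)
```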
In modern software ecosystems, dependency management plays a critical role in ensuring secure and maintainable applications. However, understanding the relationship between release practices and their impact on vulnerabilities and update cycles remains a challenge. In this study, we analyze the release histories of 10,000 Maven artifacts, covering over 203,000 releases and 1.7 million dependencies. We evaluate how release speed affects software security and lifecycle. Our results show an inverse relationship between release speed and dependency outdatedness. Artifacts with more frequent releases maintain significantly shorter outdated times. We also find that faster release cycles are linked to fewer CVEs in dependency chains, indicating a strong negative correlation. These findings emphasize the importance of accelerated release strategies in reducing security risks and ensuring timely updates. Our research provides valuable insights for software developers, maintainers, and ecosystem managers.
Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as healthcare, finance, and mobility. While prior research has focused on protecting privacy in a single data release, many real-world systems operate under sequential or continuous data publishing, where the same or related data are released over time. Such sequential disclosures introduce new vulnerabilities, as temporal correlations across releases may enable adversaries to infer sensitive information that remains hidden in any individual release. In this paper, we investigate whether an attacker can compromise privacy in sequential data releases by exploiting dependencies between consecutive publications, even when each individual release satisfies standard privacy guarantees. To this end, we propose a novel attack model that captures these sequential dependencies by integrating a Hidden Markov Model with a reinforcement learning-based bi-directional inference mechanism. This enables the attacker to leverage both earlier and later observations in the sequence to infer private information. We ins
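The attack described above uses both earlier and later releases. The classic forward-backward algorithm for a discrete Hidden Markov Model illustrates why this helps. It is shown here only as a stand-in; it is not the paper's reinforcement-learning-based mechanism. The hidden state plays the role of a sensitive attribute, and the observations are the released values. Smoothed posteriors condition on the whole sequence, so later releases can sharpen inferences about earlier ones. The model parameters below are hypothetical.

```python
def forward_backward(pi, A, B, obs):
    """Return (filtered, smoothed) state posteriors for a discrete HMM.

    pi: initial distribution; A[i][j]: transition i -> j;
    B[i][o]: probability that state i emits observation o.
    """
    n, T = len(pi), len(obs)
    # Forward pass: P(z_t | x_1..x_t), normalised at each step.
    a = [pi[i] * B[i][obs[0]] for i in range(n)]
    s = sum(a)
    filtered = [[v / s for v in a]]
    for t in range(1, T):
        a = [B[j][obs[t]] * sum(filtered[-1][i] * A[i][j] for i in range(n))
             for j in range(n)]
        s = sum(a)
        filtered.append([v / s for v in a])
    # Backward pass: messages proportional to P(x_{t+1}..x_T | z_t).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        s = sum(b)
        beta[t] = [v / s for v in b]
    # Combine: P(z_t | x_1..x_T) is proportional to filtered * backward message.
    smoothed = []
    for t in range(T):
        g = [filtered[t][i] * beta[t][i] for i in range(n)]
        s = sum(g)
        smoothed.append([v / s for v in g])
    return filtered, smoothed

pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]   # sticky hidden attribute
B = [[0.8, 0.2], [0.2, 0.8]]   # noisy releases of that attribute
filtered, smoothed = forward_backward(pi, A, B, [0, 1, 1, 1])
```

With these parameters, the first release alone gives P(state 1) = 0.2. The three later releases push the smoothed estimate for the first step upward.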
This study focuses on optimizing the release of species $S_2$ to control the population of species $S_1$ through impulsive release strategies. We investigate the conditions required to eliminate species $S_1$, which is equivalent to the establishment of $S_2$. The research includes a theoretical analysis of the positivity, existence, and uniqueness of solutions, the conditions ensuring global stability, and a sufficient condition for controlling the $S_1$-free solution. In addition, we formulate an optimal control problem to maximize the effectiveness of $S_2$ releases, manage the population of $S_1$, and minimize the costs associated with this intervention strategy. Numerical simulations validate the theoretical results and visualize population dynamics under various release scenarios.
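The model below is a minimal hypothetical illustration of impulsive releases, not the study's system. In the absence of $S_1$, assume $S_2$ decays at a per-capita rate d and that an amount R is released every τ time units. The impulsive map then converges to a periodic solution. Its post-release peak is R/(1 - e^{-dτ}), and its time-averaged level is R/(dτ). Quantities like these typically enter the cost side of a release-scheduling problem.

```python
import math

def periodic_peak(R, d, tau):
    """Post-release peak of the periodic S2 solution (hypothetical model)."""
    return R / (1.0 - math.exp(-d * tau))

def mean_level(R, d, tau):
    """Time average of the periodic solution over one release period."""
    return R / (d * tau)

def simulate_peaks(R, d, tau, n_pulses, s0=0.0):
    """Iterate the impulsive map: add R at each pulse, then decay for tau."""
    s, peaks = s0, []
    for _ in range(n_pulses):
        s += R                    # impulsive release at t = n * tau
        peaks.append(s)
        s *= math.exp(-d * tau)   # exponential decay until the next pulse
    return peaks

peaks = simulate_peaks(R=10.0, d=0.5, tau=2.0, n_pulses=200)
```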
This paper presents our recent initiatives to foster the discoverability of new releases on the music streaming service Deezer. After introducing our search and recommendation features dedicated to new releases, we outline our shift from editorial to personalized release suggestions using cold start embeddings and contextual bandits. Backed by online experiments, we discuss the advantages of this shift in terms of recommendation quality and exposure of new releases on the service.
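The paper does not specify which contextual bandit policy it uses, so the sketch below stands in a simple epsilon-greedy contextual bandit. It keeps a running mean reward per (context, new release) pair. Most of the time it recommends the best-known release for the context, and with probability epsilon it explores a random one. The discrete contexts, for example user clusters derived from cold-start embeddings, and all names below are hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyContextualBandit:
    """Per-context epsilon-greedy choice among candidate new releases."""

    def __init__(self, arms, epsilon, rng):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.means = defaultdict(float)   # (context, arm) -> mean reward

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        # Exploit; ties resolve to the earliest arm in the list.
        return max(self.arms, key=lambda a: self.means[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental mean update.
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

Deezer's production setup, as summarized above, combines cold-start embeddings with contextual bandits. It is likely richer than this sketch, for example by using continuous context features.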
Press releases about scholarly news are brief statements provided in advance to the press, describing the most relevant findings of one or more accepted scientific publications, usually on the condition that journalists adhere to an embargo until the publication date. Centralized platforms such as EurekAlert! allow press releases to be disseminated online as independent news articles. Press releases can include additional material (e.g., interviews, commentaries, explanatory tables, figures, media, recommended readings), which turns them into online objects with analytical value of their own. The objective of this work is to illustrate how press releases can be analyzed quantitatively using tools and approaches similar to those applied in scientometric research (SCI). To achieve this goal, we propose a scientometrics-inspired analytical framework based on the formulation of spaces of interaction among objects, actors, and impacts. The proposed framework thus treats press releases as science communication (SCO) objects that are produced by different SCO actors (e.g., journalists) and that receive impact (e.g., tweets, links). To c
The first observations of the cosmic microwave background (CMB) from NASA's \emph{Wilkinson Microwave Anisotropy Probe} (WMAP) led to the discovery of `alignment' anomalies not expected from fluctuations in the isotropic cosmological model. We study the data from all 8 full-sky public releases since then to test for anomalous alignments and shapes of the first 60 multipoles, i.e., over the range $2\leq l \leq 61$. We use rotationally invariant and covariant statistics to test the isotropy of all subsequent WMAP data releases, along with those from ESA's \emph{Planck} mission. Anomalous alignments among the multipoles $l=1, 2, 3$ are highly consistent and robust. Further alignments are detected, some of them new, although their significance is diluted by the large search range. Power entropy, a measure of the randomness of the multipoles, is consistently anomalous at about the $2\sigma$ level or better across all data releases. It appears that the CMB is not as random as the cosmological principle predicts on large angular scales
We investigated intermittent energy-release processes by analyzing long-period pulsations during a C2.8 flare on 2023 June 03. The C2.8 flare shows three successive, repetitive pulsations in soft X-ray (SXR) and high-temperature extreme ultraviolet (EUV) emissions, which may imply three episodes of energy release during the flare. The period of these quasi-periodic pulsations (QPPs) is estimated to be as long as about 7.5 minutes. EUV imaging observations suggest that all three pulsations originate from the same flare area, which is dominated by a hot loop system. In contrast, flare emission at radio/microwave, low-temperature EUV, ultraviolet (UV), and H$\alpha$ wavelengths reveals only the first pulsation, which may be associated with nonthermal electrons accelerated by magnetic reconnection. The other two pulsations, seen in SXR and high-temperature EUV, might be caused by loop-loop interaction. Our observations indicate that the three episodes of energy release during the C2.8 flare are triggered by different mechanisms: electrons accelerated by magnetic reconnection, and loop-loop interaction in a complicated magnetic configuration.
Cargo, the package manager of Rust, provides a yank mechanism to support release-level deprecation, which can prevent packages from depending on yanked releases. Most prior studies focused on code-level deprecation (i.e., deprecated APIs) and package-level deprecation (i.e., deprecated packages). However, few studies have focused on release-level deprecation. In this study, we investigate how often and how the yank mechanism is used, the rationales behind its usage, and the adoption of yanked releases in the Cargo ecosystem. Our study shows that 9.6% of the packages in Cargo have at least one yanked release, and that the proportion of yanked releases kept increasing from 2014 to 2020. Package owners yank releases for reasons other than withdrawing a defective release, such as fixing a release that does not follow semantic versioning or indicating that a package has been removed or replaced. In addition, we found that 46% of the packages directly adopted at least one yanked release and that yanked releases propagated through the dependency network, leaving 1.4% of the releases in the ecosystem with unresolved dependencies.
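The sketch below is a much-simplified model of how yanking interacts with dependency resolution. It follows Cargo's default caret requirements: `^1.2.3` means >=1.2.3,<2.0.0, `^0.2.3` means >=0.2.3,<0.3.0, and `^0.0.3` means >=0.0.3,<0.0.4. A new resolution skips yanked releases, while a version already pinned in a lockfile is kept even if it was later yanked. If every matching release is yanked, the requirement is left unresolved. Pre-releases, other requirement forms and the real resolver's backtracking are deliberately ignored, and the data is hypothetical.

```python
def caret_upper(req):
    """Exclusive upper bound of a Cargo-style caret requirement on (major, minor, patch)."""
    major, minor, patch = req
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)

def resolve(req, releases, locked=None):
    """Pick a release for `req`, or None if it cannot be resolved.

    releases: dict mapping version tuple -> True if yanked
    locked:   version already recorded in a lockfile (kept even if yanked)
    """
    upper = caret_upper(req)
    if locked is not None and locked in releases and req <= locked < upper:
        return locked
    candidates = [v for v, yanked in releases.items()
                  if not yanked and req <= v < upper]
    return max(candidates) if candidates else None

releases = {(1, 0, 0): False, (1, 1, 0): False, (1, 2, 0): True, (2, 0, 0): False}
```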
The proliferation of open Pre-trained Language Models (PTLMs) on model registry platforms such as Hugging Face (HF) presents both opportunities and challenges for companies building products around them. Like traditional software dependencies, PTLMs continue to evolve after a release. However, current PTLM release practices on model registry platforms are plagued by a variety of inconsistencies, such as ambiguous naming conventions and inaccessible model training documentation. Given the knowledge gap on current PTLM release practices, our empirical study uses a mixed-methods approach to analyze the releases of 52,227 PTLMs on the best-known model registry, HF. Our results reveal 148 different naming practices for PTLM releases, with 40.87% of changes to model weight files not reflected in the adopted name-based versioning practice or in the documentation. In addition, we found that the 52,227 PTLMs are derived from only 299 different base models (the original models that were modified to create the 52,227 PTLMs), with fine-tuning and quantization being the most prevalent modification methods applied to these base models. Significant gaps in release transparen
Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the package releases used provide an indication of their maturity. Packages with a 0.y.z version number are commonly assumed to be under initial development, suggesting that they are likely to be less stable and that depending on them may be considered less healthy. In this paper, we empirically study, for four open source package distributions (Cargo, npm, Packagist, and RubyGems), to what extent 0.y.z package releases and >=1.0.0 package releases behave differently. We quantify the prevalence of 0.y.z releases, we explore how long packages remain in the initial development stage, we compare the update frequency of 0.y.z and >=1.0.0 package releases, we study how often 0.y.z releases are required by other packages, we assess whether semantic versioning is respected for dependencies towards them, and we compare some characteristics of 0.y.
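Two of the measurements listed above are the prevalence of 0.y.z releases and the time a package spends in initial development. The sketch below computes both for a single package history. It assumes that the initial-development stage lasts from the first release to the first >=1.0.0 release, and it treats any version whose major component is 0 as initial development, whatever its suffix. The history is hypothetical.

```python
from datetime import date

def is_initial_development(version):
    """True for 0.y.z versions (e.g. '0.3.1', '0.4.0-beta')."""
    return int(version.split(".", 1)[0]) == 0

def share_initial(history):
    """Fraction of a package's releases that are 0.y.z."""
    return sum(is_initial_development(v) for v, _ in history) / len(history)

def days_to_first_stable(history):
    """Days from the first release to the first >=1.0.0 release, or None.

    history: (version, release_date) pairs sorted by date.
    """
    first_date = history[0][1]
    for version, released in history:
        if not is_initial_development(version):
            return (released - first_date).days
    return None

history = [  # hypothetical package history
    ("0.1.0", date(2019, 1, 1)),
    ("0.2.0", date(2019, 3, 1)),
    ("1.0.0", date(2019, 12, 31)),
    ("1.0.1", date(2020, 1, 15)),
]
```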
Koufogiannis et al. (2016) showed a $\textit{gradual release}$ result for Laplace noise-based differentially private mechanisms: given an $\varepsilon$-DP release, a new release with privacy parameter $\varepsilon' > \varepsilon$ can be computed such that the combined privacy loss of both releases is at most $\varepsilon'$ and the distribution of the latter is the same as a single release with parameter $\varepsilon'$. They also showed gradual release techniques for Gaussian noise, later also explored by Whitehouse et al. (2022). In this paper, we consider a more general $\textit{multiple release}$ setting in which analysts hold private releases with different privacy parameters corresponding to different access/trust levels. These releases are determined one by one, with privacy parameters in arbitrary order. A multiple release is $\textit{lossless}$ if having access to a subset $S$ of the releases has the same privacy guarantee as the least private release in $S$, and each release has the same distribution as a single release with the same privacy parameter. Our main result is that lossless multiple release is possible for a large class of additive noise mechanisms. For the Ga
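The following is a minimal sketch of a Laplace coupling with the lossless properties described above, run in the opposite order from gradual release. It uses the identity Lap(b1) = Lap(b2) + W for b1 > b2, where W is 0 with probability (b2/b1)^2 and Lap(b1) otherwise. This can be checked through the characteristic function 1/(1 + b^2 t^2). Starting from the more accurate release y_fine = x + Lap(1/eps_fine), adding W yields a release distributed exactly as x + Lap(1/eps_coarse). Because y_coarse is post-processing of y_fine, the pair costs only eps_fine. Gradual release needs the reverse step, sampling y_fine given y_coarse, which uses the conditional distribution derived by Koufogiannis et al. This sketch does not reproduce that step.

```python
import random

def laplace(rng, b):
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def laplace_cf(b, t):
    """Characteristic function of Laplace(0, b) at t."""
    return 1.0 / (1.0 + (b * t) ** 2)

def coarsen(y_fine, eps_fine, eps_coarse, rng):
    """Map y_fine ~ x + Lap(1/eps_fine) to y_coarse ~ x + Lap(1/eps_coarse).

    Requires eps_coarse < eps_fine. Adds W, where W = 0 w.p. (b2/b1)^2 and
    W ~ Lap(b1) otherwise, with b1 = 1/eps_coarse and b2 = 1/eps_fine.
    """
    if not eps_coarse < eps_fine:
        raise ValueError("eps_coarse must be smaller than eps_fine")
    b1, b2 = 1.0 / eps_coarse, 1.0 / eps_fine
    p_zero = (b2 / b1) ** 2
    w = 0.0 if rng.random() < p_zero else laplace(rng, b1)
    return y_fine + w
```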
Dependency graphs show where released code can flow, while leaving implicit whether the public path used to publish a release changed. We introduce a predecessor-aware release-authority record that compares each package release with its immediate predecessor across publisher, repository, workflow, provenance, signing, and mediation evidence. We instantiate the record over a purposefully sampled, audited April 2024--June 2026 cohort from npm, PyPI, Maven Central, crates.io, and RubyGems: 45,812 releases, 43,100 eligible predecessor comparisons, and 942 package coordinates. Go is reported separately as a VCS/proxy/checksum-log boundary adapter. Transparent rules identify 204 policy-triggering public release-path discontinuities. The exact trigger policy is the primary candidate queue. A uniform semantic-distance rule selects 320 releases and covers 190/204 triggers; a descriptive regime-specific rule selects 337 releases and covers all 204. In a blinded 60-row shared core, three practitioners rated 20/30 triggers as immediate review, 9/30 as monitoring, 1/30 as no review, and all 30 controls as no review. These signals are review cues over public release-path evidence. Exact maliciou
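The following is a minimal sketch of a predecessor-aware comparison. Each release is compared only with its immediate predecessor across a few evidence fields, and a transparent rule decides whether the change triggers review. The field names and the trigger rule are hypothetical and are not the paper's record schema or policy. The mediation evidence mentioned above is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ReleaseEvidence:
    version: str
    publisher: Optional[str]    # account that published the release
    repository: Optional[str]   # linked source repository
    workflow: Optional[str]     # CI workflow that built/published it
    has_provenance: bool        # provenance attestation present
    signed: bool                # signature present

EVIDENCE_FIELDS = ("publisher", "repository", "workflow", "has_provenance", "signed")

def predecessor_diffs(history):
    """List (version, changed fields) relative to the immediate predecessor."""
    diffs = []
    for prev, curr in zip(history, history[1:]):
        changed = [f for f in EVIDENCE_FIELDS if getattr(prev, f) != getattr(curr, f)]
        if changed:
            diffs.append((curr.version, changed))
    return diffs

def example_trigger(prev, curr):
    """Hypothetical rule: publisher changed, or provenance/signing was dropped."""
    return (prev.publisher != curr.publisher
            or (prev.has_provenance and not curr.has_provenance)
            or (prev.signed and not curr.signed))

history = [
    ReleaseEvidence("1.0.0", "alice", "github.com/org/pkg", "release.yml", True, True),
    ReleaseEvidence("1.0.1", "alice", "github.com/org/pkg", "release.yml", True, True),
    ReleaseEvidence("1.0.2", "mallory", "github.com/org/pkg", None, False, False),
]
triggers = [c.version for p, c in zip(history, history[1:]) if example_trigger(p, c)]
```

As the abstract stresses, such flags are review cues over public evidence, not verdicts of maliciousness.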
Official press releases from Germany's highest courts present and explain judicial rulings to the public as well as to expert audiences. Prior NLP efforts emphasize technical headnotes and ignore citizen-oriented communication needs. We introduce CourtPressGER, a dataset of 6.4k triples: rulings, human-drafted press releases, and synthetic prompts for LLMs to generate comparable releases. The benchmark supports training and evaluating LLMs on generating accurate, readable summaries from long judicial texts. We benchmark small and large LLMs using reference-based metrics, factual-consistency checks, LLM-as-judge, and expert ranking. Large LLMs produce high-quality drafts with minimal hierarchical performance loss; smaller models require hierarchical setups for long judgments. Initial benchmarks show varying model performance, with human-drafted releases ranking highest.