Fluorescence collection from individual emitters plays a key role in state detection and remote entanglement generation, fundamental functionalities in many quantum platforms. Planar photonics has been demonstrated for robust and scalable addressing of trapped-ion systems, motivating consideration of similar elements for the complementary challenge of photon collection. Here, using an argument from the reciprocity principle, we show that far-field photon collection efficiency can be expressed simply in terms of the fields associated with the collection optic at the emitter position alone. We calculate collection efficiencies into ideal paraxial and fully vectorial focused Gaussian modes parameterized in terms of focal waist, and further quantify the modest enhancements possible with more general beam profiles, establishing design requirements for efficient collection. Towards practical implementation, we design, fabricate, and characterize two diffractive collection elements operating at $λ=397$ nm: a forward-emitting design is predicted to offer $0.25\%$ collection efficiency into a single waveguide mode, while a more efficient reverse-emitting design offers $1.14\%$ collection efficiency.
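The reciprocity argument can be stated schematically as follows (a sketch of the general principle, not the paper's exact normalization): if $\mathbf{E}_m(\mathbf{r})$ denotes the field obtained by propagating the target collection mode backwards through the optic to the emitter at $\mathbf{r}_0$, then for a dipole with orientation $\hat{\mathbf{d}}$ the collection efficiency is governed by the local overlap

```latex
% Schematic reciprocity relation (normalization constants omitted):
% the efficiency depends only on the backward-propagated mode field
% evaluated at the emitter position r_0.
\eta \;\propto\; \bigl|\hat{\mathbf{d}}\cdot\mathbf{E}_m(\mathbf{r}_0)\bigr|^2 ,
```

with the proportionality constant fixed by normalizing to the power carried by the mode and the total power radiated by the dipole. This is why the efficiency can be evaluated from the fields at the emitter position alone.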
Recently, there has been interest in representing single graphs by multiple drawings; for example, using graph stories, storyplans, or uncrossed collections. In this paper, we apply this idea to orthogonal graph drawing. Due to the orthogonal drawing style, we focus on 4-graphs, that is, graphs of maximum degree 4. We restrict ourselves to plane graphs, that is, planar graphs whose embedding is fixed. Our goal is to represent any plane 4-graph $G$ by an unbent collection, that is, a collection of orthogonal drawings of $G$ that adhere to the embedding of $G$ and ensure that each edge of $G$ is drawn without bends in at least one of the drawings. We investigate two objectives. First, we consider minimizing the number of drawings in an unbent collection. We prove that every plane 4-graph can be represented by a collection with at most three drawings, which is tight. We also give necessary and sufficient conditions for a graph to admit an unbent collection of size $2$. Second, we consider minimizing the total number of bends over all drawings in an unbent collection. We show that this problem is NP-hard and give a 3-approximation algorithm. For the special case of plane triconnected c
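The unbent-collection property and the total-bend objective are easy to state operationally. A minimal sketch, abstracting each drawing as a map from edges to bend counts (the graph, edge names, and counts below are illustrative, not from the paper):

```python
# Sketch: verifying the "unbent collection" property for a plane 4-graph.
# Each drawing is abstracted as a dict mapping an edge to its bend count
# in that drawing.

def is_unbent_collection(drawings, edges):
    """True iff every edge is drawn without bends in at least one drawing."""
    return all(any(d[e] == 0 for d in drawings) for e in edges)

def total_bends(drawings):
    """Objective of the second problem: total bends over all drawings."""
    return sum(sum(d.values()) for d in drawings)

edges = ["ab", "bc", "cd", "da"]
collection = [
    {"ab": 0, "bc": 1, "cd": 0, "da": 2},
    {"ab": 1, "bc": 0, "cd": 2, "da": 0},
]
print(is_unbent_collection(collection, edges))  # True: each edge is bend-free somewhere
print(total_bends(collection))                  # 6
```

The two objectives studied in the abstract correspond to minimizing the number of dicts in `collection` subject to `is_unbent_collection`, and minimizing `total_bends` subject to the same constraint.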
We present a collection of parallel corpora of 12 sign languages in video format, together with subtitles in the dominant spoken languages of the corresponding countries. The entire collection includes more than 1,300 hours in 4,381 video files, accompanied by 1.3~M subtitles containing 14~M tokens. Most notably, it includes the first consistent parallel corpora for 8 Latin American sign languages, while the German Sign Language corpora are ten times the size of those previously available. The collection was created by collecting and processing videos of multiple sign languages from various online sources, mainly broadcast material from news shows, governmental bodies, and educational channels. The preparation involved several stages, including data collection, informing the content creators and seeking usage approvals, scraping, and cropping. The paper provides statistics on the collection and an overview of the methods used to collect the data.
Let $\mathbf{M}$ be the basic set theory that consists of the axioms of extensionality, emptyset, pair, union, powerset, infinity, transitive containment, $Δ_0$-separation and set foundation. This paper studies the relative strength of set theories obtained by adding fragments of the set-theoretic collection scheme to $\mathbf{M}$. We focus on two common parameterisations of collection: $Π_n$-collection, which is the usual collection scheme restricted to $Π_n$-formulae, and strong $Π_n$-collection, which is equivalent to $Π_n$-collection plus $Σ_{n+1}$-separation. The main result of this paper shows that for all $n \geq 1$, (1) $\mathbf{M}+Π_{n+1}\textrm{-collection}+Σ_{n+2}\textrm{-induction on } ω$ proves the consistency of Zermelo Set Theory plus $Π_{n}$-collection, (2) the theory $\mathbf{M}+Π_{n+1}\textrm{-collection}$ is $Π_{n+3}$-conservative over the theory $\mathbf{M}+\textrm{strong }Π_n \textrm{-collection}$. It is also shown that (2) holds for $n=0$ when the Axiom of Choice is included in the base theory. The final section indicates how the proofs of (1) and (2) can be modified to obtain analogues of these results for theories obtained by adding fragments of collection t
Large archival collections, such as email or government documents, must be manually reviewed to identify any sensitive information before the collection can be released publicly. Sensitivity classification has received a lot of attention in the literature. However, more recently, there has been increasing interest in developing sensitivity-aware search engines that can provide users with relevant search results, while ensuring that no sensitive documents are returned to the user. Sensitivity-aware search would mitigate the need for a manual sensitivity review prior to collections being made available publicly. To develop such systems, there is a need for test collections that contain relevance assessments for a set of information needs as well as ground-truth labels for a variety of sensitivity categories. The well-known Enron email collection contains a classification ground-truth that can be used to represent sensitive information, e.g., the Purely Personal and Personal but in Professional Context categories can be used to represent sensitive personal information. However, the existing Enron collection does not contain a set of information needs and relevance assessments.
We study model theoretic characterizations of various collection schemes over $\mathbf{PA}^-$ from the viewpoint of Gaifman's splitting theorem. Among other things, we prove that for any $n \geq 0$ and $M \models \mathbf{PA}^-$, the following are equivalent: 1. $M$ satisfies the collection scheme for $Σ_{n+1}$ formulas. 2. For any $K, N \models \mathbf{PA}^-$, if $M \subseteq_{\mathrm{cof}} K$, $M \prec_{Δ_0} K$ and $M \prec N$, then $M \prec_{Σ_{n+2}} K$ and $\sup_N(M) \prec_{Σ_n} N$. 3. For any $N \models \mathbf{PA}^-$, if $M \prec N$, then $M \prec_{Σ_{n+2}} \sup_N(M) \prec_{Σ_{n}} N$. Here, $\sup_N(M)$ is the unique $K$ satisfying $M \subseteq_{\mathrm{cof}} K \subseteq_{\mathrm{end}} N$. We also investigate strong collection schemes and parameter-free collection schemes from a similar perspective.
Given an action of a finite group on a triangulated category with a suitable strong exceptional collection, a construction of Elagin produces an associated strong exceptional collection on the equivariant category. We prove that the endomorphism algebra of the induced exceptional collection is the basic reduction of the skew group algebra of the endomorphism algebra of the original exceptional collection.
In this paper, we introduce the following new concept in graph drawing. Our task is to find a small collection of drawings such that together they satisfy some property that is useful for graph visualization. We propose investigating the property that each edge is not crossed in at least one drawing of the collection; we call such a collection uncrossed. This property is motivated by the quintessential problem of the crossing number, where one asks for a drawing in which the number of edge crossings is minimum. Indeed, if we are allowed to visualize only one drawing, then the one that minimizes the number of crossings is probably the neatest for a first orientation. However, a collection of drawings, each highlighting a different aspect of the graph without any crossings, could shed even more light on the graph's structure. We propose two definitions. First, the uncrossed number minimizes the number of drawings in a collection satisfying the uncrossed property. Second, the uncrossed crossing number minimizes the total number of crossings over all drawings in a collection satisfying the uncrossed property. For both definitions, we establish initial results. We prove that the uncrossed crossing number
Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and sustained effort in a long, tedious process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of g
In this note we study the uniqueness problem for collections of pennies and marbles. More generally, consider a collection of unit $d$-spheres that may touch but not overlap. Given the existence of such a collection, one may analyse the contact graph of the collection. In particular we consider the uniqueness of the collection arising from the contact graph. Using the language of graph rigidity theory, we prove a precise characterisation of uniqueness (global rigidity) in dimensions 2 and 3 when the contact graph is additionally chordal. We then illustrate a wide range of examples in these cases. That is, we illustrate collections of marbles and pennies that can be perturbed continuously (flexible), are locally unique (rigid) and are unique (globally rigid). We also contrast these examples with the usual generic setting of graph rigidity.
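The contact graph described above can be computed directly from the sphere centres. A minimal sketch, declaring two unit spheres adjacent when their centres are exactly two radii apart, up to a tolerance (coordinates are illustrative):

```python
import itertools
import math

def contact_graph(centers, r=1.0, tol=1e-9):
    """Edges between unit spheres that touch (centre distance == 2r).
    Raises if any pair overlaps, since the collection must be non-overlapping."""
    edges = []
    for (i, p), (j, q) in itertools.combinations(enumerate(centers), 2):
        d = math.dist(p, q)
        if d < 2 * r - tol:
            raise ValueError(f"spheres {i} and {j} overlap")
        if abs(d - 2 * r) <= tol:
            edges.append((i, j))
    return edges

# Three pennies (unit disks) in a mutually touching triangle: contact graph K3.
pennies = [(0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3))]
print(contact_graph(pennies))  # [(0, 1), (0, 2), (1, 2)]
```

The rigidity questions in the abstract then concern which placements of `centers` realize a given contact graph, and whether that realization is flexible, rigid, or globally rigid.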
The rapid advancement of workflows and methods for software engineering using AI emphasizes the need for a systematic evaluation and analysis of their ability to leverage information from entire projects, particularly in large code bases. In this challenge on optimization of context collection for code completion, organized by JetBrains in collaboration with Mistral AI as part of the ASE 2025 conference, participants developed efficient mechanisms for collecting context from source code repositories to improve fill-in-the-middle code completions for Python and Kotlin. We constructed a large dataset of real-world code in these two programming languages using permissively licensed open-source projects. The submissions were evaluated based on their ability to maximize completion quality for multiple state-of-the-art neural models using the chrF metric. During the public phase of the competition, nineteen teams submitted solutions to the Python track and eight teams submitted solutions to the Kotlin track. In the private phase, six teams competed, of which five submitted papers to the workshop.
Stellar rotation, a key factor influencing stellar structure and evolution, also drives magnetic activity, which is manifested as spots or flares on the stellar surface. Here, we present a collection of 18 443 rotating variables located toward the Galactic bulge, identified in the photometric database of the Optical Gravitational Lensing Experiment (OGLE) project. These stars exhibit distinct magnetic activity, including starspots and flares. With this collection, we provide long-term, time-series photometry in Cousins I- and Johnson V-band obtained by OGLE since 1997, and basic observational parameters, i.e., equatorial coordinates, rotation periods, mean brightness, and brightness amplitudes in both bands. This is a unique dataset for studying stellar magnetic activity, including activity cycles.
Most state-of-the-art image retrieval and recommendation systems focus on individual images. In contrast, socially curated image collections, condensing distinctive yet coherent images into one set, are largely overlooked by the research communities. In this paper, we aim to design a novel recommendation system that can provide users with image collections relevant to their individual preferences and interests. To this end, two key issues need to be addressed, i.e., image collection modeling and similarity measurement. For image collection modeling, we consider each image collection as a whole in a group sparse reconstruction framework and extract concise collection descriptors given the pretrained dictionaries. We then consider image collection recommendation as a dynamic similarity measurement problem in response to the user's clicked image set, and employ a metric learner to measure the similarity between the image collection and the clicked image set. As there is no previous work directly comparable to this study, we implement several competitive baselines and related methods for comparison. The evaluations on a large-scale Pinterest data set have validated the effectiveness of our approach.
Research community evaluations in information retrieval, such as NIST's Text REtrieval Conference (TREC), build reusable test collections by pooling document rankings submitted by many teams. Naturally, the quality of the resulting test collection greatly depends on the number of participating teams and the quality of their submitted runs. In this work, we investigate: i) how the number of participants, coupled with other factors, affects the quality of a test collection; and ii) whether the quality of a test collection can be inferred prior to collecting relevance judgments from human assessors. Experiments conducted on six TREC collections illustrate how the number of teams interacts with various other factors to influence the resulting quality of test collections. We also show that the reusability of a test collection can be predicted with high accuracy when the same document collection is used for successive years in an evaluation campaign, as is common in TREC.
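The pooling step described above is, in essence, a union of run prefixes: the top-$k$ documents from each submitted run are merged into the pool sent to human assessors. A minimal sketch (run contents and pool depth are illustrative):

```python
# Sketch of depth-k pooling as used to build TREC-style test collections.

def pool(runs, k):
    """runs: list of ranked document-id lists, one per submitted run.
    Returns the set of documents to be judged by assessors."""
    pooled = set()
    for run in runs:
        pooled.update(run[:k])
    return pooled

runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
    ["d7", "d2", "d8", "d9"],
]
print(sorted(pool(runs, 2)))  # ['d1', 'd2', 'd5', 'd7']
```

With fewer participating runs the pool covers fewer relevant documents, which is exactly why collection quality depends on the number of teams, as the abstract investigates.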
We present a collection recommender system that can automatically create and recommend collections of items at a user level. Unlike regular recommender systems, which output top-N relevant items, a collection recommender system outputs collections of items such that the items in the collections are relevant to a user, and the items within a collection follow a specific theme. Our system builds on top of the user-item representations learnt by item recommender systems. We employ dimensionality reduction and clustering techniques along with intuitive heuristics to create collections with their ratings and titles. We test these ideas in a real-world setting of music recommendation, within a popular music streaming service. We find that there is a 2.3x increase in recommendation-driven consumption when recommending collections over items. Further, it makes effective use of on-screen real estate and leads to recommending a larger and more diverse set of items. To our knowledge, these are the first experiments of their kind at such a large scale.
Answering a question of Kaye, we show that the compositional truth theory with a full collection scheme is conservative over Peano Arithmetic. We demonstrate this by showing that countable models of compositional truth which satisfy the internal induction or collection axioms can be end-extended to models of the respective theory.
Let $G$ be a simple graph with vertex set $V(G)$. A subset $S$ of $V(G)$ is independent if no two vertices from $S$ are adjacent. The graph $G$ is known as a König-Egerváry (KE, in short) graph if $α(G) + μ(G)= |V(G)|$, where $α(G)$ denotes the size of a maximum independent set and $μ(G)$ is the cardinality of a maximum matching. Let $Ω(G)$ denote the family of all maximum independent sets. A collection $F$ of sets is an hke collection if $|\bigcup Γ|+|\bigcap Γ|=2α$ holds for every subcollection $Γ$ of $F$. We characterize hke collections and derive new characterizations of KE graphs. We prove the existence and uniqueness of a graph $G$ such that $Ω(G)$ is a maximal hke collection; it is a bipartite graph. As a result, we solve a problem of Jarden, Levit and Mandrescu \cite{jlm}, proving that $F$ is an hke collection if and only if it is a subset of $Ω(G)$ for some graph $G$ and $|\bigcup F|+|\bigcap F|=2α(F)$. Finally, we show that the maximal cardinality of an hke collection $F$ with $α(F)=α$ and $|\bigcup F|=n$ is $2^{n-α}$.
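The defining condition of an hke collection can be checked by brute force directly from the definition, quantifying over all nonempty subcollections (the set contents below are illustrative, with $α = 2$):

```python
from itertools import chain, combinations

def is_hke(F, alpha):
    """Check |union(G)| + |intersection(G)| == 2*alpha for every nonempty
    subcollection G of F, where F is a list of frozensets."""
    subcollections = chain.from_iterable(
        combinations(F, r) for r in range(1, len(F) + 1))
    for G in subcollections:
        union = set().union(*G)
        inter = set(G[0]).intersection(*G)
        if len(union) + len(inter) != 2 * alpha:
            return False
    return True

print(is_hke([frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 4})], 2))  # True
print(is_hke([frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 4})], 2))  # False
```

In the second example the full subcollection has union of size 4 and intersection of size 1, giving $4 + 1 \ne 2α$, so the hke condition fails.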
The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and oftentimes datasets collected for a certain purpose or application are reused for a different problem. Additionally, dataset annotations may not be representative over time, contain ambiguous or erroneous annotations, or be unable to generalize across issues or domains. Recent research has shown these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for AI should be performed in a responsible manner where the quality of the data is thoroughly scrutinized and measured through a systematic set of appropriate metrics. In this paper, we propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics for an iterative
Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that allows designers to specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets. Additionally, this formulation generalizes to tasks requiring multiple data sources, such as labeled and unlabeled data used in semi-supervised learning. To solve our problem, we develop Learn-Optimize-Collect (LOC), which minimizes expected future collection costs. Finally, we numerically compare our framework to the conventional baseline of estimating data requirements by extrapolating from neural scaling laws. We significantly reduce the risks of failing to meet desired performance targets on several classification, segmentation, and detection tasks, while maintaining low total collection costs.
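The scaling-law baseline mentioned above can be sketched as follows: fit $\mathrm{error} \approx a\,n^{b}$ to measured (dataset size, error) pairs by least squares in log-log space, then invert the fit to extrapolate the data requirement for a target error. The measurements and target below are illustrative, not from the paper:

```python
import math

def fit_power_law(sizes, errors):
    """Fit error ~ a * n^b by least squares in log-log space
    (the neural-scaling-law extrapolation baseline)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = math.exp(ybar - b * xbar)
    return a, b

def data_needed(a, b, target_error):
    """Invert the fitted law to estimate the dataset size hitting the target."""
    return (target_error / a) ** (1.0 / b)

# Illustrative measurements: error halves each time the data quadruples (b = -0.5).
sizes = [1_000, 4_000, 16_000]
errors = [0.20, 0.10, 0.05]
a, b = fit_power_law(sizes, errors)
print(round(data_needed(a, b, 0.025)))  # ~64000
```

The abstract's point is that relying on this extrapolation alone ignores collection costs, time horizons, and penalties, which is what the proposed optimal data collection formulation and LOC address.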
We introduce Private Collection Matching (PCM) problems, in which a client aims to determine whether a collection of sets owned by a server matches their interests. Existing privacy-preserving cryptographic primitives cannot solve PCM problems efficiently without harming privacy. We propose a modular framework that enables designers to build privacy-preserving PCM systems that output one bit: whether a collection of server sets matches the client's set. The communication cost of our protocols scales linearly with the size of the client's set and is independent of the number of server elements. We demonstrate the potential of our framework by designing and implementing novel solutions for two real-world PCM problems: determining whether a dataset has chemical compounds of interest, and determining whether a document collection has relevant documents. Our evaluation shows that we offer a privacy gain with respect to existing works at a reasonable communication and computation cost.
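The one-bit output can be illustrated by the plaintext functionality that a PCM protocol would compute obliviously under encryption; the per-set matching rule and the aggregation choices below are illustrative stand-ins for the paper's modular framework:

```python
# Plaintext sketch of the PCM functionality: the client learns a single bit
# about the server's whole collection, not per-set or per-element results.

def pcm_match(client_set, server_sets, aggregate=any):
    """One-bit output: does the server's collection match the client's set?
    Here a server set 'matches' if it shares an element with the client set;
    `aggregate` is `any` (some set matches) or `all` (every set matches)."""
    return aggregate(bool(client_set & s) for s in server_sets)

# Toy document-search instance: terms of interest vs. a document collection.
interests = {"ion", "photon"}
documents = [{"graph", "drawing"}, {"photon", "collection"}]
print(pcm_match(interests, documents, any))  # True
print(pcm_match(interests, documents, all))  # False
```

A privacy-preserving realization must compute this bit without revealing the client set to the server or the server sets to the client, which is what the framework's cryptographic protocols provide.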