Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of the foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date. It spans more than 49 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, and covers surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H, the first open foundation vision-language-action model for medical robotics, is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others) and achieves
Open effective field theories provide a systematic framework for describing systems coupled to an environment, where dissipation, noise, and modified conservation laws naturally arise. Working within the Schwinger-Keldysh formalism, we examine open extensions of three well-studied theories: the superfluid, Maxwell theory, and Einstein gravity. In gauge and gravitational theories, open terms that break advanced symmetries while preserving physical ones are not automatically consistent; they are allowed only if they lead to deformed identities among the equations of motion. We explicitly construct such a term in open gravity and show that it leads to a consistent deformation of the diffeomorphism identities.
The Gaia mission offers opportunities to search for compact binaries not involved in binary interactions (hereafter inert compact binaries), and has resulted in the discovery of binaries containing one black hole (BH) or one neutron star (NS), called "Gaia BHs" and "Gaia NSs", respectively. We have assessed whether Gaia BHs and NSs can be formed in open clusters through dynamical interactions. In order to obtain a large number of inert compact binaries similar to Gaia BHs and NSs, we have performed gravitational $N$-body simulations for a large number of open clusters whose total mass is $1.2 \times 10^8 M_\odot$. These clusters have various masses, metallicities, densities, and binary fractions. We have found that open clusters form Gaia BHs ($10^{-6}$-$10^{-5} M_\odot^{-1}$) much more efficiently than Gaia NSs ($\lesssim 10^{-7} M_\odot^{-1}$) for any cluster parameters. This is quite inconsistent with observational results, because the reported numbers of Gaia BHs and NSs are $3$ and $21$, respectively. Additionally, we have switched off NS natal kicks for $10^4$ open clusters each weighing $10^3 M_\odot$ in order to retain a large number of NSs in open clusters. Then, open clusters form in
We introduce open-sci-ref, a family of dense transformer models trained as research baselines across multiple model (0.13B to 1.7B parameters) and token scales (up to 1T) on 8 recent open reference datasets. Evaluating the models on various standardized benchmarks, our set of training runs establishes reference points that enable researchers to assess the sanity and quality of alternative training approaches across scales and datasets. Intermediate checkpoints allow comparison and study of the training dynamics. The established reference baselines allow training procedures to be compared through their scaling trends, aligning them on a common compute axis. Comparison of open reference datasets reveals that training on NemoTron-CC HQ consistently outperforms other reference datasets, followed by DCLM-baseline and FineWeb-Edu. In addition to intermediate training checkpoints, the release includes logs, code, and downstream evaluations to simplify reproduction, standardize comparison, and facilitate future research.
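As an illustration of what aligning runs on a common compute axis can look like, the sketch below uses the widely used rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. This approximation, the function name, and the example (parameters, tokens) pairings are assumptions made for illustration; the abstract does not state how compute is measured.

```python
# Minimal sketch: place training runs on a shared compute axis.
# ASSUMPTION: compute is approximated as C ~ 6 * N * D (N = parameters,
# D = training tokens), a common rule of thumb for dense transformers.
# The abstract does not specify the compute measure actually used.

def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens


# Hypothetical (parameters, tokens) pairings chosen within the ranges the
# abstract mentions (0.13B to 1.7B parameters, up to 1T tokens).
example_runs = {
    "0.13B params, 1T tokens": (0.13e9, 1.0e12),
    "1.7B params, 1T tokens": (1.7e9, 1.0e12),
}

# Map each run label to its approximate compute budget.
compute_axis = {
    label: approx_train_flops(n, d) for label, (n, d) in example_runs.items()
}

# Sort runs by compute so that results from different training procedures
# can be compared at matched budgets.
for label, flops in sorted(compute_axis.items(), key=lambda kv: kv[1]):
    print(f"{label}: {flops:.3e} FLOPs")
```

Plotting benchmark scores against such a compute estimate, rather than against model size or token count alone, is one way to compare scaling trends across procedures that split their budget differently.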
Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data, such as Chain-of-Thought (CoT), which hinders the development of advanced model capabilities. Addressing these challenges, our work makes three primary contributions. First, we introduce Honey-Data-15M, a new SFT dataset comprising approximately 15 million QA pairs, processed through multiple cleaning techniques and enhanced with a novel dual-level (short and long) CoT enrichment strategy. Second, we introduce HoneyPipe, the data curation pipeline, and its underlying framework DataStudio, providing the community with a transparent and adaptable methodology for data curation that moves beyond static dataset releases. Finally, to validate our dataset and pipeline, we train Bee-8B, an 8B model on Honey-Data-15M. Experiments show that Bee-8B establishes a new state-of-the-art (SOTA) for fully open MLLMs, achieving performance that is competitive with, and in some cases surpasses, recent se
This paper reviews research literature on Diamond Open Access (DOA) journals - sometimes also called Platinum Open Access - that was produced after this journal segment started to become a priority in European research policy around 2020. It contextualizes the current science policy debate, critically examines different understandings of DOA, and reviews studies on the role of such journals in scholarly communication. Most existing research consists of quantitative studies focusing on aspects such as the number of DOA journals, their publication output, the diversity of the landscape in terms of subject areas, languages, publishing entities, indexing in major databases, awareness and perception among scholars, cost analyses, as well as insights into the internal operations of DOA journals. The review shows that research on DOA journals is partly influenced by the science policy discourse in at least two ways: first, through the normativity inherent in that discourse, and second, through the temporality of policy-driven research of practical relevance, which leaves important aspects of the phenomenon understudied. Moreover, research on the DOA journal landscape has implications beyo
Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polym
This paper examines the state of Open Data in Latvia in mid-2014. The study is divided into two parts: (i) a survey of the open data situation and (ii) an overview of available open data sets. The first part examines the general open data climate in Latvia according to the guidelines of the OKFN Open Data Index, making the results comparable to those of other participants in this index. The second part examines datasets made available on the Latvia Open Data community catalogue, the only open data catalogue available in Latvia at the moment. We conclude that Latvian public sector open data mostly fulfil the basic criteria of the Open Data Index (e.g., data is available) but fail on more advanced criteria: the majority of the data considered in the study are not published in machine-readable form and are not available for bulk download, and none of the data sources have open license statements.
In this paper, we prove that open and closed strings are $O(D,D)$ equivalent. The equivalence requires an AdS geometry near the boundaries. The $O(D,D)$ invariance is introduced into the Polyakov action via Tseytlin's action. Traditionally, there exist disconnected open-open or closed-closed configurations in the solution space of Tseytlin's action. The open-closed configuration is ruled out by the mixed terms of the dual fields. We show that, under some very general guidelines, the dual fields are consistently decoupled if and only if the near-horizon geometry is $AdS_5$. We then have open-closed and closed-open configurations in different limits of the distances of the $D3$-brane pairs. Inherited from the definition of the theory, these four configurations are of course related to each other by $O(D,D)$ transformations. We therefore conclude that both the open/closed relation and open/closed duality can be derived from $O(D,D)$ symmetries. We then demonstrate that the open/closed relation does connect commutative open and closed strings. By analyzing the couplings of the configurations, the low energy effective limits of our results consequently predict the AdS/CFT correspo
In this work, I collect and discuss a series of open questions in one-dimensional geometric optimization in Euclidean spaces. The focus is on two classes of problems: maximal distance minimizers and Steiner trees. Maximal distance minimizers concern finding a connected set of minimal length whose closed $r$-neighborhood covers a given compact set, whereas Steiner trees aim to find a minimal-length set connecting a prescribed set of points. For both problems, I briefly summarize known results and highlight the remaining open questions. While some questions can be approached with elementary methods, others remain highly challenging.
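For concreteness, the two problems can be stated as follows. The notation ($M$, $A$, $\Sigma$, $S$, $\mathcal{H}^1$ for one-dimensional Hausdorff measure, i.e. length) is chosen here for illustration and is not taken from the text. Given a compact set $M \subset \mathbb{R}^n$ and $r > 0$, a maximal distance minimizer solves

$$\min \left\{ \mathcal{H}^1(\Sigma) \;:\; \Sigma \subset \mathbb{R}^n \text{ closed and connected},\ \max_{y \in M} \operatorname{dist}(y, \Sigma) \le r \right\},$$

where the constraint says that the closed $r$-neighborhood of $\Sigma$ covers $M$. Given a finite set of points $A = \{a_1, \dots, a_m\} \subset \mathbb{R}^n$, a Steiner tree solves

$$\min \left\{ \mathcal{H}^1(S) \;:\; S \subset \mathbb{R}^n \text{ connected},\ A \subset S \right\}.$$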
Social coding platforms have revolutionized collaboration in software development, leading to the use of software bots to streamline operations. However, the presence of open-source software (OSS) bots gives rise to problems including impersonation, spamming, bias, and security risks. Identifying bot accounts and behavior is a challenging task in OSS projects. This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with the maximum possible accuracy. Our team gathered a dataset of 19,779 accounts that meet standardized criteria to enable future research on bots in open-source projects. We follow a rigorous workflow to ensure that the data we collect is accurate, generalizable, scalable, and up-to-date. We have identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions. Our team created BotHawk, a highly effective model for detecting bots in open-source software projects. It outperforms other models, achieving an AUC of 0.947 and an F1-score of 0.89. BotHawk can detect a wider variety of bots, including CI/CD and scanning bots. Furthermore, we find that the numbe
This text is a short introduction to the physics of driven-dissipative many-body systems, focusing on a few selected topics. Beyond its more ``historical'' interest in the study of atomic physics and quantum optics, the modeling and study of dissipative phenomena in open quantum systems is presently pivotal to understanding quantum hardware platforms. While the lack of a thermodynamic potential for these out-of-equilibrium open systems makes it theoretically challenging to investigate their physics, at the same time it allows going beyond the thermodynamic paradigms and investigating new and exotic phenomena. We will focus on one of the simplest, yet most effective, descriptions of open quantum systems, namely the (Gorini-Kossakowski-Sudarshan-) Lindblad master equation. This phenomenological approach describes quantum systems that weakly interact with their surrounding environment. Although many of the results derived below will apply to any quantum system, we will focus in particular on bosonic/spin systems.
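For reference, the Lindblad master equation mentioned above is commonly written in the following standard form. The symbols are a conventional choice assumed here rather than taken from the text: $\hat{\rho}$ is the system density matrix, $\hat{H}$ the Hamiltonian, $\hat{L}_k$ the jump operators describing the coupling to the environment, and $\gamma_k \ge 0$ the associated dissipation rates.

$$\frac{d\hat{\rho}}{dt} = -\frac{i}{\hbar}\left[\hat{H}, \hat{\rho}\right] + \sum_k \gamma_k \left( \hat{L}_k \hat{\rho} \hat{L}_k^\dagger - \frac{1}{2}\left\{ \hat{L}_k^\dagger \hat{L}_k, \hat{\rho} \right\} \right)$$

The commutator term generates the unitary evolution, while the sum describes dissipation; the anticommutator $\{\cdot,\cdot\}$ ensures that the trace of $\hat{\rho}$ is preserved.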
Deploying robots in open-ended unstructured environments such as homes has been a long-standing research problem. However, robots are often studied only in closed-off lab settings, and prior mobile manipulation work is restricted to pick-move-place, which is arguably just the tip of the iceberg in this area. In this paper, we introduce Open-World Mobile Manipulation System, a full-stack approach to tackle realistic articulated object operation, e.g. real-world doors, cabinets, drawers, and refrigerators in open-ended unstructured environments. The robot utilizes an adaptive learning framework to initially learn from a small set of data through behavior cloning, followed by learning from online practice on novel objects that fall outside the training distribution. We also develop a low-cost mobile manipulation hardware platform capable of safe and autonomous online adaptation in unstructured environments with a cost of around 20,000 USD. In our experiments we utilize 20 articulated objects across 4 buildings on the CMU campus. With less than an hour of online learning for each object, the system is able to increase success rate from 50% of BC pre-training to 95% using online adaptat
Innovation and standardization in 5G have brought advancements to every facet of the cellular architecture. This ranges from the introduction of new frequency bands and signaling technologies for the radio access network (RAN), to a core network underpinned by micro-services and network function virtualization (NFV). However, like any emerging technology, the pace of real-world deployments does not instantly match the pace of innovation. To address this discrepancy, one of the key aspects under continuous development is the RAN with the aim of making it more open, adaptive, functional, and easy to manage. In this paper, we highlight the transformative potential of embracing novel cellular architectures by transitioning from conventional systems to the progressive principles of Open RAN. This promises to make 6G networks more agile, cost-effective, energy-efficient, and resilient. It opens up a plethora of novel use cases, ranging from ubiquitous support for autonomous devices to cost-effective expansions in regions previously underserved. The principles of Open RAN encompass: (i) a disaggregated architecture with modular and standardized interfaces; (ii) cloudification, programmabi
Testing the aerodynamics of micro- and nano-UAVs without actually flying is highly challenging. To address this issue, we introduce Open Gimbal, a specially designed 3 Degrees of Freedom platform that caters to the unique requirements of micro- and nano-UAVs. This platform allows for unrestricted and free rotational motion, enabling comprehensive experimentation and evaluation of these UAVs. Our approach focuses on simplicity and accessibility. We developed an open-source, 3D printable electro-mechanical design that has minimal size and low complexity. This design facilitates easy replication and customization, making it widely accessible to researchers and developers. Addressing the challenges of sensing flight dynamics at a small scale, we have devised an integrated wireless batteryless sensor subsystem. Our innovative solution eliminates the need for complex wiring and instead uses wireless power transfer for sensor data reception. To validate the effectiveness of Open Gimbal, we thoroughly evaluate and test its communication link and sensing performance using a typical nano-quadrotor. Through comprehensive testing, we verify the reliability and accuracy of Open Gimbal in real-w
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
This chapter addresses emergent ethical issues in producing, using, curating, and providing services for open data. Our goal is to provide an introduction to how ethical topics in open data manifest in practical dilemmas for scholarly communications and some approaches to understanding and working through them. We begin with a brief overview of what can be thought of as three basic theories of ethics that intersect with dilemmas in openness, accountability, transparency, and fairness in data: Virtue, Consequential, and Non-consequential ethics. We then map these kinds of ethics to the practical questions that arise in provisioning infrastructures, providing services, and supporting sustainable research in science and scholarship that depends upon open access to data. Throughout, we attempt to offer concrete examples of potential ethical dilemmas facing scholarly communication with respect to open data, and try to make clear what kinds of ethical positions are helpful to practitioners. In doing so, we hope to both clarify the ethical questions facing librarians doing practical work to support open data access, as well as situate current debates in the field with respect to these thr
The fourth industrial revolution promotes the integration of Information Technology (IT) and strategic resources. New IT demands and uses have been leading to changes in business processes and corporate governance. Lately, the financial industry has adopted a new integrated banking model known as Open Banking (OB), and the advent of cryptocurrencies has led to the materialization of the Digital Economy (DE). Considering these facts, this paper aims to identify, through a literature review, some IT enabling factors that allow the conception of a new industry design (or governance), specifically in the financial industry, as illustrated by the cases of Open Banking and the Digital Economy. The paper is structured mainly around the literature review, followed by results, discussion, and conclusions. Five potential enabling factors were found. Keywords: Digital Economy, Information Technology (IT), Open Banking.
Recent works have shown that intricate cooperative behaviors can emerge in agents trained using meta-reinforcement learning on open-ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end we introduce a novel environment with an open-ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies which allow them to solve novel tasks never encountered duri
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in closed-set as the prelimin