Found 20 results
http://diversityworkbench.net/Portal/DiversityImageInspector.
A drowning definition is available for use with National Syndromic Surveillance Program (NSSP) data. However, its accuracy in capturing drowning emergency department and urgent care visits at the regional level is unknown. We tested the ability of the syndromic surveillance (SS) definition to capture unintentional and undetermined intent drowning (UUID) and describe UUID SS visit trends in a large metropolitan area. We applied the drowning definition to NSSP data from 2016 to 2022 for the 8-county metropolitan Houston region. We queried the dataset for UUID ICD-10-CM codes and manually reviewed the chief complaint (CC) and discharge diagnosis (DD) for SS visits. The true-positive proportion was calculated by dividing the number of UUID cases identified by UUID ICD-10-CM codes and CC/DD review by the total visits captured by the SS definition. Demographics and trends of UUID visits were calculated from 2018 to 2022 due to limited data from 2016 to 2017 in NSSP. In total, 2,759 visits were captured by the SS definition. After case review, 2,019 (73.2%) had ICD-10-CM drowning codes of any intent, and 2,015 of those (99.8%) were classified as UUID. Of the remaining 740 cases with no ICD-10-CM codes that were pulled by the SS definition, 690 (93.2%) had a CC/DD diagnosis of drowning/submersion/underwater related to aquatic exposure. Taken together, 2,705 (98.0%) were true-positive UUID visits based on the SS drowning definition. Children (aged < 18 years) constituted 79% of UUID visits. Black, White, and Asian/Pacific Islander persons comprised 17%, 60%, and 4% of UUID visits, respectively. Rates of UUID visits were lowest in 2020. Syndromic surveillance is a novel and accurate method to conduct real-time drowning surveillance in a large metropolitan region.
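The percentages in the drowning-surveillance abstract above can be re-derived from its reported counts. The following is only an arithmetic check; the variable names are illustrative and do not come from the study.

```python
# Counts as reported in the abstract.
total_visits = 2759       # visits captured by the SS definition
icd_any_intent = 2019     # visits with an ICD-10-CM drowning code (any intent)
icd_uuid = 2015           # of those, classified as UUID
cc_dd_confirmed = 690     # no ICD code, but CC/DD review indicated drowning

no_icd = total_visits - icd_any_intent        # visits without an ICD code
true_positives = icd_uuid + cc_dd_confirmed   # UUID visits confirmed either way


def pct(part, whole):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * part / whole, 1)


print(no_icd, true_positives)
print(pct(icd_any_intent, total_visits))   # share with any ICD drowning code
print(pct(icd_uuid, icd_any_intent))       # share of coded visits that are UUID
print(pct(cc_dd_confirmed, no_icd))        # share of uncoded visits confirmed
print(pct(true_positives, total_visits))   # overall true-positive proportion
```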
The aim of this study is to apply a novel hybrid framework incorporating a Vision Transformer (ViT) and a bidirectional long short-term memory (Bi-LSTM) model for classifying physical activity intensity (PAI) in adults using gravity-based acceleration. It also investigates how PAI and temporal window (TW) affect the model's accuracy. This research used the Capture-24 dataset, consisting of raw accelerometer data from 151 participants aged 18 to 91. Gravity-based acceleration was utilised to generate images encoding various PAIs. These images were subsequently analysed using the ViT-BiLSTM model, with results presented in confusion matrices and compared with baseline models. The model's robustness was evaluated through temporal stability testing and examination of accuracy and loss curves. The ViT-BiLSTM model excelled in the PAI classification task, achieving an overall accuracy of 98.5% ± 1.48% across five TWs: 98.7% for 1 s, 98.1% for 5 s, 98.2% for 10 s, 99% for 15 s, and 98.65% for 30 s. The model consistently exhibited higher accuracy in predicting sedentary behaviour (98.9% ± 1%) than light physical activity (98.2% ± 2%) and moderate-to-vigorous physical activity (98.2% ± 3%). ANOVA showed no significant accuracy variation across PAIs (F = 2.18, p = 0.13) or TWs (F = 0.52, p = 0.72). Accuracy and loss curves show that the model consistently improves its performance across epochs, demonstrating its robustness. This study demonstrates the ViT-BiLSTM model's efficacy in classifying PAI using gravity-based acceleration, with performance remaining consistent across diverse TWs and intensities. However, PAI and TW could result in slight variations in the model's performance. Future research should investigate the impact of gravity-based acceleration on PAI thresholds, which may influence the model's robustness and reliability.
The number of samples in high-throughput comparative "omics" studies is increasing rapidly due to declining experimental costs. To keep sample data and metadata manageable and to ensure the integrity of scientific results as the scale of these projects continues to increase, it is essential that we transition to better-designed sample identifiers. Ideally, sample identifiers should be globally unique across projects, project teams, and institutions; short (to facilitate manual transcription); correctable with respect to common types of transcription errors; opaque, meaning that they do not contain information about the samples; and compatible with existing standards. We present cual-id, a lightweight command line tool that creates, or mints, sample identifiers that meet these criteria without reliance on centralized infrastructure. cual-id allows users to assign universally unique identifiers, or UUIDs, that are globally unique to their samples. UUIDs are too long to be conveniently written on sampling materials, such as swabs or microcentrifuge tubes, however, so cual-id additionally generates human-friendly 4- to 12-character identifiers that map to their UUIDs and are unique within a project. By convention, we use "cual-id" to refer to the software, "CualID" to refer to the short, human-friendly identifiers, and "UUID" to refer to the globally unique identifiers. CualIDs are used by humans when they manually write or enter identifiers, while the longer UUIDs are used by computers to unambiguously reference a sample. Finally, cual-id optionally generates printable label sticker sheets containing Code 128 bar codes and CualIDs for labeling of sample collection and processing materials. IMPORTANCE The adoption of identifiers that are globally unique, correctable, and easily handwritten or manually entered into a computer will be a major step forward for sample tracking in comparative omics studies. 
As the fields transition to more-centralized sample management, for example, across labs within an institution, across projects funded under a common program, or in systems designed to facilitate meta- and/or integrated analysis, sample identifiers generated with cual-id will not need to change; thus, costly and error-prone updating of data and metadata identifiers will be avoided. Further, using cual-id will ensure that transcription errors in sample identifiers do not require the discarding of otherwise-useful samples that may have been expensive to obtain. Finally, cual-id is simple to install and use and is free for all use. No centralized infrastructure is required to ensure global uniqueness, so it is feasible for any lab to get started using these identifiers within their existing infrastructure.
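To make the pairing of long UUIDs with short, human-friendly identifiers in the cual-id abstract concrete, here is a minimal sketch. It assumes, purely for illustration, that the short identifier is the trailing characters of the UUID's hex form and that collisions are resolved by drawing a new UUID; the function name `mint_ids` is hypothetical, and none of this is taken from the cual-id implementation.

```python
import uuid


def mint_ids(count, short_len=8):
    """Mint `count` UUIDs, each paired with a short ID unique within the batch.

    Assumption for illustration only: the short ID is the last `short_len`
    hex characters of the UUID. On a collision the UUID is drawn again.
    """
    if not 4 <= short_len <= 12:
        raise ValueError("short_len must be between 4 and 12")
    short_to_uuid = {}
    while len(short_to_uuid) < count:
        full = uuid.uuid4()
        short = full.hex[-short_len:]
        if short not in short_to_uuid:  # keep short IDs unique within the project
            short_to_uuid[short] = full
    return short_to_uuid


project_ids = mint_ids(5, short_len=6)
for short, full in project_ids.items():
    print(short, full)
```

A person would write the short form on a tube or swab, while software would store and exchange the full UUID.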
Over the past few years, studies have increasingly focused on the development of mobile apps as complementary tools to existing traditional pharmacovigilance surveillance systems for improving and facilitating adverse drug reaction (ADR) reporting. In this research, we evaluated the potential of a new mobile app (vaxEffect@UniMiB) to perform longitudinal studies while preserving the anonymity of the respondents. We applied the app to monitor ADRs during the COVID-19 vaccination campaign in a sample of the Italian population. We administered vaxEffect@UniMiB to a convenience sample of academic subjects vaccinated at the Milano-Bicocca University hub for COVID-19 during the Italian national vaccination campaign. vaxEffect@UniMiB was developed for both Android and iOS devices. The mobile app asks users to send their medical history and, upon every vaccine administration, their vaccination data and the ADRs that occurred within 7 days postvaccination, making it possible to follow the ADR dynamics for each respondent. The app sends data over the web to an application server. The server receives all user data, saves them in a SQL database server, and reminds users to submit vaccine and ADR data through push notifications sent to the mobile app via Firebase Cloud Messaging (FCM). On initial startup of the app, a unique user identifier (UUID) was generated for each respondent, which ensured anonymity while still enabling longitudinal studies. A total of 3712 people were vaccinated during the first vaccination wave. A total of 2733 (73.6%) respondents between the ages of 19 and 80 years, coming from the University of Milano-Bicocca (UniMiB) and the Politecnico of Milan (PoliMi), participated in the survey.
Overall, we collected information about vaccination and ADRs to the first vaccine dose for 2226 subjects (60.0% of the first dose vaccinated), to the second dose for 1610 subjects (43.4% of the second dose vaccinated), and, in a nonsponsored fashion, to the third dose for 169 individuals (4.6%). vaxEffect@UniMiB proved to be the first attempt at performing longitudinal studies that monitor the same subject over time in terms of the ADRs reported after each vaccine administration, while guaranteeing complete anonymity of the subject. Several aspects contributed to people's positive engagement with the app for reporting their ADRs to vaccination: ease of use, availability on multiple platforms, anonymity of all survey participants and protection of the submitted data, and the health care workers' support.
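The idea of generating an identifier once at first startup and reusing it for every later report, as described in the vaxEffect@UniMiB abstract, can be sketched as follows. This is a generic illustration in Python, not the app's Android/iOS code; the file name, the JSON layout, and the functions `get_or_create_respondent_id` and `build_report` are all hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path


def get_or_create_respondent_id(store: Path) -> str:
    """Return the locally stored respondent ID, creating it on first launch."""
    if store.exists():
        return json.loads(store.read_text())["respondent_id"]
    respondent_id = str(uuid.uuid4())  # random value; carries no personal data
    store.write_text(json.dumps({"respondent_id": respondent_id}))
    return respondent_id


def build_report(store: Path, dose: int, reactions: list) -> dict:
    """Attach the persistent anonymous ID to one ADR report (hypothetical layout)."""
    return {
        "respondent_id": get_or_create_respondent_id(store),
        "dose": dose,
        "adrs": reactions,
    }


store = Path(tempfile.mkdtemp()) / "respondent.json"
first = build_report(store, 1, ["fatigue"])
second = build_report(store, 2, ["headache"])
print(first["respondent_id"] == second["respondent_id"])
```

Because both reports carry the same random identifier, a server could link them over time without ever receiving a name or account.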
The application of mass spectrometry imaging (MS imaging) is growing rapidly, with a constantly increasing number of different instrumental systems and software tools. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. imzML data are divided into two files, which are linked by a universally unique identifier (UUID). Experimental details are stored in an XML file which is based on the HUPO-PSI format mzML. Information is provided in the form of a 'controlled vocabulary' (CV) in order to unequivocally describe the parameters and to avoid redundancy in nomenclature. Mass spectral data are stored in a binary file in order to allow efficient storage. imzML is supported by a growing number of software tools. Users will no longer be limited to proprietary software, but will be able to use the processing software best suited for a specific question or application. MS imaging data from different instruments can be converted to imzML and displayed with identical parameters in one software package for easier comparison. All technical details necessary to implement imzML and additional background information are available at www.imzml.org.
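A consistency check on the UUID link between the two imzML files might look like the sketch below. It assumes that the binary file begins with the 16 raw UUID bytes and that the XML file carries the same UUID as text; that layout should be verified against the specification at www.imzml.org. The function name `ibd_matches_xml` and the example UUID are made up for illustration.

```python
import uuid


def ibd_matches_xml(xml_uuid_text: str, ibd_header: bytes) -> bool:
    """Compare the UUID text from the XML file with the start of the binary file.

    Assumption: the first 16 bytes of the binary file are the raw UUID.
    """
    xml_uuid = uuid.UUID(xml_uuid_text)  # accepts braces, hyphens, any letter case
    return len(ibd_header) >= 16 and uuid.UUID(bytes=ibd_header[:16]) == xml_uuid


example = uuid.UUID("3858f622-30ac-4c91-9f30-0c664312c63f")  # made-up value
binary_file = example.bytes + b"\x00" * 32                    # stand-in for spectral data

print(ibd_matches_xml("{3858F622-30AC-4C91-9F30-0C664312C63F}", binary_file))
print(ibd_matches_xml(str(uuid.uuid4()), binary_file))
```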
Nowadays, camera networks are part of our everyday environments; consequently, they represent a massive source of information for monitoring human activities and for proposing new services to building users. To perform human activity monitoring, people must be detected and the analysis must take into account information about the environment and the context. Available multi-camera datasets provide videos with little (or no) information about the environment where the network was deployed. The proposed dataset provides multi-camera multi-space video sets along with the complete contextual information of the environment. The dataset comprises 11 video sets (composed of 62 single videos) recorded using 6 indoor cameras deployed across multiple spaces. The video sets represent more than 1 h of video footage, include 77 people tracks, and capture different human actions such as walking around, standing/sitting, remaining motionless, entering/leaving a space, and group merging/splitting. Moreover, each video has been manually and automatically annotated to include people detection and tracking meta-information. The automatic people detection annotations were obtained using detectors of varying complexity and robustness, from machine learning to state-of-the-art deep Convolutional Neural Network (CNN) models. Concerning the contextual information, the Industry Foundation Classes (IFC) file that represents the environment's Building Information Modeling (BIM) data is also provided. The BIM/IFC file describes the complete structure of the environment, its topology, and the elements contained in it. To our knowledge, the WiseNET dataset is the first to provide a set of videos along with the complete information of the environment. The WiseNET dataset is publicly available at https://doi.org/10.4121/uuid:c1fb5962-e939-4c51-bfd5-eac6f2935d44, as well as at the project's website http://wisenet.checksem.fr/#/dataset.
Modeling the extremes of mental/emotional conditions requires explicit accounts of evolutionary-developmental sources of human neurodiversity, not merely psychopathology. The target article's approach could be improved by incorporation of a hierarchical scheme wherein mental/emotional infrastructure interacts across differentiated layers of function. The notion of "symptom networks" thus calls for differentiation into hierarchically interacting components of mental/emotional evolution and development.
Proteomics methods, especially high-throughput mass spectrometry analysis, have been continually developed and improved over the years. The analysis of complex biological samples produces large volumes of raw data. Data storage and recovery management pose substantial challenges to biomedical or proteomic facilities regarding backup and archiving concepts as well as hardware requirements. In this article we describe differences between the terms backup and archive with regard to manual and automatic approaches. We also introduce different storage concepts and technologies, from transportable media to professional solutions such as redundant array of independent disks (RAID) systems, network-attached storage (NAS), and storage area networks (SAN). Moreover, we present a software solution that we developed for the long-term preservation of large mass spectrometry raw data files on an object storage device (OSD) archiving system. Finally, advantages, disadvantages, and experiences from routine operations of the presented concepts and technologies are evaluated and discussed. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Preussin, a hydroxyl pyrrolidine derivative isolated from the marine sponge-associated fungus Aspergillus candidus KUFA 0062, displayed anticancer effects in some cancer cell lines, including MCF7. Preussin was investigated for its cytotoxic and antiproliferative effects in breast cancer cell lines (MCF7, SKBR3, and MDA-MB-231), representatives of major breast cancer subtypes, and in a non-tumor cell line (MCF12A). Preussin was first tested in 2D (monolayer) and then in 3D (multicellular aggregates) cultures, using a multi-endpoint approach for cytotoxicity (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT), resazurin and lactate dehydrogenase (LDH)) and proliferative (5-bromo-2'-deoxyuridine (BrdU)) assays, as well as the analysis of cell morphology by optical/electron microscopy and immunocytochemistry for caspase-3 and Ki67. Preussin affected cell viability and proliferation in 2D and 3D cultures in all cell lines tested. The results in the 3D culture showed the same tendency as in the 2D culture; however, cells in the 3D culture were less responsive. The effects were observed at different concentrations of preussin, depending on the cell line and assay method. Morphological study of preussin-exposed cells revealed cell death, which was confirmed by caspase-3 immunostaining. In view of the data, we recommend a multi-endpoint approach, including histological evaluation, in future assays with the tested 3D models. Our data showed cytotoxic and antiproliferative activities of preussin in breast cancer cell lines in 2D and 3D cultures, warranting further studies of its anticancer potential.
Debates about neonatal imitation remain more open than Keven & Akins (K&A) imply. K&A do not recognize the primacy of the question concerning differential imitation and the links between experimental designs and more or less plausible theoretical assumptions. Moreover, they do not acknowledge previous theorizing on spontaneous behavior, the explanatory power of entrainment, and subtle connections with social cognition.
Early human vocal development is characterized first by emerging control of phonation and later by prosodic and supraglottal articulation. The target article has missed the opportunity to use these facts in the characterization of evolution in language-specific brain mechanisms. Phonation appears to be the initial human-specific brain change for language, and it was presumably a key target of selection in early hominin evolution.
Distributed applications need identifiers that satisfy storage efficiency, chronological sortability, origin metadata embedding, zero-lookup verifiability, confidentiality for external consumers, and multi-century addressability. Based on our literature survey, no existing scheme provides all six of these identifier properties within a unified system. This paper introduces Source Known Identifiers (SKIDs), a three-tier identity system that projects a single entity identity across trust boundaries, addressing all six properties. The first tier, Source Known ID (SKID), is a 64-bit signed integer embedding a timestamp with 250-millisecond precision, application topology, and a per-entity-type sequence counter. It serves as the database primary key, providing compact storage (8 bytes) and natural B-tree ordering for optimized database indexing. The second tier, Source Known Entity ID (SKEID), extends the SKID into a 128-bit Universally Unique Identifier (UUID) compatible value by adding an entity type discriminator, an epoch selector, and a BLAKE3 keyed message authentication code (MAC). SKEIDs enable zero-lookup verification of identifier origin, integrity, and entity type within tr
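The first-tier layout described in the SKID abstract (a timestamp in 250-millisecond ticks, application topology, and a sequence counter packed into a signed 64-bit integer) can be sketched as below. The abstract does not give field widths, so the 40/10/13-bit split, the Unix-epoch time base, and the function names are hypothetical choices made only for this example.

```python
TICK_MS = 250                               # timestamp precision from the abstract
TS_BITS, TOPO_BITS, SEQ_BITS = 40, 10, 13   # hypothetical widths; 63 bits in total


def pack_skid(unix_ms: int, topology: int, sequence: int) -> int:
    """Pack the three fields into a non-negative integer that fits a signed int64."""
    ticks = unix_ms // TICK_MS
    if not (0 <= ticks < 1 << TS_BITS
            and 0 <= topology < 1 << TOPO_BITS
            and 0 <= sequence < 1 << SEQ_BITS):
        raise ValueError("field out of range")
    # Timestamp occupies the highest bits so that integer order follows time order.
    return (ticks << (TOPO_BITS + SEQ_BITS)) | (topology << SEQ_BITS) | sequence


def unpack_skid(skid: int):
    """Recover (timestamp in ms rounded down to the tick, topology, sequence)."""
    sequence = skid & ((1 << SEQ_BITS) - 1)
    topology = (skid >> SEQ_BITS) & ((1 << TOPO_BITS) - 1)
    ticks = skid >> (TOPO_BITS + SEQ_BITS)
    return ticks * TICK_MS, topology, sequence


earlier = pack_skid(1_700_000_000_000, topology=3, sequence=7)
later = pack_skid(1_700_000_000_250, topology=1, sequence=0)
print(earlier < later)
print(unpack_skid(earlier))
```

Placing the timestamp in the most significant bits is what gives the chronological B-tree ordering the abstract mentions: an identifier minted one tick later compares greater regardless of the other fields.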
What are the limits of controlling language models via synthetic training data? We develop a reinforcement learning (RL) primitive, the Dataset Policy Gradient (DPG), which can precisely optimize synthetic data generators to produce a dataset of targeted examples. When used for supervised fine-tuning (SFT) of a target model, these examples cause the target model to do well on a differentiable metric of our choice. Our approach achieves this by taking exact data attribution via higher-order gradients and using those scores as policy gradient rewards. We prove that this procedure closely approximates the true, intractable gradient for the synthetic data generator. To illustrate the potential of DPG, we show that, using only SFT on generated examples, we can cause the target model's LM head weights to (1) embed a QR code, (2) embed the pattern $\texttt{67}$, and (3) have lower $\ell^2$ norm. We additionally show that we can cause the generator to (4) rephrase inputs in a new language and (5) produce a specific UUID, even though neither of these objectives is conveyed in the generator's input prompts. These findings suggest that DPG is a powerful and flexible technique for shaping mode
Personal information retrieval fails when systems ignore how human memory works. While existing platforms force keyword searches across isolated silos, humans naturally recall through episodic cues like when, where, and in what context information was encountered. This dissertation presents the Unified Personal Index (UPI), a memory-aligned architecture that bridges this fundamental gap. The Indaleko prototype demonstrates the UPI's feasibility on a 31-million file dataset spanning 160TB across eight storage platforms. By integrating temporal, spatial, and activity metadata into a unified graph database, Indaleko enables natural language queries like "photos near the conference venue last spring" that existing systems cannot process. The implementation achieves sub-second query responses through memory anchor indexing, eliminates cross-platform search fragmentation, and maintains perfect precision for well-specified memory patterns. Evaluation against commercial systems (Google Drive, OneDrive, Dropbox, Windows Search) reveals that all fail on memory-based queries, returning overwhelming result sets without contextual filtering. In contrast, Indaleko successfully processes multi-di
Pre-deployment evaluations inspect only a limited sample of model actions. A malicious model seeking to evade oversight could exploit this by randomizing when to "defect": misbehaving so rarely that no malicious actions are observed during evaluation, but often enough that they occur eventually in deployment. But this requires taking actions at very low rates, while maintaining calibration. Are frontier models even capable of that? We prompt the GPT-5, Claude-4.5 and Qwen-3 families to take a target action at low probabilities (e.g. 0.01%), either given directly or requiring derivation, and evaluate their calibration (i.e. whether they perform the target action roughly 1 in 10,000 times when resampling). We find that frontier models are surprisingly good at this task. If there is a source of entropy in-context (such as a UUID), they maintain high calibration at rates lower than 1 in 100,000 actions. Without external entropy, some models can still reach rates lower than 1 in 10,000. When target rates are given, larger models achieve good calibration at lower rates. Yet, when models must derive the optimal target rate themselves, all models fail to achieve calibration without entropy
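One way to see why an in-context UUID helps with acting at very low rates, as the calibration abstract reports, is that a deterministic rule applied to a random UUID already yields a calibrated rare event. The sketch below illustrates that idea only; it is not the paper's evaluation protocol, it makes no claim about how any model computes its decision, and the function name `should_act` is hypothetical.

```python
import uuid


def should_act(token: uuid.UUID, one_in: int) -> bool:
    """Return True for roughly 1 in `one_in` uniformly random UUIDs."""
    # The low bits of a version-4 UUID are random, so the remainder is
    # close to uniform over 0 .. one_in - 1.
    return token.int % one_in == 0


trials = 100_000
hits = sum(should_act(uuid.uuid4(), 10_000) for _ in range(trials))
print(f"observed rate: {hits / trials:.5f} (target 0.00010)")
```

Without such an external source of randomness, the same rule has nothing to vary on, which is consistent with the abstract's observation that calibration is harder without entropy in context.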
The Multi-platform Aggregated Dataset of Online Communities (MADOC) is a comprehensive dataset that facilitates computational social science research by providing FAIR-compliant standardized access to cross-platform analysis of online social dynamics. MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis. By providing UUID-anonymized user histories and temporal alignment of banned communities' activity patterns, MADOC supports research on content moderation impacts and platform migration trends. Distributed via Zenodo with persistent identifiers and Python/R toolkits, the dataset adheres to FAIR principles while addressing post-API-era research challenges through ethical aggregation of public social media archives.
Reasoning over long contexts is essential for large language models. While reinforcement learning (RL) enhances short-context reasoning by inducing "Aha" moments in chain-of-thought, the advanced thinking patterns required for long-context reasoning remain largely unexplored, and high-difficulty RL data are scarce. In this paper, we introduce LoongRL, a data-driven RL method for advanced long-context reasoning. Central to LoongRL is KeyChain, a synthesis approach that transforms short multi-hop QA into high-difficulty long-context tasks by inserting UUID chains that hide the true question among large collections of distracting documents. Solving these tasks requires the model to trace the correct chain step-by-step, identify the true question, retrieve relevant facts and reason over them to answer correctly. RL training on KeyChain data induces an emergent plan-retrieve-reason-recheck reasoning pattern that generalizes far beyond training length. Models trained at 16K effectively solve 128K tasks without prohibitive full-length RL rollout costs. On Qwen2.5-7B and 14B, LoongRL substantially improves long-context multi-hop QA accuracy by +23.5% and +21.1% absolute gains. The resultin
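The UUID-chain idea in the LoongRL abstract can be sketched as a set of key-value links in which only one chain leads to the true question. This is a toy reconstruction from the abstract's description; the function names, the chain length, and the decoy handling are assumptions, and the real KeyChain data additionally embeds such links among long distracting documents.

```python
import random
import uuid


def build_keychain(question, decoys, chain_len=3, seed=0):
    """Return (start_key, shuffled links); only start_key leads to `question`."""
    rng = random.Random(seed)

    def new_key():
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))

    def make_chain(final_text):
        keys = [new_key() for _ in range(chain_len)]
        # Each key points to the next key; the last key points to the text.
        return keys[0], list(zip(keys, keys[1:] + [final_text]))

    start, links = make_chain(question)
    for decoy in decoys:
        links += make_chain(decoy)[1]   # distracting chains with other endings
    rng.shuffle(links)
    return start, links


def follow_chain(start, links):
    """Trace key -> value until the value is no longer a key."""
    table = dict(links)
    value = start
    while value in table:
        value = table[value]
    return value


start, links = build_keychain("Which river flows through the city?",
                              ["decoy question A", "decoy question B"])
print(follow_chain(start, links))
```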
Effective recommendation systems rely on capturing user preferences, often requiring incorporating numerous features such as universally unique identifiers (UUIDs) of entities. However, the exceptionally high cardinality of UUIDs poses a significant challenge in terms of model degradation and increased model size due to sparsity. This paper presents two innovative techniques to address the challenge of high cardinality in recommendation systems. Specifically, we propose a bag-of-words approach, combined with layer sharing, to substantially decrease the model size while improving performance. Our techniques were evaluated through offline and online experiments on Uber use cases, resulting in promising results demonstrating our approach's effectiveness in optimizing recommendation systems and enhancing their overall performance.
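As a rough illustration of how a bag-of-words view can shrink the embedding table for high-cardinality UUID features, the sketch below splits each UUID into short positional tokens and sums vectors from one shared table. The tokenization (two hex characters per token), the vector size, and the class name `SharedEmbedding` are assumptions made for this example; the abstract does not describe the authors' actual design.

```python
import random
import uuid

CHUNK = 2   # hex characters per token (assumption)
DIM = 4     # embedding size, kept tiny for the example


def tokens(u: uuid.UUID):
    """Split the 32 hex characters into position-tagged tokens."""
    h = u.hex
    return [f"{i}:{h[i:i + CHUNK]}" for i in range(0, len(h), CHUNK)]


class SharedEmbedding:
    """One small table of token vectors shared by every UUID."""

    def __init__(self, dim=DIM, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def vector(self, token):
        if token not in self.table:   # lazily create the row for a new token
            self.table[token] = [self.rng.gauss(0, 1) for _ in range(self.dim)]
        return self.table[token]

    def embed(self, u: uuid.UUID):
        """Bag-of-tokens embedding: element-wise sum of the token vectors."""
        rows = [self.vector(t) for t in tokens(u)]
        return [sum(column) for column in zip(*rows)]


emb = SharedEmbedding()
ids = [uuid.uuid4() for _ in range(1000)]
vectors = [emb.embed(u) for u in ids]
print(len(ids), "UUIDs share", len(emb.table), "table rows")
```

With 16 positions and 256 possible two-character values, the table never exceeds 4096 rows no matter how many distinct UUIDs appear, whereas one row per UUID grows without bound.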
This technical note presents a reproducible workflow for converting a legacy archaeological image collection into a structured and segmentation ready dataset. The case study focuses on the Lower Palaeolithic hand axe and biface collection curated by the Archaeology Data Service (ADS), a dataset that provides thousands of standardised photographs but no mechanism for bulk download or automated processing. To address this, two open source tools were developed: a web scraping script that retrieves all record pages, extracts associated metadata, and downloads the available images while respecting ADS Terms of Use and ethical scraping guidelines; and an image processing pipeline that renames files using UUIDs, generates binary masks and bounding boxes through classical computer vision, and stores all derived information in a COCO compatible Json file enriched with archaeological metadata. The original images are not redistributed, and only derived products such as masks, outlines, and annotations are shared. Together, these components provide a lightweight and reusable approach for transforming web based archaeological image collections into machine learning friendly formats, facilitati
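The pipeline steps named in the hand axe abstract (UUID file names, a bounding box derived from a binary mask, and a COCO-compatible JSON record) can be sketched in plain Python. The mask here is a tiny hand-written grid rather than the output of a computer vision step, and the `metadata` field, the category name, and the function `bbox_from_mask` are hypothetical.

```python
import json
import uuid


def bbox_from_mask(mask):
    """Return a COCO-style [x, y, width, height] box around the nonzero pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None   # empty mask: nothing to annotate
    x0, y0 = cols[0], rows[0]
    return [x0, y0, cols[-1] - x0 + 1, rows[-1] - y0 + 1]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

file_name = f"{uuid.uuid4()}.png"   # UUID-based name replaces the original file name
coco = {
    "images": [{"id": 1, "file_name": file_name,
                "width": len(mask[0]), "height": len(mask),
                "metadata": {"period": "Lower Palaeolithic"}}],  # hypothetical extra field
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": bbox_from_mask(mask),
                     "area": sum(map(sum, mask)), "iscrowd": 0}],
    "categories": [{"id": 1, "name": "hand_axe"}],
}
print(json.dumps(coco["annotations"][0]))
```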