搜索 — ResearchTracker

The data volumes stored in telescope archives is constantly increasing due to the development and improvements in the instrumentation. Often the archives need to be stored over a distributed storage architecture, provided by independent compute centres. Such a distributed data archive requires overarching data management orchestration. Such orchestration comprises of tools which handle data storage and cataloguing, and steering transfers integrating different storage systems and protocols, while being aware of data policies and locality. In addition, it needs a common Authorisation and Authentication Infrastructure (AAI) layer which is perceived as a single entity by end users and provides transparent data access. The scientific domain of particle physics also uses complex and distributed data management systems. The experiments at the Large Hadron Collider\,(LHC) accelerator at CERN generate several hundred petabytes of data per year. This data is globally distributed to partner sites and users using national compute facilities. Several innovative tools were developed to successfully address the distributed computing challenges in the context of the Worldwide LHC Computing Grid (W

SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors

arXiv2021-01-08作者：Kai Zhao, Sheng Di, Xin Liang

Efficient error-controlled lossy compressors are becoming critical to the success of today's large-scale scientific applications because of the ever-increasing volume of data produced by the applications. In the past decade, many lossless and lossy compressors have been developed with distinct design principles for different scientific datasets in largely diverse scientific domains. In order to support researchers and users assessing and comparing compressors in a fair and convenient way, we establish a standard compression assessment benchmark -- Scientific Data Reduction Benchmark (SDRBench). SDRBench contains a vast variety of real-world scientific datasets across different domains, summarizes several critical compression quality evaluation metrics, and integrates many state-of-the-art lossy and lossless compressors. We demonstrate evaluation results using SDRBench and summarize six valuable takeaways that are helpful to the in-depth understanding of lossy compressors.

搜索结果：Scientific data

Astronomical data organization, management and access in Scientific Data Lakes

SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors

ACL-Fig: A Dataset for Scientific Figure Classification

Assessing Scientific Contributions in Data Sharing Spaces

RADx Data Hub: A Cloud Platform for FAIR, Harmonized COVID-19 Data

Interoperability-oriented Quality Assessment for Czech Open Data

Amplify Initiative: Building A Localized Data Platform for Globalized AI

Augmenting Anonymized Data with AI: Exploring the Feasibility and Limitations of Large Language Models in Data Enrichment

Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data

Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Data-driven Summarization of Scientific Articles

Small Data Explainer -- The impact of small data methods in everyday life

Characterizing the Usefulness of Code Review Comments in Scientific Software for Software Quality and Scientific Rigor

On the Convergence of Federated Learning Algorithms without Data Similarity

Managing large-scale scientific hypotheses as uncertain and probabilistic data

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

Analysing Scientific Collaborations of New Zealand Institutions using Scopus Bibliometric Data

Technical report: Linking the scientific and clinical data with KI2NA-LHC

IVOA Recommendation: Data Model for Astronomical DataSet Characterisation