Found 20 results
This paper presents UPV_RIR_DB, a structured database of measured room impulse responses (RIRs) designed to provide acoustic data with explicit spatial metadata and traceable acquisition parameters. The dataset currently contains 166 multichannel RIR files measured in three rooms of the Universitat Politècnica de València (UPV). Each multichannel RIR file contains impulse responses for multiple source-receiver pairs, with each pair covering a 25 cm² area, the typical size of a personal sound zone. Considering the number of sources and receiver channels associated with each microphone modality, the database contains a total of 18,976 single impulse responses. A hierarchical organization is adopted in which directory structure and metadata jointly describe the measurement context. Each room includes a metadata file containing acquisition parameters, hardware description, spatial coordinates of zones and microphones, and acoustic indicators such as reverberation time. A central index links each RIR file with its experimental context, ensuring traceability and enabling reproducible analysis. The resulting database provides a consistent framework for storing, inspecting, and reusing re
The successful integration of high-temperature superconductors (HTS) into modern technologies requires consistent, accessible, and comprehensive material data, a need that is currently unmet due to the fragmented and incomplete nature of existing resources. This paper introduces a new collaborative, open-access database specifically designed to address this gap by providing standardized data on HTS materials and crucial auxiliary components for HTS applications. The database encompasses extensive data on structural, cryogenic, electrical, magnetic, and superconducting materials, supporting diverse requirements from HTS modelling to magnet design. Developed through collaborative efforts and organized using an ontology-driven data model, this platform is dynamically adaptable, ensuring that it can grow as new materials and data emerge. Key features include user-driven contributions, peer-reviewed data validation, and advanced filtering capabilities for efficient data retrieval. To the authors' knowledge, this database is the largest publicly available resource for material properties of HTS technologies, and it is positioned as a valuable tool for the HTS community, promoting mor
The use of the iris as a biometric identifier has increased dramatically over the last 30 years, prompting privacy and security concerns about the use of iris images in research. It can be difficult to acquire iris image databases due to ethical concerns, and this can be a barrier for those performing biometrics research. In this paper, we describe and show how to create a database of realistic, biometrically unidentifiable colored iris images by training a diffusion model within an open-source diffusion framework. We were able to verify not only that our model is capable of creating iris textures that are biometrically distinct from the training data, but also that its output covers a full distribution of realistic iris pigmentations. We highlight that the ability of diffusion networks to meet these criteria with relative ease warrants additional research into their use for iris database generation and presentation attack security.
Good database design is crucial to obtain a sound, consistent database, and - in turn - good database design methodologies are the best way to achieve the right design. These methodologies are taught to most Computer Science undergraduates, as part of any Introduction to Database class. They can be considered part of the "canon", and indeed, the overall approach to database design has been unchanged for years. Moreover, none of the major database research assessments identify database design as a strategic research direction. Should we conclude that database design is a solved problem? Our thesis is that database design remains a critical unsolved problem. Hence, it should be the subject of more research. Our starting point is the observation that traditional database design is not used in practice - and if it were used it would result in designs that are not well adapted to current environments. In short, database design has failed to keep up with the times. In this paper, we put forth arguments to support our viewpoint, analyze the root causes of this situation and suggest some avenues of research.
The international database community refers to the manipulation of data with inaccuracy and uncertainty using the term fuzzy, which has been translated into Spanish as "borroso" and into French as "flou". Semantically, this term conveys two main ideas: first, the natural concept of ambiguity or vagueness in human reasoning, and second, its connection to fuzzy set theory, fuzzy logic, and possibility theory, as developed by Zadeh between 1965 and 1977. This article explores two key aspects: the attributes of the fuzzy data model GEFRED (GENeralized model for Fuzzy RElational Database) and their implementation in a Relational Database (RDB). The modeling of these attributes was conducted in a Chilean cardboard manufacturing company located in the Maule Region, where the described phenomena involve imprecise and uncertain attributes and values. Specifically, our focus is on the knowledge related to the manufacturing process of coated cardboard, particularly the quality control process for finished products in the company's Conversion Department. The quality of these products, categorized as either stacks or rolls, is characterized using both classical and fuzzy attributes. Classical a
Large language models (LLMs) have become essential for applications such as text summarization, sentiment analysis, and automated question-answering. Recently, LLMs have also been integrated into relational database management systems to enhance querying and support advanced data processing. Companies such as Amazon, Databricks, Google, and Snowflake offer LLM invocation directly within SQL, denoted as LLM queries, to boost data insights. However, open-source solutions currently have limited functionality and poor performance. In this work, we present an early exploration of two open-source systems and one enterprise platform, using five representative queries to expose functional, performance, and scalability limits in today's SQL-invoked LLM integrations. We identify three main issues: enforcing structured outputs, optimizing resource utilization, and improving query planning. We implemented initial solutions and observed improvements in accommodating LLM-powered SQL queries. These early gains suggest that tighter LLM+DBMS integration is key to scalable and efficient processing of LLM queries.
The growing reliance on data-driven decision-making highlights the need for more intuitive ways to access and analyze information stored in relational databases. However, the requirement of SQL knowledge has long been a significant barrier for non-technical users. This article introduces an innovative solution that leverages Generative AI to bridge this gap, enabling users to query databases using natural language. Our approach automatically translates natural language queries into SQL, ensuring both syntactic and semantic correctness, while also generating clear, natural language responses from the retrieved data. By streamlining the interaction between users and databases, this method empowers individuals without technical expertise to engage with data directly and efficiently, democratizing access to valuable insights and enhancing productivity.
The popularity of mobile databases is increasing day by day, as people need information even while on the move in a fast-changing world. This database technology permits employees using mobile devices to connect to their corporate networks, hoard the needed data, work in disconnected mode, and reconnect to the network to synchronize with the corporate database. In this scenario, the data is moved closer to the applications in order to improve performance and autonomy. This leads to many interesting problems in mobile database research, and mobile databases have become fertile ground for many researchers. This paper presents a survey of data and transaction management in mobile databases from the year 2000 onwards. The survey focuses on the various types of architectures used in mobile databases and on mobile transaction models. It also addresses data management issues, namely replication and caching strategies, and transaction management functionalities such as concurrency control and commit protocols, synchronization, query processing, recovery, and security. It also provides research directions in mobile databases.
Digital multimedia watermarking technology was suggested in the last decade to embed copyright information in digital objects such as images, audio, and video. However, the increasing use of relational database systems in many real-life applications has created an ever-increasing need for watermarking database systems. As a result, watermarking relational database systems is now emerging as a research area that deals with the legal issue of copyright protection of database systems. Approach: In this study, we propose an efficient database watermarking algorithm based on inserting binary image watermarks in non-numeric multi-word attributes of selected database tuples. Results: The algorithm is robust, as it resists attempts to remove or degrade the embedded watermark, and it is blind, as it does not require the original database in order to extract the embedded watermark. Conclusion: Experimental results demonstrated the blindness and robustness of the algorithm against common database attacks.
Graph databases (GDB) have recently arisen to overcome the limits of traditional databases for storing and managing data with graph-like structure. Today, they are a requirement for many applications that manage graph-like data, such as social networks. Most of the techniques applied to optimize queries in graph databases were first used in traditional databases and distributed systems, or are inspired by graph theory. However, their reuse in graph databases should take into account the main characteristics of graph databases, such as dynamic structure, highly interconnected data, and the ability to efficiently access data relationships. In this paper, we survey the query optimization techniques in graph databases. In particular, we focus on the features they have introduced to improve the querying of graph-like data.
JIT (Just-in-Time) technology has garnered significant attention for improving the efficiency of database execution. It offers higher performance than traditional execution engines by eliminating interpretation overhead. LLVM serves as the primary JIT architecture and has been used in PostgreSQL since version 11. However, recent advancements in WASM-based databases, such as Mutable, present an alternative JIT approach. This approach minimizes the extensive engineering effort associated with the execution engine and focuses on optimizing supported operators for lower latency and higher throughput. In this paper, we perform comprehensive experiments on these two representative open-source databases to gain deeper insights into the effectiveness of different JIT architectures.
We investigate the query evaluation problem for fixed queries over fully dynamic databases, where tuples can be inserted or deleted. The task is to design a dynamic algorithm that immediately reports the new result of a fixed query after every database update. We consider queries in first-order logic (FO) and its extension with modulo-counting quantifiers (FO+MOD), and show that they can be efficiently evaluated under updates, provided that the dynamic database does not exceed a certain degree bound. In particular, we construct a data structure that makes it possible to answer a Boolean FO+MOD query and to compute the size of the result of a non-Boolean query within constant time after every database update. Furthermore, after every update we are able to immediately enumerate the new query result with constant delay between the output tuples. The time needed to build the data structure is linear in the size of the database. Our results extend earlier work on the evaluation of first-order queries on static databases of bounded degree and rely on an effective Hanf normal form for FO+MOD recently obtained by Heimberg, Kuske, and Schweikardt (LICS 2016).
This discussion was conducted at a recent panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023), held April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM (ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was moderated by Lei Chen and Xiaochun Yang. The discussion raised several questions on how large language models (LLMs) and database research and education can help each other and the potential risks of LLMs.
Machine learning is increasingly being used in database research to improve the effectiveness of numerous tasks, including but not limited to query optimization, workload scheduling, and physical design. Currently, the research focus has been on replacing a single database component responsible for one task with its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning-based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?". To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query
This paper presents a simple data dump and load utility for Firebird databases which mimics mysqldump in MySQL. The utility consists of fb_dump and fb_load, for dumping and loading respectively; it retrieves each database table using kinterbasdb and serializes the data using the marshal module. This utility has two advantages over the standard Firebird database backup utility, gbak. Firstly, it is able to back up and restore single database tables, which might help to recover corrupted databases. Secondly, the output is in text-coded format (from the marshal module), making it more resilient than a compressed text backup, as in the case of using gbak.
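The per-table dump-and-load idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the fb_dump/fb_load code: it uses Python's standard `sqlite3` module as a stand-in for a Firebird connection through kinterbasdb, and the function names `dump_table` and `load_table` are hypothetical. Note that `marshal.dumps` returns a bytes object.

```python
import marshal
import sqlite3  # stand-in for a Firebird driver; an assumption of this sketch


def dump_table(conn, table):
    """Serialize all rows of a single table with marshal (hypothetical helper)."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cur = conn.execute('SELECT * FROM "%s"' % table)
    columns = [desc[0] for desc in cur.description]
    rows = [tuple(row) for row in cur.fetchall()]
    return marshal.dumps({"table": table, "columns": columns, "rows": rows})


def load_table(conn, payload):
    """Insert rows from a marshal payload into an existing table (hypothetical helper)."""
    data = marshal.loads(payload)
    table = data["table"]
    column_list = ", ".join('"%s"' % c for c in data["columns"])
    placeholders = ", ".join("?" for _ in data["columns"])
    sql = 'INSERT INTO "%s" (%s) VALUES (%s)' % (table, column_list, placeholders)
    conn.executemany(sql, data["rows"])
    conn.commit()
    return len(data["rows"])
```

Because each payload covers exactly one table, a single table can be restored without touching the rest of the database. One caveat of this design is that the marshal format is specific to the Python version, so a dump should be loaded with the same interpreter version that wrote it.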
High-quality vibrational spectra of solid-phase molecules in ice mixtures and at temperatures of astrophysical relevance are needed to interpret infrared observations toward protostars and background stars. Over the last 25 years, the Laboratory for Astrophysics at Leiden Observatory has provided more than 1100 spectra of diverse ice samples. Coinciding with the recent launch of the James Webb Space Telescope, we have fully upgraded the Leiden Ice Database for Astrochemistry (LIDA) by adding recently measured spectra. The goal of this manuscript is to describe what options exist to access and work with a large collection of IR spectra, the UV/vis to mid-infrared refractive index of H2O ice, and astronomy-oriented online tools that support the interpretation of IR ice observations. LIDA uses Flask and Bokeh for generating the web pages and graph visualization, respectively, SQL for searching ice analogues within the database, and Jmol for 3D molecule visualization. The infrared data in the database are recorded via transmission spectroscopy of ice films condensed on cryogenic substrates. The real UV/vis refractive indices of H2O ice are derived from interference fringes created fro
Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680x3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360 video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditor
To provide insight into cloth perception and manipulation with an active binocular robotic vision system, we compiled and released a database of 80 stereo-pair colour images with corresponding horizontal and vertical disparity maps and mask annotations for 3D garment point cloud rendering. The stereo-image garment database is part of research conducted under the EU-FP7 Clothes Perception and Manipulation (CloPeMa) project and belongs to a wider database collection released through CloPeMa (www.clopema.eu). This database is based on 16 different off-the-shelf garments. Each garment has been imaged in five different pose configurations on the project's binocular robot head. A full copy of the database is made available for scientific research only at https://sites.google.com/site/ugstereodatabase/.
This is an exciting era for exo-planetary exploration. The recently launched JWST, and other upcoming space missions such as Ariel, Twinkle and ELTs are set to bring fresh insights to the convoluted processes of planetary formation and evolution and its connections to atmospheric compositions. However, with new opportunities come new challenges. The field of exoplanet atmospheres is already struggling with the incoming volume and quality of data, and machine learning (ML) techniques lend themselves as a promising alternative. Developing techniques of this kind is an inter-disciplinary task, one that requires domain knowledge of the field, access to relevant tools and expert insights on the capability and limitations of current ML models. These stringent requirements have so far limited the development of ML in the field to a few isolated initiatives. In this paper, we present the Atmospheric Big Challenge Database (ABC Database), a carefully designed, organised and publicly available database dedicated to the study of the inverse problem in the context of exoplanetary studies. We have generated 105,887 forward models and 26,109 complementary posterior distributions generated with Nes
(Abridged) Electron-molecule interaction is a fundamental process in radiation-driven chemistry in space, from the interstellar medium to comets. Therefore, knowledge of interaction cross-sections is key. While there has been a plethora of studies of total ionization cross-sections, data is often spread over many sources, or not public or readily available. We introduce the Astrochemistry Low-energy Electron Cross-Section (ALeCS) database, a public database for electron interaction cross-sections and ionization rates for molecules of astrochemical interest. In this work, we present the first data release comprising total ionization cross-sections and ionization rates for over 200 neutral molecules. We include optimized geometries and molecular orbital energies at various levels of theory, and for a subset of the molecules, the ionization potentials. We compute total ionization cross-sections using the binary-encounter Bethe model and screening-corrected additivity rule, and ionization rates and reaction network coefficients for molecular cloud environments for >200 neutral molecules ranging from diatomics to complex organics. We demonstrate that our binary-encounter Bethe cros
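For readers unfamiliar with the binary-encounter Bethe (BEB) model named above, the following is a minimal sketch of its commonly cited per-orbital form from Kim and Rudd: σ = S/(t+u+1) · [ (ln t / 2)(1 − 1/t²) + 1 − 1/t − ln t/(t+1) ], with t = T/B, u = U/B and S = 4π a0² N (R/B)². Whether ALeCS uses exactly this variant is not stated in the abstract, and the function names here are hypothetical.

```python
import math

BOHR_RADIUS_ANGSTROM = 0.529177  # a0 in angstroms
RYDBERG_EV = 13.6057             # Rydberg energy R in eV


def beb_orbital_cross_section(T, B, U, N):
    """BEB ionization cross-section of one orbital, in square angstroms.

    T: incident electron energy (eV), B: orbital binding energy (eV),
    U: orbital kinetic energy (eV), N: orbital occupation number.
    """
    if T <= B:
        return 0.0  # below threshold the orbital cannot be ionized
    t = T / B
    u = U / B
    S = 4.0 * math.pi * BOHR_RADIUS_ANGSTROM ** 2 * N * (RYDBERG_EV / B) ** 2
    ln_t = math.log(t)
    bracket = 0.5 * ln_t * (1.0 - 1.0 / t ** 2) + 1.0 - 1.0 / t - ln_t / (t + 1.0)
    return S / (t + u + 1.0) * bracket


def beb_total_cross_section(T, orbitals):
    """Sum the per-orbital BEB cross-sections over (B, U, N) tuples."""
    return sum(beb_orbital_cross_section(T, B, U, N) for (B, U, N) in orbitals)
```

As a usage example, atomic hydrogen can be passed as a single orbital `(13.6057, 13.6057, 1)`, since its 1s binding and kinetic energies are both equal to the Rydberg energy. For molecules, the `(B, U, N)` values would come from a quantum chemistry calculation, such as the molecular orbital energies the abstract mentions.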