This data article describes a publicly available dataset supporting the study "Towards Personalised Assessment of Abdominal Aortic Aneurysm Structural Integrity" [1], published in the International Journal for Numerical Methods in Biomedical Engineering. The dataset is hosted on the Zenodo data repository and provides patient-specific imaging, geometric, and biomechanical data for abdominal aortic aneurysm (AAA) analysis. The dataset consists of electrocardiogram (ECG)-gated, time-resolved three-dimensional computed tomography angiography (4D CTA) data acquired over a full cardiac cycle from 20 patients diagnosed with AAA. For each patient, the dataset includes up to ten 3D CTA image frames representing different phases of the cardiac cycle, including systolic and diastolic phases. Patient-specific AAA wall geometries and finite element (FE) meshes derived from the image data and used for biomechanical computations are provided. In addition, computational outputs are included for each patient, comprising wall strain and tension maps, as well as structural integrity index (SII) and relative structural integrity index (RSII) maps of the AAA wall, enabling further investigation of AAA wall structural integrity. The imaging data were acquired at three clinical centres: Fiona Stanley Hospital (Australia), Medical University of Innsbruck (Austria), and University Hospitals Leuven (Belgium). All data were processed at the Intelligent Systems for Medicine Laboratory, The University of Western Australia (ISML-UWA). AAA geometries were extracted through a workflow integrating AI-assisted segmentation and automated surface model generation from the resulting segmentations. Segmentation was performed using PRAEVAorta software (NUREA) for patients 1-10 and the nnInteractive extension within the 3D Slicer platform for patients 11-20, followed by surface and mesh generation using the BioPARR (Biomechanics-based Prediction of Aneurysm Rupture Risk) software package. 
All files are provided in widely used formats, including NRRD for image data, VTP for geometries and computational results, and Abaqus input files for finite element models and meshes. This dataset can be reused for benchmarking AAA biomechanical analysis pipelines, including image-based strain computation, stress analysis, and structural integrity assessment. More broadly, by providing the biomechanical computation results, it supports further investigation of AAA wall structural integrity in relation to the severity and progression of AAA disease.
In this paper we present our virtual reality dataset for understanding how participants perceive waiting in a queue to confirm an appointment at a virtual doctor's office. Our dataset covers two scenarios: one in which a virtual receptionist gives the participant a reason for the delay, and one in which the participant is not told the reason. In our study, the reason for the delay is an issue with the virtual receptionist's computer system. Our dataset consists of 30 participants interacting in the virtual doctor's office using a Meta Quest Pro. We designed the virtual doctor's office to represent a typical real-world setting in which a patient has to wait in a queue before checking in for their appointment with the receptionist. The participant is placed in a queue with one other patient, in this case a virtual non-playable character, and must interact with a receptionist, also a virtual non-playable character. Each participant completes both scenarios, or treatments, with the order of treatments assigned at random. To the best of our knowledge, ours is the first dataset for studying how people perceive delays when waiting to receive service in an immersive virtual environment. Participants provided data in a single 1-hour session. Each participant completed a demographics survey covering their age, ethnicity, race, self-identified gender, education level, glasses and/or contact lens use, experience with video games, experience with VR, and experience with telehealth services. Prior to immersion, participants completed the Frustration Discomfort Scale (FDS) to assess frustration tolerance levels. After each treatment immersion, participants answered questions on their perception of the length of the delay, their frustration level on a 5-point Likert scale, and their likelihood of exiting the simulation, also on a 5-point Likert scale. 
After both treatments, we administered the FDS again and measured system usability using the System Usability Scale, task load using the NASA Task Load Index, and cybersickness using the Virtual Reality Sickness Questionnaire. Additionally, for each participant we provide head, left-hand, and right-hand position and orientation data as well as eye-gaze data consisting of the eye-gaze hit location and a human-readable object name for the scene element being observed. Since our dataset was collected in a single virtual clinic with participants from a single university, it can be used as an exploratory resource for understanding participant behavior and for designing larger-scale studies and collections. The dataset enables research on how participant frustration levels, likelihood of exit, or perception of time change when participants are told the reason for the delay as opposed to when they are not. The dataset allows researchers to use the eye-gaze, head, and hand movement data to understand how participants engage during idle times associated with waiting. The dataset can enable the design of AI algorithms for predicting participant frustration levels, likelihood of exit, and knowledge of why there is a delay based on head, hand, and eye-gaze movement. The dataset can enable the development of interruption management systems for immersive applications that adapt user engagement to delays in order to reduce frustration. Finally, the dataset can be used for behavior-based security measures for protecting critical applications, such as VR-based telehealth applications.
This dataset provides detailed measurements of internal air temperature, product temperature, and relative humidity in domestic refrigerators in households in Bangkok, Thailand. Data were collected from 123 households using Freshliance TagPlus-TH data loggers installed both inside and outside the refrigerators. For each refrigerator, air temperature and relative humidity were monitored at five internal locations: top shelf, middle shelf, vegetable drawer, door shelf, and freezer compartment. In addition, the temperature of a meat-like test product placed on the middle shelf was recorded. Ambient air conditions surrounding each refrigerator were also measured to characterize the external thermal environment. Prior to deployment, all sensors were prepared and activated by the research team, with time zero (T0) defined at this stage. Participants were only required to place the sensors according to the provided instructions while minimizing disturbances to their usual refrigerator usage habits. The sensors remained inside the refrigerators for seven consecutive days, recording measurements every 5 minutes before being removed. Data collected during the first and seventh days of monitoring were excluded from the analysis to account for the uncertainty regarding the exact times at which consumers placed and removed the sensors. This exclusion also helped eliminate transient effects caused by the introduction of heat loads during sensor installation and removal, which are not representative of normal refrigerator operating conditions. Consequently, only the five-day dataset corresponding to stabilized operating conditions is presented. The dataset is provided as tab-delimited .txt files. File names identify both the refrigerator number (1-123) and refrigerator configuration, including single-door, two-door top-freezer, and two-door bottom-freezer models. 
An accompanying R script enables the generation of temperature and humidity profiles for each refrigerator over the 5-day monitoring period. A README document is also included and provides detailed descriptions of column names, sensor locations, and metadata associated with each measurement campaign. Temperature sensors have a resolution of 0.1°C over a range of -30°C to +70°C, and an expanded uncertainty of ±0.5°C (k = 2). Relative humidity sensors have a resolution of 0.1 % and an expanded uncertainty of ±3 % (k = 2). These uncertainty values are based on the manufacturer's specifications; additional calibration could not be performed within the scope of this study due to the lack of suitable calibration equipment. This high-resolution dataset provides valuable insights into the thermal and hygrometric conditions of domestic refrigeration systems under real household usage conditions. The data may support a wide range of applications, including the assessment and optimization of refrigeration performance through analysis of temperature heterogeneity and thermal behavior inside refrigerating compartments, the evaluation of food safety and risks associated with domestic storage temperatures, the identification of potential temperature abuse conditions in challenge-test studies, the development of consumer recommendations, the development and validation of simplified thermal models and CFD simulations for predicting airflow and heat transfer within domestic refrigerators, studies related to energy consumption and food preservation, and the optimization of domestic refrigerator design parameters using CFD and artificial neural network approaches. The dataset is expected to provide valuable insights for researchers, stakeholders, and policymakers working on refrigeration technologies, food safety, and cold-chain management.
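The trimming of the first and seventh monitoring days can be illustrated with a short sketch. It is not the authors' script (the accompanying code is in R). It assumes that days are counted in 24-hour blocks from T0, which the text does not state; a calendar-day convention would give different boundaries. The function name, the start date, and the constant 4.0 °C readings are hypothetical.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(minutes=5)  # logging interval stated in the text
CAMPAIGN_DAYS = 7                       # monitoring duration stated in the text


def trim_first_and_last_day(records, t0):
    """Keep records from days 2 to 6, counting 24 h blocks from t0.

    Assumption: day boundaries are relative to t0, not to calendar midnight.
    `records` is a list of (timestamp, value) tuples.
    """
    start = t0 + timedelta(days=1)
    end = t0 + timedelta(days=CAMPAIGN_DAYS - 1)
    return [(ts, value) for ts, value in records if start <= ts < end]


# Synthetic seven-day series with one reading every 5 minutes.
# The start time and the readings are made up for illustration.
t0 = datetime(2024, 1, 1, 9, 0)
n_samples = int(timedelta(days=CAMPAIGN_DAYS) / SAMPLE_INTERVAL)
records = [(t0 + i * SAMPLE_INTERVAL, 4.0) for i in range(n_samples)]

kept = trim_first_and_last_day(records, t0)
print(len(records), len(kept))
```

Under this convention, a complete five-day record at a 5-minute interval holds 5 × 288 = 1,440 readings per sensor channel.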
This article presents a multispectral imaging dataset dedicated to training a machine learning algorithm for the in situ detection of Huanglongbing (HLB). HLB, also known as citrus greening disease, is a major disease of species of the genus Citrus, caused by the bacterial pathogen Candidatus Liberibacter asiaticus. The dataset consists of terrestrial images acquired in a commercial sweet orange orchard of the variety Pera Rio (Citrus sinensis (L.) Osbeck). The images show large portions of canopy, with healthy leaves and sections infected by HLB as well as some confounding factors naturally present in orchards. Multispectral images were acquired with a multi-lens camera within the visible-near-infrared domain, resulting in 14 narrow spectral bands. The image acquisition was conducted during two field campaigns in 2023 and 2024. In total, the dataset contains 2,978 images divided into two classes: HLB (1,681) and non-HLB (1,297). The original data are stored in TIFF format as 14 monochromatic images, organised by spectral band. Additionally, an HDF5-format version is provided, where images are stored as 3D arrays with spectral bands in ascending order. This format is compatible with various programming languages, enables efficient data handling, and is optimised for machine learning and image processing applications, supporting reproducible and portable analysis. This dataset is a valuable resource for the development and benchmarking of classification models, including deep learning approaches, aimed at the detection of HLB. Phytopathology imaging datasets are scarce yet essential for advancing digital agriculture and the development of robust tools for crop disease detection worldwide.
A dataset of two spectral lighting simulation reference models - one office and one factory hall - is presented. It aims to demonstrate and support full-spectral daylight and electric lighting simulations and facilitate evaluation of non-visual effects of light. The dataset includes Rhino CAD geometry, comprehensive spectral material and light source data and window system BSDF data. Example implementations in the two software tools, Radiance and OWL, enable reproducible workflows and support adoption in other software. The dataset is openly available on Zenodo. The office model reproduces Room 518 at the University of Innsbruck, including a west-facing façade and interior furnishings. The factory hall model follows the proposed geometry in the European standard 15193 for building energy performance. Interior reflectances in the office were measured in-situ using a handheld spectrometer. Exterior spectra and factory hall materials matching specified reflectances were obtained from an online spectral materials database. Glazing transmittance was derived from IGDB data using LBNL Optics/WINDOW. BSDFs for venetian blinds at various tilt angles, and for a diffusing pane adapted from the Complex Glazing Database, were generated in WINDOW. Luminaires in both models are specified with photometric files (Eulumdat/IES) and lamp spectra (Fluorescent 840, 4000 K LED). The provided example implementations (Radiance, OWL) include prepared input data and scripts to run first spectral simulations; example results are also included. The dataset is prepared to support reuse by researchers, designers and software developers for method validation, software engineering and comparison, and development of spectral metrics and controls.
European data spaces constitute pro-competitive infrastructures deliberately designed to align with EU competition law objectives. Unlike proprietary ecosystems that risk entrenching market dominance, data spaces embed neutrality, openness, and non-discrimination into their architectural design, thereby addressing market contestability through governance frameworks rather than ex post regulatory intervention. The main argument is that data spaces represent a transformative "soft-market intervention" that shifts competitive dynamics from exclusive data hoarding to innovation-based rivalry on shared datasets. The paper demonstrates how data spaces extend the principle of free movement to data as a potential "fifth freedom" within the EU internal market, while their federated governance structure prevents any single participant from monopolizing data access or determining participation terms. Through ex ante pro-competitive design, data spaces reconcile the need for large-scale data aggregation with competition law safeguards, enabling economies of scale and scope without creating dominance risks. The technical infrastructure emphasizes interoperability by design, federated cloud architecture, and open standards that prevent vendor lock-in and foreclosure effects. By providing structured alternatives to proprietary data marketplaces controlled by dominant gatekeepers, European data spaces demonstrate how institutional architecture can embed pro-competitive safeguards into market design itself, while fostering digital sovereignty and sustainable competitive advantage within the EU's evolving data economy.
Urdu is spoken by over 230 million people worldwide, yet it remains significantly underrepresented in digital resources, with limited availability of large-scale, publicly accessible training datasets for optical character recognition (OCR). The diversity of Urdu font styles encountered in printed books, newspapers, and digital publications poses a substantial barrier to developing generalizable OCR systems, while the absence of standardized benchmarks hinders fair and reproducible comparison across recognition approaches. This data article presents FIPU-OCR-CHAR, a benchmark dataset of printed Urdu characters encompassing 48 classes: 38 alphabets and 10 numerals in their isolated forms. The dataset was constructed through a fully systematic pipeline comprising five sequential stages: font collection and validation, character set definition, base image rendering, augmentation, and dataset organization with split generation. Each character class was rendered from 201 distinct Urdu TrueType/OpenType font files, producing 9,648 base images (201 fonts × 48 classes). Each base image was subsequently processed through 34 augmentation operations encompassing geometric transforms, photometric adjustments, blur filters, noise injection, and morphological operations, producing 328,032 augmented images. The complete dataset totals 337,680 labeled PNG images, each stored at 28×28 pixel resolution with 24-bit color depth. The dataset is organized into three predefined splits: training (70%; 236,376 images), validation (20%; 67,536 images), and testing (10%; 33,768 images), each accompanied by a CSV annotation file mapping image filenames to integer class labels (0-47). The repository additionally contains a Jupyter Notebook implementing a ResNet-34 baseline classification pipeline, a results summary image, and a README file documenting dataset structure and label definitions. 
The dataset is publicly available on Mendeley Data under a CC BY 4.0 license and is intended for use in OCR model development, font-invariant classifier training, Urdu script digitization, transfer learning for word- and line-level recognition, and benchmarking of convolutional neural network and Vision Transformer architectures on low-resource script character recognition tasks.
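The image counts reported for FIPU-OCR-CHAR follow directly from the number of fonts, classes, and augmentation operations. The short check below reproduces that arithmetic from the figures stated in the abstract. The variable names are illustrative, and the sketch does not show how the authors assigned individual images to splits.

```python
# Figures taken from the abstract.
N_FONTS = 201
N_CLASSES = 48
N_AUGMENTATIONS = 34
SPLIT_PERCENT = {"training": 70, "validation": 20, "testing": 10}

base_images = N_FONTS * N_CLASSES                   # one render per font and class
augmented_images = base_images * N_AUGMENTATIONS    # each base image augmented 34 times
total_images = base_images + augmented_images

# Integer arithmetic keeps the split sizes exact.
split_sizes = {name: total_images * pct // 100 for name, pct in SPLIT_PERCENT.items()}

print(base_images, augmented_images, total_images)
print(split_sizes)
```

The three split sizes sum to the full total, so no image is left unassigned under the stated 70/20/10 proportions.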
This data article presents experimental datasets generated from ordinary Portland cement systems incorporating waste-derived ferrous sulfate hydrates (FeSO₄·xH₂O) as an alternative sulfate-regulating component to natural gypsum. The datasets document material synthesis conditions, cement binder preparation, hydration-related characteristics, microstructural features, mechanical performance, chromium speciation, and environmental inventory inputs associated with the investigated cement systems. The dataset includes X-ray diffraction (XRD) patterns and quantitative phase analysis files, thermogravimetric and derivative thermogravimetric (TG/DTG) data obtained over a broad temperature range, and scanning electron microscopy (SEM) images of hydrated cement pastes. Chemical composition data for raw materials and cement binders were obtained using X-ray fluorescence (XRF) and inductively coupled plasma optical emission spectroscopy (ICP-OES). Additional datasets include Vicat initial and final setting-time results, compressive strength measurements obtained at different curing ages, soluble hexavalent chromium [Cr(VI)] concentration data, and life-cycle inventory inputs compiled for the cradle-to-gate environmental assessment. All data files are systematically organized according to experimental technique, sample designation, and curing age, and include both raw instrument outputs and processed data tables to support transparency, reproducibility, and reuse. These datasets are intended to facilitate data-driven analysis, comparative evaluation of alternative sulfate-regulating additives, modelling of cement hydration behavior, and environmental assessment of cementitious materials incorporating industrial by-products.
This dataset was collected using a detailed and rigorous methodology to examine Green IT practices in the Indonesian higher education sector. A structured online questionnaire was distributed to various institutions through a secure digital platform, targeting academics with expertise in IT and sustainability. A random sampling method was employed to ensure a diverse and representative sample of respondents. This approach facilitated high participation rates and enabled the collection of standardized data, ready for immediate analysis, thereby enhancing the quality and reliability of the findings. The dataset comprises quantified responses captured through Likert scales, focusing on multiple dimensions of Green IT, including compliance, awareness, challenges, and benefits. This structured approach ensured precise measurement of attitudes and perceptions of Green IT practices, minimizing variability that might arise from open-ended questions. The comprehensive nature of the data, gathered from individuals with diverse demographic and professional backgrounds, provides robust insights into the factors influencing Green IT implementation in the educational sector. The dataset's structured format and depth of information offer substantial potential for reuse in future research. It can serve as a benchmark for comparative studies, support longitudinal research to monitor changes over time or aid in developing targeted interventions to enhance Green IT practices in higher education institutions. Overall, this dataset contributes significantly to the academic discourse on Green IT and provides actionable insights for policymakers and institutional leaders striving to promote more sustainable IT practices within the educational sector.
The Research Software Engineering (RSE) Survey dataset contains longitudinal survey data collected by the Software Sustainability Institute between 2016 and 2022. The survey was initially conducted in the United Kingdom and expanded in subsequent years to include multiple countries. From 2018 onward, a single international instrument was used across participating countries, with only minor contextual adaptations to maintain comparability. Participation was voluntary and open to individuals who self-identified as Research Software Engineers or as performing research-software-related work, irrespective of formal job title. The dataset includes anonymized responses covering demographics, employment conditions, coding practices, training and collaboration, publication contributions, sustainability practices, professional networks, and job satisfaction. The RSE Survey Report, an interactive dashboard (W. Kijewska, M. Donnay, H. S. Packer, S. Hettrick, RSE survey report, [software] (2026). URL https://rse-survey.soton.ac.uk/superset/dashboard/RSE_survey/), also enables exploration of trends, cross-country comparisons, and changes across survey waves. The dataset is openly licensed and supported by publicly available analysis code, enabling reproducible workflows. The longitudinal and international design supports reuse for cross-sectional and temporal analyses, workforce comparisons, and integration with external datasets such as national workforce statistics or research output indicators. The dataset will be expanded with results from future surveys, including the 2026 survey that is pending completion.
This paper investigates the fusion of photovoltaic (PV) system performance data with weather data to create a robust dataset for energy production forecasting and anomaly detection. The study integrates data from multiple PV stations across various manufacturers and locations, addressing challenges in data quality, timestamp alignment, and inconsistent granularity. The resulting dataset underwent rigorous cleaning and aggregation, providing both aggregated and consolidated views, as well as raw telemetry data, for a wide variety of downstream use cases. The refined dataset includes 13,193 rows of station-level data and 88,285 rows of inverter-level data, reflecting energy production and weather parameters. The dataset also includes granular, unfiltered data from the photovoltaic stations, comprising more than 1.2 million rows of combined telemetry data. The integration of energy production data with weather data provides a comprehensive foundation for future machine learning applications to improve the efficiency and reliability of PV systems. This dataset is a valuable contribution to the growing body of work focused on optimising solar energy generation through intelligent data integration.
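Timestamp alignment between sources of different granularity can be approached in several ways. The sketch below shows one common option, a backward "as-of" join that attaches the most recent weather observation to each PV reading within a tolerance. It is an illustration only, not the procedure used to build this dataset. The function name, the 45-minute tolerance, and the sample values are assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def align_to_latest(pv_rows, weather_rows, tolerance):
    """Attach to each PV row the latest weather value not older than `tolerance`.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    PV rows without a sufficiently recent weather observation get None.
    """
    weather_times = [ts for ts, _ in weather_rows]
    fused = []
    for ts, power in pv_rows:
        idx = bisect_right(weather_times, ts) - 1  # last observation at or before ts
        if idx >= 0 and ts - weather_times[idx] <= tolerance:
            fused.append((ts, power, weather_rows[idx][1]))
        else:
            fused.append((ts, power, None))
    return fused


# Made-up example: 15-minute PV telemetry against hourly weather observations.
base = datetime(2024, 6, 1, 12, 0)
pv = [(base + timedelta(minutes=m), kw)
      for m, kw in [(0, 3.1), (15, 3.4), (30, 3.0), (120, 2.2)]]
weather = [(base + timedelta(minutes=m), temp)
           for m, temp in [(0, 24.5), (60, 25.1)]]

fused = align_to_latest(pv, weather, tolerance=timedelta(minutes=45))
for row in fused:
    print(row)
```

Keeping unmatched rows with a `None` value, rather than dropping them, leaves the decision about gaps to the downstream cleaning step.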
Despite recent advancements in standardized data exchange and governance, facilitated by the concept of data spaces, data providers still lack essential tools to enable participation in data spaces. This limitation primarily stems from the fact that data is typically collected or generated in arbitrary formats, with diverse types, flexible and evolving schemas, and different data modalities (text, image, video, time series, etc.). Consequently, semantic interoperability and compliance with the FAIR principles are often overlooked, which compromises the utility of shareable data. In this work, we propose a framework that transforms raw or semi-structured data into trustworthy data made available via a data space. Our approach facilitates accessibility, interoperability, and reuse by generating an RDF representation of a given dataset, in compliance with a specified input ontology that describes the application domain. The proposed framework also produces valuable metadata during data transformation, which is registered in a catalog to support the findability of the datasets. Furthermore, a change tracking algorithm is applied to detect modifications in the data between consecutive versions of datasets, making it easier for users to identify the most suitable dataset and version for each use case. We evaluate the applicability of our framework in a real-world use case scenario from the urban domain that involves multiple diverse datasets. The proposed framework enables the seamless onboarding of new participants in data spaces.
This article presents a curated dataset of multi-view digital images of full-coverage crown preparations performed on artificial teeth in a preclinical dental education setting. The dataset comprises composite images generated from three predefined photographic views (facial, occlusal, and lingual) for each tooth preparation, along with corresponding expert annotation records. The presented evaluation results reflect qualitative, image-based expert judgment rather than direct instrument-based quantitative measurements. Tooth preparations were completed by fourth-year undergraduate dental students as part of routine preclinical training, collected retrospectively from routine teaching activities, and documented using digital photography. Each tooth preparation was evaluated using a predefined educational rubric commonly applied in preclinical prosthodontic teaching. The released annotation data include a dichotomous pass/fail outcome, a feedback-oriented ordinal score, and a numeric score ranging from 0 to 5, reflecting the number of criteria met. Reference annotations were provided by a specialist prosthodontist. Image files and annotation records are linked through standardized naming to ensure traceability across dataset components. The dataset is intended for reuse in dental education research, the development and testing of image-based assessment methods, and machine learning applications focused on automated feedback and evaluation of preclinical tooth preparations. No patient data or clinical variables are included.
Induction motors are foundational components across industrial applications, valued for their inherent robustness, operational simplicity, and cost-efficiency. Maintaining their reliable function is paramount for a wide range of mechanical and electrical systems. However, they are prone to various mechanical and electrical faults, such as bearing defects, rotor issues, and voltage imbalances, which can significantly impair their performance and reliability. This study presents a novel vibration dataset for induction motor fault diagnosis, acquired using a smartphone-based inertial sensor rather than conventional industrial accelerometers. Vibration signals were recorded along three orthogonal axes (gx, gy, gz), alongside gravity-compensated acceleration components (guserx, gusery, guserz), enabling detailed analysis of both raw and gravity-free vibration characteristics. Data were collected under diverse conditions, including healthy operation and several fault types, across varying rotational speeds and load states. The dataset features long-duration vibration recordings sampled at 100 Hz, suitable for both time-domain analysis and window-based feature extraction. Its inclusion of multiple operating speeds and load conditions makes it well suited to studying the impact of operational variability on fault signatures. By leveraging low-cost and readily accessible smartphone sensors, this dataset demonstrates practical, accessible vibration data acquisition and supports the development, benchmarking, and validation of data-driven fault diagnosis methods. This resource is expected to advance research in condition monitoring of induction motors, particularly for machine learning and signal processing applications using vibration data.
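Window-based feature extraction on a 100 Hz recording can be sketched as follows. At this sampling rate only vibration components below the 50 Hz Nyquist limit are representable, which is worth keeping in mind when choosing features. The 2-second window, the 50 % overlap, the RMS feature, and the synthetic 5 Hz test signal are illustrative assumptions and are not taken from the dataset or its documentation.

```python
import math

SAMPLE_RATE_HZ = 100  # sampling rate stated in the text


def sliding_windows(signal, window_s, overlap):
    """Yield fixed-length windows; `overlap` is a fraction in [0, 1)."""
    size = int(window_s * SAMPLE_RATE_HZ)
    step = max(1, int(size * (1 - overlap)))
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def rms(window):
    """Root-mean-square amplitude of one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


# Synthetic stand-in for one axis: 10 s of a unit-amplitude 5 Hz sine.
signal = [math.sin(2 * math.pi * 5 * n / SAMPLE_RATE_HZ)
          for n in range(10 * SAMPLE_RATE_HZ)]

features = [rms(w) for w in sliding_windows(signal, window_s=2.0, overlap=0.5)]
print(len(features), features[0])
```

The same windowing can be applied independently to each of the six recorded channels, with further time-domain statistics added alongside RMS as needed.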
The paper presents a cross-sectional dataset of low- and semi-skilled gig workers in India, collected during a single survey wave in the pandemic. Respondents retrospectively reported their experiences at two time points, pre-pandemic (July-November 2019) and during the pandemic (December 2020-January 2021), across multiple human security dimensions, framed within the United Nations Human Security Framework (2016). The dataset represents original primary data and captures the economic, food, health, environmental, personal, community, and political experiences of gig workers during the crisis. The dataset focuses specifically on low- and semi-skilled adult gig workers, including drivers, domestic workers, delivery personnel, beauticians, street vendors, small business owners, and self-employed service providers. The dataset has two parts. Part A captures sociodemographic details, employment status, income, loans, COVID-19 impacts on livelihood, food security, health access, living conditions, and government/community support. Part B records fears and apprehensions, including financial security, social support, community tensions, housing issues, and vaccine attitudes. Data were collected using a structured questionnaire and variables independently developed by the authors; community volunteers from SJS (Mitr Sanketa initiative) only facilitated survey administration. The survey covers 136 variables aligned with the UN Human Security Framework (2016), including economic, food, health, environmental, personal, community, and political security. The dataset provides valuable insights for research and education in understanding the vulnerabilities, resilience, and lived experiences of gig workers during crises.
This data article presents a reproducible dataset designed to measure time-varying efficiency in selected ASEAN foreign exchange (FX) markets. The dataset covers the currencies of six ASEAN countries against the U.S. dollar (Indonesia, Malaysia, the Philippines, Singapore, Thailand, and Vietnam) over the period January 2000 to August 2025. Daily exchange rate observations (open, high, low, and close) are obtained from publicly accessible financial data providers and official sources. These raw observations are not redistributed; instead, they serve as inputs to a fully documented transformation pipeline. Using rolling-window procedures applied to daily data, ten efficiency sub-indices are constructed to capture distinct dimensions of deviations from random-walk behavior, including serial dependence, volatility dynamics, long-memory characteristics, distributional asymmetry, and tail risk. All sub-indices are standardized and aggregated through principal component analysis to form a Composite Efficiency Index (CEI), yielding a harmonized monthly measure that is comparable across markets and over time. The repository provides the processed monthly efficiency sub-indices, the Composite Efficiency Index, and complete R scripts that reproduce the entire data-generation workflow. By separating data acquisition from transformation and ensuring full computational reproducibility, the dataset supports transparent empirical research on FX market efficiency, cross-country comparisons, forecasting applications, and methodological extensions in financial econometrics.
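To make the rolling-window idea concrete, the sketch below computes one possible serial-dependence measure: the absolute lag-1 autocorrelation of daily log returns over a moving window. It is a hypothetical illustration, not one of the ten sub-indices from the repository, whose definitions and window lengths are given in the authors' R scripts. The 250-day window, the function names, and the simulated price path are assumptions, and the standardization and PCA aggregation steps are not shown.

```python
import math
import random


def log_returns(prices):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation; returns 0.0 for a constant series."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    return cov / var


def rolling(values, window, stat):
    """Apply `stat` to every full window of consecutive observations."""
    return [stat(values[i - window:i]) for i in range(window, len(values) + 1)]


# Simulated random-walk closing prices (not real FX data).
random.seed(0)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * math.exp(random.gauss(0, 0.005)))

returns = log_returns(prices)
# Values near 0 are consistent with random-walk behavior; larger values
# indicate stronger serial dependence within the window.
serial_dependence = [abs(r) for r in rolling(returns, window=250, stat=lag1_autocorr)]
print(len(serial_dependence))
```

A monthly series could then be obtained by sampling or averaging the daily rolling values within each calendar month, which is one way to arrive at the monthly frequency described above.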
This dataset supports the research article titled "Exploring housing price dynamics in sustainable cities through a cooperated big data driven machine learning method: case study on a typical city in China", published in City and Environment Interactions. The data were collected from multiple sources, including web-scraped real estate listings, air quality monitoring stations, public amenity data retrieved through the Gaode Map API, and population data from the LandScan global dataset. The dataset includes variables describing property characteristics, accessibility, environmental quality, and land use patterns. Random Forest modeling and SHAP values were used to interpret the contribution of each feature to housing price volatility. This dataset is valuable for urban economists, planners, and data scientists studying housing market dynamics, land use policy, or spatial machine learning. It enables replication, benchmarking, and comparative studies in similar urban contexts in cities in developing countries.
This paper presents a dataset of commercial court rulings in Saudi Arabia collected from the official Saudi Ministry of Justice website and recently made publicly available. The dataset consists of judicial decisions written in Arabic, including case narratives, legal reasoning, and final rulings. Although the original corpus includes multiple case types, this work focuses specifically on commercial cases, which represent the majority of the data. The dataset was systematically extracted, cleaned, and anonymized by redacting personal names to support ethical use and reproducible analysis. It provides a valuable resource for research in Arabic legal text processing and judicial reasoning. The dataset can support a wide range of applications in natural language processing and artificial intelligence, including text classification, information extraction, judgment ruling prediction, and predictive modeling. It can also be used for comparative studies between traditional machine learning and deep learning approaches. The dataset is publicly available at: https://data.mendeley.com/datasets/np538c95yy/2.
Audio sensing provides a low-cost, non-contact modality for monitoring mechanical equipment health. Many degradations and faults manifest as gradual changes in spectral and temporal structure (e.g., increased broadband friction noise, harmonic shifts, airflow turbulence changes, and transient impulses), enabling early-warning systems that can support condition-based maintenance and reduce downtime. This article presents a multi-device acoustic dataset designed to study degradation monitoring under realistic cross-device and multi-session variability. The dataset contains labeled recordings from three common motor-driven tools: a shop-vac with (i) discrete fill-level gradations (0%, 30%, 50%, 70%, 100%) and (ii) a mechanically-induced faulty state; a vacuum with (i) discrete clogging gradations (0%, 30%, 50%, 70%, 100% air-filter occlusion) and (ii) power-dial settings (0-8); and an orbital sander with wear-state gradations corresponding to sandpaper lifecycle (New, Moderate, and Worn/Faulty). Recordings were captured with multiple commodity microphones spanning smartphones and external microphones, and were intentionally split into training/testing device groups for cross-device evaluation. Each training condition was captured at two different times and locations, with microphone placement varied during capture to reduce overfitting to environment and geometry. The dataset supports research in robust acoustic condition monitoring, cross-device generalization, domain shift, and data-efficient learning for early fault detection and prognostics.
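The cross-device evaluation setup described above can be sketched as a device-grouped split, in which no microphone contributes recordings to both the training and the testing set. The record fields (`device`, `path`) and the device names used here are hypothetical and do not reflect the dataset's actual file layout or metadata schema.

```python
def split_by_device(recordings, test_devices):
    """Partition recordings so that every device lands in exactly one set."""
    train, test = [], []
    for rec in recordings:
        # A recording goes to the test set only if its device is held out.
        (test if rec["device"] in test_devices else train).append(rec)
    return train, test

def devices_overlap(train, test):
    """Return the set of devices that appear in both partitions (ideally empty)."""
    return {r["device"] for r in train} & {r["device"] for r in test}

# Hypothetical metadata records, for illustration only.
example = [
    {"device": "phone_a", "path": "rec_001.wav"},
    {"device": "phone_b", "path": "rec_002.wav"},
    {"device": "ext_mic", "path": "rec_003.wav"},
]
train_set, test_set = split_by_device(example, test_devices={"ext_mic"})
```

Splitting by device rather than by individual recording keeps device-specific frequency response and noise characteristics out of the training data, so test performance reflects generalization to unseen microphones.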
This data article contributes a dataset capturing the factors affecting Generation Z's green purchase intentions (GPI). The examined factors include green marketing (MAR), green brand image (IMA), attitude toward green products (ATT), environmental knowledge (KNO), and environmental concern (CON). Specifically, the survey focuses on green purchase intention in the non-alcoholic beverages industry, an attractive sector in Vietnam that receives significant attention and investment from both domestic and foreign businesses, mainly targeting young consumers. The data collection process was carried out from December 2024 to April 2025 using a structured questionnaire, yielding 436 valid responses from participants across all 13 provinces and cities of the Mekong Delta in Vietnam. This delta region's youthful demographic structure presents significant opportunities for businesses' market development. The dataset provides valuable insights into young consumers' green purchase intention. This is critical information for businesses in the non-alcoholic beverages industry, particularly those interested in green marketing strategies. Additionally, it serves as a foundation for broader studies across different markets or fields.