We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in the middle part of Sweden. The forwarder is a large Komatsu model equipped with vehicle telematics sensors, including global positioning via satellite navigation, movement sensors, accelerometers, and engine sensors. The forwarder was additionally equipped with cameras, operator vibration sensors, and multiple Inertial Measurement Units (IMUs). The data includes event time logs recorded at 5 Hz of driving speed, fuel consumption, machine position with centimeter accuracy, and crane use while the forwarder operates in forest areas, aerially laser-scanned with a resolution of around 1500 points per square meter. Production log files (Standard for Forestry Data, StanForD) with time-stamped machine events, extensive video material, and terrain data in various formats are included as well. About 18 h of regular wood extraction work during three days is annotated from 360°-video material into individual work elements and included in the dataset. We also include scenario specifications of conducted experiments on forest roads and in terrain. Scenarios include repeatedly driving the same routes with and without steel tracks, different load weights, and different target driving speeds. The dataset is intended for developing models and algorithms for trafficability, perception, and autonomous control of forest machines using artificial intelligence, simulation, and experiments on physical testbeds. In part, we focus on forwarders traversing terrain, avoiding or handling obstacles, and loading or unloading logs, with consideration for efficiency, fuel consumption, safety, and environmental impact. Other benefits of the open dataset include the ability to explore auto-generation and calibration of forestry machine simulators and automation scenario descriptions using the data recorded in the field. 
The data and scripts for data exploration and analysis are made long-term publicly available through the Swedish National Data Service.
The European Data Space for Smart Communities (DS4SSCC) is a flagship initiative demonstrating how interoperable, cross-domain data infrastructures can drive smart city transformation in Europe. Yet achieving interoperability across legal, organisational, semantic, and technical layers remains difficult for local governments constrained by legacy systems, rigid procurement, and fragmented ICT landscapes. This paper examines how interoperability is operationalised within DS4SSCC through the Minimal Interoperability Mechanisms (MIMs) Plus framework. Using a qualitative case study approach that combines document analysis, stakeholder survey data, and workshop insights from the DS4SSCC Preparatory Action, the study analyses how governance structures, technical standards, and procurement mechanisms interact to enable cross-domain collaboration. The findings identify five key enablers of interoperability-by-design: modular data architectures, shared governance, lightweight semantic alignment, interoperability clauses in procurement, and capacity building. Foundational MIMs (Accessing, Representing, Interlinking, Securing, and Sharing Data) underpin DS4SSCC's core building blocks, while application-specific MIMs (Personal Data, Geospatial, Interoperable AI, and Local Digital Twin) support more complex, cross-domain use cases. Persistent challenges include uneven capacity, vendor lock-in, and variable conformance maturity. The paper concludes that MIMs Plus translates the four layers of the European Interoperability Framework into actionable mechanisms, positioning interoperability as a continuous governance capability essential for scalable and trustworthy data spaces.
This paper presents a telemetry dataset capturing resource utilization and power consumption metrics across the ENACT edge-cloud continuum. The dataset contains empirical telemetry collected in real time for both infrastructure nodes and application workloads. More specifically, a distributed weather forecasting scenario has been emulated, comprising five pods: two different weather data sources, two forecasting services (one per node/computing layer) and one long-term storage pool. A cloud-based machine and an edge device belonging to the same Kubernetes cluster have been considered for the deployment of the application pods, corresponding to heterogeneous computing tiers. Data acquisition was performed using ENACT's Telemetry Data Collector and Monitoring Engine, which measures telemetry and energy metrics at node and pod levels in real time. The resulting dataset provides time-series records including CPU, memory and disk utilization, network throughput, and energy consumption for the cloud node, the edge node and the five application pods. Telemetry data was collected during two distinct phases: a period with application workloads running normally and a baseline period when applications were removed from the cluster. This allows for assessing the impact of the applications' activity in terms of resource usage and energy consumption. This dataset offers valuable insights for the research community in distributed systems, the edge-cloud continuum and cognitive computing, wherein datasets on real-world data, especially reflecting both infrastructure-level and application-level telemetry, are currently very limited. It is particularly useful for developers and research scientists who require such data for tasks such as training and fine-tuning time-series forecasting models, benchmarking anomaly detection models and validating scheduling algorithms and energy-aware strategies, to name a few.
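The two-phase design described above suggests a simple way to estimate the application overhead: subtract the baseline-phase mean of a metric from its workload-phase mean. The following is a minimal sketch of that idea only; the function name and the sample power readings are hypothetical and are not taken from the dataset or from ENACT's tooling.

```python
from statistics import mean


def workload_overhead(workload_samples, baseline_samples):
    """Mean increase of a metric in the workload phase over the baseline phase.

    Both inputs are sequences of readings in the same unit (e.g. watts).
    """
    if not workload_samples or not baseline_samples:
        raise ValueError("both phases need at least one sample")
    return mean(workload_samples) - mean(baseline_samples)


# Hypothetical node power readings in watts (illustrative values only).
workload_power_w = [12.0, 13.0, 14.0]
baseline_power_w = [10.0, 10.0, 10.0]

overhead_w = workload_overhead(workload_power_w, baseline_power_w)
print(f"Estimated application overhead: {overhead_w} W")
```

The same difference can be computed per pod or per node for any of the recorded metrics, provided the two phases are sampled at comparable rates.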
This article introduces a dataset that investigates the physiological responses of drivers when using advanced driver assistance systems (ADAS) in real-world traffic conditions. The study, conducted in the Federal District, Brazil, involved seven drivers in controlled driving sessions. The time of day and the days of the week were standardized to ensure comparable traffic conditions. The data collection was centered on ADAS Level 2 systems, specifically the Lane Keeping Assist System (LKAS) and the Forward Collision Warning System (FCWS). The dataset includes five physiological signals: respiration, heart rate, galvanic skin response (GSR), leg muscle activity, and brain activity. These signals were continuously acquired using a dedicated instrumentation system installed in the vehicle. Given the complexity of collecting data under real traffic conditions, the acquisition sessions generated a large volume of raw data. Considerable post-processing was conducted to identify and segment portions of the signals with sufficient integrity for subsequent analysis. The dataset is structured as time-stamped raw signal spreadsheets, each corresponding to a specific driver and direction of the pre-established route (outbound and return). Such organization enables researchers to navigate the dataset easily, explore specific segments of interest, and conduct comparative analyses across participants and varying traffic conditions. The dataset is relevant to researchers in biomedical signal processing, driver state monitoring, intelligent transportation systems, and human-machine interaction. It may be used by academic laboratories investigating physiological responses during driving tasks, as well as by engineers and developers working on advanced driver assistance systems (ADAS), including automotive manufacturers and ADAS technology suppliers. 
The dataset, which includes synchronized physiological and vehicle dynamics data collected under real traffic conditions, may contribute to the study of human responses during semi-automated driving, supporting research and development of driver-centered mobility technologies.
The persistent gender gap in Science, Technology, Engineering, and Mathematics (STEM) fields, particularly in leadership roles, remains a pressing global challenge, including in Latin America. Understanding the factors that influence individuals' interest in STEM careers and their motivation to assume leadership positions is essential for advancing gender equity and inclusive innovation. Although existing studies have explored these topics, most rely on qualitative methods, small samples, or single-country analyses, limiting regional comparability. In contrast, this manuscript introduces a comprehensive, openly accessible dataset and a detailed multilingual data dictionary that enable systematic comparisons across three Latin American countries, representing an innovative contribution to the study of gender and STEM in the region. We present the methodology and data structure of a large-scale, multi-country survey conducted in Bolivia, Brazil, and Peru, each with >3000 respondents. The study forms part of the Equality in Leadership for Latin America STEM (ELLAS) research network, funded by the International Development Research Centre (IDRC), which promotes the principles of open science and open data. We describe the survey design, ethical protocols, and data curation procedures implemented to ensure transparency and methodological rigor. The resulting ELLAS Survey Dataset offers one of the most extensive quantitative resources on gender and STEM in Latin America. Our goal is to make the ELLAS data accessible to the research community for further analysis. This article aims to guide researchers interested in using the dataset and to support the generation of evidence that can inform policies fostering women's participation and leadership in STEM fields.
Tax evasion remains a challenge for public revenue systems, and is influenced by institutional and cultural factors. This article provides a dataset to empirically evaluate and operationalize the interrelationships among these factors in the context of Iraq. Although previous research has investigated deviant culture, weak law enforcement, and administrative corruption on an individual basis, few datasets document these factors within a unified measurement structure. An online questionnaire was employed to gather data from 459 taxpayers in Baghdad, Iraq, from September to November 2021. The dataset operationalizes four constructs: tax evasion (illegal non-payment or underpayment of taxes through underreporting income, inflating deductions, or concealing financial sources), deviant culture (social norms and behavioral patterns that eschew accepted moral standards), weak law enforcement (limitations in the state's capacity or willingness to enforce laws and regulations), and administrative corruption (misuse of public office for private benefit, commonly reflected through, e.g., bribery and favoritism). The questionnaire was administered via Google Forms and developed using reflective measurement models, with each construct using multiple indicators adapted from sources validated in the literature. Responses were rated on a 7-point Likert-type scale. The instrument was translated into Arabic, subjected to cultural review, and pilot tested to ensure clarity and contextual suitability. In addition to the indicators, the dataset includes five demographic variables that describe respondent characteristics. The collection consists of anonymized raw responses, coding schemes, construct indicators, and metadata files, which include variable definitions, response coding, and item references. This dataset offers a comprehensive measurement framework for analyzing tax-related behaviors in developing contexts. 
It encourages the replication of measurement models, cross-country comparative research, and secondary analyses of taxpayer perceptions of governance, culture, and enforcement. Transparency, reusability, and evidence-based policymaking in tax governance are facilitated by the dataset's public availability and accompanying documentation on Mendeley Data.
Over the past few decades, black soldier fly (BSF) larvae have emerged as a promising alternative protein source for animal feed. In addition to providing essential amino acids that support animal performance, BSF larvae also contain certain bioactive components that may contribute to maintaining intestinal health of animals. However, it is assumed that depending on the process used to obtain these larvae, their composition in proteins and bioactive molecules may vary. In this context, we analyzed the proteomes of live BSF larvae (BSFL), dehydrated BSFL, and a protein concentrate prepared from the dehydrated BSFL (all from the same commercial source). The larvae samples were frozen, freeze-dried, and ground before protein extraction. Proteins from each form were solubilized using a high concentration of sodium chloride and subsequently analyzed by mass spectrometry. The resulting protein lists were compared to identify shared proteins, sample-specific proteoforms, and gene identifiers. Venn diagrams showed that the three samples shared 888 proteins. Notably, live larvae exhibited the highest protein complexity, with 814 unique proteins, compared to none in dehydrated larvae, and 36 in the protein concentrate. The compilation of all the data enabled the identification of 2232 unique gene identifiers. Gene ontology analysis using EnsemblMetazoa database revealed numerous proteins associated with metabolism (amino acids, lipids, carbohydrates), while 57 proteins were assigned to innate defense and detoxification processes. Approximately 30 antimicrobial proteins/peptides were identified (peptidoglycan-binding proteins, lysozymes, cecropins, defensins, and attacin-like peptides). These data highlight that BSF larvae are a natural source of potential bioactive compounds, including antimicrobial proteins and peptides.
This work introduces the new agricultural contamination elements (ACE) dataset, which comprises annotated images representing four classes of elements: bag, bottle, can, and trash. The bag annotation tag includes plastic bags and thin plastic sheet material. The trash annotation is a general-purpose tag for anything that does not fit into one of the other categories and that is not cotton. All the annotations included in the dataset are of the bounding box type. An unmanned aerial system (UAS) captured the images used for the annotations. The data capture included random heights and speeds to allow for more variation in the dataset. The current dataset includes images of cotton fields in Mississippi from three growing seasons (2021, 2022, and 2023). Researchers randomly placed the contamination elements in the cotton field before imaging and removed them from the field immediately after. The elements used within each class were random, based on what was available. The items serve as a general example of trash types and are not necessarily the most common examples of any type. In addition to spanning three years, the data also represent different stages of the growing season. The data collected in 2021 contained the most variation in growing stages. The focus of the 2022 and 2023 data is defoliated cotton plants just before harvest. There are over 21,500 box annotations in the dataset, with 2021 accounting for 59%, 2022 accounting for 12% and 2023 accounting for 29%. The full-size images from the UAS were either still images taken at 16 megapixels (MP) or 4K video. In either case, the full-size images were pre-processed and broken into square tiles of size 720×720 pixels with no overlap in height but some overlap in width. The extensible markup language (XML) files contain all the annotations and share the same name as their associated image. 
Folders separate the images and annotations by year to facilitate future studies that incorporate temporal aspects, such as testing performance across years.
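The tiling step described above (720×720 tiles, no overlap in height, some overlap in width) can be illustrated with a short sketch. The exact stride used by the authors is not stated, so the even spreading of overlap below is an assumption, as is the 3840×2160 frame size taken here to stand for "4K video"; `tile_origins` is a hypothetical helper name.

```python
import math


def tile_origins(length, tile=720):
    """Start offsets of tiles of size `tile` that cover an axis of `length` pixels.

    Uses the fewest tiles that cover the axis and spreads any overlap evenly
    (an assumed strategy; the source does not specify the stride).
    """
    if length < tile:
        raise ValueError("image is smaller than one tile")
    count = math.ceil(length / tile)
    if count == 1:
        return [0]
    step = (length - tile) / (count - 1)
    return [round(i * step) for i in range(count)]


# Assumed 4K UHD frame size; the source only says "4K video".
x_origins = tile_origins(3840)  # width: 6 tiles, neighbours overlap
y_origins = tile_origins(2160)  # height: 3 tiles, exact fit

print(x_origins)
print(y_origins)
```

Under these assumptions the height divides exactly into three tiles, while the width needs six tiles that overlap their neighbours, which is consistent with the behaviour the abstract describes.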
SEED-ML (Semen Examination and Evaluation Dataset for Machine Learning) is an openly available, multi-parametric clinical dataset specifically designed to support research in male infertility diagnostics and prediction. SEED-ML refers specifically to the dataset repository and its clinical structure, and not to a specific machine learning model or diagnostic method. In this sense, SEED-ML comprises records from 10,124 patients, including detailed semen analysis parameters (pre- and post-capacitation), morphological classifications, and clinical alterations. Infertility diagnosis is categorized into nine clinically relevant classes, ranging from normal fertility to complex multi-factor conditions such as oligoasthenoteratozoospermia. All data were anonymized and curated following strict ethical and privacy guidelines to ensure compliance with applicable medical data protection regulations. The dataset reflects real-world clinical distributions across nine diagnostic classes: Normozoospermia (62.68%), Oligoasthenoteratozoospermia (14.22%), Asthenozoospermia (11.66%), Teratozoospermia (6.71%), Oligozoospermia (1.90%), Asthenoteratozoospermia (1.38%), Oligoasthenozoospermia (0.96%), Oligoteratozoospermia (0.34%), and Azoospermia (0.16%). This detailed categorization provides a realistic clinical distribution for machine learning evaluation. SEED-ML offers a resource for developing and benchmarking machine learning models, enabling research in predictive analytics, decision support systems, and computational andrology. This dataset aims to facilitate interdisciplinary collaboration between clinicians, data scientists, and AI (artificial intelligence) researchers. The dataset is publicly available in Mendeley under a CC BY 4.0 license.
Kidney disease is a major global health concern that requires timely diagnosis and effective monitoring to prevent severe complications and improve patient outcomes. This data article presents BD-KDD, a structured clinical dataset designed to facilitate research on kidney disease diagnosis. The dataset was collected retrospectively from electronic medical records obtained from Popular Diagnostic Center, Savar Branch, Dhaka, Bangladesh, following institutional authorization for academic research. The BD-KDD dataset contains 988 patient records with 26 variables, including demographic attributes, physiological measurements, biochemical laboratory tests, urinalysis indicators, hematological parameters, comorbidity indicators, and clinical symptoms. Key laboratory features include serum creatinine, blood urea, blood glucose, sodium, potassium, hemoglobin, packed cell volume, red blood cell count, and white blood cell count, along with urinalysis indicators such as specific gravity, albumin, and sugar levels. Each record is assigned a binary diagnostic label representing either healthy individuals or kidney disease cases based on clinical evaluation and laboratory findings. The curated dataset includes 481 healthy and 507 kidney disease cases and is provided in CSV format together with a dataset dictionary describing variable definitions and coding schemes. BD-KDD offers a valuable resource for biomedical data analysis, health informatics research, and the development of machine learning-based diagnostic models and clinical decision support systems for renal health assessment.
This study presents a comprehensive BDFlower growth stage dataset designed to support research in precision agriculture and floriculture. The dataset encompasses eight common flower species found in Bangladesh: Bush Allamanda, Red Hibiscus, Yellow Bell, Pinwheel Flower, Pink Periwinkle, White Madagascar Periwinkle, Marvel of Peru, and White Hibiscus. Each species is represented across three growth stages (Early, Mid, and Full), resulting in 24 distinct classes. A total of 23,334 colour images are included, comprising 3889 original photographs and 19,445 augmented samples generated with five augmentation techniques. Bush Allamanda contains 499 images, Red Hibiscus contains 489 images, Yellow Bell contains 483 images, Pinwheel Flower contains 497 images, Pink Periwinkle contains 452 images, White Madagascar Periwinkle contains 472 images, Marvel of Peru contains 468 images and White Hibiscus contains 529 images. Each image was collected using a smartphone camera at three time intervals per day, spaced eight hours apart, to capture natural variations in lighting and appearance. The dataset is further organized into training, validation, and testing splits, enabling direct application to machine learning workflows. This is a publicly available dataset specifically curated for flower growth stage classification. In addition to dataset collection, we also conducted a simple experiment using a CNN model to evaluate its performance on this dataset. The dataset is intended to facilitate the development of robust computer vision models that can monitor flower development, with potential applications in automated plant phenotyping, crop monitoring, and digital floriculture systems.
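The image counts quoted above can be cross-checked with a few lines of arithmetic. The per-species figures are those stated in the abstract; the assumption that each original photograph yields exactly one augmented copy per technique (five in total) is ours, although it is consistent with the reported totals.

```python
# Original photographs per species, as stated in the abstract.
originals_per_species = {
    "Bush Allamanda": 499,
    "Red Hibiscus": 489,
    "Yellow Bell": 483,
    "Pinwheel Flower": 497,
    "Pink Periwinkle": 452,
    "White Madagascar Periwinkle": 472,
    "Marvel of Peru": 468,
    "White Hibiscus": 529,
}

# Assumption: one augmented image per original per augmentation technique.
AUGMENTATIONS_PER_IMAGE = 5

original_total = sum(originals_per_species.values())
augmented_total = original_total * AUGMENTATIONS_PER_IMAGE
grand_total = original_total + augmented_total

print(original_total, augmented_total, grand_total)
```

A check of this kind is a quick sanity test to run after downloading the dataset, before building training, validation, and testing loaders.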
This article provides a salmon fillet dataset for investigating the detection of distinct regions, undesirable spots, and, potentially, regions of higher nutrient content. Since the belly of salmon is known to be high in omega-3 fatty acids, computer vision and image processing can be used to identify the belly areas of salmon fillets (for trim A, B, and C cuts; the trim A cut has the largest belly area) and determine the percentage of these fatty acids. As a result, this dataset is valuable for training models that identify and examine the belly regions. Data were acquired at Lerøy Aurora, a salmon processing plant in Skjervøy, Norway, and supplemented with images taken in our lab during experiments. To acquire the images at the Lerøy plant, two settings were used: (i) a stand with 3 Intel RealSense RGB-D cameras and (ii) a stand with 1 Intel RealSense RGB-D camera, depending on the amount of space available to place our setup near the production line. The camera equipment was positioned close to the production line. In total, 712 RGB images, 10 ROS (Robot Operating System) bags with the 3-camera setting, and 5 ROS bags with the 1-camera setting were taken in the Lerøy plant, while 60 RGB images were captured at the NMBU lab. ROS nodes were utilized to capture both the ROS bags (which carried RGB-D information) and the RGB images. To facilitate further research on salmon fillets, this collection also contains 509 multispectral images of fish fillets. The dataset is intended primarily as a benchmarking and pre-training resource, demonstrating the potential of computer vision for salmon fillet analysis. In conclusion, this comprehensive dataset provides a solid base for potential research on automated salmon fillet analysis. This will enable computer vision and image processing to enhance quality control and nutritional evaluation of salmon fillets.
With the rapid expansion of smart cities and the growing need for accurate and real-time analysis of road infrastructure, the development of artificial intelligence systems capable of perceiving, analyzing, and recording environmental information has become increasingly vital. In this context, the present study introduces a novel system for the detection of Iranian traffic signs, making a significant contribution not only in the data collection domain but also in the detection of visual data. A comprehensive and unique dataset, consisting of 14,111 images and 19,000 traffic signs across 118 distinct classes, was collected over a two-year period under diverse temporal conditions (morning, noon, dusk, and night) and throughout all four seasons, covering urban, rural, and intercity areas across the country. The image annotation process was conducted with high precision using the MakeSense tool in two standard formats: YOLO (.txt) and Pascal VOC (.xml). Subsequently, an automated detection system was developed based on the advanced deep neural network model YOLOv12, enabling precise identification of traffic signs. The model's performance, evaluated through 6-fold cross-validation, demonstrated outstanding accuracy, achieving an mAP@50 of 96%. These results highlight the model's remarkable efficiency in real-world scenarios and its superiority over previous YOLO versions. Beyond detection capabilities, the proposed system can be employed for the extraction of digital traffic sign maps, serving as a foundational tool for navigation systems, intelligent vehicles, spatial analytics, and the development of a national traffic sign map of Iran and other similar countries. By integrating cutting-edge technologies, data-driven localization, and state-of-the-art deep learning architectures, this research represents a significant step toward a smarter and safer future for the nation's road networks.
This article describes a new, publicly available data resource linking numerous existing datasets and analyses at the county/FIPS code level. Linked datasets include publicly available data on county demographic makeup (e.g., race, population, etc.), social vulnerability factors, and pregnancy-related hospital utilization. It also includes a new dataset measuring travel time from each county to the nearest hospital for each of four levels of maternal care services. A wide variety of datasets, including population-level health data, are published by the United States Government for public use. While these datasets are immensely valuable for research and can be used to improve healthcare delivery and outcomes, it can be challenging to find, access, and link the data. Maternal health, population health, and demography researchers, as well as healthcare providers and health policy experts, can use this dataset to better understand the causes of maternal morbidity and mortality and to improve health outcomes. By linking these varied data sources, this new data resource allows examination of individual contributing factors as well as the interplay between and relative strengths of multiple correlated factors. Additionally, this dataset allows researchers to add any additional data, simply by using FIPS codes for linking.
Seaweed cultivation is considered a potential tool to face the environmental pressures derived from the intensification of global food demand. This article presents a dataset supporting a comparative sustainability assessment of two nearshore cultivation systems, longline and tube-net, for Saccharina latissima in Danish waters. In this work, a comprehensive Life Cycle Assessment (LCA) was used to quantify the environmental performance of two main nearshore cultivation systems: traditional longline and tube-net setup. The system boundary includes the whole life cycle of the seaweed production until the harvest, excluding the distribution and end-of-life stages. The dataset is based on empirical pilot-scale data collected at two cultivation sites in Limfjorden (Denmark) and scaled to a reference farm area of 18.75 ha. Foreground data were obtained from pilot cultivation sites located in Limfjorden (Denmark), while background data were sourced from Ecoinvent 3.10 and AGRIBALYSE v3.1.1. Environmental impacts are presented for several categories included in the ReCiPe (H) 2016 method, covering a wide range of environmental impacts. Results are provided for multiple functional units, including per cultivation site per year, per hectare, per kg fresh biomass, per kg dry biomass, and per meter of cultivation infrastructure. Moreover, a Techno-Economic Assessment (TEA) with a 7% discount rate and several seaweed prices is included, evaluated over a 10-year project horizon using Net Present Value (NPV). The dataset includes detailed Life Cycle Inventories (LCI), Life Cycle Impact Assessment (LCIA) results, techno-economic calculations, and modelling assumptions, enabling transparency, reproducibility, and reuse in future environmental and economic assessments of seaweed cultivation systems.
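For readers unfamiliar with the NPV criterion mentioned above, the sketch below shows the standard textbook formula, NPV = -I0 + sum over t of CF_t / (1 + r)^t, applied with the 7% discount rate and 10-year horizon stated in the abstract. The investment and annual cash-flow figures are hypothetical placeholders and do not come from the dataset; the authors' actual cash-flow model may differ.

```python
def net_present_value(rate, initial_investment, annual_cash_flows):
    """Standard NPV: discounted future cash flows minus the upfront investment.

    annual_cash_flows[0] is received at the end of year 1, and so on.
    """
    discounted = sum(
        cash_flow / (1.0 + rate) ** year
        for year, cash_flow in enumerate(annual_cash_flows, start=1)
    )
    return discounted - initial_investment


DISCOUNT_RATE = 0.07   # stated in the abstract
HORIZON_YEARS = 10     # stated in the abstract

# Hypothetical figures for illustration only (currency units are arbitrary).
initial_investment = 100_000.0
annual_net_cash_flow = 15_000.0

npv = net_present_value(
    DISCOUNT_RATE, initial_investment, [annual_net_cash_flow] * HORIZON_YEARS
)
print(f"NPV over {HORIZON_YEARS} years at {DISCOUNT_RATE:.0%}: {npv:.2f}")
```

Repeating the calculation for several seaweed prices, as the abstract describes, amounts to varying the annual cash flows and comparing the resulting NPV values.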
Urban smart city traffic management increasingly relies on UAV-based sensing, yet many widely used drone datasets annotate vehicles with axis-aligned bounding boxes that include unnecessary background and do not encode vehicle orientation. We present UAV-OBB, an aerial urban vehicle dataset with oriented bounding boxes (OBBs), designed for rotation-aware computer vision object detection and traffic monitoring from predominantly nadir-view UAV imagery. UAV-OBB contains 1617 RGB images at 1920 × 1080 resolution captured over roads in Chongqing and Wuhan (China), together with OBB annotations in YOLOv8-OBB label format and supplementary MP4 evaluation videos. The dataset provides 46,807 oriented annotations across six vehicle classes: bike, bus, car, other_vehicle, taxi, and truck, and is split into 1383 training images, 218 validation images, and 16 test images. Data were collected at 75 to 108 m altitude under diverse real-world conditions, including morning, midday, evening, night, rain, and mist or light fog, with both wide field-of-view and zoom settings to introduce strong scale variation. All instances were manually annotated using rotation-capable tools and double-checked for consistency, and occluded and truncated vehicles were included when the majority of the object was visible. To support practical smart city evaluation beyond static mAP, UAV-OBB also includes a short video clip with sparsely annotated reference frames and a longer unannotated sequence for qualitative assessment of temporal stability and deployment behaviour. UAV-OBB provides a realistic benchmark for rotation-aware detection, tracking, counting, and traffic flow analysis in urban UAV surveillance scenarios.
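As an illustration of how oriented annotations of this kind are typically consumed, the sketch below parses one label line in the commonly used YOLO OBB text layout (`class x1 y1 x2 y2 x3 y3 x4 y4`, with corner coordinates normalized to the image size) and converts it to pixel coordinates for a 1920 × 1080 image. It is assumed, not confirmed by the abstract, that the dataset's files follow exactly this layout; the example line and the function name are hypothetical.

```python
import math


def parse_obb_label(line, image_width=1920, image_height=1080):
    """Parse 'class x1 y1 x2 y2 x3 y3 x4 y4' with normalized corner coordinates.

    Returns the class index, the four corners in pixels, and the angle (in
    degrees) of the edge from the first to the second corner.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected a class index followed by eight coordinates")
    class_index = int(fields[0])
    values = [float(v) for v in fields[1:]]
    corners = [
        (values[i] * image_width, values[i + 1] * image_height)
        for i in range(0, 8, 2)
    ]
    (x1, y1), (x2, y2) = corners[0], corners[1]
    first_edge_angle_deg = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return class_index, corners, first_edge_angle_deg


# Hypothetical label line: an axis-aligned box in the image centre.
example = "2 0.25 0.25 0.75 0.25 0.75 0.75 0.25 0.75"
class_index, corners, angle = parse_obb_label(example)
print(class_index, corners, angle)
```

Note that the angle of the first edge is only a proxy for vehicle heading: which corner is listed first depends on the annotation tool's conventions, which would need to be checked against the dataset documentation.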
Modern homes are increasingly relying on networks of internet-connected devices for smart home applications. Unlike an enterprise network, such smart home networks typically tend to have lax cybersecurity protocols in place and are prone to cyber-attacks. Attack patterns have also evolved over time, with multi-stage attacks being more difficult to detect and mitigate. This article presents a novel dataset captured under normal operation as well as from seven different multi-stage attack scenarios that can arise in a smart home environment that hosts end devices of heterogeneous nature. Unlike conventional network flow datasets which describe individual single-stage attacks on devices, this dataset aims to provide researchers with valuable insights into the behaviour of network parameters when each attack scenario includes a combination of attacks. This dataset, with a total of 178,831 samples, is split into two parts: a training set of 148,959 samples and a testing set of 29,872 samples. The captured dataset can be used to develop intrusion detection systems capable of detecting and classifying such interleaved attack patterns that include multiple sequential attack steps in a smart home environment. The dataset captures the common network flow parameters in the network and hence can be used to extend the research to other types of networks as well.
A comprehensive wound-related image repository was developed to address critical gaps in existing medical imaging resources, particularly the lack of balanced datasets representing both healthy and pathological lower-limb conditions. The collection comprises 5443 images sourced from two complementary streams: real-world clinical wound cases and controlled acquisition of healthy-foot images. The wound component includes 2686 expertly annotated images representing eight clinically significant wound types: diabetic, pressure, trauma, venous, surgical, arterial, cellulitis, and miscellaneous. These images were gathered across diverse clinical environments between 2015 and 2019 and meticulously annotated by certified wound specialists, ensuring high-quality segmentation masks including peri-wound regions. The healthy-foot component consists of 2757 images captured from volunteer participants in naturalistic settings using consumer-grade smartphone cameras. Each participant contributed eight multi-angle images under consistent protocols, enabling robust representation of anatomical variability across sex, skin tone, and foot structure. All images were standardized through controlled resizing procedures, while the wound dataset underwent additional mask generation and augmentation strategies to support downstream segmentation and classification tasks. This unified dataset provides a balanced foundation for developing machine learning models capable of distinguishing between normal and pathological foot conditions while supporting advanced tasks such as wound segmentation, severity assessment, and clinical decision support. By integrating healthy and wound images within a single accessible collection, the dataset mitigates class imbalance issues prevalent in existing resources and enables scalable, generalizable deep learning research in wound detection, monitoring, and medical image analysis.
Accurate fruit detection in citrus orchards is essential for yield estimation, precision harvesting, and automated orchard monitoring. Although UAV-based imaging has become a powerful tool in precision agriculture, publicly available datasets for orange fruit detection remain scarce, particularly those integrating multispectral data under real field conditions. This lack of open resources limits the development and benchmarking of robust deep-learning models for cross-spectral and illumination-invariant detection. We present CampanetaOrangeFruit, a dataset acquired with a DJI Mavic 3 Multispectral UAV flying at 14 m above ground level over a commercial citrus orchard in Corbera, Valencia, Spain. The dataset comprises 550 synchronized captures (RGB + four multispectral bands: R, G, RE, NIR) for a total of 2750 images and 301,232 annotated orange instances. Each image includes YOLOv5-format annotations generated through a homography-based reprojection process, ensuring geometric consistency across spectral modalities. CampanetaOrangeFruit uniquely provides pixel-aligned, cross-spectral UAV imagery with fine-grained fruit-level annotations, enabling research on fruit detection, yield estimation, and domain adaptation in real-world orchard environments. It represents a valuable benchmark for advancing deep-learning approaches in precision agriculture and sustainable citrus production.
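The homography-based reprojection mentioned above can be sketched as follows: a YOLO-style box (`xc yc w h`, normalized) is converted to pixel corners, each corner is mapped through a 3×3 homography, and the axis-aligned envelope of the mapped corners is renormalized for the target band. This is a generic illustration under our own assumptions, not the authors' pipeline; the matrix, image sizes, and function names are hypothetical.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def reproject_yolo_box(box, H, src_size, dst_size):
    """Reproject a normalized (xc, yc, w, h) box from a source to a target image.

    Returns the axis-aligned envelope of the four mapped corners, clipped to
    the target image and normalized by its size.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    xc, yc, w, h = box
    left, right = (xc - w / 2) * src_w, (xc + w / 2) * src_w
    top, bottom = (yc - h / 2) * src_h, (yc + h / 2) * src_h
    mapped = [apply_homography(H, x, y) for x in (left, right) for y in (top, bottom)]
    xs = [min(max(x, 0.0), dst_w) for x, _ in mapped]
    ys = [min(max(y, 0.0), dst_h) for _, y in mapped]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    return (
        (x_min + x_max) / 2 / dst_w,
        (y_min + y_max) / 2 / dst_h,
        (x_max - x_min) / dst_w,
        (y_max - y_min) / dst_h,
    )


# Hypothetical homography: a pure 100-pixel shift to the right.
H_shift = [[1.0, 0.0, 100.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = reproject_yolo_box((0.5, 0.5, 0.2, 0.2), H_shift, (1000, 500), (1000, 500))
print(shifted)
```

In practice the homography between the RGB frame and each spectral band would be estimated from matched features or from camera calibration; that estimation step is outside the scope of this sketch.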
Ontogenetic changes in long bone structure provide critical insights into the interplay between growth, mechanical loading, and skeletal development, yet open-access datasets focusing on non-adult individuals remain limited. This study presents an open dataset of femoral cross-sectional geometric properties spanning infancy to adolescence, derived from high-resolution computed tomography (CT) scans of archaeological human remains from the San Pablo Convent (Burgos, Spain). The dataset includes 72 unfused femora representing a continuous developmental sequence from 2.00 to 16.18 years of age. Cross-sectional geometric variables were extracted at four standardized and biomechanically meaningful locations along the femur: distal diaphysis, midshaft, proximal diaphysis, and femoral midneck. For each section, the dataset provides measurements of cortical and medullary areas, area ratios, second moments of area, and torsional rigidity, enabling detailed assessments of structural strength and biomechanical adaptation during growth. Age at death was estimated primarily from dental development, with complementary femoral-length-based regressions when dentition was unavailable. A key feature of this dataset is the inclusion of biological sex estimation based on amelogenin peptide identification from tooth enamel using a minimally invasive proteomic approach, allowing the investigation of sexual dimorphism prior to skeletal maturity. This resource supports reproducible research in bioarchaeology, biomechanics, and developmental skeletal biology.
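To make the listed cross-sectional variables concrete, the sketch below computes them for an idealized hollow circular section, using the standard closed-form expressions for a ring: cortical area CA = π(R² − r²), second moment of area I = (π/4)(R⁴ − r⁴), and polar second moment J = 2I, which is commonly used as a proxy for torsional rigidity. The ring model and the radii are illustrative assumptions; the dataset's values were derived from CT cross-sections, not from this simplification.

```python
import math


def ring_section_properties(outer_radius, inner_radius):
    """Geometric properties of an idealized hollow circular cross-section.

    Radii are in millimetres; areas are in mm^2 and moments in mm^4.
    """
    if not 0 <= inner_radius < outer_radius:
        raise ValueError("require 0 <= inner_radius < outer_radius")
    total_area = math.pi * outer_radius ** 2
    medullary_area = math.pi * inner_radius ** 2
    cortical_area = total_area - medullary_area
    second_moment = math.pi / 4 * (outer_radius ** 4 - inner_radius ** 4)
    return {
        "total_area": total_area,
        "medullary_area": medullary_area,
        "cortical_area": cortical_area,
        "percent_cortical_area": 100.0 * cortical_area / total_area,
        "second_moment": second_moment,        # same about any centroidal axis
        "polar_moment": 2.0 * second_moment,   # J = Ix + Iy for a circular ring
    }


# Hypothetical radii in millimetres, for illustration only.
props = ring_section_properties(outer_radius=2.0, inner_radius=1.0)
print(props)
```

Real diaphyseal sections are not circular, so the maximum and minimum second moments differ; the ring model is useful only for building intuition about how cortical thickness and overall size affect the reported variables.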