In recent years, numerous hashing techniques have been developed to enable efficient cross-modal retrieval. Once a retrieval model is deployed, the hash code length is fixed to achieve optimal performance. To address different retrieval scenarios while maintaining retrieval accuracy, a common approach is to redesign and retrain the original model with a different hash code length. However, this retraining process increases the training load and may lead to worse results. To tackle these challenges, we present Regenerated Cross-Modal Hashing (RCMH), a novel cross-modal hashing framework designed to improve the quality of existing hash codes and convert them to arbitrary lengths with high efficiency. First, we clip or pad the existing hash codes to initialize them with the target length, under the supervision of the similarity matrix generated by the augmented label information. Second, we introduce a linear-nonlinear competitive reconstruction approach to reduce the semantic gaps and further capture the deeper relationships from linear image features and nonlinear text features. In this way, each pair of samples is compared and selected to obtain reconstructed binary codes that preserve the modality-specific properties. Finally, to reduce the training costs caused by iterations of variables, a regenerated hashing term is used to regenerate the final hash codes from the reconstructed binary codes while preserving the information from the existing hash codes, without iterative optimization. Notably, RCMH can be robustly integrated with existing state-of-the-art (SOTA) methods, helping them to adjust the hash code length and achieve better retrieval performance. Extensive experiments on several public datasets show that RCMH outperforms SOTA methods across various evaluation metrics. The code is available at https://github.com/kaihang-jiang/RCMH.git.
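The clip-or-pad initialization described above can be sketched in a few lines. This is only an illustration of the length-conversion step: the function name `resize_code`, the {-1, +1} code convention, and the constant pad value are assumptions, and the similarity-matrix supervision and the later reconstruction stages are not modeled.

```python
def resize_code(code, target_len, pad_value=1):
    """Clip or pad a {-1, +1} hash code to target_len (hypothetical helper)."""
    if target_len <= len(code):
        # Clip: keep the leading bits of the existing code.
        return code[:target_len]
    # Pad: append a constant placeholder (an assumption, not RCMH's rule).
    return code + [pad_value] * (target_len - len(code))


short = resize_code([1, -1, 1, 1], 2)   # clipped to 2 bits
longer = resize_code([1, -1], 4)        # padded to 4 bits
```

In a full pipeline the padded positions would presumably be refined by the subsequent optimization rather than left constant.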
Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey, intended for bioinformatics methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimal perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.
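As one concrete illustration of how a scattering hash randomizes k-mers, the following sketch builds a bottom-s MinHash sketch and estimates Jaccard similarity between two sequences. The choice of BLAKE2b as the hash and the helper names are assumptions made for the example; the survey itself does not prescribe them.

```python
import hashlib


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer with BLAKE2b (an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k, s):
    """Return the s smallest hash values over the distinct k-mers of seq."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(kmer_hash(x) for x in kmers)[:s]


def jaccard_estimate(a, b, s):
    """Bottom-s estimate: fraction of the s smallest union hashes found in both."""
    union = sorted(set(a) | set(b))[:s]
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


a = minhash_sketch("ACGTACGTGA", k=3, s=4)
b = minhash_sketch("ACGTACGTGA", k=3, s=4)
similarity = jaccard_estimate(a, b, s=4)
```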
Pure chromatic background images (PCBIs) consist of large uniform regions with minimal structural content, which makes standard perceptual hashing based on low-frequency DCT unreliable. High similarity scores may be assigned to visually different images, leading to false acceptance. In blockchain-based copyright registration systems, such transactions are irreversible once confirmed, necessitating a conservative similarity model. This paper proposes DBpHash, a dual-band perceptual hashing framework for blockchain-based copyright registration and verification of PCBIs. During computation, low-frequency DCT bands are excluded while mid and high-frequency bands are energy-normalized and binarized to create a 128-bit hash. Similarity is calculated independently for each band and fused using inverse-variance weighting. Thresholds are derived from real and imposter distributions. Experiments conducted on a PCBI dataset are further validated via cross-dataset evaluation on BSDS500 and DTD without retraining. False positive rate remains at 0.095 while similarity exceeds 0.92 under common photometric distortions. Statistical analysis confirms strong hash properties, including near-ideal bit balance and entropy. The framework is integrated with blockchain-based registration and off-chain similarity computation. Results indicate that DBpHash provides a reliable and distortion-resilient solution for copyright authentication of PCBIs.
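The per-band similarity and inverse-variance fusion step can be sketched as follows. The function names are hypothetical, the bit vectors stand in for the binarized mid- and high-frequency bands, and the variances are placeholders; how DBpHash actually estimates them is not stated here.

```python
def hamming_similarity(a, b):
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    return 1 - sum(x != y for x, y in zip(a, b)) / len(a)


def fuse_similarity(sims, variances):
    """Inverse-variance weighting: lower-variance bands get larger weights."""
    weights = [1.0 / v for v in variances]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


mid_sim = hamming_similarity([1, 0, 1, 1], [1, 1, 1, 1])
high_sim = hamming_similarity([0, 0, 1, 1], [0, 0, 1, 1])
# The variances below are made-up values for illustration only.
score = fuse_similarity([mid_sim, high_sim], [0.02, 0.05])
```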
Biometric data, particularly fingerprints, provide a reliable means of personal identification; however, protecting sensitive biometric templates against unauthorized access, tampering, and reconstruction remains a significant challenge in biometric security systems. Existing fingerprint hashing approaches often face limitations in terms of computational efficiency, robustness to geometric distortions, and preservation of discriminative ridge features. To address these limitations, this article proposes a fingerprint hashing technique that integrates the Fast Wavelet Transform (FWT), Fourier-Mellin Transform (FMT), and fractal coding to improve robustness and computational efficiency. In the proposed framework, FWT is employed for noise reduction and multi-resolution feature extraction, effectively preserving essential ridge structures and improving image quality. The FMT is utilized to achieve rotation-invariant feature representation, while fractal coding performs efficient image compression and enhances robustness against distortions and dimensional variations. The integration of FWT with fractal coding further reduces computational complexity while maintaining strong discriminative capability in the generated hash codes. Comprehensive experiments conducted on the FVC2002, FVC2004, and SOCOFing fingerprint databases demonstrate the effectiveness of the proposed hashing framework. The method achieves strong discriminability with an Equal Error Rate (EER) as low as 0.3864%, while maintaining low False Match Rate (FMR) and False Non-Match Rate (FNMR). In addition, the proposed approach significantly reduces computational cost, achieving an average execution time of 2.507 seconds compared to 1048.888 seconds in existing fractal-coding-based hashing approaches. Performance evaluation using PSNR, SNR, SSIM, entropy, mutual information, and edge preservation metrics confirms that the proposed framework improves noise resilience, preserves structural fingerprint features, and maintains robustness against geometric variations. These results demonstrate that the proposed approach generates compact, secure, and robust fingerprint hashes suitable for biometric authentication, forensic identification, and digital identity verification systems.
While recent deep learning-based object detection has achieved great success in various fields, it remains challenging to find tiny objects in aerial imagery on-the-fly using mobile devices. Since mobile platforms such as drones operate with limited onboard computing power, handling high-resolution images to find tiny objects with compute-intensive deep learning-based applications often fails to meet their real-time constraints. To mitigate this problem, we propose HashEye, a novel framework that enables fast on-drone tiny object detection by efficiently suppressing spatial redundancy in aerial imagery. HashEye utilizes a lightweight hashing algorithm to rapidly scan image patches; patches exhibiting high hash collision frequencies are identified as background and suppressed. Subsequently, the remaining salient patches are dynamically rearranged into a hardware-friendly dense image for efficient inference. Experimental results on two real-world datasets demonstrate that HashEye achieves up to a 5.25× speedup compared to the baseline, maintaining detection capability.
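The idea of treating frequently colliding patch hashes as background can be sketched as below. The quantized-mean hash, the names `patch_hash` and `salient_patches`, and the `max_count` threshold are illustrative assumptions rather than HashEye's actual hashing algorithm, and the rearrangement into a dense image is omitted.

```python
from collections import Counter


def patch_hash(patch, levels=8):
    """Cheap hash: quantize the mean 8-bit intensity into `levels` buckets."""
    flat = [p for row in patch for p in row]
    return (sum(flat) // len(flat)) * levels // 256


def salient_patches(image, size, max_count):
    """Return top-left corners of patches whose hash occurs at most max_count times."""
    hashes = {}
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            hashes[(r, c)] = patch_hash(patch)
    counts = Counter(hashes.values())
    # Hashes that collide often are treated as redundant background.
    return [pos for pos, h in hashes.items() if counts[h] <= max_count]


# 4x4 image: uniform background with one bright 2x2 patch at the bottom right.
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 250, 250],
    [10, 10, 250, 250],
]
kept = salient_patches(image, size=2, max_count=1)
```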
Drones (unmanned aerial vehicles) have been extensively applied in many modern artificial intelligence systems over the past decade. In this work, we propose a novel deep hashing framework that can detect objects from drone-captured pictures extremely fast. Our method can intrinsically and flexibly encode various topological structures from each target object, based on which multiscale objects can be discovered in a view- and altitude-invariant way. Moreover, by leveraging $l_{F}$ and $l_{1}$ norms collaboratively, the calculated hash codes are robust to low-quality drone pictures and possibly contaminated semantic labels. More specifically, for each drone picture, we extract the visually/semantically salient object parts inside it. To characterize their topological structure, we construct a graphlet by linking the spatially adjacent object patches into a small graph. Subsequently, a binary matrix factorization (MF) is designed to hierarchically exploit the semantics of these graphlets, wherein three attributes are seamlessly incorporated: 1) deep binary hash code learning; 2) contaminated picture/label denoising; and 3) adaptive data graph updating. Accordingly, a manifold-regularized feature selector is adopted to obtain more discriminative deep hash codes. Finally, the selected hash codes corresponding to graphlets within each drone photograph are utilized for ranking-based object discovery. Comprehensive experiments on DAC-SDC, MOHR, and our self-compiled dataset demonstrate the competitive speed and accuracy of our method.
This paper presents a supervised hashing framework built on a Siamese architecture, where gated residual connections substantially enhance the quality of compressed representations. The model employs ConvNeXt-Base as the backbone, combining the inductive biases of convolutional networks with modern architectural principles inspired by transformers. In the hash-generation pathway, the learned representation must simultaneously preserve essential identity information and apply nonlinear transformations for effective compression and separation. To achieve this, a learnable gate is introduced between two complementary branches: (1) a shallow identity/residual branch that preserves the core feature structure extracted by the backbone, and (2) a deeper transformed branch that performs nonlinear projection and disentanglement. The adaptive gating mechanism dynamically balances these two paths, enabling the network to retain discriminative local and global cues while suppressing irrelevant variations. As a result, the proposed design produces stable, semantically consistent, and highly discriminative binary hash codes. The Siamese network is trained using Triplet Loss to enforce similarity preservation in Hamming space. Extensive experiments on fine-grained benchmarks (CUB-200-2011 and NABirds) as well as a coarse-grained dataset (CIFAR-10) demonstrate that the proposed framework consistently outperforms or matches state-of-the-art hashing methods, validating its robustness and generalization across different levels of semantic granularity.
Locality-sensitive hashing is a technique for approximate similarity search, but fixed thresholds and analog encoding inefficiencies have limited its hardware implementation. This work introduces a dual-domain adaptive spatial hashing (DASH) architecture on a monolithic one-transistor-one-resistor active array operating in both digital and analog domains, with an integrated multifunctional memristor. DASH performs entropy-maximized random projection and data-driven bias adaptation in the analog domain, followed by efficient Hamming-distance computation in the digital domain. Using dual-domain vector-matrix multiplication, multidimensional inputs are compressed into binary hash codes, enabling compact similarity encoding within a unified memristive hardware platform. Experimental validation on three-dimensional-clustered synthetic data shows that DASH maintains spatial separability and improves bit entropy through adaptation. Large-scale simulations on a digit dataset demonstrate improved semantic preservation, similarity recall, and noise resilience compared to non-adaptive hashing. These results position DASH as a scalable, hardware-native solution for energy-efficient, locality-aware similarity search in edge and neuromorphic systems.
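A software analogue of the pipeline described above (random projection, data-driven bias adaptation, then Hamming distance) might look like the following. This is a sketch under assumptions: Gaussian projection weights, a per-bit median as the adapted bias (which balances each bit and so maximizes its entropy on the calibration data), and all function names are illustrative. It does not model the memristive hardware.

```python
import random
import statistics


def make_projection(dim, bits, seed=0):
    """Random Gaussian projection matrix with one row per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def project(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def adapt_bias(W, data):
    """Per-bit threshold set to the median projection over the data."""
    columns = zip(*(project(W, x) for x in data))
    return [statistics.median(col) for col in columns]


def encode(W, bias, x):
    return [1 if p > b else 0 for p, b in zip(project(W, x), bias)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


data = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
W = make_projection(dim=3, bits=8, seed=1)
bias = adapt_bias(W, data)
codes = [encode(W, bias, x) for x in data]
distance = hamming(codes[0], codes[1])
```

With a fixed zero threshold instead of the adapted bias, bits can become heavily skewed on off-center data, which is the inefficiency the bias adaptation is meant to address.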
Sensor-based human activity recognition (HAR) plays a fundamental role in healthcare monitoring, sports analytics, and ambient-assisted living. Although deep learning has substantially advanced HAR performance, two practical issues still limit its real-world deployment: (i) the distribution shift caused by changes in users or sensor placements can degrade generalization, and (ii) the quadratic $O(L^{2})$ complexity of standard self-attention hinders efficient long-sequence modeling on resource-constrained wearable devices. To address these issues, we propose DSHformer, which is an accuracy-oriented HAR framework that combines compact channel-temporal encoding with locality-sensitive hashing (LSH)-based attention. Specifically, DSHformer (i) employs a low-parameter patch-based graph-attention encoder to jointly model latent relationships among sensor channel-temporal dynamics; (ii) introduces a trainable prototype pool together with a multi-layer decomposition network to improve intra-class compactness and inter-class separability via prototype alignment; and (iii) introduces a decomposition-stable LSH-based attention mechanism tailored for HAR, whose core design couples prototype-guided feature decomposition with locality-sensitive hashing to ensure that semantically related tokens remain consistently grouped in the same hash bucket even after decomposition-induced attenuation. The mechanism thereby operates at $O(L \log L)$ attention complexity on longer sensor sequences. Extensive experiments on five public benchmarks (WISDM, UCI-HAR, PAMAP2, Opportunity, and UniMiB-SHAR) show that DSHformer achieves accuracies of 98.6%, 93.7%, 98.4%, 88.5%, and 96.6%, respectively, achieving competitive or superior performance compared with both Transformer variants and HAR-specific baselines under the adopted benchmark protocols. Ablation studies further confirm the complementary contribution of each component.
Rapid retrieval of marine oil spill data from SAR and optical remote sensing is vital for emergency response. Addressing modal confusion and insufficient cross-scale feature utilization, we propose DCS-ViT, a bimodal deep hashing framework specifically optimized for intra-modality retrieval within large-scale hybrid SAR and optical databases. It incorporates three core modules: (1) A dual-modal dedicated ViT encoder that separates and extracts features from SAR (focusing on noise suppression) and optical (focusing on detail enhancement) images; (2) a cross-scale dynamic fusion module that adaptively integrates multi-granularity features via a dynamic attention mechanism; (3) a modality-tagged hash encoding layer that explicitly distinguishes modalities while reducing parameter redundancy. We validated the framework using a new OilSpill-Multidata dataset containing 75,086 images. Results demonstrate that DCS-ViT outperforms mainstream CNN and ViT-based hashing algorithms, achieving mAP improvements across various code lengths. Furthermore, an interactive web demonstration system based on the Flask framework was developed to provide intuitive visualization of the retrieval process. Compared to traditional manual screening, DCS-ViT accelerates retrieval efficiency and accuracy, offering a robust tool for large-scale multimodal oil spill remote sensing image management.
Smartphone manufacturers' enhanced privacy and security measures, such as File-Based Encryption (FBE), have disrupted traditional data extraction techniques, necessitating the adoption of Full File System Extraction (FFS). FFS requires booting a smartphone, decrypting its UserData partition, and accessing files individually, a process that risks data modifications caused by postboot application activity and network connections. This study evaluates the impact of FFS on evidence integrity by analyzing hash value changes across repeated acquisitions from Android smartphones. Using mobile forensic tools and ADB (Android Debug Bridge) for validation, we assessed whether FFS complies with the principles of repeatability and reproducibility. Files were categorized into five potential forensic relevance classes to evaluate how hash value changes affect the reliability of digital evidence. Results highlight that system-generated files and logs are prone to changes during FFS, while user-generated content largely retains integrity. To address these challenges, we suggest two possible solutions. The first is a technical approach that uses an initial reference image to identify and restore altered files, effectively mitigating hash value discrepancies. The second is a procedural measure emphasizing detailed documentation and systematic management of acquisition changes, particularly for newly created files. These findings and proposed approaches aim to improve the reliability of FFS in digital forensics, ensuring evidence admissibility and supporting cross-validation across forensic tools. This research contributes to advancing standardized practices for smartphone evidence acquisition in forensic investigations.
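The hash-comparison step across repeated acquisitions can be sketched as follows. The file paths and contents are made up, SHA-256 is an assumed choice of digest, and the helper names are hypothetical; the study's actual tooling is not reproduced here.

```python
import hashlib


def digest_files(files):
    """Map each path to the SHA-256 hex digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_acquisitions(first, second):
    """Classify paths as changed, added, or removed between two acquisitions."""
    return {
        "changed": sorted(p for p in first if p in second and first[p] != second[p]),
        "added": sorted(p for p in second if p not in first),
        "removed": sorted(p for p in first if p not in second),
    }


# Illustrative data: a log rewritten after boot, an untouched photo, a new cache file.
run1 = digest_files({"data/app.log": b"boot 1", "DCIM/photo.jpg": b"\xff\xd8"})
run2 = digest_files({"data/app.log": b"boot 2", "DCIM/photo.jpg": b"\xff\xd8",
                     "cache/tmp": b""})
report = diff_acquisitions(run1, run2)
```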
IoT devices face mounting security threats: tightly bounded computing budgets on one side, the imminent arrival of quantum adversaries on the other. We present a bidirectional authentication protocol that pairs Physical Unclonable Functions with hash-based primitives so that no long-term key ever sits in device-side non-volatile memory. The scheme is best characterised as Grover-tolerant rather than fully post-quantum: hash one-wayness incurs a quadratic-only slowdown under quantum search, and PUF unclonability adds a physical, non-cryptographic uniqueness assumption; we do not claim parity with NIST-standardised lattice or code-based schemes. Mutual authentication is achieved with conditional forward secrecy and weak unlinkability against external eavesdroppers, while delivering 24.3 ms latency, 15.7 mJ per-authentication energy, and a 136-byte exchange on ESP32. ProVerif symbolic verification, complemented by a sketched QROM reduction with explicit PUF-leakage modelling, covers replay, MITM, impersonation, and physical-capture adversaries. Environmental stressing from 0 °C to 70 °C confirms practical reliability. We deliberately omit ephemeral key exchange and anonymous credentials; the price of doing so is the conditional and partial (rather than full) flavours of forward secrecy and anonymity, which we make explicit throughout.
Unstructured volumetric meshes serve as fundamental data representations in various scientific simulations and analyses. They play a crucial role in representing complex computational domains and are essential for important numerical techniques, such as finite element analysis. Whenever such a mesh is read from a file, streamed in-situ, or generated by algorithms, scientific visualization libraries rely on calculating the external surface of a geometry, named "external facelist", to produce a polygonal mesh for rendering. Consequently, external facelist calculation has become one of the most widely used algorithms in the scientific visualization domain, necessitating optimal performance. In this paper, we explore relevant work on external facelist calculation algorithms in two common visualization libraries, VTK and Viskores, assess their performance and memory constraints, and introduce a novel memory-aware external facelist calculation algorithm employing an atomic hash counting approach. This algorithm fully leverages Viskores' data-parallel primitive operations, facilitating its execution across diverse many-core architectures. Our algorithm features the lowest memory footprint on the GPU and the second-lowest on the CPU among all evaluated methods, and it also delivers the fastest performance on both CPU and GPU. It has been made available under an open-source license in the VTK and Viskores visualization systems.
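The counting principle behind external facelist extraction can be shown in a serial sketch: a face shared by two cells is internal, so faces counted exactly once are external. This example assumes tetrahedral cells and uses a plain `Counter` in place of the atomic hash counting and data-parallel primitives the paper describes; sorting the vertex ids also discards face orientation, which a renderer would need to keep.

```python
from collections import Counter

# Local vertex indices of the four triangular faces of a tetrahedron.
TET_FACES = ((0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3))


def external_faces(tets):
    """Return faces that belong to exactly one tetrahedron."""
    counts = Counter()
    for tet in tets:
        for i, j, k in TET_FACES:
            # Canonical key: sorted global vertex ids identify a face uniquely.
            counts[tuple(sorted((tet[i], tet[j], tet[k])))] += 1
    return sorted(face for face, n in counts.items() if n == 1)


# Two tetrahedra glued along the face (1, 2, 3).
faces = external_faces([(0, 1, 2, 3), (1, 2, 3, 4)])
```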
The use of fog computing is on the rise, adding new dimensions to security and, more specifically, to data protection in fog cloud environments. Storing fog-computing data increases the likelihood of data exploitation when it is uploaded to fog-computing storage. In this paper, Adaptable User Activity Tracking (ASUT) is introduced, integrating AES-256, SHA-512, and user activity tracking (UAT). The need to integrate activity monitoring into the ASUT to collect statistical information on user actions has been stated. The file uploaded to the fog computing storage is encrypted using a 256-bit AES key. Then, this key is hashed with SHA-512 and stored in the fog cloud. The AES expansion is used to decrypt the data, while the SHA-512 hash of the AES key is used to verify that the user-provided key matches the original before decryption proceeds; the hash is irreversible, and the original key is never stored in plaintext. The user must know the initial key to access the file further. When the client re-enters the fog, the algorithm compares the hashes of the two: the initial and the second entry keys. In parallel, the fog cloud broadcasts the user's actions to track any abnormal activity on the account. This mechanism helps mitigate risks of unauthorized data access and suggests ways to improve user protection. The proposed ASUT is designed using Python and PHP. Experimental results show that ASUT achieves 43.39% faster encryption, 66% faster decryption, and 19.86% higher throughput compared to the best-performing competing method, indicating improved computational efficiency and practical feasibility under the evaluated conditions.
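The key-verification step (store only the SHA-512 hash of the AES key, then compare hashes on re-entry) can be sketched with the Python standard library. The function names are hypothetical, the constant-time comparison is an added precaution not mentioned in the abstract, and the AES-256 encryption and activity tracking are left out.

```python
import hashlib
import hmac
import secrets


def register_key(key):
    """Return the SHA-512 hex digest to store instead of the key itself."""
    return hashlib.sha512(key).hexdigest()


def key_matches(candidate, stored_digest):
    """Check a user-provided key against the stored digest in constant time."""
    candidate_digest = hashlib.sha512(candidate).hexdigest()
    return hmac.compare_digest(candidate_digest, stored_digest)


key = secrets.token_bytes(32)        # a random 256-bit key
stored = register_key(key)           # only this digest is kept server-side
ok = key_matches(key, stored)        # decryption would proceed only if True
```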
Many organisations collect sensitive data that cannot be freely shared. Hospitals store brain magnetic resonance imaging (MRI) scans on internal servers; banks keep transaction records behind strict firewalls; agricultural services retain crop images in isolated repositories. Federated learning (FL) allows models to be trained without centralising raw data, yet most existing systems address a single domain and offer limited insight into model behaviour and provenance over time. BlockFedX is a cross-domain federated learning system designed to address three simultaneous tasks: credit card fraud detection on tabular data, brain tumour detection on MRI images, and plant disease recognition on leaf images. These three domains were deliberately selected because they represent the principal data modalities in real-world privacy-sensitive deployments (structured tabular records, greyscale medical images, and colour natural images) and because public benchmark datasets exist for all three, enabling reproducible evaluation. The system uses a shared backbone that is updated only where model layers have compatible tensor shapes, while domain-specific output layers remain local at each client. Explanations are computed at the clients using SHAP feature-attribution for tabular data and Grad-CAM visual heatmaps for images; the server receives only compact statistical summaries. The server also applies a distance-based anomaly test on client updates and records model hashes, explanation summaries, and anomaly flags in a hash-chained ledger. Experiments on three public datasets under non-identical client data distributions show that BlockFedX achieves an average fraud-detection F1-score of 0.92, 74.32% mean validation accuracy on BrainMRI, and 77% test accuracy on PlantVillage, while keeping all raw data local. These results are below strong centralised baselines, as expected under compact models and non-IID splits, but the system simultaneously provides three properties rarely combined in prior work: cross-domain federated training via a shape-safe backbone, client-side explanations integrated into the learning loop, and a lightweight tamper-evident record of model evolution across rounds.
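A hash-chained ledger of the kind mentioned above can be sketched as follows: each entry stores the hash of its predecessor, so altering any earlier record invalidates every later hash. The field names, the SHA-256 choice, and the JSON serialization are assumptions for illustration and are not taken from BlockFedX.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def entry_hash(prev_hash, payload):
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append_entry(ledger, payload):
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "hash": entry_hash(prev, payload)})


def verify(ledger):
    """Recompute every hash in order; any edit breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"round": 1, "model_hash": "abc", "anomaly": False})
append_entry(ledger, {"round": 2, "model_hash": "def", "anomaly": True})
intact = verify(ledger)
```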
Aircraft engine blade maintenance relies on inspection records shared across manufacturers, airlines, maintenance organizations, and regulators. Yet current systems are fragmented, difficult to audit, and vulnerable to tampering. This paper presents BladeChain, a blockchain-based system providing immutable traceability for blade inspections throughout the component life cycle. BladeChain is the first system to integrate multi-stakeholder endorsement, automated inspection scheduling, AI model provenance, and cryptographic evidence binding, delivering auditable maintenance traceability for aerospace deployments. Built on a four-stakeholder Hyperledger Fabric network (OEM, Airline, MRO, Regulator), BladeChain captures every life cycle event in a tamper-evident ledger. A chaincode-enforced state machine governs blade status transitions and automatically triggers inspections when configurable flight hour, cycle, or calendar thresholds are exceeded, eliminating manual scheduling errors. Inspection artifacts are stored off-chain in IPFS and linked to on-chain records via SHA-256 hashes, with each inspection record capturing the AI model name and version used for defect detection. This enables regulators to audit both what defects were found and how they were found. The detection module is pluggable, allowing organizations to adopt or upgrade inspection models without modifying the ledger or workflows. We built a prototype and evaluated it on workloads of up to 100 blades, demonstrating 100% life cycle completion with consistent throughput of 26 operations per minute. A centralized SQL baseline quantifies the consensus overhead and highlights the security trade-off. Security validation confirms tamper detection within 17 ms through hash verification.
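The threshold-triggered status transition can be illustrated with a small sketch. It is written in Python for readability rather than as Fabric chaincode, and the status names, field names, and limit values are all hypothetical; only the rule that an inspection is triggered when a flight-hour, cycle, or calendar threshold is exceeded comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Blade:
    status: str = "IN_SERVICE"
    flight_hours: float = 0.0
    cycles: int = 0
    days_since_inspection: int = 0


# Hypothetical configurable thresholds.
LIMITS = {"flight_hours": 500.0, "cycles": 300, "days_since_inspection": 180}


def update_status(blade, limits=LIMITS):
    """Move an in-service blade to INSPECTION_DUE once any limit is exceeded."""
    exceeded = any(getattr(blade, name) > limit for name, limit in limits.items())
    if blade.status == "IN_SERVICE" and exceeded:
        blade.status = "INSPECTION_DUE"
    return blade.status


blade = Blade(flight_hours=120.0, cycles=301, days_since_inspection=30)
status = update_status(blade)
```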
Although quantum key distribution (QKD) provides information-theoretically secure keys, a significant gap exists between its relatively low key generation rate and the high-speed demands of classical optical communication. To address this challenge, this paper proposes a stream cipher encryption scheme based on QKD and Secure Hash Algorithm 256 (SHA-256). Core to the scheme is an innovative group-wise dual-randomized affine construction. Integrated with configurable grouping and parallel processing, this mechanism applies a linear mapping defined by dynamic initial vectors (IV) and variable full-domain strides to each data group. This approach achieves deep orthogonalization of the hash input space while preserving architectural parallelism. Simulation results demonstrate that the scheme achieves high-strength security protection while ensuring real-time performance of the communication link. Specifically, statistical analysis confirms that the ciphertext exhibits high entropy and negligible autocorrelation, alongside high key sensitivity. The scheme expands the computational search space beyond the conventional $2^{256}$ bound, effectively countering pre-computation attacks.
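The general idea of deriving a keystream from SHA-256 over affinely mapped block indices can be sketched as below. This is a simplified single-group illustration, not the paper's group-wise dual-randomized construction: the 128-bit counter width, the concatenation of key and counter, and all names are assumptions, and the sketch has not been analyzed for security.

```python
import hashlib

BLOCK = 32  # SHA-256 digest size in bytes


def keystream_block(key, iv, stride, index):
    """Hash the key with an affine function of the block index: iv + stride * index."""
    # An odd stride keeps the map index -> counter injective modulo 2**128.
    counter = (iv + stride * index) % (1 << 128)
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()


def xor_stream(key, iv, stride, data):
    """Encrypt or decrypt by XOR with the keystream (the operation is its own inverse)."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, iv, stride, offset // BLOCK)
        out.extend(b ^ k for b, k in zip(data[offset:offset + BLOCK], ks))
    return bytes(out)


key = b"k" * 32  # stands in for a QKD-derived key
ciphertext = xor_stream(key, iv=5, stride=3, data=b"hello world" * 10)
plaintext = xor_stream(key, iv=5, stride=3, data=ciphertext)
```

Because each block depends only on its own index, blocks can be computed independently, which is what makes this style of construction amenable to parallel processing.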
The secondary use of clinical data from hospitals is increasingly demanded by both industry and academia. To balance analytical utility with patient privacy, we developed and implemented a lightweight, hospital-based data-provision framework within the Osaka University Life Design Innovation initiative. For consented participants, anonymized research IDs and hash-based data indices were created, enabling temporal reconstruction through relative days without retaining actual dates. The internally managed data catalog, composed only of research IDs, observation indices, and hash-based data indices, was separated from the dataset, which remained securely stored on hospital servers, while overall coordination and access governance were provided through the PLR platform. The framework operates through three layers (hospital, PLR platform, and secondary users), ensuring clear data governance and operational scalability. It was applied to a perinatal cohort of 22 participants, generating four data categories from the hospital data warehouse. FAIR evaluation demonstrated high maturity in Findability, Accessibility, and Reusability, while Interoperability remained limited due to non-standardized, non-machine-readable formats. This framework provides a practical and scalable model for FAIR-oriented secondary data use, supporting secure and efficient data sharing in medical institutions.
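One way to realize anonymized research IDs, relative days, and hash-based data indices is sketched below. The keyed HMAC construction, the truncation to 16 hex characters, the index format, and every name and value are assumptions for illustration; the abstract does not specify how the framework derives its identifiers.

```python
import hashlib
import hmac
from datetime import date


def research_id(patient_id, secret):
    """Keyed hash so the ID cannot be recomputed without the hospital-held secret."""
    return hmac.new(secret, patient_id.encode(), hashlib.sha256).hexdigest()[:16]


def relative_day(event_date, index_date):
    """Days elapsed since a per-participant index date; the real date is dropped."""
    return (event_date - index_date).days


def data_index(rid, category, rel_day):
    """Hash-based catalog index for one observation."""
    return hashlib.sha256(f"{rid}|{category}|{rel_day}".encode()).hexdigest()


secret = b"hospital-held secret"          # illustrative value
rid = research_id("patient-0001", secret)  # hypothetical patient identifier
day = relative_day(date(2024, 3, 10), date(2024, 3, 1))
index = data_index(rid, "lab_results", day)
```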
Cone-beam computed tomography (CBCT) is vital for clinical imaging, but reconstructing data acquired from circular trajectories inherently suffers from severe large-cone-angle artifacts (LCAA) due to null space deficiency. While existing methods can reduce artifacts, they typically require extensive prior information or paired data, limiting clinical applicability. To address these issues, we propose an unsupervised Neural Radiance Fields framework (LCAA-NeRF) to suppress LCAA directly from projection data. First, we design an axial-aware anisotropic scaling hash encoding (A3Hash) mechanism to enhance the representational capacity for large cone angles. Second, we introduce a path-length-adaptive ray sampling (PLARS) strategy to dynamically capture features across varying ray paths. Finally, we incorporate a stochastic structural similarity (S3IM) loss to further enforce geometric consistency. The superiority and robustness of our method are validated across both simulated and real datasets under large cone angles.
Canada is home to a very active commercial legal cannabis market, offering insight into the opportunities and challenges posed by rapid cannabis commercialization under a hybrid regulatory model. This environmental scan provides a descriptive snapshot of available cannabis products, modes of consumption, and their advertised product potencies. Researchers extracted cannabis product type, mode(s) of consumption, and drug (THC, CBD) potency information from the online inventories of ten large Ontario, Canada cannabis retailers. Products and modes were categorized according to Canadian Cannabis Survey (CCS) categories where applicable. Ontario retailers sell cannabis products including dried flower, hash/hashish, oils, vape pens, concentrates, edibles, beverages and topicals. Drug potencies varied within and across product types and were reported in non-standardized units. Modes of consumption included inhalation (smoking/vaping/dabbing of a range of flower and concentrate products), oral ingestion (eating, drinking), mucosal absorption (sublingual, vaginal) and topical, transdermal applications. We identified new products, terms, and features (e.g. multimodal products), not described in resources informed by the CCS. Ontario's hybrid cannabis retail environment demonstrates substantial product and labeling diversity. These findings underscore the importance of adaptable surveillance frameworks capable of capturing evolving products, modes, and potency reporting practices. This scan offers a structured, replicable approach to monitoring cannabis product and mode trends in a rapidly evolving legal market context and may inform regulatory and public health discussions internationally.