As the share of older adults in the population rises, community parks, owing to their high accessibility and flexibility, have become crucial resources for enhancing the physical activity levels of older adults and promoting inclusive elderly care. However, the relationship between the composition and configuration of landscape spaces and the behavioral intention and emotional benefits of older adults remains unclear. This study investigated the landscape spaces of community parks in three Chinese cities. Real-time psychological and physiological data were collected from participants as they viewed sample photographs, and physiological data on differential emotional arousal induced by specific visual stimuli were extracted using deconvolution techniques. Through descriptive statistics, differential analysis, correlation analysis, and mediation effect analysis, a mediation model comprising "visual attributes of park landscape space, behavioral intention for active activities, and emotional benefits" was constructed to examine how different visual attributes of landscape spaces in community parks enhance the emotional states of older adults. (1) The visual attributes of landscape spaces could be ranked by positive emotional arousal as landscape spatial enclosure (LSE) > landscape spatial hue (LSH) > green space accessibility (GSA) > artificiality of landscape space (AoLS) (p < 0.001). (2) Landscape spaces with lower degrees of enclosure significantly improved the emotional benefits of younger-old adults (p < 0.001). However, their behavioral intention was attenuated as the degree of enclosure decreased (p < 0.001), which impaired the overall effectiveness (p < 0.001). (3) For younger-old adults, both GSA and AoLS influenced emotional benefits via behavioral intention (p < 0.001), with GSA showing a partial mediation effect and AoLS exhibiting a suppressor (inconsistent) mediation effect. An increase in green space accessibility was correlated with an increase in behavioral intention (p < 0.05), while artificiality was negatively correlated with both behavioral intention (p < 0.001) and emotional benefits. (4) Emotional benefits were higher in cool-colored landscape spaces than in warm-colored ones (p < 0.001). Meanwhile, the indirect effect of LSH via behavioral intention was small and negative; however, this pathway accounted for a relatively minor proportion of the total effect and was outweighed by the significant positive direct effect of cool tones on emotional benefits (p < 0.001). Among the four visual attributes of landscape spaces, landscape enclosure had the most significant impact on the level of positive emotional arousal. However, the mechanism by which visually perceived thermal comfort enhances the contribution of enclosure to emotional benefits has been overlooked and warrants further investigation to clarify how visual factors can effectively improve the emotional benefits of older adults.
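To make the mediation analysis above concrete, the following sketch estimates a single-mediator path (visual attribute, behavioral intention, emotional benefit) with ordinary least squares and a bootstrapped indirect effect; the data frame and column names (enclosure, intention, emotion) are hypothetical placeholders, not the study's variables or code.

```python
# Minimal sketch of a single-mediator model (X -> M -> Y) with a bootstrapped
# indirect effect; columns 'enclosure' (X), 'intention' (M), 'emotion' (Y) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def indirect_effect(df):
    # Path a: attribute -> intention; path b: intention -> emotion, controlling for the attribute.
    a = smf.ols("intention ~ enclosure", data=df).fit().params["enclosure"]
    b = smf.ols("emotion ~ intention + enclosure", data=df).fit().params["intention"]
    return a * b

def bootstrap_ci(df, n_boot=1000, alpha=0.05):
    est = [indirect_effect(df.sample(n=len(df), replace=True, random_state=i)) for i in range(n_boot)]
    lo, hi = np.percentile(est, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return indirect_effect(df), (lo, hi)

# Illustrative simulated data only.
rng = np.random.default_rng(42)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(size=300)
y = 0.4 * m + 0.2 * x + rng.normal(size=300)
df = pd.DataFrame({"enclosure": x, "intention": m, "emotion": y})
print(bootstrap_ci(df))
```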
Human-to-robot (H2R) handovers are critical in human-robot interaction but are challenged by complex environments that impact robot perception. Traditional RGB-based perception methods exhibit severe performance degradation under harsh lighting (e.g., glare and darkness). Furthermore, H2R handovers occur in unstructured environments populated with fine-grained visual details, such as multi-angle hand configurations and novel object geometries, where conventional semantic segmentation and grasp generation approaches struggle to generalize. To overcome lighting disturbances, we present an H2R handover system with a dual-path perception pipeline. The system fuses perception data from a stereo RGB-D camera (eye-in-hand) and a time-of-flight (ToF) camera (fixed scene) under normal lighting, and switches to the ToF camera for reliable perception under glare and darkness. In parallel, to address the complex spatial and geometric features, we augment the Point Transformer v3 (PTv3) architecture by integrating a T-Net module and a self-attention mechanism to fuse the relative positional angle features between human and robot, enabling efficient real-time 3D semantic segmentation of both the object and the human hand. For grasp generation, we extend GraspNet with a grasp selection module optimized for H2R scenarios. We validate our approach through extensive experiments: (1) a semantic segmentation dataset of 7500 annotated point clouds covering 15 objects and 5 relative angles, evaluated on 750 point clouds from 15 unseen objects, where our method achieves 84.4% mIoU, outperforming Swin3D-L by 3.26 percentage points with 3.2× faster inference; (2) 250 real-world handover trials comparing our method with the baseline across 5 objects, 5 hand postures, and 5 angles, showing an improvement of 18.4 percentage points in success rate; (3) 450 trials under controlled adverse lighting (darkness and glare), where our dual-path perception method achieves 82.7% overall success, surpassing single-camera baselines by up to 39.4 percentage points; and (4) a comparative experiment against a state-of-the-art multimodal H2R handover method under identical adverse lighting, where our system achieves 75.0% success (15/20) versus the baseline's 15.0% (3/20), further confirming the lighting robustness of our design. These results demonstrate the system's robustness and generalization in challenging H2R handover scenarios.
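A minimal sketch of the dual-path switching idea described above, assuming a simple brightness heuristic on the eye-in-hand RGB stream; the thresholds and camera interfaces are illustrative placeholders rather than the system's actual implementation.

```python
import numpy as np

def lighting_is_reliable(rgb, dark_frac=0.6, glare_frac=0.4):
    """rgb: HxWx3 uint8 frame from the eye-in-hand camera; thresholds are illustrative."""
    gray = rgb.mean(axis=2) / 255.0
    too_dark = (gray < 0.05).mean() > dark_frac    # most pixels near black -> darkness
    glare = (gray > 0.95).mean() > glare_frac      # large saturated region -> glare
    return not (too_dark or glare)

def select_point_cloud(rgb, rgbd_points, tof_points):
    """Pick the point cloud passed on to segmentation and grasp generation."""
    return rgbd_points if lighting_is_reliable(rgb) else tof_points
```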
Augmented reality and related extended-reality technologies have been increasingly investigated in urology to support procedures characterized by complex three-dimensional anatomy and limited intraoperative visualization. This review synthesizes recent original evidence on augmented reality/extended-reality applications in urology across clinical practice and training, with a focus on procedural planning, intraoperative guidance, and educational outcomes. A total of 25 studies were identified. In endourology, randomized studies in percutaneous nephrolithotomy (58-175 patients) showed improved anatomical understanding, shorter renal access times (50-60% reduction), changes in access strategy in 30% of cases, higher stone-free rates, and fewer intermediate-grade complications, with inconsistent effects on operative duration and fluoroscopy exposure using augmented reality/extended-reality applications. In robotic urology, most evidence concerns oncological surgery. Feasibility and comparative studies in robot-assisted partial nephrectomy (20-105 patients) confirmed rapid augmented reality co-registration and acceptable perioperative safety. In radical prostatectomy, comparative and randomized data (92-133 patients) suggested lower positive surgical margin rates at preserved neurovascular bundles and improved early continence recovery, without consistent differences in short-term oncological outcomes. Applications to pelvic lymph node dissection and highly complex renal surgery remain exploratory. Educational and training applications represent the most mature domain, with randomized and validation studies (12-43 trainees) consistently demonstrating improved technical performance, procedural efficiency, and reduced cognitive workload using immersive or mixed-reality platforms, including remote training solutions. Current augmented reality/extended-reality applications in urology show reproducible benefits in anatomical understanding, procedural planning, and selected technical steps, particularly in endourology and surgical training. Clinical outcome evidence remains heterogeneous and largely limited to short-term or surrogate endpoints, while broader adoption is constrained by technical robustness, workflow integration, and scalability. Ongoing randomized studies and advances in automation and artificial intelligence-driven registration are expected to better define the role of augmented reality/extended-reality in routine urological practice.
Healthcare stakeholders are increasingly seeking comparative provider performance data to enhance data-driven decision-making and quality improvement. Traditional visualisations, like caterpillar plots, are often difficult for end users to understand and interpret. This study aimed to (1) obtain general feedback from end users on a newly proposed design solution for visualising a risk-adjusted hospital comparison and to develop an understanding of the key criteria they rely on in the evaluation process; (2) test the hypothesis that end users will better understand key messages and rate perceived usability higher with the new design solution than with a caterpillar plot. An end user-centred mixed methods study, involving end users of risk-adjusted hospital comparisons across all levels of the Swiss healthcare system, was conducted to evaluate the new design solution. In the qualitative phase, 14 end users from health authorities, insurers, hospital associations, and hospitals were surveyed in 10 semi-structured individual and group interviews, which were analysed using thematic analysis. In the quantitative phase, a non-clinical randomised controlled online trial (A/B testing) was conducted. In total, 200 of the targeted end users, comprising cantonal quality managers, hospital directors, and those responsible for quality and/or the 'National Prevalence Measurement' in hospitals, completed the questionnaire. The data were analysed using comparative descriptive and bivariate statistics. Thematic analysis revealed three key criteria that end users relied on when evaluating a risk-adjusted hospital comparison: (1) 'clarity by design', highlighting strategies for effectively conveying key messages of hospital comparisons; (2) 'usability by design', focusing on end user-centred functionalities and presentation elements; (3) 'suitability for quality development', addressing the conditions for creating a trustworthy and useful comparison to drive quality improvement. Quantitative analysis confirmed the hypothesis that end users understand key messages better and perceived usability is higher with the new design than with the caterpillar plot. The new design solution improves hospital comparison outputs for end users by combining clear displays with additional interactive features. The identified criteria underlying the evaluation should inform further design projects and research dealing with the visualisation of hospital comparisons.
Accurate differentiation between benign and malignant thyroid nodules remains challenging in clinical practice. Current deep learning approaches predominantly rely on single-modality analysis, failing to leverage complementary information from multiple clinical data sources. This study aims to develop and validate ThyroFusion, a multi-modal deep learning framework integrating ultrasound images, segmentation masks, and clinical text reports for improved thyroid nodule malignancy risk assessment. In this retrospective multi-center study, we developed ThyroFusion, a multi-modal fusion framework comprising: (1) a dual-stream ResNet-50 encoder with partially shared parameters for extracting features from ultrasound images and segmentation masks; (2) a Set Transformer module for aggregating variable numbers of image features; and (3) a bidirectional cross-modal attention mechanism for fusing visual and textual features extracted by frozen BioBERT. The framework was trained on 1472 cases from Xi'an International Medical Center Hospital and validated on four independent external test sets totaling 4530 cases from two clinical centers and two public datasets (DDTI and TN3K). Performance was compared against state-of-the-art deep learning models and radiologists with varying experience levels. ThyroFusion achieved an AUC of 0.937 (95% CI 0.914-0.960) on internal validation and 0.896 (95% CI 0.887-0.905) on combined external validation. Compared to single-modal approaches, ThyroFusion significantly outperformed ResNet-50 (AUC 0.841), DenseNet-121 (AUC 0.848), EfficientNet-B4 (AUC 0.859), and Vision Transformer (AUC 0.835) on external validation (all p < 0.001). The model also outperformed senior radiologists (AUC 0.809) and demonstrated substantial improvement in junior radiologists' performance when used as an assistive tool (ΔAUC = 0.126). On public datasets, ThyroFusion achieved AUCs of 0.893 on DDTI and 0.881 on TN3K, demonstrating robust cross-domain generalization. ThyroFusion demonstrates robust performance in thyroid nodule malignancy risk assessment across multiple centers and public benchmarks, significantly outperforming state-of-the-art single-modal methods and experienced radiologists. The integration of visual and textual information through bidirectional cross-modal attention offers a promising tool for clinical decision support.
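As a rough illustration of the bidirectional cross-modal attention described above, the sketch below lets aggregated image tokens attend to text tokens and vice versa; the dimensions, residual/normalization layout, and module name are assumptions, not ThyroFusion's exact design.

```python
import torch
import torch.nn as nn

class BiCrossModalAttention(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.vis_from_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # vis: (B, Nv, dim) aggregated image tokens; txt: (B, Nt, dim) BioBERT-style text tokens
        v, _ = self.vis_from_txt(query=vis, key=txt, value=txt)   # text informs vision
        t, _ = self.txt_from_vis(query=txt, key=vis, value=vis)   # vision informs text
        return self.norm_v(vis + v), self.norm_t(txt + t)

fused_v, fused_t = BiCrossModalAttention()(torch.randn(2, 4, 768), torch.randn(2, 16, 768))
```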
The quality and production of high-grade textiles largely depend on accurate recognition of weave patterns, which are traditionally identified through manual visual inspection. However, this approach is subjective, time-consuming, and error-prone. While machine learning methods offer automation, they often rely on handcrafted features sensitive to lighting and imaging variations, limiting their robustness and scalability. Even deep learning models face generalization issues due to domain shifts in real-world acquisition conditions. To address these challenges, a novel deep learning framework that combines a Convolutional Neural Network (CNN) with Generative Adversarial Networks (GANs) for end-to-end fabric classification is proposed. The approach integrates geometric and photometric data augmentation with UNet-based image denoising, while the GAN component generates high-quality synthetic images to enhance training diversity and feature learning. Experiments on a woven fabric dataset demonstrate that this method achieves state-of-the-art performance, with a balanced accuracy of 99.1%, outperforming baseline models in accuracy, generalizability, and robustness to visual distortions. This framework offers a scalable and reliable solution for automated textile inspection, with significant implications for improving efficiency and reducing manual labor in industrial fabric manufacturing. This work integrates CNN, GAN, and U-shaped Convolutional Neural Network (UNet) modules into a single optimization-based learning pipeline rather than handling denoising, data creation, and classification as distinct processes. The joint training mechanism creates a feedback loop between the generative and discriminative networks, going a step further methodologically than standard fine-tuning practices.
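The joint CNN-GAN training idea can be sketched as a single step that updates the discriminator, the generator, and the classifier on each batch; the class-conditional generator, the loss weighting, and all module names are assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def joint_step(generator, discriminator, classifier,
               opt_g, opt_d, opt_c, real_imgs, labels, z_dim=128, synth_weight=0.5):
    B, device = real_imgs.size(0), real_imgs.device
    ones, zeros = torch.ones(B, 1, device=device), torch.zeros(B, 1, device=device)

    # Class-conditional generator (an assumption) so synthetic samples inherit the batch labels.
    fake_imgs = generator(torch.randn(B, z_dim, device=device), labels)

    # 1) Discriminator: separate real fabric images from synthetic ones.
    d_loss = F.binary_cross_entropy_with_logits(discriminator(real_imgs), ones) + \
             F.binary_cross_entropy_with_logits(discriminator(fake_imgs.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator: produce images the discriminator accepts as real.
    g_loss = F.binary_cross_entropy_with_logits(discriminator(fake_imgs), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # 3) Classifier: real batch plus a down-weighted synthetic batch.
    c_loss = F.cross_entropy(classifier(real_imgs), labels) + \
             synth_weight * F.cross_entropy(classifier(fake_imgs.detach()), labels)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    return d_loss.item(), g_loss.item(), c_loss.item()
```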
Nucleus segmentation in immunohistochemistry (IHC) images plays a critical role in cancer diagnosis and treatment assessment. However, existing methods remain limited in segmentation accuracy and boundary delineation due to staining heterogeneity, densely packed cell distributions, and complex background interference. To address these challenges, this paper proposes a two-stage nucleus segmentation framework, termed SAM2HIPT. In the first stage, the pre-trained Segment Anything Model 2 (SAM2) is employed to generate initial segmentation predictions for input images, wherein the image encoder is kept frozen to preserve the pre-trained visual representation capacity while the mask decoder is fine-tuned to adapt to the characteristics of the pathological image domain. Local texture, morphological, and boundary information are extracted through visual feature encoding to produce initial nucleus segmentation masks and spatial prior representations. In the second stage, the Hierarchical Image Pyramid Transformer (HIPT) is introduced to refine the initial segmentation results, performing multi-scale, multi-level feature representation and fusion of morphological, textural, and spatial structural information through a hierarchical vision Transformer architecture, thereby enhancing nuclear structural representation and boundary consistency. To enable collaborative optimization across both stages, a joint loss function is designed to impose unified constraints on segmentation accuracy and feature representation. Evaluated on two public histopathological benchmark datasets, BCData and DeepLIIF, the proposed method achieves Dice coefficients of 0.92 and 0.91, respectively, and HD95 boundary error values of 1.05 pixels and 1.10 pixels, demonstrating superior segmentation performance and robustness over multiple state-of-the-art baseline methods.
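A minimal sketch of a joint loss that constrains both the first-stage (coarse) and second-stage (refined) masks with Dice plus cross-entropy terms; the specific terms and weights are assumptions rather than SAM2HIPT's published formulation.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits, target: (B, 1, H, W); target is a float {0, 1} mask."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def joint_loss(coarse_logits, refined_logits, target, w_coarse=0.4, w_refined=1.0):
    """Apply the same region and pixel constraints to both stages' predictions."""
    def stage_loss(logits):
        return dice_loss(logits, target) + F.binary_cross_entropy_with_logits(logits, target)
    return w_coarse * stage_loss(coarse_logits) + w_refined * stage_loss(refined_logits)
```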
With the rapid expansion of smart cities and the growing need for accurate and real-time analysis of road infrastructure, the development of artificial intelligence systems capable of perceiving, analyzing, and recording environmental information has become increasingly vital. In this context, the present study introduces a novel system for the detection of Iranian traffic signs, making a significant contribution not only in the data collection domain but also in the detection of visual data. A comprehensive and unique dataset, consisting of 14,111 images and 19,000 traffic signs across 118 distinct classes, was collected over a two-year period under diverse temporal conditions (morning, noon, dusk, and night) and throughout all four seasons, covering urban, rural, and intercity areas across the country. The image annotation process was conducted with high precision using the MakeSense tool in two standard formats: YOLO (.txt) and Pascal VOC (.xml). Subsequently, an automated detection system was developed based on the advanced deep neural network model YOLOv12, enabling precise identification of traffic signs. The model's performance, evaluated through 6-Fold Cross Validation, demonstrated outstanding accuracy, achieving an mAP@50 of 96%. These results highlight the model's remarkable efficiency in real-world scenarios and its superiority over previous YOLO versions. Beyond detection capabilities, the proposed system can be employed for the extraction of digital traffic sign maps, serving as a foundational tool for navigation systems, intelligent vehicles, spatial analytics, and the development of a national traffic sign map of Iran and other similar countries. By integrating cutting-edge technologies, data-driven localization, and state-of-the-art deep learning architectures, this research represents a significant step toward a smarter and safer future for the nation's road networks.
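A sketch of a k-fold training and evaluation loop for the detector, assuming the Ultralytics-style Python API and a hypothetical "yolo12n.pt" checkpoint name; each fold is expected to have its own data YAML listing the train/validation splits.

```python
from ultralytics import YOLO   # Ultralytics-style API; the YOLOv12 checkpoint name below is an assumption

def run_fold(fold: int, data_yaml: str, epochs: int = 100, imgsz: int = 640) -> float:
    model = YOLO("yolo12n.pt")                                   # hypothetical pre-trained weights
    model.train(data=data_yaml, epochs=epochs, imgsz=imgsz,
                project="traffic_signs", name=f"fold{fold}")
    metrics = model.val(data=data_yaml, split="val")             # evaluate on this fold's val split
    return metrics.box.map50                                     # mAP@50 for the fold

scores = [run_fold(k, f"folds/fold{k}.yaml") for k in range(6)]  # 6-fold cross-validation
print("mean mAP@50:", sum(scores) / len(scores))
```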
Audio-Visual Speech Recognition (AVSR) has been studied for a long time in the literature. By leveraging the complementary information from both acoustic and visual modalities, this approach offers a promising solution for robust speech transcription. While recent AVSR models have achieved impressive performance on large-scale, uniformly distributed datasets, they often overlook the challenges posed by real-world scenarios, where data are collected across multiple sessions and environments, leading to significant domain shifts and heterogeneous distributions. Such heterogeneity can result in catastrophic forgetting and hinder the generalization ability of conventional models. To bridge this gap, we introduce the Continual Audio-Visual Speech Recognition (CL-AVSR) problem, which formulates AVSR as a continual learning task. We establish a dedicated benchmark for CL-AVSR by designing three experimental scenarios that reflect real-world challenges: introducing varying background noise for the audio stream, degrading video quality for the visual stream, and dividing tasks by speaker characteristics to jointly affect both modalities. These scenarios systematically evaluate the model's ability to adapt and retain knowledge across dynamic and non-stationary data streams. To address the unique challenges of CL-AVSR, we propose the Interaction-enhanced Multimodal Prompt learning (IMP) framework. IMP builds upon a pre-trained AV-HuBERT backbone and integrates task-relevant soft prompts with cross-modal and cross-task interactions, enabling efficient knowledge transfer from high-quality source domains to typical low-quality target domains with minimal parameter overhead. The interactive prompts facilitate fine-grained alignment and adaptation between modalities and tasks, while contrastive regularization further mitigates catastrophic forgetting. Furthermore, we devise a multi-modal prompt selection strategy that leverages clustering-based feature analysis, empowering the model to dynamically select optimal prompts for unseen data distributions during inference. Extensive experiments on the LRS2 dataset demonstrate that IMP achieves substantial improvements over strong baselines, setting new state-of-the-art performance in all CL-AVSR scenarios. Our results highlight the effectiveness of IMP in enhancing continual learning capabilities for AVSR, paving the way for more robust and adaptable multi-modal speech recognition systems in real-world applications.
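The soft-prompt idea can be illustrated with a generic sketch in which a small bank of learnable prompt vectors is prepended to the frozen backbone's feature sequence and only the prompts and task head are trained; names and sizes are illustrative, and this is not the IMP framework's actual code.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int, n_prompts: int = 16, n_classes: int = 1000):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # freeze the pre-trained backbone
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)  # learnable task prompts
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                  # assumed to return (B, T, dim) audio/visual features
        prompts = self.prompts.expand(feats.size(0), -1, -1)
        seq = torch.cat([prompts, feats], dim=1)  # prepend task-specific prompts
        return self.head(seq.mean(dim=1))         # simple pooled prediction head
```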
Conventional structural design studies often prioritize mechanical metrics, yet lack a unified narrative that renders the aesthetic expression of form both quantifiable and verifiable. To address this gap, we develop a GAN-based framework for biomimetic topology fusion generation, leveraging Cycle-Consistent GANs (CycleGAN) to learn bidirectional mappings and morphological translations between two classes of natural prototypes under unpaired supervision: performance-oriented morphologies (e.g., dragonfly wing venation and leaf venation), which exhibit high structural efficiency but comparatively weak visual order, and aesthetics-oriented patterns (e.g., honeycomb cells and pinecone spirals), which display pronounced geometric regularity and proportional structure but limited load-bearing capacity. Through cross-domain translation and fusion, the model synthesizes hybrid topological textures that simultaneously encode cues of structural robustness and ordered geometric features. These synthesized morphologies are subsequently validated via flexural (bending) testing in terms of load-carrying capacity and energy absorption efficiency, and are objectively characterized by a multi-metric aesthetic quantification scheme (computed on binary, vectorized structural maps) covering symmetry, complexity, and order. Across multiple morphology-pair settings, the fusion-generated structures exhibit a more balanced overall profile in both mechanical response and aesthetic metrics, indicating effective synergy between engineering usability and visual expression. In addition, we provide an application example in conceptual form design for orthopedic exoskeletal products, illustrating the cross-domain potential of the proposed approach at the interface of engineering design and aesthetic design.
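For reference, the CycleGAN-style cycle-consistency objective underlying the cross-domain translation can be sketched as an L1 reconstruction loss over both round trips between the two domains; the generator handles and the weighting are placeholders, not the paper's training code.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, lam: float = 10.0):
    """L1 reconstruction after translating A to B and back, and B to A and back."""
    rec_a = G_ba(G_ab(real_a))   # performance-oriented domain round trip
    rec_b = G_ab(G_ba(real_b))   # aesthetics-oriented domain round trip
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```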
The annual mortality rate from prostate cancer (PCa), a common malignant neoplasm affecting middle-aged and elderly men, is on the rise. Biparametric magnetic resonance imaging (bpMRI) is indispensable to PCa imaging analysis since it can capture distinct disease-related information from two modalities that exhibit synergistic performance. The majority of state-of-the-art PCa diagnostic techniques currently available focus on a single modality or task, neglecting the information sharing across the two modalities and the task correlations inherent in multi-task learning. We propose a dual-modality image fusion and multi-task learning model that can accomplish automatic PI-RADS grading and prostate and PCa region segmentation simultaneously. First, to extract complementary information between the prostate and PCa in bimodal images via T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI) feature extraction, a shared block fusion module and an independent encoder block were developed. Subsequently, in the encoder stage, a dual visual attention module was designed to extract features from multiple receptive fields and deliver more accurate contextual information, and a novel decoder was designed to effectively integrate encoder features, yielding more refined global and local detail information. Next, to capture more precise detail information in the classification task stage, a high-level feature fusion technique was developed. Finally, to address class imbalance, a multitask mixed loss function is proposed. The segmentation results for the prostate and PCa on multiple diverse male pelvic MRI datasets demonstrate the superior performance of the proposed method. Both the basic performance evaluation and the comparative model evaluation have validated the model's effectiveness in prostate and PCa segmentation as well as automatic PI-RADS grading. External validation on the independent PROMISE12 dataset further confirms the strong generalizability of our model across different institutions, scanning devices, and patient cohorts.
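A sketch of a multi-task "mixed" loss combining a segmentation term with an imbalance-aware classification term for grading; the focal form, the weights, and the tensor shapes are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dice(prob, target, eps=1e-6):
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def focal_ce(logits, labels, gamma=2.0, class_weights=None):
    """Focal cross-entropy to down-weight easy, majority-class grading examples."""
    ce = F.cross_entropy(logits, labels, weight=class_weights, reduction="none")
    pt = torch.exp(-ce)
    return ((1 - pt) ** gamma * ce).mean()

def mixed_loss(seg_logits, seg_target, cls_logits, cls_labels,
               w_seg=1.0, w_cls=1.0, class_weights=None):
    # seg_logits, seg_target: (B, 1, H, W); cls_logits: (B, n_grades); cls_labels: (B,)
    seg = dice(torch.sigmoid(seg_logits), seg_target) \
          + F.binary_cross_entropy_with_logits(seg_logits, seg_target)
    cls = focal_ce(cls_logits, cls_labels, class_weights=class_weights)
    return w_seg * seg + w_cls * cls
```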
Accurate prediction of drug-target interactions (DTIs) and drug-disease interactions (DDIs) is critical for accelerating the drug discovery process. However, conventional unimodal approaches struggle to capture the complex biochemical and pharmacological relationships between drugs and targets, thereby limiting model accuracy and generalisability. Hence, it is essential to develop innovative approaches that can enhance predictive performance. To overcome these limitations, we introduce DrugGPS as a novel multimodal predictive framework. DrugGPS integrates heterogeneous biological data, using a multi-channel feature fusion strategy guided by attention mechanisms for highly accurate DTI and DDI prediction. It combines structural and sequential representations, biological relational networks, and similarity-based graphs to learn enriched feature embeddings. It adopts an attention-based fusion module to distil and integrate cross-channel information, strengthening its ability to characterise complex chemical-biological interactions. Additionally, the framework incorporates MeSH-derived disease features to unify the modelling of drug-target-disease associations, thus providing deeper insights into therapeutic mechanisms. A case study on the mineralocorticoid receptor (MR) verifies the model's utility by identifying four experimentally validated active compounds. An interactive visualisation platform was developed to explore predicted DTIs and DDIs. Extensive experiments on public datasets show that DrugGPS outperforms state-of-the-art methods in accuracy, robustness, and computational efficiency, demonstrating its potential for intelligent drug discovery and repositioning. These findings indicate the potential of DrugGPS to improve DTI and DDI predictions for drug repositioning and to accelerate drug development and discovery.
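The attention-guided multi-channel fusion can be illustrated with a small module that learns a softmax weight per feature channel (e.g., structural, sequence, network, and similarity embeddings) and returns their weighted sum; dimensions and channel semantics are illustrative, not DrugGPS internals.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)     # one relevance score per channel embedding

    def forward(self, channels: torch.Tensor) -> torch.Tensor:
        # channels: (B, n_channels, dim), one embedding per data source
        weights = torch.softmax(self.score(channels), dim=1)   # (B, n_channels, 1) attention weights
        return (weights * channels).sum(dim=1)                 # (B, dim) fused representation

fused = ChannelAttentionFusion(dim=256)(torch.randn(8, 4, 256))
```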
Noise in Magnetic Resonance Imaging (MRI) poses challenges for Deep Learning (DL) when tumor boundaries are obscured, when tumor location and appearance are complicated by overlap between tumor and non-tumor cells, and when modality identification is difficult because tumor features vanish in the later layers of the DL model. Effective feature extraction from MRI images is a possible solution to overcome this challenge. Therefore, we develop BrainFusionNet, which combines Convolutional Neural Networks (CNNs), Vision Transformers (ViT), and Gated Recurrent Units (GRUs) to extract spatial, contextual, and sequential features from MRI images for improved brain tumor classification. Furthermore, explainable AI techniques such as SHAP, LIME, and Grad-CAM are integrated to visualise and highlight the image regions that contribute to BrainFusionNet's decision-making process. The proposed BrainFusionNet model is evaluated on two publicly available MRI datasets. K-fold validation suggests 98% accuracy on both datasets. The model was compared with six state-of-the-art (SOTA) CNNs and transfer-learning baselines. Among the SOTA CNNs, DenseNet121 and VGG16 achieved the highest accuracy of 96%. The novelty of BrainFusionNet is that the hybrid model effectively extracts local and global features from MRI images, even in small-scale tumor regions and for small tumor sizes. The model has a balanced sequential CNN architecture to capture low-level and deeper-layer features, and a customized ViT that captures local features, stabilizes gradient flow, and reduces the risk of vanishing gradients during MRI image training. The CNN and ViT outputs are fed into a GRU for final classification. Furthermore, we analyze pixel intensities to determine whether MRI image quality affects image classification. Our findings regarding image interpretation are novel, as we found that the distribution of pixel intensities in MRI images affects DL performance.
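A compact sketch of the hybrid idea (CNN features flattened into tokens, transformer encoding for global context, a GRU for sequential aggregation, then a classifier head); the layer sizes are illustrative and not BrainFusionNet's published configuration.

```python
import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    def __init__(self, n_classes: int = 4, dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                                   # local spatial features
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)   # global context
        self.gru = nn.GRU(dim, dim, batch_first=True)                       # sequential aggregation
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                           # (B, dim, H', W')
        tokens = feats.flatten(2).transpose(1, 2)     # (B, H'*W', dim) token sequence
        tokens = self.transformer(tokens)
        _, h = self.gru(tokens)                       # final GRU hidden state
        return self.head(h.squeeze(0))                # (B, n_classes) logits

logits = HybridClassifier()(torch.randn(2, 1, 224, 224))
```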
Land use and land cover (LULC) classification is essential for environmental monitoring, urban planning, and resource management. This study explores the performance of three state-of-the-art deep learning architectures, MobileNetV3, ResNet34, and GoogleNet, which were enhanced with transfer learning, data augmentation, and adaptive learning rate scheduling. We evaluate these models on two benchmark datasets: EuroSAT, consisting of Sentinel-2 satellite imagery across 10 land cover classes, and PatternNet, a high-resolution aerial dataset with 38 diverse classes. The results demonstrate that MobileNetV3 achieved the highest overall accuracy (97.83% on EuroSAT and 99.23% on PatternNet) with minimal inference time, making it ideal for real-time applications. ResNet34 achieved 97.56% and 99.06% accuracy, respectively, excelling in classifying complex, visually similar classes due to its residual learning blocks. GoogleNet balanced performance and efficiency, achieving 97.36% and 99.58% accuracy on the two datasets. An ablation study confirmed that data augmentation, transfer learning, and learning rate scheduling contributed to improvements in accuracy of 5-13%. This research highlights the effectiveness of modern deep learning architectures and optimized training pipelines for LULC classification across diverse datasets, providing a foundation for future advancements in cross-domain remote sensing applications.
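A sketch of the training recipe described above (pre-trained backbone, augmentation, and learning-rate scheduling) using standard torchvision components; the backbone choice, augmentations, and hyperparameters shown are illustrative assumptions, with a cosine schedule standing in for the adaptive scheduling.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation applied to training images (passed to the dataset/DataLoader).
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
])

# Transfer learning: ImageNet-pretrained backbone with a new classification head.
model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.IMAGENET1K_V1)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 10)  # e.g., 10 EuroSAT classes

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

# Per epoch: run the training loop over the DataLoader, then step the schedule.
# for epoch in range(50):
#     train_one_epoch(model, loader, optimizer)   # user-supplied training loop
#     scheduler.step()
```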
Text-to-video retrieval refers to the task of finding the most relevant videos in a large-scale unlabeled video collection based on a given text query. In recent years, CLIP-based text-video retrieval methods have developed rapidly, with research primarily focusing on feature-enhancement techniques and interaction strategies. However, due to the concise nature of text and the rich modalities of video, computing similarity scores alone is insufficient for high-precision cross-modal retrieval. To address the inherent imbalance in cross-modal matching, we propose a novel text-video retrieval model, named V-Sparse, which includes visual semantic compression for feature enhancement and coarse-to-fine alignment for feature interaction. First, we propose a text-guided Visual Semantic Compression (VSC) module, consisting of temporal frame-level (TVSC) and spatial patch-level (SVSC) compression, aimed at reducing feature redundancy and providing precision support for coarse-to-fine interaction. Second, benefiting from visual semantic compression, we propose a novel Coarse-to-Fine granularity Interaction module (CFI), which aligns sentences with frames, sentences with patches, and words with patches from a unified joint feature encoding perspective. VSC and CFI jointly facilitate cross-modal text-video alignment from the perspectives of feature enhancement and feature interaction, greatly mitigating the inherent imbalance in modal pairing. We evaluate the performance of V-Sparse on six benchmark datasets and achieve state-of-the-art results in both long-video and short-text retrieval. Importantly, V-Sparse demonstrates the importance of feature compression in cross-modal interaction through extensive ablations and offers an effective intermediate pathway for modality interaction. Code will be available at https://github.com/OPA067/V-Sparse.
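The coarse, text-guided compression step can be illustrated by scoring frame embeddings against the sentence embedding and keeping the top-k frames; this illustrates the idea only, and V-Sparse's actual VSC/CFI modules are more elaborate.

```python
import torch
import torch.nn.functional as F

def select_salient_frames(frame_feats: torch.Tensor, text_feat: torch.Tensor, k: int = 4):
    # frame_feats: (B, T, D) per-frame embeddings; text_feat: (B, D) sentence embedding
    sims = F.cosine_similarity(frame_feats, text_feat.unsqueeze(1), dim=-1)   # (B, T) relevance scores
    idx = sims.topk(k, dim=1).indices                                         # top-k frame indices
    batch = torch.arange(frame_feats.size(0)).unsqueeze(1)
    return frame_feats[batch, idx], sims                                      # (B, k, D) compressed frames
```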
This study aimed to clarify the effects of the following cooling garments on performance during and after vigorous, heart-rate-clamped exercise under hot and humid conditions: base layers made of cross-shaped fibers (C), sugar alcohol-printed base layers (S), and a combination of S with a fan-attached jacket (S+F). Fifteen healthy male participants wore the cooling garments and rested for 20 min in a room set to ~30°C and ~60% relative humidity. The participants then completed a 20-min cycle ergometer exercise with heart rate clamped at 65% of heart rate reserve and rated their perceived exertion (RPE). Before and after exercise, we assessed thermal, comfort, and wetness sensations and measured body temperature, vertical jump height, ground reaction force while rising from a chair, visual reaction time, and Stroop interference. Cooler sensations were consistently reported in the order of S+F, S, and C. Despite the lowest RPE, pedaling load was highest in S+F. Sweat loss was comparable among the conditions, while garment sweat absorption and post-exercise skin temperature were lowest in S+F. These results suggest that S+F improves endurance performance under hot and humid conditions through efficient evaporative heat loss mainly facilitated by increased airflow from the fans.
Accurate detection and segmentation of moving objects constitute a fundamental challenge in computer vision, particularly for intelligent video surveillance systems operating under variable illumination, dynamic backgrounds, and environmental noise. This paper presents a fully unsupervised dual-phase motion analysis framework that effectively combines statistical independence modeling and geometric contour evolution to achieve high-precision motion detection and segmentation. In the first phase, an enhanced Fast Independent Component Analysis (Fast-ICA) algorithm is employed to perform statistical decomposition of video sequences, exploiting temporal independence to distinguish moving foregrounds from static backgrounds. This process generates an initial motion mask with strong robustness to illumination variation and noise artifacts. In the second phase, a hybrid level set segmentation model integrating the global Chan-Vese formulation and a locally adaptive Yezzi-based energy function refines object boundaries through an adaptive energy minimization process. A stabilization term and a self-regulating convergence criterion are further incorporated to ensure contour smoothness, numerical stability, and resilience to topological changes. Comprehensive experiments conducted on the CDNet-2014 benchmark dataset demonstrate that the proposed method achieves an average recall of 0.9613, precision of 0.9089, and F-measure of 0.9310, outperforming several state-of-the-art supervised, semi-supervised and unsupervised background subtraction algorithms. The proposed Fast-ICA-Level Set fusion framework thus provides a robust, adaptive, and computationally efficient solution for real-world intelligent surveillance and autonomous visual monitoring applications.
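A rough sketch of the first phase: treating each frame as an observation, running FastICA over the sequence, and thresholding the spatial map of the component that varies most over time as an initial motion mask; the component-selection rule and threshold are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np
from sklearn.decomposition import FastICA

def motion_mask(frames: np.ndarray, n_components: int = 4, z_thresh: float = 2.0) -> np.ndarray:
    """frames: (T, H, W) grayscale sequence -> boolean (H, W) initial motion mask."""
    T, H, W = frames.shape
    X = frames.reshape(T, H * W).astype(np.float64)        # one observation per frame
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(X)                         # (T, n_components) temporal activations
    spatial_maps = ica.components_                         # (n_components, H*W) spatial patterns
    k = int(np.abs(np.diff(sources, axis=0)).mean(axis=0).argmax())  # heuristic: most frame-to-frame change
    comp = spatial_maps[k]
    z = (comp - comp.mean()) / (comp.std() + 1e-8)         # z-score the spatial map
    return (np.abs(z) > z_thresh).reshape(H, W)            # threshold -> initial foreground mask
```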
Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios. Code and data will be released. Project page: https://sixiaozheng.github.io/VidCRAFT3/.
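The lighting branch can be illustrated by projecting a lighting-direction vector to a conditioning token and injecting it into the video feature sequence via cross-attention; the embedding scheme, sizes, and module name below are assumptions rather than VidCRAFT3's implementation.

```python
import torch
import torch.nn as nn

class LightingCrossAttention(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.light_proj = nn.Linear(3, dim)            # unit lighting-direction vector -> conditioning token
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens: torch.Tensor, light_dir: torch.Tensor):
        # video_tokens: (B, N, dim); light_dir: (B, 3) per-clip lighting direction
        light_tok = self.light_proj(light_dir).unsqueeze(1)           # (B, 1, dim)
        out, _ = self.attn(query=video_tokens, key=light_tok, value=light_tok)
        return self.norm(video_tokens + out)                          # lighting-conditioned tokens
```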
T cell receptor (TCR) and peptide interactions (TPI) are among the most important components of T cell immunity. Experimental identification of TPI is time-consuming and labor-intensive; therefore, it is necessary to develop computational prediction methods that exploit existing data to predict TPI. We use large collections of TCR and peptide sequences to pre-train two language models (∼152M parameters), respectively, and integrate them into a sequence-only prediction framework (i.e., RoBERTcr) with supervised fine-tuning (SFT). Visualization of amino acid embeddings from the pre-trained language models (PLMs) shows biochemical clusters based on different properties, and our PLMs outperform existing protein language models (i.e., ESM and ProtTrans) under the same conditions. RoBERTcr achieved higher performance than other state-of-the-art structure- or sequence-based methods without dataset bias. Visualization of attention from our framework reveals valuable spatial information, indicating that TCR residues contacting the peptide are key to the interaction. RoBERTcr is freely available at https://fca_icdb.mpu.edu.mo/robertcr/ and https://zenodo.org/records/19042627. Supplementary data are available at Bioinformatics online.
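A schematic of the two-tower setup: separate pre-trained sequence encoders for the TCR and the peptide, mean-pooled embeddings concatenated and passed to a small head that outputs an interaction logit; the encoder internals and dimensions are left abstract, so this is an illustration rather than the released RoBERTcr code.

```python
import torch
import torch.nn as nn

class TPIClassifier(nn.Module):
    def __init__(self, tcr_encoder: nn.Module, pep_encoder: nn.Module, dim: int = 768):
        super().__init__()
        self.tcr_encoder = tcr_encoder      # pre-trained TCR language model (assumed to return (B, L, dim))
        self.pep_encoder = pep_encoder      # pre-trained peptide language model (same assumption)
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, tcr_tokens, pep_tokens):
        h_tcr = self.tcr_encoder(tcr_tokens).mean(dim=1)     # (B, dim) mean-pooled TCR embedding
        h_pep = self.pep_encoder(pep_tokens).mean(dim=1)     # (B, dim) mean-pooled peptide embedding
        return self.head(torch.cat([h_tcr, h_pep], dim=-1))  # (B, 1) interaction logit
```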
Transorbital neuroendoscopic surgery (TONES) has adopted an increasingly prominent role as a minimally invasive technique for the management of orbitocranial pathology. It can be applied to a range of conditions, from intraorbital lesions to complex tumors that cross the boundaries between the orbit and cranial fossa. The primary approaches include the superior lid crease incision and lateral retrocanthal incisions, with other alternatives including the precaruncular and inferior transconjunctival approaches. This review will examine the role of the oculoplastic surgeon in multidisciplinary TONES procedures. Cadaveric and clinical studies have defined anatomical corridors and demonstrated the technical feasibility and functional outcomes of providing access to the anterior and middle cranial fossae. Adjunctive maneuvers, such as a lateral orbitotomy or creation of extra-orbital portals, can be performed to improve surgical freedom and access. The choice of technique can be tailored to the type and location of pathology and surgical objectives regarding diagnosis and resection.