Accurate vehicle analysis from aerial imagery has become increasingly vital for emerging technologies and public-service applications such as intelligent traffic management, urban planning, autonomous navigation, and military surveillance. However, analyzing UAV-captured video poses several inherent challenges, such as the small size of target vehicles, occlusion, cluttered urban backgrounds, motion blur, and fluctuating lighting conditions, which hinder the accuracy and consistency of conventional perception systems. To address these complexities, our research proposes a fully end-to-end deep-learning perception pipeline specifically optimized for UAV-based traffic monitoring. The proposed framework integrates multiple advanced modules: RetinexNet for preprocessing, HRNet for segmentation that preserves high-resolution semantic information, and YOLOv11 for vehicle detection. Deep SORT is employed for efficient vehicle tracking, while CSRNet facilitates high-density vehicle counting. LSTM networks predict vehicle trajectories from temporal patterns, and a combination of DenseNet and SuperPoint is utilized for robust feature extraction. Finally, classification is performed using Vision Transformers (ViTs), leveraging attention mechanisms to ensure accurate recognition across diverse categories. The modular yet unified architecture is designed to handle spatiotemporal dynamics, making it suitable for real-time deployment on diverse UAV platforms. Each module is a state-of-the-art network chosen for a distinct subtask of aerial vehicle analysis. RetinexNet normalizes the illumination of each input frame during preprocessing. HRNet's semantic segmentation accurately separates vehicles from their surroundings. YOLOv11 provides fast, high-precision vehicle detection, and Deep SORT maintains reliable identities for individual vehicles across frames. CSRNet performs vehicle counting that remains robust to occlusion and congestion. LSTM models capture temporal motion patterns to forecast future vehicle positions. Feature extraction combines DenseNet and SuperPoint embeddings refined with an AutoEncoder. Finally, Vision Transformer-based models use attention to classify vehicles seen from above. Every component is developed and integrated to deliver improved performance in real-world UAV deployments. Our proposed framework significantly improves the accuracy, reliability, and efficiency of vehicle analysis from UAV imagery. The pipeline was rigorously evaluated on two widely used datasets, AU-AIR and Roundabout. On the AU-AIR dataset, the system achieved a detection accuracy of 97.8%, a tracking accuracy of 96.5%, and a classification accuracy of 98.4%. Similarly, on the Roundabout dataset, it reached 96.9% detection accuracy, 94.4% tracking accuracy, and 97.7% classification accuracy. These results surpass previous benchmarks, demonstrating the system's robust performance across diverse aerial traffic scenarios. The integration of advanced models (YOLOv11 for detection, HRNet for segmentation, Deep SORT for tracking, CSRNet for counting, LSTM for trajectory prediction, and Vision Transformers for classification) enables the framework to maintain high accuracy even under challenging conditions such as occlusion, variable lighting, and scale variations.
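As an illustration of the modular design, the sketch below wires per-frame stages in the order described. All stage callables (enhance, segment, detect, track, count, predict, classify) are hypothetical placeholders standing in for RetinexNet, HRNet, YOLOv11, Deep SORT, CSRNet, the LSTM predictor, and the ViT classifier; this is a minimal sketch, not the authors' implementation.

```python
from typing import Any, Callable

# Hypothetical stage type; each stage would wrap a trained model in practice.
Stage = Callable[..., Any]

def process_frame(frame: Any, enhance: Stage, segment: Stage, detect: Stage,
                  track: Stage, count: Stage, predict: Stage, classify: Stage):
    """Run one UAV video frame through the described perception pipeline."""
    frame = enhance(frame)                # RetinexNet: illumination normalization
    masks = segment(frame)                # HRNet: vehicle/background segmentation
    detections = detect(frame, masks)     # YOLOv11: vehicle detection
    tracks = track(detections)            # Deep SORT: identity-preserving tracking
    density = count(frame)                # CSRNet: density-based vehicle counting
    futures = predict(tracks)             # LSTM: short-horizon trajectory forecast
    labels = [classify(frame, d) for d in detections]  # ViT: per-vehicle class
    return tracks, density, futures, labels
```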
These outcomes show that the chosen deep learning system is powerful enough to handle the challenges of aerial vehicle analysis and delivers reliable, precise results across all of the aforementioned tasks. Combining several advanced models keeps the system effective even under problems such as occluded vehicles and large variations in scale.
The biological sensorimotor system is a source of inspiration for the design of neuromorphic ballistic control systems. A large portion of sensorimotor-inspired research focuses on the sensory encoding and information processing stages of the system. However, research on broader task-performance systems, involving actuator control on the output side, remains scarce. In this work, we develop and train a neuromuscular-inspired model to perform ballistic control. In the model, a spiking neural network's output spikes are used to generate twitch-like signals. These twitches are the basis for generating a continuous fluctuating output signal that is used to operate an actuator. We refer to this model as the Twitch Neural Network (TwNN). As a test case, the model is trained to control the paddle of an adapted version of the game of Pong. An adapted version of the Direct Feedback Alignment learning rule, specific to integrate-and-fire neurons, is introduced. The new rule avoids the update-locking problem of backpropagation, allowing network weight updates in parallel. The model output consists of one group of agonist-innervating motor neurons and one group of antagonist-innervating motor neurons. We find that it is possible to teach a neuromuscular-inspired system to control the paddle in the game of Pong with the adapted Direct Feedback Alignment learning rule. The best-performing baseline model achieved a hit rate of 96%. By applying logarithmic scaling to the output activity, a hit rate of 98% could be achieved. Finally, by replacing the neuromorphically unrealistic exact summation steps with leaky integrators in training, the range of well-performing learning parameters became narrower and more clearly delineated; the best-performing model reaches a hit rate of 99%. Threshold analysis during training has shown that learning is robust to a variety of neuron thresholds. Noise analysis has shown that the system is robust to membrane-potential noise during inference for uniform noise up to roughly 0.1-1% of the neuron threshold value per time step.
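A minimal numpy sketch of the learning rule's core idea, under stated assumptions: non-leaky integrate-and-fire units, rate-coded spike counts, and a count-based output error broadcast through a fixed random matrix (Direct Feedback Alignment), so both layers can update in parallel without backpropagated gradients. The paper's twitch generation and exact rule differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 20, 100, 2, 50
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B = rng.normal(0, 0.1, (n_hid, n_out))    # fixed random feedback matrix (DFA)
theta, lr = 1.0, 1e-3

def step_layer(W, spikes_in, v):
    """One step of non-leaky integrate-and-fire neurons with reset."""
    v += W @ spikes_in
    out = (v >= theta).astype(float)
    v[out > 0] = 0.0
    return out, v

x_rate = rng.uniform(0, 0.2, n_in)        # input spike probabilities
target = np.array([5.0, 15.0])            # desired output spike counts

v1, v2 = np.zeros(n_hid), np.zeros(n_out)
x_count, h_count, o_count = np.zeros(n_in), np.zeros(n_hid), np.zeros(n_out)
for t in range(T):
    x = (rng.uniform(size=n_in) < x_rate).astype(float)
    h, v1 = step_layer(W1, x, v1)
    o, v2 = step_layer(W2, h, v2)
    x_count += x
    h_count += h
    o_count += o

err = o_count - target
# DFA: both layers receive the broadcast error at once, so updates are not
# locked behind a sequential backward pass.
W2 -= lr * np.outer(err, h_count)
W1 -= lr * np.outer(B @ err, x_count)
```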
To enhance the obstacle avoidance performance and autonomous decision-making capabilities of robots in complex dynamic environments, this paper proposes an end-to-end intelligent obstacle avoidance method that integrates deep reinforcement learning, spatiotemporal attention mechanisms, and a Transformer-based architecture. Current mainstream robot obstacle avoidance methods often rely on system architectures with separated perception and decision-making modules, which suffer from issues such as fragmented feature transmission, insufficient environmental modeling, and weak policy generalization. To address these problems, this paper adopts Deep Q-Network (DQN) as the core of reinforcement learning, guiding the robot to autonomously learn optimal obstacle avoidance strategies through interaction with the environment, effectively handling continuous decision-making problems in dynamic and uncertain scenarios. To overcome the limitations of traditional perception mechanisms in modeling the temporal evolution of obstacles, a spatiotemporal attention mechanism is introduced, jointly modeling spatial positional relationships and historical motion trajectories to enhance the model's perception of critical obstacle areas and potential collision risks. Furthermore, an end-to-end Transformer-based perception-decision architecture is designed, utilizing multi-head self-attention to perform high-dimensional feature modeling on multi-modal input information (such as LiDAR and depth images), and generating action policies through a decoding module. This completely eliminates the need for manual feature engineering and intermediate state modeling, constructing an integrated learning process of perception and decision-making. Experiments conducted in several typical obstacle avoidance simulation environments demonstrate that the proposed method outperforms existing mainstream deep reinforcement learning approaches in terms of obstacle avoidance success rate, path optimization, and policy convergence speed. It exhibits good stability and generalization capabilities, showing broad application prospects for deployment in real-world complex environments.
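For reference, a minimal DQN temporal-difference update in PyTorch. The toy MLP stands in for the paper's Transformer-based perception-decision network; all sizes and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 32, 5, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    """One TD update on a batch of transitions (a: LongTensor of action ids)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                 # frozen target network for stability
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```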
Telepresence robots (TPRs) must co-navigate with humans in constrained hospital environments, where safety depends on anticipating rather than merely reacting to human motion. Existing approaches rarely integrate short-horizon human-motion forecasting with safety-constrained control, which reduces robustness in dense corridors and ward bays. This study addresses this gap by evaluating an anticipatory, safety-aware co-navigation framework for TPRs. We developed a modular framework that couples a lightweight transformer-based forecaster, which predicts multi-agent trajectories under occlusion, with a safe reinforcement learning (RL) controller. The forecaster produces short-term distributions over pedestrian states that are embedded into the RL policy state and cost as risk-aware occupancy features. Safety is enforced via constrained policy optimization augmented by a run-time control barrier function (CBF) shield that filters unsafe actions. We benchmarked the approach against a social-force or dynamic window approach (DWA) planner, an attention-based crowd-RL policy, and model predictive control (MPC) with CBF. Experiments were conducted across two hospital-like benchmarks (a crowded corridor and a four-bed ward), totaling 2,400 episodes. Outcomes included task success, collision count, minimum human-robot clearance, near-miss events (≤ 0.3 m), time-to-goal, CBF violations, and ablations removing forecasting and the CBF shield. Relative to the best-performing baseline, the proposed method improved task success by 21.6% and reduced collisions by 47.3%. Median minimum human-robot clearance increased by 0.19 m, and near-miss events decreased by 38.5%. Time-to-goal was maintained within +2.7% of MPC+CBF while incurring zero CBF violations under the shield. Ablation studies showed that removing forecasting degraded success by 14.2%, whereas removing the CBF shield increased constraint breaches from 0% to 6.1% of steps. Anticipatory perception combined with safe RL yields substantially safer and more reliable telepresence co-navigation in human-dense clinical layouts without sacrificing efficiency. The framework is modular, enabling alternative forecasters and safety shields. Limitations include sensitivity to forecast drift during abrupt changes in crowd flow. Future work will explore on-device adaptation, shared-autonomy overlays to incorporate operator intent, and prospective evaluations in live hospital workflows.
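A toy one-dimensional illustration of the run-time CBF shield: the RL action is clipped to the nearest action for which the barrier h = gap - d_min satisfies h_dot >= -alpha*h. The dynamics, gains, and numbers are assumptions for illustration, not the paper's controller.

```python
d_min, alpha = 0.5, 2.0   # clearance floor (m) and CBF decay rate (assumed)

def cbf_shield(u_rl: float, gap: float, ped_vel: float) -> float:
    """u_rl: robot speed toward the pedestrian; gap: current distance (m);
    ped_vel: pedestrian speed away from the robot (m/s)."""
    h = gap - d_min
    # With h_dot = ped_vel - u, safety (h_dot >= -alpha*h) gives an upper bound:
    u_max = ped_vel + alpha * h
    return min(u_rl, u_max)

# RL proposes 1.2 m/s toward a pedestrian 0.8 m away receding at 0.2 m/s;
# the shield caps the command at 0.2 + 2.0 * 0.3 = 0.8 m/s.
print(cbf_shield(1.2, gap=0.8, ped_vel=0.2))
```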
During turning maneuvers in the galloping gait of quadruped animals, a strong relationship exists between the turning direction and the sequence in which the forelimbs make ground contact: the outer forelimb acts as the "trailing limb" while the inner forelimb serves as the "leading limb." However, the control mechanisms underlying this behavior remain largely unclear. Understanding these mechanisms could deepen biological knowledge and assist in developing more agile robots. To address this issue, we hypothesized that a decentralized interlimb coordination mechanism and trunk movement are essential for the emergence of an inside leading limb in a galloping turn. To test the hypothesis, we developed a quasi-quadruped robot with simplified wheeled hind limbs and variable trunk roll and yaw angles. For forelimb coordination, we implemented a simple decentralized control based on local load-dependent sensory feedback, utilizing trunk roll inclination and yaw bending as turning methods. Our experimental results confirmed that, in addition to the decentralized control from previous studies, which reproduces straight-line animal locomotion, adjusting the trunk roll angle spontaneously generates a ground contact sequence similar to gallop turning in quadruped animals. Furthermore, roll inclination showed a greater influence than yaw bending on differentiating the leading and trailing limbs. This study suggests that physical interactions serve as a universal mechanism of locomotor control in both forward and turning movements of quadrupedal animals.
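One plausible form of such local load-dependent feedback is a Tegotae-style phase rule, sketched below; this is our assumption for illustration, not necessarily the paper's exact controller. Each forelimb oscillator slows while that limb bears load, so the leading/trailing contact sequence can emerge from body dynamics rather than a prescribed gait.

```python
import numpy as np

omega, sigma, dt = 2 * np.pi * 2.0, 1.0, 0.001  # intrinsic rate, gain, step

def step_phase(phi: float, load: float) -> float:
    """phi: limb oscillator phase (rad); load: measured ground-reaction force.
    The load term delays the phase during stance (Tegotae-style feedback)."""
    dphi = omega - sigma * load * np.cos(phi)
    return (phi + dphi * dt) % (2 * np.pi)
```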
Gait robots have the potential to analyze gait characteristics during gait training using mounted sensors, in addition to providing robotic assistance of the individual's movements. However, no systems have been proposed to analyze gait performance during robot-assisted gait training. Our newly developed gait robot, the Welwalk WW-2000 (WW-2000), is equipped with a gait analysis system to analyze abnormal gait patterns during robot-assisted gait training. We previously investigated the validity of the index values for nine abnormal gait patterns. Here, we proposed new index values for four abnormal gait patterns, namely anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty, and investigated the criterion validity of the WW-2000 gait analysis system in healthy adults for these new index values. Twelve healthy participants simulated the four abnormal gait patterns manifested in individuals with hemiparetic stroke while wearing the robot. Each participant was instructed to perform 16 gait trials, with four grades of severity for each of the four abnormal gait patterns. Twenty strides were recorded for each gait trial using the gait analysis system in the WW-2000 and video cameras. Abnormal gait patterns were assessed using two parameters: the index values calculated for each stride by the WW-2000 gait analysis system, and the assessor's severity scores for each stride. The correlation between the index values and the severity scores was evaluated using the Spearman rank correlation coefficient for each gait pattern in each participant. The median (minimum-maximum) Spearman rank correlation coefficients across the 12 participants between the index values calculated by the WW-2000 gait analysis system and the assessor's severity scores for anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty were 0.892 (0.749-0.969), 0.859 (0.439-0.923), 0.920 (0.738-0.969), and 0.681 (0.391-0.889), respectively. The WW-2000 gait analysis system captured the four new abnormal gait patterns observed in individuals with hemiparetic stroke with high validity, in addition to the nine previously validated abnormal gait patterns. Assessing abnormal gait patterns is important, as improving them contributes to stroke rehabilitation. Clinical trial registration: https://jrct.niph.go.jp, identifier jRCT 042190109.
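The per-participant validity analysis can be reproduced with a few lines of scipy; the sketch below uses random placeholder data, not study data, purely to show the computation.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_strides = 20                      # strides recorded per gait trial
index_values = rng.normal(size=n_strides)            # robot's per-stride index
severity = np.clip(np.round(index_values + rng.normal(0, 0.5, n_strides)),
                   0, 3)                             # assessor's 4-grade score

rho, p = spearmanr(index_values, severity)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```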
In real-world sports scenarios, Human Action Recognition (HAR) is often hindered by data complexity, limited dynamic adaptability, and fragmented integration of physiological and kinematic information. To address these challenges, this study proposes a multimodal HAR framework for personalized sports health promotion by integrating wearable sensor streams with deep learning architectures. The proposed system employs a robust sensing layer to capture 12-dimensional multimodal data and synchronize physiological indicators with behavioral signals in real time. A novel Transformer-GCN hybrid model was developed to extract complex spatiotemporal dependencies for accurate action recognition and dynamic state analysis. In addition, a reinforcement learning module was incorporated to generate adaptive exercise prescriptions based on user progress. The framework was deployed through a responsive interface for real-time intervention and evaluated in a 12-week randomized controlled trial. The results demonstrated that the proposed framework achieved effective multimodal fusion and reliable action recognition in sports scenarios. After the 12-week intervention, participants in the intervention group showed a 20.1% increase in cardiorespiratory fitness (VO2max), a 99.3% improvement in muscular endurance, and a sports injury rate maintained below 15%. These findings indicate that the framework can support accurate motion analysis and safe, personalized intervention. The proposed multimodal fusion architecture effectively bridges the gap between action recognition and personalized sports health intervention. By combining wearable sensing, hybrid deep learning, and reinforcement learning, the framework provides a practical solution for AI-driven motion analysis and adaptive health promotion in land sports scenarios.
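A minimal sketch of one way to combine a GCN spatial layer with a Transformer temporal encoder, our illustrative reading of the hybrid design; the layer sizes, graph, and wiring are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim: int, adj: torch.Tensor):
        super().__init__()
        self.adj = adj                      # (nodes, nodes) normalized adjacency
        self.lin = nn.Linear(dim, dim)

    def forward(self, x):                   # x: (batch, time, nodes, dim)
        return torch.relu(self.lin(torch.einsum("ij,btjd->btid", self.adj, x)))

class TransformerGCN(nn.Module):
    def __init__(self, dim: int = 12, nodes: int = 4, heads: int = 4):
        super().__init__()
        self.gcn = GCNLayer(dim, torch.eye(nodes))   # placeholder sensor graph
        self.temporal = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                   batch_first=True)

    def forward(self, x):                   # x: (batch, time, nodes, dim)
        x = self.gcn(x).mean(dim=2)         # spatial aggregation per frame
        return self.temporal(x)             # temporal self-attention
```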
In advanced robot systems, monitoring the health of key components such as bearings in the transmission system is crucial for achieving reliable autonomous operation. However, accurately diagnosing bearing faults under dynamic and noisy conditions remains challenging. To address this issue, this paper proposes a brain-inspired computational framework that integrates an Improved Spider Monkey Optimization algorithm with a Probabilistic Neural Network (ISMO-PNN) for neurally-grounded bearing fault diagnosis in robotic systems. The main contributions include: (1) extracting a 22-dimensional mixed feature set from vibration signals, (2) using an intelligent PCA strategy to reduce the features to three dimensions while retaining more than 80% of the discriminative information, and (3) using the ISMO algorithm to automatically optimize the key smoothing parameters of the PNN. On the CWRU bearing dataset, the ISMO-PNN model achieves a fault classification accuracy of 97.14% and a macro-average F1 score of 97.32%, outperforming the other models compared in the article. In addition, the minimum difference between training and testing accuracy is 0.72%, indicating strong generalization ability. This brain-inspired framework, synergizing a neurally-grounded probabilistic classifier with a bio-inspired swarm optimizer, forms a robust and efficient embedded health-monitoring model and can provide a feasible solution for the development of advanced robot systems.
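The PNN core is a Parzen-window classifier whose single smoothing parameter sigma is what ISMO would tune; a minimal sketch (the value below is illustrative):

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma: float = 0.5):
    """Classify each test point by the class with the largest average
    Gaussian-kernel density around it; sigma is the smoothing parameter."""
    preds = []
    for x in X_test:
        scores = {}
        for c in np.unique(y_train):
            Xc = X_train[y_train == c]
            d2 = np.sum((Xc - x) ** 2, axis=1)
            scores[c] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))
        preds.append(max(scores, key=scores.get))
    return np.array(preds)
```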
Urban traffic congestion, environmental degradation, and road safety challenges necessitate intelligent aerial robotic systems capable of real-time adaptive decision-making. Unmanned Aerial Vehicles (UAVs), with their flexible deployment and high vantage point, offer a promising solution for large-scale traffic surveillance in complex urban environments. This study introduces a UAV-based neural framework that addresses challenges such as asymmetric vehicle motion, scale variations, and spatial inconsistencies in aerial imagery. The proposed system integrates a multi-stage pipeline encompassing contrast enhancement and region-based clustering to optimize segmentation while maintaining computational efficiency for resource-constrained UAV platforms. Vehicle detection is carried out using a Recurrent Neural Network (RNN), optimized via a hybrid loss function combining cross-entropy and mean squared error to improve localization and confidence estimation. Upon detection, the system branches into two neural submodules: (i) a classification stream utilizing SURF and BRISK descriptors integrated with a Swin Transformer backbone for precise vehicle categorization, and (ii) a multi-object tracking stream employing DeepSORT, which fuses motion and appearance features within an affinity matrix for robust trajectory association. Comprehensive evaluation on three benchmark UAV datasets (AU-AIR, UAVDT, and VAID) shows consistent and high performance. The model achieved detection precisions of 0.913, 0.930, and 0.920; tracking precisions of 0.901, 0.881, and 0.890; and classification accuracies of 92.14%, 92.75%, and 91.25%, respectively. These findings highlight the adaptability, robustness, and real-time viability of the proposed architecture in aerial traffic surveillance applications. By effectively integrating detection, classification, and tracking within a unified neural framework, the system contributes significant advancements to intelligent UAV-based traffic monitoring and supports future developments in smart city mobility and decision-making systems.
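A minimal sketch of the described hybrid detection loss, cross-entropy for class confidence plus mean squared error for localization; the weighting is an assumption.

```python
import torch
import torch.nn as nn

ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

def hybrid_loss(class_logits, class_targets, box_preds, box_targets, w=1.0):
    """Cross-entropy on class logits + w-weighted MSE on box coordinates."""
    return ce(class_logits, class_targets) + w * mse(box_preds, box_targets)
```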
This work addresses key challenges in deep unsupervised domain adaptation by proposing a subdomain adaptation framework driven by transferable semantic alignment and class correlation. First, source and target domains are divided into subdomains according to class labels, and a joint subdomain distribution alignment mechanism is introduced to reduce intra-class distribution divergence while enlarging inter-class disparities. Second, a domain-adaptive semantic consistency loss is employed to cluster semantically similar samples and separate dissimilar ones in a unified representation space, enabling precise cross-domain semantic alignment. Third, pseudo-label quality in the target domain is improved via temperature-based label smoothing, complemented by a class correlation matrix and a loss function capturing inter-class relationships to exploit intrinsic intra-class coherence and inter-class distinction. Extensive experiments on multiple public datasets demonstrate that the proposed method achieves superior average classification accuracy compared to existing approaches, validating the effectiveness of semantic alignment and class correlation modeling. By explicitly modeling intra-class coherence and inter-class distinction without additional architectural complexity, the framework effectively mitigates domain shift, enhances semantic alignment, and improves recognition performance on the target domain, offering a robust solution for deep unsupervised domain adaptation.
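A minimal sketch of temperature-based smoothing of target-domain pseudo-labels; the temperature value is illustrative and the paper's exact formulation may differ.

```python
import torch

def soft_pseudo_labels(logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """Higher T flattens the class distribution, damping the overconfident
    predictions that make hard pseudo-labels noisy."""
    return torch.softmax(logits / T, dim=1)
```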
Balancing exploration and exploitation remains a fundamental challenge in reliable mobile robot control, as conventional policies often converge on suboptimal behaviors. Inspired by the brain's division of labor for adaptive control, we propose SpikeAEC, a fully spiking, neuromodulated Actor-Explorer-Critic architecture designed to address this dilemma online within a closed-loop system. SpikeAEC comprises three specialized subnetworks operating in parallel: the Actor, inspired by the basal ganglia, proposes exploitative actions; the Explorer, modeled after the ACC-GPe-STN pathway, generates adaptive exploratory actions gated by a vigilance signal modulated by the accumulated global temporal-difference (TD) error; and the Critic, based on the ventral striatum, computes the TD error. The final action is selected by a separate, TAN-based Arbitrator, which probabilistically chooses between the Actor's and Explorer's action proposals according to recent performance and the TD error. These subnetworks are coupled through a unified three-factor learning framework that uses the TD signal and phasic neuromodulators (acetylcholine and dopamine) from the Arbitrator to drive pathway-specific synaptic plasticity. This online plasticity enhances the quality of action proposals and accelerates policy refinement. In simulation, SpikeAEC outperforms leading brain-inspired methods by converging 24% faster, reducing trajectory length by 18%, and increasing cumulative reward by over 5% against the top-performing baseline, all while maintaining consistency with established neurophysiological principles.
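A functional outline of the Critic's TD error and the Arbitrator's probabilistic choice; the paper realizes both with spiking subnetworks and neuromodulatory signals, so the closed forms below are simplifying assumptions.

```python
import numpy as np

gamma = 0.9

def td_error(r: float, v_s: float, v_s_next: float) -> float:
    """Standard temporal-difference error computed by the Critic."""
    return r + gamma * v_s_next - v_s

def arbitrate(a_actor, a_explorer, recent_perf: float, delta: float,
              rng=np.random.default_rng(0)):
    """Choose the Explorer's proposal more often when recent performance is
    poor or the TD-error magnitude is large (vigilance-like gating)."""
    p_explore = 1.0 / (1.0 + np.exp(5.0 * recent_perf - abs(delta)))
    return a_explorer if rng.uniform() < p_explore else a_actor
```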
Inspired by Masked Language Modeling (MLM), Masked Image Modeling (MIM) employs an attention mechanism to perform masked training on images. However, processing a single image requires numerous iterations and substantial computational resources to reconstruct the masked regions, resulting in high computational complexity and significant time costs. To address this issue, we propose an Effective and Efficient self-supervised Masked model based on Mixed feature training (EESMM). First, we stack two images for encoding and input the fused features into the network, which not only reduces computational complexity but also enables the learning of more features. Second, during decoding, we recover the decoding features corresponding to each original image from the decoding features of the two original inputs and the mixed image, and then construct a corresponding loss function to enhance feature representation. EESMM significantly reduces pre-training time without sacrificing accuracy, achieving 83% accuracy on ImageNet in just 363 hours on four V100 GPUs, only one-tenth of the training time required by SimMIM. This validates that the method can substantially accelerate the pre-training process without noticeable performance degradation.
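A simplified sketch of the mixed-input idea: two images are combined into one masked input so a single forward pass carries supervision from both. The paper stacks images and separates their decoding features; the blend and per-pixel MSE below are our simplified stand-ins.

```python
import torch

def mixed_mim_loss(model, img_a, img_b, vis_mask, lam: float = 0.5):
    """vis_mask is 1 where pixels stay visible and 0 where they are masked;
    the loss scores reconstruction only on the hidden regions."""
    mixed = lam * img_a + (1 - lam) * img_b
    recon = model(mixed * vis_mask)
    return torch.mean(((recon - mixed) * (1 - vis_mask)) ** 2)
```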
Physiotherapy robots offer a feasible and promising solution for achieving safe and efficient treatment. Among their functions, acupoint recognition is the core component that ensures the precision of physiotherapy robots. Although research on acupoint recognition for regions such as the hand and ear has been extensive, accurately locating acupoints on the human back remains challenging due to the lack of salient external features. This paper designs a two-stage acupoint recognition method achieved through the cooperation of two detection networks. First, a lightweight RTMDet network extracts the effective back region from the image, and the acupoint coordinates are then inferred from the extracted region, reducing inference cost spent on irrelevant image content. In addition, the RTMPose network, based on the SimCC framework, converts acupoint coordinate regression into a classification problem over sub-pixel bins along the X and Y axes by performing sub-pixel-level partitioning of the image, significantly improving detection speed and accuracy. Meanwhile, the multi-layer feature fusion of CSPNeXt enhances feature extraction capabilities. We also designed a physiotherapy interaction interface and, from the three-dimensional coordinates of the acupoints, autonomously planned the physiotherapy task path of the robot. We conducted performance tests on the acupoint recognition system and physiotherapy task planning within the physiotherapy robot system. The experiments demonstrate the method's effectiveness, achieving a recall of 90.17% on human datasets with a detection error of around 5.78 mm. The system can also accurately identify different back postures and achieves an inference speed of 30 FPS on a 4070Ti GPU. Finally, we conducted continuous physiotherapy tasks on multiple acupoints for the user. The experimental results demonstrate the significant advantages and broad application potential of this method in improving the accuracy and reliability of autonomous acupoint recognition by physiotherapy robots.
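A minimal sketch of the SimCC idea used by RTMPose: the (x, y) regression is recast as two classifications over sub-pixel bins, decoded by argmax. Sizes and the split ratio are illustrative, not RTMPose's actual configuration.

```python
import torch
import torch.nn as nn

W, H, split = 256, 256, 2.0            # input size; 'split' gives sub-pixel bins
n_bins_x, n_bins_y = int(W * split), int(H * split)

feat_dim = 128
head_x = nn.Linear(feat_dim, n_bins_x)  # logits over x-axis bins
head_y = nn.Linear(feat_dim, n_bins_y)  # logits over y-axis bins

def decode(feat: torch.Tensor) -> tuple:
    """feat: (feat_dim,) per-acupoint feature -> (x, y) in pixel units."""
    x_bin = head_x(feat).argmax().item()
    y_bin = head_y(feat).argmax().item()
    return x_bin / split, y_bin / split
```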
Prosthetic knee joints are essential assistive technologies designed to replicate natural gait and improve mobility for individuals with lower-limb loss. This study presents a comprehensive nonlinear dynamic model of a two-degree-of-freedom prosthetic knee joint and introduces three robust nonlinear control strategies: Integral Sliding Mode Control, Conditional Super-Twisting Sliding Mode Control, and Conditional Adaptive Positive Semidefinite Barrier Function-based (CoBA) Sliding Mode Control. These controllers are designed to address the challenges associated with nonlinear joint dynamics, external disturbances, and modeling uncertainties during locomotion. To optimize control performance, the gain parameters of each controller were fine-tuned using Red Fox Optimization, a metaheuristic algorithm inspired by the intelligent hunting behavior of red foxes. Stability analysis is conducted using Lyapunov theory, and control effectiveness is evaluated through simulations in MATLAB/Simulink and validated via hardware-in-the-loop testing using a C2000 Delfino F28379D microcontroller. Among the three controllers, the CoBA-based approach demonstrated the highest tracking accuracy, fastest convergence, and smoothest torque profile. The close agreement between simulation and experimental results confirms the practical applicability of the proposed control framework, offering a promising solution for intelligent and adaptive prosthetic knee systems.
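For the first controller, a minimal integral sliding mode law for one joint, with a smoothed switching term to limit chattering; the gains below are placeholders for the values Red Fox Optimization would tune.

```python
import numpy as np

lam, k_i, K, eps = 5.0, 2.0, 10.0, 0.05   # surface, integral, switching gains
integ = 0.0

def ismc_torque(e: float, e_dot: float, dt: float) -> float:
    """e = q_des - q (rad); returns the control torque for one knee joint."""
    global integ
    integ += e * dt
    s = e_dot + lam * e + k_i * integ      # integral sliding surface
    return K * np.tanh(s / eps)            # smooth approximation of sign(s)
```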
Neurodegenerative diseases (NDs) are a significant threat to human health. Numerous studies have demonstrated that patients with NDs may present with decreased balance, which is responsible for an increased risk of falling. As an emerging technology, wearable devices can detect falls while avoiding privacy breaches. This review assesses the evolution of trends and technology in wearable devices for detecting falls among patients with NDs. We screened PubMed and Web of Science (February 2023) to summarize the pathway of fall detection with any body-worn sensor. Included articles were required to be full-text and published in English. Documents were excluded if they (1) only used wearable devices for fall cueing, (2) did not offer sufficient information for data extraction, (3) did not study patients with NDs, or (4) only used non-wearable sensors or devices. The review identified 89 articles at the end of the data-extraction procedure. Wide variety existed in participant sample size (1-131), sensor types, placement, and algorithms. 97.75% of papers (n = 87) used patients with Parkinson's disease as experimental subjects. 21.45% of studies attached devices to the ankle (n = 19), with a clear preference for using multiple types of sensors (58.43% of studies, n = 52). The most commonly used inertial measurement unit (IMU) configuration combined accelerometers and gyroscopes, utilized in 21 articles to assess falls. 39.33% of studies (n = 35) chose existing datasets to verify the effectiveness of their algorithms. Machine learning algorithms have become prevalent since 2019, and the most commonly used algorithm was the support vector machine (SVM) (n = 17). These results show that an increasing number of researchers examine the validation performance of their systems in non-real-time settings. The ankle was the preferred sensor location among researchers, and there is a clear preference for using multiple types of sensors and machine learning algorithms to improve accuracy and immediacy. Future work should focus on other NDs rather than being limited to Parkinson's disease and should consider an adequately sized study population. A consensus on walking tasks and accuracy measurements is urgently needed. Performing studies in a simulated free-living environment for a specified time frame is advisable, with continuous real-time monitoring and assessment. PROSPERO, identifier CRD42023405952.
Three-way decision with neighborhood rough sets (3WDNRS) is effective in handling uncertain problems involving continuous data through adjustment of the neighborhood radius. However, it faces two main limitations. First, 3WDNRS relies on individual neighborhood granules as inputs, which can impair both decision efficiency and model generalizability. Second, the thresholds used in 3WDNRS often require predefinition based on prior knowledge, making the method difficult to apply in situations where such knowledge is lacking. To address these problems, this study introduces interval granulation (IG) into three-way decision to construct an effective three-way classifier. First, an interval granulation method based on DBSCAN is proposed. Then, an interval granulation neighborhood rough sets (IGNRS) model is presented, combining IG with quality indicators. Based on the IGNRS model, a three-way classifier called 3WD-IGNRS is proposed by considering the principle of minimum fuzzy loss. Finally, extensive comparative experiments are conducted against three state-of-the-art granular-ball (GB)-based classifiers and four classical machine learning classifiers on 12 public benchmark datasets. The results demonstrate that our models consistently outperform the compared methods, achieving an average accuracy improvement of 4.94% over the best-performing granular-ball classifier.
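A minimal sketch of DBSCAN-based interval granulation: each class's samples are clustered, and every cluster is summarized by per-feature [min, max] intervals, so the classifier reasons over interval granules instead of single neighborhood granules. Parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def interval_granules(X: np.ndarray, eps: float = 0.3, min_samples: int = 5):
    """Return a list of (lower, upper) per-feature interval bounds, one pair
    per DBSCAN cluster; label -1 (noise) is discarded."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    granules = []
    for c in set(labels) - {-1}:
        Xc = X[labels == c]
        granules.append((Xc.min(axis=0), Xc.max(axis=0)))
    return granules
```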
Robotic racket sports provide exceptional benchmarks for evaluating dynamic motion control capabilities in robots. Due to the highly non-linear dynamics of the shuttlecock, the stringent demands on robots' dynamic responses, and the convergence difficulties caused by sparse rewards in reinforcement learning, badminton strikes remain a formidable challenge for robot systems. To address these issues, this study proposes DTG-IRRL, a novel learning framework for badminton strikes that integrates imitation-relaxation reinforcement learning with dynamic trajectory generation. The framework demonstrates significantly improved training efficiency and performance, achieving faster convergence and twice the landing accuracy. Analysis of the reward function within a specific parameter-space hyperplane intuitively reveals the convergence difficulties arising from the inherent sparsity of rewards in racket sports and demonstrates the framework's effectiveness in mitigating local and slow convergence. Implemented on hardware with zero-shot transfer, the framework achieves a 90% hitting rate and a 70% landing accuracy, enabling sustained human-robot rallies. Cross-platform validation on the UR5 robot demonstrates the framework's generalizability while highlighting the requirement for high dynamic performance of robotic arms in racket sports.
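One common way to realize imitation-relaxation is to anneal a dense imitation term so the sparse task reward (hitting, landing) dominates later training; the linear schedule below is our assumption for illustration.

```python
def shaped_reward(task_r: float, imitation_r: float, step: int,
                  anneal_steps: int = 100_000) -> float:
    """Early on, imitation guides learning; its weight relaxes to zero so the
    sparse strike/landing reward takes over."""
    w = max(0.0, 1.0 - step / anneal_steps)
    return task_r + w * imitation_r
```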
In deep-sea areas, the hoisting operation of offshore wind turbines is seriously affected by waves, and secondary impacts between the turbine and the pile foundation are prone to occur. To address this issue, this study proposes an integrated wave compensation system for offshore wind turbines based on a neuromorphic vision (NeuroVI) camera. The system employs a NeuroVI camera to achieve non-contact, high-precision, and low-latency displacement detection of hydraulic cylinders, overcoming the limitations of traditional magnetostrictive displacement sensors, which exhibit slow response and susceptibility to interference in harsh marine conditions. A dynamic simulation model was developed using AMESim-Simulink co-simulation to analyze the compensation performance of the NeuroVI-based system under step and sinusoidal wave disturbances. Comparative results demonstrate that the NeuroVI feedback system achieves faster response times and superior stability compared with conventional sensors. Laboratory-scale model tests and real-world application in the installation of a 5.2 MW offshore wind turbine validated the system's feasibility and robustness, enabling real-time collaborative control of turbine and cylinder displacement to effectively mitigate multi-impact risks. This research provides an innovative approach for deploying neural perception technology in complex marine scenarios and advances the development of neuro-robotic systems in ocean engineering.
In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Therefore, precise real-time intraoperative navigation, which can provide accurate 3D information, plays a crucial role in avoiding critical areas with dense blood vessels and nerves. Although significant progress has been made in endoscopic 3D reconstruction methods, their application in nasal scenarios still faces numerous challenges. On the one hand, there is a lack of high-quality, annotated nasal endoscopy datasets. On the other hand, issues such as motion blur and soft-tissue deformation complicate the nasal endoscopy reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos were collected, with the pose information recorded for each frame. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering with a diffusion module that reduces image blurring in endoscopic data. Meanwhile, by incorporating an adaptive low-pass filter into the rendering pipeline, the aliasing artifacts (jagged edges) that occur during rendering are mitigated. Extensive quantitative and visual experiments show that the proposed model is capable of reconstructing 3D scenes within the nasal cavity in real time, thereby offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.
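The anti-aliasing step can be illustrated by a screen-space low-pass filter that dilates each splat's projected 2-D covariance so no Gaussian shrinks below roughly a pixel; the 0.3 px^2 floor follows common 3D Gaussian Splatting practice and is an assumption about this paper's adaptive filter.

```python
import numpy as np

def low_pass_2d_cov(cov2d: np.ndarray, floor: float = 0.3) -> np.ndarray:
    """cov2d: (2, 2) projected splat covariance in pixel units; adding a
    diagonal floor bounds the minimum footprint and suppresses jagged edges."""
    return cov2d + floor * np.eye(2)
```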
Human teams intuitively and effectively collaborate to move large, heavy, or unwieldy objects. However, understanding of this interaction in the literature is limited. This is especially problematic given our goal of enabling human-robot teams to work together. Therefore, to better understand how human teams work together and eventually enable intuitive human-robot interaction, in this paper we examine four sub-components of collaborative manipulation (co-manipulation), using motion and haptics. We define co-manipulation as a group of two or more agents collaboratively moving an object. We present a study in which participants co-manipulate a large object as we vary the number of participants (two or three), the roles of the participants (leaders or followers), and the degrees of freedom necessary to complete the defined motion for the object. In analyzing the results, we focus on four key components related to motion and haptics. First, we define and examine a static or rest state and demonstrate a method of detecting transitions between the static state and an active state, in which one or more agents are moving toward an intended goal. Second, we analyze a variety of signals (e.g., force and acceleration) during movements in each of the six rigid-body degrees of freedom of the co-manipulated object. These data allow us to identify the signals that correlate best with the desired motion of the team. Third, we examine the completion percentage of each task, which can be used to determine which motion objectives can be communicated via haptic feedback. Finally, we define a metric to determine whether participants divide two-degree-of-freedom tasks into separate degrees of freedom or take the most direct path. These four components lay the necessary groundwork for advancing intuitive human-robot interaction.
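As an illustration of the first component, rest-to-active transitions can be detected by thresholding the interaction-force magnitude with hysteresis; the thresholds below are placeholders, not the study's fitted values.

```python
import numpy as np

def detect_active(force: np.ndarray, on_thresh: float = 5.0,
                  off_thresh: float = 2.0) -> np.ndarray:
    """force: (T, 3) interaction force in N; returns a boolean 'active' flag
    per sample, using hysteresis to avoid chattering at the boundary."""
    mag = np.linalg.norm(force, axis=1)
    active = np.zeros(len(force), dtype=bool)
    state = False
    for i, m in enumerate(mag):
        state = m > on_thresh if not state else m > off_thresh
        active[i] = state
    return active
```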