20 results found
In Luminous, two generations of a Korean family use neurorobotics to build sentient robot friends.
As automotive manufacturing advances toward the Industry 5.0 era, traditional rigid automation production models are transitioning toward the embodied intelligence paradigm. Confronted with mass customization, diverse products, and small-batch production, the automotive manufacturing environment is highly dynamic and unstructured. Unlike traditional industrial intelligence based on static, hard-coded logic, embodied robots enhance their cognitive abilities through closed-loop interaction with dynamic environments, inspired by bionic neural mechanisms. This shift enables robots to perform flexible and reliable operations in complex production scenarios. This paper analyzes the core role and key technologies of neural intelligence algorithms in reshaping the perception, decision, and execution of industrial robots, provides a systematic review of industrial robot evolution within the automotive industry, and identifies a reliable path for future development.
[This corrects the article DOI: 10.3389/fnbot.2026.1796043.].
Adaptive lower-limb neurorobotics requires gait-state representations that preserve locomotor structure without reducing post-stroke walking to a single asymmetry score or opaque latent embedding. Because post-stroke gait is multimodal and side dependent, transparent side-aware representations may better support future adaptive-assistance design than modality-isolated summaries. This secondary analysis used a public multimodal gait dataset comprising 138 able-bodied adults and 50 adults with stroke. The analytic space was restricted to 11 waveform domains shared across public exports: four sagittal kinematic waveforms and seven repository-normalized surface electromyography waveforms, each represented by 1,001 time-normalized points. Stroke waveforms were organized into paretic, non-paretic, bilateral-mean, and side-difference views, with side difference defined as paretic minus non-paretic. Domain-view functional principal component analysis retained 90% cumulative variance, capped at three components per block; family-level reduction retained 90% variance, capped at eight components. Candidate Ward hierarchical and K-means solutions from two to five states were screened in kinematics-only, sEMG-only, fused, paretic-only, and erector-spinae-excluded spaces. The retained fused side-aware solution organized the strict complete-case stroke cohort (n = 43) into three states: State 1 (n = 12), State 2 (n = 18), and State 3 (n = 13). The strongest fused two-state K-means comparator showed higher compactness and resampling stability than the retained three-state solution [silhouette 0.189; bootstrap adjusted Rand index (ARI) 0.876 versus silhouette 0.155; bootstrap ARI 0.633]. However, the three-state solution was retained as a representation-level choice because it avoided trivial micro-clusters, preserved explicit multimodal side-aware structure, and enabled clearer waveform-level interpretation.
Sensitivity analyses showed identical assignments after erector-spinae exclusion (ARI = 1.000), partial concordance under robust scaling (ARI = 0.785), and material reassignment when the block cap was reduced to two components (ARI = 0.335). The strongest domain contributors were ankle angle (1.000), vastus lateralis sEMG (0.898), knee angle (0.866), gastrocnemius sEMG (0.851), and tibialis anterior sEMG (0.840). Public waveform exports supported an internally interpretable, side-aware multimodal representation of post-stroke gait relevant to neurorobotic state-representation design. This contribution remains exploratory and representational, not clinical, interventional, real-time, or controller-validating; for future studies, it should be interpreted as a hypothesis-generating framework.
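The adjusted Rand index (ARI) values quoted in the abstract above compare two cluster assignments of the same participants. The following is a minimal standard-library sketch of the standard ARI formula, not the authors' analysis code; the function name `adjusted_rand_index` and the example labels are illustrative assumptions, with only the state sizes (12, 18, 13) taken from the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (label_a, label_b) pair.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels using the state sizes from the abstract (12, 18, 13).
states = [1] * 12 + [2] * 18 + [3] * 13
relabeled = [{1: "a", 2: "b", 3: "c"}[s] for s in states]
print(adjusted_rand_index(states, relabeled))  # identical partitions give 1.0
```

An ARI of 1.000 (as reported after erector-spinae exclusion) means the two partitions are identical up to relabeling, while values near 0 indicate agreement no better than chance.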
Resource-constrained environmental perception requires autonomous robots and embodied intelligent systems to process visual signals efficiently while preserving image fidelity in complex real-world environments. However, converting high dynamic range RAW sensor data into perceptually faithful RGB images remains computationally expensive, thereby limiting the deployment of neural image signal processors on edge platforms with restricted memory, energy, and computational budgets. Consequently, this study proposes the enhanced quantized image signal processor (EQISP), comprising the quantized convolutional neural network (QCNN) and the unified pyramid fusion algorithm (UPFA). QCNN employs dynamic fixed-point hybrid quantization, which adjusts parameter ranges according to the linear relationship between threshold standard deviation and fractional length, thereby significantly reducing the computational load. Meanwhile, UPFA utilizes Gaussian pyramids to capture global illumination and Laplacian pyramids to preserve fine details, enabling multi-scale, multi-exposure fusion and iterative reconstruction to mitigate detail loss induced by quantization. Comprehensive comparative experiments demonstrated that EQISP achieved a PSNR of 22.90 dB, an SSIM of 0.9278, and 164.843 GFLOPs. Compared with the PyNET baseline, EQISP improved the PSNR by 1.71 dB while reducing the computational cost by a factor of 4.24. Furthermore, deployment experiments on an NVIDIA Jetson TX2 development board showed that EQISP achieved a model size of 57 MB, an inference latency of 189 ms, an inference speed of 6.1 FPS, and a peak memory usage of 2.2 GB. These results provide practical evidence that EQISP can serve as an efficient and scalable visual front end for resource-constrained embodied perception systems.
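To make the idea of fixed-point quantization with a per-tensor fractional length concrete, here is a minimal sketch. It is not EQISP's QCNN: the abstract describes a rule based on the standard deviation of the parameters, whereas this sketch uses a simpler range-based heuristic. The function names, the 8-bit word length, and the sample weights are all assumptions for illustration.

```python
def quantize_fixed_point(values, word_length=8, frac_length=4):
    """Round values to signed fixed-point with `frac_length` fractional bits."""
    scale = 2 ** frac_length
    q_min = -(2 ** (word_length - 1))
    q_max = 2 ** (word_length - 1) - 1
    quantized = []
    for v in values:
        q = round(v * scale)            # nearest integer code (ties to even)
        q = max(q_min, min(q_max, q))   # saturate to the representable range
        quantized.append(q / scale)     # map the code back to a real value
    return quantized


def choose_frac_length(values, word_length=8):
    """Largest fractional length whose range still covers max |value|.

    Assumption: a range-based heuristic, used here in place of the
    standard-deviation-based rule mentioned in the abstract.
    """
    max_abs = max(abs(v) for v in values)
    q_max = 2 ** (word_length - 1) - 1
    for frac_length in range(word_length - 1, -1, -1):
        if max_abs <= q_max / 2 ** frac_length:
            return frac_length
    return 0


weights = [0.5, -1.2, 3.0]              # hypothetical layer weights
fl = choose_frac_length(weights)
print(fl, quantize_fixed_point(weights, frac_length=fl))
```

More fractional bits give finer resolution but a narrower range, which is why the fractional length is chosen per tensor rather than fixed globally.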
Chinese-English machine translation based on neural network models strictly adopts the sequential encoder-decoder modeling method. However, this traditional method cannot make effective use of syntactic information and linguistic hierarchy information. Therefore, to integrate syntactic structure information into Chinese-English machine translation and improve translation performance, this paper proposes a new Chinese-English machine translation method based on a graph convolutional network and BERT (Bidirectional Encoder Representations from Transformers) knowledge enhancement. The approach integrates multiple techniques to improve translation quality. The multi-BERT context is first compressed and aligned into the semantic space of the translation model using learnable compression vectors. This alignment ensures that the rich contextual information from BERT is effectively utilized within the translation framework. On the source-language side, we employ a dual encoder to encode both the source sentence and its syntactic dependency tree, thereby capturing both lexical and structural information. To further enrich the source-side semantic representation, the compression vector is concatenated with the input vector of the encoder. Additionally, we introduce a phrase discard mechanism that randomly discards target phrases during training. This mechanism enhances the model's robustness against mistranslated phrases, thereby reducing their impact on subsequent phrase translations. Experiments on the NIST dataset demonstrate the effectiveness of the proposed lightweight Chinese-English translation method. Unlike general-purpose large chatbot models (e.g., ChatGPT) with high computing costs, this model achieves 39.68 BLEU with few parameters, addressing low-resource scenarios and phrase mistranslation.
It offers a novel lightweight paradigm for private-oriented translation chatbots, outperforming the baseline Transformer (35.75 BLEU) significantly.
Neuromorphic vision systems process continuous event streams and offer transformative potential for real-time applications. However, their evaluation remains tethered to methodologies from RGB imaging. These approaches convert asynchronous event streams into synchronized frames and ignore perception latency, creating a critical gap between benchmarks and real-world performance. To address this, we introduce the STream-based lAtency-awaRe Evaluation (STARE) framework. STARE integrates two core components: Continuous Sampling, maximizing model throughput to reduce the impact of latency, and Latency-Aware Evaluation, quantifying latency-induced online accuracy. To rigorously validate STARE, we developed ESOT500, a high-dynamic object tracking dataset with 500 Hz annotations. Experiments reveal that latency severely degrades online accuracy by over 50%. We further introduce two model enhancement strategies: Asynchronous Tracking, a fast-slow architecture that boosts model throughput, and Context-Aware Sampling, which dynamically adapts input to handle low event density cases. Overall, our work bridges the latency gap between models' theoretical potential and real-world deployment.
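The core idea of latency-aware evaluation can be sketched as follows: at each annotation time, the tracker is scored on the most recent prediction that had already been produced, not on a prediction computed from that instant's data. This is a minimal illustration and not the STARE implementation; the function name, the millisecond timestamps, and the matching rule are assumptions.

```python
from bisect import bisect_right


def latency_aware_match(ready_times_ms, predictions, gt_times_ms):
    """For each ground-truth time, pick the latest prediction already available.

    ready_times_ms must be sorted ascending; each entry is the time at which
    the corresponding prediction finished computing (input time + latency).
    Returns None where no prediction exists yet.
    """
    matched = []
    for t in gt_times_ms:
        idx = bisect_right(ready_times_ms, t) - 1  # last ready_time <= t
        matched.append(predictions[idx] if idx >= 0 else None)
    return matched


# Hypothetical run: annotations every 2 ms (500 Hz), outputs ready at 10 and 20 ms.
gt_times = list(range(0, 24, 2))
print(latency_aware_match([10, 20], ["p0", "p1"], gt_times))
```

Under this rule a slow model is penalized twice: its output is stale when it arrives, and it stays in use until the next output is ready.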
Deep learning technology has advanced single-image dehazing. However, many existing methods fail to fully consider haze density and its spatial distribution, which limits dehazing performance. To address this issue, we propose an attention-based multi-scale feature aggregation network (AMSA-Net) for single-image dehazing. AMSA-Net has an encoder-decoder structure in which both the encoder and the decoder are composed of multi-scale hybrid attention feature aggregation modules (MSHA-FAM). The module perceives the haze density and spatial information in the hazy image, which helps to improve the dehazing effect. MSHA-FAM is composed of two key components: the scale-aware coordinate residual module (SCRM) and the multi-scale feature refinement residual module (MSFRRM). SCRM uses improved coordinate attention to capture haze density and spatial characteristics. MSFRRM extracts semantic features through up-sampling and down-sampling, and uses an improved pixel attention mechanism to enhance key features. In the overall MSHA-FAM pipeline, SCRM first learns the density and spatial distribution characteristics of haze, and MSFRRM then refines these features so as to remove haze more effectively. The experimental results demonstrate that the proposed AMSA-Net is superior to the comparison methods in terms of dehazing quality. Ablation studies further verify the effectiveness of the proposed modules. AMSA-Net achieves good dehazing performance and can provide high-quality input for subsequent computer vision tasks.
Active physical human-exoskeleton interaction has been widely studied. However, the challenges of human motion intention recognition and synchronous tracking have not been well addressed. In this article, a motion intention recognition method based on biophysical information fusion and adaptive learning is proposed to overcome the limitations of existing approaches. First, a lower-limb joint angle prediction model was developed by integrating surface electromyography (sEMG), historical joint angles, and centers of gravity. A convolutional neural network, a Mamba network, and a multilayer perceptron were used for feature extraction, information fusion, and joint angle prediction, respectively. Second, an online adaptive method for the angle prediction model was designed based on a style transfer mapping technique to address the decline in recognition accuracy. In this method, new sEMG features are mapped into the initial feature space, so that the prediction model maintains its predictive performance during long-term use. Furthermore, a real-time control method for synchronous exoskeleton tracking was derived from the predicted angles. Finally, the feasibility of the proposed methods was validated through offline and online experiments.
The integration of virtual simulation with intelligent modeling is crucial for advancing the scientization and personalization of volleyball physical training. This study aims to overcome the convergence instability and feature misalignment in modeling multimodal kinematic and physiological sequences. A dynamical framework based on a Dual-Stream Long Short-Term Memory network integrated with a temporal attention mechanism is proposed. The framework decouples heterogeneous feature learning and optimizes temporal weight distribution. Experimental validation on complex motion state estimation demonstrates that the proposed model reduces load modeling error to 3.8% and achieves a motion classification accuracy of 93.1%. The velocity trajectory fitting coefficient of determination is 0.91 with a peak deviation of 0.05 m/s. These results confirm the effectiveness of the attention-based DS-LSTM in optimizing multimodal sequence modeling for training state estimation and feedback.
In the high-stakes arena of aerial combat, a domain defined by extreme dynamics and unforgiving physical constraints, UAV swarms are currently squeezed between two extremes: the "tactical short-sightedness" of Multi-Agent Reinforcement Learning (MARL) and the "inference lag" of Large Language Models (LLMs). While MARL struggles to internalize the complex maneuverability priors required for expert flight, LLMs are simply too heavy to meet millisecond-level control demands. We bridge this gap by introducing a cognitive synergetic hierarchical framework that decouples strategic reasoning from tactical execution. Our architecture splits the workload between a "Strategic Brain" and a "Tactical Torso." For the Brain, we utilize a synergy between DeepSeek-R1 (70B) and its 7B distilled counterpart to create a collaborative inference engine. By capitalizing on the inherent sparsity of tactical logic in air combat, we implemented a speculative decoding mechanism that achieves an effective boost in decision throughput while maintaining the deep logic of the full 70B model. For the Torso, we developed an enhanced MAPPO algorithm that processes relative pose graphs via graph attention. By integrating a KL-divergence constraint into the loss function, we essentially force agents with different payloads, such as scouts and attackers, to evolve specialized tactical personalities within a shared latent space. Experimental results using the JSBSim high-fidelity 6-DOF engine demonstrate that the swarm does more than just improve its exchange ratio. Further t-SNE manifold analysis and Chain-of-Thought visualizations confirm that our architecture successfully aligns symbolic intent with raw physical control. Most notably, through our "decision-reflection-evolution" loop, the system proved it could diagnose its own failures and iteratively refine its own tactical instructions.
Recent advances in neural networks have introduced a new paradigm for robotic inverse kinematics. However, existing methods remain limited by insufficient feature extraction and suboptimal integration of multi-source information, preventing them from achieving high accuracy, broad generalization, and real-time performance on robots with diverse and complex kinematic structures. In this work, we propose HarmoAtt-IK, an adaptive multimodal neural inverse kinematics approach designed for real-time inference and zero-collection training. Built upon the CycleIK framework, the proposed method introduces a novel adaptive multimodal attention fusion mechanism (HarmoAtt) that dynamically integrates the complementary strengths of spatial, channel, and cross-dimensional attention. It employs a temperature-adaptive Softmax function coupled with a compact weight-generation network to perform multidimensional extraction and adaptive enhancement of input features. We further introduce a composite loss function integrating an improved Smooth-L1 loss, a sign-invariant quaternion loss, and a Shannon entropy regularizer to enhance training stability and overall accuracy. Leveraging forward differential kinematics, our method enables rapid, cross-platform deployment by generating training data solely from URDF models, eliminating the need for costly physical data collection and manual annotation. Experimental evaluations on five humanoid platforms exhibiting substantial kinematic diversity demonstrate that HarmoAtt-IK attains maximum reductions of 76.4% in terminal positional error and 55.1% in rotational error relative to the baseline, while consistently improving the model's inference success rate across all tested platforms by up to 5.76 percentage points. 
These results indicate that the proposed HarmoAtt-IK significantly outperforms baseline methods in both accuracy and reliability across diverse kinematic structures, highlighting the effectiveness of the adaptive multimodal attention mechanism and composite loss design. This further supports its potential for scalable, real-time deployment on a wide range of robotic platforms.
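Two ingredients named in the HarmoAtt-IK abstract can be illustrated in a few lines: a sign-invariant quaternion loss (q and -q encode the same rotation, so the loss should not penalize a sign flip) and a Softmax with a temperature parameter. The sketch below shows one common form of each and is not the paper's definition; the function names are hypothetical, and the temperature is a fixed argument here, whereas the paper describes it as adaptive.

```python
import math


def quaternion_sign_invariant_loss(q_pred, q_true):
    """min(||q_pred - q_true||^2, ||q_pred + q_true||^2) for unit quaternions."""
    dist_same = sum((a - b) ** 2 for a, b in zip(q_pred, q_true))
    dist_flip = sum((a + b) ** 2 for a, b in zip(q_pred, q_true))
    return min(dist_same, dist_flip)


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; lower temperature sharpens the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


identity = (1.0, 0.0, 0.0, 0.0)
flipped = (-1.0, 0.0, 0.0, 0.0)   # same rotation, opposite sign
print(quaternion_sign_invariant_loss(identity, flipped))
print(softmax_with_temperature([1.0, 2.0, 3.0], temperature=0.5))
```

A plain squared-error loss would assign the pair `identity` and `flipped` the maximum possible penalty even though the two rotations are identical, which is the failure mode a sign-invariant loss avoids.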
The rapid advancement of unmanned aerial vehicles (UAVs) in disaster response and environmental monitoring has underscored the growing importance of real-time object detection within UAV swarm networks. However, the non-independent and identically distributed (non-IID) characteristics of data in UAV networks present significant challenges to model convergence and adaptability. To tackle these challenges, this study introduces a robust federated UAV object detection framework tailored for non-IID data distributions. The framework aims to enhance adaptability across clients, thereby improving both detection performance and convergence speed. Our approach includes a self-distillation mechanism that leverages personalized knowledge from local model historical states to guide current local training, striking a balance between specialization and adaptability. Additionally, we propose a drift compensation mechanism to synchronize local and global model updates, mitigating model drift. We conducted extensive experiments on the VisDrone2019-DET dataset, comparing our method to baseline models. Results demonstrate that our approach accelerates convergence speed by approximately 2.2 times and enhances detection performance by around 3%, offering an efficient and robust solution for UAV-based object detection under non-IID conditions.
Direct cellular reprogramming, the conversion of one somatic cell type into another, represents a remarkable advancement in regenerative medicine. Its potential to transform fibrotic tissue into functional parenchyma underscores its therapeutic promise. However, several critical challenges remain unresolved, including limited reprogramming efficiency, the long-term functional stability of converted cells, their integration within pre-existing cellular circuits, and safety concerns related to transgene integration and immunological responses to reprogramming-based viral vectors. Approaches based on the exogenous administration of recombinant proteins and miRNAs have also emerged, though these rely on factors that are naturally prone to exhaustion and degradation, potentially restricting their efficacy. This review is divided into three main sections. The first part addresses direct cellular reprogramming in the context of other cell-based applications, outlining its main applications and current biological limitations. The second part examines how different biomaterials, ranging from hydrogel scaffolds to nanoparticles, can modulate direct cellular reprogramming by providing mechanical and topographical cues and by enabling tighter control over the concentration and spatiotemporal dynamics of reprogramming factors and viral vectors. The third part discusses key findings in biomaterial-assisted reprogramming strategies, highlighting emerging opportunities for clinically translatable approaches. The convergence of regenerative biology and biomaterials science may ultimately generate advanced gel-based and hybrid cellular reprogramming platforms for in vitro testing and, when applied in situ, for promoting cell fate stabilization and facilitating the regeneration of damaged tissues and organs.
[This corrects the article DOI: 10.3389/fnbot.2026.1768219.].
We propose ESSC-RM, a plug-and-play Enhancing framework for Semantic Scene Completion with a Refinement Module, which can be seamlessly integrated into existing semantic scene completion (SSC) models. ESSC-RM operates in two phases: a baseline SSC network first produces a coarse voxel prediction, which is subsequently refined by a 3D U-Net-based Prediction Noise-Aware Module (PNAM) and Voxel-level Local Geometry Module (VLGM) under multiscale supervision. Experiments on SemanticKITTI show that ESSC-RM consistently improves semantic prediction performance. When integrated into CGFormer and MonoScene, the mean IoU increases from 16.87 to 17.27% and from 11.08 to 11.51%, respectively. These results demonstrate that ESSC-RM serves as a general refinement framework applicable to a wide range of SSC models. Project page: https://github.com/LuckyMax0722/ESSC-RM and https://github.com/LuckyMax0722/VLGSSC.
Digital-image technology has broadened the creative space of dance, yet accurately capturing the semantic correspondence between low-level motion data and high-level dance key-points remains challenging, especially when labeled data are scarce. We aim to establish a lightweight, semi-supervised pipeline that can extract discriminative motion features from depth sequences and map them to 3-D key-points of dancers in real time. To achieve pixel-level alignment between dance movement targets and high-dimensional sensory data, we propose a novel LSTM-CNN (Long Short Term Memory-Convolutional Neural Network) framework. Temporal-context features are first extracted by LSTM, after which multi-dimensional spatial features are captured by three convolutional layers and one max-pooling layer; the fused representation is finally regressed to 3-D body key-points. To relieve class imbalance caused by complex postures, an online hard-example mining (OHEM) strategy together with a Dice-cross-entropy weighted loss (3:1) is embedded into semi-supervised learning, enabling the network to converge with only 20% labeled samples. Experiments on the public MSR-Action3D dataset (567 sequences, 20 actions) yielded an average recognition rate of 96.9%, surpassing the best comparison method (MSST) by 1.1%. On our self-established dataset (99 sequences, 11 actions) the accuracy reached 97.99% while training time was reduced by 35% compared with the previous best Multi_perspective_MHPCs approach. Both datasets show low RMSE (≤ 0.032) between predicted and ground-truth key-points, confirming spatial precision. The results demonstrate that the proposed model can reliably track subtle dance gestures under limited annotation, offering an efficient, low-cost solution for digital choreography, motion-style transfer and interactive stage performance.
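The weighted Dice-cross-entropy loss mentioned in the abstract above can be sketched for the binary case as follows. The abstract gives only the ratio 3:1, so assigning weight 3 to the Dice term and weight 1 to the cross-entropy term is an assumption, as are the function names and the normalization by the weight sum. The online hard-example mining step is not shown.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_ce_loss(probs, targets, dice_weight=3.0, ce_weight=1.0):
    """Weighted mix of Dice and cross-entropy (3:1 assumed as Dice:CE)."""
    mixed = (dice_weight * dice_loss(probs, targets)
             + ce_weight * binary_cross_entropy(probs, targets))
    return mixed / (dice_weight + ce_weight)


targets = [1, 0, 1, 0]
print(dice_ce_loss([0.9, 0.1, 0.8, 0.2], targets))   # good prediction
print(dice_ce_loss([0.4, 0.6, 0.3, 0.7], targets))   # poor prediction
```

The Dice term depends on overlap rather than per-element counts, which is why such mixtures are often used when positive examples are rare.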
Brain pathologies such as ischemic stroke or traumatic brain injury (TBI) are among the most impactful diseases worldwide. In ischemic stroke, we currently lack truly effective treatments capable of delaying infarct progression, limiting lesion size or stimulating endogenous brain repair mechanisms to promote neurovascular remodeling and functional recovery. Two main barriers continue to limit the clinical translation of therapeutic molecules: the highly restrictive nature of the blood-brain barrier and that many bioactive molecules exhibit low stability at the target site, with half-lives shorter than the therapeutic window. In this study, we developed tunable silk fibroin (SF) films of variable concentration, fabricated via water annealing, that effectively preserve the functional activity of the chemokine CXCL12 (SDF-1α). The 2% SF formulation provided sustained release of SDF-1α for at least 7 days, promoting the in vitro migration of mesenchymal stem cells (MSCs) and low-density bone marrow mononuclear cells (LDBM), the latter containing hematopoietic stem cells. When implanted on the cortical surface, the SDF-1α-SF films successfully stimulated the guided migration of exogenously administered MSCs and LDBM from subcortical regions into the cerebral cortex. Furthermore, co-implantation of SDF-1α-SF films with MSCs or LDBM enhanced cell retention at the cortical site, effectively minimizing off-target dispersion. In a photothrombotic model of cortical ischemia, allowing precise control of lesion location and size, SDF-1α-SF films significantly reduced lesion volume and preserved neuronal function in the somatosensory cortex, as assessed by electrophysiology. Our findings provide proof of concept for using chemokine-releasing biomaterials to actively modulate stem cell migration and retention within the brain, offering strong potential for neuroprotection and tissue remodeling in areas at risk or already affected by damage.
Remote sensing image target detection has important applications in disaster prevention and mitigation. However, current detection models still have shortcomings in multi-scale feature fusion and in representing contextual information, and are prone to class imbalance during training. To address these issues, this paper proposes a target detection model that combines a context transformer and a weighted residual pyramid. Specifically, a weighted residual pyramid module is designed to fuse deep and shallow target-instance features effectively, and a learnable balancing factor is introduced to alleviate the imbalance in contributions across network layers. Simultaneously, a context transformer module is introduced into the feature extraction and fusion networks to enhance the model's multi-scale feature representation capability. Furthermore, a rotated bounding box is introduced to locate target instances, reducing the influence of redundant background information, and a CIoU-based multi-task loss function is designed to reduce the contribution of different instance targets to the regression task. Experimental results show that, while ensuring real-time performance, the proposed model achieves mAP@0.5 scores of 0.754 and 0.714 on the DOTAv1.0 and DOTAv1.5 datasets, respectively, demonstrating consistent improvements across both datasets. Meanwhile, visualization of detection results across various disaster scenarios shows that the model proposed in this paper has strong practical value.
In advanced robot systems, monitoring the health of key components such as bearings in the transmission system is crucial for achieving reliable autonomous operation. However, accurately diagnosing bearing faults under dynamic and noisy conditions remains challenging. To address this issue, this paper proposes a brain-inspired computational framework that integrates an Improved Spider Monkey Optimization algorithm with a Probabilistic Neural Network (ISMO-PNN) for neurally-grounded bearing fault diagnosis in robotic systems. The main contributions are: (1) extracting a 22-dimensional mixed feature set from vibration signals, (2) using an intelligent PCA strategy to reduce the features to three dimensions while retaining more than 80% of the discriminative information, and (3) using the ISMO algorithm to automatically optimize the key smoothing parameter of the PNN. On the CWRU bearing dataset, the ISMO-PNN model achieves a fault classification accuracy of 97.14% and a macro-average F1 score of 97.32%, outperforming the other models compared in the article. In addition, the minimum difference between training and testing accuracy is 0.72%, indicating strong generalization ability. This brain-inspired framework, which combines a neurally-grounded probabilistic classifier with a bio-inspired swarm optimizer, forms a robust and efficient embedded health monitoring model that can provide feasible solutions for the development of advanced robot systems.
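A probabilistic neural network is essentially a Parzen-window classifier: each class score is the average of Gaussian kernels centered on that class's training samples, and the smoothing parameter sets the kernel width. The sketch below shows this classifier with a fixed smoothing value. It is not the ISMO-PNN model: the spider-monkey optimization that tunes the parameter is omitted, and the function name, the sample features, and `sigma=0.5` are hypothetical.

```python
import math


def pnn_predict(train_x, train_y, x, sigma=0.5):
    """Classify x by the class with the highest mean Gaussian kernel response."""
    kernel_sums = {}
    class_counts = {}
    for sample, label in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(sample, x))
        response = math.exp(-sq_dist / (2.0 * sigma ** 2))
        kernel_sums[label] = kernel_sums.get(label, 0.0) + response
        class_counts[label] = class_counts.get(label, 0) + 1
    # Averaging per class keeps large classes from dominating by size alone.
    return max(kernel_sums, key=lambda c: kernel_sums[c] / class_counts[c])


# Hypothetical 3-D features, standing in for PCA-reduced vibration features.
train_x = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (3.0, 3.1, 2.9), (2.8, 3.0, 3.2)]
train_y = ["normal", "normal", "fault", "fault"]
print(pnn_predict(train_x, train_y, (0.1, 0.0, 0.0)))
print(pnn_predict(train_x, train_y, (2.9, 3.0, 3.1)))
```

Because the smoothing parameter is the only quantity to tune, a PNN pairs naturally with a search procedure; too small a value memorizes the training samples and too large a value blurs the classes together.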