Weather radar reflectivity images play a critical role in reliable weather monitoring and forecasting. However, inherent factors such as ground clutter, sea clutter, and electromagnetic interference frequently introduce nonprecipitation echoes (NPEs) into these data, posing significant challenges for accurate precipitation detection. A promising solution is leveraging deep learning networks to identify and remove NPEs using satellite observations. To enhance the practicality of these models, recent advancements in reparameterization technology have shown potential for reducing computational complexity. However, existing methods focus on parallel multibranch reparameterization; the fusion of multiple convolutions into single equivalent convolutions, referred to as multiconvolution reparameterization, remains unexplored. In this study, we propose a novel reparameterized NPE removal network (RepNPE-Net), designed to recognize and remove NPEs in radar reflectivity data using multichannel brightness temperature (BT) observations from a geostationary meteorological satellite. RepNPE-Net incorporates two innovative dual-stream convolution structure-based modules, the reparameterized dual-stream convolutional module (RepDCM) and the reparameterized attention dual-stream convolutional module (RepADCM), which synergize standard and depthwise separable (DS) residual convolutional blocks to improve feature extraction and representation capabilities. Within the RepADCM module, a positional efficient local attention (PELA) block is designed to enable the network to focus on spatially significant positional features and enhance the model's accuracy. Furthermore, to strengthen the practical applicability of the proposed RepNPE-Net, we introduce hybrid convolution reparameterization (HCR) technology, which consolidates multibranch and multiconvolution operations (e.g., depthwise (DW) and pointwise (PW) convolutions) into single equivalent convolutions during the inference stage, significantly reducing computational complexity without compromising network performance. Experimental results demonstrate that RepNPE-Net outperforms existing methods in both NPE removal accuracy and computational efficiency, highlighting its potential for improving radar data quality and advancing meteorological research and applications.
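The abstract does not spell out the fusion algebra, but the identity behind multiconvolution reparameterization is standard: a depthwise $k\times k$ convolution followed by a pointwise $1\times 1$ convolution collapses into one dense $k\times k$ convolution whose kernel stacks the products of the pointwise weights with the per-channel depthwise kernels. A minimal PyTorch sketch verifying the identity (shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

C, O, k = 8, 16, 3                       # channels in/out, kernel size
x = torch.randn(1, C, 32, 32)
dw_w, dw_b = torch.randn(C, 1, k, k), torch.randn(C)   # depthwise 3x3
pw_w, pw_b = torch.randn(O, C, 1, 1), torch.randn(O)   # pointwise 1x1

# Training-time path: two convolutions.
y = F.conv2d(x, dw_w, dw_b, padding=k // 2, groups=C)
y = F.conv2d(y, pw_w, pw_b)

# Inference-time path: one equivalent dense convolution.
# W[o, c] = pw[o, c] * dw[c];  b[o] = sum_c pw[o, c] * dw_b[c] + pw_b[o]
fused_w = pw_w[:, :, 0, 0].reshape(O, C, 1, 1) * dw_w.reshape(1, C, k, k)
fused_b = (pw_w[:, :, 0, 0] * dw_b).sum(dim=1) + pw_b
z = F.conv2d(x, fused_w, fused_b, padding=k // 2)

print(torch.allclose(y, z, atol=1e-4))   # True: same output, single conv
```

At inference time only the fused convolution is kept, which is where the reported complexity reduction comes from.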
Hybrid Vision Transformers (HybridViTs), which integrate convolutional neural networks (CNNs) with Transformer blocks, offer both local and global feature extraction capabilities, achieving high performance across a range of computer vision tasks. However, the substantial computational asymmetry between lightweight CNN blocks and compute-intensive Transformer blocks presents significant challenges for simultaneous optimization and acceleration within a single hardware architecture. To address these challenges, we propose FLASH, a power-efficient field-programmable gate array (FPGA)-based accelerator tailored for CNN-Transformer hybrid networks. FLASH reduces quantization overhead by consolidating redundant quantization-dequantization operations into a single requantization step and enables 8-bit integer-only computation for residual connections through proper scaling factor handling. To further optimize for hardware efficiency, FLASH introduces hardware-friendly linear approximations of nonlinear functions such as Swish and Softmax. By precomputing row-wise max values through offline calibration, we eliminate both max-value search logic and intermediate memory buffering overhead, while reusing shared integer-exponential units to minimize resource consumption. Architecturally, FLASH employs a two-stage pipeline: Stage 1 eliminates external DRAM access using a fully pipelined MobileNetV2 backbone, while Stage 2 accelerates Transformer and convolutional components through specialized compute units and dataflow optimizations. Experimental evaluation using MobileViT (MViT)-xxs on Xilinx VCU118 FPGA demonstrates that FLASH incurs only a 0.84% accuracy drop on ImageNet-1K compared to the FP32 baseline, while achieving up to $16.8\times $ lower power consumption and $26.3\times $ improvement in energy efficiency relative to CPU/GPU implementations. These results establish FLASH as an energy-efficient hardware accelerator for real-time inference of HybridViT models on edge devices.
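FLASH's requantization fusion is described only at a high level; the arithmetic it rests on is the standard one below, where a dequantize-quantize pair between two int8 layers reduces to a single rescale by $s_{\text{in}}/s_{\text{out}}$. A minimal numpy sketch with illustrative scale factors (not FLASH's calibrated values):

```python
import numpy as np

s_in, s_out = 0.05, 0.02        # illustrative input/output scale factors
acc = np.array([12, -34, 40, 2600], dtype=np.int64)  # int32-style accumulators

# Naive path: dequantize to float, then quantize back to int8.
naive = np.clip(np.round((acc * s_in) / s_out), -128, 127).astype(np.int8)

# Fused requantization: one rescale by m = s_in / s_out, no float tensor.
m = s_in / s_out                 # deployable as a fixed-point multiplier
fused = np.clip(np.round(acc * m), -128, 127).astype(np.int8)

print(naive, np.array_equal(naive, fused))   # [ 30 -85 100 127] True
```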
Quantum convolutional neural networks (QCNNs) are a highly appealing architecture that combines quantum computing and deep learning. Inspired by the hierarchical feature extraction of classical convolutional neural networks (CNNs), QCNNs use quantum operations such as entanglement, superposition, and measurement to capture complex correlations in high-dimensional datasets. Since their introduction, QCNNs have evolved into several architectural variants, including fully quantum, variational, hybrid, and graph-based models, and have been investigated in applications ranging from quantum many-body physics to classical machine learning tasks such as image classification, speech recognition, and time-series forecasting. The development of software ecosystems such as Qiskit Machine Learning, Pennylane, and TensorFlow Quantum (TFQ) has facilitated rapid prototyping and experimentation. Despite these developments, existing research remains fragmented, with many studies focusing on individual implementations and lacking a unifying taxonomy or comprehensive review. This article addresses that gap by providing a systematic and holistic survey of QCNNs, offering comparative insights across key architectures, applications, and toolboxes, and outlining open challenges. We also identify promising research directions, such as scalable architectures, domain-knowledge integration, fault tolerance, and security. This survey seeks to serve as a foundational reference for furthering QCNN research in the current near-term quantum era and beyond, providing both architectural insights and application-driven viewpoints.
The advent of in-context learning (ICL) allows pretrained large language models (LLMs) to handle unseen inputs by leveraging context, without requiring parameter updates. However, the success of ICL is strongly influenced by factors such as the quality, size, and ordering of demonstrations, often resulting in unstable or less-than-ideal outcomes. This work is the first to address these limitations through the lens of demonstration augmentation. We first propose a simple yet effective ICL method, termed implicit demonstration augmentation-based ICL (IDAICL), that enriches demonstrations by leveraging their deep feature distributions, integrating knowledge from the entire demonstration set to enhance LLM predictions. From a theoretical standpoint, we demonstrate that when the number of augmented samples tends to infinity, our method asymptotically converges to a new form of logit calibration. Building upon this foundation, we further propose a domain-aware IDAICL (D-IDAICL) method, which improves the precision of knowledge integration by identifying and leveraging the most pertinent domain for each test sample during augmentation. Specifically, a hypernetwork is employed to adaptively select the most effective domain based on the deep representation of the test sample. The corresponding domain-specific knowledge is then utilized to augment the demonstrations, resulting in a domain-aware logit calibration function that enhances predictive performance. Comprehensive evaluations across multiple tasks using eight LLMs reveal that both approaches markedly boost overall and worst-case accuracy, leading to improved robustness and predictive capability. In addition, our approaches mitigate performance fluctuations across different demonstrations, orderings, and templates, while also showing effectiveness in handling class imbalance.
The performance of deep neural networks (DNNs) in accomplishing tasks heavily relies on feature selection and sparse representation of high-dimensional data. Previous work has treated feature selection and sparse representation as separate mechanisms for improving DNN performance, focusing on identifying and leveraging informative features to enhance task-specific outcomes. However, few studies have established a connection between feature selection and sparse representation. To address this gap, this article proposes an optimization framework termed informative sparse transport (IST), which integrates feature selection and sparse coding into a unified multiobjective optimization framework. Using optimal transport as a bridge, the IST framework harmonizes the relationship between feature selection and sparse representation, offering an informational advantage. In the IST framework, feature selection aims to identify an optimal subset of features to maximize mutual information or minimize redundancy, while sparse representation seeks to approximate data with the fewest possible features. Although these objectives differ, they are fundamentally complementary, as both emphasize extracting task-relevant information while eliminating redundancy. By unifying feature selection and sparse representation, the IST framework effectively mitigates challenges posed by high-dimensional data, delivering a robust solution for enhanced feature extraction and representation. We validate the IST framework on generative and classification tasks, demonstrating that it improves model performance through the complementary synergy of feature selection and sparse representation.
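The abstract uses optimal transport as the bridge between the two objectives without specifying a solver; entropic OT with Sinkhorn iterations is the usual computational workhorse for such frameworks. A minimal sketch (cost matrix and marginals are toy values, not IST's actual formulation):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=500):
    """Entropic OT: min <P, C> - eps*H(P), s.t. P @ 1 = a, P.T @ 1 = b."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan P = diag(u) K diag(v)

rng = np.random.default_rng(0)
C = rng.random((4, 5))                   # pairwise cost between features
a = np.full(4, 1 / 4)                    # source marginal
b = np.full(5, 1 / 5)                    # target marginal
P = sinkhorn(C, a, b)
print(P.sum(axis=1), P.sum(axis=0))      # ~ a and b at convergence
```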
The extended multilinear mixing (EMLM) model has garnered considerable attention due to its proficiency in accurately characterizing intricate nonlinear mixing processes in hyperspectral unmixing (HU). However, current deep learning (DL) approaches leveraging EMLM for HU heavily rely on the meticulous design of efficient deep autoencoder (AE) network architectures, a process that is both time-consuming and labor-intensive, requiring profound expertise and extensive design experience. This challenge is particularly pronounced when confronted with mixed phenomena exhibiting varying nonlinear intensities across diverse spectral bands, making it exceedingly difficult to manually design a network architecture that accommodates all bands. To address this issue, this study proposes an EMLM HU model based on neural architecture search (NAS) with spatial-spectral attention. The main contributions are threefold: 1) NAS is pioneeringly integrated into EMLM-based HU, enabling adaptive modeling of the intricate relationships among endmembers, abundances, and transition probability parameters; 2) a spatial-spectral attention-guided large-scale search space is designed, incorporating a diverse array of multiscale convolutional operations tailored to the variability in scattering degree across hyperspectral remote sensing image bands; additionally, a NAS acceleration strategy inspired by sparse coding is employed to alleviate the search time burden stemming from the expanded search space; and 3) a hybrid loss function combining linear reconstruction loss with multilinear reconstruction spectral angle distance (SAD) is formulated to balance the influence of linear components during the unmixing process, enhancing convergence speed and endmember accuracy. Extensive experimental validation on both synthetic and real datasets demonstrates the significant performance advantages of the proposed method, underscoring its potential to revolutionize HU. The code is available at https://github.com/Appe-hub/HNU-NAS.
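For context, the multilinear mixing model underlying EMLM is commonly written as $y = (1-P)\,w + P\,(w \odot y)$ with linear part $w = Ma$, which solves in closed form per band; in the extended model, $P$ varies across bands. A minimal sketch of this forward model as commonly stated in the unmixing literature (endmembers and abundances are toy values):

```python
import numpy as np

def emlm_forward(M, a, P):
    """EMLM mixing: y = (1-P)*w + P*(w*y) with w = M @ a, solved per band.

    M: (bands, endmembers) endmember matrix
    a: (endmembers,) abundances, a >= 0, sum(a) = 1
    P: (bands,) band-wise nonlinearity probabilities in [0, 1)
    """
    w = M @ a                              # linear mixture
    return (1.0 - P) * w / (1.0 - P * w)   # closed-form solution for y

rng = np.random.default_rng(1)
M = rng.uniform(0.1, 0.9, size=(6, 3))     # toy reflectance endmembers
a = np.array([0.5, 0.3, 0.2])
P = np.full(6, 0.25)                       # P = 0 recovers the linear model
print(emlm_forward(M, a, P))
print(np.allclose(emlm_forward(M, a, np.zeros(6)), M @ a))  # True
```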
Network motifs, as fundamental functional substructures in gene regulatory networks (GRNs), play a critical role in regulating gene expression. Despite the successful application of graph representation learning in GRN modeling, most existing approaches mainly capture pairwise relationships and overlook higher order regulatory patterns encoded by functional motifs, which limits the accuracy of regulatory inference. To address this limitation, we propose Motif-GRN, a motif-based hypergraph representation learning framework that captures the underlying biological logic in higher order semantic structures. We first identify statistically significant regulatory motifs and construct a multichannel motif-induced hypergraph. We then design a motif-aware hypergraph convolutional network to extract motif-centric semantic features, while a conventional graph convolution module preserves first-order relational information. In addition, we introduce cross-view contrastive learning to align heterogeneous representations and enhance gene embeddings. Building on Motif-GRN, we develop an inductive extension that enables cross-dataset generalization and effective GRN inference with limited labels. Extensive experiments on three ground-truth networks across seven cell types demonstrate that Motif-GRN outperforms state-of-the-art baselines in both transductive and inductive GRN inference tasks, highlighting its potential for higher order regulatory network modeling.
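The abstract does not give the layer equations; a common instantiation of hypergraph convolution over a motif-induced incidence matrix $H$ is $X' = \sigma(D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X \Theta)$, sketched below with toy genes and motifs (Motif-GRN's motif-aware channels would add per-motif-type structure on top of this):

```python
import numpy as np

def hypergraph_conv(H, X, Theta, w=None):
    """One spectral hypergraph convolution layer.

    H: (n_genes, n_motifs) incidence, H[v, e] = 1 if gene v is in motif e
    X: (n_genes, d_in) node features; Theta: (d_in, d_out) weights
    """
    n, m = H.shape
    w = np.ones(m) if w is None else w            # hyperedge weights
    Dv = (H * w).sum(axis=1)                      # node degrees
    De = H.sum(axis=0)                            # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)         # ReLU activation

H = np.array([[1, 0], [1, 1], [1, 1], [0, 1]], float)  # 4 genes, 2 motifs
X = np.eye(4)
Theta = np.random.default_rng(2).normal(size=(4, 3))
print(hypergraph_conv(H, X, Theta).shape)  # (4, 3)
```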
In deep brain stimulation (DBS) surgery for Parkinson's disease (PD), the accurate intraoperative identification of key nuclei, such as the subthalamic nucleus (STN), is critical to ensuring therapeutic efficacy. However, current approaches heavily depend on expert annotations, and classification tasks are typically confined to binary distinctions between STN and non-STN regions. This limitation hampers the real-time recognition of complex and diverse neuroanatomical structures during PD-related DBS procedures. To overcome this challenge, we propose a three-class classification strategy for microelectrode recording (MER) signals to effectively distinguish the zona incerta (Zi), STN, and substantia nigra (SN). This approach integrates supervised feature selection with unsupervised clustering: discriminative features are first selected using the random forest algorithm; these features are then input into fuzzy c-means (FCM) clustering for preliminary classification; finally, samples with low confidence scores are manually reviewed. This strategy forms an efficient and verifiable label-generation mechanism that improves classification accuracy and enhances clinical applicability. Experimental results show that the proposed "clustering + review" labeling framework achieves an overall classification accuracy of 92.71%, with only about 10% of samples requiring manual verification, closely matching the 92.97% accuracy achieved with expert labeling while significantly improving labeling efficiency. Furthermore, the ROC-AUC values for all three nuclei (Zi, STN, and SN) exceed 0.97, confirming the model's robust discriminative performance. By combining supervised and unsupervised techniques, the proposed multinuclei classification framework for MER signals not only ensures high accuracy while substantially reducing manual annotation costs but also offers a scalable and efficient solution for rapid neural signal labeling. This method is particularly well-suited for real-time applications such as intraoperative target localization during DBS and shows strong potential for clinical translation.
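A runnable sketch of the three-stage "clustering + review" pipeline on synthetic stand-in data (the MER features, cluster count, and the 0.6 review threshold below are illustrative, not the paper's values):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def fuzzy_cmeans(X, c=3, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns the membership matrix U (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))       # random memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        p = 2.0 / (m - 1.0)
        U = (d ** -p) / (d ** -p).sum(axis=1, keepdims=True)
    return U

# Toy stand-in for MER feature vectors with three nuclei classes.
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

# 1) Supervised feature selection via random-forest importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[-6:]

# 2) Unsupervised preliminary labels via fuzzy c-means on selected features.
U = fuzzy_cmeans(X[:, top], c=3)

# 3) Route low-confidence samples (max membership below threshold) to review.
review = U.max(axis=1) < 0.6
print(f"{review.mean():.1%} of samples flagged for manual review")
```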
The growth of large models demands multinode cooperation during training and inference. Computing node failures can interrupt these processes, causing information loss and prolonging execution time. To reduce the prohibitively large overhead incurred by such failures, accurate prediction of computing node failures is vital, as it helps avert potential overhead, service interruptions, and negative customer experiences. Existing solutions for computing node failure prediction mainly focus on utilizing state-of-the-art time-series models to enhance prediction performance. However, on the one hand, they cannot capture the causal relationship between device overutilization and node failures; on the other hand, they fail to extract the complex spatial-temporal cascading correlations among computing node failure events. These limitations can degrade prediction performance. To address these problems, this article designs a continuous-time dynamic graph-based computing node failure prediction (CTDG-NFP) scheme to accurately predict node failures in dynamic cluster environments. Specifically, the CTDG-NFP scheme first designs a novel multidimensional feature-biased neighbor sampling method, which jointly considers CPU-utilization, memory-utilization, temporal, and spatial biases to sample relevant context. Then, the CTDG-NFP scheme extracts diverse computing node failure motifs through a multidimensional feature-biased long-short-path walk method and a set-based anonymization method. Finally, the CTDG-NFP scheme adopts a time encoder to encode these motifs, thereby extracting the complex spatial-temporal correlations among computing node failure events. On this basis, contrastive learning is adopted to train the computing node failure prediction model. Extensive evaluations with various real-world failure traces demonstrate that the CTDG-NFP scheme achieves superior performance in terms of six widely used performance metrics compared with state-of-the-art node failure prediction methods.
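A minimal sketch of what multidimensional feature-biased neighbor sampling can look like: sampling probabilities mix CPU-utilization, memory-utilization, recency, and spatial-proximity scores. All features and bias weights below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def biased_sample(neighbors, feats, now, pos, weights, k=3, seed=0):
    """Sample k context neighbors with probability proportional to a
    weighted mix of CPU use, memory use, recency, and spatial proximity."""
    recency = np.exp(-(now - feats["time"]))             # newer scores higher
    closeness = np.exp(-np.linalg.norm(feats["pos"] - pos, axis=1))
    score = (weights[0] * feats["cpu"] + weights[1] * feats["mem"]
             + weights[2] * recency + weights[3] * closeness)
    rng = np.random.default_rng(seed)
    return rng.choice(neighbors, size=k, replace=False, p=score / score.sum())

n = 10
rng = np.random.default_rng(5)
feats = {"cpu": rng.random(n), "mem": rng.random(n),
         "time": rng.uniform(0, 10, n), "pos": rng.random((n, 2))}
print(biased_sample(np.arange(n), feats, now=10.0, pos=np.zeros(2),
                    weights=[0.4, 0.3, 0.2, 0.1]))
```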
Graph neural networks (GNNs) have excelled in handling graph-structured data, attracting significant research interest. However, two primary challenges have emerged: interference between topology and attributes distorting node representations, and the low-pass filtering nature of most GNNs leading to the oversight of valuable high-frequency information in graph signals. These issues are particularly pronounced in heterophilic graphs. To address these challenges, we propose attribute-topology cross-frequency aligned (ATCFA) GNNs. ATCFA combines low- and high-pass filters to capture both smooth and detailed representations from topological and attribute perspectives. It also enforces frequency-specific constraints to reduce noise and redundancy in each frequency band. The model can dynamically adjust the filtering ratios for both homophilic and heterophilic graphs. Crucially, ATCFA establishes dynamic associations between corresponding frequency components of topology and attributes, achieving systematic alignment and interactive fusion that explicitly mitigates interference and promotes complementary information utilization across domains. Extensive experiments on standard datasets show that ATCFA delivers higher classification accuracy than state-of-the-art methods, proving its capability to handle both homophilic and heterophilic graphs in node classification.
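A common construction for the paired filters uses the normalized adjacency $\hat{A}$, with $(I + \hat{A})/2$ as the low-pass (smoothing) branch and $(I - \hat{A})/2$ as the high-pass (detail) branch; the mixing ratio below stands in for the dynamically adjusted filtering ratio the abstract describes. A minimal sketch with toy values:

```python
import numpy as np

def filtered_features(A, X, alpha=0.5):
    """Combine low- and high-pass filtered node features.

    A: (n, n) adjacency without self-loops; X: (n, d) attributes
    alpha: low-pass mixing ratio (adaptive models learn this)
    """
    A_hat = A + np.eye(len(A))                   # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))     # D^-1/2 (A + I) D^-1/2
    low = 0.5 * (np.eye(len(A)) + A_norm) @ X    # smooths signals
    high = 0.5 * (np.eye(len(A)) - A_norm) @ X   # keeps differences
    return alpha * low + (1 - alpha) * high

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
X = np.array([[1.0], [0.9], [-1.0]])
print(filtered_features(A, X, alpha=0.9))        # homophily-leaning mixture
```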
Unmanned aerial vehicles (UAVs) play a vital role in scenarios such as community safety patrol and disaster search-and-rescue operations due to their maneuverability and deployment flexibility. However, their limited payload capacity, energy constraints, and susceptibility to interference hinder technological advancements. Additionally, centralized training models pose privacy risks, increasing the potential for data leakage. To address these challenges, this article proposes EMM-Det, a low-power, distributed detection method designed for UAV object detection. EMM-Det enhances system performance through three key design strategies: 1) memory-enhanced spiking neurons with dynamic leakage constants enhance firing rates and prevent spike decay; 2) wavelet-transform encoding of multiscale frequency-domain features improves object detection robustness; and 3) crowdsourced perception and federated learning (FL) technologies boost data collection efficiency while mitigating privacy leakage risks. On our constructed dataset, EMM-Det achieves 81.8% mAP@50:95 detection accuracy at extremely low power consumption: 3.2% higher than the second-best method and 14.5% higher than traditional artificial neural network (ANN) approaches. Experimental results demonstrate that EMM-Det achieves an effective balance between computational efficiency, noise resilience, and data privacy protection. It shows strong potential for deployment in real-world scenarios with stringent energy and privacy requirements, such as community safety patrol and emergency rescue operations.
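A minimal leaky integrate-and-fire sketch in which the leakage constant adapts to the membrane potential, illustrating the kind of dynamic-leak mechanism the abstract credits with preventing spike decay; the specific update rule here is an assumption for illustration, not EMM-Det's formulation:

```python
import numpy as np

def lif_forward(I, theta=1.0, lam0=0.9, k=0.05):
    """LIF neuron with a dynamic leak: lam grows with the membrane
    potential so sub-threshold 'memory' decays more slowly (illustrative)."""
    V, spikes = 0.0, []
    for inp in I:
        lam = min(lam0 + k * abs(V), 0.99)   # dynamic leakage constant
        V = lam * V + inp                    # leaky integration
        s = float(V >= theta)                # fire on threshold crossing
        spikes.append(s)
        V = V * (1.0 - s)                    # hard reset after a spike
    return np.array(spikes)

rng = np.random.default_rng(3)
print(lif_forward(rng.uniform(0.0, 0.5, size=20)))
```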
Unlike traditional model-based reinforcement learning (RL) approaches that estimate system parameters from data, nonmodel-based data-driven control learns the optimal policy directly from input-state data without any intermediate model identification. Although this direct RL approach offers increased adaptability and resilience to model misspecification, its reliance on raw data leaves it vulnerable to system noise and disturbances that may undermine convergence, robustness, and stability. In this article, we establish the convergence, robustness, and stability of value iteration (VI) for data-driven control of stochastic linear quadratic (LQ) systems in discrete time with entirely unknown dynamics and cost. Our contributions are threefold. First, we prove that VI is globally exponentially stable for any positive semidefinite initial value matrix in noise-free settings, thereby significantly relaxing restrictive assumptions on initial value functions in existing literature. Second, we extend our analysis to settings with external disturbances, proving that VI maintains small-disturbance input-to-state stability (ISS) and converges within a small neighborhood of the optimal solution when disturbances are sufficiently small. Third, we propose a new nonmodel-based robust adaptive dynamic programming (ADP) algorithm for adaptive optimal controller design, which, unlike existing procedures, requires no prior knowledge of an initial admissible control policy. Numerical experiments on a "data center cooling" problem demonstrate the convergence and stability of the algorithm compared to established methods, highlighting its robustness and adaptability for data-driven control in noisy environments. Finally, we apply the method to dynamic portfolio allocation, demonstrating its practical relevance outside traditional control tasks.
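When the dynamics are known, VI for the discrete-time LQ problem is the Riccati recursion $P_{k+1} = Q + A^{\top}P_kA - A^{\top}P_kB(R + B^{\top}P_kB)^{-1}B^{\top}P_kA$; the article's data-driven VI learns the same fixed point from input-state data instead. A minimal model-based sketch showing convergence from the zero matrix, one of the relaxed positive semidefinite initializations the first contribution covers:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P = np.zeros((2, 2))                      # any PSD initialization works
for it in range(500):
    G = np.linalg.inv(R + B.T @ P @ B)
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ G @ B.T @ P @ A
    if np.linalg.norm(P_next - P) < 1e-10:
        break
    P = P_next

K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A   # optimal feedback gain
print(it, np.abs(np.linalg.eigvals(A - B @ K)))    # poles inside unit circle
```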
The cooperative tracking control problem for multiagent systems with unknown dynamic models, state constraints, control input constraints, an optimal performance index, and convergence rate constraints is investigated in this article. A novel design framework is proposed to cope with these constraints and requirements. More specifically, the mean value theorem is utilized to transform the control input constraint into an unconstrained form. A performance function combined with a barrier Lyapunov function is leveraged to achieve guaranteed transient tracking performance, finite-time convergence, and state constraint satisfaction. To satisfy the optimal performance index while considering the unknown dynamics of agents, we employ the actor-critic neural network architecture to obtain the near-optimal solution. Our control scheme is completely model-free. Rigorous Lyapunov stability analyses show that the full-state constraints and control input constraints are always satisfied, and the cooperative tracking errors can be made as small as possible in a finite time at a desired decay rate by appropriately setting the parameters in the performance function and barrier Lyapunov function. Finally, simulation and hardware tests are conducted to verify the effectiveness of the proposed control strategy.
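For reference, the logarithmic barrier Lyapunov function commonly used to enforce such state constraints is shown below: $V$ stays finite only while the tracking error $e$ remains inside the bound $k_b$, so boundedness of $V$ along trajectories implies constraint satisfaction (a representative form; the article's composition with the performance function may differ):

```latex
V(e) = \frac{1}{2}\log\frac{k_b^{2}}{k_b^{2}-e^{2}},
\qquad |e| < k_b,
\qquad V(e) \to \infty \ \text{as} \ |e| \to k_b .
```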
Designing identical dual-band optical filters remains a complex optimization challenge in photonics and optical communication systems. Conventional methods, which rely on iterative electromagnetic simulations or analytical approximations, often suffer from limited generalizability and high computational costs. In this work, we propose a deep reinforcement learning (RL) framework for the autonomous optimization of identical dual-band fiber Bragg grating (FBG) filters. A policy network based on a three-layer fully connected neural architecture is trained using a proximal policy optimization algorithm to minimize the full width at half maximum (FWHM) of both transmission bands while maintaining spectral symmetry and identical channel characteristics. The deep RL-based design achieves a 43% reduction in FWHM and a 49% reduction in grating length compared to baseline designs, without sacrificing reflectivity or channel uniformity. This study demonstrates the feasibility and effectiveness of deep RL as a powerful optimization tool for complex photonic systems, providing a scalable and data-efficient pathway toward next-generation optical device design.
The advent of federated learning (FL) has revolutionized the way distributed systems handle collaborative model training while preserving user privacy. Recently, federated unlearning (FU) has emerged to address demands for the "right to be forgotten" and unlearning of the impact of poisoned clients without requiring retraining in FL. Most FU algorithms require the cooperation of retained or target clients (clients to be unlearned), introducing additional communication overhead and potential security risks. In addition, some FU methods need to store historical models to execute the unlearning process. These challenges limit the efficiency of current FU methods and impose additional memory burdens. Moreover, due to the complexity of nonlinear models and their training strategies, most existing FU methods for deep neural networks (DNNs) lack theoretical certification. In this work, we introduce a novel FL training and unlearning strategy for DNNs, termed forgettable federated linear learning ($\mathtt{F^{2}L^{2}}$). $\mathtt{F^{2}L^{2}}$ considers a common practice of using pretrained models to approximate DNNs linearly, allowing them to achieve similar performance as the original networks via federated linear training (FLT). We then present FedRemoval, a certified, efficient, and secure unlearning strategy that enables the server to unlearn a target client without requiring client communication or adding additional storage. We have conducted extensive empirical validation on small- to large-scale datasets, using both convolutional neural networks and modern foundation models (FMs). These experiments demonstrate the effectiveness of $\mathtt{F^{2}L^{2}}$ in balancing model accuracy with the successful unlearning of target clients. $\mathtt{F^{2}L^{2}}$ represents a promising pipeline for efficient and trustworthy FU. The code is available at: https://anonymous.4open.science/r/2F2L-Federated-Unlearning-D57D/README.md.
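Why linearity enables server-side unlearning without client cooperation or stored checkpoints: when training reduces to a linear problem, each client's contribution enters the global solution additively and can be subtracted exactly. A minimal ridge-regression sketch of this principle (FedRemoval operates on the linearized DNN rather than raw features, so this is an analogy, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)
clients = [(rng.normal(size=(50, 8)), rng.normal(size=50)) for _ in range(5)]
lam = 1e-2

# Federated linear training: the server only needs additive statistics.
A = sum(X.T @ X for X, y in clients) + lam * np.eye(8)
b = sum(X.T @ y for X, y in clients)
w_full = np.linalg.solve(A, b)

# Unlearn client 0: subtract its contribution and re-solve on the server.
X0, y0 = clients[0]
w_unlearned = np.linalg.solve(A - X0.T @ X0, b - X0.T @ y0)

# Certificate: identical to retraining from scratch without client 0.
A_r = sum(X.T @ X for X, y in clients[1:]) + lam * np.eye(8)
b_r = sum(X.T @ y for X, y in clients[1:])
print(np.allclose(w_unlearned, np.linalg.solve(A_r, b_r)))  # True
```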
Formal verification using temporal logics such as computation tree logic (CTL) is essential for validating safety and correctness in complex systems. However, traditional model-checking techniques face severe scalability limitations due to the state explosion problem and their reliance on exhaustive symbolic traversal. Moreover, existing learning-based verification methods often lack formal guarantees and interpretability. These challenges create a pressing need for scalable, learning-based verification methods that preserve verification reliability while improving computational efficiency. This article introduces a novel deep reinforcement learning (DRL)-based model checking framework that learns to verify CTL formulas directly through interaction with system models. Unlike traditional symbolic model checkers such as NuSMV, the proposed DRL-CTL checker, trained using proximal policy optimization (PPO), interprets CTL semantics over system models represented as Kripke structures without performing symbolic state-space traversal at inference time. Reward functions are designed for individual CTL operators, and fixed-point reasoning is incorporated to handle global temporal properties such as $AG(\phi)$ and $EG(\phi)$ . Experimental results show that the proposed method achieves near-constant inference time of approximately 2 ms per formula on an Intel Core i9-13900K CPU (24 cores, 3.0 GHz), 64 GB RAM, NVIDIA RTX 4090 GPU (24 GB VRAM), reduces verification time by up to 90% compared with traditional model checkers, and scales to models with more than $10^{1192}$ reachable states. The framework also produces witnesses and counterexamples and yields verification outcomes identical to those of symbolic checkers in our experiments. These results highlight the potential of DRL to serve as a scalable, efficient, and explainable alternative to classical CTL model checking.
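The fixed-point reasoning mentioned for global properties follows the classical characterization of $EG(\phi)$ as the greatest fixed point of $Z \mapsto \phi \wedge EX(Z)$, which the learned checker must reproduce. A minimal sketch of that greatest-fixed-point computation on an explicit Kripke structure (states and transitions are toy values):

```python
def eg(trans, sat_phi):
    """Greatest fixed point for EG(phi): start from all phi-states and
    keep only those with a successor still in the set, until stable."""
    Z = set(sat_phi)
    while True:
        Z_next = {s for s in Z if any(t in Z for t in trans.get(s, []))}
        if Z_next == Z:
            return Z
        Z = Z_next

# Toy Kripke structure: 0 -> 1, 1 -> {2, 0}, 2 -> 2.
trans = {0: [1], 1: [2, 0], 2: [2]}
print(eg(trans, sat_phi={0, 1, 2}))  # {0, 1, 2}
print(eg(trans, sat_phi={0, 1}))     # {0, 1}: the cycle 0 <-> 1 witnesses EG
```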
Text-driven diffusion models have achieved remarkable performance in human motion generation. However, these generative models struggle to generate high-quality motion consistent with textual descriptions. The primary reasons are: 1) insufficient fine-grained motion modeling, because motion representations are difficult to distinguish in latent diffusion; and 2) inconsistencies between motions and textual descriptions due to misalignment in the multimodal space. To overcome these limitations, this work proposes Motion generation with Frequency and Text State Space models (MoFTSS), comprising two main modules: the frequency state space model (FreqSSM) and the text state space model (TextSSM). Specifically, FreqSSM derives fine-grained representations by decomposing sequences into low-frequency and high-frequency components, allowing it to guide the generation of static poses (e.g., sitting, lying) and fine-grained motions (e.g., transitions, stumbling). For consistency between text and motion, TextSSM treats text features as a semantic modulation term within the SSM, enabling dynamic filtering of motion features consistent with textual semantics. Extensive experiments suggest that MoFTSS achieves superior performance on the text-to-motion generation task. Notably, it attains the lowest FID of 0.181 on the HumanML3D dataset, significantly lower than the 0.421 achieved by MLD.
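The abstract does not detail FreqSSM's decomposition; below is a minimal sketch of one standard way to split a motion sequence into low-frequency (static pose) and high-frequency (fine motion) components with an FFT mask, where the cutoff is illustrative:

```python
import numpy as np

def freq_split(motion, cutoff=4):
    """Split a (T, d) motion into low-/high-frequency parts along time."""
    spec = np.fft.rfft(motion, axis=0)
    low_spec = spec.copy()
    low_spec[cutoff:] = 0                  # keep only slow components
    low = np.fft.irfft(low_spec, n=len(motion), axis=0)
    return low, motion - low               # high part = residual detail

t = np.linspace(0, 1, 64, endpoint=False)[:, None]
motion = np.sin(2 * np.pi * t) + 0.1 * np.sin(2 * np.pi * 20 * t)
low, high = freq_split(motion)
print(np.abs(low - np.sin(2 * np.pi * t)).max() < 1e-6)  # slow pose = low
```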
Reinforcement learning (RL) has shown excellent performance in solving decision-making and control problems of autonomous driving (AD), which is increasingly applied in diverse driving scenarios. However, driving is a multiattribute problem, leading to challenges in achieving multiobjective compatibility for current RL methods, especially in both policy updating and policy execution. On the one hand, a single value evaluation network limits the policy updating in complex scenarios with coupled driving objectives. On the other hand, the common single-type action space structure limits driving flexibility or results in large behavior fluctuations during policy execution. To this end, we propose a multiobjective ensemble-critic (MoEC) RL method with a hybrid parameterized action space for multiobjective-compatible AD. Specifically, an advanced MORL architecture is constructed, in which the ensemble-critic focuses on different objectives through independent reward functions. The architecture integrates a hybrid parameterized action space structure, and the generated driving actions contain both abstract guidance that matches the hybrid road modality and concrete control commands. In addition, an uncertainty-based exploration mechanism that supports hybrid actions is developed to learn multiobjective-compatible policies more quickly. The experimental results demonstrate that, in both simulator-based and HighD dataset-based multilane highway scenarios, our method efficiently learns multiobjective-compatible AD policies with respect to efficiency, action consistency, and safety.
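A minimal sketch of a hybrid parameterized action head, emitting a discrete abstract-guidance action together with the continuous control parameters attached to the chosen option; the sizes and factorization are illustrative assumptions, not MoEC's exact head:

```python
import torch
import torch.nn as nn

class HybridActionHead(nn.Module):
    """Policy head: a discrete action (e.g., lane-level guidance) plus
    continuous parameters (e.g., control commands) for each option."""
    def __init__(self, d=64, n_discrete=3, n_params=2):
        super().__init__()
        self.logits = nn.Linear(d, n_discrete)
        self.params = nn.Linear(d, n_discrete * n_params)
        self.n_discrete, self.n_params = n_discrete, n_params

    def forward(self, h):
        dist = torch.distributions.Categorical(logits=self.logits(h))
        a = dist.sample()                       # abstract guidance choice
        all_p = torch.tanh(self.params(h))      # bounded control commands
        all_p = all_p.view(-1, self.n_discrete, self.n_params)
        p = all_p[torch.arange(len(a)), a]      # params of the chosen action
        return a, p

head = HybridActionHead()
a, p = head(torch.randn(4, 64))
print(a.shape, p.shape)   # torch.Size([4]) torch.Size([4, 2])
```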
Geographic entity representation learning (GERL) is an emerging approach that represents natural features, administrative divisions, road networks, and points of interest (POIs) in a low-dimensional continuous vector space. By learning representation vectors that capture the semantics and interactions of geographic entities, GERL underpins a variety of intelligent applications. Previous GERL methods mainly focus on representation learning for geographic entities seen at training time and struggle to accurately generate representation vectors for the growing number of unseen geographic entities not involved in model training. To address this issue, this article proposes spatial meta-learning-based representation learning (SMRL), which integrates spatial subgraphs and meta-learning to improve the representation vectors of unseen geographic entities. Specifically, SMRL first designs a spatial-aware subgraph sampling module based on the attributes and relationships of geographic entities to divide entities into spatial subgraphs. It then develops a local-level representation module to learn entity features at the subgraph level. Finally, SMRL employs a meta-learning-driven representation strategy to learn the representations of unseen geographic entities. Extensive experiments show that the proposed SMRL method outperforms baselines with both higher accuracy and higher computational efficiency. This study provides new explorations for the representation of unseen geographic entities and offers methodological references for various geographic applications.
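The abstract does not specify the meta-learning algorithm; below is a minimal first-order MAML-style sketch of the adapt-on-support, evaluate-on-query loop such a strategy typically uses, with spatial subgraphs playing the role of tasks (all data and the encoder are toy stand-ins):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 8)                       # toy entity encoder
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.1

for step in range(3):                          # meta-training steps
    meta_opt.zero_grad()
    for _ in range(4):                         # tasks = spatial subgraphs
        xs, ys = torch.randn(10, 16), torch.randn(10, 8)   # support set
        xq, yq = torch.randn(10, 16), torch.randn(10, 8)   # query set
        # Inner step: adapt a fast copy of the weights on the support set.
        loss_s = nn.functional.mse_loss(model(xs), ys)
        grads = torch.autograd.grad(loss_s, list(model.parameters()))
        fast = [w - inner_lr * g for w, g in zip(model.parameters(), grads)]
        # Outer step: query loss under adapted weights (first-order update).
        pred_q = xq @ fast[0].T + fast[1]
        nn.functional.mse_loss(pred_q, yq).backward()
    meta_opt.step()
print("meta-trained", sum(p.numel() for p in model.parameters()), "params")
```

At test time, a few inner steps on an unseen subgraph's support set adapt the shared initialization, which is what lets the model represent entities it never saw during training.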
High-resolution remote sensing semantic segmentation plays a critical role in land-use monitoring, urban planning, and disaster response. However, its deployment remains challenging owing to modality heterogeneity, fine-scale object structures, and the high computational cost of current deep learning models. To address these challenges, we propose a semantic prompt and graph-convolution-structure distillation framework (SPGSNet-S$^{\ast}$), a compact, yet effective architecture that integrates multimodal feature enhancement with dual-path knowledge distillation (KD). Specifically, we design two lightweight modules, auxiliary spatial feature extraction (ASFE) and red-green-blue (RGB) representation, to denoise and align noisy normalized digital surface model (nDSM) features with RGB imagery, enabling robust feature fusion. In addition, we introduce a dual distillation scheme comprising graph-convolution-based structure distillation, which captures and transfers spatial topological dependencies, and semantic prompt distillation (SPD), which dynamically generates and injects class-aware visual prompts without external text supervision. Experimental results on the Vaihingen and Potsdam datasets show that SPGSNet-S$^{\ast}$ outperforms several state-of-the-art methods, achieving competitive performance with only 8.89 M parameters and 2.29 G floating-point operations (FLOPs). The source code and experimental results are publicly available at https://github.com/110-011/SPGSNet.
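The dual-path details (structure and prompt distillation) are specific to the paper, but both build on the standard distillation objective: task cross-entropy plus a temperature-softened KL term between student and teacher logits. A minimal sketch of that base objective (alpha and T are illustrative):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD: task cross-entropy + T^2-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return alpha * ce + (1 - alpha) * kl

s = torch.randn(8, 6, requires_grad=True)   # student logits (batch 8, 6 classes)
t = torch.randn(8, 6)                       # frozen teacher logits
y = torch.randint(0, 6, (8,))
print(kd_loss(s, t, y).item())
```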