Graph neural networks (GNNs) have become a cornerstone for modeling complex relational data, yet most canonical architectures assume a static graph topology. In real-world temporal interaction graphs (TIGs), however, nodes and edges evolve continuously. Recent memory-augmented temporal GNNs address this by maintaining a learnable memory for every node, updated discretely whenever an interaction occurs. Although memory-based models such as TGN and TGAT include a time encoding that marks when a node's memory was last updated, the elapsed time merely serves as a static input feature that modulates the next embedding update. The memory itself is assumed to remain unchanged between two events, implying that the node state evolves through a sequence of instantaneous jumps. This event-driven assumption neglects two key temporal phenomena: (1) natural decay, in which a node's latent state gradually loses relevance during long inactivity; and (2) short-term drift, in which the latest neighbour exerts a residual influence that fades smoothly rather than abruptly. Consequently, the memories of sparsely active nodes become stale, and the discrete-update paradigm can only provide a node's memory at event times: it cannot infer the node's state at arbitrary intermediate timestamps, limiting both accuracy and temporal consistency. To tackle these limitations, we introduce the Natural Evolution Unit (NEU), a module that inserts a continuous-time memory evolution stage before the embedding read-out. NEU treats the time difference not as an auxiliary input but as the dynamic driver of a learnable ordinary differential equation, enabling memory to drift and decay smoothly between events.
This brings two benefits: (1) node states become continuously queryable at arbitrary timestamps, providing continuous-time inferability and interpretability; and (2) the treatment of time differences is upgraded from a static input to the driver of dynamic evolution, allowing the ODE dynamics to dominate temporal modeling and reducing dependence on learnable time encodings. We therefore adopt a fixed time encoding to stabilize training, which further improves the experimental results. Comprehensive experiments on five public datasets show that NEU consistently improves AUC and AP over the strongest memory-based baselines, indicating that our method effectively mitigates memory staleness and enhances long-term prediction, offering a simple yet effective new perspective for representation learning on dynamic graphs.
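The core idea of continuous-time memory evolution between events can be sketched as a small ODE integrator. The linear decay/drift form and all parameters below are illustrative assumptions, not the paper's actual learned dynamics:

```python
import numpy as np

def evolve_memory(m, dt, decay=0.3, drift=0.8, n_steps=16):
    """Euler-integrate a toy ODE  dm/dt = -decay * m + drift_term  over [0, dt].

    Stands in for NEU's learnable dynamics: the memory decays during long
    inactivity while the residual influence of the last event fades smoothly.
    (Hypothetical hand-specified form; NEU's ODE is learned.)
    """
    h = dt / n_steps
    drift_term = drift * np.tanh(m)        # residual influence of the last event
    for _ in range(n_steps):
        m = m + h * (-decay * m + drift_term)
        drift_term *= np.exp(-h)           # short-term drift fades smoothly
    return m
```

Because the state is defined by an ODE, the memory can be queried at any intermediate timestamp, e.g. `evolve_memory(m, t - t_last)` for arbitrary `t`, rather than only at event times.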
Neural network pruning makes large neural networks more portable and smaller neural networks more interpretable, with minimal loss of predictive accuracy. We propose a pruning method that reformulates the neural-network LASSO problem as a standard weighted regression or classification problem with a LASSO penalty. We apply this method starting from a dense neural network structure that includes all possible feed-forward networks as subnetworks, which efficiently removes substantial redundancy. To further refine the network, we develop a second step that cycles over all remaining links, one at a time, removing those that do not sufficiently improve the model fit. We demonstrate the effectiveness and stability of our method for both regression and classification problems in four simulation studies. Compared with the original dense neural network and with state-of-the-art neural network pruning methods, our method improves both prediction and interpretability. It also outperforms the state-of-the-art methods across ten real data examples (five regression and five classification).
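The LASSO mechanics behind this kind of link removal can be illustrated with proximal-gradient (ISTA) steps on a single linear layer; the soft-thresholding operator is what drives redundant weights exactly to zero. This is a generic sketch of the penalty on toy data, not the paper's full two-step procedure:

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1: shrinks weights and zeroes small ones."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def lasso_step(W, X, y, lam, lr):
    """One proximal-gradient (ISTA) step on 0.5*||XW - y||^2 / n + lam*||W||_1."""
    grad = X.T @ (X @ W - y) / len(X)
    return soft_threshold(W - lr * grad, lr * lam)

# toy data with a sparse ground truth: links 1, 2, 4 are redundant
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ w_true

W = np.zeros(5)
for _ in range(500):
    W = lasso_step(W, X, y, lam=0.1, lr=0.05)
```

After training, the redundant links carry exactly-zero weights and can be pruned, while the informative links survive with slightly shrunken values.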
Recently, introducing task-related information into the fusion of infrared and visible images has improved the practicality of fused images in real scenarios. However, due to the heterogeneity between fused features and task-related features, it is difficult to integrate task information into fused images, resulting in quality degradation. To alleviate this issue, we propose a task-driven visible and infrared image fusion network called TDFuse, which implements dual injection of detail and semantic information from the downstream task into fused images. Specifically, the detail injection module (DIM) and semantic injection module (SIM) are designed to progressively distill the task-related detail and semantic information into the features of fused images, respectively. In DIM, a task detail enhancement block is constructed to combine fusion features with the detail information from the task network. The task semantic enhancement block in SIM is responsible for injecting the semantic information from the task network. Meanwhile, a detail-semantic dual-constraint loss is devised to ensure that DIM and SIM can adaptively learn the corresponding detail and semantic information from the task network. Through this dual injection, the fused images contain more detail and semantic information from the downstream task. Finally, extensive experiments on M3FD, MSRS, LLVIP, and downstream tasks demonstrate that our method not only ensures the visual quality of the fused images but also enhances the performance of downstream tasks. The code is available at https://github.com/Fullness-1/TDFuse.
The Transformer architecture widely adopted in large language models (LLMs) suffers from limited inference efficiency due to the inherently sequential nature of autoregressive token generation. To address this issue, speculative decoding (SD) has been proposed to accelerate LLM inference by employing small speculative models (SSMs) to generate candidate tokens that are subsequently verified by the target LLM. However, SD methods are often constrained by a key challenge: the low acceptance rate of tokens predicted by SSMs. To overcome this limitation, this paper proposes a Dual-Stream Network Architecture (DSNA), which introduces two parallel processing streams that simultaneously model word sequences and feature sequences. The outputs of these two streams are progressively fused in subsequent stages to enhance the quality of candidate predictions. Furthermore, a dynamic multi-path decoding (DMPD) mechanism is introduced to leverage the enriched representations produced by the dual-stream architecture. This mechanism allows multiple candidate token paths to be evaluated simultaneously, enabling the model to accept multiple tokens within a single forward pass during inference. Extensive experiments show that our proposed method consistently outperforms state-of-the-art SD approaches, achieving significant improvements in both inference throughput and generation accuracy across multiple benchmarks.
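The acceptance bottleneck targeted here comes from the standard speculative-sampling accept/reject rule, which can be sketched for a single token with toy categorical distributions (this is the generic SD verification rule, not DSNA itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_token(p, q):
    """One step of standard speculative sampling.

    q: draft (SSM) distribution, p: target LLM distribution.
    Accept the draft token x with prob min(1, p[x]/q[x]); otherwise resample
    from the residual max(p - q, 0), so the output is distributed exactly as p.
    """
    x = rng.choice(len(q), p=q)
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    resid = np.maximum(p - q, 0.0)
    return rng.choice(len(p), p=resid / resid.sum())

p = np.array([0.7, 0.2, 0.1])   # target distribution
q = np.array([0.4, 0.4, 0.2])   # poorly aligned draft -> frequent rejections
samples = [speculative_token(p, q) for _ in range(20000)]
```

The expected acceptance rate is `sum(min(p[x], q[x]))` over the vocabulary (0.7 in this toy example); raising it by producing better-aligned candidate distributions is precisely what richer draft representations aim for.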
The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic forecasting and energy consumption prediction. Therefore, it is imperative to develop a robust spatio-temporal learning methodology capable of extracting meaningful insights from incomplete datasets. Despite the existence of methodologies for spatio-temporal graph forecasting in the presence of missing values, unresolved issues persist. Primarily, the majority of extant research is predicated on time-series analysis, thereby neglecting the dynamic spatial correlations inherent in sensor networks. Additionally, the complexity of missing data patterns compounds the intricacy of the problem. Furthermore, the variability in maintenance conditions results in significant fluctuation in the ratio and pattern of missing values, thereby challenging the generalizability of predictive models. In response to these challenges, this study introduces GeoMAE, an enhanced spatio-temporal representation learning model with a self-supervised auxiliary loss. The model comprises three principal components: an input preprocessing module, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary learning task inspired by Masked AutoEncoders to enhance the robustness of spatio-temporal representation learning. Empirical evaluations on real-world datasets demonstrate that GeoMAE significantly outperforms existing benchmarks, achieving up to 13.20% relative improvement in MAE over the best baseline models (e.g., 22.42 vs. 25.01 MAE at 25% missing rate).
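The MAE-inspired auxiliary task can be sketched generically: mask random entries of the spatio-temporal input, reconstruct from the corrupted input, and score only the masked positions. This is a minimal sketch of the idea with toy stand-in reconstructors; GeoMAE's actual network and masking scheme may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(x, reconstruct, mask_ratio=0.25):
    """MAE-style auxiliary loss on a (time, sensor) reading matrix.

    Randomly mask entries, reconstruct from the corrupted input, and
    evaluate MSE only where values were hidden, which pushes the model
    to learn robust spatio-temporal structure from incomplete data.
    """
    mask = rng.random(x.shape) < mask_ratio
    x_corrupt = np.where(mask, 0.0, x)
    x_hat = reconstruct(x_corrupt)
    return np.mean((x_hat[mask] - x[mask]) ** 2)

# toy readings and two toy "reconstructors" (stand-ins for a real network)
x = np.ones((48, 20))                       # 48 time steps, 20 sensors
zero_fill = lambda xc: xc                   # leaves masked entries at 0
mean_fill = lambda xc: np.where(xc == 0.0, xc.mean(axis=0, keepdims=True), xc)
```

Even the crude per-sensor mean imputer scores a lower masked loss than leaving the gaps at zero, which is the signal the auxiliary objective exploits during training.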
Breast cancer is the second most common cause of mortality in women. It arises when oncogene-bearing cells of the milk gland undergo irregular cell division and form tumor structures. This study proposes the JSPR-Net classifier for the detection of breast cancer. The detection framework builds on a new fractional-order compartmental model with CF derivatives, which captures the memory-dependent dynamics of breast cancer progression across different stages of the disease. Fixed-point theory establishes existence and uniqueness, and the Natural transform together with Ulam-Hyers stability ensures that the model is mathematically well posed and its numerical solutions are reliable. Graphical visualizations reveal significant variations in disease dynamics under different fractional orders (α = 0.7, 0.8, 0.9, 1.0), while equilibrium analysis provides insights for intervention strategies and the CF method supports identification of breast cancer progression. A Kaggle dataset with 250 samples is used for disease detection, and the proposed method achieves a diagnostic accuracy of 94%. The achieved accuracy, feature importance, and stability analyses demonstrate the robustness of the model for breast cancer detection.
The process operating performance assessment (POPA) of the electro-fused magnesium furnace (EFMF) is very important for ensuring product quality and pursuing the maximum comprehensive economic benefit. However, the data at the beginning of a new production process have no performance grade labels and often include new performance grades. Traditional multi-source open-set domain adaptation (OSDA) methods categorize all unknown classes into one class without further subdivision. To address this issue, a method based on a multi-source domain open-set deep transfer adversarial network (MDODTAN) is studied to solve the POPA problem of the EFMF, focusing on subdividing multiple unknown classes into different unknown performance grades. The network designs a task classifier for each source domain, further enhancing the assessment accuracy of known performance grades. Then, the domain gap between the known performance grades in each source-target domain pair is reduced through multi-source domain adversarial training. By constructing a similarity matrix between the known and unknown performance grades, pseudo-labels are assigned to the target domain data, and the assessment accuracy for the performance grades of the new smelting process is improved through iterative training. The experimental results indicate that our method achieves higher performance assessment accuracy in open-set scenarios than existing methods, while also accurately classifying and subdividing multiple unknown performance grades.
Reservoirs are efficient networks for time-series processing. It is well known that the network structure is one of the determinants of their performance. However, the topological structure of reservoirs, as well as their performance, is hard to analyze due to the lack of suitable mathematical tools. In this paper, we study the topological structure of reservoirs using persistent GLMY homology theory and develop a method to improve their performance. Specifically, we find that reservoir performance is correlated with the one-dimensional GLMY homology groups. Then, we develop a reservoir structure optimization method by modifying the minimal representative cycles of one-dimensional GLMY homology groups. Finally, through experiments, we validate that the performance of reservoirs is jointly influenced by the reservoir structure and the periodicity of the dataset.
Spiking neural networks (SNNs) represent a promising approach in machine learning, combining the hierarchical learning capabilities of deep neural networks with the energy efficiency of spike-based computations. Traditional end-to-end training of SNNs is often based on back-propagation, where weight updates are derived from gradients computed through the chain rule. However, this method encounters challenges due to its excessive cost, limited biological plausibility, and inefficiency on neuromorphic hardware. In this study, we introduce an alternative training approach for SNNs. Instead of using back-propagation, we leverage weight perturbation methods within a forward-mode gradient framework. Specifically, we perturb the weight matrix with a small noise term and estimate gradients by observing the changes in the network output. Experimental results on regression tasks, including solving various PDEs, show that our approach achieves competitive accuracy, suggesting its suitability for neuromorphic systems and potential hardware compatibility.
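The weight-perturbation estimator at the heart of this approach can be sketched in a few lines: perturb along a random direction, measure the change in loss with two forward passes, and project back onto the direction. This is a generic sketch on a toy quadratic loss, not the paper's SNN setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_gradient(loss, w, eps=1e-5):
    """Single-sample weight-perturbation gradient estimate.

    Draws v ~ N(0, I), measures the directional derivative of the loss
    along v by finite differences (two forward passes, no backprop),
    and returns (d_v loss) * v, an unbiased estimate of the gradient
    since E[v v^T] = I.
    """
    v = rng.standard_normal(w.shape)
    d_v = (loss(w + eps * v) - loss(w)) / eps
    return d_v * v

# averaging many single-sample estimates recovers the true gradient
loss = lambda w: 0.5 * np.sum(w ** 2)          # true gradient is w itself
w = np.array([1.0, -2.0, 3.0])
g_hat = np.mean([forward_gradient(loss, w) for _ in range(5000)], axis=0)
```

Each estimate needs only forward passes, which is what makes the scheme attractive for neuromorphic hardware where backpropagating through spike dynamics is costly; the price is estimator variance, reduced here by averaging.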
Point cloud-based place recognition aims to estimate a rough location by searching the database with a global descriptor aggregated from local features of the query point cloud. Recent advanced methods exploit the attention mechanism that establishes all pairs of relationships to enhance the local features with long-range contextual information. However, this operation may aggregate redundant and misleading information from time-varying objects and task-irrelevant areas (such as cars and ground points) into the local features, thereby impairing the discriminative power of the features. In this paper, we propose a novel discriminative region-guided transformer, dubbed DRFormer, for the point cloud-based place recognition task by explicitly constructing discriminative pair relationships to avoid aggregating task-irrelevant information. Specifically, we devise a lightweight but effective global aggregation module, named LightVLAD, to efficiently provide cues for locating the discriminative regions. Based on the LightVLAD, we propose a discriminative region-guided attention module to pay more attention to distant discriminative local features. In this module, the approximate centers of discriminative regions are located according to the assignment weights in LightVLAD. The local regions around the centers are embedded to characterize the local contextual and structural information. Next, global interaction is performed between these embedded features, and the global information is distributed to enhance local features via cross-attention. As such, the local features attend to a small subset of discriminative regions without distraction from other irrelevant ones. Extensive experiments on various benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods on the point cloud-based place recognition task.
Numerous domains, including robotics and artificial intelligence, make extensive use of time-varying quadratic programming (TVQP). Because of the TVQP's importance, a novel adaptive neutrosophic logic/fuzzy neural network TVQP solver, called NZNN-TVQP, is introduced in this work. The proposed TVQP solver uses a recently developed neutrosophic logic/fuzzy adaptive zeroing neural network (NZNN) technique as well as a neutrosophic logic/fuzzy adaptive penalty function. It is important to mention that the NZNN is an advancement on the conventional zeroing neural network (ZNN) technique, which has shown great promise in solving time-varying tasks. To address the TVQP task, the performance of four variations of the NZNN-TVQP solver is examined. All variations of the solver perform remarkably well, as demonstrated by two simulation tests and two real-world applications to the portfolio selection problem.
Information propagation characterizes how input correlations evolve across layers in deep neural networks. This framework has been well studied using mean-field theory, which assumes infinitely wide networks. However, these assumptions break down for practical, finite-size networks. In this work, we study information propagation in randomly initialized neural networks with finite width and reveal that the boundary between ordered and chaotic regimes exhibits a fractal structure. This shows the fundamental complexity of neural network dynamics, in a setting that is independent of input data and optimization. To extend this analysis beyond multilayer perceptrons, we leverage recently introduced Fourier-based structured transforms and show that information propagation in convolutional neural networks follows the same behavior. In practice, our investigation highlights the importance of finite network depth with respect to the tradeoff between separation and robustness. We also show that fractal patterns are observed for information propagation in the backward pass, i.e., backpropagation from the last to the first layer of finite-size networks.
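The underlying object of study, how the correlation between two inputs evolves layer by layer, can be probed with a direct finite-width simulation. The sketch below illustrates only the ordered/chaotic distinction for a random tanh MLP (the fractal boundary itself requires a much finer parameter sweep); weight and bias scales are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_trajectory(sigma_w, sigma_b=0.3, depth=30, width=500, c0=0.9):
    """Track the correlation of two inputs through a random tanh network.

    A finite-width Monte Carlo version of the mean-field correlation map:
    in the ordered regime correlations are preserved or grow toward 1,
    while in the chaotic regime nearby inputs decorrelate with depth.
    """
    z = rng.standard_normal(width)
    h1 = z
    h2 = c0 * z + np.sqrt(1 - c0 ** 2) * rng.standard_normal(width)
    corrs = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b   # bias shared by both copies
        h1, h2 = W @ np.tanh(h1) + b, W @ np.tanh(h2) + b
        corrs.append(np.corrcoef(h1, h2)[0, 1])
    return corrs

ordered = correlation_trajectory(sigma_w=0.5)   # small weight variance
chaotic = correlation_trajectory(sigma_w=2.5)   # large weight variance
```

Sweeping `sigma_w` (and `sigma_b`) on a fine grid and recording which side of this transition each point falls on is the kind of experiment in which the order-chaos boundary of finite-width networks reveals its fractal structure.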
Recent advancements in radiance fields, particularly with the emergence of Gaussian splatting, have highlighted their significant potential for 3D scene reconstruction and novel view synthesis. However, existing methods encounter substantial challenges when addressing dynamic environments, especially in complex urban settings with both rigid and non-rigid participants. To tackle these challenges, we propose a geometry-aware framework that integrates Gaussian primitives with a template mesh to effectively represent dynamic objects. This integration facilitates the efficient and accurate reconstruction of urban scenes, ensuring that the geometric integrity of dynamic elements is maintained. We first decompose the scene into a dynamic scene graph and fit the template vertices to observations to construct topologically consistent 3D models. Then, we build Gaussian radiance fields for dynamic nodes based on the template meshes, optimizing the vertex offset of dynamic participants to align with their geometric surfaces. We further project the appearance attributes into the 2D texture space based on topological relationships preserved in the Gaussians, enabling finer reconstruction of small-scale details and smoother appearance generalization on unseen surfaces. To validate the effectiveness of our proposed method, we conduct extensive evaluations on the Waymo Open Dataset (Ettinger et al., 2021) and the KITTI Dataset (Geiger et al., 2013). Our results demonstrate superior performance compared to mainstream dynamic reconstruction methods. We believe our work establishes a foundation for more realistic and geometrically complete urban scene reconstruction.
Graph Attention Network (GAT), which adaptively distinguishes the importance of neighboring nodes during information aggregation, is a powerful graph representation learning method. However, existing GAT methods that rely solely on node embedding-level attention ignore the rich semantic correlation information embedded in the graph topology and struggle to effectively distinguish the importance of different nodes. They also face semantic deviation between the graph and label predictions, as well as insufficient interaction between unlabeled and labeled nodes, due to network depth and the scarcity of supervised information. In this paper, we propose a hybrid graph attention learning mechanism that integrates both node embedding-level and structure embedding-level attention, enabling more comprehensive and accurate modeling of node neighboring relationships. Additionally, we introduce a dynamic graph evolution mechanism that incorporates elaborate topology pruning and node mixing operations guided by pseudo-labels with gradually increasing confidence. This endows the model with the ability to adaptively correct the graph structure and significantly enhances its robustness to noisy graphs. It also promotes semantic alignment between the graph and label predictions and improves the accessibility of labeled nodes. The adaptive graph with feature and structure mixing in turn promotes the hybrid attention learning, resulting in a closed loop between representation learning and graph optimization. Extensive experimental results on real-world graph datasets clearly demonstrate the superiority of the proposed method in learning accurate attention and discriminative representations, achieving significant performance improvements over several previous baselines.
Recently, the Multi-Plane Image (MPI) has shown great potential in novel view synthesis, since it provides a generalizable formulation that enables strong reasoning even in unknown scenes. However, existing methods not only struggle with occlusion and complex scenes but also often require a large number of depth planes and high computational cost. In this paper, we propose a novel consistency-guided Multi-Plane Image construction method specifically designed for novel view synthesis. Unlike previous MPI-based methods, we construct the MPI serially, layer by layer, and accumulate consistency information during the process. Specifically, we first propose a cross-view consistency mask that incorporates foreground occlusion information into the layered construction to perceive occlusion. Second, we propose a cross-layer consistency mask and a novel depth guidance strategy that incorporate appropriate scene context into the layered construction to better capture the geometric structure of the scene. We conduct extensive experiments on the Spaces and Real Forward-Facing datasets. The results demonstrate that our method excels in novel view synthesis and multi-frame denoising, achieving state-of-the-art performance with relatively low computational cost; quantitatively, it improves PSNR over state-of-the-art methods by approximately 1.1% in challenging sparse-view settings and 2.1% in denoising tasks.
Object detection methods that fuse visible and infrared modalities significantly enhance detection accuracy and robustness by leveraging complementary information from both modalities. However, existing methods suffer from the following shortcomings: 1) they are limited to specific scenarios (e.g., conventional or remote sensing small targets), making it difficult to meet the detection needs of multi-scale targets; 2) CNN-based fusion methods, constrained by static convolutional kernels, struggle to handle fusion tasks involving significant modality differences; 3) Transformer-based fusion methods are limited by quadratic computational complexity, hindering practical application. To enhance the model's adaptability to both conventional and remote sensing scenarios, we design Global and Local Mamba modules, which extract global contextual information from a global perspective and construct local receptive fields from a local perspective, respectively. Secondly, to reduce the computational overhead of long-sequence modeling across modalities during fusion, we introduce a Modality Decouple module that decouples features into modality-agnostic and modality-specific features. Based on the degree of modality difference among the decoupled features, differentiated fusion strategies are applied, effectively reducing the computational resources consumed by redundant modality-agnostic features during fusion. For modality-agnostic features with small modality differences, we employ a lightweight Spatial Attention module for simple modeling and fusion. For modality-specific features with significant modality differences, we utilize the efficient Spatial Mamba and Channel Mamba modules to perform complex modeling and fusion along the spatial and channel dimensions, respectively.
Simultaneously, to address the potential memory challenges of Mamba in cross-modal long-sequence modeling, we propose a cross-modal interaction paradigm for Mamba within the Spatial and Channel Mamba modules. This paradigm achieves efficient cross-modal fusion by interleaving cross-modal features at corresponding positions. Extensive experiments on three conventional scene object datasets (FLIR, LLVIP, M3FD) and two remote sensing small object datasets (VEDAI, DroneVehicle) demonstrate that our MDM not only achieves SOTA performance efficiently but also addresses the challenges of small object detection, exhibiting strong robustness. To our knowledge, this is the first study to explore an interaction paradigm of Mamba for cross-modal fusion. Compared with the simple interaction schemes in existing work, we further unlock the potential of Mamba in cross-modal fusion tasks. The code is available at: https://github.com/SEUZYC/MDM.
Deep neural networks developed for natural language processing tasks have been shown to be vulnerable to attacks, and a variety of text adversarial methods have been proposed to improve attack success rates and efficiency in such applications. These methods achieve good attack success rates and efficiency and can stably generate successful adversarial samples for a "static" victim model (i.e., a fixed model). In reality, as training data are continually generated, models are also continuously updated. Whether current adversarial attack methods can maintain their performance against these "dynamic" victim models is an open question. In this paper, we design a new task and investigate the performance of static text adversarial attack methods in continuously updating model scenarios. To standardize this task, we propose a comprehensive evaluation framework consisting of two novel experimental methods, three new metrics, and a lightweight dynamic baseline. This framework explicitly decouples the assessment of dynamic attack performance into direct effectiveness and iterative persistence. Extensive experiments with two victim models on three datasets show that a continuously updating model leads to significant degradation in the performance of static adversarial attack methods. Moreover, the results demonstrate that our framework precisely characterizes the temporal variation of attack performance, confirming its effectiveness and necessity.
Stereo matching is of vital importance for the perception systems of autonomous driving. Although significant achievements have been made, matching blur still easily occurs in large-disparity and ill-posed regions. This paper introduces a novel stereo matching algorithm, DV-Stereo, which constructs differential features from the disparity map through a difference optimization module and Difference Guided Attention to refine the blurry regions in the correlation disparity features. Additionally, to make the model applicable to large-disparity scenarios, a compressed cost volume is proposed, which improves the model's performance in large-disparity regions with only a small increase in computational cost. Furthermore, to effectively fuse the correlation features in the compressed cost volume, an adaptive correlation feature fusion module is proposed. This module adaptively fuses geometric feature information across different disparity ranges and feeds it into the ConvGRU iterative optimization module. DV-Stereo performs well in benchmark tests on the Scene Flow, KITTI 2012 & 2015, and ETH3D datasets. On the ETH3D dataset in particular, the 1-pixel error is only 0.90, approximately 43% lower than that of state-of-the-art methods. Meanwhile, DV-Stereo demonstrates very strong generalization on the Middlebury 2014 dataset.
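The compressed cost volume builds on the standard correlation volume used throughout this line of work, which can be sketched as follows (a generic correlation volume on synthetic features, not DV-Stereo's compressed variant):

```python
import numpy as np

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume for rectified stereo features.

    feat_l, feat_r: (C, H, W) feature maps. For each candidate disparity d,
    stores the channel-averaged dot product between the left features and
    the right features shifted d pixels; high correlation marks the match.
    """
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp, H, W))
    cost[0] = (feat_l * feat_r).sum(axis=0) / C
    for d in range(1, max_disp):
        cost[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).sum(axis=0) / C
    return cost

# synthetic pair: the left view is the right view shifted by a disparity of 3
rng = np.random.default_rng(0)
feat_r = rng.standard_normal((32, 4, 24))
feat_l = np.zeros_like(feat_r)
feat_l[:, :, 3:] = feat_r[:, :, :-3]
cost = correlation_cost_volume(feat_l, feat_r, max_disp=8)
```

Taking the argmax over the disparity axis recovers the true shift at valid pixels; compressing this volume over disparity ranges is what keeps large-disparity search tractable.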
Diffusion trajectory distillation accelerates sampling by training a student model to approximate the multi-step denoising trajectories of a pretrained teacher model using far fewer steps. Despite strong empirical results, the trade-off between distillation strategy and generative quality remains poorly understood. We provide a theoretical characterization by reinterpreting trajectory distillation as an operator merging problem, differentiating our analysis between two distinct regimes. In the linear Gaussian regime, where approximation error is zero, we isolate optimization error, specifically signal shrinkage driven by finite training time, as the primary bottleneck. This characterization allows us to derive the theoretically optimal merging strategy, which exhibits a variance-driven phase transition and is computable via a Pareto dynamic programming algorithm. In the nonlinear Gaussian mixture regime, we prove that distilling composite steps incurs unavoidable approximation error due to the exponential growth of mixture components, and we quantify how these errors amplify across merges. Together, these results clarify the distinct theoretical mechanisms governing each regime and provide principled guidance for method selection.
In curriculum reinforcement learning (CRL), an agent incrementally accumulates knowledge over a sequence of tasks (i.e., a curriculum), and the learning process aims to use the accumulated knowledge to finally solve a challenging target task. While early CRL works focus on sequencing candidate tasks, recent research explores automatic curriculum generation. Within the rich CRL literature, the interpolation-based paradigm forms a main line of work: it automatically generates intermediate tasks by interpolating between the initial and target task distributions in a task space equipped with a meaningful distance metric (i.e., one that measures task similarity). However, in challenging navigation tasks, the non-Euclidean context (task) space invalidates this assumption. To achieve automatic curriculum generation in such complex tasks, we propose a novel approach based on measurable task representation learning. To better measure similarity, we transform the task space into a latent space. Through a variational autoencoder structure that encodes the reward and the state transitions, we obtain a latent task representation with a task-similarity measurement property: two close task embeddings correspond to two tasks that are similar in terms of rewards and state transitions. Based on the learned task representation, we further develop an automatic curriculum generation scheme that effectively generates new tasks progressively more similar to the target task. We evaluate our method on a variety of challenging navigation tasks, and the experimental results indicate that the proposed approach surpasses state-of-the-art CRL approaches based on interpolation and generative adversarial networks.