We investigate two population-level quantities (corresponding to complete data) related to uncensored stage waiting times in a progressive multi-stage model, conditional on a prior stage visit. We show how to estimate these quantities consistently using right-censored data. The first quantity is the stage waiting time distribution (survival function), representing the proportion of individuals who remain in stage j within time t after entering stage j. The second quantity is the cumulative incidence function, representing the proportion of individuals who transition from stage j to stage j' within time t after entering stage j. To estimate these quantities, we present two nonparametric approaches. The first uses an inverse probability of censoring weighting (IPCW) method, which reweights the counting processes and the number of individuals at risk (the at-risk set) to address dependent right censoring. The second method utilizes the notion of fractional observations (FRE), which modifies the at-risk set by incorporating the probabilities that individuals (who might have been censored in a prior stage) would eventually enter the stage of interest in the uncensored, full-data experiment.
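To make the IPCW idea concrete, here is a minimal sketch under simplifying assumptions the paper does not make (a single waiting time, censoring independent of it; all names are illustrative): the censoring survival function $G$ is estimated by Kaplan-Meier with censoring treated as the event, and each uncensored observation is reweighted by $1/G(T-)$.

```python
import numpy as np

def censoring_km(times, delta):
    """Kaplan-Meier estimate of the censoring survival G(t),
    treating censoring (delta == 0) as the event of interest."""
    order = np.argsort(times)
    t = times[order]
    cens = (delta[order] == 0).astype(float)
    at_risk = len(t) - np.arange(len(t))          # n, n-1, ..., 1
    return t, np.cumprod(1.0 - cens / at_risk)

def ipcw_cdf(times, delta, tau):
    """IPCW estimate of P(waiting time <= tau): each observed event
    gets weight 1/G(T-), the inverse probability of remaining
    uncensored just before its event time."""
    t_sorted, G = censoring_km(times, delta)
    idx = np.searchsorted(t_sorted, times, side="left") - 1
    G_minus = np.where(idx >= 0, G[np.clip(idx, 0, None)], 1.0)
    w = delta / np.maximum(G_minus, 1e-12)
    return np.mean(w * (times <= tau))
```

The survival function is one minus this quantity. The paper's estimators additionally handle censoring that depends on the stage history and conditioning on a prior stage visit, neither of which this sketch attempts.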
For many conditions, it is of clinical importance to know not just the ability of a test to distinguish between those with and without the disease, but also the sensitivity to detect disease at different stages: in particular, the test's ability to detect disease at a stage most amenable to treatment. In a systematic review of test accuracy, pooled stage-specific estimates can be produced using subgroup analysis or meta-regression. However, this requires stage-specific data from each study, which is often not reported. Studies may, however, report test sensitivity for merged stage categories (e.g. stages I-II) or merged across all stages, together with information on the proportion of patients with disease at each stage. We demonstrate how to incorporate studies reporting merged stage data alongside studies reporting stage-specific data, to allow the inclusion of more studies in the meta-analysis. We consider both meta-analysis of tests with binary results, and meta-analysis of tests with continuous results, where the sensitivity to detect disease at each stage is estimated across the whole range of observed thresholds. The methods are demonstrated using a series of simulated datasets.
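The key identity that such approaches can exploit (stated here as general background rather than the authors' exact model) is that true positives simply add across merged stages, so a merged sensitivity is a weighted average of the stage-specific ones:

$$ \mathrm{Se}_{\mathrm{I\text{-}II}} \;=\; \frac{n_{\mathrm{I}}\,\mathrm{Se}_{\mathrm{I}} + n_{\mathrm{II}}\,\mathrm{Se}_{\mathrm{II}}}{n_{\mathrm{I}} + n_{\mathrm{II}}}, $$

where $n_{\mathrm{I}}$ and $n_{\mathrm{II}}$ are the numbers of diseased patients at each stage; together with the reported stage proportions, merged results therefore still constrain the stage-specific parameters in the meta-analysis.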
Partially observable Markov decision processes (POMDPs) with stage duration provide a framework for approximating continuous-time behavior by scaling transition probabilities with a stage duration parameter $h \in (0,1]$. While previous literature has primarily focused on the limit of the discounted value as the stage duration $h$ vanishes, this paper investigates the global behavior of the asymptotic value, $V(h)$, across varying stage durations. Our main result demonstrates that any strategy in a POMDP with stage duration $h$ can be mimicked in the base POMDP ($h=1$). Specifically, we provide an explicit construction showing that for any strategy in the POMDP with stage duration $h$, there exists a strategy in the base POMDP that secures the same asymptotic payoff. As a consequence of this theorem, we establish that the value function $V(h)$ is nondecreasing with respect to $h$, and that the continuous-time limit $\lim_{h \to 0} V(h)$ exists.
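For concreteness, one common convention for stage-duration scaling in this literature (an assumption here; the abstract does not fix the parameterization) mixes the base kernel with the identity and shrinks the per-stage discount:

$$ Q_h \;=\; (1-h)\,\mathrm{Id} \;+\; h\,Q, \qquad \text{per-stage discount } 1 - \lambda h, $$

so that running $\lceil 1/h \rceil$ stages of duration $h$ approximates one unit of continuous time, and $h = 1$ recovers the base POMDP.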
In this paper, we study a two-stage stochastic version of the assignment game, which is a fundamental cooperative game. Given an initial setting, the set of players may change in the second stage according to some probability distribution, and the goal is to find core solutions that are minimally modified. When the probability distribution is given explicitly, we observe that the problem is polynomial-time solvable, as it can be modeled as an LP. More interestingly, we prove that the underlying polyhedron is integral, and exploit this in two ways. First, integrality of the polyhedron allows us to show that the problem can be well approximated when the distribution is unknown, which is a hard setting. Second, we can establish an intimate connection to the well-studied multistage vertex cover problem. Here, it is known that the problem is NP-hard even when there are only 2 stages and the graph in each stage is bipartite. As a byproduct of our result, we can prove that the problem is polynomial-time solvable if the bipartition is the same in each stage.
The multi-stage phenomenon in the training loss curves of neural networks has been widely observed, reflecting the non-linearity and complexity inherent in the training process. In this work, we investigate the training dynamics of neural networks (NNs), with particular emphasis on the small initialization regime, identifying three distinct stages observed in the loss curve during training: the initial plateau stage, the initial descent stage, and the secondary plateau stage. Through rigorous analysis, we reveal the underlying challenges contributing to slow training during the plateau stages. While the proof and estimate for the emergence of the initial plateau were established in our previous work, the behaviors of the initial descent and secondary plateau stages had not been explored before. Here, we provide a more detailed proof for the initial plateau, followed by a comprehensive analysis of the initial descent stage dynamics. Furthermore, we examine the factors facilitating the network's ability to overcome the prolonged secondary plateau, supported by both experimental evidence and heuristic reasoning.
The Probability Ranking Principle (PRP) has been considered the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through every stage of a contemporary IR system. Such systems contain multiple stages (e.g., retrieval, pre-ranking, ranking, and re-ranking stages, as examined in this paper). The \emph{selection bias} inherent in the model of each stage significantly influences the results that are ultimately presented to users. To address this issue, we propose an improved ranking principle for multi-stage systems, namely the Generalized Probability Ranking Principle (GPRP), to emphasize both the selection bias in each stage of the system pipeline and the underlying interest of users. We realize GPRP via a unified algorithmic framework named Full Stage Learning to Rank. Our core idea is to first estimate the selection bias in the subsequent stages and then learn a ranking model that best complies with the downstream selection bias.
Hip fractures are a major cause of disability, mortality, and healthcare burden in older adults, underscoring the need for early risk assessment. However, commonly used tools such as the DXA T-score and FRAX often lack sensitivity and miss individuals at high risk, particularly those without prior fractures or with osteopenia. To address this limitation, we propose a sequential two-stage model that integrates clinical and imaging information to improve prediction accuracy. Using data from the Osteoporotic Fractures in Men Study (MrOS), the Study of Osteoporotic Fractures (SOF), and the UK Biobank, Stage 1 (Screening) employs clinical, demographic, and functional variables to estimate baseline risk, while Stage 2 (Imaging) incorporates DXA-derived features for refinement. The model was rigorously validated through internal and external testing, showing consistent performance and adaptability across cohorts. Compared to T-score and FRAX, the two-stage framework achieved higher sensitivity and reduced missed cases, offering a cost-effective and personalized approach for early hip fracture risk assessment. Keywords: Hip Fracture, Two-Stage Model, Risk Prediction, Sensitivity, DXA, FRAX
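As a concrete (and deliberately simplified) illustration of the sequential design, here is a minimal sketch using two logistic models, where the stage-2 model refines risk only for subjects flagged at screening; the feature split, threshold, and model class are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_two_stage(X_clin, X_dxa, y, screen_threshold=0.05):
    """Stage 1: clinical/demographic/functional variables only.
    Stage 2: DXA-derived features appended, trained on flagged subjects."""
    stage1 = LogisticRegression(max_iter=1000).fit(X_clin, y)
    flagged = stage1.predict_proba(X_clin)[:, 1] >= screen_threshold
    X2 = np.hstack([X_clin[flagged], X_dxa[flagged]])
    stage2 = LogisticRegression(max_iter=1000).fit(X2, y[flagged])
    return stage1, stage2, screen_threshold

def predict_two_stage(stage1, stage2, thr, X_clin, X_dxa):
    risk = stage1.predict_proba(X_clin)[:, 1]
    flagged = risk >= thr
    X2 = np.hstack([X_clin[flagged], X_dxa[flagged]])
    risk[flagged] = stage2.predict_proba(X2)[:, 1]  # refined risk
    return risk
```

The sequential structure is what makes the approach cost-effective: imaging features are only consulted for the subset of patients whose screening-stage risk warrants them.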
We develop a unified theoretical framework for sparse knowledge distillation based on probability-domain softening operators. While the equivalence $p^{1/T} \propto \mathrm{softmax}(z/T)$ is well known, our contribution is an operator-level analytical framework built on this foundation rather than the equivalence itself. The framework comprises four core components: (i) operator-agnostic bias--variance decompositions that characterize when sparse students outperform dense teachers, (ii) a homotopy path formalization of multi-stage pruning in function space explaining why iterative compression succeeds where one-shot pruning fails, (iii) convergence guarantees establishing $O(1/n)$ rates for $n$-stage distillation with explicit parameter dependence, and (iv) equivalence class characterizations identifying distinct probability-domain operators that yield identical student models under capacity constraints. We introduce an axiomatic definition of probability-domain softening operators based on ranking preservation, continuity, entropy monotonicity, identity, and boundary behavior, and show that multiple non-equivalent operator families satisfy these axioms.
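The cited equivalence can be verified in two lines. Writing $p = \mathrm{softmax}(z)$, so that $p_i = e^{z_i}/\sum_j e^{z_j}$,

$$ \frac{p_i^{1/T}}{\sum_k p_k^{1/T}} \;=\; \frac{e^{z_i/T}\,\big(\sum_j e^{z_j}\big)^{-1/T}}{\sum_k e^{z_k/T}\,\big(\sum_j e^{z_j}\big)^{-1/T}} \;=\; \mathrm{softmax}(z/T)_i, $$

i.e., temperature scaling in the logit domain coincides with a power transform followed by renormalization in the probability domain, which is the sense in which the abstract's softening operators act on probabilities directly.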
For the scalability of industrial online advertising systems, a two-stage auction architecture is widely used to enable efficient ad allocation over a large corpus within a limited response time. The currently deployed two-stage ad auction usually retrieves an ad subset by a coarse ad quality metric in a pre-auction stage, and then determines the auction outcome by a refined metric in the subsequent stage. However, this simple and greedy solution suffers from serious performance degradation, as it treats the decision in each stage separately, leading to an improper ad selection metric for the pre-auction stage. In this work, we explicitly investigate the relation between the coarse and refined ad quality metrics, and design a two-stage ad auction that takes the decision interaction between the two stages into account. We decouple the design of the two-stage auction by solving a stochastic subset selection problem in the pre-auction stage and conducting a generalized second price (GSP) auction in the second stage. We demonstrate that this decoupling still preserves the incentive compatibility of the auction mechanism. The proposed formulation of the pre-auction stage is NP-hard.
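For reference, a minimal sketch of a second-stage allocation and payment rule (a generic quality-weighted GSP; the abstract does not specify the exact scoring, reserve price, or tie-breaking):

```python
def gsp_auction(bids, quality, k):
    """Rank ads by bid * quality; winner i pays the minimum bid needed
    to keep its slot, i.e. the next-ranked score divided by its own
    quality. Reserve prices are omitted in this sketch."""
    order = sorted(range(len(bids)),
                   key=lambda i: bids[i] * quality[i], reverse=True)
    winners, prices = [], []
    for pos in range(min(k, len(order))):
        i = order[pos]
        if pos + 1 < len(order):
            j = order[pos + 1]
            prices.append(bids[j] * quality[j] / quality[i])
        else:
            prices.append(0.0)  # no lower-ranked competitor
        winners.append(i)
    return winners, prices

# e.g. gsp_auction([3.0, 2.0, 1.5], [1.0, 1.2, 0.8], k=2)
```

The paper's contribution lies upstream of this step, in choosing the pre-auction subset so that it interacts well with this second-stage mechanism.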
Modern LLM serving now spans multi-stage pipelines including RAG retrieval and KV cache reuse, each with distinct compute, memory, and latency demands. Inference engines expose a large configuration space with no systematic navigation methodology, and exhaustively benchmarking configurations can exceed 40K in cloud costs. Simultaneously, the hardware landscape is rapidly diversifying across AMD GPUs, TPUs, and custom ASICs, while cross-vendor prefill-decode (PD) disaggregated configurations lack unified software stacks for end-to-end evaluation today. To address this gap, we present MIST, a Heterogeneous Multi-stage LLM inference Execution Simulator. MIST models diverse request stages, including RAG, KV retrieval, reasoning, prefill, and decode, across complex hardware hierarchies. Unlike prior frameworks, MIST supports heterogeneous clients executing multiple models concurrently, while incorporating advanced batching strategies and multi-level memory hierarchies. By integrating real hardware traces with analytical modeling, MIST captures critical trade-offs such as memory bandwidth contention, inter-cluster communication latency, and batching efficiency in hybrid CPU-accelerator deployments.
Reinforcement learning has recently been explored to improve text-to-image generation, yet applying existing GRPO algorithms to autoregressive (AR) image models remains challenging. The instability of the training process easily disrupts the pretrained model's capability during long runs, resulting in marginal gains, degraded image quality, and poor generalization. In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy entropy dynamics. To address these, we introduce STAGE, a stable and generalizable framework that leverages two targeted solutions: 1) advantage/KL reweighting, a similarity-aware reweighting to alleviate conflicting updates; and 2) an entropy reward, computed with respect to the reference model, to stabilize learning. By alleviating conflicts between tokens and stabilizing training with the entropy reward, we reduce disruption of the pretrained distribution and mitigate reward hacking, which in turn improves generalization and transfer to other benchmarks. Experiments across multiple benchmarks show that STAGE consistently improves visual quality.
Time irreversibility (TIR) refers to the manifestation of nonequilibrium brain activity influenced by various physiological conditions; however, the influence of sleep on electroencephalogram (EEG) TIR has not been sufficiently investigated. In this paper, a comprehensive study of permutation TIR (pTIR) of EEG data under different sleep stages is conducted. Two basic ordinal patterns (i.e., the original and amplitude permutations) are distinguished to simplify sleep EEGs, and the influences of equal values and forbidden permutations on pTIR are then elucidated. To detect the pTIR of brain electric signals, 5 groups of EEGs in the awake stage, sleep stages I, II, and III, and the rapid eye movement (REM) stage were collected from the public Polysomnographic Database in PhysioNet. Test results suggested that the pTIR of sleep EEGs significantly decreases as the sleep stage increases (p<0.001), with the awake and REM EEGs demonstrating greater differences than the others. Comparative analysis and numerical simulations support the importance of equal values. The distribution of equal states, a simple quantification of amplitude fluctuations, significantly increases with the sleep stage (p<0.001).
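As an illustration of the quantities involved, here is a minimal sketch of a permutation-based irreversibility index (an illustrative instantiation, not the paper's exact pTIR definition): compare the ordinal-pattern distributions of the forward and time-reversed series with a symmetric Kullback-Leibler divergence. Note that np.argsort breaks ties by position, so the equal-value effects the paper emphasizes are not handled here.

```python
import numpy as np
from itertools import permutations

def pattern_distribution(x, m=3):
    """Distribution of ordinal (permutation) patterns of order m;
    patterns with zero count correspond to forbidden permutations."""
    pats = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - m + 1):
        pats[tuple(np.argsort(x[i:i + m]))] += 1
    total = sum(pats.values())
    return {p: c / total for p, c in pats.items()}

def p_tir(x, m=3, eps=1e-12):
    """Symmetric KL divergence between forward and time-reversed
    pattern distributions; larger values mean more irreversibility."""
    x = np.asarray(x)
    pf = pattern_distribution(x, m)
    pr = pattern_distribution(x[::-1], m)
    return 0.5 * sum(
        pf[p] * np.log((pf[p] + eps) / (pr[p] + eps))
        + pr[p] * np.log((pr[p] + eps) / (pf[p] + eps))
        for p in pf
    )
```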
Industrial ranking systems, such as advertising systems, rank items by aggregating multiple objectives into one final objective to satisfy user demand and commercial intent. A cascade architecture, composed of retrieval, pre-ranking, and ranking stages, is usually adopted to reduce the computational cost. Each stage may employ various models for different objectives and calculate the final objective by aggregating these models' outputs. The multi-stage ranking strategy causes a new problem: the ranked lists of the ranking stage and previous stages may be inconsistent. For example, items that should be ranked at the top of the ranking stage may be ranked at the bottom of previous stages. In this paper, we focus on the \textbf{ranking consistency} between the pre-ranking and ranking stages. Specifically, we formally define the problem of ranking consistency and propose the Ranking Consistency Score (RCS) metric for evaluation. We demonstrate that ranking consistency has a direct impact on online performance. Compared with the traditional evaluation manner that mainly focuses on the individual ranking quality of every objective, RCS considers the ranking consistency of the fused final objective.
The two-stage process of propensity score analysis (PSA) includes a design stage where propensity scores are estimated and implemented to approximate a randomized experiment and an analysis stage where treatment effects are estimated conditional upon the design. This paper considers how uncertainty associated with the design stage impacts estimation of causal effects in the analysis stage. Such design uncertainty can derive from the fact that the propensity score itself is an estimated quantity, but also from other features of the design stage tied to choice of propensity score implementation. This paper offers a procedure for obtaining the posterior distribution of causal effects after marginalizing over a distribution of design-stage outputs, lending a degree of formality to Bayesian methods for PSA (BPSA) that have gained attention in recent literature. Formulation of a probability distribution for the design-stage output depends on how the propensity score is implemented in the design stage, and propagation of uncertainty into causal estimates depends on how the treatment effect is estimated in the analysis stage. We explore these differences within a sample of commonly-used propensity score implementations.
We present a two-stage algorithm for the parallel reduction of a pencil to Hessenberg-triangular form. Traditionally, two-stage Hessenberg-triangular reduction algorithms achieve high performance in the first stage, but struggle to achieve high performance in the second stage. Our algorithm extends techniques described by Karlsson et al. to also achieve high performance in the second stage. Experiments in a shared memory environment demonstrate that the algorithm can outperform state-of-the-art implementations.
Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion and manipulation, yet dynamic real-world interactions remain challenging. As a step toward fast-moving object interactions, we present a reinforcement-learning training pipeline that yields a unified whole-body controller for humanoid badminton, coordinating footwork and striking without motion priors or expert demonstrations. Training follows a three-stage curriculum (footwork acquisition, precision-guided swing generation, and task-focused refinement) so legs and arms jointly serve the hitting objective. For deployment, we use an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking, and also develop a prediction-free variant that removes the EKF and explicit prediction. We validate the framework with five sets of experiments in simulation and on hardware. In simulation, two robots sustain a rally of 21 consecutive hits. In real-world tests with both machine-fed shuttles and human-robot rallies, the robot achieves outgoing shuttle speeds up to 19.1~m/s with a mean return landing distance of 4~m.
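To illustrate the estimation component, here is a minimal EKF sketch for a shuttlecock-like point mass under gravity and quadratic drag (the dynamics parameters, noise covariances, and time step are illustrative assumptions; the paper's filter design is not detailed in the abstract):

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity
KD = 0.2                          # assumed drag coefficient per unit mass
DT = 0.01                         # assumed time step (s)

def f(x):
    """Euler-discretized dynamics for state x = [position, velocity]."""
    p, v = x[:3], x[3:]
    a = G - KD * np.linalg.norm(v) * v    # gravity + quadratic drag
    return np.concatenate([p + DT * v, v + DT * a])

def F_jac(x):
    """Jacobian of f at x; the drag term is linearized analytically."""
    v = x[3:]
    s = np.linalg.norm(v) + 1e-9
    dA_dv = -KD * (s * np.eye(3) + np.outer(v, v) / s)
    F = np.eye(6)
    F[:3, 3:] = DT * np.eye(3)
    F[3:, 3:] += DT * dA_dv
    return F

H = np.hstack([np.eye(3), np.zeros((3, 3))])   # observe position only
Q = 1e-3 * np.eye(6)                           # assumed process noise
R = 1e-2 * np.eye(3)                           # assumed measurement noise

def ekf_step(x, P, z):
    """One predict/update cycle given position measurement z."""
    x_pred, F = f(x), F_jac(x)
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(6) - K @ H) @ P_pred
    return x_new, P_new
```

Iterating `f` forward from the filtered state gives the trajectory prediction used to choose a strike point; the prediction-free variant described above dispenses with this machinery entirely.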
Images captured under complicated rain conditions often suffer from noticeable degradation of visibility. Rain models generally introduce diverse visibility degradations, including rain streaks, rain drops, and rain mist. Numerous existing single-image deraining methods focus on only one type of rain model and therefore lack strong generalization ability. In this paper, we propose a novel end-to-end Neuron Attention Stage-by-Stage Net (NASNet), which can solve all types of rain model tasks efficiently. First, we focus on inter-neuron relationships and propose a lightweight Neuron Attention (NA) architectural mechanism. It adaptively recalibrates neuron-wise feature responses by modelling interdependencies and mutual influence between neurons. Our NA architecture consists of a depthwise convolution and a pointwise convolution; it incurs only a slight computational cost and outperforms the SE block in our comparative experiments. Second, we propose a stage-by-stage unified pattern network architecture, in which the stage-by-stage strategy guides each later stage by incorporating useful information from the previous stage. We concatenate and fuse stage-level information dynamically.
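A minimal sketch of such a depthwise-plus-pointwise gating block (hypothetical layer sizes and sigmoid gating; the paper's exact NA design may differ):

```python
import torch
import torch.nn as nn

class NeuronAttention(nn.Module):
    """Lightweight attention sketch: a depthwise conv models local
    per-channel structure, a pointwise conv mixes channels, and a
    sigmoid gate recalibrates neuron-wise feature responses."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        attn = self.gate(self.pointwise(self.depthwise(x)))
        return x * attn   # neuron-wise recalibration of the input
```

Compared with an SE block, which pools spatially before gating, a block of this shape keeps per-location information, which is plausibly why the abstract reports a favorable cost/performance trade-off.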
In many classification systems, sensing modalities have different acquisition costs. It is often {\it unnecessary} to use every modality to classify a majority of examples. We study a multi-stage system in a prediction-time cost reduction setting, where the full data is available for training, but for a test example, measurements in a new modality can be acquired at each stage for an additional cost. We seek decision rules to reduce the average measurement acquisition cost. We formulate an empirical risk minimization (ERM) problem for a multi-stage reject classifier, wherein the stage $k$ classifier either classifies a sample using only the measurements acquired so far or rejects it to the next stage, where more attributes can be acquired for a cost. To solve the ERM problem, we show that the optimal reject classifier at each stage is a combination of two binary classifiers, one biased towards positive examples and the other biased towards negative examples. We use this parameterization to construct a stage-by-stage global surrogate risk, develop an iterative algorithm in the boosting framework, and present convergence and generalization results. We test our work on synthetic and medical datasets.
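The two-classifier parameterization admits a very short sketch (illustrative; the biased classifiers themselves would be trained with the paper's boosting procedure, which is omitted here):

```python
def stage_decision(f_pos, f_neg, x):
    """Stage-k rule from the parameterization above: two biased binary
    classifiers vote; agreement yields a label, disagreement rejects
    the example to the next (more expensive) stage."""
    yp, yn = f_pos(x), f_neg(x)   # each returns +1 or -1
    if yp == yn:
        return yp        # confident: classify with current measurements
    return "reject"      # acquire the next modality and re-decide
```

The reject region is exactly the disagreement region of the two classifiers, so biasing one towards each class controls how many examples pay for additional measurements.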
Extremely large-scale array (XL-array) has emerged as a promising technology to improve the spectrum efficiency and spatial resolution of future wireless systems. However, the huge number of antennas renders the users more likely to locate in the near-field (instead of the far-field) region of the XL-array with spherical wavefront propagation. This inevitably incurs prohibitively high beam training overhead, since it requires a two-dimensional (2D) beam search over both the angular and distance domains. To address this issue, we propose in this paper an efficient two-stage hierarchical beam training method for near-field communications. Specifically, in the first stage, we employ the central sub-array of the XL-array to search for a coarse user direction in the angular domain with a conventional far-field hierarchical codebook. Then, in the second stage, given the coarse user direction, we progressively search for the fine-grained user direction and distance in the polar domain with a dedicatedly designed codebook. Numerical results show that our proposed two-stage hierarchical beam training method can achieve over 99% training overhead reduction as compared to the 2D exhaustive search.
This paper describes the five development stages of the rope worm, which may be a human parasite. Rope worms have been discovered as a result of cleansing enemas. Thousands of people all over the world have passed rope worms. Adult stages live in the human gastro-intestinal tract and are anaerobic. They move inside the body by releasing gas bubbles, utilizing jet propulsion. These worms look like a rope and can be over a meter long. The development stages were identified based on their morphology. The fifth stage looks like a tough string of mucus about a meter long. The fourth stage looks similar, but the rope worm is shorter and has a softer, slimier body. The third stage looks like a branched jellyfish. The second stage is viscous snot, or mucus with visible gas bubbles that act as suction cups. The first stage is slimier mucus with fewer bubbles, which can reside almost anywhere in the body. Rope worms have a cellular structure, based on optical microscopy, DAPI staining, and DNA analysis; however, the data collected are not sufficient to identify the species. Removal methods are also mentioned in the paper.