In many statistical applications, particularly in clinical studies, hypotheses may carry different levels of importance, motivating the use of weighted multiple testing procedures (wMTPs) to control the familywise error rate (FWER). Among these approaches, two weighted Holm procedures are commonly used: the weighted Holm procedure (WHP), which is based on ordered weighted $p$-values, and the weighted alternative Holm procedure (WAP), which relies on ordered raw $p$-values. This paper provides a systematic comparison of these two procedures, along with practical recommendations for their use. We first examine their corresponding closed testing procedures (CTPs) and show that WHP is uniformly more powerful than WAP. We further investigate their structural properties, demonstrating that WAP, while consonant, lacks monotonicity. To facilitate communication with non-statisticians, we introduce graphical representations of both procedures using a common initial graph and distinct updating strategies. In addition, we derive adjusted $p$-values and adjusted weighted $p$-values for both methods. Finally, we establish an optimality result: WHP cannot be improved by enlarging any of its criti
We introduce the class of bilateral parking procedures on the integer line. While cars try to park in the nearest available spot to their right in the classical case, we consider more general parking rules that allow cars to use the nearest available spot to their left. We show that for a natural subclass of local procedures, the number of corresponding parking functions of length $r$ is always equal to $(r+1)^{r-1}$. The setting can be extended to probabilistic procedures, in which the decision to park left or right is random. We finally describe how bilateral procedures can naturally be encoded by certain labeled binary forests, whose combinatorics shed light on several results from the literature.
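The count $(r+1)^{r-1}$ can be checked by brute force in the classical one-directional setting, where each car drives to its preferred spot and takes the nearest free spot to its right. The sketch below covers only that classical rule on a street of $r$ spots. It does not implement the bilateral or probabilistic procedures, and the function names `parks_all` and `count_parking_functions` are illustrative rather than taken from the paper.

```python
from itertools import product


def parks_all(prefs):
    """Return True if every car parks under the classical rule.

    prefs[i] is the 0-based preferred spot of car i on a street with
    len(prefs) spots. Each car takes its preferred spot if free,
    otherwise the nearest free spot to the right; it fails to park if
    it runs off the end of the street.
    """
    r = len(prefs)
    occupied = [False] * r
    for preferred in prefs:
        spot = preferred
        while spot < r and occupied[spot]:
            spot += 1
        if spot == r:  # drove past the last spot
            return False
        occupied[spot] = True
    return True


def count_parking_functions(r):
    """Count preference sequences of length r for which all cars park."""
    return sum(parks_all(prefs) for prefs in product(range(r), repeat=r))


if __name__ == "__main__":
    for r in range(1, 6):
        print(r, count_parking_functions(r), (r + 1) ** (r - 1))
```

The enumeration visits all $r^r$ preference sequences, so it is only practical for small $r$. For those values it should agree with the closed form.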
We consider the problem of finding feasible systems with respect to stochastic constraints when system performance is evaluated through simulation. Our objective is to solve this problem with high computational efficiency and statistical validity. Existing indifference-zone (IZ) procedures introduce a fixed tolerance level, which denotes how much deviation the decision-maker is willing to accept from the threshold in the constraint. These procedures are developed under the assumption that all systems' performance measures are exactly the tolerance level away from the threshold, leading to unnecessary simulations. In contrast, IZ-free procedures, which eliminate the tolerance level, perform well when systems' performance measures are far from the threshold. However, they may significantly underperform compared to IZ procedures when systems' performance measures are close to the threshold. To address these challenges, we propose the Indifference-Zone Relaxation (IZR) procedure. IZR introduces a set of relaxed tolerance levels and utilizes two subroutines for each level: one to identify systems that are clearly feasible and the other to exclude those that are clearly infeasible. We al
A cornerstone of the multiple testing literature is the Benjamini-Hochberg (BH) procedure, which guarantees control of the FDR when $p$-values are independent or positively dependent. While BH controls the average quality of rejections, it does not provide guarantees for individual discoveries, particularly those near the rejection threshold, which are more likely to be false than the average rejection. For independent $p$-values with Uniform$(0,1)$ null distribution, the Support Line procedure (SL; arXiv:2207.07299) provably controls the error probability for the rejection at the edge of the discovery set (i.e., the one with largest $p$-value) at level $q m_0/m$, where $m_0$ is the number of true null hypotheses and $q$ is a tuning parameter. In this work, we study adaptive versions of the SL procedure that operate in two steps: the first step estimates $m_0$ from non-significant statistics, and the second step runs the SL procedure at an adjusted level $q m / \hat{m}_0$. The adaptive procedures are shown to control the false discovery probability for the "boundary" rejection under an independence assumption. Simulation studies suggest that some but not all of the two-stage proced
We give simple procedures to obtain the left and right keys of a semi-standard Young tableau. Keys derive their interest from the fact that they encode the characters of Demazure and opposite Demazure modules for the general and special linear groups. Given the importance of keys, there are indeed several procedures available in the literature to determine them. In comparison, our procedures are new (to the best of our knowledge) and especially simple. Having said that, we hasten to add that there is nothing new in any individual ingredient that goes into our procedures. These ingredients are all routine, straightforward, and (in any case) occur in the literature. But they never quite seem to have been put together as done here. Our procedures end up repeatedly performing the Deodhar lifts: maximal lifts for the left key and minimal lifts for the right key. Together with the well-known fact that keys can be obtained by such repeated lifts, this justifies the procedures. The relevance of Deodhar lifts to combinatorial models for Demazure characters is well known in Standard Monomial Theory. Right and left keys appear respectively as initial and final directions of Lakshmibai-Seshadri pa
I study how organisations choose selection procedures in a competitive environment. Two firms compete to hire candidates of unknown productivity from a common pool. Firms simultaneously post a selection procedure which consists of a test and an acceptance probability for each test outcome. After observing the firms' selection procedures, each candidate can apply to one of them. Firms can vary both the accuracy and difficulty of their test. The firms face two key considerations when choosing their selection procedure: the statistical properties of their test and the selection into the procedure by the candidates. I show that there is a unique symmetric equilibrium where the test is maximally accurate but minimally difficult. Intuitively, competition leads to maximal but misguided learning: firms end up having precise knowledge that is not payoff-relevant. In contrast, when firms face capacity constraints or have the possibility of making a wage offer, they use more difficult tests in equilibrium. I also consider asymmetric equilibria where one firm is more selective than another.
This work concerns adaptive refinement procedures for meshes of polygonal virtual elements. Specifically, refinement procedures previously proposed by the authors for structured meshes are generalized for the challenging case of arbitrary element geometries arising in unstructured/Voronoi discretizations. Here, structured and unstructured meshes are considered and are created via Voronoi tessellation of sets of structured and unstructured seed points respectively. The novel mesh refinement procedures for both structured and unstructured meshes allow for accurate and efficient application of the virtual element method to challenging elastic problems in two dimensions. The results demonstrate that the high efficacy of the proposed refinement procedures on structured meshes, as seen in previous work by the authors, is also achieved in the case of unstructured/Voronoi meshes. The versatility and efficacy of the refinement procedures demonstrated over a variety of mesh types indicate that the procedures are well-suited to virtual element applications.
Surgical procedures are often not "standardised" (i.e., defined in a unique and unambiguous way), but rather exist as implicit knowledge in the minds of the surgeon and the surgical team. This reliance extends to pre-surgery planning and effective communication during the procedure. We introduce a novel approach for the formal and automated analysis of surgical procedures, which we model as security ceremonies, leveraging well-established techniques developed for the analysis of such ceremonies. Mutations of a procedure are used to model variants and mistakes that members of the surgical team might make. Our approach allows us to automatically identify violations of the intended properties of a surgical procedure.
Statistical discoveries are often obtained through multiple hypothesis testing. A variety of procedures exists to evaluate multiple hypotheses, for instance the ones of Benjamini-Hochberg, Bonferroni, Holm or Sidak. We are particularly interested in multiple testing procedures with two desired properties: (solely) monotonic and well-behaved procedures. This article investigates to what extent the classes of (monotonic or well-behaved) multiple testing procedures, in particular the subclasses of so-called step-up and step-down procedures, are closed under basic set operations, specifically the union, intersection, difference and the complement of sets of rejected or non-rejected hypotheses. The present article proves two main results: First, taking the union or intersection of arbitrary (monotonic or well-behaved) multiple testing procedures results in new procedures which are monotonic but not well-behaved, whereas the complement or difference generally preserves neither property. Second, the two classes of (solely monotonic or well-behaved) step-up and step-down procedures are closed under taking the union or intersection, but not the complement or difference.
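To make the objects concrete, the following sketch implements one step-down procedure (Holm) and one step-up procedure (Benjamini-Hochberg) as functions that return sets of rejected hypothesis indices, and then combines them by union and intersection. It uses the textbook definitions of both procedures. It does not encode the article's notions of "monotonic" or "well-behaved", and the function names and the example $p$-values are illustrative assumptions.

```python
def holm_step_down(pvalues, alpha=0.05):
    """Holm: walk up from the smallest p-value, stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for rank, i in enumerate(order):  # rank = 0, 1, ..., m-1
        if pvalues[i] <= alpha / (m - rank):
            rejected.add(i)
        else:
            break  # step-down: the first non-rejection ends the procedure
    return rejected


def benjamini_hochberg_step_up(pvalues, alpha=0.05):
    """BH: find the largest rank k with p_(k) <= alpha*k/m, reject ranks 1..k."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # step-up: keep the last rank that passes
    return set(order[:k])


if __name__ == "__main__":
    pvalues = [0.001, 0.012, 0.025, 0.045, 0.5]  # made-up example values
    holm = holm_step_down(pvalues)
    bh = benjamini_hochberg_step_up(pvalues)
    print("Holm rejects:", sorted(holm))
    print("BH rejects:", sorted(bh))
    print("union:", sorted(holm | bh))
    print("intersection:", sorted(holm & bh))
```

Representing each procedure's output as a Python `set` makes the set operations discussed in the abstract (union, intersection, difference) direct to express with `|`, `&` and `-`.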
Ranking and selection (R&S) is a popular model for studying discrete-event dynamic systems. It aims to select the best design (the design with the largest mean performance) from a finite set, where the mean of each design is unknown and has to be learned by samples. Great research efforts have been devoted to this problem in the literature for developing procedures with superior empirical performance and showing their optimality. In these efforts, myopic procedures were popular. They select the best design using a 'naive' mechanism of iteratively and myopically improving an approximation of the objective measure. Although they are based on simple heuristics and lack theoretical support, they turned out to be highly effective, and often achieved competitive empirical performance compared to procedures that were proposed later and shown to be asymptotically optimal. In this paper, we theoretically analyze these myopic procedures and prove that they also satisfy the optimality conditions of R&S, just like some other popular R&S methods. This explains the good performance of myopic procedures in various numerical tests, and provides good insight into the structure and theoretical d
In applications such as clinical safety analysis, the data of the experiments usually consists of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling the family-wise error rate (FWER). Most existing FWER-controlling procedures are developed for continuous data and are often conservative when applied to discrete data. By using minimal attainable $p$-values, several FWER-controlling procedures have been specifically developed for discrete data in the literature. In this paper, by utilizing known marginal distributions of true null $p$-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm and Hochberg procedures, respectively. It is shown that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special settings. Through extensive simulation studies, we provide numerical evidence of superior performance of the proposed procedures in terms of the FWER control and minim
We investigate the performance of a family of multiple comparison procedures for strong control of the False Discovery Rate ($\mathsf{FDR}$). The $\mathsf{FDR}$ is the expected False Discovery Proportion ($\mathsf{FDP}$), that is, the expected fraction of false rejections among all rejected hypotheses. A number of refinements to the original Benjamini-Hochberg procedure [1] have been proposed, to increase power by estimating the proportion of true null hypotheses, either implicitly, leading to one-stage adaptive procedures [4, 7] or explicitly, leading to two-stage adaptive (or plug-in) procedures [2, 21]. We use a variant of the stochastic process approach proposed by Genovese and Wasserman [11] to study the fluctuations of the $\mathsf{FDP}$ achieved with each of these procedures around its expectation, for independent tested hypotheses. We introduce a framework for the derivation of generic Central Limit Theorems for the $\mathsf{FDP}$ of these procedures, characterizing the associated regularity conditions, and comparing the asymptotic power of the various procedures. We interpret recently proposed one-stage adaptive procedures [4, 7] as fixed points in the iteration of well kn
This tutorial provides a comprehensive and in-depth view of the research on procedures, primarily in Natural Language Processing. A procedure is a sequence of steps intended to achieve some goal. Understanding procedures in natural language has a long history, with recent breakthroughs made possible by advances in technology. First, we discuss established approaches to collect procedures, by human annotation or extraction from web resources. Then, we examine different angles from which procedures can be reasoned about, as well as ways to represent them. Finally, we enumerate scenarios where procedural knowledge can be applied to the real world.
The stable marriage problem is a well-known problem of matching men to women so that no man and woman who are not married to each other both prefer each other. Such a problem has a wide variety of practical applications ranging from matching resident doctors to hospitals to matching students to schools. A well-known algorithm to solve this problem is the Gale-Shapley algorithm, which runs in polynomial time. It has been proven that stable marriage procedures can always be manipulated. Whilst the Gale-Shapley algorithm is computationally easy to manipulate, we prove that there exist stable marriage procedures which are NP-hard to manipulate. We also consider the relationship between voting theory and stable marriage procedures, showing that voting rules which are NP-hard to manipulate can be used to define stable marriage procedures which are themselves NP-hard to manipulate. Finally, we consider the issue that stable marriage procedures like Gale-Shapley favour one gender over the other, and we show how to use voting rules to make any stable marriage procedure gender neutral.
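For reference, a minimal sketch of the Gale-Shapley deferred-acceptance algorithm follows, together with a stability check. It assumes equally sized sides with complete, strict preference lists. The names `gale_shapley` and `is_stable` and the two-by-two instance are illustrative, and nothing here implements the manipulation-hardness or voting-rule constructions of the paper.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict proposer -> receiver.

    Both arguments map each agent to a list of all agents on the other
    side, most preferred first (complete, strict lists are assumed).
    """
    # rank[r][p]: position of proposer p in receiver r's list (lower = better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p           # r was free and accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p           # r trades up; current becomes free
            free.append(current)
        else:
            free.append(p)              # r rejects p; p tries again later
    return {p: r for r, p in engaged_to.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """Check that no proposer-receiver pair prefers each other to their partners."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break  # everyone further down is less preferred by p
            own = receiver_prefs[r]
            if own.index(p) < own.index(partner_of[r]):
                return False  # (p, r) is a blocking pair
    return True


if __name__ == "__main__":
    men = {"a": ["x", "y"], "b": ["y", "x"]}
    women = {"x": ["b", "a"], "y": ["a", "b"]}
    print(gale_shapley(men, women))   # men propose
    print(gale_shapley(women, men))   # women propose
```

On this small instance the two runs return different stable matchings, each giving the proposing side its first choices, which illustrates the asymmetry between the two sides mentioned in the abstract.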
Abraham Robinson's framework for modern infinitesimals was developed half a century ago. It enables a re-evaluation of the procedures of the pioneers of mathematical analysis. Their procedures have often been viewed through the lens of the success of the Weierstrassian foundations. We propose a view without passing through the lens, by means of proxies for such procedures in the modern theory of infinitesimals. The real accomplishments of calculus and analysis had been based primarily on the elaboration of novel techniques for solving problems rather than a quest for ultimate foundations. It may be hopeless to interpret historical foundations in terms of a punctiform continuum, but arguably it is possible to interpret historical techniques and procedures in terms of modern ones. Our proposed formalisations do not mean that Fermat, Gregory, Leibniz, Euler, and Cauchy were pre-Robinsonians, but rather indicate that Robinson's framework is more helpful in understanding their procedures than a Weierstrassian framework.
In the context of the cell centered finite volume approach, care must be taken when performing the reconstruction of property gradients at cell interfaces. The present work analyzes three different gradient reconstruction procedures, using three different turbulent simulation test cases, namely the zero-gradient flat plate, the subsonic NACA 0012 airfoil and the transonic OAT15A airfoil. The analysis is concerned mainly with the usage of quadrilateral meshes. The gas dynamics equations are solved using an implicit implementation of Roe's second-order upwind scheme. The RANS closure problem is solved by using the negative Spalart-Allmaras turbulence model. The solution quality of each gradient discretization procedure is analyzed and compared to experimental data and other numerical solutions available in the literature. For the cases considered here, excellent agreement is obtained between the computed solutions and the expected results, regardless of which gradient reconstruction scheme is used.
Guided troubleshooting is an inherent task in the domain of technical support services. When a customer experiences an issue with the functioning of a technical service or a product, an expert user helps guide the customer through a set of steps comprising a troubleshooting procedure. The objective is to identify the source of the problem through a set of diagnostic steps and observations, and arrive at a resolution. Procedures containing these sets of diagnostic steps and observations in response to different problems are common artifacts in the body of technical support documentation. The ability to use machine learning and linguistics to understand and leverage these procedures for applications like intelligent chatbots or robotic process automation is crucial. Existing research on question answering or intelligent chatbots does not look within procedures or understand them deeply. In this paper, we outline a system for mining procedures from technical support documents. We create models for solving important subproblems like extraction of procedures, identifying decision points within procedures, identifying blocks of instructions corresponding to these decision points and mappin
For finite parameter spaces under finite loss, every Bayes procedure derived from a prior with full support is admissible, and every admissible procedure is Bayes. This relationship already breaks down once we move to finite-dimensional Euclidean parameter spaces. Compactness and strong regularity conditions suffice to repair the relationship, but without these conditions, admissible procedures need not be Bayes. Under strong regularity conditions, admissible procedures can be shown to be the limits of Bayes procedures. Under even stricter conditions, they are generalized Bayes, i.e., they minimize the Bayes risk with respect to an improper prior. In both these cases, one must venture beyond the strict confines of Bayesian analysis. Using methods from mathematical logic and nonstandard analysis, we introduce the class of nonstandard Bayes decision procedures---namely, those whose Bayes risk with respect to some prior is within an infinitesimal of the optimal Bayes risk. Among procedures with finite risk functions, we show that a decision procedure is extended admissible if and only if its nonstandard extension is nonstandard Bayes. For problems with continuous risk functions define
Recent advances in vision-language models (VLMs) have achieved impressive results on standard image-text tasks, yet their potential for visual procedure question answering (VP-QA) remains largely unexplored. VP-QA presents unique challenges where users query next-step actions by uploading images for intermediate states of complex procedures. To systematically evaluate VLMs on this practical task, we propose ProcedureVQA, a novel multimodal benchmark specifically designed for visual procedural reasoning. Through comprehensive analysis, we identify two critical limitations in current VLMs: inadequate cross-modal retrieval of structured procedures given visual states, and misalignment between image sequence granularity and textual step decomposition. To address these issues, we present Chain-of-Procedure (CoP), a hierarchical reasoning framework that first retrieves relevant instructions using visual cues, then performs step refinement through semantic decomposition, and finally generates the next step. Experiments across six VLMs demonstrate CoP's effectiveness, achieving up to 13% absolute improvement over standard baselines.
Learning to autonomously execute long-horizon procedures from natural language remains a core challenge for intelligent agents. Free-form instructions such as recipes, scientific protocols, or business workflows encode rich procedural knowledge, but their variability and lack of structure cause agents driven by large language models (LLMs) to drift or fail during execution. We introduce Procedure Aware DynaMic Execution (PADME), an agent framework that produces and exploits a graph-based representation of procedures. Unlike prior work that relies on manual graph construction or unstructured reasoning, PADME autonomously transforms procedural text into executable graphs that capture task dependencies, decision points, and reusable subroutines. Central to PADME is a two-phase methodology: a Teach phase, which focuses on systematically structuring procedures and enriching them with executable logic, followed by an Execute phase, which enables dynamic execution in response to real-time inputs and environment feedback. This separation ensures quality assurance and scalability, allowing expert knowledge to be encoded once and reliably reused across varying contexts. The graph representation also prov