From flocking birds to schooling fish, organisms interact to form collective dynamics across the natural world. Self-organization is present at smaller scales as well: cells interact and move during development to produce patterns in fish skin, and wound healing relies on cell migration. Across these examples, scientists are interested in shedding light on the individual behaviors informing spatial group dynamics and in predicting the patterns that will emerge under altered agent interactions. One challenge to these goals is that images of self-organization -- whether empirical or generated by models -- are qualitative. To get around this, there are many methods for transforming qualitative pattern data into quantitative information. In this tutorial chapter, I survey some methods for quantifying self-organization, including order parameters, pair correlation functions, and techniques from topological data analysis. I also discuss some places that I see as especially promising for quantitative data, modeling, and data-driven approaches to continue meeting in the future.
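As one concrete instance of the order parameters mentioned above, the following is a minimal sketch of the polar (alignment) order parameter often used for flocking data. It assumes each agent is described only by a heading angle in radians; the function name `polarization` and this input format are illustrative choices, not taken from the chapter.

```python
import math


def polarization(headings):
    """Polar order parameter for a list of heading angles (radians).

    Returns the length of the mean unit heading vector:
    close to 1 when agents are aligned, close to 0 when headings cancel.
    """
    n = len(headings)
    if n == 0:
        raise ValueError("need at least one agent")
    # Sum the unit vectors (cos theta, sin theta) over all agents.
    sx = sum(math.cos(theta) for theta in headings)
    sy = sum(math.sin(theta) for theta in headings)
    return math.hypot(sx, sy) / n


if __name__ == "__main__":
    aligned = [0.3] * 10                                  # everyone moves the same way
    opposed = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # headings cancel out
    print(polarization(aligned), polarization(opposed))
```

A single number like this turns a qualitative snapshot into a quantity that can be tracked over time or compared between model and experiment, although it ignores spatial structure, which is where pair correlation functions and topological summaries come in.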
In developing efficient optimization algorithms, it is crucial to account for communication constraints -- a significant challenge in modern Federated Learning. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems at each iteration and that can exploit second-order similarity among individual functions. However, to achieve such communication efficiency, the algorithm requires solving the local subproblems sufficiently accurately, resulting in slightly sub-optimal local complexity. Inspired by the hybrid-projection proximal-point method, we propose a novel distributed algorithm, S-DANE. Compared to DANE, this method uses an auxiliary sequence of prox-centers while maintaining the same deterministic communication complexity. Moreover, the accuracy condition for solving the subproblem is milder, leading to enhanced local computation efficiency. Furthermore, S-DANE supports partial client participation and arbitrary stochastic local solvers, making it attractive in practice. We further accelerate S-DANE and show that the resulting algorithm achieves the best-known commu
The combination of nonlinear FETI-DP (Dual Primal Finite Element Tearing and Interconnecting) and Quasi-Newton methods using a sequential quadratic programming (SQP) approach is considered. Nonlinear FETI-DP methods are parallel iterative solution methods for nonlinear finite element problems, based on divide and conquer, using Lagrange multipliers. In the method, we use Quasi-Newton approximations of the Hessian for the quadratic programs, where the initial approximation uses the exact Hessian. To accelerate convergence, we recompute the initial Hessian and restart the Quasi-Newton approximation. We provide numerical experiments using homogeneous model problems from nonlinear structural mechanics.
Sinc-collocation methods are known to be efficient for Fredholm integral equations of the second kind, even if functions in the equations have endpoint singularities. However, existing methods have the disadvantage of inconsistent collocation points. This inconsistency complicates the implementation of such methods, particularly for large-scale problems. To overcome this drawback, this study proposes alternative Sinc-collocation methods with consistent collocation points. The results of a theoretical error analysis show that the proposed methods have the same convergence property as existing methods. Numerical experiments suggest the superiority of the proposed methods in terms of implementation and computational cost.
Sharing diverse genomic and other biomedical datasets is critical to advance scientific discoveries and their equitable translation to improve human health. However, data sharing remains challenging in the context of legacy datasets, evolving policies, multi-institutional consortium science, and international stakeholders. The NIH-funded Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium was established to improve the performance of polygenic risk estimates for a broad range of health and disease outcomes with global impacts. Improving polygenic risk score performance across genetically diverse populations requires access to large, diverse cohorts. We report on the design and implementation of data sharing policies and procedures developed in PRIMED to aggregate and analyze data from multiple, heterogeneous sources while adhering to existing data sharing policies for each integrated dataset. We describe two primary data sharing mechanisms: coordinated dbGaP applications and a Consortium Data Sharing Agreement, as well as provide alternatives when individual-level data cannot be shared within the Consortium (e.g., federated analyses). We also describe technical implem
Random effects meta-analysis is a widely applied methodology to synthesize the findings of studies on a specific scientific question. Besides estimating the mean effect, an important aim of meta-analysis is to summarize the heterogeneity, i.e. the variation in the underlying effects caused by the differences in study circumstances. The prediction interval is frequently used for this purpose: a 95% prediction interval contains the true effect of a similar new study in 95% of the cases when it is constructed, or in other words, it covers 95% of the true effects distribution on average. In this article, after providing a clear mathematical background, we present an extensive simulation investigating the performance of all frequentist prediction interval methods published to date. The work focuses on the distribution of the coverage probabilities and how these distributions change depending on the amount of heterogeneity and the number of involved studies. Although the single requirement that a prediction interval has to fulfill is to keep a nominal coverage probability on average, we demonstrate why the distribution of coverages cannot be disregarded, and that for small numbe
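To make the notion of a prediction interval concrete, the sketch below computes one widely used frequentist variant: the DerSimonian-Laird estimate of the between-study variance combined with the Higgins-Thompson-Spiegelhalter interval, which uses a t quantile with k-2 degrees of freedom. This is only one of the methods such a comparison could cover and is not claimed to be the one the article recommends. The function name and the choice to pass the t critical value in as `t_crit` (to keep the example free of external libraries) are assumptions made for illustration.

```python
import math


def dl_prediction_interval(effects, variances, t_crit):
    """Random-effects summary with a prediction interval.

    effects   : observed study effects y_i
    variances : within-study variances v_i
    t_crit    : t quantile with k-2 degrees of freedom (e.g. 0.975 level),
                supplied by the caller (assumption: looked up elsewhere)
    Returns (mu_hat, tau2_hat, (lower, upper)).
    """
    k = len(effects)
    if k != len(variances) or k < 3:
        raise ValueError("need at least three studies with matching variances")

    # Fixed-effect weights and Cochran's Q statistic.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))

    # DerSimonian-Laird moment estimator, truncated at zero.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights, pooled mean, and its squared standard error.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se2 = 1.0 / sw_star

    # The interval widens with both heterogeneity and uncertainty in mu.
    half_width = t_crit * math.sqrt(tau2 + se2)
    return mu, tau2, (mu - half_width, mu + half_width)


if __name__ == "__main__":
    # Three hypothetical studies; 12.706 is the 0.975 t quantile for 1 degree of freedom.
    print(dl_prediction_interval([0.0, 1.0, 2.0], [0.1, 0.1, 0.1], 12.706))
```

With only three studies the t quantile is very large, so the interval is wide; this sensitivity to the number of studies is the regime the simulation described above examines.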
Standard Virtual Element Methods (VEM) are based on polynomial projections and require a stabilization term to evaluate the contribution of the non-polynomial component of the discrete space. However, the stabilization term is not uniquely defined by the underlying variational formulation and is typically introduced in an ad hoc manner, potentially affecting the numerical response. Stabilization-free and self-stabilized formulations have been proposed to overcome this issue, although their theoretical analysis is still less mature. This paper provides an in-depth numerical investigation into different stabilized and self-stabilized formulations for the p-version of VEM. The results show that self-stabilized and stabilization-free formulations achieve optimal accuracy while suffering from worse conditioning. Moreover, a new projection operator, which explicitly accounts for variable coefficients, is introduced within the framework of standard virtual element spaces. Numerical results show that this new approach is more robust than the existing ones for large values of p.
We introduce a hyperreduced reduced basis element method for model reduction of parameterized, component-based systems in continuum mechanics governed by nonlinear partial differential equations. In the offline phase, the method constructs, through a component-wise empirical training, a library of archetype components defined by a component-wise reduced basis and hyperreduced quadrature rules with varying hyperreduction fidelities. In the online phase, the method applies an online adaptive scheme informed by the Brezzi-Rappaz-Raviart theorem to select an appropriate hyperreduction fidelity for each component to meet the user-prescribed error tolerance at the system level. The method accommodates the rapid construction of hyperreduced models for large-scale component-based nonlinear systems and enables model reduction of problems with many continuous and topology-varying parameters. The efficacy of the method is demonstrated on a two-dimensional nonlinear thermal fin system that comprises up to 225 components and 68 independent parameters.
Geometrical methods in quantum information are very promising for providing both technical tools and intuition into difficult control or optimization problems. Moreover, they are of fundamental importance in connecting purely geometrical theories, such as general relativity (GR), to quantum mechanics, as in the AdS/CFT correspondence. In this paper, we first survey the most important settings in which geometrical methods have proven useful to quantum information theory. Then, we lay down a general framework for an action principle for quantum resources like entanglement, coherence, and anti-flatness. We discuss the case of a two-qubit system.
Models accounting for imperfect detection are important. Single-visit methods have been proposed as an alternative to multiple-visit methods to relax the assumption of a closed population. Knape and Korner-Nievergelt (2015) showed that, under certain models of the probability of detection, single-visit methods are statistically non-identifiable, leading to biased population estimates. There is a close relationship between estimation of the resource selection probability function (RSPF) using weighted distributions and single-visit methods for occupancy and abundance estimation. We explain the precise mathematical conditions needed for RSPF estimation as stated in Lele and Keim (2006). The identical conditions, which remained unstated in our papers on single-visit methodology, are needed for single-visit methodology to work. We show that the class of admissible models is quite broad and does not excessively restrict the application of the RSPF or the single-visit methodology. To complement the work by Knape and Korner-Nievergelt, we study the performance of multiple-visit methods under the scaled logistic detection function and a much wider set of situations. In general, under the scaled log
Two blind source separation methods (Independent Component Analysis and Non-negative Matrix Factorization), developed initially for signal processing in engineering, have recently found a number of applications in the analysis of large-scale data in molecular biology. In this short review, we present the common idea behind these methods, describe ways of implementing and applying them, and point out their advantages compared to more traditional statistical approaches. We focus more specifically on the analysis of gene expression in cancer. The review concludes with a list of available software implementations of the methods described.
Understanding the function of complex cortical circuits requires the simultaneous recording of action potentials from many neurons in awake and behaving animals. Practically, this can be achieved by extracellularly recording from multiple brain sites using single wire electrodes. However, in densely packed neural structures such as the human hippocampus, a single electrode can record the activity of multiple neurons. Thus, analytic techniques that differentiate action potentials of different neurons are required. Offline spike sorting approaches are currently used to detect and sort action potentials after the experiment is finished. Because the opportunities to record from the human brain are relatively rare, it is desirable to analyze large numbers of simultaneous recordings quickly using online sorting and detection algorithms. In this way, the experiment can be optimized for the particular response properties of the recorded neurons. Here we present and evaluate a method that is capable of detecting and sorting extracellular single-wire recordings in real time. We demonstrate the utility of the method by applying it to an extensive data set we acquired from chronically-implanted d
In this work, a solver for unsteady two-phase flows based on the extended Discontinuous Galerkin (extended DG/XDG) method is presented. The XDG method adapts the approximation space to conform to the position of the interface. This allows a sub-cell accurate representation of the incompressible Navier-Stokes equations in their sharp interface formulation. The interface is described as the zero set of a signed-distance level-set function and discretized by a standard DG method. For the evolution of the interface (i.e., of the level set), an extension velocity field is used, and a two-stage algorithm is presented for its construction on a narrow band. On the cut cells, a monolithic elliptic extension velocity method is adapted, and a fast-marching procedure is used on the neighboring cells. The spatial discretization is based on a symmetric interior penalty method and for the temporal discretization a moving interface approach is adapted. A cell agglomeration technique is utilized for handling small cut cells and topology changes during the interface motion. The method is validated against a wide range of typical two-phase surface tension driven flow phenomena including capillary waves, an oscillati
Mammalian cells have about 30,000-fold more protein molecules than mRNA molecules. This larger number of molecules and the associated larger dynamic range have major implications in the development of proteomics technologies. We examine these implications for both liquid chromatography-tandem mass spectrometry (LC-MS/MS) and single-molecule counting and provide estimates on how many molecules are routinely measured in proteomics experiments by LC-MS/MS. We review strategies that have been helpful for counting billions of protein molecules by LC-MS/MS and suggest that these strategies can benefit single-molecule methods, especially in mitigating the challenges of the wide dynamic range of the proteome. We also examine the theoretical possibilities for scaling up single-molecule and mass spectrometry proteomics approaches to quantifying the billions of protein molecules that make up the proteomes of our cells.
Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its app
There is a broad need in the neuroscience community to understand and visualize large-scale recordings of neural activity, big data acquired by tens or hundreds of electrodes simultaneously recording dynamic brain activity over minutes to hours. Such dynamic datasets are characterized by coherent patterns across both space and time, yet existing computational methods are typically restricted to analysis either in space or in time separately. Here we report the adaptation of dynamic mode decomposition (DMD), an algorithm originally developed for the study of fluid physics, to large-scale neuronal recordings. DMD is a modal decomposition algorithm that describes high-dimensional dynamic data using coupled spatial-temporal modes; the resulting analysis combines key features of performing principal components analysis (PCA) in space and power spectral analysis in time. The algorithm scales easily to very large numbers of simultaneously acquired measurements. We validated the DMD approach on sub-dural electrode array recordings from human subjects performing a known motor activation task. Next, we leveraged DMD in combination with machine learning to develop a novel method to extract sl
An FFT-based algorithm is developed to simulate the propagation of elastic waves in heterogeneous $d$-dimensional rectangular domains. The method allows one to prescribe the displacement as a function of time in a subregion of the domain, emulating the application of Dirichlet boundary conditions on an outer face. Time discretization is performed using an unconditionally stable beta-Newmark approach. The implicit problem for obtaining the displacement at each time step is solved by transforming the equilibrium equations into Fourier space and solving the corresponding linear system with a preconditioned Krylov solver. The resulting method is validated against analytical solutions and compared with implicit and explicit finite element simulations and with an explicit FFT approach. The accuracy of the method is similar to or better than that of finite elements, and the numerical performance is clearly superior, allowing the use of much larger models. To illustrate the capabilities of the method, some numerical examples are presented, including the propagation of planar, circular, and spherical waves and the simulation of the propagation of a pulse in a polycrystalline medium.
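The time-stepping idea can be illustrated on a much smaller problem than the one treated in the paper. The sketch below applies the Newmark scheme with beta = 1/4 and gamma = 1/2 (the average-acceleration variant, which is unconditionally stable for linear problems) to a single undamped mass-spring oscillator. It is a toy stand-in under stated assumptions: the scalar equation m u'' + k u = 0 replaces the elastic wave equations, and the implicit solve is a scalar division rather than a Fourier-space Krylov solve. The function name `newmark_sdof` is hypothetical.

```python
def newmark_sdof(m, k, u0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*u'' + k*u = 0 with the Newmark-beta scheme.

    Returns a list of (displacement, velocity) pairs, including the
    initial state, so the list has steps + 1 entries.
    """
    u, v = u0, v0
    a = -k * u / m  # initial acceleration from the equation of motion
    states = [(u, v)]
    for _ in range(steps):
        # Predictors: the parts of the update that depend only on known data.
        u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        # Implicit step: enforce m*a_new + k*u_new = 0 with
        # u_new = u_pred + beta*dt^2*a_new (a scalar "linear solve").
        a_new = -k * u_pred / (m + beta * dt * dt * k)
        # Correctors.
        u = u_pred + beta * dt * dt * a_new
        v = v_pred + gamma * dt * a_new
        a = a_new
        states.append((u, v))
    return states


if __name__ == "__main__":
    # Time step far above the explicit central-difference limit of 2/omega = 2.
    states = newmark_sdof(m=1.0, k=1.0, u0=1.0, v0=0.0, dt=10.0, steps=50)
    u_end, v_end = states[-1]
    print("final energy:", 0.5 * v_end ** 2 + 0.5 * u_end ** 2)
```

Even with a time step far larger than an explicit scheme would tolerate, the displacement stays bounded, which is the property that makes the implicit solve at each step worthwhile; the price is that each step requires solving a linear system, which in the full method is done in Fourier space.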
The identification of protein-ligand interaction plays a key role in biochemical research and drug discovery. Although deep learning has recently shown great promise in discovering new drugs, there remains a gap between deep learning-based and experimental approaches. Here we propose a novel framework, named AIMEE, integrating AI Model and Enzymology Experiments, to identify inhibitors against 3CL protease of SARS-CoV-2, which has taken a significant toll on people across the globe. From a bioactive chemical library, we have conducted two rounds of experiments and identified six novel inhibitors with a hit rate of 29.41%, and four of them showed an IC50 value less than 3 μM. Moreover, we explored the interpretability of the central model in AIMEE, mapping the deep learning extracted features to domain knowledge of chemical properties. Based on this knowledge, a commercially available compound was selected and proven to be an activity-based probe of 3CLpro. This work highlights the great potential of combining deep learning models and biochemical experiments for intelligent iteration and expanding the boundaries of drug discovery.
Recently developed methods for video analysis, especially models for pose estimation and behavior classification, are transforming behavioral quantification to be more precise, scalable, and reproducible in fields such as neuroscience and ethology. These tools overcome long-standing limitations of manual scoring of video frames and traditional "center of mass" tracking algorithms to enable video analysis at scale. The expansion of open-source tools for video acquisition and analysis has led to new experimental approaches to understand behavior. Here, we review currently available open-source tools for video analysis and discuss how to set up these methods for labs new to video recording. We also discuss best practices for developing and using video analysis methods, including community-wide standards and critical needs for the open sharing of datasets and code, more widespread comparisons of video analysis methods, and better documentation for these methods especially for new users. We encourage broader adoption and continued development of these tools, which have tremendous potential for accelerating scientific progress in understanding the brain and behavior.
We present a computational method for solving the coupled problem of chemical transport in a fluid (blood) with binding/unbinding of the chemical to/from cellular (platelet) surfaces in contact with the fluid, and with transport of the chemical on the cellular surfaces. The overall framework is the Augmented Forcing Point Method (AFM) (\emph{L. Yao and A.L. Fogelson, Simulations of chemical transport and reaction in a suspension of cells I: An augmented forcing point method for the stationary case, IJNMF (2012) 69, 1736-52.}) for solving fluid-phase transport in a region outside of a collection of cells suspended in the fluid. We introduce a novel Radial Basis Function-Finite Difference (RBF-FD) method to solve reaction-diffusion equations on the surface of each of a collection of 2D stationary platelets suspended in blood. Parametric RBFs are used to represent the geometry of the platelets and give accurate geometric information needed for the RBF-FD method. Symmetric Hermite-RBF interpolants are used for enforcing the boundary conditions on the fluid-phase chemical concentration, and their use removes a significant limitation of the original AFM. The efficacy of the new methods a