共找到 20 条结果
While the scientific community traditionally relies on various computational metrics to assess the performance of HPC systems -such as the TOP500 list (based on HPL performance), HPCG, Graph500, IO500- these metrics do not capture how HPC contributes to social progress. We propose a novel approach to follow how the growth of HPC systems and the advances of HPC research address concrete social challenges. The uniqueness of these new metrics lies in their ability to not only measure the capabilities of HPC architectures but also to gauge the concrete social advancements achieved through their use: it focuses on the output of the computation instead of its input. Contrarily to current measure, it also promotes the diversity of machines by evaluating the Pareto front created between size and result. We emphasize the need for dynamic, community-driven metrics that can evolve with emerging social needs.
The fastest supercomputer in 2020, Fugaku, has not only achieved digital transformation of epidemiology in allowing end-to-end, detailed quantitative modeling of COVID-19 transmissions for the first time but also transformed the behavior of the entire Japanese public through its detailed analysis of transmission risks in multitudes of societal situations entailing heavy risks. A novel aerosol simulation methodology was synthesized out of a combination of a new CFD methods meeting industrial demands in the solver, CUBE (Jansson et al., 2019), which not only allowed the simulations to scale massively with high resolution required for micrometer virus-containing aerosol particles but also enabled extremely rapid time-to-solution due to its ability to generate the digital twins representing multitudes of societal situations in a matter of minutes, attaining true overall application high performance; such simulations have been running for the past 1.5°years on Fugaku, cumulatively consuming top supercomputer-class resources and the communicated by the media as well as becoming the basis for official public policies.
With an increase in awareness regarding a troubling lack of reproducibility in analytical software tools, the degree of validity in scientific derivatives and their downstream results has become unclear. The nature of reproducibility issues may vary across domains, tools, data sets, and computational infrastructures, but numerical instabilities are thought to be a core contributor. In neuroimaging, unexpected deviations have been observed when varying operating systems, software implementations, or adding negligible quantities of noise. In the field of numerical analysis, these issues have recently been explored through Monte Carlo Arithmetic, a method involving the instrumentation of floating-point operations with probabilistic noise injections at a target precision. Exploring multiple simulations in this context allows the characterization of the result space for a given tool or operation. In this article, we compare various perturbation models to introduce instabilities within a typical neuroimaging pipeline, including (i) targeted noise, (ii) Monte Carlo Arithmetic, and (iii) operating system variation, to identify the significance and quality of their impact on the resulting derivatives. We demonstrate that even low-order models in neuroimaging such as the structural connectome estimation pipeline evaluated here are sensitive to numerical instabilities, suggesting that stability is a relevant axis upon which tools are compared, alongside more traditional criteria such as biological feasibility, computational efficiency, or, when possible, accuracy. Heterogeneity was observed across participants which clearly illustrates a strong interaction between the tool and data set being processed, requiring that the stability of a given tool be evaluated with respect to a given cohort. We identify use cases for each perturbation method tested, including quality assurance, pipeline error detection, and local sensitivity analysis, and make recommendations for the evaluation of stability in a practical and analytically focused setting. Identifying how these relationships and recommendations scale to higher order computational tools, distinct data sets, and their implication on biological feasibility remain exciting avenues for future work.
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, computation complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations.
Provisional stenting is the standard treatment for coronary bifurcation lesions, relying on real-time assessment of side-branch (SB) hemodynamics to guide intervention. Functional metrics such as fractional flow reserve (FFR) and instantaneous wave-free ratio (iFR) are used for this purpose. Still, their application in bifurcation lesions is limited by procedural complexity and the lack of preoperative planning tools. We developed a simulation-based machine learning (ML) framework to predict iFR under resting, and FFR under hyperemic conditions. The framework leveraged a synthetic hemodynamic dataset of 252 bifurcation lesions generated from 7 patient-specific geometries using HARVEY, a massively parallel computational fluid dynamics (CFD) solver. Anatomical variability was incorporated using four linear mixed-effects (LME) models to establish robust predictions. Two clinically relevant data-splitting strategies were evaluated: (1) a scenario excluding untreated cases, simulating models blind to new geometries, and (2) a scenario incorporating matched untreated cases, reflecting real-world conditions where preoperative anatomical data is available. Morphological features, including lesion severity, length, and curvature, were systematically varied alongside inherited anatomical parameters like bifurcation angle and side-branch count. Splitting approach 2 demonstrated superior predictive performance, achieving a maximum diagnostic accuracy of 0.847 (AUC: 0.899) for FFR and 0.797 (AUC: 0.874) for iFR. Mixed-effects models effectively account for patient-specific anatomical variability, with Bland-Altman analyzes confirming minimal bias between CFD and ML predictions. Incorporating preoperative anatomical information reduced variability and improved diagnostic accuracy across the studied thresholds. The proposed ML framework offers precise, real-time functional assessments of SB hemodynamics, reducing procedural uncertainty in provisional stenting strategies using only pre-operative lesion-specific features and a precomputed synthetic hemodynamic dataset of 252 bifurcation lesion instances. Using synthetic data sets and patient-specific anatomical insights, this approach paves the way for personalized coronary intervention planning, bridging the gap between computational modeling and clinical applicability.
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
The fast multipole method (FMM) is an efficient algorithm for calculating electrostatic interactions in molecular simulations and a promising alternative to Ewald summation methods. Translation of multipole expansion in spherical harmonics is the most important operation of the fast multipole method and the fast Fourier transform (FFT) acceleration of this operation is among the fastest methods of improving its performance. The technique relies on highly optimized implementation of fast Fourier transform routines for the desired expansion sizes, which need to incorporate the knowledge of symmetries and zero elements in the input arrays. Here a method is presented for automatic generation of such, highly optimized, routines.
Ensembles of widely distributed, heterogeneous resources, or Grids, have emerged as popular platforms for large-scale scientific applications. In this paper we present the Virtual Instrument project, which provides an integrated application execution environment that enables end-users to run and interact with running scientific simulations on Grids. This work is performed in the specific context of MCell, a computational biology application. While MCell provides the basis for running simulations, its capabilities are currently limited in terms of scale, ease-of-use, and interactivity. These limitations preclude usage scenarios that are critical for scientific advances. Our goal is to create a scientific "Virtual Instrument" from MCell by allowing its users to transparently access Grid resources while being able to steer running simulations. In this paper, we motivate the Virtual Instrument project and discuss a number of relevant issues and accomplishments in the area of Grid software development and application scheduling. We then describe our software design and report on the current implementation. We verify and evaluate our design via experiments with MCell on a real-world Grid testbed.
Integrative biomedical research projects query, analyze, and integrate many different data types and make use of datasets obtained from measurements or simulations of structure and function at multiple biological scales. With the increasing availability of high-throughput and high-resolution instruments, the integrative biomedical research imposes many challenging requirements on software middleware systems. In this paper, we look at some of these requirements using example research pattern templates. We then discuss how middleware systems, which incorporate Grid and high-performance computing, could be employed to address the requirements.
Analysis of large sensor datasets for structural and functional features has applications in many domains, including weather and climate modeling, characterization of subsurface reservoirs, and biomedicine. The vast amount of data obtained from state-of-the-art sensors and the computational cost of analysis operations create a barrier to such analyses. In this paper, we describe middleware system support to take advantage of large clusters of hybrid CPU-GPU nodes to address the data and compute-intensive requirements of feature-based analyses in large spatio-temporal datasets.
This paper describes an integrated, data-driven operational pipeline based on national agent-based models to support federal and state-level pandemic planning and response. The pipeline consists of (i) an automatic semantic-aware scheduling method that coordinates jobs across two separate high performance computing systems; (ii) a data pipeline to collect, integrate and organize national and county-level disaggregated data for initialization and post-simulation analysis; (iii) a digital twin of national social contact networks made up of 288 Million individuals and 12.6 Billion time-varying interactions covering the US states and DC; (iv) an extension of a parallel agent-based simulation model to study epidemic dynamics and associated interventions. This pipeline can run 400 replicates of national runs in less than 33 h, and reduces the need for human intervention, resulting in faster turnaround times and higher reliability and accuracy of the results. Scientifically, the work has led to significant advances in real-time epidemic sciences.
As a theoretically rigorous and accurate method, FEP-ABFE (Free Energy Perturbation-Absolute Binding Free Energy) calculations showed great potential in drug discovery, but its practical application was difficult due to high computational cost. To rapidly discover antiviral drugs targeting SARS-CoV-2 Mpro and TMPRSS2, we performed FEP-ABFE-based virtual screening for ∼12,000 protein-ligand binding systems on a new generation of Tianhe supercomputer. A task management tool was specifically developed for automating the whole process involving more than 500,000 MD tasks. In further experimental validation, 50 out of 98 tested compounds showed significant inhibitory activity towards Mpro, and one representative inhibitor, dipyridamole, showed remarkable outcomes in subsequent clinical trials. This work not only demonstrates the potential of FEP-ABFE in drug discovery but also provides an excellent starting point for further development of anti-SARS-CoV-2 drugs. Besides, ∼500 TB of data generated in this work will also accelerate the further development of FEP-related methods.
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.
Cloud computing has become more popular in provision of computing resources under virtual machine (VM) abstraction for high performance computing (HPC) users to run their applications. A HPC cloud is such cloud computing environment. One of challenges of energy efficient resource allocation for VMs in HPC cloud is tradeoff between minimizing total energy consumption of physical machines (PMs) and satisfying Quality of Service (e.g. performance). On one hand, cloud providers want to maximize their profit by reducing the power cost (e.g. using the smallest number of running PMs). On the other hand, cloud customers (users) want highest performance for their applications. In this paper, we focus on the scenario that scheduler does not know global information about user jobs and user applications in the future. Users will request shortterm resources at fixed start times and non interrupted durations. We then propose a new allocation heuristic (named Energy-aware and Performance per watt oriented Bestfit (EPOBF)) that uses metric of performance per watt to choose which most energy-efficient PM for mapping each VM (e.g. maximum of MIPS per Watt). Using information from Feitelson's Paralle
International collaboration is sometimes encouraged in the belief that it generates higher quality research or is more capable of addressing societal problems. Nevertheless, while there is evidence that the journal articles of international teams tend to be more cited than average, perhaps from increased international audiences, there is no science-wide direct academic evidence of a connection between international collaboration and research quality. This article empirically investigates the connection between international collaboration and research quality for the first time, with 148,977 UK-based journal articles with post publication expert review scores from the 2021 Research Excellence Framework (REF). Using an ordinal regression model controlling for collaboration, international partners increased the odds of higher quality scores in 27 out of 34 Units of Assessment (UoAs) and all Main Panels. The results therefore give the first large scale evidence of the fields in which international co-authorship for articles is usually apparently beneficial. At the country level, the results suggests that UK collaboration with other high research-expenditure economies generates higher q
For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively develop and share code projects across the globe. As the HPC community faces the challenges associated with guiding researchers from disciplines using high productivity interactive tools to effective use of HPC systems, it seems appropriate to revisit the assumptions surrounding the necessary skills required for access to large computational systems. For over a decade, MIT Lincoln Laboratory has been supporting interactive, on-demand high performance computing by seamlessly integrating familiar high productivity tools to provide users with an increased number of design turns, rapid prototyping capability, and faster time to insight. In this paper, we discuss the lessons learned while supporting interactive, on-demand high performance computing from the perspectives of the users and the team supporting the users and the system. Building on these lessons,
Cloud computing today has now been growing as new technologies and new business models. In distributed technology perspective, cloud computing most like client-server services like web-based or web-service but it used virtual resources to execute. Currently, cloud computing relies on the use of an elastic virtual machine and the use of network for data exchange. We conduct an experimental setup to measure the quality of service received by cloud computing customers. Experimental setup done by creating a HTTP service that runs in the cloud computing infrastructure. We interest to know about the impact of increasing the number of users on the average quality received by users. The qualities received by user measured within two parameters consist of average response times and the number of requests time out. Experimental results of this study show that increasing the number of users has increased the average response time. Similarly, the number of request time out increasing with increasing number of users. It means that the qualities of service received by user are decreasing also. We found that the impact of the number of users on the quality of service is no longer in linear trend.