Compared to other programming languages (e.g., Java), Python has more idioms to make Python code concise and efficient. Although pythonic idioms are well accepted in the Python community, Python programmers are often faced with many challenges in using them, for example, being unaware of certain pythonic idioms or do not know how to use them properly. Based on an analysis of 7,638 Python repositories on GitHub, we find that non-idiomatic Python code that can be implemented with pythonic idioms occurs frequently and widely. Unfortunately, there is no tool for automatically refactoring such non-idiomatic code into idiomatic code. In this paper, we design and implement an automatic refactoring tool to make Python code idiomatic. We identify nine pythonic idioms by systematically contrasting the abstract syntax grammar of Python and Java. Then we define the syntactic patterns for detecting non-idiomatic code for each pythonic idiom. Finally, we devise atomic AST-rewriting operations and refactoring steps to refactor non-idiomatic code into idiomatic code. We test and review over 4,115 refactorings applied to 1,065 Python projects from GitHub, and submit 90 pull requests for the 90 rand
Python is one of the most commonly used programming languages in industry and education. Its English keywords and built-in functions/modules allow it to come close to pseudo-code in terms of its readability and ease of writing. However, those who do not speak English may not experience these advantages. In fact, they may even be hindered in their ability to understand Python code, as the English nature of its terms creates an additional layer of overhead. To that end, we introduce the task of automatically translating Python's natural modality (keywords, error types, identifiers, etc.) into other human languages. This presents a unique challenge, considering the abbreviated nature of these forms, as well as potential untranslatability of advanced mathematical/programming concepts across languages. We therefore create an automated pipeline to translate Python into other human languages, comparing strategies using machine translation and large language models. We then use this pipeline to acquire translations from five common Python libraries (pytorch, pandas, tensorflow, numpy, and random) in seven languages, and do a quality test on a subset of these terms in French, Greek, and Ben
We present Kosmulator, a modular and vectorised Python framework designed to accelerate the statistical testing of cosmological models. As the theoretical landscape expands beyond standard $Λ$CDM, implementing new expansion histories into traditional Einstein--Boltzmann solvers becomes a significant computational bottleneck. Kosmulator addresses this by leveraging array-native execution and efficient ensemble slice sampling (via Zeus) to perform rapid Bayesian inference. We validate the framework against the industry-standard Cobaya code using a combination of Type Ia Supernovae, Cosmic Chronometers, and Baryon Acoustic Oscillation (BAO) data. Our results demonstrate that Kosmulator reproduces Cobaya's posterior constraints to within $\leq0.3σ$ statistical agreement on $H_{0}$ and $Ω_{m}$ and $<0.6\%$ precision on $χ^{2}$, while achieving a $\sim 4.5\times$ reduction in wall-clock time on a single CPU core compared to a standard MPI-parallelised baseline. Furthermore, we showcase the framework's utility by constraining the implicit power-law $f(Q)$ "$f_1$CDM" model and demonstrating its automated model selection capabilities (AIC/BIC). Kosmulator is introduced as a "scientific s
We explore the possibility of solving Partial Differential Equations (PDEs) using discrete weak formulations. We propose a programming environment for defining a discrete computational domain, introducing discrete functions defined over a set of points, constructing discrete inner products, and introducing discrete weak formulations employing Kronecker delta test functions. Building on this setup, we propose a discrete neural network representation, training the solution function defined over a discrete set of points and employing discrete finite difference derivatives in the automatic differentiation procedures. As a challenging computational model example, we focus on Stokes equations in two-dimensions, defined over a discrete set of points. We train the solution using the discrete weak residual and the Adamax algorithm with discrete automatic differentiation of the discrete gradients. Despite introducing the python environment, we also provide a rigorous mathematical formulation based on discrete weak formulations, proving the well-posedness and robustness of the loss function. The solution of the discrete weak formulations is based on neural network training employing a robust
Large language models (LLMs) are increasingly used by researchers in the social sciences and humanities (SSH) for text analysis, particularly to automate text annotation. However, many researchers still face challenges in adopting LLMs, addressing their limitations, and producing reproducible workflows and results. For example, annotation errors can bias downstream statistical analyses even when apparent accuracy is high. This paper provides a step-by-step methodological guide to using LLMs for text annotation in SSH research, with practical Python and R examples. We explain how LLMs work, how to set up research projects, how to interact with (open-source) LLMs programmatically, how to design and evaluate prompts without overfitting, how to integrate LLM annotations into statistical analyses while accounting for annotation error, and how to manage cost, efficiency, and reproducibility at scale. Throughout, we emphasize intuitive methodological reasoning, concrete examples, and best practices to help researchers incorporate LLM-based annotation into reproducible scientific workflows.
E-graphs have emerged as a versatile data structure with applications in synthesis, optimization, and verification through techniques such as equality saturation. This paper introduces Python bindings for the experimental egglog library (previously called egg-smol), which aims to bring the benefits of e-graphs to the Python ecosystem. The bindings offer a high-level, Pythonic API providing an accessible and familiar interface for Python users. By integrating e-graph techniques with Python, we hope to enable collaboration and innovation across various domains in the scientific computing and machine learning communities. We discuss the advantages of using Python bindings for both Python and existing egg-smol users, as well as possible future directions for development.
We present FairHealth, an open-source Python library that provides a unified, modular framework for trustworthy machine learning in healthcare applications, with particular focus on low-resource and low-income country (LMIC) settings such as Bangladesh. FairHealth addresses four critical gaps in existing healthcare AI toolkits: (1) the absence of integrated fairness auditing for biosignals and clinical tabular data; (2) the lack of privacy-preserving federated learning tools compatible with standard ML workflows; (3) missing explainability tools tailored for low-bandwidth clinical decision support; and (4) no existing toolkit covering Global South healthcare datasets. Built from five peer-reviewed research contributions, FairHealth provides six modules covering federated learning with homomorphic encryption (fairhealth.federated), intersectional fairness metrics (fairhealth.fairness), hybrid fuzzy-SHAP explainability (fairhealth.explain), multilingual dengue triage (fairhealth.lowresource), equitable disaster aid allocation (fairhealth.equity), and public dataset loaders (fairhealth.datasets). All datasets used are publicly available without institutional data use agreements. FairH
As the overlap between traditional computational mechanics and machine learning grows, there is an increasing demand for straight-forward approaches to interface Python-based procedures with C++-based OpenFOAM. This article introduces one such general methodology, allowing the execution of Python code directly within an OpenFOAM solver without the need for Python code translation. The proposed approach is based on the lightweight library pybind11, where OpenFOAM data is transferred to an embedded Python interpreter for manipulation, and results are returned as needed. Following a review of related approaches, the article describes the approach, with a particular focus on data transfer between Python and OpenFOAM, executing Python scripts and functions, and practical details about the implementation in OpenFOAM. Three complementary test cases are presented to highlight the functionality and demonstrate the effect of different data transfer approaches: a Python-based velocity profile boundary condition; a Python-based solver for prototyping; and a machine learning mechanical constitutive law class for solids4foam which performs field calculations.
nsEVDx is an open-source Python package for fitting stationary and nonstationary Extreme Value Distributions (EVDs) to extreme value data. It can be used to model extreme events in fields like hydrology, climate science, finance, and insurance, using both frequentist and Bayesian methods. For Bayesian inference it employs advanced Monte Carlo sampling techniques such as Metropolis-Hastings, Metropolis-adjusted Langevin (MALA), and Hamiltonian Monte Carlo (HMC). Unlike many existing extreme value theory (EVT) tools, which can be complex or lack Bayesian options, nsEVDx offers an intuitive, Python-native interface that is both user-friendly and extensible. It requires only standard scientific Python libraries (numpy, scipy) for its core functionality, while optional features like plotting and diagnostics use matplotlib and seaborn. A key feature of nsEVDx is its flexible support for non-stationary modeling, where the location, scale, and shape parameters can each depend on arbitrary, user-defined covariates. This enables practical applications such as linking extremes to other variables (e.g., rainfall extremes to temperature or maximum stock market losses to market volatility indice
tempdisagg is a modern, extensible, and production-ready Python framework for temporal disaggregation of time series data. It transforms low-frequency aggregates into consistent, high-frequency estimates using a wide array of econometric techniques-including Chow-Lin, Denton, Litterman, Fernandez, and uniform interpolation-as well as enhanced variants with automated estimation of key parameters such as the autocorrelation coefficient rho. The package introduces features beyond classical methods, including robust ensemble modeling via non-negative least squares optimization, post-estimation correction of negative values under multiple aggregation rules, and optional regression-based imputation of missing values through a dedicated Retropolarizer module. Architecturally, it follows a modular design inspired by scikit-learn, offering a clean API for validation, modeling, visualization, and result interpretation.
The Python library FatGHol FatGHoL used in Murri2012 to reckon the rational homology of the moduli space of Riemann surfaces is an example of a non-numeric scientific code: most of the processing it does is generating graphs (represented by complex Python objects) and computing their isomorphisms (a triple of Python lists; again a nested data structure). These operations are repeated many times over: for example, the spaces and are triangulated by 4'583'322 and 747'664 graphs, respectively. This is an opportunity for every Python runtime to prove its strength in optimization. The purpose of this experiment was to assess the maturity of alternative Python runtimes, in terms of: compatibility with the language as implemented in CPython 2.7, and performance speedup. This paper compares the results and experiences from running FatGHol with different Python runtimes: CPython 2.7.5, PyPy 2.1, Cython 0.19, Numba 0.11, Nuitka 0.4.4 and Falcon.
The Python colorspace package provides a toolbox for mapping between different color spaces which can then be used to generate a wide range of perceptually-based color palettes for qualitative or quantitative (sequential or diverging) information. These palettes (as well as any other sets of colors) can be visualized, assessed, and manipulated in various ways, e.g., by color swatches, emulating the effects of color vision deficiencies, or depicting the perceptual properties. Finally, the color palettes generated by the package can be easily integrated into standard visualization workflows in Python, e.g., using matplotlib, seaborn, or plotly.
In this paper we present the new Dune-Python module which provides Python bindings for the Dune core, which is a C++ environment for solving partial differential equations. The aim of this new module is to firstly provide the general infrastructure for exporting realizations of statically polymorphic interfaces based on just-in-time compilation and secondly to provide bindings for the central interfaces of the dune core modules. In the first release we focus on the grid interface. Our aim is to only introduce a thin layer when passing objects into Python which can be removed when the object is passed back into a C++ algorithm. Thus no efficiency is lost and little additional code maintenance cost is incurred. To make the transition for Dune users to the Python environment straightforward the Python classes provide a very similar interface to their C++ counterparts. In addition, vectorized versions of many interfaces allow for more efficient code on the Python side. The infrastructure for exporting these interfaces and the resulting bindings for a Dune grid are explained in detail in this paper for both experienced Dune users and others interested in a flexible Python environment fo
The Epiphany is a many-core, low power, low on-chip memory architecture and one can very cheaply gain access to a number of parallel cores which is beneficial for HPC education and prototyping. The very low power nature of these architectures also means that there is potential for their use in future HPC machines, however there is a high barrier to entry in programming them due to the associated complexities and immaturity of supporting tools. In this paper we present our work on ePython, a subset of Python for the Epiphany and similar many-core co-processors. Due to the limited on-chip memory per core we have developed a new Python interpreter and this, combined with additional support for parallelism, has meant that novices can take advantage of Python to very quickly write parallel codes on the Epiphany and explore concepts of HPC using a smaller scale parallel machine. The high level nature of Python opens up new possibilities on the Epiphany, we examine a computationally intensive Gauss-Seidel code from the programmability and performance perspective, discuss running Python hybrid on both the host CPU and Epiphany, and interoperability between a full Python interpreter on the
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks, integrating 19 detection algorithms and 14 diverse datasets. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. The library is designed with modularity in mind, allowing users to easily adapt and extend components for various use cases. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions. By integrating popular deep learning frameworks, SocialED ensures high efficiency and scalability across both CPU and GPU environments. The library is built adhering to high code quality standards, including unit testing, continuous integration, and code coverage, ensuring that SocialED delivers robust, maintainable software. SocialED is publicly available at \url{https://github.com/RingBDStack/SocialED} and can be installed via PyPI.
Quantum entanglement is a captivating phenomenon in quantum physics, characterized by intricate and non-classical correlations between particles. This phenomenon plays a crucial role in quantum computing and measurement processes. In this hands-on course students explore the dynamics of quantum systems with up to three spins, introducing the fundamental mechanisms by which entanglement evolves and transfers within such systems. Through detailed examples, simulations, and analyses, the hands-on course offers insights into the fundamental principles of entanglement. The provided \code{python} modules reproduce the presented results and serve as a basis for further projects. The target audience of this hands-on course is physics enthusiasts among students in their first semesters.
Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework, but is less optimized for scaling certain data-intensive workflows. We investigate these limitations and detail the technical contributions necessary to develop a robust solution for distributed applications and demonstrate improved performance on synthetic benchmarks and real applications.
In this paper, we implement exponential integrators, specifically Integrating Factor (IF) and Exponential Time Differencing (ETD) methods, using pseudo-spectral techniques to solve phase-field equations within a Python framework. These exponential integrators have showcased robust performance and accuracy when addressing stiff nonlinear partial differential equations. We compare these integrators to the well-known implicit-explicit (IMEX) Euler integrators used in phase-field modeling. The synergy between pseudo-spectral techniques and exponential integrators yields significant benefits for modeling intricate systems governed by phase-field dynamics, such as solidification processes and pattern formation. Our comprehensive Python implementation illustrates the effectiveness of this combined approach in solving phase-field model equations. The results obtained from this implementation highlight the accuracy and computational advantages of the ETD method compared to other numerical techniques.
OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn plugin and a plugin mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation is available at https://github.com/openml/openml-python/.
The recent successes and wide spread application of compute intensive machine learning and data analytics methods have been boosting the usage of the Python programming language on HPC systems. While Python provides many advantages for the users, it has not been designed with a focus on multi-user environments or parallel programming - making it quite challenging to maintain stable and secure Python workflows on a HPC system. In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds for efficiently maintaining multi-user Python software environments, securing and restricting resources of Python jobs and containing Python processes, while focusing on Deep Learning applications running on GPU clusters.