1. Introduction

Python is a modern and friendly language with clean syntax and good readability. It interfaces easily with external applications, allows quick implementation through scripts, is supported by a wide community of developers and has a large collection of scientific libraries. In addition, it supports High Performance Computing (HPC) through integrated or external libraries. Combining Python with an interactive shell like IPython provides a powerful programming environment that allows rapid prototyping. Python is distributed with vendor software packages such as Intel's and is available in most supercomputer programming environments. Application programs implemented in languages like Fortran 90 (hereinafter referred to as F90) or C, even those that require massively parallel processing, can be encapsulated in the Python environment through modular wrappers, possibly with some loss of performance. Such flexibility facilitates simulation, analysis and data visualization [3], including in large-scale scientific applications. Python therefore provides a friendly and interactive programming environment that is convenient for trial and error, greedy optimization, or other common exploratory schemes. According to the IEEE Spectrum ranking, Python is the most popular programming language, followed by Java and C [6].

This work explores some of the parallelization approaches available in the Python ecosystem, which includes libraries, frameworks and tools, on the Santos Dumont (SDumont) supercomputer of the National Scientific Computing Laboratory (LNCC). The intention is to describe and discuss some HPC approaches available in Python programming, such as parallel processing or the use of a Graphics Processing Unit (GPU). The performance of the Python HPC approaches is compared to that of the corresponding serial and parallel F90 implementations for a specific test problem. There is a trade-off between ease of programming and processing performance: in languages like F90 or C, implementing an application is more difficult than in Python, but the resulting code is simpler to optimize and parallelize and provides better performance. However, today there are many libraries and frameworks that provide HPC capabilities for Python, making it difficult to analyze this trade-off.

In short, this work implements an HPC application in Python, aiming to provide guidance for those who are starting in the Python ecosystem and perhaps to attract more users to the HPC environment. It compares the serial and parallel performance of different implementations of a 5-point stencil test problem, related to a 2D heat transfer problem. The implementations were coded in F90 and also in Python using some of its HPC resources, and the code is available at https://github.com/efurlanm/bs21. Because countless HPC resources are available for Python, only some of the more common ones were chosen for this work. Except for a GPU implementation, parallelization is achieved using the Message Passing Interface (MPI) communication library [1]. The following general considerations explain the scope of this work within the vast Python environment:

  1. The Python environment is very diverse, and Python code can be linked to a multitude of APIs/libraries for HPC, so programs can be written in many different ways;
  2. The Python implementations in this work include HPC solutions for standard Python, Cython, Numba, Numba-GPU and F2Py, but there are many others;
  3. The Python multiprocessing environment allows parallelization by MPI processes, OpenMP threads or even GPU threads. However, in this work, Python implementations are based on MPI, except for the GPU-based implementation, used to exemplify the use of such a processing accelerator. All implementations were executed on a supercomputer;
  4. The performance results are specific to the selected test problem and its size. Different applications and problem sizes can therefore be expected to lead to a different analysis of processing performance. Nevertheless, the results can help the Python programmer choose a more convenient Python HPC implementation.
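
To make the test problem more concrete, the 5-point stencil update for a 2D heat transfer problem can be sketched in serial NumPy as shown below. This is only a minimal illustration under stated assumptions and is not the code from the repository: the function names `stencil_step` and `solve`, the grid size, the number of steps, the fixed hot top boundary and the coefficient `alpha` are all hypothetical choices made for this example.

```python
import numpy as np


def stencil_step(u, alpha=0.25):
    """Apply one explicit 5-point stencil update to the interior of a 2D grid.

    Boundary values are left unchanged (assumed fixed-temperature boundaries).
    With alpha = 0.25 each interior point becomes the mean of its 4 neighbours.
    """
    new = u.copy()
    new[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * (
        u[:-2, 1:-1]      # neighbour above
        + u[2:, 1:-1]     # neighbour below
        + u[1:-1, :-2]    # neighbour to the left
        + u[1:-1, 2:]     # neighbour to the right
        - 4.0 * u[1:-1, 1:-1]
    )
    return new


def solve(n=16, steps=100, hot=100.0):
    """Iterate the stencil on an n x n grid with a hot top edge (assumed setup)."""
    u = np.zeros((n, n))
    u[0, :] = hot  # hypothetical boundary condition: top edge held at `hot`
    for _ in range(steps):
        u = stencil_step(u)
    return u


if __name__ == "__main__":
    grid = solve()
    print(grid.shape, float(grid[1:-1, 1:-1].mean()))
```

Each interior point depends only on its four nearest neighbours, which is why this kind of problem is commonly parallelized by splitting the grid among processes and exchanging only the border rows or columns between them.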

This work is a small primer on the use of HPC resources in the Python programming environment, in particular on a supercomputer.