2. Material and methods
Raissi et al. [3] published an article about PINNs that has 7,217 citations (December 2023). That work defines PINNs as DNNs trained to solve supervised learning tasks while complying with physical laws, usually described by nonlinear PDEs. It also describes the use of DNNs to solve PDEs and to obtain physics-informed surrogates of the physical model that are fully differentiable in all coordinates and free parameters. PINNs form a new class of data-efficient universal function approximators, which can be effectively trained using small datasets and which may encode any underlying physical law.
DNN training data can be randomly sampled from observational data or generated synthetically through simulations with a numerical model. Apart from the case of synthetically generated data, as long as a sufficient number of collocation points (CPs) is available, a standard DNN can solve the PDE; otherwise, a PINN is required. A PINN uses a specific loss function incorporating the PDE and its parameters, so that the applicable physical law is enforced during the training phase on the set of CPs [4].
PINNs can be considered neural networks for supervised learning problems, as proposed here. However, PINNs can also be used as agents for Reinforcement Learning (RL) [4]. The most common PINN architectures are Multi-layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Newer architectures include the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Bayesian Deep Learning (BDL) [4]. This work uses the MLP architecture.
The proposed test problem requires the parameter discovery of a particular one-dimensional Burgers' equation, which describes the velocity field \(u\) over time (Equation 1). Training data for the PINN is given by a set of CPs, corresponding to the velocity field at different times, randomly generated within the considered domain.
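As an illustration, a minimal sketch of how such CPs could be sampled uniformly at random within the domain of Equation 1 is given below; the variable names (`N_u`, `t_u`, `x_u`) are illustrative and not taken from the actual implementation.

```python
import numpy as np

# Illustrative sketch: sample N_u collocation points (t, x) uniformly at
# random inside the domain of Equation 1, t in [0, 1] and x in [-1, 1].
# Names and sampling strategy are assumptions, not the paper's code.
N_u = 2000
np.random.seed(0)

t_u = np.random.uniform(0.0, 1.0, size=(N_u, 1))   # temporal coordinates
x_u = np.random.uniform(-1.0, 1.0, size=(N_u, 1))  # spatial coordinates
X_u = np.hstack([t_u, x_u])                        # network inputs (t, x)
```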
In the training phase, the neural network estimates a solution \(u(t,x)\). The function employed by the PINN, \(f(t,x)\) (Equation 2), is derived from the known Burgers' equation and is used to build the loss function. The parameters of the differential operator that we want to obtain are transformed into PINN parameters. In the following equations, the differential operator parameter \(\lambda_1\) is the coefficient of the convective term \(u\,u_x\), where \(u\) is the fluid velocity at the indicated spatial and temporal coordinates; the differential operator parameter \(\lambda_2\) (or \(\nu\)) is the kinematic viscosity of the fluid; and the subscripts denote partial differentiation in time and space: \(u_t = \frac{\partial u}{\partial t}\), \(u_x = \frac{\partial u}{\partial x}\), and \(u_{xx} = \frac{\partial^2 u}{\partial x^2}\).
$$ u_t + \lambda_1 u\, u_x - \lambda_2 u_{xx} = 0, \quad x \in [-1,1], \ t \in [0, 1] \tag{1} $$
The Burgers' equation is employed to evaluate the residual \(f\) of the solution \(u(t,x)\) estimated by the PINN, as shown in Equation 2.
$$ f := u_t + \lambda_1 u\, u_x - \lambda_2 u_{xx} \tag{2} $$
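To make the role of automatic differentiation concrete, the sketch below shows one way the residual of Equation 2 can be computed in TensorFlow 1.x, the library version used in this work. The network construction (`tf.layers.dense`), the layer sizes, the initial values of \(\lambda_1\) and \(\lambda_2\), and all variable names are illustrative assumptions; the snippet does not reproduce the actual implementation.

```python
import tensorflow as tf  # TensorFlow 1.15

# Illustrative sketch (not the actual implementation): the residual f of
# Equation 2 obtained by automatic differentiation of the network output.
t = tf.placeholder(tf.float32, shape=[None, 1])
x = tf.placeholder(tf.float32, shape=[None, 1])

# lambda_1 and lambda_2 are trainable variables, learned jointly with the
# network weights; 0.0 is only an arbitrary initial guess.
lambda_1 = tf.Variable(0.0, dtype=tf.float32)
lambda_2 = tf.Variable(0.0, dtype=tf.float32)

# Simple MLP surrogate u(t, x) with tanh activations (sizes are illustrative).
h = tf.concat([t, x], axis=1)
for _ in range(8):
    h = tf.layers.dense(h, 20, activation=tf.tanh)
u = tf.layers.dense(h, 1, activation=None)

# Partial derivatives via automatic differentiation.
u_t = tf.gradients(u, t)[0]
u_x = tf.gradients(u, x)[0]
u_xx = tf.gradients(u_x, x)[0]

# Residual of the Burgers' equation (Equation 2).
f = u_t + lambda_1 * u * u_x - lambda_2 * u_xx
```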
In this work, the PINN loss function to be minimized is given by the sum (Equation 3) of two mean squared error (MSE) components: \(MSE_u\), which embeds the training data on \(u(t,x)\), and \(MSE_f\), which embeds the structure imposed by Equation 1, where \(t\) is the time coordinate and \(x\) is the one-dimensional spatial coordinate. The neural network parameters, along with the differential operator parameters \(\lambda_1\) and \(\lambda_2\), are learned by minimizing this MSE.
$$ MSE = MSE_u + MSE_f \tag{3} $$
where
$$ MSE_u = \frac{1}{N}\sum_{i=1}^{N}\left|u(t^i_u, x^i_u)-u^i\right|^2 $$
and
$$ MSE_f = \frac{1}{N}\sum_{i=1}^{N}\left|f(t^i_u, x^i_u)\right|^2 $$
Here, \(\{t^i_u, x^i_u, u^i\}^N_{i=1}\) denotes the training data on \(u(t, x)\): the \(MSE_u\) loss corresponds to the training data on \(u(t, x)\), and the \(MSE_f\) loss imposes the structure of Equation 1 on a finite set of CPs. The number and locations of the CPs are the same as those of the training data.
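Continuing the sketch given after Equation 2 (and reusing its tensors `u` and `f`), the loss of Equation 3 can be assembled in a few lines; again, this is only an illustration of the structure of the loss, not the actual implementation.

```python
# Continuation of the previous sketch (illustrative): the loss of Equation 3.
# u_obs holds the training data u^i at the same N CPs used for the residual.
u_obs = tf.placeholder(tf.float32, shape=[None, 1])

MSE_u = tf.reduce_mean(tf.square(u - u_obs))  # data term, MSE_u
MSE_f = tf.reduce_mean(tf.square(f))          # physics term, MSE_f
loss = MSE_u + MSE_f                          # Equation 3
```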
In this work, a dataset of 2,000 points generated by the numerical Gaussian Quadrature Method (GQM), using $ \lambda_1 = 1 $ and $ \lambda_2 = 0.01/\pi $, was used to obtain the CPs; these points are also used for comparison with the results obtained through the PINN. The GQM is a numerical algorithm that approximates the definite integral of a function as a weighted sum of the function values at specified points within the domain of integration [5].
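The weighted-sum principle behind Gaussian quadrature can be illustrated with a short NumPy example; this is only an illustration of the quadrature rule described in [5], not the code used to generate the 2,000-point dataset.

```python
import numpy as np

# Illustration of the Gaussian quadrature principle: the definite integral of
# a function is approximated as a weighted sum of its values at given nodes.
nodes, weights = np.polynomial.legendre.leggauss(10)  # 10 nodes/weights on [-1, 1]

f = lambda x: np.exp(x)
approx = np.sum(weights * f(nodes))        # weighted sum of function values
exact = np.exp(1.0) - np.exp(-1.0)         # analytic value of the integral

print(approx, exact)  # the two values agree to near machine precision
```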
When training a PINN, some important adjustable hyperparameters are the number of hidden layers \(N_l\ (l = 1, 2, \ldots)\) and the number of neurons in each layer \(N_{le}\ (e = 1, 2, \ldots)\). It is generally understood that the efficient tuning of \(N_l\) and \(N_{le}\) remains an open problem, and in practice they are determined empirically [7].
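A common way to organize such an empirical search is a simple grid over \(N_l\) and \(N_{le}\), as sketched below. The ranges follow those stated in Section 2.1, while the list-of-layer-sizes representation and the specific neuron counts sampled are illustrative assumptions.

```python
# Illustrative sketch: empirical grid over the number of hidden layers (N_l)
# and neurons per hidden layer (N_le), using the ranges of Section 2.1.
configurations = []
for N_l in range(1, 9):            # 1 to 8 hidden layers
    for N_le in (10, 20, 30):      # neurons per layer (sampled values, assumption)
        # layer sizes: 2 inputs (t, x), N_l hidden layers, 1 output u
        layers = [2] + [N_le] * N_l + [1]
        configurations.append(layers)
```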
The results obtained in this work using a DNN are subject to the problems of overfitting and underfitting. Overfitting means that the DNN performs very well on the training data but fails as soon as it has to deal with new data from the problem domain, i.e., it does not generalize. Underfitting, on the other hand, means that the model performs poorly on both datasets, i.e., it fails to capture the underlying structure of the data. Both issues negatively affect predictive performance [6].
The Relative L2 Error used in this work is defined in Equation 4, where \(\|\widehat{U} - U\|\) is the L2 norm of the prediction deviation at a given time, and \(\|U\|\) denotes the L2 norm of the synthetic data at that time. \(R_{L2}\) gives a good quantification of the prediction accuracy at a given time [7].
$$ R_{L2} = \frac{\| \widehat{U} - U \|}{\| U \|} \tag{4} $$
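A direct NumPy translation of Equation 4 is sketched below; the names `u_pred` and `u_ref`, standing for the PINN prediction and the synthetic (GQM) data at a given time, are illustrative.

```python
import numpy as np

def relative_l2_error(u_pred, u_ref):
    """Relative L2 error of Equation 4 at a given time (illustrative sketch)."""
    return np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref)
```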
2.1 PINN Implementation
The specific PINN architecture implemented in this work is an MLP network with an input layer of 2 neurons, a number of hidden layers ranging from 1 to 8, each hidden layer having a number of neurons ranging from 10 to 30, and an output layer with one neuron. The loss function is the mean squared error (MSE). Minimization of the loss function is performed by an optimization method, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm, a quasi-Newton method. All hidden layers employ the hyperbolic tangent as the activation function. The implementation is configured to stop training after 50,000 iterations, or earlier when further reduction of the calculated error is limited by the hardware's floating-point precision.
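A minimal sketch of how such a configuration can be assembled in TensorFlow 1.x is shown below. The use of `tf.contrib.opt.ScipyOptimizerInterface` is one way to run L-BFGS(-B) in TensorFlow 1.15, and the layer sizes and option values mirror the description above; the snippet is illustrative and does not reproduce the actual implementation. For brevity, only the data term of the loss is shown.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.15

# Illustrative sketch of the MLP described above: 2 inputs (t, x), 1 to 8
# hidden layers of 10 to 30 tanh neurons, and 1 output neuron for u.
def build_mlp(inputs, n_hidden_layers=8, n_neurons=20):
    h = inputs
    for _ in range(n_hidden_layers):
        h = tf.layers.dense(h, n_neurons, activation=tf.tanh)
    return tf.layers.dense(h, 1, activation=None)

X = tf.placeholder(tf.float32, shape=[None, 2])      # (t, x) pairs
u_data = tf.placeholder(tf.float32, shape=[None, 1])  # training data u^i

u_pred = build_mlp(X)
loss = tf.reduce_mean(tf.square(u_pred - u_data))     # MSE_u only (MSE_f omitted)

# L-BFGS-B via SciPy: stop after 50,000 iterations or when the loss decrease
# falls below the machine's floating-point precision (ftol).
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss,
    method='L-BFGS-B',
    options={'maxiter': 50000,
             'maxfun': 50000,
             'ftol': 1.0 * np.finfo(float).eps})
```

In this setup, training is launched with `optimizer.minimize(sess, feed_dict=...)` inside a `tf.Session`, which repeatedly evaluates the loss and its gradients until one of the stopping criteria is met.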
The PINN implementation is based on the work of Raissi et al. (2019) [3] and uses the TensorFlow 1.15 library and the Python 3.7 interpreter. Code snippets using the TensorFlow library are shown in Listing 1 and Listing 2. The code was run on the SDumont supercomputer using an NVIDIA V100 GPU.
To obtain the results, the network is first trained until the parameters are obtained; the prediction is then made and compared with the values of the training dataset, which is therefore used both to train the network and to assess the results. The implementation does not explicitly split the dataset into training, validation, and test sets; such a split would be an improvement to be investigated in future work.