Overcoming Challenges in Physics-Informed Neural Networks (PINNs): Gradient Optimization for Inverse Problems

Physics-Informed Neural Networks (PINNs) are a promising approach for solving complex partial differential equations (PDEs) and inverse problems, such as determining hidden parameters of a physical system, for instance the viscosity nu in the 2D Burgers equation. However, applying PINNs in practice presents inherent challenges. A study by Wang et al. (2020) delves into these limitations and proposes solutions that can significantly enhance PINN performance.

The Fundamental Problem: Unbalanced Gradients and Stiff Dynamics

PINNs operate by embedding PDE constraints directly into the neural network's loss function. This composite loss typically includes a term that penalizes the PDE residual (Lr), ensuring that the physical laws are satisfied, and terms that fit observed data, initial conditions, or boundary conditions (Li) (Wang et al., 2020, p. 3, Equation 4). During training, the neural network adjusts its parameters to minimize this combined loss (Wang et al., 2020, p. 3).
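In the paper's notation, the composite objective takes roughly the following form (a schematic rendering of Equation 4; setting all λi = 1 recovers the standard PINN formulation, while the annealing scheme discussed below adapts them during training):

```latex
\mathcal{L}(\theta) = \mathcal{L}_r(\theta) + \sum_{i=1}^{M} \lambda_i \, \mathcal{L}_i(\theta),
\qquad
\mathcal{L}_r(\theta) = \frac{1}{N_r} \sum_{j=1}^{N_r} \bigl| r\bigl(x_r^{j}; \theta\bigr) \bigr|^{2}
```

Here r(x; θ) is the PDE residual of the network prediction evaluated at collocation points, and each Li is an analogous mean-squared misfit on data, initial, or boundary points.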

Wang et al. (2020) identified a fundamental mode of failure in PINNs: stiffness in the gradient flow dynamics, which leads to unbalanced back-propagated gradients during model training (Wang et al., 2020, p. 2, 4-5). This means that the gradients from different parts of the loss function can have vastly different magnitudes. For example, the gradients from the PDE residual term (Lr) can be significantly larger than those from the data-fit terms (Li) (Wang et al., 2020, p. 4).

How does this affect nu estimation in the 2D Burgers equation? When attempting to infer the viscosity parameter nu (an inverse problem), your PINN's loss function will include a term for the 2D Burgers equation residual (which depends on nu) and terms for fitting any available observational data. If the residual term's gradients dominate, the neural network may become very efficient at satisfying the PDE while neglecting the constraints from the observational data. The result is a poor fit to real-world measurements and, consequently, an inaccurate estimate of nu.

The authors demonstrate that this stiffness is exacerbated when the target solution is more complex or oscillatory (Wang et al., 2020, p. 7). They illustrate this with the Helmholtz equation, where the standard PINN model fails to fit the boundary conditions because the gradients of the boundary loss, ∇θLub(θ), are "sharply concentrated around zero and overall attain significantly smaller values" than the gradients of the PDE residual loss, ∇θLr(θ) (Wang et al., 2020, p. 4, Figure 2). When ∇θLub(θ) is nearly zero almost everywhere, gradient descent makes little progress on the boundary term, and the PINN struggles to fit the boundary conditions (Wang et al., 2020, p. 4).
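Returning to the Burgers setting: to make the inverse-problem setup concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code; the network size, point sampling, and names such as BurgersPINN are assumptions) of a PINN for the 2D viscous Burgers system with the viscosity exposed as a trainable parameter:

```python
import torch
import torch.nn as nn

class BurgersPINN(nn.Module):
    def __init__(self, width=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 2),          # outputs (u, v)
        )
        # Optimize log(nu) so the inferred viscosity stays positive.
        self.log_nu = nn.Parameter(torch.tensor(0.0))

    def forward(self, x, y, t):
        return self.net(torch.cat([x, y, t], dim=1))

def d(f, x):
    # Derivative of f w.r.t. x, keeping the graph for higher-order terms.
    return torch.autograd.grad(f, x, grad_outputs=torch.ones_like(f),
                               create_graph=True)[0]

def burgers_residual(model, x, y, t):
    uv = model(x, y, t)
    u, v = uv[:, :1], uv[:, 1:]
    nu = model.log_nu.exp()
    # 2D viscous Burgers system:
    #   u_t + u u_x + v u_y = nu (u_xx + u_yy)
    #   v_t + u v_x + v v_y = nu (v_xx + v_yy)
    r_u = (d(u, t) + u * d(u, x) + v * d(u, y)
           - nu * (d(d(u, x), x) + d(d(u, y), y)))
    r_v = (d(v, t) + u * d(v, x) + v * d(v, y)
           - nu * (d(d(v, x), x) + d(d(v, y), y)))
    return r_u, r_v

# Collocation points must carry gradients for the autograd derivatives above,
# e.g. x = torch.rand(1024, 1, requires_grad=True), and likewise for y and t.
```

A training loop would then combine the mean-squared residual from burgers_residual with a mean-squared misfit on observed (u, v) values; it is exactly this pairing whose gradients the paper shows can become unbalanced.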

Solution 1: Learning Rate Annealing Algorithm for Balanced Learning

To counteract this gradient imbalance, the work proposes an adaptive learning rate annealing algorithm (Wang et al., 2020, p. 8). The algorithm dynamically adjusts the weights (λi) of the different terms within the composite loss function during training. It uses gradient statistics to compute instantaneous values for these weights, effectively re-scaling the learning rate seen by each loss term (Wang et al., 2020, p. 9-10, Algorithm 1).

How this helps with nu inference in the 2D Burgers equation: this is a crucial insight for inverse problems. By dynamically adjusting the weights, the algorithm ensures that neither the PDE residual term (which is highly sensitive to nu) nor the data-fit terms are neglected (Wang et al., 2020, p. 9). This adaptive balancing allows the network to simultaneously learn to satisfy the underlying physics and accurately fit observed data, leading to more robust and precise inference of the viscosity parameter nu. Concretely, the algorithm computes λ̂i as the ratio between the maximum gradient magnitude of ∇θLr(θn) and the mean gradient magnitude of ∇θLi(θn), then updates λi with a moving average (Wang et al., 2020, p. 10, Equations 40-41). The authors show that this method, by itself, can improve predictive accuracy by more than an order of magnitude on the Helmholtz problem (Wang et al., 2020, p. 11, Figure 6).
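A schematic rendering of that update rule (our PyTorch sketch; the function name update_loss_weights and the smoothing value α = 0.9 are illustrative assumptions, and data_losses are the unweighted Li):

```python
import torch

def update_loss_weights(model, loss_r, data_losses, lambdas, alpha=0.9):
    """One step of the gradient-statistics weight update (cf. Equations 40-41)."""
    params = [p for p in model.parameters() if p.requires_grad]
    # retain_graph=True keeps the graph alive for the later backward pass.
    grads_r = torch.autograd.grad(loss_r, params, retain_graph=True,
                                  allow_unused=True)
    max_grad_r = torch.stack([g.abs().max()
                              for g in grads_r if g is not None]).max()
    new_lambdas = []
    for loss_i, lam in zip(data_losses, lambdas):
        grads_i = torch.autograd.grad(loss_i, params, retain_graph=True,
                                      allow_unused=True)
        mean_grad_i = torch.cat([g.abs().flatten()
                                 for g in grads_i if g is not None]).mean()
        lam_hat = (max_grad_r / mean_grad_i).detach()
        # Moving average: lambda_i <- (1 - alpha) * lambda_i + alpha * lambda_hat_i
        new_lambdas.append((1 - alpha) * lam + alpha * lam_hat)
    return new_lambdas
```

The returned λi then multiply the corresponding Li terms in the composite loss before the usual optimizer step.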

Solution 2: A More Resilient Neural Network Architecture

Beyond the training algorithm, the architecture of the neural network itself plays a vital role (Wang et al., 2020, p. 12). The authors introduce a novel neural network architecture designed to be more resilient to gradient pathologies and to exhibit less stiffness in the gradient flow dynamics than conventional fully-connected networks (Wang et al., 2020, p. 12, 22). This architecture incorporates two ingredients (a sketch of the forward pass follows the list):

  1. Explicit multiplicative interactions between different input dimensions (Wang et al., 2020, p. 12, Equation 43).
  2. Enhanced hidden states with residual connections (Wang et al., 2020, p. 12, Equations 44-46).
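A minimal sketch of such a forward pass (our PyTorch rendering of the scheme in Equations 43-46; the layer sizes and the class name ImprovedMLP are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ImprovedMLP(nn.Module):
    def __init__(self, in_dim=3, width=50, out_dim=2, depth=4):
        super().__init__()
        self.encoder_u = nn.Linear(in_dim, width)
        self.encoder_v = nn.Linear(in_dim, width)
        self.input_layer = nn.Linear(in_dim, width)
        self.hidden = nn.ModuleList(nn.Linear(width, width)
                                    for _ in range(depth))
        self.output_layer = nn.Linear(width, out_dim)

    def forward(self, x):
        act = torch.tanh
        u = act(self.encoder_u(x))      # U = phi(X W1 + b1)
        v = act(self.encoder_v(x))      # V = phi(X W2 + b2)
        h = act(self.input_layer(x))
        for layer in self.hidden:
            z = act(layer(h))           # Z^(k) = phi(H^(k) W + b)
            h = (1 - z) * u + z * v     # multiplicative mixing of U and V
        return self.output_layer(h)
```

Each hidden state is a pointwise mixture of the two input encodings U and V, so information about the inputs re-enters at every layer rather than only at the first.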

This design leads to a decrease in the largest eigenvalue of the loss Hessian, σmax(∇²θL(θ)), an indicator of stiffness in the gradient flow, suggesting more stable training (Wang et al., 2020, p. 13). The proposed architecture, termed Model M3, showed an approximately 3× smaller leading Hessian eigenvalue than the conventional dense architecture (Model M1) (Wang et al., 2020, p. 13, Figure 11).
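The paper tracks this eigenvalue during training; one lightweight way to reproduce such a diagnostic (our sketch, approximating σmax by power iteration on autograd Hessian-vector products, without ever forming the Hessian) is:

```python
import torch

def max_hessian_eigenvalue(loss, params, n_iter=20):
    # First-order gradient with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.flatten() for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    for _ in range(n_iter):
        # Hessian-vector product: d(grad . v)/dtheta, no explicit Hessian needed.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.flatten() for h in hv])
        eig = v @ hv                # Rayleigh quotient with the current unit v
        v = hv / hv.norm()
    # Power iteration converges to the eigenvalue of largest magnitude,
    # which serves here as the stiffness indicator.
    return eig.item()
```

Calling max_hessian_eigenvalue(loss, list(model.parameters())) periodically during training gives a rough trace of how stiff the gradient flow dynamics are.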

How this helps with nu inference in the 2D Burgers equation: using this enhanced architecture can make your PINN for the 2D Burgers equation more stable and efficient to train, even for complex flow patterns or parameter regimes that might otherwise cause training instability. A more stable network is better equipped to explore the parameter space effectively, leading to a more reliable and accurate estimate of nu.

The Combined Power: Orders of Magnitude Improvements

The most compelling finding is that the combination of the learning rate annealing algorithm and the improved neural network architecture (Model M4) consistently delivers the most accurate results (Wang et al., 2020, p. 14, 16). This combined approach improved the predictive accuracy of PINNs by a factor of 50 to 100 across a range of problems in computational physics (Wang et al., 2020, p. 2, 22).

For your inverse problem of estimating nu in the 2D Burgers equation, this means that by adopting these proposed methodologies (Model M4), you can expect significantly more precise and robust inferences of the viscosity parameter compared to standard PINN formulations (Wang et al., 2020, p. 16, Table 3). For example, in the Klein-Gordon problem, Model M4 achieved a relative L2 error of 2.81e-03, while Model M1 had 1.79e-01, representing nearly two orders of magnitude improvement (Wang et al., 2020, p. 16, Table 3).

Conclusion

The work of Wang et al. (2020) offers crucial insights into the difficulties of PINNs and presents practical, effective solutions. When addressing the inverse problem of identifying the viscosity nu of the 2D Burgers equation with PINNs, it is essential to consider:

  • Gradient pathologies: Unbalanced gradients and stiff dynamics are common challenges (Wang et al., 2020, p. 2, 4-5).
  • Adaptive loss weighting: Utilize a learning rate annealing algorithm (such as Algorithm 1) to dynamically balance the contributions of the PDE residual and data-fit terms (Wang et al., 2020, p. 9-10).
  • Improved neural architecture: Employ architectures designed to reduce gradient flow stiffness for more stable training (Wang et al., 2020, p. 13).

By incorporating these advancements, it is possible to unlock the full potential of PINNs, transforming a challenging inverse problem into a more manageable and precise endeavor.

References

Wang, S., Teng, Y., & Perdikaris, P. (2020). Understanding and mitigating gradient pathologies in physics-informed neural networks. arXiv preprint arXiv:2001.04536. Code: https://github.com/PredictiveIntelligenceLab/GradientPathologiesPINNs