Project Details

Globally Optimal Neural Network Training

Subject Area: Mathematics
Term: since 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 463910157
 
The training of artificial neural networks is the central optimization task in deep learning and one of the most important optimization problems in machine learning. Because of its intrinsic computational complexity, local methods such as stochastic gradient descent are typically applied. Different methods and initializations exhibit very different behaviors in terms of generalization, robustness (to noise or adversarial perturbations), and explainability (saliency).

Because of the fundamental importance of the training problem for neural networks, it is thus important and natural to investigate methods that obtain globally optimal solutions to this training problem and to study the structure of these solutions. On a technical level, this will enable us to compare against the results of local algorithms on small-scale networks.

In a nutshell, our goal is to compute and analyze globally optimal solutions to neural network training problems and to study their generalization, explainability, and robustness behavior.

In order to achieve these goals we will:

1. Leverage integer programming methods: We will approach the training problem using techniques from mixed-integer nonlinear programming. Existing methods will be extended and improved for this particular case; in particular, we intend to improve existing solution techniques based on spatial branch-and-cut. This requires exploiting both the model and the network structure (a minimal illustrative formulation is sketched after this list).

2. Exploit symmetry: We will exploit possible symmetries of the data and the network to speed up the solution process, reduce the computational burden, and guarantee a symmetric solution. This is important when the underlying problem exhibits symmetries that have to be captured by the neural network. Symmetry handling then has to be incorporated directly into an exact optimization approach in order to understand the fundamental possibilities and limits of exploiting these structures. Our goal is to develop methods that perform symmetry handling automatically in a generic setup that carries over to other and future network architectures (see the ordering constraints sketched below).

3. Ensure sparsity: We will incorporate true sparsity (in the l0 sense) into neural networks. Sparsification is often approached heuristically, e.g., by iterative thresholding, which frequently leads to suboptimal overall sparsity. Minimizing or bounding the number of nonzero weights, however, can be incorporated directly into the mixed-integer nonlinear programming framework (see the cardinality formulation sketched below).
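To make the mixed-integer nonlinear programming viewpoint of point 1 concrete, the following is a minimal sketch (not the project's actual formulation) of posing the training of a tiny one-hidden-layer ReLU network with squared loss as an MINLP in Pyomo. The synthetic data, the weight bounds, the big-M constant, and the commented-out call to a global solver such as SCIP are illustrative assumptions: binary variables model the on/off state of each ReLU via a big-M encoding, and the products of output weights and hidden activations make the objective nonconvex.

import numpy as np
import pyomo.environ as pyo

# Tiny synthetic data set (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(6, 2))   # 6 samples, 2 features
y = np.sign(X[:, 0] * X[:, 1])            # a simple nonlinear target in {-1, +1}

n, d = X.shape
H = 2        # number of hidden ReLU neurons
M = 10.0     # big-M bound on pre-activations (valid because weights and inputs lie in [-1, 1])

m = pyo.ConcreteModel()
m.I = pyo.RangeSet(0, n - 1)   # data points
m.J = pyo.RangeSet(0, d - 1)   # input features
m.K = pyo.RangeSet(0, H - 1)   # hidden neurons

# Trainable parameters, boxed so that the big-M bound is valid.
m.W1 = pyo.Var(m.K, m.J, bounds=(-1, 1))
m.b1 = pyo.Var(m.K, bounds=(-1, 1))
m.W2 = pyo.Var(m.K, bounds=(-1, 1))
m.b2 = pyo.Var(bounds=(-1, 1))

# ReLU outputs and binary indicators for the active/inactive state of each neuron on each sample.
m.r = pyo.Var(m.I, m.K, bounds=(0, M))
m.s = pyo.Var(m.I, m.K, domain=pyo.Binary)

def preact(mdl, i, k):
    # Pre-activation of hidden neuron k on sample i (linear in the variables, since X is data).
    return sum(mdl.W1[k, j] * X[i, j] for j in mdl.J) + mdl.b1[k]

# Big-M encoding of r = max(0, preact): s = 1 forces r = preact >= 0, s = 0 forces r = 0 and preact <= 0.
m.relu_lb  = pyo.Constraint(m.I, m.K, rule=lambda mdl, i, k: mdl.r[i, k] >= preact(mdl, i, k))
m.relu_ub1 = pyo.Constraint(m.I, m.K, rule=lambda mdl, i, k: mdl.r[i, k] <= preact(mdl, i, k) + M * (1 - mdl.s[i, k]))
m.relu_ub2 = pyo.Constraint(m.I, m.K, rule=lambda mdl, i, k: mdl.r[i, k] <= M * mdl.s[i, k])

# Squared training loss; the products W2[k] * r[i, k] are bilinear, so the problem is a nonconvex MINLP.
def loss(mdl):
    return sum((sum(mdl.W2[k] * mdl.r[i, k] for k in mdl.K) + mdl.b2 - y[i]) ** 2 for i in mdl.I)

m.obj = pyo.Objective(rule=loss, sense=pyo.minimize)

# A global MINLP solver (e.g. SCIP or Couenne) is needed for a certificate of global optimality:
# pyo.SolverFactory("scip").solve(m, tee=True)

On instances of this size, a spatial branch-and-cut solver can certify a globally optimal set of weights, which is exactly the regime in which such solutions can be compared with those found by stochastic gradient descent.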
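As an illustration of the symmetry in point 2 (again only a generic sketch, not the project's method): permuting the hidden neurons of a fully connected layer, together with the corresponding output weights, leaves the network function unchanged, so every solution appears in many equivalent copies. One standard way to remove such copies from the search space of an exact solver is to add ordering constraints such as

\[
  w^{(1)}_{h,1} \;\ge\; w^{(1)}_{h+1,1}, \qquad h = 1, \dots, H-1,
\]

where w^{(1)}_{h,1} denotes the first input weight of hidden neuron h. These constraints keep at least one representative of every equivalence class of networks, so no globally optimal objective value is lost, while symmetric copies are pruned from the branch-and-cut tree; the generic, automated symmetry handling described above goes beyond such static constraints.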
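For point 3, a standard (again purely illustrative) way to express l0 sparsity inside a mixed-integer formulation is to attach a binary indicator z_e to every weight w_e, together with a valid bound M_e on its magnitude, and to require

\[
  -M_e z_e \;\le\; w_e \;\le\; M_e z_e, \qquad z_e \in \{0,1\}, \qquad \sum_e z_e \le k .
\]

Setting z_e = 0 then forces w_e = 0 exactly, and the cardinality constraint bounds the number of nonzero weights by k; alternatively, the sum of the z_e can be added to the objective to minimize the number of nonzero weights. In contrast to heuristic thresholding, the resulting sparsity level comes with a proof of optimality once the MINLP is solved to global optimality.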
DFG Programme: Priority Programmes
 
 
