Resource-Efficient Deep Models for Embedded Systems
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Final Report Abstract
Machine Learning (ML) is among the most promising strategies for learning and reasoning under uncertainty in Artificial Intelligence (AI). The overwhelming majority of recent advances in AI stem from Deep Neural Networks (DNNs) trained on big data, and today's deep learning algorithms dramatically advance the state of the art in accuracy for the vast majority of AI tasks. Examples include image and speech processing, with applications as broad as robotics, medicine, autonomous navigation, and recommender systems. Still, the main application domain of ML today is the "virtual world". To meet the requirements of upcoming applications such as autonomous navigation for personal transport and delivery services, a transition of ML into the "wild" is required. This transition demands processing complex ML models close to the point of interest, on devices that usually have limited compute capability, unreliable online connectivity, and limited battery life. The gap between the tremendous compute requirements of such ML models and the capability of this hardware must therefore be closed.

The present project pursued a two-fold approach to this problem. On the one hand, methods were developed that compress existing ML models so that compute and memory requirements are substantially reduced, making deployment on resource-constrained mobile devices feasible. Examples include quantization, which replaces high-precision floating-point operations with low-precision fixed-point ones, and pruning, which introduces sparsity into a model by training selected weights towards zero (both are sketched in the code examples at the end of this abstract). On the other hand, new ML models were investigated that have lower compute and memory requirements or include built-in support for compression. One example is a Bayesian network classifier whose structure is learned during training, yielding models that are as small as possible. Another is a Bayesian neural network in which scalar weights are replaced by distributions, from which well-chosen quantized values are later sampled.

While the project produced a variety of results, we highlight a few major insights here. It was surprising to see that ARM processors can be competitive with specialized processors such as GPUs and FPGAs if the software architecture is well chosen. This does not mean that ARM processors are ultimately faster or achieve higher accuracy, but the gap between these processor types can be substantially reduced if a good compression method is chosen. Essentially, this allows ubiquitously available ARM processors to be leveraged more than previously thought. While large parts of the community believe that FPGAs are the best choice for machine learning, this is only partially true. Comprehensive experiments using different compression techniques on different processors, all with a similar power budget of about 5 Watts, have shown that GPUs ultimately allow for the highest accuracy, because they support the largest models. In contrast, FPGAs cannot support such large models, but they excel in throughput if the model fits on-chip. In summary, the need for highest accuracy is today best served by GPUs, while highest throughput is usually achieved with FPGAs.
General-purpose processors such as ARM typically fall in between: while they can reach the top accuracy of GPUs (given the right software architecture), they remain behind GPUs and FPGAs in throughput. Still, their ubiquitous availability can make them an important candidate. Finally, we believe that FPGAs are in principle excellently suited for machine learning, but they lack the memory bandwidth required to support models larger than their on-chip capacity. It will be interesting to see how the computing landscape changes if, for instance, 3D die stacking allows FPGA vendors to overcome this bandwidth limitation. We note that all of the above statements hold only for CMOS-based processors and for ML models based on the "Deep Learning" paradigm, i.e., deep convolutional neural networks. Innovation in computer architecture as well as in machine learning might change this situation substantially.
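To make the quantization idea above concrete, the following is a minimal NumPy sketch of uniform fixed-point quantization: weights are mapped to low-precision signed integers with a single per-tensor scale, so that inference can operate on the integer values. The function name and the symmetric 8-bit scheme are illustrative assumptions; the project's actual N-ary quantization scheme is more elaborate.

    import numpy as np

    def quantize_uniform(w, num_bits=8):
        # Symmetric uniform quantization with one scale per tensor:
        # floats are mapped to signed integers in [-qmax, qmax].
        qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
        scale = np.max(np.abs(w)) / qmax          # per-tensor step size
        w_int = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
        return w_int, scale                       # dequantize as w_int * scale

    w = np.random.randn(64, 64).astype(np.float32)
    w_int, scale = quantize_uniform(w)
    print("max quantization error:", np.max(np.abs(w - w_int * scale)))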
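Pruning, as described above, drives weights toward zero so they can be removed. The sketch below shows generic magnitude-based pruning, which zeroes the smallest-magnitude entries of a weight matrix; the project's parameterized structured pruning instead learns during training which whole structures (e.g., rows or filters) to remove. Function and parameter names here are illustrative.

    import numpy as np

    def magnitude_prune(w, sparsity=0.9):
        # Zero out the fraction `sparsity` of weights with the smallest
        # magnitude; the binary mask records the surviving connections.
        threshold = np.quantile(np.abs(w), sparsity)
        mask = (np.abs(w) > threshold).astype(w.dtype)
        return w * mask, mask

    w = np.random.randn(128, 128).astype(np.float32)
    w_pruned, mask = magnitude_prune(w)
    print("nonzeros kept:", int(mask.sum()), "of", mask.size)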
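Finally, the idea of replacing scalar weights with distributions and sampling quantized values from them can be illustrated as follows: each weight is modeled as a Gaussian with a learned mean and standard deviation, a sample is drawn, and the sample is snapped to the nearest discrete level. This is a toy sketch under assumed names and an assumed ternary level set, not the project's actual training procedure.

    import numpy as np

    def sample_discrete_weights(mu, sigma, levels=(-1.0, 0.0, 1.0)):
        # Draw one sample per weight from N(mu, sigma), then map each
        # sample to the nearest level in the discrete value set.
        levels = np.asarray(levels)
        samples = np.random.normal(mu, sigma)
        nearest = np.argmin(np.abs(samples[..., None] - levels), axis=-1)
        return levels[nearest]

    mu = np.random.randn(4, 4) * 0.5      # learned means (illustrative)
    sigma = np.full((4, 4), 0.2)          # learned std deviations (illustrative)
    print(sample_discrete_weights(mu, sigma))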
Publications
-
“Resource Efficient Deep Eigenvector Beamforming”. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2018
M. Zöhrer, L. Pfeifenberger, G. Schindler, H. Fröning, and F. Pernkopf
-
"N-Ary Quantization for CNN Model Compression and Inference Acceleration". 2019
G. Schindler, W. Roth, F. Pernkopf, and H. Fröning
-
“Towards Efficient Forward Propagation on Resource-Constrained Systems”. In: Machine Learning and Knowledge Discovery in Databases. Ed. by M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, and G. Ifrim. Springer International Publishing, 2019
G. Schindler, M. Zöhrer, F. Pernkopf, and H. Fröning
-
“On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks”. In: 2020 25th International Conference on Pattern Recognition (ICPR). IEEE Computer Society, 2021, pp. 10297–10304
W. Roth, F. Pernkopf, G. Schindler, and H. Fröning
-
“On the Difficulty of Designing Processor Arrays for Deep Neural Networks”. In: IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning. Ed. by J. Gama, S. Pashami, A. Bifet, M. Sayed-Mouchawe, H. Fröning, F. Pernkopf, G. Schiele, and M. Blott. Springer International Publishing, 2020
K. Stehle, G. Schindler, and H. Fröning
-
“Parameterized Structured Pruning for Deep Neural Networks”. In: Machine Learning, Optimization, and Data Science. Ed. by G. Nicosia, V. Ojha, E. La Malfa, G. Jansen, V. Sciacca, P. Pardalos, G. Giuffrida, and R. Umeton. Springer International Publishing, 2020
G. Schindler, W. Roth, F. Pernkopf, and H. Fröning
-
“Resource-Efficient Neural Networks for Embedded Systems”
W. Roth, G. Schindler, M. Zöhrer, L. Pfeifenberger, R. Peharz, S. Tschiatschek, H. Fröning, F. Pernkopf, and Z. Ghahramani
-
“Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled Densenets”. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020, pp. 806–810
M. Huber, G. Schindler, C. Schörkhuber, W. Roth, F. Pernkopf, and H. Fröning
-
“Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions”. In: Machine Learning and Knowledge Discovery in Databases. Ed. by U. Brefeld, E. Fromont, A. Hotho, A. Knobbe, M. Maathuis, and C. Robardet. Springer International Publishing, 2020
W. Roth, G. Schindler, H. Fröning, and F. Pernkopf
-
“Compressing and Mapping Deep Neural Networks on Edge Computing Systems”. PhD thesis. Heidelberg University, 2021
G. Schindler