
Efficient processing of deep neural networks






Deep Neural Networks (DNNs) are the foundation of modern Artificial Intelligence (AI) applications [1]. In recent years, owing to their landmark performance in natural language processing [2] and image recognition [3], DNNs have been widely used in areas such as unmanned vehicles [4], cancer detection [5], and complex decision-making [6]. In the image domain in particular, the deep learning-based AlexNet model improved classification accuracy by a factor of two over traditional algorithms represented by the support vector machine, attracting interest from the image recognition community as well as academia.

Thanks to growing computing power, the feature extraction and data fitting capabilities of DNNs can be improved by increasing their depth and model complexity. However, big data and complex models greatly increase the training overhead of DNNs, so accelerating the training process has become a key task. The Tianhe-3 supercomputer is designed for exascale (E-class) peak performance, and this huge computing power offers a potential opportunity for DNN training.

We implement and extend LeNet, AlexNet, VGG, and ResNet model training on single MT-2000+ and FT-2000+ compute nodes as well as on multi-node clusters, and propose Dynamic Allreduce, a communication optimization strategy that improves the gradient synchronization process based on the ARM architecture features of the Tianhe-3 prototype. This work provides experimental data and a theoretical basis for further improving the performance of the Tianhe-3 prototype in large-scale distributed training of neural networks.
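The details of the Dynamic Allreduce strategy are not given in this excerpt, but the baseline operation it optimizes is standard: after each batch, every worker's local gradient is summed across all workers and the average is applied everywhere. A minimal single-process sketch of that gradient synchronization step (with simulated workers in place of real MPI ranks; the function name `allreduce_mean` is illustrative, not from the paper):

```python
import numpy as np

def allreduce_mean(gradients):
    """Average a list of per-worker gradient arrays (simulated allreduce).

    In real distributed training this sum would be computed collectively
    across processes (e.g. with MPI_Allreduce) and every rank would
    receive the same averaged result; here the workers are simulated
    as entries of a Python list in one process.
    """
    total = np.zeros_like(gradients[0])
    for g in gradients:            # reduce: sum all workers' gradients
        total += g
    return total / len(gradients)  # each worker applies this average

# Each "worker" computes a gradient on its own data shard...
worker_grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
synced = allreduce_mean(worker_grads)
print(synced)  # [2. 3.]
```

Because every worker applies the same averaged gradient, all model replicas stay identical across steps; the communication cost of this collective is exactly what an optimized allreduce scheme tries to reduce.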







