Highly Performant, Deep Neural Networks with sub-microsecond latency on FPGAs for Trigger Applications

2020-01-07T04:47:44Z (GMT) by Christian Schmitt, Christian Kahra
Artificial neural networks are becoming a standard tool for data analysis, but their potential has yet to be widely exploited in hardware-level trigger applications. High-end FPGAs, which are already common in low-level hardware triggers, now offer in principle enough performance to include networks of considerable size in these systems for the first time. This makes optimizing a neural network implementation for FPGAs in the trigger context a promising undertaking. We present a bottom-up approach to implementing neural networks on FPGAs. For this, we analyzed how the processing, data flow and control of typical NN layers could be implemented so that they not only respect the constraints of the trigger environment, i.e. incoming data rates of up to multiple tens of MHz and sub-microsecond latency limits, but also make very efficient use of the FPGA's resources. This allowed us to develop a highly optimized neural network implementation framework, which typically reaches 90 to 100% computational efficiency, requires few extra FPGA resources for data flow and control, and achieves latencies on the order of tens to a few hundred nanoseconds for entire (deep) networks. Among the implemented layers are 2D convolutions and pooling (both with multi-channel support) as well as dense layers, all of which play a role in many physics- and detector-related applications. The 2D convolutional layers in particular required significant effort, as an implementation that is both fast and resource-efficient is quite challenging. Results are presented for individual layers as well as for entire networks. The FPGA implementations of the example networks were generated automatically from trained NN models by our supplementary toolkit, which is built around the optimized layer framework.
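For readers less familiar with the layer types named above, the following is a minimal software reference sketch of their arithmetic in plain Python. It is not the FPGA implementation described here and says nothing about its data flow or latency. The function names (`conv2d`, `max_pool2d`, `dense`, `flatten`) are hypothetical, and the choices of stride 1, no padding, and max pooling over non-overlapping windows are assumptions made only for illustration.

```python
def conv2d(x, w, b):
    """'Valid' multi-channel 2D convolution (cross-correlation convention).

    x: [C_in][H][W], w: [C_out][C_in][kH][kW], b: [C_out]
    returns [C_out][H-kH+1][W-kW+1] (stride 1, no padding: an assumption)
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    out = []
    for o in range(len(w)):
        plane = []
        for i in range(h - kh + 1):
            row = []
            for j in range(wd - kw + 1):
                acc = b[o]
                # Each output sums over all input channels and kernel taps.
                for c in range(c_in):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += w[o][c][di][dj] * x[c][i + di][j + dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def max_pool2d(x, size):
    """Non-overlapping max pooling, applied to each channel independently."""
    return [
        [
            [
                max(ch[i + di][j + dj] for di in range(size) for dj in range(size))
                for j in range(0, len(ch[0]) - size + 1, size)
            ]
            for i in range(0, len(ch) - size + 1, size)
        ]
        for ch in x
    ]


def flatten(x):
    """Flatten [C][H][W] into a single list, channel-major."""
    return [v for ch in x for row in ch for v in row]


def dense(x, w, b):
    """Fully connected layer: x is a flat list, w is [N_out][N_in], b is [N_out]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


# Toy example: 2 input channels of 4x4 pixels, 3 output channels, 3x3 kernels.
x = [[[float(c * 16 + r * 4 + col) for col in range(4)] for r in range(4)]
     for c in range(2)]
w_conv = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(3)]
conv_out = conv2d(x, w_conv, [0.0] * 3)
feat = max_pool2d(conv_out, 2)
y = dense(flatten(feat), [[1.0] * 3, [0.5] * 3], [0.0, 1.0])
```

In this toy case each convolution output needs 2 x 3 x 3 = 18 multiply-accumulate operations, which gives a feel for why the convolutional layers dominate the arithmetic cost even in small networks.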