Abstract
Real-time in-situ image analytics impose stringent latency requirements on intelligent neural network inference operations. While conventional software-based implementations on GPU-accelerated platforms are flexible and have achieved very high inference throughput, they are not suitable for latency-sensitive applications where real-time feedback is needed. Here we demonstrate that high-performance reconfigurable computing platforms based on field-programmable gate array (FPGA) processing can successfully bridge the gap between low-level hardware processing and high-level intelligent image analytics algorithm deployment within a unified system. The proposed design performs inference operations on a stream of individual images as they are produced. It has a deeply pipelined hardware design that allows all layers of a quantized convolutional neural network (QCNN) to compute concurrently with partial image inputs.
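The abstract does not describe the hardware at this level. The idea of letting every layer start on partial image input can still be sketched in software with a line buffer. Each convolution layer keeps only the last k rows it has received and emits an output row as soon as k rows are available.

The Python below is a minimal, hypothetical model, not the paper's implementation. The names `conv_rows`, `relu_rows` and `traced_source` are illustrative. Python generators run sequentially, so the sketch models the row-by-row dataflow rather than truly concurrent FPGA pipeline stages.

```python
from collections import deque
from typing import Deque, Iterator, List

Row = List[float]


def conv_rows(rows: Iterator[Row], kernel: List[List[float]]) -> Iterator[Row]:
    """Valid (unpadded) k x k cross-correlation over a stream of image rows.

    Only the k most recent input rows are kept (a line buffer), and each
    output row is yielded as soon as k rows are available.
    """
    k = len(kernel)
    buf: Deque[Row] = deque(maxlen=k)  # oldest row drops out automatically
    for row in rows:
        buf.append(row)
        if len(buf) < k:
            continue  # not enough rows buffered to produce an output row yet
        out = []
        for x in range(len(row) - k + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += buf[i][x + j] * kernel[i][j]
            out.append(acc)
        yield out


def relu_rows(rows: Iterator[Row]) -> Iterator[Row]:
    """Element-wise ReLU applied row by row, so it never waits for a full image."""
    for row in rows:
        yield [max(0.0, v) for v in row]


def traced_source(image: List[Row], log: List[int]) -> Iterator[Row]:
    """Emit image rows one at a time, recording the index of each row consumed."""
    for r, row in enumerate(image):
        log.append(r)
        yield row


# Hypothetical 8x8 test image and a 3x3 kernel that copies the centre pixel.
image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
centre = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

log: List[int] = []
# Two conv layers with a ReLU in between, chained lazily row by row.
pipeline = conv_rows(relu_rows(conv_rows(traced_source(image, log), centre)), centre)

first = next(pipeline)
print(f"first output row after reading {len(log)} of {len(image)} input rows: {first}")
```

With two 3×3 layers, the second layer's first output row is available after reading only 5 of the 8 input rows (k1 + k2 - 1 = 5). This is the property that lets a pipelined design overlap computation with image acquisition instead of waiting for a complete frame.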
Using the label-free classification of human peripheral blood mononuclear cell (PBMC) sub-types as a proof-of-concept illustration, our system achieves an ultra-low classification latency of 34.2 μs with over 95% end-to-end accuracy by using a QCNN, while the cells are imaged at a throughput exceeding 29 200 cells/sec. Our QCNN design is modular and is readily adaptable to other QCNNs with different latency and resource requirements.
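For scale, the mean interval between successive cells at the stated imaging throughput of 29 200 cells/sec is:

```latex
T_{\text{cell}} = \frac{1}{29\,200\ \text{cells/s}} \approx 3.42 \times 10^{-5}\ \text{s} \approx 34.2\ \mu\text{s}
```

A throughput above 29 200 cells/sec therefore leaves at most about 34.2 μs per cell, the same order as the reported classification latency. This is simple arithmetic on the two figures in the abstract, not a timing breakdown taken from the paper.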
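The abstract does not state the QCNN's bit widths or quantization scheme. The sketch below is a generic illustration of why quantization suits FPGA inference. It assumes symmetric, per-tensor 8-bit quantization, which is an assumption and not the paper's stated method. Under this scheme:

- weights and activations become small integers;
- the dot product runs entirely in integer multiply-accumulate;
- a single floating-point rescale at the end recovers an approximation of the original result.

`quantize_symmetric` and `int_dot` are hypothetical helper names.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Uniform symmetric quantization with one scale shared by the whole tensor.

    Assumed scheme (not taken from the paper). Maps each value to an integer
    in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(xs: List[int], ys: List[int]) -> int:
    """Integer multiply-accumulate, the operation a fixed-point MAC unit performs."""
    return sum(x * y for x, y in zip(xs, ys))


weights = [0.5, -0.25, 0.75, -1.0]  # hypothetical example values
acts = [1.0, 2.0, 0.5, 0.25]

qw, sw = quantize_symmetric(weights)  # 8-bit weights (assumed width)
qa, sa = quantize_symmetric(acts)     # 8-bit activations (assumed width)

acc = int_dot(qw, qa)   # pure integer arithmetic
approx = acc * sw * sa  # rescale once at the end
exact = sum(w * a for w, a in zip(weights, acts))
print(f"integer acc={acc}, dequantized={approx:.4f}, float reference={exact:.4f}")
```

Lowering `bits` shrinks multipliers and on-chip weight storage at the cost of larger rounding error. It is one plausible example of the latency and resource trade-offs that a modular QCNN design can be tuned for.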
Original language | English |
---|---|
Number of pages | 13 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
DOIs | 10.1109/TNNLS.2020.3046452 |
Publication status | Published - 12 Jan 2021 |
Bibliographical note
@ARTICLE{9321210,
  author={M. {Wang} and K. C. M. {Lee} and B. M. F. {Chung} and S. V. {Bogaraju} and H. -C. {Ng} and J. S. J. {Wong} and H. C. {Shum} and K. K. {Tsia} and H. K. -H. {So}},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  title={Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network},
  year={2021},
  volume={},
  number={},
  pages={1-14},
  doi={10.1109/TNNLS.2020.3046452}
}
Acceptance evidence in manuscript
Keywords
- Low latency inference
- Reconfigurable computing
- FPGA
- Cell image classification
- QCNN
- Multi-ATOM
- Real-time Image Analytics
- Neural network
- Optical Cytometry