How to measure deep learning performance?
With the huge success of deep learning in various fields, there is a critical question we need to answer. How to measure deep learning performance? In 2018, NVIDA president and CEO put forward the PLASTER framework to answer it [1].
PLASTER stands for Programmability, Latency, Accuracy, Size of a model, Throughput, Energy efficiency, and Rate of learning.
1. Programmability
There was an explosive growth of size and complexity in traditional machine learning in the past. Similarly, in the diversity of neural network architecture, there is an explosive growth too. It is a challenging problem for experts too that how to select the best neural network architecture.
“After a deep learning model is coded and trained, it is then optimized for a specific runtime inference environment. NVIDIA addresses training and inference challenges with two key tools. For coding, AI-based service developers use CUDA, a parallel computing platform and programming model for general computing on GPUs. For inference, AI-based service developers use TensorRT, NVIDIA’s programmable inference accelerator.”[1]
2. Latency
There is a time between taking actions and receiving requests in humans and machine, which is called latency. In practice, the latency time is often measured in milliseconds. For example, in Siri and Alexa, latency is very important for AI-based services. It is very unnatural that the latency is a few seconds. “Google has stated that 7 milliseconds is an optimal latency target for image-and video-based uses.”[1][2] “While there are no strict laws for response times, Jakob Nielsen’s 0.1 / 1 / 10-second limits are a good guideline. Somewhere between 2 to 10 seconds, people start wondering if the system is still working properly. Users lose the flow of their activities, and that impacts enjoyment, performance, time, and money.”[1][3]
3. Accuracy
Not surprisingly, accuracy is very important in every industry, especially in healthcare.
4. Size of model
The size of the model and capacity of the network among processors have impacts on performance. There are several factors in deep learning, like number of layers, number of neurons per layer, the complexity of computation per layer, and number of connections between a node at one layer and the nodes of next layers.
5. Throughput
“Throughput describes how many inferences can be delivered given the size of the deep learning network created or deployed. Developers are increasingly optimizing inference within a specified latency threshold. While the latency limit ensures good customer experience, maximizing throughput within that limit is critical to maximizing data center efficiency as well as revenue.”[1]
6. Energy efficiency
Energy consumption is a big problem in deep learning. Since deep learning needs more computation resource and time, deep learning is a power-hunger.
7. Rate of learning
“Throughput describes how many inferences can be delivered given the size of the deep learning network created or deployed. Developers are increasingly optimizing inference within a specified latency threshold. While the latency limit ensures good customer experience, maximizing throughput within that limit is critical to maximizing datacenter efficiency as well as revenue.”[1]
Reference:
1. https://www.tiriasresearch.com/wp-content/uploads/2018/05/TIRIAS-Research-NVIDIA-PLASTER-Deep-Learning-Framework.pdf
2. https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf
3. https://www.nngroup.com/articles/response-times-3-important-limits/
Users who have LIKED this post:
2 comments on “How to measure deep learning performance?”
Comments are closed.
PLASTER seems to be a thorough way to evaluate the efficacy and performance of deep learning. While it’s probably more focused on the technical abilities of the innovation, I feel that security should also embedded into the process of developing these technologies. I wonder if someday another S will be added for security (i.e. how secure is the technology)?
Interesting.
I was wondering about impact measurement of deep learning as well.
I agree with Spak7 on the aspect of security though, and that it should be added.