Deep learning techniques are drawing more and more attention from Web developers. Many Web apps run inference of deep neural network (DNN) models within Web browsers to provide intelligent services for their users. GPU acceleration is typically required during DNN inference, especially on end devices. However, GPU acceleration in Web browsers has been shown to have an unacceptably long warm-up time, which harms the quality of service (QoS).
To address this problem, a research team led by Yun MA published their new research on 15 December 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed a server-side precompiling approach named WPIA to reduce DNN warm-up time in Web browsers. Their evaluation shows that WPIA reduces DNN warm-up time by 84.1% on average and by up to 95.3%, making warm-up about an order of magnitude faster with negligible additional overhead.
In the research, they investigated why DNN model warm-up takes so long in Web apps and found that compiling WebGL programs into binaries accounts for most of the time. Inspired by this finding, they proposed WPIA, an approach that reduces DNN warm-up time in Web apps by precompiling WebGL programs offline.
WPIA collects and precompiles WebGL programs on the server side, then fetches and loads the resulting WebGL program binaries on the browser side. It merges WebGL programs to reduce the size of the binaries and uses a record-and-replay technique to handle the execution of precompiled WebGL programs.
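The general shape of a record-then-replay lookup can be illustrated with a minimal TypeScript sketch. It is not WPIA's implementation: every name here (`ShaderProgramSource`, `recordPrograms`, `replayProgram`, the `precompile` callback) is hypothetical, the "binary" is a stand-in byte array, and treating the merge step as deduplication of identical shader source pairs is an assumption made only for illustration. No real WebGL calls are used.

```typescript
// Hypothetical types for illustration; they are not part of WPIA or WebGL.
interface ShaderProgramSource {
  vertexSource: string;
  fragmentSource: string;
}

interface RecordedProgram {
  key: string;        // identifies one distinct (vertex, fragment) source pair
  binary: Uint8Array; // stand-in for a binary precompiled on the server
}

// Build a lookup key from both shader sources.
// JSON.stringify on a two-element array keeps the two sources unambiguous.
function programKey(p: ShaderProgramSource): string {
  return JSON.stringify([p.vertexSource, p.fragmentSource]);
}

// Server side (record): precompile each distinct program once.
// Assumption: programs with identical sources can share one binary.
function recordPrograms(
  requested: ShaderProgramSource[],
  precompile: (p: ShaderProgramSource) => Uint8Array,
): Map<string, RecordedProgram> {
  const recorded = new Map<string, RecordedProgram>();
  for (const p of requested) {
    const key = programKey(p);
    if (!recorded.has(key)) {
      recorded.set(key, { key, binary: precompile(p) });
    }
  }
  return recorded;
}

// Browser side (replay): return the recorded binary for a requested program,
// or undefined so the caller can fall back to compiling in the browser.
function replayProgram(
  recorded: Map<string, RecordedProgram>,
  p: ShaderProgramSource,
): Uint8Array | undefined {
  return recorded.get(programKey(p))?.binary;
}
```

In this sketch, a program requested several times during recording is compiled only once, and a program that was never recorded returns `undefined` at replay time, leaving room for a fallback to ordinary in-browser compilation.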
They evaluated WPIA on four devices and six DNN models. The results show that WPIA reduces DNN warm-up time by 84.1% on average and by up to 95.3%, making warm-up about an order of magnitude faster with negligible additional overhead.
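To relate the reported percentages to speedup factors, the short TypeScript sketch below converts a time reduction into the equivalent speedup using speedup = 1 / (1 - reduction). The function name `speedupFromReduction` is hypothetical, and the calculation is plain arithmetic on the two published figures, not a reproduction of the team's measurements.

```typescript
// Convert a percentage reduction in time into the equivalent speedup factor.
// Example: a 50% reduction means the task takes half as long, i.e. 2x faster.
function speedupFromReduction(reductionPercent: number): number {
  if (reductionPercent < 0 || reductionPercent >= 100) {
    throw new RangeError("reductionPercent must be in [0, 100)");
  }
  return 100 / (100 - reductionPercent);
}

// The two reductions reported for WPIA.
const averageSpeedup = speedupFromReduction(84.1); // about 6.3x
const maximumSpeedup = speedupFromReduction(95.3); // about 21.3x
```

A warm-up time reduced by 84.1% therefore corresponds to roughly a 6.3x speedup, and a reduction of 95.3% to roughly 21.3x.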
DOI: 10.1007/s11704-024-40066-w