
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks


Abstract:

In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications based on streamed data. In contrast, PARTIME starts processing each data sample at the time in which it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time, this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data, whose samples evolve smoothly over time, for efficient gradient computations. Experiments empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations on up to 8 NVIDIA GPUs and showing significant speedups that are almost linear in the number of devices, mitigating the impact of the data-transfer overhead.
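As a concrete illustration of the layer-wise, pipeline-based scheme sketched above, consider the following minimal PyTorch example, which splits a feed-forward network into one stage per GPU and feeds the pipeline one stream sample per step. This is a sketch of the general idea only, not the PARTIME API: the stage shapes, the `stream` iterable, and the buffering scheme are assumptions made for illustration.

    # Minimal sketch of layer-wise pipelining over multiple GPUs.
    # NOT the PARTIME API: stage shapes, `stream`, and the buffering
    # scheme below are illustrative assumptions.
    import torch
    import torch.nn as nn

    D = torch.cuda.device_count()  # number of pipeline stages / devices

    # One consecutive chunk of the network per device.
    stages = nn.ModuleList(
        nn.Sequential(nn.Linear(256, 256), nn.ReLU()).to(f"cuda:{d}")
        for d in range(D)
    )

    def pipeline_step(buffers, sample):
        # buffers[d] holds the activation produced by stage d at the
        # previous step; stage d now consumes buffers[d - 1], so all D
        # devices work on D different (consecutive) stream samples.
        # Since CUDA kernel launches are asynchronous, the per-stage
        # computations can overlap across devices.
        out = [None] * D
        for d in range(D):
            x = sample if d == 0 else buffers[d - 1]
            if x is not None:
                out[d] = stages[d](x.to(f"cuda:{d}", non_blocking=True))
        return out

    buffers = [None] * D
    for sample in stream:  # `stream` yields one tensor at a time (assumed)
        buffers = pipeline_step(buffers, sample)
        if buffers[-1] is not None:
            prediction = buffers[-1]  # output for the sample read D - 1 steps ago

In this schedule, the prediction for a given sample is emitted D - 1 steps after the sample enters the pipeline, but once the pipeline is full one prediction is produced per step, which is the source of the theoretically linear inference-time scaling mentioned above.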
Date of Conference: 12-14 December 2022
Date Added to IEEE Xplore: 23 March 2023
Conference Location: Nassau, Bahamas


I. Introduction

In the last few years, there has been growing attention toward learning problems framed as continual or lifelong [1]. Even though many recent approaches exist in this direction, this setting remains extremely challenging. Applications well-suited for continual learning have access to a continuous stream of data (e.g., videos, streams of text [2], [3]), where an artificial agent is expected not only to use the data to make predictions, but also to adapt to changes in the environment. In the case of neural nets, the most challenging context is one in which a simple online update of the weights is applied at each time instant, given the information from the last received sample [4]. Despite the availability of powerful computational resources for continual learning-based applications, current algorithmic solutions have not been paired with the development of software libraries designed to speed up computations. In fact, the most common approach is to store and process portions of the streamed data in a batch-like fashion, reusing classic non-continual learning tools. However, the artificial nature of this approach is striking. Motivated by the intuitions behind existing libraries for batched data [5] and by approaches that rethink the neural network computational scheme, making it local in time and along the network architecture [6]–[9], we propose a different approach to pipeline parallelism specifically built for data sequentially streamed over time, where multiple devices work in parallel to speed up computations. Considering D independent devices, such as D GPUs, the computational time of a feed-forward deep network empowered by our approach theoretically reduces by a factor of 1/D. We experimentally show that, in certain hardware configurations, the overheads due to data transfer among different devices are constant with respect to D. On the other hand, the higher throughput obtained by pipeline parallelism is associated with a delay, proportional to D, between the forward wave and the backward wave as they propagate through the network, a feature that is not critical in applications in which data samples are non-i.i.d. and evolve smoothly over time.
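To make the 1/D claim concrete, here is a back-of-the-envelope calculation with purely illustrative numbers (not figures from our experiments): with D stages of equal per-sample cost t, a single device processes the stream at D * t seconds per sample, while the pipeline, once filled, emits one prediction every t seconds.

    # Illustrative throughput arithmetic for the 1/D claim; the numbers
    # below are assumptions, not experimental results.
    D, t, n = 8, 0.01, 1000                # devices, per-stage seconds, stream length

    serial_time = n * D * t                # one device runs all D stages per sample
    pipelined_time = (D - 1) * t + n * t   # fill the pipeline, then one sample per t

    print(serial_time / pipelined_time)    # ~7.94, i.e., close to the ideal D = 8

The same schedule is what produces the delayed backward wave: the gradient triggered by a given sample reaches the early stages O(D) steps after that sample was read, which is tolerable precisely because consecutive stream samples evolve smoothly.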
