High Performance Streaming Tensor Decomposition | IEEE Conference Publication | IEEE Xplore

High Performance Streaming Tensor Decomposition


Abstract:

We present a new algorithm for computing tensor decomposition on streaming data that achieves up to 102× speedup over the state-of-the-art CP-stream algorithm through lower computational complexity and performance optimization. For each streaming time slice, our algorithm partitions the factor matrix rows into those with and without updates and keeps them in Gram matrix form to significantly reduce the required computation. We also improve the scalability and performance of the matricized tensor times Khatri-Rao product (MTTKRP) kernel, a key performance bottleneck in many tensor decomposition algorithms, by reducing the synchronization overhead through the combined use of mutex locks and thread-local memory. For problems with constraints (e.g., non-negativity), we apply data blocking and operation fusion to the alternating direction method of multipliers (ADMM) kernel in the constrained CP-stream algorithm. By combining this ADMM optimization with the aforementioned MTTKRP optimization, our improved algorithm achieves up to 47× speedup over the original. We evaluate the performance and scalability of our new algorithm and optimization techniques using a 56-core quad-socket Intel Xeon system on four representative real-world tensors.
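For readers unfamiliar with the MTTKRP kernel named in the abstract, the following is a minimal NumPy sketch of what it computes for a sparse 3-way tensor stored in coordinate (COO) form. It is not the paper's implementation: the function names `mttkrp_mode0` and `khatri_rao_gram` and the COO layout are assumptions made for illustration, and the code is serial. The second function shows the standard identity that the Gram matrix of a Khatri-Rao product equals the elementwise product of the individual Gram matrices, which is one reason working in Gram matrix form can reduce computation.

```python
import numpy as np

def mttkrp_mode0(coords, vals, B, C, n_rows):
    """Mode-0 MTTKRP for a sparse 3-way tensor in COO form (illustrative).

    coords: (nnz, 3) integer array of (i, j, k) indices of the nonzeros
    vals:   (nnz,) array of nonzero values
    B, C:   factor matrices for modes 1 and 2, shapes (J, R) and (K, R)
    Returns M of shape (n_rows, R) where
        M[i, :] = sum over nonzeros (i, j, k) of val * B[j, :] * C[k, :]
    """
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    # One rank-R contribution per nonzero.
    contrib = vals[:, None] * B[j] * C[k]
    M = np.zeros((n_rows, B.shape[1]))
    # Unbuffered scatter-add: several nonzeros may target the same row i.
    np.add.at(M, i, contrib)
    return M

def khatri_rao_gram(B, C):
    """Gram matrix of the Khatri-Rao product of B and C, computed
    without forming the (J*K, R) product explicitly."""
    return (B.T @ B) * (C.T @ C)
```

The scatter-add into rows of `M` is the step where a multithreaded implementation needs synchronization, since several nonzeros can update the same output row; the abstract's mutex locks and thread-local memory address that contention, which this serial sketch does not model.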
Date of Conference: 17-21 May 2021
Date Added to IEEE Xplore: 28 June 2021
Conference Location: Portland, OR, USA

I. Introduction

Sparse tensor decomposition (TD) is a popular method for analyzing multi-way data in applications such as signal processing, topic monitoring, and trend analysis [1]. In many of these areas, data arrives in a streaming fashion over time (e.g., new updates on social media), and this poses two significant challenges in using traditional TD algorithms to analyze the data – the complete data is not available a priori, and the amount of accumulated data grows linearly with time. To address these challenges, a number of streaming TD algorithms have been proposed [2]–[5]. Among these, CP-stream [2] represents the state-of-the-art in terms of execution time, fitting error, and scalability on parallel systems. As such, we use CP-stream as the baseline for comparison against our work presented in this paper.

