The Design of a Low-latency Tensor Decomposition Algorithm and VLSI Architecture (IEEE Conference Publication, IEEE Xplore)