Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture | IEEE Conference Publication | IEEE Xplore