Learning online alignments with continuous rewards policy gradient | IEEE Conference Publication | IEEE Xplore