Learning spoken words from multisensory input | IEEE Conference Publication | IEEE Xplore