Show and Speak: Directly Synthesize Spoken Description of Images (IEEE Conference Publication)