WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information | IEEE Conference Publication | IEEE Xplore