Audio2textNet: A Deep Learning Framework for Robust Audio-to-Text Transcription Using CNNs and Transformers | IEEE Conference Publication | IEEE Xplore