Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency (IEEE Conference Publication, IEEE Xplore)