Integrating Vision Transformers and Text-to-Speech System for Converting Object Detection Outputs to Audio Descriptions using AI | IEEE Conference Publication | IEEE Xplore