Towards Visually Grounded Sub-word Speech Unit Discovery | IEEE Conference Publication