Towards visually-grounded spoken language acquisition | IEEE Conference Publication | IEEE Xplore