Grounding Conversational Robots on Vision Through Dense Captioning and Large Language Models | IEEE Conference Publication