Generating Robot Action Sequences: An Efficient Vision-Language Models with Visual Prompts | IEEE Conference Publication | IEEE Xplore