A visual language model for estimating object pose and structure in a generative visual domain | IEEE Conference Publication | IEEE Xplore