Incorporating Scene Graphs into Pre-trained Vision-Language Models for Multimodal Open-vocabulary Action Recognition | IEEE Conference Publication | IEEE Xplore