LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | IEEE Conference Publication