Abstract:
Recent years have seen a surge of interest in anomaly detection. However, existing unsupervised anomaly detectors, particularly those for the vision modality, face significant challenges due to redundant information and a sparse latent space. In contrast, anomaly detectors achieve superior performance in the language modality owing to the unimodal nature of the data. This paper tackles these challenges for the vision modality from a multimodal point of view. Specifically, we propose Cross-modal Guidance (CMG), comprising Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE), to address redundant information and the sparse latent space, respectively. CMER masks portions of the raw image and computes the matching score between the masked image and the corresponding text; in essence, it eliminates irrelevant pixels to direct the detector's focus toward the critical content. To learn a more compact latent space for vision anomaly detection, CMLE learns a correlation structure matrix from the language modality; the learned matrix then compels the distribution of images in the latent space to resemble that of texts. Extensive experiments demonstrate the effectiveness of the proposed methods. In particular, CMG improves on the image-only baseline by 16.81%. Ablation experiments further confirm the synergy between CMER and CMLE, as each component depends on the other to achieve optimal performance.
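The abstract gives only a high-level description of the two components, but their mechanics can be illustrated with a minimal PyTorch sketch. Everything below is a hedged reading of the abstract, not the authors' implementation: the `encode_image` callable, the cosine matching score, the 4x4 occlusion grid, and the keep ratio are hypothetical stand-ins for whatever CLIP-style matching machinery the paper actually uses.

```python
import torch
import torch.nn.functional as F


def cmer_mask(image, text_feat, encode_image, grid=4, keep_ratio=0.5):
    """Cross-modal Entropy Reduction (sketch): occlude image cells one at a
    time, score each occlusion against the paired text embedding, and keep
    only the cells whose removal hurts the matching score most, i.e. the
    critical content. Assumes a single image of shape (1, C, H, W)."""
    _, _, H, W = image.shape
    ch, cw = H // grid, W // grid
    base = F.cosine_similarity(encode_image(image), text_feat, dim=-1)
    drops = torch.zeros(grid, grid)
    for i in range(grid):
        for j in range(grid):
            occluded = image.clone()
            occluded[:, :, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = 0
            score = F.cosine_similarity(encode_image(occluded), text_feat, dim=-1)
            drops[i, j] = (base - score).mean().item()  # big drop => critical cell
    k = int(keep_ratio * grid * grid)
    keep = torch.topk(drops.flatten(), k).indices
    masked = torch.zeros_like(image)
    for idx in keep:
        i, j = divmod(idx.item(), grid)
        masked[:, :, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = \
            image[:, :, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
    return masked


def cmle_loss(img_feats, txt_feats):
    """Cross-modal Linear Embedding (sketch): take the pairwise correlation
    structure of the text embeddings as a fixed target and pull the image
    embeddings' correlation structure toward it, so the visual latent space
    inherits the more compact language-side geometry."""
    txt = F.normalize(txt_feats, dim=-1)
    img = F.normalize(img_feats, dim=-1)
    target = (txt @ txt.t()).detach()  # language-side structure, no gradient
    return F.mse_loss(img @ img.t(), target)
```

In this reading, `cmer_mask` acts as a preprocessing step that feeds a cleaned image to the downstream anomaly detector, while `cmle_loss` is added to the training objective so that batch-level image correlations track the text correlations; the actual masking granularity and loss form in the paper may differ.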
Published in: IEEE Transactions on Multimedia (Volume: 27)