Graphical abstract illustrating the CLIP model architecture, consisting of a vision encoder and a text encoder. The vision encoder extracts visual features from images, w...
Abstract:
In recent years, illegal passenger transport in freight trucks has become a critical concern for traffic safety and law enforcement. This study proposes an automated system for detecting illegal passenger transport using an improved CLIP-ILP (Illegal Passenger Detection) model. The proposed model incorporates multi-scale feature fusion, a cross-modal self-attention mechanism, and a more powerful text encoder to enhance detection performance. To evaluate the model, we constructed a dataset of tens of thousands of truck images covering two truck types (four-wheeled and three-wheeled), each further classified into two subcategories: “illegal passenger transport” and “non-illegal passenger transport.” The model was trained on this dataset, with an emphasis on leveraging the CLIP framework’s ability to understand and integrate visual and textual data. Experimental results demonstrate that the proposed CLIP-ILP model achieves superior accuracy and robustness in detecting illegal passenger transport under various conditions. This research highlights the potential of deep learning technologies for enhancing traffic safety and provides a novel and efficient approach for law enforcement agencies to monitor and address this growing issue. The code for the proposed method is available at https://github.com/wu-xuan-git/CLIP-ILP.
Published in: IEEE Access (Volume: 13)