Object Detection Using Vision Transformed EfficientDet | IEEE Conference Publication | IEEE Xplore

Object Detection Using Vision Transformed EfficientDet


Abstract:

Computer vision, a subdivision of computer science and artificial intelligence, focuses on enabling computers to interpret and analyze visual data from the world, such as images and videos. Recent advances in convolutional neural networks (CNNs) have remarkably improved the performance of computer vision systems, making them more accurate and efficient than ever before. Object detection using CNNs is a popular application of deep learning in computer vision. Several object detection frameworks widely used in industry rely primarily on convolution, e.g., RetinaNet and EfficientDet, the latter built on the EfficientNet backbone. In this paper, we propose a novel hybrid approach to object detection that combines Vision Transformers (ViT) with the state-of-the-art EfficientDet architecture. The ViT backbone, which carries the Transformer architecture over from its success in natural language processing (NLP) to image classification, captures global dependencies in the input image using self-attention mechanisms. By incorporating ViT into the EfficientDet architecture, we enhance its ability to capture fine-grained details and context information; the resulting framework leverages the strengths of both architectures to achieve highly accurate and efficient object detection. The model was trained on the PASCAL VOC 2007 and 2012 datasets and tested on PASCAL VOC 2007, achieving a mAP of 86.27%.
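The abstract states that the ViT backbone captures global dependencies through self-attention. As a minimal sketch of the standard ViT mechanism (not the authors' implementation, whose details are not given here), the pure-Python code below splits a toy single-channel image into non-overlapping patches, treats each flattened patch as a token, and applies single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The function names, the 4x4 image, the 2x2 patch size, and the fixed projection matrices are hypothetical choices for illustration. A real ViT also uses a learned patch embedding, position embeddings, multiple heads, layer normalization, and MLP blocks, and the way the paper fuses ViT features into EfficientDet is not reproduced here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W single-channel image into non-overlapping patch x patch
    tiles and flatten each tile (row-major) into one token vector."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x m) @ (m x p) -> (n x p); zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over all tokens:
    softmax(Q K^T / sqrt(d_k)) V with Q = X Wq, K = X Wk, V = X Wv.
    Returns the attended tokens and the attention weight matrix."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # scores[i][j] = (q_i . k_j) / sqrt(d_k): every token is compared with every token
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1 over all patches
    return matmul(weights, v), weights


# Hypothetical toy input: a 4x4 image cut into four 2x2 patches (tokens of size 4).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
d = len(tokens[0])
# Hypothetical fixed projections; in a real ViT these matrices are learned.
w_qk = [[0.1 if i == j else 0.0 for j in range(d)] for i in range(d)]
w_v = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
out, attn = self_attention(tokens, w_qk, w_qk, w_v)
print(tokens[0])  # [0.0, 1.0, 4.0, 5.0]
print([round(a, 3) for a in attn[0]])
```

Because each softmax row spans all patches, the top-left patch already receives information from the bottom-right patch in a single attention layer, whereas a convolution only mixes values within its local kernel window. This is the sense in which self-attention captures global dependencies.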
Date of Conference: 28-31 August 2023
Date Added to IEEE Xplore: 26 December 2023
Conference Location: Dayton, OH, USA

I. Introduction

By learning multiple levels of features, or representations, of the data, deep learning (DL) has emerged as an effective machine learning (ML) tool that gives excellent results. It has demonstrated remarkable performance in a variety of areas, especially object detection, image classification, and localization. Recent developments in DL methods have also advanced fine-grained image classification, which aims to distinguish subordinate-level categories [1]. This task is especially challenging because of its high intra-class and low inter-class variance: images of the same category can differ greatly, while images of different categories can look very similar. Object recognition and classification have seen significant advances with the advent of DL and Deep Convolutional Neural Networks (DCNNs). These modern techniques have outperformed traditional ML algorithms such as Support Vector Machines (SVM) [2] and Naive Bayes [3] by extracting higher-level features directly from raw data.

