
A Feature Selection and Enhanced Reuse Framework for Detection and Classification of Dental Diseases in Panoramic Dental X-Rays Images




Abstract:

Panoramic dental X-ray images are crucial tools in dental diagnosis, and accurate detection of teeth and related lesions is essential for clinical decision-making. However, the complexity of tooth structures, variability in image quality, and scarcity of annotated data make achieving precise automatic target detection challenging in this field. In this study, a dental X-ray disease target detection framework based on feature selection and enhanced reuse was proposed to address these challenges. An improved convolutional neural network architecture was designed and implemented by combining selective receptive field fusion with shape-sensitive multi-scale feature extraction modules to enhance the ability to detect targets of different scales and shapes. In addition, a novel feature reuse and skip fusion technique was introduced to further improve the utilization of features by the backbone network. To address the current lack of annotation for impacted teeth positions, an impacted tooth location image dataset named DENIMPACT is also presented, which significantly addresses the shortcomings of current deep learning object detection algorithms in the clinical diagnosis of dental impacted teeth on panoramic X-rays. Through experiments on our dataset, our method achieved a significant improvement in detection accuracy, with the mAP50 increasing from 0.410 to 0.631. The experimental results demonstrate that our model achieves state-of-the-art performance in tooth and lesion detection tasks, providing new solutions for the automated analysis of oral medical images. The dataset will be released at: https://github.com/hexiaomo624/DENIMPACT.
Published in: IEEE Access ( Volume: 13)
Page(s): 70741 - 70751
Date of Publication: 16 April 2025
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Panoramic dental radiography is one of the most frequently conducted examinations in dental clinics. A variety of dental problems, including caries, apical periodontitis, and impacted teeth, can be identified through panoramic dental radiographs [1]. For some patients, it can be time-consuming to identify all dental anomalies on panoramic dental radiographs. Owing to time constraints, dentists may only focus on teeth with symptoms and areas of interest, which could potentially lead to some issues not being detected in time. One way to alleviate this burden is to use computer-assisted techniques to detect anomalies.

It is not a new concept to diagnose dental anomalies through computational analysis of panoramic dental radiographs. Texture analysis has been used to detect cavities and cysts in panoramic dental radiography [2]. In another study, a thresholding-based algorithm was proposed to detect root fractures in dental imaging [3]. However, these methods cannot be extended to the detection of a wider range of dental diseases. In recent years, deep learning has become an auxiliary tool for the imaging diagnosis of diseases [4], [5]. Different deep learning models have been developed for quadrant segmentation [6], tooth detection [7], diagnosis of dental problems [1], [8], and treatment planning [9]. Many of these models have achieved satisfactory results.

You Only Look Once (YOLO) is an object detection algorithm that has gained popularity in computer vision. The YOLOv8 algorithm, one of the most powerful recent versions of the YOLO family, introduces various enhancements and optimizations to improve the accuracy and speed of object detection [10]. However, to date, no study has simultaneously detected different dental diseases and performed a more detailed classification of a particular disease. These requirements place higher demands on a model's capabilities.

Multi-angle Fusion is a technique for fusing data or features across multiple angles or dimensions, aimed at enhancing system performance and accuracy. By acquiring data from different angles, it provides more comprehensive information [11], and fusing data from various angles reduces the limitations of data obtained from a single angle. In medical imaging, multi-angle fusion can improve image quality and diagnostic accuracy [12].

Skip Bidirectional Feature Enhancement is a technique used for feature fusion in deep learning networks. By introducing bidirectional skip connections between the encoder and decoder, deep features are effectively transferred from the encoder to decoder, thereby preserving more contextual information and detailed features. This mechanism helps improve the ability of the model to recognize complex scenes, showing significant performance, particularly in tasks such as medical image segmentation [13].

Impacted teeth, dental caries, and periapical lesions are three common types of diseases encountered in dental practice. Among these, impacted teeth have a more complex classification compared to dental caries and periapical lesions. Most articles on object detection and classification of impacted teeth focus mainly on the Winter classification of mandibular impacted teeth [14], [15], [16]. These studies have overlooked two issues. First, they did not consider impacted maxillary teeth. Second, the Winter classification mainly focuses on the angle of the impacted teeth and does not consider the depth of the impacted teeth within the jawbone [17]. For dentists, assessing the depth of impacted teeth is a crucial factor in evaluating the difficulty of surgery [18]. The Pell and Gregory classification is a commonly used method for evaluating the depth of impacted teeth by examining the relationship between the impacted tooth and occlusal plane of the adjacent second molar [19].

Therefore, in this study, a novel object detection framework (DentalDet) was proposed for diagnostic classification covering four categories: caries, deep caries, periapical lesions, and impacted teeth. On this basis, a dataset was constructed in which impacted teeth were categorized into three groups according to the Pell and Gregory classification. Through this research, we hope to improve the diagnostic capability and accuracy for different diseases on panoramic dental radiographs.

The main contributions are as follows:

  1. A detection framework for panoramic dental X-ray images, called DentalDet, is proposed based on enhanced features and multi-angle fusion: This framework is a one-stage fast object detection technique that improves detection performance by enhancing non-salient target features and extracting shape-sensitive features from oral images.

  2. A fast adaptive feature extraction backbone has been proposed to extract more detailed features, helping the detector differentiate similar targets: By enhancing the capability to extract shallow semantic information and providing a more adaptable receptive field, the model effectively discriminates between oral background and dental lesions while mitigating interference from additive noise in the background.

  3. A Skip Bidirectional Feature Enhancement is proposed to improve the ability of the intermediate bottleneck layer to combine features at different scales: It integrates high- and low-level feature information to enhance the model's learning capability. In the backbone network, as the network becomes deeper, the spatial resolution gradually decreases, whereas the semantic level of the features increases. This method introduces skip connections with learnable weight matrices, allowing the network to learn features flexibly and efficiently at different scales.

  4. A dataset named DENIMPACT has been constructed for the detection of impacted wisdom teeth based on the occlusal plane of the adjacent molar: The dataset divides impacted teeth into three groups based on the height of the occlusal plane of the adjacent second molar and provides the most commonly used clinical labels. This dataset enables richer applications for the intelligent processing of oral images in clinical practice.

This paper is structured as follows. Section II provides a brief introduction to object detection algorithms and their applications in various dental diseases. Section III provides a brief introduction to the datasets used, including the public dataset DENTEX and our dataset called DENIMPACT. Section IV introduces the framework that we used, called DentalDet. Section V presents the experimental results and offers a performance comparison with the previous methods. Finally, Section VI presents the findings and conclusions.

SECTION II.

Related Works

Object detection algorithms have revolutionized the field of computer vision, enabling machines to identify and locate objects in images and videos with remarkable accuracy. With the advent of deep learning, object detection algorithms have undergone revolutionary advancements. Deep learning-based object detection models built on convolutional neural networks and transformers now play a pivotal role in the evolution of this domain. Object detection algorithms can be divided into three major groups: anchor-based, anchor-free, and transformer-based detectors [20]. Anchor-based methods utilize predefined anchor boxes to predict the location and class of objects, such as Faster R-CNN [21], RetinaNet [22], and the single-shot multibox detector [23]. Anchor-free approaches directly predict the keypoints of objects without relying on predefined anchor boxes, such as CornerNet, YOLOv5 [24], and YOLOv8 [14]. Transformer-based methods employ self-attention mechanisms to process image sequences, achieving end-to-end object detection, such as the Detection Transformer [25] and the Vision Transformer [26].

In the task of dental target detection, deep learning plays an important role [27]. C.-W. Li et al. applied convolutional neural networks (CNNs) to periapical radiographs, and the results show that periapical lesions can be automatically identified and assessed with a success rate as high as 92.75% [28]. The fast region-based convolutional network (Fast R-CNN) model showed high accuracy in diagnosing various dental caries and apical lesions [29]. In the task of dental caries detection, the cascade region-based deep convolutional neural network (Cascade R-CNN) model achieved an average mean average precision (mAP) score of 0.769 [30]. A. Haghanifar et al. proposed a network for detecting caries in panoramic dental X-rays and achieved an accuracy score of 86.05% on the test set [31].

CNNs, one of the main tools for object detection, can extract many features through successive filtering layers and are mainly used to process complex and large images [32]. CNN architectures for object detection typically combine convolutional layers with region proposal networks to identify objects and their locations in an image. Popular object detection architectures in the dental field include Faster R-CNN [21], [29] and YOLO [10], [14], [33]. Applications of CNNs in dental imaging can be categorized into the following types: tooth identification [34], periodontal disease [35], dental caries [36], forensic odontology [37], and other applications [38].

In the field of dentistry, YOLOv8 has been employed in some studies. It has been used for the detection and segmentation of radiolucent lesions in the lower jaw [33]. J. George et al. utilized YOLOv8 to detect and classify dental diseases such as cavities, periodontal disease, and oral cancers [10]. A recent study that focused on the detection and classification of impacted teeth was also performed using YOLOv8 [14]. YOLOv8 achieved satisfactory results in the above studies.

However, YOLOv8 still has some limitations. The detection head of YOLOv8 mainly relies on deep features, which may lead to the loss of local detailed information, thereby affecting the detection performance for small objects [39]. For high-noise images, the performance of YOLOv8 is unstable [40]. Various improvement methods can be applied to enhance its performance [41].

The diagnosis of dental caries, periapical lesions, and impacted teeth is important because of their prevalence. Dental caries is highly prevalent, and its prevention requires early detection and treatment. J.-H. Lee et al. utilized 3000 periapical radiographs to detect dental caries [42]. A pretrained GoogLeNet Inception v3 CNN was used for preprocessing, and the datasets were trained using transfer learning. In that study, the accuracy of caries detection was 89% for premolars, 88% for molars, and 82% for the premolar-molar combination. The AUC values were 0.917 for premolars, 0.89 for molars, and 0.845 for the premolar-molar combination. Interestingly, deep learning models have been found to be more accurate than dentists at detecting caries lesions on bitewing radiographs [43]. For caries detection in panoramic dental X-rays, a feature pyramid contrastive learning framework built upon the Faster R-CNN architecture was proposed, achieving a significant improvement of 7.7% in the average precision (AP) score compared with existing CNN-based competitors.

Bacterial infection of the root canal system usually results in apical periodontitis, which presents as periapical radiolucent lesions on radiographs. Apical periodontitis affects approximately 33% to 62% of the adult population and can have detrimental effects on both oral and systemic health [44]. The early detection of periapical lesions on radiographs is therefore essential. R. Ba-Hattab et al. trained an AI model based on Faster R-CNN on 713 panoramic radiographs [45]; the model achieved an AP of 74.95%. In another study, two deep learning models, Faster R-CNN and YOLOv4, were applied to detect and classify periapical lesions from periapical radiographs [46]. Both models obtained high sensitivity, specificity, accuracy, and precision for detecting periapical lesions.

Tooth impaction occurs when tooth eruption is halted due to a physical barrier or abnormal tooth position. It is crucial for clinicians to classify impaction and assess the level of difficulty in extracting third molars to determine the best treatment and reduce potential complications [47]. T. Zirek et al. trained a model based on YOLOv8 on panoramic radiographs to detect impacted teeth and classify them according to the Winter classification [14]. An AP of 93.4% was achieved. Another study employed knowledge distillation to enhance the accuracy of the YOLOv5s model in detecting and classifying impacted mandibular third molars by transferring knowledge from the larger YOLOv5x model [15]. The study demonstrates that the distilled YOLOv5s-x model improves the mAP by 2.9% over the original YOLOv5s while maintaining a lightweight model architecture.

SECTION III.

Dataset Introduction

A. DENTEX Introduction

The DENTEX dataset was created by I.E. Hamamci et al. in 2023 to address the issues of tooth anomaly detection, enumeration, and diagnosis in panoramic X-rays [48]. The core research problem of this dataset is to accurately identify anomalous teeth through algorithms and provide corresponding diagnoses to assist in precise treatment planning and reduce errors in clinical operations. The dataset included 693 X-rays annotated with quadrant information only, 634 X-rays annotated with both quadrant and tooth numbering information, and 1005 X-rays fully annotated with quadrant, tooth numbering, and diagnostic information. Additionally, 1571 unannotated X-rays were provided for pretraining purposes. Diagnostic categories included caries, deep caries, periapical lesions, and impacted teeth. The X-rays with diagnostic information were used in this study.

B. Data Acquisition and Preprocessing of DENIMPACT

The dataset comprises 200 panoramic dental X-rays captured under standard clinical conditions. However, variations in the equipment and imaging protocols used result in different image qualities, reflecting the diversity of clinical practice. These X-rays were sourced from patients aged 18 years or older, and all radiographs contained at least one impacted tooth with an adjacent second molar. The original images were in DCM format. All DCM files were opened using Dental Imaging Software (Carestream Health), and the images were enhanced to facilitate further identification of image features. They were exported in JPG format, ensuring that no personal patient information was included during the export, and were then converted to PNG format using Python. We randomly selected 85% of the panoramic dental radiographs as the training set and the remaining 15% as the validation set. We prepared 30 additional panoramic dental X-rays as the test set, processed in the same way as described above.
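A minimal sketch of the Python conversion and split steps is given below, assuming Pillow for the JPG-to-PNG re-encoding; the folder names and random seed are hypothetical, not taken from the paper.

```python
import random
import shutil
from pathlib import Path
from PIL import Image

SRC = Path("denimpact/jpg_exports")     # hypothetical export folder
DST = Path("denimpact/png")
DST.mkdir(parents=True, exist_ok=True)

pngs = []
for jpg in sorted(SRC.glob("*.jpg")):
    out = DST / (jpg.stem + ".png")
    Image.open(jpg).save(out)           # re-encode as PNG
    pngs.append(out)

random.seed(0)                          # hypothetical seed
random.shuffle(pngs)
cut = int(0.85 * len(pngs))             # 85% train / 15% validation
for split, files in (("train", pngs[:cut]), ("val", pngs[cut:])):
    d = DST.parent / split
    d.mkdir(exist_ok=True)
    for f in files:
        shutil.copy(f, d / f.name)
```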

C. Definition of the Classification of Impacted Third Molars

All impacted teeth in DENIMPACT are classified into three categories according to the Pell and Gregory classification [19]. As shown in Figure 1, impacted teeth in the upper and lower jaws were classified based on the height of the occlusal plane of the adjacent second molar. Class A: the occlusal surface of the third molar is at the same level as, or above, the occlusal plane; Class B: the occlusal surface of the third molar lies between the occlusal plane and the cervical portion of the second molar; Class C: the occlusal surface of the third molar is below the cervical portion of the second molar. The dataset was labeled using Labelme; each impacted tooth was annotated with a rectangular box containing the crown and roots of the third molar, as shown in Figure 2. The annotations were carefully created by a team of dental specialists. A total of 624 impacted teeth were labeled: 383 as Class A, 152 as Class B, and 89 as Class C.
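To make the criterion explicit, here is a hypothetical rule in code: given image-space y-coordinates (pixels increase downward) of the third molar's occlusal surface and of the adjacent second molar's occlusal plane and cervical line, it returns the class. Landmark extraction itself is outside the scope of this sketch, and the function name and signature are illustrative.

```python
def pell_gregory_class(m3_occlusal_y: float,
                       m2_occlusal_y: float,
                       m2_cervical_y: float,
                       mandibular: bool = True) -> str:
    """Return 'A', 'B', or 'C' per the Pell and Gregory depth criterion."""
    if not mandibular:
        # For maxillary teeth the vertical relations are inverted.
        m3_occlusal_y, m2_occlusal_y, m2_cervical_y = (
            -m3_occlusal_y, -m2_occlusal_y, -m2_cervical_y)
    if m3_occlusal_y <= m2_occlusal_y:
        return "A"   # at or above the occlusal plane
    if m3_occlusal_y <= m2_cervical_y:
        return "B"   # between occlusal plane and cervical line
    return "C"       # below the cervical line
```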

FIGURE 1. Classification of impacted teeth. (a) Maxillary third molar, Class A; (b) Maxillary third molar, Class B; (c) Maxillary third molar, Class C; (d) Mandibular third molar, Class A; (e) Mandibular third molar, Class B; (f) Mandibular third molar, Class C.

FIGURE 2. Label example diagram.

SECTION IV.

Method

In response to the difficulty of detecting dental diseases in panoramic dental X-ray images, this study proposes a high-precision tooth disease detection framework called DentalDet, whose overall architecture is shown in Figure 3. DentalDet is an improved model based on YOLOv8 that is specifically designed to handle the characteristics of panoramic dental X-ray images. It consists of three main components: a Fast Adaptive Feature Extraction Backbone, Skip Bidirectional Feature Enhancement, and Regional Optimization Detectors. First, the X-ray image is fed into the fast feature extraction backbone to extract five feature maps of different sizes. Subsequently, for better multiscale detection, DentalDet uses Skip Bidirectional Feature Enhancement to fuse the extracted feature maps. Finally, by refining the input features through the Regional Optimization Detectors, it determines the positions of tooth diseases in different images.

FIGURE 3. DentalDet consists of three parts: (a) Fast Adaptive Feature Extraction Backbone, whose function is to extract image characteristics; this includes fast multi-layer selective feature fusion and shape-sensitive convolution. (b) Skip Bidirectional Feature Enhancement, designed to integrate contextual information. (c) Regional Optimization Detectors, whose purpose is to produce the detection output.

A. Fast Adaptive Feature Extraction Backbone

Owing to the low contrast of oral disease lesions, teeth, and other elements in panoramic dental X-ray images, it is extremely difficult to accurately distinguish between targets and backgrounds. To address this issue, a new Fast Adaptive Feature Extraction Backbone (FAFEB) was proposed. The Backbone Network is the core part of a deep neural network and is primarily responsible for extracting features from input data. The workflow of FAFEB consists of three layers, as shown in Figure 3. FAFEB enhances the ability to extract shallow semantic information and provides a more flexible receptive field, allowing for the dynamic adjustment of the receptive field size based on the characteristics of the input data. This enables the model to effectively differentiate between the oral background and target objects, such as tooth lesions, while resisting interference from additive noise in the background. Meanwhile, through the Fast Multi-layer Selective Feature Fusion module (FMSFF), the network can obtain richer feature expressions with the same parameter volume. These fused features are then sent to the feature fusion section to strengthen the network’s unique representation of the lesions.

1) Fast Multi-Layer Selective Feature Fusion Module

In the task of target detection in panoramic dental X-ray images, traditional feature extraction methods are no longer sufficient because of the unique characteristics of panoramic dental X-ray images, such as the similarity between the foreground and background and interference from background noise.

Therefore, this paper proposes an innovative Fast Multi-layer Selective Feature Fusion module. This module adopts an efficient convolution operator together with multi-branch grouped convolution and solves the aforementioned problems through a structured design. Specifically, a flexible quantized convolution operator called the Variable Quantization Kernel (VQK) is introduced in the distribution shifting convolution (DSConv) [49] branch, which can replace traditional standard convolution operations without retraining the model. This technique achieves accuracy close to state-of-the-art results, with less than 1% loss, while using only 4-bit quantization. DSConv consists of two key parts: the VQK and the distribution shift. The VQK reduces memory consumption and improves computational speed by storing only integer values, while the distribution shift maintains the same output as the original convolution by applying kernel-based and channel-based offsets. As a result, accuracy is preserved while computational resources are significantly saved. In dental X-ray analysis, this mechanism is particularly important for handling variations in image brightness, contrast differences, and noise interference. For example, X-ray images captured by different devices may have inconsistent exposure, and this module helps the model adapt to these variations, thereby improving the consistency and reliability of the detection results. Overall, this structured design enables flexible responses to various task requirements in dental X-ray analysis.
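To make the VQK idea concrete, here is a minimal sketch assuming per-output-channel symmetric 4-bit quantization with a floating-point scale acting as the distribution shift; the block size and exact shift scheme of DSConv [49] differ, so this is an illustration rather than the authors' implementation.

```python
import torch

def quantize_vqk(weight: torch.Tensor, bits: int = 4):
    """Per-output-channel symmetric quantization of a trained conv kernel."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    flat = weight.view(weight.size(0), -1)
    scale = flat.abs().amax(dim=1).clamp(min=1e-8) / qmax
    q = torch.round(flat / scale[:, None]).clamp(-qmax - 1, qmax)
    return q.to(torch.int8).view_as(weight), scale  # integer VQK + fp shift

def dequantize_vqk(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Restore the dynamic range with the stored floating-point scale."""
    return q.float() * scale.view(-1, 1, 1, 1)

# Usage: swap a trained kernel for its quantized form without retraining.
w = torch.randn(64, 32, 3, 3)          # hypothetical trained 3x3 kernel
q, s = quantize_vqk(w)
w_hat = dequantize_vqk(q, s)
print((w - w_hat).abs().mean())        # small reconstruction error
```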

2) Shape Sensitive Convolution

The Shape-Sensitive Convolution proposed in this study is a variant of CNNs that enhances the sensitivity of the model to shape features by introducing receptive field regions for shape information. In traditional convolution operations, the convolution kernel slides over the entire input space and performs the same convolution operations at each position. However, this approach ignores the shape information in the input data, which can lead to performance degradation when dealing with complex shapes. To address this issue, Shape-Sensitive Convolution introduces shape information into convolution operations. Specifically, it uses an additional offset encoder to capture the shape features of input data. The output of the shape encoder is used as a weight for the convolution kernel, allowing the adjustment of the convolution operation based on the shape features of the input data. Based on receptive field positions with offsets, this type of convolution can better capture local shape information in the input data and improve performance when dealing with data with complex shapes. In addition, Shape-Sensitive Convolution can be combined with existing CNN structures to enhance the overall model performance.
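As a concrete illustration, the sketch below pairs an offset encoder with a deformable convolution (the DCN module noted in the Figure 5 caption), assuming torchvision's DeformConv2d as the underlying operator; the channel sizes are hypothetical, and this approximates rather than reproduces the paper's module.

```python
import torch
from torch import nn
from torchvision.ops import DeformConv2d

class ShapeSensitiveConv(nn.Module):
    """Offset encoder + deformable conv: the receptive field bends to shape."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # The offset encoder predicts (dy, dx) for each kernel sample point.
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset.weight)  # zero offsets = ordinary conv
        nn.init.zeros_(self.offset.bias)
        self.deform = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

x = torch.randn(1, 32, 64, 64)          # hypothetical feature map
y = ShapeSensitiveConv(32, 64)(x)
print(y.shape)                          # torch.Size([1, 64, 64, 64])
```

Initializing the offsets to zero makes the layer start as a standard convolution and learn shape-dependent sampling gradually, which tends to stabilize training.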

B. Skip Bidirectional Feature Enhancement

Skip Bidirectional Feature Enhancement is a technique applied in deep neural networks that aims to enhance the learning capability of models by combining high-level and low-level feature information. In the backbone network, as the network becomes deeper, the spatial resolution gradually decreases, while the semantic level of the features increases. Low-level features contain more spatial detail, whereas high-level features contain more semantic information. The purpose of Skip Bidirectional Feature Enhancement is to establish a bidirectional connection between different levels, allowing bottom-level detailed information to complement top-level semantic information and thereby improve the feature representation ability.

In forward fusion, assuming that the input image is $I$, it first undergoes a series of convolutional layers for feature extraction. Let $F_{i}$ denote the feature output of layer $i$. In a traditional Feature Pyramid Network (FPN), each $F_{i}$ depends only on the features of the previous layer, namely $F_{i-1}$:
\begin{equation*} F_{i}=f(F_{i-1};W_{i}) \tag{1}\end{equation*}
where $f(\cdot)$ is a function that includes convolution and other non-linear operations, and $W_{i}$ represents the weight parameters of the $i$-th layer. Skip Bidirectional Feature Enhancement introduces skip connections, allowing the output of a layer to depend not only on its immediate predecessor but also to directly incorporate information from earlier layers by skipping several intermediate layers. Assuming that the network needs to enhance the features of a particular layer (denoted as $F_{L}$), and that its corresponding inverted position in the FPN is denoted as $F'_{L}$, the enhancement can be expressed as a linear combination:
\begin{equation*} F_{e}=\sum f(F_{l_{i-1}};W_{i},F'_{l_{i-1}},F_{l_{j}}) \tag{2}\end{equation*}
where $F_{e}$ represents the enhanced feature map, $F_{l_{i-1}}$ and $F'_{l_{i-1}}$ represent a pair of corresponding features, $W_{i}$ represents the weight for each fusion operation, and $F_{l_{j}}$ represents the feature map from the other specified layers. The purpose is to introduce more effective information into the subsequent detection network. Compared with traditional forms of feature fusion such as FPN or the Path Aggregation Network (PAN), this method better stimulates the detector's feature representation of non-salient objects.
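The following sketch shows one plausible reading of Eq. (2), assuming the learnable weights are normalized scalars per input branch (in the spirit of BiFPN-style fast normalized fusion); the paper does not specify the exact form of $W_{i}$, so this is illustrative rather than the authors' implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class SkipFusion(nn.Module):
    """Fuse a backbone feature, its pyramid counterpart, and a skipped layer."""
    def __init__(self, channels: int, n_inputs: int = 3):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))   # learnable fusion weights
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats: list) -> torch.Tensor:
        # Align all inputs to the first feature's spatial size.
        size = feats[0].shape[-2:]
        feats = [f if f.shape[-2:] == size
                 else F.interpolate(f, size=size, mode="nearest")
                 for f in feats]
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)                      # normalized combination
        fused = sum(wi * fi for wi, fi in zip(w, feats))
        return self.conv(fused)                       # f(.) of Eq. (2)

# Usage: F_e = SkipFusion(256)([F_prev, F_inverted, F_skip]) with
# channel-aligned feature maps from different pyramid levels.
```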

C. Regional Optimization Detectors

Regional Optimization Detectors (ROD) are specially designed network modules that are used to enhance target detection performance in panoramic dental X-ray images. This module is located in the detection head part of the entire network, and its core function is to extract targets with high similarity to the background by finely dividing the feature map input into the detection head. This process relies on the powerful feature extraction ability of the previous backbone network (FAFEB), particularly its ability to capture small or non-significant targets. The FAFEB backbone network can provide high-quality multiscale feature representations for ROD, thereby enhancing the performance of subsequent detection tasks.

The core idea of the ROD module is to improve the accuracy and robustness of object detection through regional optimization. Specifically, it divides the input feature map into multiple local regions and performs detection independently for each region. This denser grid division enables the detection head to provide more refined results, particularly when dealing with non-salient objects in complex backgrounds. For example, if the size of the input feature map is $H \times W$, ROD divides it into $m \times n$ local regions, each of size $\frac{H}{m} \times \frac{W}{n}$. For each local region $R_{ij}$, ROD calculates the optimized feature representation as:
\begin{equation*} F_{\text{opt}}(R_{ij}) = \sum_{k=1}^{K} w_{k} \cdot \phi_{k}(R_{ij}) \tag{3}\end{equation*}

Here, $\phi_{k}(R_{ij})$ represents the feature extraction result of the $k$-th feature layer in the region $R_{ij}$, $w_{k}$ is the corresponding weight parameter, and $K$ is the total number of feature layers. In this way, ROD not only better captures the subtle features of the target but also effectively reduces sensitivity to background noise.

This design makes full use of the multilevel feature information provided by the FAFEB backbone network, thereby ensuring the efficiency and robustness of the detection system.
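A sketch of the grid division and weighted layer combination of Eq. (3) is given below; the number of regions (m, n), channel-matched inputs, and softmax-normalized layer weights are assumptions, since the exact head configuration is not public.

```python
import torch
from torch import nn
import torch.nn.functional as F

class RegionalOptimization(nn.Module):
    """Split a fused feature map into an m x n grid of local regions."""
    def __init__(self, n_layers: int, m: int = 4, n: int = 4):
        super().__init__()
        self.m, self.n = m, n
        self.w = nn.Parameter(torch.ones(n_layers))   # w_k in Eq. (3)

    def forward(self, feats: list) -> torch.Tensor:
        # Align the K (channel-matched) feature layers to the finest scale.
        size = feats[0].shape[-2:]
        aligned = [f if f.shape[-2:] == size
                   else F.interpolate(f, size=size, mode="nearest")
                   for f in feats]
        w = torch.softmax(self.w, dim=0)
        fused = sum(wk * fk for wk, fk in zip(w, aligned))  # Eq. (3)
        # Carve the map into regions R_ij (H, W must divide by m, n).
        B, C, H, W = fused.shape
        r = fused.view(B, C, self.m, H // self.m, self.n, W // self.n)
        return r.permute(0, 2, 4, 1, 3, 5)  # (B, m, n, C, H/m, W/n)
```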

The loss function of the detector is composed of multiple parts. The classification loss ($loss_{cls}$) and objectness loss ($loss_{obj}$) are binary cross-entropy losses applied to sigmoid outputs, and the bounding box loss ($loss_{bbox}$) is the CIoU loss. The overall loss is:
\begin{equation*} Loss = loss_{cls}+loss_{obj}+loss_{bbox} \tag{4}\end{equation*}
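Under the stated choices, Eq. (4) can be sketched as below, using sigmoid binary cross-entropy for the classification and objectness terms and torchvision's CIoU loss for the box term; the tensor shapes and the equal weighting of the three terms are taken at face value from Eq. (4).

```python
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def detection_loss(cls_logits, cls_targets,
                   obj_logits, obj_targets,
                   pred_boxes, target_boxes):
    """Composite loss of Eq. (4); boxes are (x1, y1, x2, y2)."""
    loss_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    loss_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    loss_bbox = complete_box_iou_loss(pred_boxes, target_boxes,
                                      reduction="mean")  # CIoU
    return loss_cls + loss_obj + loss_bbox
```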

SECTION V.

Experimental Results

A. Experiment Setting

For the hardware configuration, an Nvidia RTX4090 GPU (24GB) was used, along with an Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz. On the software side, the respective versions utilized were Python 3.11, CUDA 11.7, and PyTorch 2.0. The system operated under the Linux environment.

Regarding hyperparameters, in the network training proposed in this paper, the image size is 640, the number of training iterations is 300, the batch size is 16, the number of workers is 8, the momentum is 0.9, and the optimizer is Adam.
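For reference, an equivalent baseline run with these hyperparameters can be expressed through the public Ultralytics YOLOv8 API as sketched below; DentalDet's own training code is not released, and the dataset YAML path is hypothetical.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # stock baseline; DentalDet modifies this
model.train(
    data="denimpact.yaml",       # hypothetical dataset config file
    imgsz=640,                   # image size 640
    epochs=300,                  # the paper's 300 iterations, read as epochs
    batch=16,
    workers=8,
    optimizer="Adam",
    momentum=0.9,
)
```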

B. Evaluation Metrics

To quantitatively assess the detection performance of the proposed model, AP, mAP, mAP at a 50% Intersection over Union (IoU) threshold (mAP50), and mAP averaged across IoU thresholds from 50% to 95% (mAP50-95) are used. For each category, precision values were calculated at different levels of recall, and the average of these precision values was computed to obtain the AP for that category. The mean of the AP values across all categories gives the mAP, a metric that measures the overall accuracy of a model's detection results. In addition, metrics such as floating-point operations (FLOPs), model weight size, number of parameters, and frames per second were used to evaluate the size and efficiency of the model. For the detection of dental diseases, high precision must be considered to ensure diagnostic accuracy, together with high recall to reduce missed diagnoses. The AP and mAP help evaluate the model's performance under these requirements. The calculation formulas for AP and mAP are given in Eq. (5) to Eq. (8):
\begin{align*} P & = \frac{TP}{TP + FP} \tag{5}\\ R & = \frac{TP}{TP + FN} \tag{6}\\ AP & = \int_{0}^{1} P(R)\,dR \tag{7}\\ mAP & = \frac{1}{N} \sum_{i=1}^{N} AP(i) \tag{8}\end{align*}

$TP$ stands for true positives, the number of correctly detected targets; $FP$ stands for false positives, the number of incorrectly detected targets; and $FN$ stands for false negatives, the number of missed targets. In Eq. (5), $P$ represents precision; in Eq. (6), $R$ represents recall; and in Eq. (8), $N$ represents the total number of categories.
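As a worked counterpart to Eqs. (5)-(8), the sketch below computes AP from ranked detections with numpy, using all-point interpolation of the precision-recall curve; matching detections to ground truth at a given IoU threshold is assumed to have been done already.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """scores: confidence per detection; is_tp: 1 for TP, 0 for FP; n_gt: #targets."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    recall = tp / n_gt                        # Eq. (6): TP / (TP + FN)
    precision = tp / (tp + fp)                # Eq. (5): TP / (TP + FP)
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # monotone precision envelope
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))  # Eq. (7)

# Eq. (8): mAP = np.mean([average_precision(*per_class[c]) for c in classes])
```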

C. Quantitative Experiment Results

To validate the effectiveness of the proposed method, it was compared with other mainstream object detection algorithms on the publicly available DENTEX dataset and on our own dataset, DENIMPACT. The performances on the validation sets are listed in Table 3 and Table 4, respectively. Table 3 shows the performance of different models on the DENTEX dataset for detecting caries, deep caries, periapical lesions, and impacted teeth. Table 4 presents the performance of the different models on the DENIMPACT dataset for the classification of impacted teeth. The results of the different models are based on our own evaluations. On the public dental diagnosis dataset DENTEX, our method achieved an mAP50 of 0.631 and an mAP50-95 of 0.431. This indicates that the proposed method can accurately diagnose a range of issues in panoramic dental X-ray images, thereby facilitating correct decision-making during clinical treatment. Furthermore, the precision and recall of our algorithm (0.630) surpass those of many strong object detection algorithms, such as YOLOv8 and YOLOX, demonstrating the effectiveness of our optimization for panoramic dental X-ray images.

TABLE 1. Validation on the DENTEX dataset. Comparison results of our DentalDet with (w) and without (w/o) the Fast Adaptive Feature Extraction Backbone (FAFE) and with (w) and without (w/o) Skip Bidirectional Feature Enhancement (SBFE).
TABLE 2. Validation on the DENTEX dataset. Comparison results of the baseline with (w) and without (w/o) Skip Bidirectional Feature Enhancement (SBFE), Path Aggregation Network (PAN), Shape Sensitive Convolution (SSC), and Regional Optimization Detectors (ROD).

In addition, when tested on the DENIMPACT dataset proposed in this study, our method achieved an mAP50 of 0.844 and an mAP50-95 of 0.614, significantly outperforming the strongest competing algorithms, YOLOv8 and YOLOX. This indicates that our proposed method can provide more accurate classification results for impacted teeth, assisting clinicians in a further comprehensive evaluation of the difficulty of extracting impacted teeth.

D. Qualitative Analysis

The detection results of the different algorithms are visualized in Figure 7 and Figure 8: Figure 7 presents the classification and detection results for impacted teeth, while Figure 8 shows the diagnosis results for oral diseases. From the classification results for impacted teeth, it can be observed that the other methods exhibit varying degrees of false positives or false negatives, whereas our proposed method accurately and comprehensively detects the different categories of impacted teeth. This demonstrates that our method can overcome the detection noise caused by background interference and highly similar targets in dental X-ray images, enabling fast and accurate localization of different types of impacted teeth. From the diagnosis results for oral diseases, it can be observed that our method accurately identifies the majority of oral health issues. Despite the greater number of potential interferences in oral disease diagnostic data (such as fillings, missing teeth, and background noise), our algorithm performs exceptionally well in resisting such variations.

FIGURE 4. Detailed structural schematic of the Fast Multi-layer Selective Feature Fusion module. The module is composed of two branches, which help the network extract more useful features.

FIGURE 5. Schematic of the Shape Sensitive Convolution structure. The DCN module helps extract the irregular features of teeth and improves detection efficiency.

FIGURE 6. Structure of the core ROGG module in the Regional Optimization Detectors. This denser gridding method helps the network capture detailed information.

FIGURE 7. Qualitative comparison of the proposed method with other object detection methods on the DENIMPACT dataset. False positives and misclassifications are marked with a bold blue box; false negatives are marked with a bold green box.

FIGURE 8. Qualitative comparison of the proposed method with other object detection methods on the DENTEX dataset. False positives and misclassifications are marked with a bold blue box; false negatives are marked with a bold green box.

E. Ablation Experiment

To further validate the effectiveness of the various modules used in DentalDet, ablation experiments were conducted on the DENTEX dataset. The performance of each model is shown in Table 1. With the baseline (YOLOv8) and no improvements, mAP50 and mAP50-95 reached only 0.410 and 0.262, respectively. After adding the FAFE and SBFE modules, these metrics increased to (0.553, 0.373) and (0.625, 0.421), respectively. This demonstrates that the proposed methods improve the detection performance on panoramic dental X-ray images to varying degrees. The full algorithm achieved the highest scores, owing to DentalDet's use of multiple feature selection and reuse techniques, which enhance the network's ability to utilize input features and improve detection accuracy, proving the effectiveness of the method.

TABLE 3. Comparison of different models in oral disease detection (DENTEX dataset).
TABLE 4. Comparison of different models in impacted tooth detection and classification (DENIMPACT dataset).

Furthermore, we conducted an in-depth analysis of the impact of the SBFE (Skip Bidirectional Feature Enhancement), SSC (Shape Sensitive Convolution), and ROD (Regional Optimization Detectors) modules when used individually. The experimental results are shown in Table 2, and these data clearly demonstrate the specific improvement each module brings to model performance. First, we observed that when only the PAN (Path Aggregation Network) structure is used for feature fusion, the mAP50 (mean average precision at an IoU threshold of 0.5) of the network is only 0.410. When the SBFE module proposed in this study is introduced, mAP50 increases significantly to 0.574, an improvement of 0.164. This result indicates that the SBFE module has significant advantages in fusing features across scales and can effectively improve the model's ability to detect target boundaries. The SSC and ROD modules likewise deliver strong improvements. Specifically, the SSC module, through its shape-aware offset mechanism, improves the extraction of irregular tooth features, increasing the baseline mAP by 0.046, while the ROD module further improves localization accuracy by optimizing the detection strategy over local regions, increasing the baseline mAP by 0.127. These results demonstrate the effectiveness of the SSC and ROD modules in their respective roles. Overall, the above experimental results not only verify the significant improvements brought by the SBFE, SSC, and ROD modules but also further illustrate the superiority of the proposed method in oral X-ray image analysis. Through the synergy of these modules, the model can extract key features more effectively and achieve precise target localization, laying a solid foundation for subsequent practical applications in stomatology.

SECTION VI.

Conclusion

The intelligent detection and diagnosis of panoramic dental X-ray images are becoming increasingly prevalent in dental medicine. However, there is a lack of effective methods for extracting the features of oral data from these images. Additionally, the classification of impacted teeth is more complex than that of carious and periapical lesions, and most existing studies on the object detection and classification of impacted teeth focus primarily on mandibular impacted teeth. To address these challenges, this study introduces an efficient panoramic dental X-ray target detection algorithm named DentalDet. This method employs a Fast Adaptive Feature Extraction Backbone to provide more robust feature extraction, further enhances feature utilization through Skip Bidirectional Feature Enhancement, and learns key distributions within the features to mitigate the missed-target problem common in panoramic dental X-ray images. Finally, it optimizes region proposals to output the final detected targets. Through our experiments, the method achieved a significant improvement in detection accuracy, with mAP50 increasing from 0.410 to 0.631. This study also proposes a dataset designed for detecting impacted teeth based on the occlusal plane of the adjacent second molar. The dataset classifies impacted teeth according to their positions relative to the height of the occlusal plane of the adjacent second molar, thereby effectively supporting the clinical detection and intelligent diagnosis of impacted teeth.

Furthermore, we conducted an in-depth analysis of this study's limitations. This study mainly focused on panoramic dental X-ray films; the effectiveness of the framework on other types of dental images (such as CT scans and 3D imaging) has not been fully verified. Different types of images exhibit unique characteristics and application scenarios, so their performance requires further research and verification. Although the technological improvements have significantly enhanced detection performance, their feasibility and effectiveness in actual clinical applications still need to be confirmed through more experimental evidence and long-term follow-up. The clinical environment is complex and changeable, and the practical application of the model may be affected by various factors, such as equipment conditions and operator experience. In future studies, we will focus on these aspects to improve the diagnostic performance of the model.
