Research on Bubble Detection Based on Improved YOLOv8n

Bubble detection has important applications in industries such as tightness testing, fluid measurement, healthcare, and chemical engineering. Because bubbles are small targets with characteristics such as reflection and shadows, they offer few usable features and are highly susceptible to environmental interference, which makes it difficult for detection models to accurately locate and recognize them. Aiming at these problems, an improved YOLOv8n for bubble detection is proposed. Firstly, a deformable convolution network (DCN) is used in the Backbone module to replace the original C2f module, which gives the model stronger feature extraction and adaptive generalization capabilities for small targets with different deformations. Then the global attention mechanism (GAM) is introduced into the Neck network to locally enhance the channels or regions of interest, which is beneficial for capturing the important features of targets, especially small targets like bubbles. Finally, the loss function is improved so that bounding box regression and target detection become more accurate. Experiments are conducted on the bubble dataset, and the results indicate that the mean average precision (mAP) of the improved algorithm reaches 97.4%, an increase of 2.2% over the original YOLOv8n. This shows the proposed method has better comprehensive performance on bubble detection.

Bubbles are widely present in nature, industrial processes, and daily life, such as bubbles generated in gas-liquid two-phase flow fields and bubbles produced by leakage from industrial sealed containers. Gas-liquid two-phase reactions are very common in industrial processes such as chemical engineering, energy, and wastewater treatment. For such reactions, studying bubble detection methods is of great significance for observing changes at the gas-liquid interface [1], [2]. Industrial sealed containers are a common type of gas storage equipment, such as natural gas storage tanks and oxygen cylinders [3], [4]. A sealed container submerged in water is inflated to maintain pressure, and its leakage status can be determined through bubble detection. Therefore, bubble detection has very important theoretical research value and practical application significance. To improve the accuracy of bubble detection, researchers from various industries have proposed many methods, which are mainly divided into two categories: traditional machine vision based detection methods and deep learning based detection methods. (The associate editor coordinating the review of this manuscript and approving it for publication was Tai Fei.)

B. TRADITIONAL DETECTION METHODS
For bubble detection, traditional machine vision methods mainly adopt the idea of foreground-background segmentation. Some researchers use algorithms such as optical flow [5], [6] and edge detection [7], [8] to extract bubble features, while others use improved background modeling, frame differencing, and other algorithms to identify foreground pixels and improve bubble detection accuracy [9], [10], [11], [12].
The optical flow algorithm utilizes the temporal variation and correlation of pixel intensity in bubble image sequences to determine pixel positions. Wu [5] applied the Horn-Schunck optical flow method to detect leaking bubble images and obtained their features based on a Gaussian background model after filtering and region segmentation, achieving the detection and recognition of leaking bubbles in sealed tanks. Zhang et al. [6] proposed an improved bubble detection technique using a Gaussian mixture model and improved the optical flow calculation using brightness constancy and local smoothness constraints, verifying its effectiveness in leak bubble measurement. The optical flow method does not require any prior knowledge of the scene to detect moving objects and can handle moving backgrounds. However, factors such as noise, multiple light sources, shadows, and occlusion can seriously affect the calculated optical flow field distribution.
The frame difference method uses the grayscale difference between two or three adjacent frames in an image sequence to extract foreground targets by thresholding. P. Ramya et al. [9] put forward a technique to improve the frame difference method. Firstly, the correlation coefficient was used to divide image blocks into two categories: background and others. Then the blocks not considered background were refined by pixel-level classification. Experiments on standard datasets showed good performance under some critical conditions. An et al. [10] proposed an improved three-frame difference method to detect moving bubbles during tightness measurement. This method combined Kirsch edge detection and background subtraction to address the partial overlap and incomplete contours produced by the traditional three-frame difference method for moving object detection. The frame difference algorithm is simple to implement and has low computational complexity. However, it is sensitive to environmental noise such as lighting changes, and the choice of threshold is crucial: a threshold that is too low fails to suppress image noise, while one that is too high ignores useful changes.
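The two-frame and three-frame differencing described above can be sketched in a few lines. This is a minimal illustration of the general technique, not the specific pipeline of [9] or [10]; function names and the default threshold are our own choices.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Binary foreground mask from the absolute grayscale difference of
    two frames. A threshold that is too low passes noise; one that is
    too high discards genuine motion (e.g. small rising bubbles)."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

def three_frame_mask(f0, f1, f2, threshold=25):
    """Classic three-frame difference: AND of the two pairwise masks,
    which suppresses the 'ghost' left at the object's previous position."""
    m01 = frame_difference_mask(f0, f1, threshold)
    m12 = frame_difference_mask(f1, f2, threshold)
    return m01 & m12
```

In practice a morphological opening would follow to remove isolated noise pixels, which is exactly where the threshold sensitivity discussed above becomes visible.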
To sum up, traditional machine vision methods have certain advantages in algorithm complexity.However, due to changes in the scene, such as weather, lighting, shadows, and cluttered background interference, the detection of small targets such as bubbles becomes quite difficult and prone to missed detections and false detections.

C. DEEP LEARNING BASED DETECTION METHODS
Recently, object detection algorithms based on deep learning have gained favor among researchers. These models learn features automatically and perform well in object detection tasks. They mainly fall into two categories: two-stage detection algorithms and one-stage detection algorithms [13].
The two-stage algorithm divides object detection into two steps: first generate candidate regions, then classify and regress them. Classic algorithms include R-CNN [14], Fast R-CNN [15], and Faster R-CNN [16]. In 2020, Zhang et al. [17] from Harbin Institute of Technology designed a detection system for capsule surface leakage bubbles based on the Faster R-CNN network, which can basically achieve accurate detection of capsule leakage points. In 2021, Hong et al. [18] used Faster R-CNN to detect bubble positions and segmented bubble shapes using pixel classification networks; this network has high universality and can represent bubble information in real time. Yewon et al. [19] developed an automatic bubble detection tool for gas-liquid two-phase flow based on Mask R-CNN to reduce trial-and-error optimization of threshold parameters; when tested on bubble clusters not included in the training set, the model's accuracy exceeded 95%. In 2022, Cui et al. [20] proposed a highly overlapping submillimeter bubble detection method based on Mask R-CNN, combining ResNet101 with a Feature Pyramid Network, which makes it possible to detect objects with significant size differences. In 2023, M. Ahmed et al. [21] used the Mask R-CNN model for bubble detection on heat exchanger plates in experimental videos, achieving a maximum bubble detection accuracy of 78.6% for different coated plates. The two-stage detection algorithm has high accuracy and good detection performance for small targets. However, its detection speed is relatively slow, and it requires generating a large number of candidate regions, which increases computational and time complexity.
The one-stage detection algorithm directly classifies and locates objects without generating candidate regions, simplifying the detection process, and is usually faster than the two-stage algorithm. Classic models include SSD [22], RetinaNet [23], and the YOLO series [24], [25], [26], [27]. These algorithms outperform traditional machine vision methods in both detection accuracy and speed [28]. In 2020, Wu et al. [29] put forward an improved RetinaNet to detect bubbles on the surface of empty pharmaceutical bottles, addressing issues such as weak robustness and poor resistance to noise interference; on their dataset, the mean average precision (mAP) increased by nearly 2.4% compared to the original algorithm. In 2022, Zuo et al. [30] proposed an algorithm based on the YOLOv5 model to detect bubble defects in composite thin films; the mAP reached 94.3%, and bubbles could be accurately identified with relatively high confidence. Ding et al. [31] designed a dry leakage measurement framework for sealed containers based on deep learning; a YOLOv5 model with asymmetric convolution blocks in the backbone was used for bubble detection in tightness measurement, making small-target features easier to extract. Compared to the two-stage algorithm, the one-stage detection algorithm is faster and has good application value in scenarios requiring real-time detection. However, its accuracy is relatively low, and its detection of small targets is not ideal, which can easily lead to false or missed detections.

D. PROPOSED METHOD
Because bubbles occupy a small proportion of the image, bubble detection belongs to the category of small object detection, and few features are available in specific detection tasks, which makes feature extraction and recognition of bubbles difficult. Meanwhile, due to environmental and other factors, bubble detection tasks are often accompanied by interference such as shadows and reflections, making it difficult for traditional image recognition algorithms to achieve good results; they are prone to missed and false detections.
Deep learning algorithms can automatically discover hidden features in images and generalize better in bubble image classification tasks involving complex scenes with reflections and shadows. In this article, YOLOv8n is used as the base network model to avoid the complex step of generating a large number of candidate regions in two-stage algorithms, reducing computational and time complexity. At the same time, the introduction of a deformable convolution network (DCN) and global attention mechanism (GAM) modules, together with an improved loss function, increases the model's ability to detect small targets such as bubbles, which has good application value in scenarios requiring real-time bubble detection. The major contributions of this paper are as follows:
1) The DCN is introduced into the Backbone to replace the original C2f module, giving the model stronger feature extraction and adaptive generalization capabilities for small targets with different deformations.
2) The GAM is added into Neck network to locally enhance the channels or regions of interest, which is beneficial for capturing the features of small targets to improve detection accuracy.
3) The loss function is improved so that bounding box regression and target detection become more accurate. Against the complex background of actual bubbles, convergence speed can be further improved to optimize the network and raise its accuracy without introducing additional parameters or increasing training time.
The rest of the paper is organized as follows: Section II introduces the related theories. Section III describes the improved YOLOv8n network architecture in detail. Section IV describes and analyzes the experimental results. Section V draws conclusions and looks forward to future research.

II. RELATED THEORIES

A. YOLOv8n ALGORITHM INTRODUCTION
YOLOv8 is the most recently proposed model in the YOLO series, with excellent detection accuracy and speed. It mainly includes five versions: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. Balancing speed and accuracy, YOLOv8n, which has the fastest detection speed, was used as the baseline model, and improvements were made on this basis. Fig. 1 shows the YOLOv8n network architecture.
The Input module scales the input image to the size required for training and also performs operations such as color-tone adjustment and Mosaic data augmentation.
The Backbone module is mainly used to extract target features. In YOLOv8n, the Cross Stage Partial (CSP) module of YOLOv5 has been replaced with the C2f module, which adds more branches to enrich gradient flow and enhances feature expression through dense residual structures. According to the scaling coefficient, the number of channels is changed through split and concatenation operations to reduce computational complexity and model capacity.
The SPPF module is retained in YOLOv8n. It is a spatial pyramid pooling layer that expands the receptive field and fuses local and global features, enriching feature information.
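The pooling cascade at the heart of SPPF can be sketched as follows. This is a dependency-free NumPy illustration of the idea only: the real SPPF also wraps the cascade in 1×1 convolutions, which are omitted here, and the naive loops stand in for an optimized pooling kernel.

```python
import numpy as np

def maxpool_same(x, k=5):
    """Naive stride-1 'same' max pooling over a (C, H, W) feature map."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def sppf_pooling(x, k=5):
    """SPPF pooling cascade: three chained k x k max pools concatenated
    with the input along the channel axis. Chaining three 5x5 pools
    approximates the 5/9/13 parallel pools of the older SPP layer
    at lower cost, progressively enlarging the receptive field."""
    y1 = maxpool_same(x, k)
    y2 = maxpool_same(y1, k)
    y3 = maxpool_same(y2, k)
    return np.concatenate([x, y1, y2, y3], axis=0)
```

The concatenated output carries both the original local features and three increasingly global views, which is the "fusion of local and global features" described above.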
Neck is mainly used to fuse features from different dimensions, and it follows FPN+PAN structures.It integrates feature maps of different scales output from the three stages of Backbone, helping to aggregate shallow information into deep features.
The Head uses a decoupled head structure to separate classification and detection. It processes the strengthened features and ultimately obtains the confidence and position of each target.

B. PROBLEMS OF YOLOv8n IN BUBBLE DETECTION
Due to the small proportion and irregular shape of bubbles in the image, few features are available for specific detection tasks. Additionally, due to environmental factors, bubble detection tasks are often accompanied by interference such as shadows and reflections. Therefore, YOLOv8n has some problems in bubble detection:
1) YOLOv8n uses standard convolution, and the fixed weights of convolution kernels result in the same receptive field size when processing different regions of an image, so some bubbles are not detected well.
2) YOLOv8n is prone to missing some detailed information during feature extraction of small targets like bubbles, resulting in missed and false detections.

III. IMPROVED YOLOv8n NETWORK ARCHITECTURE
In order to solve the above problems of YOLOv8n in bubble detection, an improved YOLOv8n algorithm is proposed.The deformable convolution and attention mechanism are introduced into the Backbone module and Neck network respectively, and the loss function is improved, enhancing the bubble detection ability of the model.

A. DEFORMABLE CONVOLUTION NETWORK
For small targets like bubbles, the overall size is small and the shape is not fixed (ellipsoidal, willow-shaped, etc.) due to the influence of fluid forces. If traditional convolution is used, the fixed weights of the convolution kernels result in the same receptive field size when processing different regions of an image, so some bubbles are not detected well. Inspired by Dai et al. [32], deformable convolution is introduced into the Backbone to replace the original C2f module. The kernel of a deformable convolution is not a fixed N × N grid but samples at non-standard positions; different stages, different feature maps, and even different pixels may each have their own optimal kernel structure. By adding offsets to expand the receptive field, as shown in Fig. 2, sampling can be brought closer to the shape and size of bubbles, making the model more robust. Fig. 3 shows the deformable convolution module [32]. First, a convolution is applied to the input feature map to obtain an offset field with 2N channels; the offsets $\Delta p_n$ are then read from the offset field at each pixel. Let $x$ denote the input feature map, $y$ the output feature map, and $p_0$ a position in $y$; the outputs of traditional convolution and deformable convolution are then

$$y(p_0)=\sum_{p_n\in\mathcal{R}} w(p_n)\cdot x(p_0+p_n) \tag{1}$$

$$y(p_0)=\sum_{p_n\in\mathcal{R}} w(p_n)\cdot x(p_0+p_n+\Delta p_n) \tag{2}$$

where $\mathcal{R}=\{(-1,-1),(-1,0),\ldots,(0,1),(1,1)\}$ is the regular sampling grid, $p_n$ enumerates the coordinates in $\mathcal{R}$, and $w$ denotes the weight of the sampling point.
However, the positions obtained after adding offsets are often fractional and do not coincide with actual pixel points. Therefore, an interpolation method is needed to obtain the offset pixel values; bilinear interpolation is usually selected:
$$I_{x'y'}=\sum_{x}\sum_{y} I_{xy}\cdot\max(0,\,1-|x'-x|)\cdot\max(0,\,1-|y'-y|) \tag{3}$$

where $I_{xy}$ denotes the value of the pixel at integer location $(x, y)$ and $I_{x'y'}$ denotes the interpolated value at the offset (fractional) position $(x', y')$. In the YOLOv8n network, the Backbone is mainly used to understand and describe images. It typically consists of multiple convolutional layers, which extract features at different levels through different layer structures. By fusing and integrating features from different levels, more comprehensive and accurate image features can be obtained, thereby improving performance. Therefore, to enable the detection model to adaptively adjust the receptive field, a deformable convolution module is introduced into the backbone of YOLOv8n. Firstly, two deformable convolutional layers are connected sequentially to construct the D_Bottleneck module, as shown in Fig. 4. Then, the C2f_DCN module is reconstructed from D_Bottleneck, as shown in Fig. 5. Finally, the C2f_DCN module replaces the original C2f module in the backbone network, as shown in Fig. 6.
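The sampling step described above can be made concrete with a small sketch: bilinear interpolation at a fractional position, then one output position of a 3×3 deformable convolution whose taps are shifted by learned offsets. This is an illustrative NumPy reimplementation of equations (2)-(3), not the paper's or torchvision's optimized kernel; the clamping-at-border policy is our own assumption.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Value of a 2-D map at fractional position (y, x) via bilinear
    interpolation; positions are clamped to the image border."""
    h, w = img.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_sample(img, p0, offsets, weights):
    """One output position of a 3x3 deformable convolution: each of the
    nine grid taps p_n is shifted by a learned offset (dy, dx) before
    sampling, then weighted and summed (Eq. 2)."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (dy, dx), (oy, ox), w in zip(grid, offsets, weights):
        out += w * bilinear_sample(img, p0[0] + dy + oy, p0[1] + dx + ox)
    return out
```

With all offsets set to zero this reduces exactly to the standard convolution of equation (1), which is a useful sanity check when wiring such a layer into a backbone.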

B. ATTENTION MECHANISM
An attention mechanism can locally enhance the channels or regions of interest by learning different weights, which helps capture the important features of small targets and improves feature extraction. It is therefore widely used in object detection. Early attention mechanisms attend to only one aspect, channel or spatial, and are thus inefficient; SE [33] is one example.
Later, some scholars combined the two attention mechanisms sequentially, as in CBAM [34], achieving significant improvements. However, these methods ignore the interaction between channel and space, losing cross-dimensional information. The GAM [35] strengthens cross-dimensional interactions and obtains features from three dimensions simultaneously, avoiding information loss and improving detection accuracy. The GAM attention module is redesigned from CBAM, and the entire process is shown in Fig. 7. GAM combines a channel attention mechanism and a spatial attention mechanism, reducing diffuse information while amplifying global cross-dimensional feature interaction. Given a feature map $F_1$, it is first processed by the $M_C$ module and multiplied element-wise with $F_1$ to get $F_2$; then $F_2$ is processed by the $M_S$ module and multiplied element-wise with $F_2$ to get $F_3$:

$$F_2 = M_C(F_1)\otimes F_1 \tag{4}$$

$$F_3 = M_S(F_2)\otimes F_2 \tag{5}$$
where $M_C$ is the channel attention map, $M_S$ is the spatial attention map, and $\otimes$ represents element-wise multiplication. Fig. 8 and Fig. 9 show the channel attention and spatial attention submodules, respectively. In Fig. 9, the size of $F_2$ is C × H × W. Two 7 × 7 convolutions are used to model the nonlinear relationships of pixels within 7 × 7 blocks, enabling the parameters to capture more of the relationships between spatial positions. Because bubbles occupy a small proportion of the image, few features can be used for specific detection tasks, and they are also susceptible to environmental factors such as shadows and reflections. To further improve the model's feature extraction ability for bubbles, GAM is introduced into the neck of YOLOv8n. The improved part is shown in Fig. 10.
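The two-stage gating of equations (4) and (5) can be sketched as follows. This is a deliberately simplified, dependency-free NumPy illustration: the channel MLP follows the GAM design (permute to (H·W, C), shared two-layer MLP), but the two 7×7 convolutions of the spatial submodule are replaced by a single per-position linear map over channels so the sketch stays self-contained. The weight shapes `w1`, `w2`, `ws` are our own placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gam_forward(f1, w1, w2, ws):
    """Simplified GAM forward pass on a (C, H, W) feature map.

    Channel attention M_C: permute to (H*W, C), shared 2-layer MLP
    (w1: C x C//r, w2: C//r x C), sigmoid gate per channel/position.
    Spatial attention M_S: here a single linear map over channels
    (ws: length C) instead of the paper's two 7x7 convolutions.
    """
    c, h, w = f1.shape
    x = f1.reshape(c, -1).T                       # (H*W, C)
    mc = sigmoid(x @ w1 @ w2)                     # channel gates in (0, 1)
    f2 = mc.T.reshape(c, h, w) * f1               # Eq. (4): F2 = M_C(F1) * F1
    ms = sigmoid(np.einsum('chw,c->hw', f2, ws))  # (H, W) spatial gates
    f3 = ms[None, :, :] * f2                      # Eq. (5): F3 = M_S(F2) * F2
    return f3
```

Because both gates lie in (0, 1), the output is an element-wise attenuation of the input: regions and channels the gates score highly are preserved, and the rest are suppressed, which is the "locally enhance the channels or regions of interest" behavior described above.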

C. LOSS FUNCTION
The loss function measures the difference between predicted results and actual labels; a good loss function accelerates network convergence and improves accuracy. In the original YOLOv8n network, CIoU [36] is used to calculate the coordinate loss of the prediction box (PB). It takes three factors into account: aspect ratio, distance between center points, and overlap area. However, it cannot converge well for anchor boxes with lower annotation quality. The relevant formulas for CIoU are as follows:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{w^2 + h^2} + \alpha\nu \tag{6}$$

$$\alpha = \frac{\nu}{(1 - IoU) + \nu} \tag{7}$$

$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w_p}{h_p}\right)^2 \tag{8}$$

In equation (6), $\rho$ indicates the Euclidean distance between the two center points, $b$ and $b^{gt}$ respectively indicate the center points of the PB and the ground truth box (GTB), and $w$, $h$ are the width and height of the minimum bounding rectangle of the PB and the GTB. $\alpha$ is a weight coefficient, and $\nu$ measures the consistency of the aspect ratio, where $w_p$, $h_p$ and $w^{gt}$, $h^{gt}$ are the width and height of the PB and the GTB, respectively. When the aspect ratio of the PB meets certain conditions, the penalty term of CIoU degenerates and fails, hindering the convergence of the model. The gradients of $\nu$ with respect to $w_p$ and $h_p$ are

$$\frac{\partial\nu}{\partial w_p} = -\frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w_p}{h_p}\right)\cdot\frac{h_p}{w_p^2 + h_p^2} \tag{9}$$

$$\frac{\partial\nu}{\partial h_p} = \frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w_p}{h_p}\right)\cdot\frac{w_p}{w_p^2 + h_p^2} \tag{10}$$

These two gradients carry opposite signs, meaning that $w_p$ and $h_p$ cannot increase or decrease simultaneously. Inspired by the focal EIoU loss [37], WIoU [38] is introduced. In WIoU, a gradient gain ensures good anchor-box effects while reducing the impact of harmful gradients, which improves overall performance. To avoid the large harmful gradients generated by lower-quality samples, the WIoUv3 loss function is used in the experiments:

$$L_{WIoUv3} = \frac{\beta}{\delta\alpha^{\beta-\delta}}\cdot\exp\!\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(w^2 + h^2)^{*}}\right)\cdot L_{IoU} \tag{11}$$

where $\beta$ describes the quality of the bounding box, $\alpha$, $\delta$ are learning parameters, $x$, $y$, $x_{gt}$, $y_{gt}$ are the center point coordinates of the PB and the GTB, $w$, $h$ are the width and height of the minimum bounding box of the PB and GTB, and $*$ represents treating the variable as a constant (detaching it from gradient computation).

To ensure the quality of the bounding box, we want $\beta$ to be as small as possible. It is defined in equation (12):

$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \tag{12}$$

where $\overline{L_{IoU}}$ is the running mean of the IoU loss.
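For reference, the CIoU loss of equations (6)-(8) can be computed directly from two boxes. This is a plain-Python sketch for intuition, not the batched implementation used in training; the small epsilon guarding α's denominator is a standard numerical precaution we add ourselves.

```python
import math

def ciou_loss(pb, gt):
    """CIoU loss (Eqs. 6-8) for boxes given as (cx, cy, w, h).

    Returns 1 - IoU + rho^2/c^2 + alpha*v; zero for a perfect match."""
    def to_corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    px1, py1, px2, py2 = to_corners(pb)
    gx1, gy1, gx2, gy2 = to_corners(gt)
    # intersection-over-union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pb[2] * pb[3] + gt[2] * gt[3] - inter
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    rho2 = (pb[0] - gt[0]) ** 2 + (pb[1] - gt[1]) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term (Eq. 8) and its weight (Eq. 7)
    v = (4 / math.pi ** 2) * (math.atan(gt[2] / gt[3])
                              - math.atan(pb[2] / pb[3])) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return (1 - iou) + rho2 / c2 + alpha * v
```

Note that two boxes with identical aspect ratios give v = 0, which is exactly the degeneracy of the penalty term discussed above and one motivation for moving to WIoU.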
Based on the above improvements, the YOLOv8n network structure (ours) is shown in Fig. 11.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In this part, a bubble dataset is built to reflect industrial bubble detection practice, and the implementation details are given. Then, experiments are conducted on the bubble dataset to verify the effectiveness of the model. Finally, the bubble detection results of the two algorithms before and after the improvement are visualized.

A. DATASET AND IMPLEMENTATION
To ensure the validity of the bubble dataset, the dataset used in this article was collected by simulating industrial sites. When natural light is weak, bubble images become darker; when natural light is strong, the image background becomes more complex, with shadow interference, reflections, and differences in bubble shape. Therefore, bubble images were collected under normal, strong, and weak lighting conditions using a Sony FDR-AX700 camera.
The captured images are 3456 × 4608 with an aspect ratio of 3:4. A total of 6000 bubble images in JPG format were collected in the above environments. The images were then resized to 640 × 640 for subsequent network training. Fig. 12 shows part of the dataset.
Furthermore, the excellent detection performance of convolutional neural networks relies on a large number of annotated samples. Therefore, the original bubble dataset was augmented (e.g., by adding noise and applying geometric transformations), which enlarges the dataset and also helps avoid overfitting during model training. The augmented dataset contains a total of 15000 bubble images, split 6:2:2 among the testing, training, and validation sets. The final structure of the bubble dataset is shown in Table 1 below.
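A deterministic 6:2:2 split of the kind described above can be sketched as follows. The function name, seed, and shuffling policy are our own illustrative choices, not details from the paper.

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle deterministically and split into three subsets by ratio.

    Applied to 15000 items with a 6:2:2 ratio, this yields subsets of
    9000, 3000, and 3000 items with no overlap."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n1 = int(n * ratios[0])
    n2 = int(n * ratios[1])
    return shuffled[:n1], shuffled[n1:n1 + n2], shuffled[n1 + n2:]
```

Fixing the seed keeps the split reproducible across runs, and shuffling before splitting prevents the three subsets from inheriting any ordering bias (e.g., all strong-lighting images landing in one subset).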
The operating environment is as follows: Windows 10 operating system and an NVIDIA RTX 3080 GPU with 10 GB of graphics memory. The runtime environment is built in Python on PyTorch, with the CUDA 11.3 acceleration toolkit. The model is trained with SGD as the optimizer for 300 epochs. The initial learning rate is 0.01, the batch size is 32, and the momentum parameter is 0.937.

B. ABLATION EXPERIMENTS
The proposed algorithm introduces deformable convolution and GAM attention modules into the YOLOv8n model and improves the loss function. To verify the effect of the different improvements, five sets of experiments were designed. mAP@0.5 (%), parameters (Params), and GFLOPs were selected as evaluation indicators. During training, the input images and training hyper-parameters were kept consistent. Table 2 shows the experimental results, in which √ indicates that the corresponding module is introduced into the original model.
From Table 2, adding the deformable convolution module or the GAM module separately to YOLOv8n adds only a small number of parameters but improves accuracy, with mAP gains of 0.9% and 0.6%, respectively. Adding both modules simultaneously further improves accuracy, with a 1.3% increase in mAP. The results indicate that the deformable convolution and GAM modules help improve network performance. Finally, applying the WIoU loss function further improves the model's accuracy and detection performance for bubble targets. Although the model parameters increase slightly, the ability to extract feature information is improved. While meeting real-time requirements, mAP increases by 2.2%, demonstrating the effect of the improvements.

C. COMPARATIVE EXPERIMENTS
To verify the effect of our algorithm in bubble detection, comparative experiments between the improved YOLOv8n and other mainstream algorithms were conducted under the same experimental environment; the results are presented in Table 3 below.
From Table 3, our algorithm achieves higher accuracy than SSD, Faster R-CNN, YOLOv5s, YOLOv7-tiny, EfficientDet-D1 [39], [40], and SpineNet-49 [41], with mAP increases of 7.2%, 4.1%, 5%, 3.3%, 1.6%, and 0.9%, respectively. Compared to our improved YOLOv8n, the SpineNet model has much lower detection efficiency, and its model size is nearly ten times larger. The EfficientDet model has a slightly shorter inference time than ours, but its detection accuracy is lower and its model size is twice ours. In object detection, both accuracy and model size matter. Since the bubble detection task in this article generally requires real-time performance, detection efficiency must also be considered during model selection. In summary, the improved YOLOv8n model (ours) has the best overall performance and can meet the requirements of the bubble detection task.

D. VISUALIZATION ANALYSIS OF BUBBLE DETECTION RESULTS
To evaluate our algorithm more intuitively, the original YOLOv8n and ours were used for detection on test images under different lighting conditions, the results of which are shown in Fig. 13.
From Fig. 13, under normal lighting (a) and weak lighting (b), there is little difference in missed and false detections before and after the model improvement, although the improved model yields higher confidence scores. Under strong lighting (c) (with obvious shadows and reflections), the improved algorithm reduces false and missed detections while also increasing confidence scores, indicating that it performs better in bubble detection tasks.

V. CONCLUSION
Aiming at the problems of bubble detection in complex situations, an improved YOLOv8n algorithm for bubble detection is proposed. Firstly, deformable convolution is added to the Backbone; then GAM is introduced into the Neck network to increase detection accuracy. Finally, the loss function is improved so that bounding box regression and target detection become more accurate. The experiments demonstrate that the mAP of our model reaches 97.4%, with 3.21M parameters and an inference time of 12 ms. While meeting real-time requirements, it makes bubble detection more accurate in complex situations. In future work, the improved model will be made lightweight so that it can be deployed on detection devices without losing detection accuracy or speed. In addition, bubble datasets from more varied environments will be added to further improve generalization performance.

FIGURE 12. Part of the dataset: (a) under normal lighting; (b) under weak lighting; (c) under strong lighting.

FIGURE 13. Comparison of bubble detection results between the two algorithms under different lighting conditions: (a) normal lighting; (b) weak lighting; (c) strong lighting.

TABLE 1. Structure of the bubble dataset.

TABLE 2. Evaluation indicator comparison of different improvements.

TABLE 3. Results of comparative experiments.