A New Forensic Video Database for Source Smartphone Identification: Description and Analysis

In recent years, the field of digital imaging has made significant progress, and today every smartphone has a built-in video camera that allows users to record high-quality video freely and without restriction. At the same time, rapidly growing internet technology has contributed significantly to the widespread distribution of digital video via web-based multimedia systems and mobile applications such as YouTube, Facebook, Twitter, and WhatsApp. However, as the recording and distribution of digital videos have become affordable, security issues have grown and spread worldwide. One of these issues is identifying the source camera of a video, an area that raises several new challenges. One such challenge is individual source camera identification (ISCI), which aims to identify each physical device regardless of its model. The first step towards addressing these problems is a video database recorded with modern smartphone devices, which can also serve the deep learning methods that are growing rapidly in the field of source camera identification. In this paper, a smartphone video database named the Qatar University Forensic Video Database (QUFVD) is introduced. The QUFVD includes 6000 videos from 20 modern smartphones representing five brands, with two models per brand and two identical devices per model. This database is suitable for evaluating different techniques, such as deep learning methods, for video source smartphone identification and verification. To evaluate the QUFVD, a series of experiments on source camera identification using a deep learning technique is conducted. The results show that improvements are essential for the ISCI scenario on video.


I. INTRODUCTION
Cellphones have developed rapidly over the past decades due to their economic advantages, functionality, and ease of access [1]. They allow the creation of digital audiovisual content without constraints of time, place, subject, or network connection [2]. Smartphone devices can thus provide highly pertinent information for crime prosecution and forensic investigation [1]. These types of investigations are potentially important in research fields across sectors such as medicine, law, and surveillance, where the authenticity of images and videos matters. In general, forensic video analysis is much more difficult than image analysis because of lossy video compression: existing traces can be erased or significantly damaged by high compression rates, making all or part of the processing history unrecoverable. While numerous forensic methods have been developed for digital images [3]-[9], the forensic analysis of videos has been less explored. It should also be noted that image-based methods cannot be applied directly to videos [10]-[12]. This is due to challenges such as compression, stabilization, scaling, and cropping, as well as the differences between frame types that occur when a video is produced. (The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Caldelli.)
Video identification algorithms identify and distinguish camera types based on the videos digital cameras produce. During the last few years, forensic specialists have been particularly interested in this topic. In general, there are two main ways to identify the source of images and videos: examining the content to extract a unique fingerprint of the camera, and using the metadata associated with the images or videos (the "DNA" of a video). Lopez et al. [13] demonstrated that the internal elements and metadata of a video can be used for source video identification. Since metadata can be removed from an image or video, fingerprint-based identification is the more reliable approach. Moreover, two concepts are considered for identifying a camera: individual source camera identification (ISCI) and source camera model identification (SCMI). ISCI distinguishes individual cameras of both the same and different models, while SCMI, a subset of ISCI, distinguishes a particular camera model from other models but cannot distinguish between devices of the same model. SCMI has been researched more than ISCI [14]. Another important aspect of identifying the source camera of a video is the codec used for compression: the codec may affect the accuracy of source camera identification, since some useful information is lost during encoding.
As a result of the challenges and advances in forensic video analysis research, such as deep learning methods, there is a need for standard databases that allow researchers to compare techniques more easily under the same experimental protocols. Although there are several databases for source camera identification on images [15], [16], there are few for videos. Therefore, for new challenges such as ISCI and deep-learning-based source camera identification that focus on video, it is essential to have a database on which new methods can be evaluated.
Since most video databases focus on videos recorded with a videocassette recorder (VCR), and among them there is only one database for smartphones (Daxing) [1], we focus on presenting a smartphone database for videos in order to support new tasks. It should be noted that the Daxing database cannot cover the ISCI challenge for all devices (of its 22 models, only 16 can be used for the challenge), and QUFVD is more suitable for training a deep learning method because of the number of videos it contains (6000, compared to 1400 in Daxing). This study is an attempt to develop a database tailored to the new challenges of smartphone video. The structure of the database for source camera identification is shown in Figure 1. As the figure shows, evaluating the database requires extracting frames. In general, a compressed video consists of intra-coded pictures (I-frames), predictive-coded pictures (P-frames), and bi-predictive-coded pictures (B-frames), with I-frames yielding the most promising identification results [12], [14]. The database therefore provides both the videos and their corresponding I-frames. We also define training, validation, and testing splits for the two common categories of methods in the field, namely Photo Response Non-Uniformity (PRNU) and machine learning approaches. PRNU, understood to be the unique fingerprint of the camera, is often referred to as residual noise or sensor pattern noise (SPN). It arises when the CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) sensor processes the input signal (light) and converts it into a digital signal. In deep learning methods, a popular category of machine learning, a training step must be performed to extract the camera fingerprint. The main challenges for these methods are separating content from noise and the amount of training data.
The first challenge can be addressed by introducing architectures designed for the problem, for example by adding new layers and loss functions.
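As a rough illustration of the PRNU idea described above, the sketch below estimates a sensor fingerprint as the average noise residual of several frames and compares it to a query residual by normalized correlation. It is a toy example: a 3 × 3 mean filter stands in for the wavelet-based denoisers used in practice, and all function names are illustrative.

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter used as a simple stand-in denoiser."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def noise_residual(frame):
    """High-frequency residual: the frame minus its denoised version."""
    frame = frame.astype(np.float64)
    return frame - box_blur(frame)

def estimate_fingerprint(frames):
    """Average the residuals of many frames from one camera (ideally flat, well-lit)."""
    return np.mean([noise_residual(f) for f in frames], axis=0)

def correlate(a, b):
    """Normalized correlation between a fingerprint and a query residual."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A query video would then be attributed to the reference camera whose fingerprint gives the highest correlation with the query residual.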
The paper is organized as follows. Section II reviews the available video databases with brief descriptions. Our motivation is explained in Section III. Our new video database is presented in Section IV. Section V describes the database evaluation based on a deep learning method. The last section concludes this work.

II. LITERATURE REVIEW
The video databases presented in the literature are summarized in Table 1.
One of the main reasons videos have been explored less than images is that there are few standard digital video databases on which to develop methods [14]. We review these databases in this section.
CAMCOM2010 [17] was a contest designed for source identification of YouTube videos. Although a satisfactory number of participants initially registered, only two submitted results. The database is not publicly available.
The University of Surrey's website provides access to the SULFA database [18], which contains both original and forged videos. The original videos are suitable for source camera identification. About 150 videos were collected from three sources; the database was later extended by [19]. The method presented in [20] was tested in the study.
The VISION database was introduced in [21] and is the most popular database in the field. In total, 35 portable devices from 11 major brands contributed 34,427 images and 1914 videos, in both native and social formats (Facebook, YouTube, and WhatsApp). Videos were captured in flat, indoor, and outdoor scenarios. The flat scenario includes videos of flat surfaces such as walls and the sky; the indoor scenario includes videos depicting offices or shops; and the outdoor scenario includes videos depicting gardens. Three recording modes were used for each scenario: still mode, where the user stands still while recording; move mode, where the user walks while recording; and panrot mode, which combines a pan with a rotation. The videos of each scenario were also exchanged via the YouTube and WhatsApp social media platforms. In the study, the database was evaluated with the method presented in [4].
The video-ACID database was presented in [14] for source camera identification and is publicly accessible. Over 12,000 videos were collected from 46 physical cameras representing 36 different camera models. All videos were shot manually to cover a range of lighting conditions, content, and motion. The database is suitable for both SCMI and ISCI scenarios. It was evaluated with the deep learning method presented in [22].
The authors of [1] presented the Daxing smartphone identification database, which includes both images and videos from a large set of smartphones of different brands, models, and devices. The data from 90 smartphones, representing 22 models and 5 brands, comprise 43,400 images and 1400 videos. If the iPhone 6S and 6S Plus are counted separately, 23 models are represented. Typical scenes include the sky, grass, rocks, trees, stairs, a vertical printer, a lobby wall, and a white classroom wall, among others. The videos were shot vertically in each scene, with at least three videos per scene, and all videos are longer than 10 seconds. The database was evaluated with the method presented in [23].
The SOCRatES database [24] was captured with smartphones: around 9700 images and 1000 videos were taken with 103 different smartphones from 15 different brands. The methods of [3] and [25] were assessed on the database.

III. MOTIVATION
The rapid development of smartphone imaging is an important driver for new databases in forensic analysis, especially for source camera identification. In addition, covering aspects that existing databases have not considered may lead researchers to present a new database.
As described in the previous sections, most databases contain videos recorded with a VCR, and only one is dedicated to smartphones (Daxing) [1]. Although that database covers a wide range of devices and can be considered important in this field, some aspects may lead researchers to develop a new database to meet new challenges. Table 2 details the number of videos per device in Daxing. Of the 90 devices in the database, 85 were used to record videos. As the table shows, the number of videos per device is limited, perhaps because Daxing covers both videos and images. The smallest number of videos recorded by a device is 4 and the largest is 106; only one device has 106 videos, while the rest have fewer than 31, most between 12 and 28. On average, each device has around 26 videos. As a result, the assessment of PRNU-based methods may not be reliable. Furthermore, machine-learning-based source camera identification techniques may face unbalanced data, since the number of training videos is small and differs across devices, prompting researchers to adjust and balance the database before use. For example, for the iPhone 8 Plus, 24 videos were recorded with device #1 but only 4 with device #2.
As our experiments show (Section V), increasing the amount of training data improves the results on our database. Since most machine learning methods require substantial data for training, a database with many more videos is clearly better suited to machine learning methods than Daxing for the ISCI scenario. As shown in Table 2, for the ISCI scenario only one Daxing device has 106 videos for training, and the rest have fewer than 31. Additionally, to make the Daxing database suitable for a machine learning approach, the videos must be divided into training, testing, and validation sets. Taking the database average of 26 videos per device and applying our splitting structure yields 15, 7, and 4 videos for training, testing, and validation, respectively, which is clearly too few for a fair comparison of machine learning methods. It should be noted that, with its larger number of models, the Daxing database may be more suitable for machine learning methods in the SCMI scenario.
Finally, it should be noted that a new database can be connected to other databases such as Daxing to obtain more data and deal with new challenges.

IV. QUFVD DESCRIPTION
In this section, we discuss the features and structure of QUFVD. The following properties are important when describing a database: the number of videos and cameras, resolution, codec, and suitability for SCMI or ISCI. These properties are described in more detail in the following subsections. Table 3 summarizes the database and its features. The QUFVD is publicly available.

A. DEVICES
Several popular manufacturers produce different smartphone brands, but only a few brands are widely used. To ensure a variety of brands, we selected 5 popular ones for video recording: iPhone, Samsung, Huawei, Xiaomi, and Nokia. For each brand, we selected two different models, and for each model, two devices, i.e., four devices per brand. The total number of devices used to collect this database is therefore 20.

B. SIZE PROPERTIES
With the development of deep learning methods in this area, a large number of videos or frames can improve the results, as shown in this article. A database of suitable size can therefore serve both traditional methods such as PRNU and deep learning methods. In our database, 300 videos are collected for each device, for a total of 6000 videos. The videos are between 11 and 15 seconds long at a frame rate of 30 frames per second. Since I-frames play an important role in source identification [12], [14], these frames are also extracted; the number of I-frames per video depends on its length and content.
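I-frames can be extracted with FFmpeg's `select` filter; the sketch below only builds the command line (the file names are placeholders), leaving execution, e.g. via `subprocess.run`, to the caller.

```python
def iframe_extract_cmd(video_path, out_pattern="iframe_%03d.png"):
    """Build an FFmpeg command line that keeps only the intra-coded (I) frames."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", "select=eq(pict_type\\,I)",  # keep frames whose picture type is I
        "-vsync", "vfr",                    # do not duplicate frames to fill gaps
        out_pattern,
    ]
```

Running the returned command with `subprocess.run(cmd, check=True)` would write one image file per I-frame of the input video.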

C. CONTENT PROPERTIES
For this database collection, we relied mainly on a static camera, although both static and moving recordings are included. The database contains very diverse videos of different scenes, outdoor and indoor, with moving or still objects: mainly gardens, the sky, streets, shops, household objects, and the sea. Figure 2 shows samples of the data for each device.

D. ISCI PROPERTIES
One way to make source camera identification challenging is to include videos captured with smartphones of the same camera model. In our database, two devices are considered for each smartphone model; for example, for the Samsung Galaxy A50, the videos were captured with two devices. This challenge is studied in the evaluation section. Our database thus supports both SCMI and ISCI scenarios for all models, defining a 10-class and a 20-class problem, respectively.

E. CODEC PROPERTIES
Video files are compressed with codecs, which are always a tradeoff between quality and file size. Compression reduces file size, which lowers bandwidth usage and increases streaming speed. For encoding high-definition video, AVC (H.264) is the standard codec used by several online video services, including YouTube and Vimeo; the MPEG-4 and H.264 standards are implemented by the 'libx264' library in FFmpeg (https://www.ffmpeg.org/). All smartphones used for our database recorded videos with the H.264 video coding standard, except the iPhone Xs Max and the Samsung Note9, which use H.265.

F. I-FRAME PROPERTIES
In coding standards such as the MPEG series and H.264, a group of pictures (GOP) consists of I-frames (intra-coded pictures), P-frames (predictive-coded pictures), and B-frames (bi-predictive-coded pictures). I-frames are the least compressible and do not require other frames for decoding. P-frames are decoded using data from previous frames and are more compressible than I-frames. B-frames can use both previous and subsequent frames as references, achieving the highest compression. The I-frame is generally more detailed than the P- and B-frames. The GOP size, which can be fixed or variable, is the number of B- and P-frames between two consecutive I-frames. Several studies have demonstrated that methods based on I-frames give better results than those based on other frames [26]-[28].

H. RESOLUTION AND COLOR MODE
The resolution of a video is its width and height in pixels. All videos in the database were recorded with the rear camera of the smartphones. Two resolutions occur in the database, namely 720 × 1280 and 1080 × 1920. The frames are stored in two modes, color (true color) and grayscale, so whether resolution and color mode affect the results can be tested.
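Where grayscale copies of the frames are needed, a conversion along the following lines could be used; this sketch assumes the common ITU-R BT.601 luma weights, while the exact weights used to produce the database's grayscale frames are not specified in the text.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB frame to grayscale using ITU-R BT.601 luma weights."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```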

I. STRUCTURE OF THE DATABASE
The overall structure of QUFVD is shown in Figure 3. Researchers can modify the structure according to their methods and resources. Moreover, the database can be combined with other databases (e.g., Daxing) to address new scenarios and challenges and to widen the choice of brands.

V. QUFVD EVALUATION
In this section, the quality of our database is evaluated by experimenting with the ISCI and SCMI scenarios under different settings using a deep learning method. We divide the experiments into scenarios showing the influence of various conditions on the results. These results provide a baseline accuracy for camera identification on the QUFVD database and can be used for comparison with other methods. For the experiments, 80% of the videos are used as the training set and the remaining 20% as the test set; 20% of the training data is further held out as validation data. This structure can be used for both classical (e.g., PRNU) and machine learning methods: in PRNU methods, for example, reference patterns can be obtained from the training videos and query patterns from the test videos. Since, as mentioned earlier, I-frames lead to better results, the I-frames of the videos are extracted for the evaluation. The statistics for training, testing, and validation at both video and frame level are shown in Table 4. For each video in each experimental series, we selected all of its I-frames for the corresponding training, testing, or validation set; a total of 76,531 I-frames were extracted.
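The split described above can be sketched in a few lines; `split_videos` and the fixed seed are illustrative, not part of the released database tooling. For the 300 videos of one device, this yields 192 training, 48 validation, and 60 test videos.

```python
import random

def split_videos(video_ids, seed=42):
    """Shuffle, hold out 20% for testing, then 20% of the remainder for validation."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(0.2 * len(ids))
    test, train = ids[:n_test], ids[n_test:]
    n_val = round(0.2 * len(train))
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```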
The method presented in [29] is used to evaluate our database. In [30], [31], and [22], this CNN (the MISLnet architecture [29]) was used to identify the source camera, with frames used to train the network. The network adds a constrained convolutional layer as its first layer, using three kernels of size 5 × 5. The layer is constructed to learn relationships between adjacent pixels that are independent of scene content. The method was tested on the VISION database [21], and the experiments showed that the constrained layer improves the results compared with deep learning architectures without it. The structure of the CNN used in the three studies is shown in Figure 4: a constrained convolutional layer added to a simple CNN.
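Following the idea of the constrained convolutional layer in [29], the constraint can be sketched as a projection applied to each kernel after every training update: the center weight is fixed to -1 and the remaining weights are rescaled to sum to 1, so the filter outputs a prediction error for the center pixel rather than scene content. `constrain_kernel` is a hypothetical NumPy helper; a real implementation would apply this inside the training loop of a deep learning framework.

```python
import numpy as np

def constrain_kernel(w):
    """Project a k x k filter: center weight -1, remaining weights sum to 1."""
    w = w.astype(np.float64).copy()
    c = w.shape[0] // 2
    w[c, c] = 0.0
    s = w.sum()
    if abs(s) > 1e-12:
        w /= s          # off-center weights now sum to 1
    w[c, c] = -1.0      # subtract the center pixel itself
    return w            # the filter now computes a prediction error
```

Note that the constrained weights sum to zero overall, which is what suppresses low-frequency scene content.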
Our database is evaluated on the two main scenarios, ISCI and SCMI: a 10-class problem for SCMI and a 20-class problem for ISCI. For each, the effect of the number and size of patches is examined, as well as the effect of color mode, i.e., grayscale versus true color. All videos were encoded with the respective device's codec (H.264 or H.265), and no video was edited or re-encoded.
To identify a video from its I-frames, all of its I-frames in the test set are considered. The CNN assigns each I-frame to the class with the highest probability; at the video level, a majority vote over all the frames of a video then decides its class.
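The video-level decision can be sketched as follows; `video_label` is an illustrative name, and ties between classes are broken here simply by first occurrence.

```python
from collections import Counter

def video_label(frame_predictions):
    """Majority vote over the per-I-frame class predictions of one video."""
    return Counter(frame_predictions).most_common(1)[0][0]
```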
A 64-bit operating system (Ubuntu 18) with a CPU E5-2650 v4 @ 2.20 GHz, 128 GB of RAM, and four NVIDIA GTX TITAN X GPUs was used to run our experiments.

A. ISCI VS SCMI
The performance of the network is measured by computing frame-level and video-level accuracy in both the ISCI and SCMI scenarios. In the classification stage, each frame/video in the test data is classified into one of 10 classes (SCMI) or 20 classes (ISCI). The frame-level and video-level results of the SCMI scenario for each smartphone model are shown in Table 5.
To investigate the effect of device dependency, the ISCI scenario is considered. Table 6 reports the frame-level and video-level accuracy of the ISCI scenario for each device.
The overall accuracy, precision, recall, and F1-score at frame level for both the ISCI and SCMI scenarios are reported in Table 7. Precision, also called positive predictive value (PPV), measures how many of the predicted positives are correct. Recall is also known as the true positive rate (TPR), and the F1-score is the harmonic mean of precision and recall; it reaches its best value of 1 only with perfect precision and recall.
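These metrics can be computed from per-class counts as follows; this is a generic sketch, not the evaluation code used in the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision (PPV), recall (TPR), and F1 from the true positive, false
    positive, and false negative counts of one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```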
Tables 5 and 6 also show the effect of color mode, i.e., grayscale versus true color. With this premise, Figures 5 and 6 give a more comprehensive picture of camera identification performance by presenting the receiver operating characteristic (ROC) curves for the ten-class and twenty-class problems on our database. Two values are calculated for each threshold: the true positive rate (TPR) and the false positive rate (FPR). The TPR of a given class, e.g., Huawei Y7, is the number of outputs whose actual and predicted class is Huawei Y7 divided by the number of outputs whose actual class is Huawei Y7. The FPR is the number of outputs whose actual class is not Huawei Y7 but whose predicted class is Huawei Y7, divided by the number of outputs whose actual class is not Huawei Y7.
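These per-class rates can be read off a confusion matrix; the following is a minimal sketch assuming rows index the actual class and columns the predicted class.

```python
import numpy as np

def tpr_fpr(confusion, k):
    """One-vs-rest TPR and FPR for class k, where confusion[i, j] counts
    samples of actual class i predicted as class j."""
    cm = np.asarray(confusion, dtype=np.float64)
    tp = cm[k, k]
    fn = cm[k].sum() - tp       # actual k, predicted as something else
    fp = cm[:, k].sum() - tp    # predicted k, actually something else
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)
```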
One of the most important factors in machine learning is how much training data a model needs to perform well. To examine this, a series of experiments was conducted with increasing amounts of training data for the SCMI scenario in both grayscale and color modes; Table 8 shows the effect.
In addition, the size of the patches can affect the performance of CNN methods. For this experiment, four different sizes were tested in the SCMI scenario using 10,000 grayscale patches per class (see Table 9).
For a more detailed analysis of the misclassifications, the confusion matrix for the ISCI scenario in grayscale mode is given in Table 10.
The processing times for a patch, a frame, and a video with 11 I-frames are shown in Table 11; they were measured on frames of size 1920 × 1080.

B. RESULT DISCUSSION
State-of-the-art source camera identification methods face challenges such as compression, stabilization, and ISCI, and various methods, most recently based on deep learning, have been presented to overcome them. As mentioned earlier, our database is also evaluated using a deep learning method developed to address these problems. Overall, the frame-level and video-level results show that the method succeeds on the SCMI problem but does not work well on the ISCI challenge. For both scenarios, reporting the results at video level brings an improvement. The results are discussed in more detail below.
As shown in Table 5, at frame level all devices except the Y7, 8 Plus, and Redmi Note9 Pro achieve more than 70% accuracy in grayscale mode. The biggest improvement of this mode over color mode is for the Note 9. The best results are obtained for the Note 9 and Xs Max, which share the same codec (H.265). At video level, an overall improvement is seen for all devices, and the best result, 95%, is again obtained for the Note 9. However, the Xs Max, despite sharing the Note 9's codec, does not see a comparable improvement, so we cannot conclude that the codec has a direct effect on the results. Likewise, resolution does not appear to affect the results, since the Y7 and Y9 have the lowest resolution but not the worst results. On the evidence of these two cases, therefore, codec and resolution cannot be confirmed as effective factors in this area. However, grayscale mode consistently gives better results than color mode.
Based on Table 6 (ISCI scenario), although only 3 devices fall below 65% accuracy, half of the devices achieve less than 50% at the frame level. Even though device 1 of the Note 9 scores best among all devices, as in the SCMI scenario, device 2 scores only 66.7%, placing it fifth. No clear pattern emerges from the table, except that grayscale mode still outperforms color mode at both frame and video level. Figures 5 and 6 plot the TPR against the FPR for the SCMI and ISCI scenarios in both modes at different frame-level thresholds. As the figures show, the devices behave differently in terms of TPR and FPR. The best performance is shown by the Nokia 5.4 with an area under the curve (AUC) of 0.989, ahead of the second-ranked Note 9 with AUC = 0.987 in grayscale mode (Figure 5(b)). Moreover, as shown in Figure 6(a and b), the RedmiNote9Pro performs significantly better in grayscale mode, and Note 9 device 1 has the best performance with AUC = 0.989.
As shown in Table 7, all metrics are better in the SCMI scenario than in the ISCI scenario; based on these results, improving the ISCI scenario is essential for machine learning approaches.
Table 8 shows that increasing the number of training patches improves the frame-level results for SCMI in both grayscale and color modes. From 5000 patches to all patches (about 90,000) trained per class, the result improves by 37.3%; in the ISCI scenario, training on all patches (about 45,000) gives 49.9%, an improvement of about 6%. Also, in grayscale mode with all patches trained, the results are 2% higher than in color mode.
VOLUME 10, 2022
Table 9 shows that while a larger patch size can improve performance, the gain is limited to sizes up to 350 × 350; this experiment used 10,000 patches per class, and for sizes over 350 × 350 a drop in performance is observed. We therefore chose 350 × 350 for all experiments in the evaluation.
TABLE 10. Confusion matrix of the ISCI scenario in grayscale mode. Classes 1 to 20 are Y7 (device 1), Nokia 5.4 (device 2), Nokia 7.1 (device 1), Nokia 7.1 (device 2), A50 (device 1), A50 (device 2), Note 9 (device 1), Note 9 (device 2), RedmiNote8 (device 1), RedmiNote8 (device 2), RedmiNote9Pro (device 1), Y7 (device 2), RedmiNote9Pro (device 2), Y9 (device 1), Y9 (device 2), 8 Plus (device 1), 8 Plus (device 2), Xs Max (device 1), Xs Max (device 2), and Nokia 5.4 (device 1), respectively.
TABLE 11. The processing time (seconds) for a patch, a frame, and a video with 11 I-frames.
Table 10 shows the confusion matrix obtained for the ISCI scenario in grayscale mode. As mentioned earlier, this scenario is more challenging than SCMI, and the results can be improved in future studies. The confusion matrix reveals the misclassifications between all classes.
As shown in the table, misclassifications mostly occur between devices of the same brand; e.g., classes 14 and 15 (the two Y9 devices) are most often misidentified as each other.
As can be seen in Table 11, the processing time at patch, frame, and video level increases as the patch size increases.

VI. CONCLUSION
This paper presents a new video database (QUFVD), based on smartphones, for source camera identification. The database includes five popular smartphone brands, with two models per brand and two devices per model, 6000 original videos, and 76,531 I-frames. The entire database is provided to the research community together with an evaluation analysis.
The database is suitable for new challenges such as ISCI and for use with deep learning methods. The results show that improvement is essential for ISCI. Although the comparison is not entirely fair, the deep learning method used in our study achieves promising results compared to those reported on Daxing, which are based on the PRNU method.
To improve the video-level results, different decision-making approaches, such as fusion methods that weight the classifier scores, can be applied in the future. We will add further tasks to the database: transferring videos over social media such as WhatsApp and Facebook to study the impact of compression on source camera identification, and adding forged videos to support video tampering detection. To obtain more data and new challenges, our database can be combined with other databases. Augmentation methods can also be applied to the database to provide more training data. Although the effects of codec and resolution are not clearly visible with the present method, they can be studied with other methods.

ACKNOWLEDGMENT
This publication was made possible by NPRP grant # NPRP12S-0312-190332 from the Qatar National Research Fund (a member of Qatar Foundation). Open Access funding was provided by the Qatar National Library. The statements made herein are solely the responsibility of the authors.
NOOR AL-MAADEED (Member, IEEE) received the Ph.D. degree in computer engineering from Brunel University, U.K., in 2014. She is currently an Associate Professor with the Computer Science and Engineering Department, Qatar University. She has participated in many regional and international conferences and published a substantial number of research articles in prestigious peer-reviewed journals, book chapters, and conference proceedings. She has improved the relationship between academia and industry by leading many research projects, both domestically and abroad, totaling over eight million QAR in her fields of specialization, such as image processing, speech and speaker recognition, intelligent pattern recognition, video-surveillance systems, and biometrics. She is a member of the First Batch Qatar Leadership Center, the Current and Future Leaders Program, the Qatar University Senate, and other committees. She is also a member of various international associations, such as IET, BA, and IAENG. She participates in activities that connect her to the community, such as working with charities and volunteering in sports events. She received awards including the Qatar Education Excellence Platinum Award for new Ph.D. holders from His Highness the Emir of Qatar, in 2014 and 2015, the Premium Award from IET Biometrics, in 2017, and the Barzan Award, in 2019.
AL ANOOD NAJEEB received the bachelor's degree in computer application from the Sree Narayana Institute of Technology, India, in 2019. She is currently pursuing the M.S. degree in computer science with Qatar University, Doha, Qatar. She is also working as a Research Assistant for Dr. Somaya Al-Maadeed at Qatar University. Her research interests include image processing, computer vision, and machine learning.
AFNAN AL-ALI received the Master of Science degree in computer engineering from the University of Basra, Basra, Iraq. She is currently pursuing the Ph.D. degree with Qatar University. Her research interests include machine learning, AI, computer vision, object detection and classification, and machine learning for health care.