Exploration of EEG-Based Depression Biomarkers Identification Techniques and Their Applications: A Systematic Review

Depression is the most common mental illness and has become a major cause of fear and of suicidal tendencies and mortality. Currently, about 10% of the world population suffers from depression. The classical approach to detecting depression relies on clinical questionnaires, which depend on patients' responses as well as on observation of their behavior. However, there is no established method for detecting depression from EEG biomarkers. Exploring EEG biomarkers for depression assessment is therefore vital and has great potential to improve both our understanding and clinical interventions. In this study, we conducted a systematic review of 52 research articles using the PRISMA-P systematic review protocol and analyzed their research methodologies and outcomes. We categorized the experiments in these articles according to their physical and psychological aspects, scaled by the commonly used clinical questionnaire-based assessments. This study finds that negative stimuli are the better strategy for identifying depression through EEG signals. The reviewed studies indicate that Neural Connectivity Analysis and Brain Topological Mapping have great potential for finding depression biomarkers, and that the right hemisphere and the frontal and parietal-occipital cortex are distinctive regions for detecting depression using EEG signals. To this end, researchers use many signal processing and machine learning approaches. For filtering, Independent Component Analysis (ICA) is commonly used to eliminate physiological and non-physiological artifacts. Among machine learning approaches, Convolutional Neural Network (CNN) and Support Vector Machine (SVM) showed better performance for classifying healthy and depressed brains. The authors hope that this study will create opportunities for further exploration of EEG as a diagnostic tool, analyzing brain functional connectivity with a focus on clinical interventions.


I. INTRODUCTION
Depression is one of the most common mental illnesses. It affects our mood and has become a global health concern, as it affects millions of lives every year. According to the World Mental Health Survey, every day about 3,000 suicides occur and about 60,000 people may attempt to end their lives [1]; by 2022, depression was therefore expected to be the number one cause of destructive disease [2]. According to the World Health Organization (WHO), more than 264 million people are affected by depression [3], and about 35.8% of suicides occur due to depression [4]. If depression is severe, patients may be drawn toward suicide [5]. According to WHO reports, each year about 30-35% of patients with Major Depressive Disorder (MDD), that is, severe depression, attempt suicide, and about 2-15% of them die due to depression [6]. Although the exact cause of the disorder is still unknown, most researchers attribute it to genetic, environmental, or psychological factors [7]. Recent studies show that women are diagnosed more often than men [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Juan Wang.
Due to the COVID-19 pandemic, the proportion of depressed individuals increased further [9]. Stress from job loss, financial crisis, the loss of friends or family members to COVID-19, and restrictions on going outside are the main reasons for this rise [10]. A study from the Boston University School of Public Health showed that in mid-April 2020 approximately 27.8% of U.S. adults showed signs of depression, compared with 8.5% before the COVID-19 pandemic [11].
Of the 17 Sustainable Development Goals (SDGs), the third is to ensure healthy lives and promote well-being for all at all ages. Target 3.4 is to reduce premature mortality from non-communicable diseases by one third through prevention and treatment, and to promote mental health and well-being, by 2030 [12]. Suicide has been declared a crime in 25 countries, and 20 countries following Islamic law have banned suicide attempts [13]. Early identification of depression is therefore essential for early intervention to control or reduce this mental illness.
Physicians and psychiatrists mostly use clinical questionnaire-based assessment as the traditional way of diagnosing depression, which depends largely on patients' replies and their behavior. However, questionnaires are highly prone to human subjectivity, which hinders the objectivity of the diagnostic process. As a result, numerous studies have sought to improve the traditional model and to develop better alternative strategies for diagnosing depression. Alongside the traditional questionnaire-based assessments, other strategies for identifying depression are being developed, for example visual evaluation of facial expression [14], Heart Rate Variability (HRV) [15], Magnetic Resonance Imaging (MRI) [16], and analysis of regular social media usage [17]. Nevertheless, these techniques have drawbacks: visual evaluation requires long-term, close monitoring; HRV analysis is volatile, and body movements can strongly affect the outcomes; and although MRI gives the most accurate results among these techniques, it is very costly. Electroencephalography (EEG) is another technique, which records the electrical activity of the human brain. Because depression affects the brain neurologically, researchers all over the world are working to find better biomarkers using EEG. In recent studies, EEG-based depression detection has shown promising results, as EEG is less complex and more cost-effective than MRI, and is patient-friendly.
An EEG machine is a device that records the electrical activity of the brain. It contains electrodes that detect brain activity when placed on a subject's scalp [18]. EEG devices can be invasive or non-invasive; researchers prefer non-invasive EEG because it requires no surgical implantation and is easy to use [19]. As a result, researchers focus on such recordings to identify neurological paradigms using advanced signal processing and machine learning techniques. With advances in EEG recording technology, signal processing, and machine learning, EEG signals can be classified to identify various neurological disorders and diseases, depression among them. For this reason, depression identification through EEG shows promise for future research.
Before 2000, many researchers had already begun analyzing brain function to identify various neurological syndromes. In 1980, the Canadian Journal of Psychiatry published the first attempt to observe the EEG brainwave activity of a depressed 61-year-old woman in two stages: first while she was depressed, and again two months later, when she was reported to have recovered. Both recordings were made while the woman was sleeping; however, the reports could not show any differences [20].
Since this beginning of EEG-based depression research, the search for biomarkers has continued and is showing increasingly promising results. To the best of our knowledge, no fully successful detection of depression has been achieved and its biomarkers remain unknown, which calls for further exploration from different angles. As a result, the number of research articles on detecting depression biomarkers through brain signal analysis is increasing exponentially. So far, we have found only a few articles whose authors attempt a clear, conceptual review across the different research articles on depression detection through EEG signals. Most of these authors, however, surveyed all sorts of emotional and mental disorders, leaving less room for the analysis of depression identification [21], [23]. Fernando et al. explicitly discussed EEG features, band power, signal complexity, and functional connectivity for MDD patients of all ages, but did not discuss signal acquisition, preprocessing, or classification strategies [22]. On the other hand, although Yasin et al. discussed in detail the uses, applications, and challenges of deep learning for analyzing MDD as well as bipolar disorder, they did not discuss stimuli or data limitations, and offered no explicit study of depression at its different stages [23]. Table 1 compares these review articles and discusses their drawbacks. For this reason, in this study we discuss several EEG-based depression identification strategies that use machine learning algorithms, together with the neurological background, and compare each category by exploring its findings and limitations. Our wide-ranging study presents a comprehensive picture of the advancement of EEG-based depression identification that may help future research.
VOLUME 10, 2022

II. DEPRESSION AND EEG
Depression affects a person psychologically and can potentially change the structure of the brain. In long-term depression, repeated stress episodes can damage the brain over time. The structural changes range from inflammation and oxygen restriction to cortical shrinkage [20]. In this section, we briefly discuss how depression affects the brain neurologically and why EEG is chosen as a screening tool for depression.

A. EFFECTS OF DEPRESSION IN BRAIN
The human brain is formed of two cerebral hemispheres: the right hemisphere, which is responsible for imagination, insight, and creativity, and the left hemisphere, which is responsible for logical and analytical thought [25].
Each hemisphere has four lobes: frontal, temporal, parietal, and occipital. Most studies have found that, in the resting state, depression mostly affects the frontal lobe or the frontal area near the central position [26], [27]. Brain activity, moreover, registers every movement of the human body within a millisecond. Humans can lie even to themselves but cannot lie to their brain activity. Depression affects three portions of the brain: the hippocampus, the prefrontal cortex, and the amygdala. The hippocampus is responsible for holding memories and controlling the production of the hormone cortisol. It is located in the temporal lobe of the brain. When a person is depressed, the body releases excessive amounts of cortisol and sends it to the neurons of the brain, which causes the hippocampus to shrink and slows down the production of new neurons [28].
The prefrontal cortex lies at the front of the frontal lobe and is responsible for creating memories, controlling emotions, and making significant decisions. As with the hippocampus, during depression the excessive amount of cortisol in the brain also causes the prefrontal cortex to shrink [28].
The amygdala lies at the front of the temporal lobe and is responsible for enabling emotional responses. In depression, the amygdala becomes enlarged and is exposed to a high level of cortisol, which can result in sleep disorders and disturbed activity patterns [28]. Fig. 1 shows the location of all three depression-affected brain regions.
FIGURE 1. Depression affected areas (adopted from [29]).

B. DIFFERENT EEG RHYTHMS
EEG signals can be used to better identify depression biomarkers. The EEG signal is usually divided into five non-overlapping frequency bands [30]:
• Delta (0.5-4 Hz)
• Theta (4-8 Hz)
• Alpha (8-12 Hz)
• Beta (12-35 Hz)
• Gamma (>35 Hz)
In Fig. 2, these five frequency bands are shown from the highest frequency to the lowest. The first region in Fig. 2 shows the Delta brainwaves, whose frequency range is about 0.5 Hz to 4 Hz. They can be observed in babies, and in adults when they are asleep. In babies up to one year old, delta waves are the slowest and have the highest amplitude [32]. Uyulan et al. developed a CNN model (ResNet-50 architecture) that achieved 90.22% accuracy with the delta band [33].
After the delta band come the Theta brainwaves, whose frequency range is 4 Hz to 8 Hz. They can be detected in children up to 13 years old and in adults when asleep, and are classed as slow activity. They can also be seen in diffuse abnormalities such as metabolic encephalopathy or some instances of hydrocephalus [34]. Using an analysis of variance (ANOVA) model, Koller-Schlaud et al. found that, in the resting state, theta activity at the central electrode position clearly distinguishes healthy controls from subjects with bipolar depression [35].
Alpha brainwaves, in the 8 Hz to 12 Hz range, are typically best seen around the middle of the edge of the parietal lobe. They are observed when the eyes are shut and the person is relaxed, and they fade when the eyes open or during analytical work. This pattern is typical of relaxed adults and is more visible after the age of 13 [34]. Using alpha asymmetry images, Kang et al. obtained the best performance for their depression classification model [36].
The second-highest brainwaves shown in Fig. 2 are Beta brainwaves, commonly known as natural rhythms, which are mostly observable across the parietal and frontal lobes. They are the dominant pattern in adults who are highly alert, nervous, or have their eyes wide open [32]. Liu et al. showed significant long-distance connections in the Beta band for MDD patients, distributed mostly within the frontal brain areas and between the frontal and parietal-occipital brain areas [37].
Gamma brainwaves are considered the fastest of all brain activity and are responsible for functioning, learning, focusing, memorizing, and blind sensing. When one focuses intensely on a certain topic, gamma brainwaves rise dramatically [38]. In some cases, the highest accuracy in identifying MDD patients, 91.38%, was obtained from gamma oscillations using frequency-based features [39].
Earlier studies on depression indicate a significant increase in absolute beta power in depression compared with controls. The most consistently reported changes in depression are in the absolute or relative power of the theta frequency band. Other studies found various statistically significant differences between depression and healthy controls: a significant increase in absolute or relative delta power, and both increases and decreases in alpha power [40].
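As a rough illustration of the absolute and relative band powers discussed in this section, the following Python sketch estimates them for a single-channel signal with a plain FFT periodogram. The function name band_powers, the synthetic test signal, the 256 Hz sampling rate, and the 45 Hz upper edge given to the open-ended gamma band are assumptions made for this example only; they are not taken from the reviewed studies, which may use other estimators.

```python
import numpy as np

# Band edges in Hz as listed above. The 45 Hz upper gamma edge is an
# assumption, used only to give the open-ended ">35 Hz" band a finite width.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 35.0),
    "gamma": (35.0, 45.0),
}


def band_powers(signal, fs):
    """Return (absolute, relative) power per band for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    # One-sided periodogram; the factor 2 folds in the negative frequencies
    # (valid away from DC and Nyquist, which the bands above exclude).
    psd = 2.0 * np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    df = freqs[1] - freqs[0]
    absolute = {
        name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(absolute.values())
    relative = {name: power / total for name, power in absolute.items()}
    return absolute, relative


# Synthetic example: strong 10 Hz (alpha) and weaker 6 Hz (theta) components.
fs = 256
t = np.arange(0, 8, 1 / fs)
rng = np.random.default_rng(0)
eeg = (3.0 * np.sin(2 * np.pi * 10 * t)
       + 1.0 * np.sin(2 * np.pi * 6 * t)
       + 0.2 * rng.standard_normal(t.size))

absolute, relative = band_powers(eeg, fs)
dominant = max(relative, key=relative.get)
print(dominant, {name: round(value, 3) for name, value in relative.items()})
```

Relative power here is normalized by the summed power of the five bands, which is one common convention; normalizing by the total spectral power would be an equally reasonable choice.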

III. SYSTEMATIC REVIEW
A review is designated a systematic study if it comprises the following elements: a clearly stated research question, identification of relevant studies, assessment of their quality, and empirical studies with precisely stated outcomes [41].
To present the findings of our research, we reviewed 52 research articles from the past 21 years (2001-2021), drawn from 9 databases (ScienceDirect, Springer, IEEE, PubMed, Frontiers, MDPI, Wiley Online Library, SAGE, and World Scientific Journal) following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P). Although Fig. 3 shows that most EEG-based depression identification research has been conducted since 2011, we included articles published after 2000. The red trend line in the graph rises, which indicates that research on EEG-based depression identification is steadily increasing. In the review, we record the methods used for signal preprocessing, feature extraction and selection, and supervised classification, together with the accuracy they achieved. We also explore the electrode locations, which helps to identify the most affected brain areas for future research. According to the PRISMA-P checklist, the stages described below need to be considered before developing a systematic review.

A. ELIGIBILITY CRITERIA
Eligibility criteria refer to two main characteristics: study characteristics, meaning the study design or time frame, and report characteristics, meaning the selection of years or languages [42]. For our systematic review, we considered articles from four preferred research areas, published in the last 21 years (2001-2021), in English only.

B. INFORMATION SOURCES
In this section, all of the intended information sources are considered, for example research databases, the authors' literature, or certain documentation [42]. In our case, we reviewed 49 research articles from 9 research databases (ScienceDirect, Springer, IEEE, PubMed, Frontiers, MDPI, Wiley Online Library, SAGE, and World Scientific Journals) and 3 research articles from other sources.

C. SEARCH STRINGS STRATEGY
Before a research article is submitted to a database, its authors provide keywords that help others find the article more easily. A search is well constructed when its strategy complies with planned limits or preferred words. Besides keywords, one can also search for articles through selected words in the title or abstract [42]. Our search strings combine words with Boolean operators that state whether a word must appear. The commonly used Boolean operators are AND (must be included), OR (may or may not be included), and NOT (must not be included). Table 2 shows the advanced search string strategy used in the digital libraries of the 9 databases.
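To make the role of the three operators concrete, the short Python sketch below assembles a Boolean search string from groups of alternative terms. The helper names quote and build_query and the example terms are hypothetical; they illustrate the pattern only and do not reproduce the actual strings of Table 2.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(synonym_groups, excluded=()):
    """Join alternatives with OR, groups with AND, and exclusions with NOT."""
    groups = [
        "(" + " OR ".join(quote(term) for term in group) + ")"
        for group in synonym_groups
    ]
    query = " AND ".join(groups)
    for term in excluded:
        query += " NOT " + quote(term)
    return query


# Hypothetical example terms, not the search strings of Table 2.
query = build_query(
    [["EEG", "electroencephalography"],
     ["depression", "major depressive disorder"]],
    excluded=["bipolar"],
)
print(query)
# (EEG OR electroencephalography) AND (depression OR "major depressive disorder") NOT bipolar
```

Individual databases differ in their exact query syntax, so a string built this way would usually need adapting to each advanced search form.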

D. STUDY RECORDS
Study records consist of three stages: data management, the selection process, and the data collection process. Data management can be carried out by formulating Research Questions (RQ) that guide the stages from research methodology to reporting [42]. As our main goal is to review the identification of depression biomarkers and the related methodology, we formulated research questions to apply to the reviewed articles. Table 3 outlines the research questions with their RQ numbering.
We selected 9 questions for the search strategy, and each question supports a comparison of the experimental designs. Experimental designs differ according to depression category, although most of the articles summarized in this study considered MDD and clinical depression. We excluded other depression categories, such as post-traumatic, adjustment, situational, and bipolar depression. For each experimental design, we analyzed the subject criteria and preferences as well as the EEG device category, since depression identification accuracy may vary with the number of datasets and with the number of electrodes and their locations over the brain. In most of the articles, the authors compared models based on different machine learning algorithms and explored the findings among them.
The selection process refers to screening by inclusion and exclusion criteria while searching electronic databases for relevant research articles. Using Boolean operators in each database's advanced search, we can readily determine whether a study is relevant to our review. Table 4 lists the inclusion criteria, numbered IC, and the exclusion criteria, numbered EC.
Although we selected research articles according to the eligibility criteria, some pilot studies, duplicates, and articles still under review remained and had to be eliminated before the review. The search strings returned 1,446 research articles from the 9 databases. In addition, we found another 29 articles from other sources while searching at random for more information.
We found 61 duplicate articles and excluded them in the first screening stage. In the second screening stage, we eliminated another 1,212 articles through the inclusion and exclusion criteria. Of the 29 articles found in other libraries through random search, 11 fulfilled the inclusion and exclusion criteria; of the 1,446 research articles from the 9 databases, 191 did. The whole search strategy is shown in Fig. 4 in four stages: identification, screening, eligibility, and inclusion. At the eligibility stage, we thus had 202 articles related to depression identification through different machine learning algorithms. We then excluded another 150 articles because of insufficient information or out-of-scope topics, such as purely statistical analysis or treatment studied through neuroimaging-based analysis. After the eligibility stage, a total of 52 research articles remained for the final full-text review.
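The counts reported for the four stages can be cross-checked with simple arithmetic. The Python sketch below uses only the figures stated in this subsection; the variable names are ours.

```python
# Counts as reported in the text for each stage of the selection process.
records_from_databases = 1446
records_from_other_sources = 29
duplicates_removed = 61
excluded_by_criteria = 1212
excluded_at_eligibility = 150

identified = records_from_databases + records_from_other_sources
after_duplicates = identified - duplicates_removed
eligible = after_duplicates - excluded_by_criteria
included = eligible - excluded_at_eligibility

flow = {
    "identification": identified,
    "after duplicate removal": after_duplicates,
    "eligibility": eligible,
    "included": included,
}
for stage, count in flow.items():
    print(f"{stage:>24}: {count}")

# The eligibility count should equal the 191 database articles plus the
# 11 articles from other sources that passed the criteria.
assert eligible == 191 + 11
```

With these figures, the flow runs from 1,475 identified records to 1,414 after duplicate removal, 202 at eligibility, and 52 included articles.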

E. DATA EXTRACTION
A data extraction form was used to standardize the extraction of data from the selected research articles, to reduce bias in the results, and to generalize the process. A dynamic outline was used for categorization while the data were extracted, because a predefined categorization might limit the categories that emerge from the relevant data. As the goal of this study is to evaluate the selected studies systematically, we considered the experimental designs and analyses that can determine the biomarkers of depression.
Table 5 shows the number of articles remaining after the second screening and after the eligibility stage, respectively. Fig. 4 summarizes the selection process that led to the 52 research articles from the 9 databases and other sources.

IV. RESULTS AND ANALYSIS
Depression is one of the world's major health concerns, and diagnosing it in the early, curable stages is very important and may even save a patient's life. Currently, depression is recognized through psychologists' questionnaire-based assessments. Recently, researchers have devoted themselves to identifying depression biomarkers through EEG signals, since EEG recording is non-invasive and gives more sophisticated results than other technologies. The EEG data are fed into different classification or neural network-based models, whose accuracy in identifying depression biomarkers is then determined.

A. SEARCH RESULTS
A detailed search flow and process diagram is shown in Fig. 4. A total of 1,475 articles were preliminarily selected from the 9 databases and other sources. After 61 duplicates were excluded, the remaining articles were screened through the inclusion and exclusion criteria. At the end of the preliminary screening process, 202 articles (191 from the databases and 11 from other sources) were selected for full-text evaluation. In the final eligibility process, 150 articles were excluded: 47 were based on statistical analysis, 39 reported the effects of different drugs and treatments on fewer than 5 subjects, and 64 demonstrated mental effects through neuroimaging only. According to Google Scholar, the included articles have been cited around 1,114 times (as of June 2021), and each of them presents a certain strategy and some classification methods for identifying this mental disorder.
Our eligibility criteria cover the last 21 years; within this period, most of the research was done in 2020, and the number of studies has been increasing steadily. Fig. 3 shows the 52 articles published between 2001 and 2021. It indicates that depression is now being researched intensively and that early identification of biomarkers for this mental illness is necessary.
If we group these 52 articles from the 9 databases and other sources by the first author's affiliation, we see that most research on depression identification through neural analysis has been done in Asia, Europe, and North America. Other regions are also working in this field; their ongoing research, along with research from other databases, is not summarized here. Fig. 5 shows the research around the world, where the number of studies, from highest to lowest, is indicated by red color intensity from high to low. The ash color indicates research on depression identification that is still ongoing.

1) DEPRESSION SEVERITY
Generally, depression severity is determined by the thoughts and behaviors of a depressed person. Accordingly, depression can be categorized into three stages:
1. Mild depression
2. Moderate depression
3. Severe depression
Mild depression occurs with lack of motivation or concentration, daytime sleepiness, or fatigue. Although it is a mild condition, it may develop into a severe case if neglected, which may lead a person to suicide [5]. For mild depression, authors have designed models that distinguish a mildly depressed person from a healthy person. In that case, they used the Chinese Facial Affective Picture System as stimuli, presenting blocks of facial pictures with certain expressions along with blocks of neutral facial pictures for a fixed time.
On the other hand, Zhu et al. used an eye tracker along with the EEG recording [43].
Moderate depression refers to the case between mild and severe depression. Moderately depressed patients may sometimes be identified by reduced productivity together with feelings of worthlessness. Mohammadi et al. identified depressed patients through BDI-II clinical questionnaire-based assessment and reported an accuracy of about 87.5% [44].
If a patient's depression is severe enough that the person may experience stupor or hallucinations or contemplate suicide, the case is known as severe depression, clinical depression, or MDD. In some articles, authors demonstrated different algorithms for identifying MDD patients. The highest reported accuracy for identifying MDD patients is 99.72% [45], [46].
In our study, we found 21 research articles whose authors determined the presence of depression without determining its severity. Table 6 lists the research areas of the reviewed articles by depression category. Because some brain areas responsible for certain types of seizures can also affect mood and result in depression, this systematic review also covers 5 research articles that show depression arising from stress and epilepsy.

2) PARTICIPANT CHARACTERISTICS
The standard deviation of the number of participants across the research articles is 53.66 (max: 265, min: 10). Among them, an article [89] published in 2020 used a reference dataset of 246 unique patients from the Temple University Hospital EEG Corpus and developed a CNN model to determine the biomarkers. In most cases, the average age of both healthy and depressed subjects lies between 21 and 55 years. The authors tried to keep the numbers of healthy subjects and depressed patients balanced. However, the ratio of male to female subjects is about 61:39. Across all participants in our review, about 48.06% of male subjects and about 56.11% of female subjects were depressed. Fig. 6 shows the detailed participation of healthy and depressed subjects by gender.

3) EXPERIMENTAL PROTOCOL
A precise EEG experiment is performed in six stages. The first stage is data acquisition, where EEG data are captured with an EEG device from a group of control participants and from affected patients, taking into account their age, gender, and other physical conditions. The second stage is signal preprocessing, where physiological and non-physiological artifacts are removed from the raw data through different filters and algorithms. The third stage is feature extraction and selection, where features are generated through feature extraction and their dimensionality is reduced through feature selection. In the pattern classification stage, a supervised classification model is developed and its output is compared with the results of the traditional questionnaire assessments of the same participants to determine model accuracy.
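The final comparison between model output and questionnaire-based labels can be sketched as follows. This is a minimal Python illustration with made-up labels; the function name score_against_questionnaire is hypothetical, and sensitivity and specificity are reported alongside accuracy as a common practice rather than as something the reviewed articles are known to share.

```python
def score_against_questionnaire(reference, predicted):
    """Compare binary model output with questionnaire-based labels.

    Labels: 1 = depressed, 0 = healthy. Both classes must be present in
    `reference`, otherwise sensitivity or specificity is undefined.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # depressed subjects correctly flagged
        "specificity": tn / (tn + fp),  # healthy subjects correctly cleared
    }


# Made-up example: 8 participants, 4 depressed according to the questionnaire.
questionnaire = [1, 1, 1, 1, 0, 0, 0, 0]
model_output = [1, 1, 1, 0, 0, 0, 0, 1]
report = score_against_questionnaire(questionnaire, model_output)
print(report)
```

Because the questionnaire itself is subjective, an accuracy computed this way measures agreement with the questionnaire rather than agreement with an independent ground truth.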
Along with the model accuracy, the outcomes also include functional connectivity mapping within the brain, which shows the statistical relation of functional activity between brain regions. The detailed experimental process is discussed in the sections below, with a typical block diagram in Fig. 7.
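One simple way to express a statistical relation between channels is the Pearson correlation of their time series. The Python sketch below builds such a connectivity matrix for three synthetic channels; the correlation measure, the 0.5 edge threshold, and the channel roles are assumptions chosen for illustration, and the reviewed studies may rely on other connectivity measures.

```python
import numpy as np


def correlation_connectivity(eeg):
    """Pearson correlation between every pair of channels.

    eeg: array of shape (n_channels, n_samples).
    Returns a symmetric (n_channels, n_channels) matrix.
    """
    return np.corrcoef(eeg)


rng = np.random.default_rng(0)
n_samples = 1000
shared = rng.standard_normal(n_samples)  # activity common to two channels

# Three synthetic channels: the first two share a source, the third does not.
eeg = np.vstack([
    shared + 0.3 * rng.standard_normal(n_samples),
    shared + 0.3 * rng.standard_normal(n_samples),
    rng.standard_normal(n_samples),
])

conn = correlation_connectivity(eeg)

# Keep only strong links; the 0.5 threshold is an arbitrary assumption.
edges = np.abs(conn) > 0.5
np.fill_diagonal(edges, False)
print(np.round(conn, 2))
print(edges)
```

The thresholded matrix can be read as a graph whose nodes are electrodes, which is the form in which connectivity maps are usually drawn on a scalp layout.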

4) SIGNAL ACQUISITION AND ELECTRODE PLACEMENT
Signal acquisition is the process of converting the electrical activity of the brain into sampled numeric values. The electrodes attached to the EEG device convert the physical parameters into electrical signals, and an Analog-to-Digital Converter converts the electrical signals into digital values [91]. The whole vertebrate cerebrum (brain) is divided into two hemispheres, the Left Hemisphere and the Right Hemisphere. Each hemisphere has four main lobes: Frontal, Parietal, Temporal, and Occipital. The central part between the two hemispheres is known as the Central lobe [92]. To denote each electrode location on the scalp, a numbering scheme has been established. Electrodes can be placed according to either the 10-20 system, where "10" and "20" refer to inter-electrode distances of 10% or 20%, or the 10-10 system, where the distance between electrodes is 10% [93].
The letters F, T, C, P, and O stand for Frontal, Temporal, Central, Parietal, and Occipital, respectively. They identify the brain lobes and the placement of the electrodes on the scalp surface. The suffix z refers to the midline of the brain. In the 10-20 electrode standard, smaller numbers are closer to the midline and larger numbers are farther from it [94]. The electrodes used in the device can be cup- or disc-shaped and, in wet electrodes, are made of silver or silver chloride (Ag/AgCl); a liquid electrolytic gel is used as a conductor between the scalp and the electrode [95].
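The naming scheme described above can be decoded mechanically. The following Python sketch maps a label such as F3 or Cz to its region and side; the function name describe_electrode is hypothetical, and the rule that odd numbers lie over the left hemisphere and even numbers over the right is the standard 10-20 convention, which the text above does not state explicitly. Combined prefixes such as Fp or FC are deliberately not handled.

```python
import re

# Letter-to-region mapping as given in the text.
REGIONS = {
    "F": "Frontal",
    "T": "Temporal",
    "C": "Central",
    "P": "Parietal",
    "O": "Occipital",
}


def describe_electrode(label):
    """Return (region, side) for a single-letter 10-20 label like 'F3' or 'Cz'."""
    match = re.fullmatch(r"([FTCPO])([zZ]|\d+)", label)
    if match is None:
        raise ValueError(f"unsupported electrode label: {label!r}")
    region = REGIONS[match.group(1)]
    suffix = match.group(2)
    if suffix in ("z", "Z"):
        side = "midline"
    elif int(suffix) % 2 == 1:
        side = "left"   # odd numbers: left hemisphere (standard convention)
    else:
        side = "right"  # even numbers: right hemisphere
    return region, side


for name in ["F3", "P4", "Cz", "O2"]:
    print(name, describe_electrode(name))
```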
Most of these studies used EEG as a single modality with a large number of electrodes. Out [...] as Central. However, to reduce experimental cost, some authors use devices with fewer channels, i.e., 2- and 3-channel EEG devices. In those cases, the authors could analyze only data collected from the frontal brain areas. Among the 52 research articles, 3 captured data through a 128-channel EEG device, which can cover all brain areas and supports the analysis of more complex data.
Most of the studies were conducted with 19-channel EEG devices, although they did not mention the EEG brand. To the best of our knowledge, no well-known EEG headband has exactly 19 electrodes covering the whole brain. We therefore recommend a 32-channel EEG device, which can cover all brain areas and give better functional connectivity across the whole brain. Table 7 lists the research articles by number of electrodes, together with the brain area coverage for each number of electrodes. We also found that 23% of the reviewed articles demonstrated their algorithms on online EEG databases of depressed patients. EEG data collection is costly and difficult for researchers, as it requires hospital collaboration to collect patient data with proper consent. One cannot expect the health conditions and biological effects under which EEGs are recorded always to be the same. In addition, clinical settings impose limitations (influence of the environment, equipment, electrode location, noise, etc.) compared with tightly controlled research settings. Authors are therefore encouraged to use online open datasets [98]-[101], which can serve as a testing ground for machine learning algorithms. Table 8 details the EEG recording procedures and the questionnaire assessments used.

5) SIGNAL PREPROCESSING
The raw signal collected from EEG devices has poor resolution because of its low signal-to-noise ratio (SNR) [102] and contains two types of artifacts. Physiological artifacts are generated by the patient, e.g., muscle or body movements, eye blinks, and heartbeat; non-physiological artifacts come from the equipment, e.g., electrode placement, surrounding noise, or device error [104]. To obtain reliable results, all artifacts should be removed from the raw data first. The artifact removal techniques used in the reviewed articles are shown in Table 9. Researchers suggest that subjects remain in a resting position, i.e., keep their eyes closed for 5 minutes and stay still, to avoid external physiological noise. However, this technique cannot fully eliminate the noise [38], [42]. For this reason, authors have used various band-pass filters and algorithms to remove noise from raw EEG. Mumtaz et al. used Multiple Source Eye Correction (MSEC) to remove the undesired signals [102]. Nolan et al. used the FASTER algorithm to remove electrooculogram (EOG)-based artifacts [105]. In most cases, authors used band-pass filters with a lower cutoff of 1 Hz and an upper cutoff of 40 Hz. Saeedi et al. applied a band-pass filter with cutoff frequencies of 0.1 Hz and 70 Hz to remove artifacts [39]. To reduce ocular or EOG-based artifacts, authors suggested using the ICA or FastICA algorithms [43], [67], [106]. Peng et al. used the TrimOutlier plugin to remove electromyography (EMG) and EOG artifacts, and the Artifact Subspace Reconstruction plugin to remove EEG epochs [79]. To remove background noise, a common average reference (CAR) was applied [107]. Thoduparambil et al. applied Z-score normalization before passing the signal to the model to eliminate amplitude scaling and offset effects in the EEG data [55].
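As a minimal sketch of two of the steps mentioned above, the common average reference and Z-score normalization can be written in plain Python. The function names and the list-of-lists layout (one list of samples per channel) are assumptions made for illustration, not the implementation used in [107] or [55].

```python
from statistics import mean, pstdev


def common_average_reference(eeg):
    """Subtract the across-channel mean from every channel at each sample.

    eeg: list of channels, each a list of samples of equal length
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(eeg[0])
    # Average over channels at every time sample.
    avg = [mean(ch[t] for ch in eeg) for t in range(n_samples)]
    return [[ch[t] - avg[t] for t in range(n_samples)] for ch in eeg]


def zscore(channel):
    """Scale one channel to zero mean and unit (population) standard deviation."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        # A flat channel carries no amplitude information.
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


# Toy two-channel recording with three samples per channel.
toy_eeg = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
referenced = common_average_reference(toy_eeg)
normalized = [zscore(ch) for ch in referenced]
```

Re-referencing removes activity that is common to all electrodes, while the Z-score step puts channels with different amplitude ranges on a comparable scale before they reach a classifier.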
Because the EEG signal is nonstationary [76], windowing is used as a preprocessing step: it processes the time series and reshapes the complete information into fixed-size windows. It provides the necessary information at a given point in time so that the demonstrated model can make the right prediction [83]. Wenya et al. and Weidong et al. segmented the EEG data into non-overlapping epochs using time windows of 10 s [37] and 2 s [53], respectively. However, sliding windows without overlap can cause serious problems for functional connectivity or topological mapping [85]. As a result, authors preferred a fixed time window with n% overlap. Ayan et al. performed an FFT on the raw EEG signal using a 256-sample window with 75% overlap to remove any phase shift effect in the data and to increase the number of data points without using other data augmentation techniques [85]. The detailed experimentation of EEG-based depression detection through machine learning algorithms is illustrated in Fig. 7.
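The segmentation described above can be sketched as follows. This is an illustrative helper, assuming a single channel stored as a Python list; the function name and defaults (256 samples, 75% overlap, mirroring the values reported for [85]) are chosen for the example and are not taken from any reviewed implementation.

```python
def sliding_windows(signal, window_size=256, overlap=0.75):
    """Split a 1-D signal into fixed-size windows with fractional overlap.

    overlap=0.0 gives non-overlapping epochs; overlap=0.75 moves the
    window forward by a quarter of its length each step.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(window_size * (1 - overlap))))
    last_start = len(signal) - window_size
    # Incomplete trailing windows are dropped.
    return [signal[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# 512 samples of a dummy signal.
dummy = list(range(512))
overlapping = sliding_windows(dummy, window_size=256, overlap=0.75)
non_overlapping = sliding_windows(dummy, window_size=256, overlap=0.0)
```

With 512 samples, the overlapping scheme yields five windows where the non-overlapping scheme yields two, which illustrates how overlap increases the number of training examples without additional recordings.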
Among these 52 research articles, 3 articles have been found where the authors have designed a model that can distinguish the mildly depressed person from a healthy person. However, as we have mentioned earlier, they used a certain stimulus where the authors used facial expression to picture block along with the neutral block of that person for a certain    [43]. The detailed Feature extraction techniques according to research articles has been given in Table 10.
Feature selection reduces a large feature set by removing irrelevant features. The key difference between feature extraction and feature selection is that extraction derives features from the raw dataset, whereas selection keeps the most useful of those features. Most researchers preferred the t-test and generic search or rank-based search algorithms for feature selection. However, Peng et al. adopted the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient, to measure the correlation of the connection features with the classifier [79].
Hanshu et al. and Jing et al. used Minimal-Redundancy-Maximal-Relevance (MRMR) and Linear Forward Selection (LFS) techniques as feature selection strategies. In some cases, authors used Linear Discriminant Analysis and Principal Component Analysis to project high-dimensional features into a lower-dimensional space [106]. However, some authors suggested using deep learning, as it does not require feature extraction, selection, or reduction techniques. The feature selection techniques used in the research articles are listed in Table 11.
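To illustrate rank-based feature selection, the sketch below scores each feature by the absolute value of Kendall's tau-b against the class labels and sorts the features accordingly. It is a simplified stand-in and not the exact procedure of Peng et al. [79]; the feature names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, which corrects for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both variables: ignored
        if dx == 0:
            ties_x += 1       # tied only in x
        elif dy == 0:
            ties_y += 1       # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def rank_features(features, labels):
    """Return feature names sorted by |tau-b| with the labels, best first."""
    scores = {name: abs(kendall_tau_b(values, labels))
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-subject feature values; labels: 0 = healthy, 1 = depressed.
labels = [0, 0, 1, 1]
features = {
    "alpha_power": [0.1, 0.2, 0.8, 0.9],
    "noise": [0.5, 0.1, 0.4, 0.2],
}
ranking = rank_features(features, labels)
```

A fixed number of top-ranked features can then be passed to the classifier, which is the general idea behind the rank-based search strategies mentioned above.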

7) CLINICAL QUESTIONNAIRE BASED ASSESSMENTS
Many well-known questionnaire assessments are used to determine the depression category. From the score calculated through the patient's interview, psychiatrists can grade depression by severity (mild, moderate, or severe). Some of them are discussed below.

a: BECK DEPRESSION INVENTORY (BDI)
This technique is widely used for patients from age 13 to 80 to determine depression severity and behavioral manifestations. The test takes almost 10 minutes and the score from 0 to 63 is calculated from 21 self-report items. The range for depression category is mild: 0-9, moderate: 17-29 and severe: 30-63 [110].

b: CENTER FOR EPIDEMIOLOGIC STUDIES DEPRESSION SCALE (CES-D)
This screening is used for preliminary check-ups for depression. It has a total of 20 questions, takes about 20 minutes, and yields a total score from 0 to 60. Each question is scored from 0 to 3: 0 means rare, 1 means mild, 2 means moderate, and 3 means severe [111].

c: HAMILTON DEPRESSION RATING SCALE (HAM-D)
This assessment for identifying depression severity contains 21 items scoring from 0 to 21. A score of 10-13 is classified as mild, a score of 14-17 as moderate, and a score above 17 as severe depression [112].

d: PATIENT HEALTH QUESTIONNAIRE (PHQ-9)
In most cases, the Clinical health care center or psychiatrists prefer the PHQ-9 technique. It takes about 2 to 5 minutes to complete the whole task. The severity measurement of this task is 0-4 as none, 5-9 as mild, 10-14 as moderate, 15-19 as moderately severe, and 20-27 as severe cases [113].

e: DIAGNOSTIC AND STATISTICAL MANUAL OF MENTAL DISORDERS, FOURTH EDITION (DSM-IV)
The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition-DSM-IV-is the official manual of the American Psychiatric Association. Its purpose is to provide a framework for classifying disorders and defining diagnostic criteria for the disorders. Most of these diagnoses fall in the category of Mood Disorders, as specified in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (DSM-IV-TR; APA, 2000). However, it is also used for diagnosing Adjustment Disorder with Depressed Mood. Additionally, people with a variety of other psychiatric illnesses who are susceptible to depression can be diagnosed through this criterion [114].
The scores from the clinical questionnaire assessments are later used as reference labels to train and compare the classification model developed from EEG data. Fig. 8 shows the research articles grouped by the clinical questionnaire-based assessment they used for diagnosing this mental illness.
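As an example of how a questionnaire score becomes a class label, the PHQ-9 severity bands quoted above [113] can be encoded as a small lookup. The function name and the table layout are assumptions for illustration only.

```python
# (lowest score, highest score, severity label), as listed for PHQ-9 above.
PHQ9_BANDS = [
    (0, 4, "none"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_severity(score):
    """Map a total PHQ-9 score (0-27) to its severity category."""
    if not isinstance(score, int) or not 0 <= score <= 27:
        raise ValueError("PHQ-9 total score must be an integer from 0 to 27")
    for low, high, label in PHQ9_BANDS:
        if low <= score <= high:
            return label


# Labels for a few hypothetical participants.
example_labels = [phq9_severity(s) for s in (3, 9, 12, 19, 27)]
```

The resulting categories can serve as targets for a multi-class model, or be merged into two groups when only healthy versus depressed classification is required.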

8) MACHINE LEARNING APPROACHES
Because the EEG signal is complex, the discriminating information cannot be obtained directly from the raw signal. To distinguish the features of depressed patients from those of healthy subjects, we therefore need classification models, which may be trained with different machine learning and deep learning algorithms [115]. A classification model is a learning approach in which a model is trained on some input data and then given test data to measure its prediction accuracy [116]. Many classification methods are available. However, different classifiers do not produce the same results, and separate classification algorithms are even used for different features to obtain higher accuracy [117].
Before 2015, studies relied on predictive analysis and developed their models with Logistic Regression (LR) [72], [97]. It is a machine learning algorithm based on the concept of probability [118]. Alongside LR, authors obtained more suitable results with the Support Vector Machine (SVM) [67], [79], [43]. The difference between the two is that the SVM maximizes the margin to the closest support vectors, whereas logistic regression maximizes the class probability [119]. This algorithm is not applicable to long data sequences or to data with high noise and overlap [120].
However, neural networks are more flexible than linear classifiers, which makes them more effective but also more susceptible to overfitting [121]. Studies obtained more effective results by classifying features with the Long Short-Term Memory (LSTM) algorithm, as it makes better predictions on time series data [122]. It is an extension of the Recurrent Neural Network (RNN) [123].
On the other hand, CNN is much faster than RNN, since its training time is shorter. A CNN does not need the relationship between hidden vectors, so it takes less time to feed forward and backpropagate [124]. In most cases, authors obtained higher accuracy with CNN. Li et al. proposed a CNN model for both the Right Hemisphere (RH) and the Left Hemisphere (LH) [47]. Moreover, Saeedi et al. developed a combined architecture of a one-dimensional CNN with LSTM and observed the highest accuracy with combined frequency bands [39]. The Feed-Forward Neural Network (FFNN) classification model achieved significant results in learning the relationship between independent variables [125]. The Probabilistic Neural Network (PNN) is another feed-forward neural network, which estimates the probability density functions (PDF) of the random variables [126]. Oliver et al. used a PNN model and achieved 99.5% accuracy with 10-fold cross-validation [61].
Cai et al. proposed a K-Nearest Neighbor (KNN) classifier on the fusion of positive and negative audio stimuli, which is more suitable for distinguishing depressed and normal groups [66]. In [81], an autoregressive model was developed to calculate the power spectral density for a series of electrodes; 7 temporally consecutive three-channel frames were used as a sequence to characterize the temporal information of EEG signals. However, the CNN model produces higher accuracy than the KNN model [127]. Rodríguez-Ruiz et al. proposed a CNN model named HybridEEGNet, in which every data matrix is fed into the model as a new independent data sample [45]. The dataset ultimately used in that study includes a total of 1750 data samples. In a PNN, the probability of misclassification is minimized [128]. Convolutional LSTM (ConvLSTM) is another recurrent neural network-based approach for identifying depression, in which convolutional structures appear in both the input-to-state and state-to-state transitions [129]. Fig. 9 shows the research articles by classifier; most authors obtained their highest accuracy with the CNN model.

9) DIFFERENT MEASURES TO EVALUATE THE CLASSIFICATION PERFORMANCE
Classification accuracy alone does not always reflect the actual performance of a model when the classes contain unequal numbers of observations [130]. Researchers therefore use additional measures to evaluate classification models. The most common ones are described here. The confusion matrix gives a better understanding of whether the classification model is performing correctly and what types of errors it makes. There are two types of errors: Type I and Type II. A Type I error occurs when the model labels an actually healthy participant as depressed, and a Type II error occurs when the model labels an actually depressed participant as healthy [131].
The matrix reports its results through four terms: TP, FP, TN, and FN. TP = True Positive = number of depression data correctly classified as depressed. FP = False Positive = number of normal data classified as depressed = Type I error. TN = True Negative = number of normal data correctly classified as normal. FN = False Negative = number of depression data classified as normal = Type II error. Accuracy denotes the proportion of participants correctly identified as healthy or depressed and is given by $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. Most of the authors of the reviewed articles reported only the accuracy of their classification model. However, the other measures help to identify where the model is prone to error.
Precision denotes how many of the participants detected as depressed are actually depressed and is given by $\mathrm{Precision} = \frac{TP}{TP + FP}$. Sensitivity, or recall, denotes how many depressed participants are detected by the system and is given by $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$. Specificity denotes how many healthy participants are identified as healthy by the system [132] and is given by $\mathrm{Specificity} = \frac{TN}{TN + FP}$. Along with accuracy, sensitivity, and specificity, some authors also calculated the F1-score, or F-score, which is the harmonic mean of precision and sensitivity. It can therefore be a better classification performance metric when the class distribution is uneven [133]. It is given by $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$. The share of articles reporting measures other than accuracy is shown in Fig. 10.
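The measures described in this subsection follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and do not come from any reviewed article.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common performance measures from confusion-matrix counts.

    Positive class = depressed, negative class = healthy.
    """
    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)          # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Balanced toy example.
balanced = classification_metrics(tp=40, fp=10, tn=45, fn=5)

# Imbalanced toy example: the model predicts "healthy" for everyone.
always_healthy = classification_metrics(tp=0, fp=0, tn=95, fn=5)
```

In the second example the accuracy is 0.95 although no depressed participant is detected (sensitivity 0), which is why accuracy alone can be misleading for unequal class sizes.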

10) VALIDATION PROCEDURE
The classic approach to testing a classification model is to split the whole dataset into training and testing sets. However, it is essential to validate the model before testing it on data it has never seen. Validation data is not the same as test data; it is a sample held back from the training data to estimate or tune model performance [134]. One of the easiest ways to validate the model is random splitting, in which the whole dataset is split into three parts: training, validation, and testing sets. Depending on data availability, the most common splits are 70-20-10 [103], 70-15-15 [135], and 80-10-10 [55].
However, if the dataset is insufficient, k-fold cross-validation is the suitable method [136]. Cross-validation is a resampling procedure for estimating model performance with limited data. In k-fold cross-validation, a value k is chosen (e.g., 2, 8, 10, or 24) and the whole dataset is divided into k equal-sized subsets. The validation process is then repeated k times, once with each of the k subsets held out [137]. In most cases, authors demonstrated their algorithms using 10-fold cross-validation for a better accuracy rate. However, Hengjin et al. demonstrated their first and second algorithms with 5-fold cross-validation and achieved 98.97% [63] and 98.81% [56], respectively.
Leave-one-out cross-validation is another type of k-fold cross-validation, used when the dataset is small or when an accurate estimate of model performance is needed regardless of the higher computational cost [138]. Mohammadi et al. used all the subjects except the last one to train a model and kept the remaining subject for validation [139].
Another validation process, the walk-forward procedure, was used by Kumar et al. to determine the robustness of trading strategy. In some cases, authors used two different validation procedures on a single dataset. For example, the whole dataset is first randomly split into a training set and a testing set; the training set is then validated with k-fold cross-validation [55], [135] or leave-one-out cross-validation [140] to help improve model performance. Fig. 11 shows the validation procedures used in the reviewed research articles.
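A random three-way split of the kind described above can be sketched as follows. The helper name, the fixed seed, and the use of integer identifiers as items are assumptions made for the example; the default ratios correspond to the 70-15-15 scheme.

```python
import random


def train_val_test_split(items, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle items and split them into training, validation and test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(ratios[0] * len(shuffled)))
    n_val = int(round(ratios[1] * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains
    return train, val, test


# 100 hypothetical sample identifiers.
train_set, val_set, test_set = train_val_test_split(range(100))
```

When several epochs come from the same participant, splitting by subject identifier rather than by epoch is one way to keep data from a single person out of both the training and the test set.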
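The k-fold procedure can be illustrated with a small index generator. This is a plain-Python sketch without shuffling; the function name is hypothetical, and the folds differ in size by at most one sample when the dataset size is not divisible by k.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Distribute the remainder over the first folds.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 5-fold split of 10 samples.
splits = list(k_fold_indices(10, 5))
```

Each sample appears in exactly one validation fold, so the performance averaged over the k runs uses every sample for validation once.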
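For subject-wise data, the leave-one-out idea is often applied per participant: all epochs of one subject are held out in turn. The sketch below assumes a list with one subject identifier per epoch; the identifiers and the function name are hypothetical, and this is not the code used in [139].

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, validation_indices).

    subject_ids: one entry per epoch naming the subject it came from.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        validation = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, validation


# Six epochs recorded from three hypothetical subjects.
epoch_subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
loso_splits = list(leave_one_subject_out(epoch_subjects))
```

The number of training runs equals the number of subjects, which explains the higher computational cost noted above.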

B. CLASSIFICATION MODEL FINDINGS
Our systematic review provides an up-to-date overview and synthesis of machine learning-based approaches for identifying depression through brain signals. In the sections below, we summarize the main findings of this research, discussing not only the classification model accuracy but also the accuracies according to frequency band and hemisphere.

1) CLASSIFICATION MODEL ACCURACY
Most of the studies report accuracies between 90% and 100%. Table 12 summarizes the classification accuracies by research article. In [66], Cai et al. proposed a model with a single-channel electrode and found 92% accuracy with the Alpha frequency band. However, as they used two channels and the proposed method was validated with little patient data, they didn't find it reliable.

2) CLASSIFICATION ACCURACY ACCORDING TO FREQUENCY BAND
Most of the research articles report classification accuracies separately for different frequency bands. Table 13 lists the research articles that focused on frequency bands (Alpha, Beta, Theta, Delta, and Gamma). Most of them found the highest accuracies with the Alpha frequency band [72], [67], [97], [74], [73], [76], [51], [43].

3) DIFFERENCES BETWEEN LEFT HEMISPHERE AND RIGHT HEMISPHERE
Although depression causes abnormality in the whole brain, the right hemisphere yields more accurate detection of depression and its severity. Thoduparambil et al. proposed a 13-layered CNN model combined with the LSTM algorithm, with which they found an accuracy of 99.07% for the Right Hemisphere (RH) and 98.84% for the Left Hemisphere (LH) [55]. In that case, they captured the EEG data with a 64-channel EEG device. The derived model accuracies are shown in Table 14. Later, Betul et al. proposed another CNN model with 10-fold cross-validation and achieved 99.12% accuracy with a 4-channel EEG device for the right hemisphere. Sandheep et al. achieved higher accuracies with the same CNN model and observed that the less data used for training, the lower the classification accuracy, sensitivity, and specificity of the network [78]. Hesam et al., in contrast, developed an SVM-based model that maps the training data with an RBF kernel function into a higher-dimensional space and sets a separating hyperplane with margins between the depressed and healthy control groups.

C. COMMON CHALLENGES IN DEPRESSION STUDIES
Although each method in the 52 research articles has different findings, authors of more than 20 articles mentioned limitations of their designed protocol. The most common challenge of machine learning techniques is uncertainty: the data split ratio for training the model may not be appropriate, or the sample size may be limited [147]. In most cases, the authors mentioned accuracy degradation due to limited datasets. However, data limitation is not the only issue; the number of electrode channels and their placement, as well as noisy channels due to setup preferences, are also mentioned as challenges after demonstrating model efficacy. In the sections below, we discuss the challenges the authors reported facing while designing their experimental protocols.

1) DATA LIMITATIONS
Small sample size is one of the challenges faced by most studies on depression identification from EEG datasets [51], [43], [67], [72], [75], [81]. Research continues toward better accuracy with small samples, but it is not easy to draw definite conclusions from them. For machine learning techniques, it is imperative to test the model with sufficient training and validation data as well as data diversity. If the sample collected from the experiment is small, the model will be biased on new data [35]. However, recruiting depressed participants is difficult, which causes the low availability of data. Most studies report data availability as a common limitation, and to work around it they first trained the model on public online data. In that case, the health condition of each participant at recording time is not taken into account, which results in a lower accuracy rate for the model [58]. The data type and subject participation for each research article are shown in Table 15.

2) PRESENCE OF ARTEFACTS
To obtain a better dataset, noise must be reduced from the raw EEG data during signal preprocessing. Although the default MATLAB EEGLAB Toolbox is used in some cases to eliminate noise, it may degrade the desired filtered signal [148]. Of the 52 articles, authors of 13 mentioned using EEG devices with fewer than five electrodes, which results in noisy and poor-quality signals [63]. The research articles by number of electrodes are given in Table 7. However, they recommended recording the brain signal until good signal quality could be ensured for the experiment [66]. In some cases, they used a musical stimulus that strongly accentuates the difference between the control and depressed groups. In [59], [90], the authors mentioned that, while recording, participants were asked to keep their bodies stationary and close their eyes to avoid noise. As a result, the accuracy achieved by those models was quite low. For noise reduction, Kaur et al. used the MATLAB VMD toolbox, which results in noise sensitivity [52]. Even though they could determine the noise from a high value of alpha, they couldn't extract the noise from lower alpha. Furthermore, model accuracy can be affected by not considering the medication status of the participants [35], [62].

3) CLASSIFICATION MODEL OVERFITTING
Overfitting results in very good performance on training data but very poor performance on testing data and poor generalization to independent datasets. It may be caused by small datasets [149], high-variance features [65], [82], and complex models with too many parameters. Cross-validation, mostly the Leave-One-Out procedure, is a common approach to control for overfitting and provide a fair performance evaluation [87]. However, most studies prefer to employ a dropout layer with a certain dropout probability to avoid overfitting [62], [53]. It is also recommended to use PCA with a Gaussian kernel to reduce the input data dimensions and avoid high-dimensionality problems.

V. DISCUSSION
Although we have discussed above the challenges the authors had faced while doing research, this review of the articles may provide future directions for a better understanding in terms of psychology as well as depression biomarkers. This section summarizes the overall progress obtained from this review as well as a recommendation for future researchers which may help reduce the complications.
The traditional diagnosis of depression currently used by clinical physicians and psychiatrists depends mostly on patients' replies to questionnaire assessments and on their observable behavior. EEG signals can instead provide a better understanding of depression identification mechanisms than human-dependent perception. They allow not only accurate detection of depression but also a more in-depth analysis of current screening techniques.

A. DERIVED STIMULI
EEG signals are non-stationary and highly complex signals which require a computerized system for their monitoring and analysis. In some cases, researchers used external stimuli such as emotional notes or pictorial stimuli, music, and video. Negative stimuli aggravate the emotional states of patients, which may make identification easier; on the other hand, they bias identification across patients at different depression stages, i.e., patients focus more on negative stimuli than on positive stimuli and produce more prominent responses to the negative ones. The detailed derived stimuli are shown in Table 6.

B. CLASSIFICATION MODEL
From the analysis of the machine learning section, we notice that SVM performs better for small datasets; however, as the dataset grows, the training time of the algorithm scales superlinearly with the number of data points, which makes it computationally infeasible for large datasets [150]. In that case, researchers prefer CNN. Along with traditional machine learning, ensemble models have become more popular and achieved better accuracy than other models.
A significant result was found with an ensemble learning model in which a deep forest transformed the original features into new features [67]. A few researchers obtained more satisfactory outcomes with a CNN-LSTM architecture, in which the input time series is first fed into a series of convolutional layers to produce a feature map, and these features are then fed into LSTM layers to capture the temporal information [2]. Convolutional LSTM (ConvLSTM) is another recurrent neural network-based approach for identifying depression, in which convolutional structures appear in both the input-to-state and state-to-state transitions [129].

C. FUNCTIONAL CONNECTIVITY
Functional connectivity is a statistical concept that describes the temporal coincidence of spatially distant neurophysiological events: two regions are considered functionally connected when there is a statistical relationship between measures of their activity. These activities can be recorded and visualized through electrodes placed on the scalp. The approach presumes that the pattern changes for different activity when electrodes are placed in a certain region, and that regions stay in the same network if their functional behavior is consistently correlated with nearby electrodes. It concurs with the intuitive notion that when two things happen together, they should be related to each other. By relying very little on a priori assumptions, functional connectivity analysis reflects a straightforward, observational measure of functional relationships [151]. Liu et al. found 13 significant long-distance edge connections in the delta band within temporal and parietal regions and 43 significant long-distance edge connections in the beta band within frontal regions and frontal and parietal-occipital regions [37]. Peng et al. used a 128-channel EEG and found different increased and decreased connection densities between depressed patients and healthy controls for different frequency bands. For the full frequency band, the distribution of connection densities resides in frontal, parietal, and temporal regions [79]. Xie et al. set the threshold of the adjacency matrix to 0.04 and demonstrated 31 small head models on entire large head models of brain functional connections. They noted that the functional connections between F4 and other electrodes are stronger than those between F3 and other electrodes in depressed patients [88].
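One simple way to express the statistical relationship between channels is a Pearson correlation matrix that is then thresholded into a binary adjacency matrix. The sketch below is only an illustration: the reviewed studies use a variety of connectivity measures, and the choice of Pearson correlation, the function names, the toy signals, and the threshold of 0.5 are assumptions for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def connectivity_matrix(eeg):
    """Channel-by-channel correlation matrix (eeg: list of channels)."""
    n = len(eeg)
    return [[1.0 if i == j else pearson(eeg[i], eeg[j]) for j in range(n)]
            for i in range(n)]


def threshold_adjacency(conn, threshold):
    """Keep an edge where |correlation| reaches the threshold; no self-loops."""
    n = len(conn)
    return [[1 if i != j and abs(conn[i][j]) >= threshold else 0
             for j in range(n)] for i in range(n)]


# Three toy channels: the first two are perfectly correlated.
toy_channels = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
conn = connectivity_matrix(toy_channels)
adjacency = threshold_adjacency(conn, threshold=0.5)
```

Graph measures such as connection density can then be computed on the adjacency matrix and compared between depressed and healthy groups.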

D. BRAIN TOPOLOGICAL MAP
Brain topological mapping is another novel approach for categorizing emotions from a brain map, and it can be implemented as a real-time tool. Recent studies show interest in mapping brain topography alongside classification accuracy, as it may support successful interdisciplinary modeling, i.e., a generalized classification algorithm for different mental states [63], [67], [82], [86]. It not only distinguishes a certain group from healthy controls but can also reveal a large amount of brain activity across all parts of the brain. A topological map visualizes the distribution of brain activity; its colors are obtained by interpolating the voltage values of the electrodes placed on the scalp. The underlying assumption is that the voltage changes smoothly from one electrode to the next, so the map is an interpolated estimate of what the activity could look like at all points between the electrodes. Wan et al. found abnormalities in the prefrontal cortex of depressed patients and noted that the prefrontal cortex is closely related to the nerve center of human thinking and behavioral regulation, which may be related to the changes in the prefrontal cortex in depressed patients [82]. For three major EEG rhythms, Li et al. illustrated topological maps of brain activity values and power spectral density and showed that the most significant differences, in the alpha and beta frequency bands, reside in the frontal, temporal, and parietal-occipital brain regions [67]. Hengjin et al., on the other hand, demonstrated 3D topological mapping with a CNN model and found that the temporal region is the most vital brain region for showing significant differences between healthy and depressed subjects [63].
In Table 16, we have listed research articles that have used different visualization techniques other than the ROC curve (receiver operating characteristic curve) for classification accuracy to determine the comparison of brain activities between healthy and depressed groups.

E. BRAIN REGIONS
Researchers focus on finding the correlations between the presence of depression and the EEG patterns of different brain regions of depressed patients. As mentioned in earlier sections, along with the classification accuracy, researchers have constructed the brain topological mapping and brain functional connectivity matrix for all of the electrodes in certain regions on the scalp that allows us to get more discriminating information. Although EEG synchronization patterns change in the entire brain region for each frequency band, the degree of change is different [79]. The most significant result has been observed from the right-side hemisphere compared to the left-side hemisphere and frontal and parietal-occipital cortex compared to the other regions.

F. GUIDELINES FOR DEPRESSION BIOMARKERS IDENTIFICATION
From our observation through this work, we can say that depression identification through EEG signals could make a revolutionary change in near future. From our reviewed research articles, we have found that researchers are more focusing on identifying possible biomarkers through brain functional connectivity analysis that helps to presumes the change of pattern for different activity of a depressed patient.
For signal acquisition technique, it is better to use 32-channel EEG device with the help of 10-20 systems as it can cover whole brain areas, which will help researchers to analyze the brain functional connectivity. According to the research articles, it is evident that right-side hemisphere and frontal and parietal-occipital cortex are distinct regions to identify the depression using EEG signals. As a result, it is better to analyze the depression by placing the electrodes in those regions, which can help the future researchers to analyze more efficiently. Independent Component Analysis (ICA) is commonly used for filtering the both physiological and non-physiological artifacts. For the insufficient datasets, k-fold cross validation can be used. Among machine learning approaches, Convolutional Neural Network (CNN) and Support Vector Machine (SVM) showed better performance for classifying healthy and depressed brains. In most cases, cross-validation can be used for controlling the overfitting. On the other hand, PCA with Gaussian Kernel can be used for reducing input data dimensions to avoid high dimensionality problems.
All these techniques should help future researchers analyze depression through EEG signals more efficiently. Recent studies show that EEG can be more useful and effective than other screening tools, and we hope it will support future clinical experimentation.

VI. FUTURE SCOPE
The origin of depression is still unknown, and researchers around the world are trying to find its causes and to develop better ways to treat the disorder through various technologies. Despite its limitations, identifying depression biomarkers from EEG signals using machine learning algorithms is a more sophisticated and more patient-friendly process. Even though psychiatrists still use traditional means of determining depression, i.e., clinical questionnaire-based assessments, EEG-based identification would help to provide a better understanding and to support the assessed score for each stage of depression. To avoid overfitting, studies need to be enriched with larger amounts of sample data. The amount of sample data can be increased by recruiting more subjects or by recording for a longer time. Modern strategies also suggest using multiple algorithms to achieve higher accuracy [153]. In addition, researchers can study the effects of medication and of the patients' regular activities, which will help to analyze the neurological and physiological effects during the depressive period. Furthermore, high-density electrode placement over the whole brain can help to analyze brain functional connectivity, i.e., the correlation pattern between different brain regions, and thus to identify the brain areas most affected by depression [154]. Since suicide is highly associated with depressive symptoms, future studies may be able to detect suicidal tendencies through EEG and to find a correlation between suicidal thoughts and depression severity.
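The correlation pattern between brain regions mentioned above can be sketched as a channel-by-channel matrix. The example below is a minimal illustration that uses Pearson correlation as the connectivity measure; the channel names, the 256 Hz sampling rate, and the synthetic signals are assumptions made for the example, and other connectivity measures exist as well.

```python
import numpy as np


def correlation_connectivity(eeg):
    """Pearson correlation between every pair of channels.

    eeg: array of shape (n_channels, n_samples).
    Returns a symmetric (n_channels, n_channels) matrix.
    """
    return np.corrcoef(eeg)


def strongest_pair(conn, names):
    """Return the channel pair with the largest absolute off-diagonal value."""
    masked = np.abs(conn)          # np.abs returns a new array
    np.fill_diagonal(masked, 0.0)  # ignore each channel's self-correlation
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return names[i], names[j], conn[i, j]


# Synthetic stand-in for a recording: 4 channels, 2 s at an assumed 256 Hz.
rng = np.random.default_rng(0)
t = np.arange(512) / 256.0
alpha = np.sin(2 * np.pi * 10 * t)   # shared 10 Hz rhythm
names = ["F3", "F4", "P3", "P4"]     # hypothetical channel subset
eeg = rng.normal(size=(4, t.size))   # independent background noise
eeg[1] += 3 * alpha                  # F4 and P4 share the rhythm
eeg[3] += 3 * alpha

conn = correlation_connectivity(eeg)
print(np.round(conn, 2))
print(strongest_pair(conn, names))
```

In this synthetic case the F4 and P4 channels, which share the injected rhythm, form the most strongly connected pair. With a real high-density montage the same matrix would have one row and one column per electrode.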

VII. RECOMMENDATIONS
Based on the overall discussion and the limitations identified in this study, a few directions may assist future research. One of the biggest challenges found in our study is data limitation. To obtain a large amount of data, studies may collect data from hospitals or clinics with proper consent. Data augmentation is another strategy; it significantly increases the diversity of the data available for training the models without collecting new data [155]. The data need to be preprocessed to remove physiological and non-physiological artifacts from the raw EEG; otherwise, spatial information may be lost [156]. Regression and adaptive filtering techniques are recommended if a reference channel is available [157]. Before the data are fed into the models, an appropriate dataset split is necessary to supervise a classification model [158]. To avoid the curse of dimensionality, high-dimensional correlated data can be transformed into a lower-dimensional set of uncorrelated variables. Participants should be medication-free and have no other complications before the experiment begins. Most studies focus on identifying depression at a particular stage, and the feature protocol as well as the derived stimuli differ from stage to stage. However, the derived negative stimuli can lead to biased identification. In that case, positive stimuli can be used to discriminate the depression stage.
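The transformation of correlated high-dimensional data into fewer uncorrelated variables can be illustrated with ordinary (linear) principal component analysis. The sketch below is a minimal example on synthetic data; the helper pca_transform, the feature counts, and the noise level are assumptions for illustration, and it uses plain PCA rather than a kernel variant.

```python
import numpy as np


def pca_transform(features, n_components):
    """Project features (n_samples, n_features) onto the top principal axes.

    Returns the projected data and the fraction of variance each kept
    component explains.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]


# Synthetic stand-in: 200 epochs with 6 correlated features that are
# driven by only 2 latent sources plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
features = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

reduced, explained = pca_transform(features, n_components=2)
print(reduced.shape)                                     # two columns remain
print(np.round(np.corrcoef(reduced, rowvar=False), 3))   # off-diagonals near 0
print(round(float(explained.sum()), 3))                  # variance retained
```

The projected columns are uncorrelated by construction, so a classifier receives two independent inputs instead of six redundant ones. To avoid information leakage, the projection would normally be fitted on the training split only and then applied to the test split.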

VIII. CONCLUSION
Depression is a serious and common medical disorder that can lead to suicide if it is not treated in its early stages; hence, early identification of depression is essential. As mentioned in earlier sections, clinical questionnaire-based assessment methods have limitations; EEG signals can therefore be considered for developing a clinical assessment protocol, as EEG is non-invasive, easy to use, and user-friendly.
We focused on evaluating the potential of EEG-based depression identification and the work that has been conducted around the world for this purpose. To that end, we conducted a comprehensive study of the diversity of experimental protocols. It was observed that negative stimuli yield better depression detection results. In addition, researchers observed that EEG signals from the frontal and parietal-occipital cortex of the right hemisphere have the potential to show distinct results for distinguishing healthy and depressed brains. Deep learning approaches are drawing increasing attention and provide high differentiation as well as more information about the underlying depressed brain regions. Furthermore, functional connectivity and topological mapping show convenient and promising outcomes for advanced research. We hope this study will help future researchers explore the opportunities in depression research and beyond.

SUPPLEMENTARY ELEMENTS
A. SCIMAGO RANKING AND IMPACT FACTOR
Of the 52 research articles, 42 are journal articles and 10 were selected from IEEE Xplore. As no ranking is indicated for conference papers, we plotted the 42 journal articles in a boxplot according to their Scimago rank and impact factor (shown in Fig. 13). According to the Scimago rank, we found 23 Q1-ranked, 11 Q2-ranked, 7 Q3-ranked, and 1 Q4-ranked articles. The boxplot shows that most of the journals are ranked Q1 or Q2 (34 of 42) and that the impact factor of most of the articles is higher than 2.0, which suggests that the selected articles are informative.

B. SUMMARY TABLE OF REVIEWED RESEARCH ARTICLES
The summary of the 52 research articles can be found at the link below. It lists the 52 finally included research articles along with their publication data, depression type, use of questionnaire assessments, signal acquisition, preprocessing, classification model, and accuracy, sensitivity, and specificity. https://github.com/AntoraDev/Systematic-Review/blob/main/Summary%20table%20of%2052%20research%20articles.pdf