On Using Grey Literature and Google Scholar in Systematic Literature Reviews in Software Engineering

Context: The inclusion of grey literature (GL) is important to reduce publication bias while gathering available evidence regarding a certain topic. The number of systematic literature reviews (SLRs) in Software Engineering (SE) is increasing but we do not know about the extent of GL usage in these SLRs. Moreover, Google Scholar is rapidly becoming a search engine of choice for many researchers but the extent to which it can find the primary studies is not known. Objective: This tertiary study is an attempt to i) measure the usage of GL in SLRs in SE and propose strategies for categorizing GL and a quality checklist to use for GL in future SLRs; ii) explore whether it is feasible to use only Google Scholar for finding scholarly articles for academic research. Method: We have conducted a systematic mapping study to measure the extent of GL usage in SE SLRs as well as to measure the feasibility of finding primary studies using Google Scholar. Results and conclusions: a) Grey Literature: 76.09% of SLRs (105 out of 138) in SE have included one or more GL studies as primary studies. Among the total primary studies across all SLRs (6307), 582 are classified as GL, making the frequency of GL citing 9.23%. The intensity of GL use indicates that each SLR that uses GL contains about 5 grey primary studies on average (total intensity of GL use being 5.54). The ranking of GL tells us that conference papers are the most used form (43.3%), followed by technical reports (28.52%). Universities, research institutes, labs and scientific societies together make up 67.7% of GL used, indicating that these are useful sources for searching GL. We additionally propose strategies for categorizing GL and criteria for evaluating GL quality, which can become a basis for more detailed guidelines for including GL in future SLRs. b) Google Scholar Results: The results show that Google Scholar was able to retrieve 96% of primary studies of these SLRs.
Most of the primary studies that were not found using Google Scholar were from grey sources.


I. INTRODUCTION
The Internet has become a vital channel for disseminating and accessing scientific literature for both academic and industrial research needs. Nowadays, everyone has comprehensive access to scientific literature repositories, which comprise both "white" and "grey" literature. The "grey" literature, as opposed to "white" literature, is non-peer reviewed scientific information that is not available using commercial information sources such as IEEE or ACM. A large number of software engineering researchers are undertaking systematic literature reviews (SLRs) to investigate empirical evidence in software engineering. The key reason to include grey literature during information synthesis is to minimize the risk of publication bias. Using state-of-the-art non-commercial databases that index information, researchers can make the rigorous process of searching for empirical studies in SLRs easier. This study examines the evidence of grey literature use during synthesis in systematic literature reviews.
Grey literature (GL) refers to informally published written material, not indexed by major database vendors (such as the IEEE Xplore and ACM digital libraries). GL is usually attributed to government, academia, pressure groups, trade unions and industries, and is not rigorously peer reviewed [1]. Some examples of GL are reports (progress, market research), theses, conference proceedings, technical specifications and standards, official documents, company white papers, discussion boards and blogs.
Typically at the start of any research endeavor, the firsthand information about a new topic is generally collected through GL. This includes a quick search of the topic on Internet and discussions with peers [2]. GL can offer some advantages, e.g., it can be authored by scholars and scientists and thus is of high quality and detail [1]. It has recent information about a topic of interest and is focused [3]. It is also available earlier than commercially published literature [2].
The growth of the Internet has immensely broadened the access to GL [4], [5]. However, it has also produced new challenges for researchers: what to include and what not to include in GL? A recent example of such a challenge involved a journal article in which researchers claimed to identify genes that can predict human longevity with 77% accuracy. The article received rapid feedback and considerable criticism just an hour after its online publication [6]. Online researchers expressed skepticism about the environment and controls under which the study was conducted.
Over the past decade, the Internet has emerged as an essential source of information for everyone [7]. In the scientific community, academic researchers are now equipped with state-of-the-art sources of scientific articles and meta-data research tools for their research. The online presence of scientific communities, discussion boards and blogs owned by notable authors is an important source of up-to-date scientific information. However, most of the information published in online communities, blogs and discussion boards is considered "grey" by the definition of grey literature.
Grey literature, by the Luxembourg definition and the GreyNet community, is "Information produced on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishing i.e. where publishing is not the primary activity of the producing body". In general, grey literature publications are volatile in nature and lack bibliographic controls such as place and date of publication, and details of author and publisher.
These tendencies of grey literature make it difficult to index and categorize. Grey literature is often referred to as "fugitive literature" as it is semi-published and difficult to locate [8], [9]. Grey literature, though not peer-reviewed thoroughly, is still an important source of information [10].
It is worthwhile to note that grey literature, although not peer-reviewed, is often produced by scholars and scientists of their respective fields and is of high quality and detail [10]. According to Soule and Ryan [11], grey literature is becoming a common means of information exchange because it is available on a more timely basis than literature published by commercial information sources. For instance, conference papers are accessible to the public long before the published articles. Besides these traits, grey literature is focused and has in-depth and up-to-date information about a topic [12]. The growth of the Internet has immensely broadened the access to grey literature [7], [13]. Nowadays, research on various aspects of grey literature is being undertaken. For example, one recently published study [14] discusses whether theses and dissertations should still be counted as grey literature (taking into consideration the quality review process required for graduation). Furthermore, another group of researchers [15], [16] offers guidelines on how to include online literature/grey literature in research studies, keeping in mind the weaknesses associated with grey literature. Another group of researchers [17], [18] focuses on whether online literature can be used for improving public law or policies. Besides this, research is also being conducted on how online repositories index grey literature with respect to specific locations, such as India [19] and Africa [20]. Our study has multiple objectives and fills the research gap in software engineering by researching (i) the extent of usage of grey literature in systematic literature reviews in software engineering; (ii) categorization strategies and quality assessment criteria for grey literature; and (iii) the viability of Google Scholar for searching grey literature.
Inclusion of GL is also important to minimize publication bias. Publication bias refers to the problem that studies with positive results are more likely to be included as primary studies in an SLR than studies with negative results. Some of the strategies to tackle this issue are to scan for GL, conference proceedings and unpublished results by contacting colleagues and researchers [21]-[23].
With the number of SLRs in SE growing and considering the importance of GL [24], this study investigates the extent of GL use in SE SLRs. As a secondary concern, this study also investigates the extent to which Google Scholar alone is sufficient to find primary studies for an SLR. This tertiary study thus seeks answers to the following research questions:
RQ1: What is the extent of usage of GL in SLRs in SE?
RQ1.1: What strategies can be used to categorize GL (non-peer reviewed) and how to assess its quality?
Rationale for RQ1: The Internet is transforming the whole value chain of publishing by offering tools and channels
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
RQ2: Is Google Scholar alone sufficient for searching primary studies in conducting an SLR in SE?
Rationale for RQ2: The process of selecting primary studies for an SLR can be very laborious, time-consuming and rigorous [25]. Manual searches are conducted on different information sources to collect primary studies. On the other hand, Google Scholar retrieves results from all major databases and orders them on the basis of certain attributes. It is interesting to know whether researchers can rely only on Google Scholar for finding primary studies instead of manually searching each of the databases separately.
The rest of the paper is organized as follows. Section II motivates and explains the research methodologies used, including the important steps in the systematic mapping (sub-section II-A). Section III analyzes and explains the results acquired from the systematic mapping (sub-section III-A). It thoroughly discusses the characteristics of GL in SLRs including (but not limited to) their forms and origins. Furthermore, sub-section III-B discusses the Google Scholar indexing results obtained using the systematic mapping. Section IV discusses the proposed categorization strategies and quality evaluation criteria for GL. Threats to the validity of the study are given in sub-section V-A, followed by the conclusions in sub-section V-B.

A. SYSTEMATIC MAPPING STUDY
This first part of the study is conducted as a systematic mapping study based on the guidelines proposed by Kitchenham [23]. Systematic mapping studies are recommended for getting a broad understanding of a research topic and do not involve detailed synthesis as in the case of a systematic literature review (see e.g. [26], [27]). Our methodology is driven by a predefined protocol that aims to be unbiased by being auditable and repeatable [28]. Our study is also a tertiary study since it collects evidence from secondary studies (i.e. systematic literature reviews in software engineering). Other research methodologies, e.g., surveys, experiments and case studies, are not relevant for achieving the goals of this study. Surveys are typically conducted when the use of a technique has already taken place, case studies are mostly suitable for conducting industrial evaluations, while experiments are used for quantifying a cause and effect relationship. In our study, we are not assessing any specific technique but rather collecting overall evidence of grey literature in SLRs as well as evaluating whether Google Scholar alone is able to find the primary studies of SLRs.
VOLUME 4, 2016
This systematic mapping study is based on RQ1 and RQ2 given in Section I. The population in this study consists of SLRs conducted in SE. The intervention includes the use of GL in SLRs within SE. The comparison is not applicable in this study as our aim is not to do a comparison. The outcome of our interest is the level of usage of GL in SLRs in SE. Our context and types of primary studies are limited to SLRs.

1) Search strategy
Our search for primary studies (SLRs in this case) was based on the following steps:
• Identification of alternate words and synonyms for terms used in the research question.
• Use of Boolean OR to join alternate words and synonyms.
• Use of Boolean AND to join major terms.
We limited our search to papers published between January 2004 and June 2012. We selected 2004 as the starting year because the guidelines for conducting SLRs in SE were first published in 2004. The search terms used are as follows: (i) systematic review (ii) systematic literature review (iii) meta-analysis (iv) empirical evidence (v) empirical studies (vi) empirical study. The use of these search terms led to the following search queries:
• empirical studies OR empirical study
• systematic review AND Kitchenham
• systematic literature review AND Kitchenham
• meta-analysis AND Kitchenham
• (empirical studies OR empirical study) AND Kitchenham
• "systematic review" AND (software engineering)
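The OR/AND composition described above can be sketched in a few lines of Python. This is only an illustration: the helper names `or_group` and `and_query` are hypothetical, and quoting every term is an assumption of the sketch, since each database has its own query syntax.

```python
def or_group(terms):
    """Join alternate words and synonyms with Boolean OR."""
    quoted = [f'"{t}"' for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def and_query(*groups):
    """Join the major term groups with Boolean AND."""
    return " AND ".join(or_group(g) for g in groups)


# One of the queries listed above, rebuilt from its parts.
query = and_query(["empirical studies", "empirical study"], ["Kitchenham"])
print(query)
```

Keeping synonyms in lists makes it easy to add an alternate spelling without rewriting every query by hand.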
The following databases were selected for searching: ScienceDirect, IEEE Xplore, ACM digital library and Springer Link. According to Hasteer et al. [29] and Dybå et al. [30], these databases cover the most relevant journals, conference and workshop proceedings within SE.
We conducted a pilot search before the actual search to verify the strength of the search terms. This was an attempt to avoid wasting time because of inadequately designed search terms [22], [25]. After finalizing the pilot studies, we performed the search, and if a search term retrieved more than 90% of the pilot studies, we retained it. The pilot studies included a total of 37 SLRs representing each year from 2004 to 2012. Out of the 37 pilot studies, 22 were found from Kitchenham et al.'s paper [31] while 15 more were added by contacting prominent authors.
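The retention rule for search terms can be illustrated as a simple recall check against the pilot set. The sketch below is an assumption-laden illustration: the identifiers are made up, and the function names `pilot_recall` and `retain_term` are hypothetical; only the 37 pilot studies and the 90% threshold come from the text.

```python
def pilot_recall(retrieved_ids, pilot_ids):
    """Fraction of the pilot studies that a search term retrieved."""
    pilot = set(pilot_ids)
    if not pilot:
        raise ValueError("pilot set must not be empty")
    return len(pilot & set(retrieved_ids)) / len(pilot)


def retain_term(retrieved_ids, pilot_ids, threshold=0.90):
    """Keep a search term only if it finds more than `threshold` of the pilot set."""
    return pilot_recall(retrieved_ids, pilot_ids) > threshold


# 37 hypothetical pilot SLR identifiers (the real pilot list is not reproduced here).
pilot = [f"SLR{i}" for i in range(1, 38)]
hits = pilot[:34] + ["unrelated-paper"]  # a term that finds 34 of the 37 pilot SLRs
print(round(pilot_recall(hits, pilot), 3), retain_term(hits, pilot))
```

With 37 pilot studies, a term has to retrieve at least 34 of them to pass the 90% threshold.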
We used a three-phase strategy for searching, similar to the one used in [32]. In the first phase, we searched the above-mentioned electronic databases. In the second phase, we scanned the reference lists of all the papers found after the search in the electronic databases. We then contacted the authors who had written the largest number of SLRs and also scanned their personal webpages. In the third phase, we used Google Scholar to find any missing SLRs. The details of the research protocol can be seen in Figure 1.

2) Study selection criteria and procedures for including and excluding primary studies
We included papers that met the following inclusion criteria:
• The paper is an SLR, written following the guidelines given in [23].
• The paper is peer-reviewed.
• The paper language is English.
• The paper is published between January 2004 and June 2012.
We excluded papers based on the following exclusion criteria:
• Paper is not available in full-text.
• Paper does not belong to SE.
• A shorter version of a similar paper is excluded.
• Editorials, position papers, keynotes, tutorial summaries and panel discussions are excluded.
• Reports of lessons learned, expert judgments, anecdotal reports, and observations are excluded.

3) Study quality assessment and data extraction
We did not perform quality assessment as a separate step because one of our inclusion criteria enabled us to include only SLRs that followed the guidelines proposed in [23]. This meant that the included studies were of reasonable quality and rigor. We designed a data extraction form to collect the information needed to answer our research questions. We extracted the full citation details of the SLR, the number of primary studies used in the SLR and the full citation details of every primary study used in the SLR. Most of the SLRs (primary studies in our case) included a list of primary studies, while for others we had to read the full text to get the list. For each primary study in every SLR, the authors searched for the source of the study (whether GL or indexed elsewhere). The SLRs were divided among the authors for data extraction. The data extraction was cross-checked by an author other than the one extracting.

III. DATA ANALYSIS

A. GREY LITERATURE EVIDENCE: SYSTEMATIC MAPPING
A total of 138 SLRs were selected for data synthesis (the references of the primary studies are listed in the Appendix). These SLRs came from four electronic databases (ScienceDirect, IEEE Xplore, ACM digital library, Springer Link). We present our results separately for each database and then draw the overall picture of grey evidence. There were a total of 6307 primary studies extracted from the 138 SLRs. The total SLRs and primary studies are given in Table 1 for each database. The details of the SLRs and primary studies are also shown in Figure 2. For gathering evidence relating to the use of GL, we classified the total primary studies for every electronic database according to their source, i.e., whether coming from one of the four electronic databases (ScienceDirect, IEEE Xplore, ACM digital library, Springer Link), other journals/books or GL.
IEEE SLRs: There were a total of 48 SLRs retrieved from IEEE Xplore, consisting of 2018 primary studies. The classification of these primary studies according to their source is given in Table 2.
ACM SLRs: ACM digital library gave us 9 SLRs consisting of a total of 240 primary studies. Table 3 presents the classification of these 240 primary studies in terms of their source. The number of GL sources stands at 27, making up 11.25% of the total primary studies for SLRs found in ACM digital library.
Science Direct SLRs: For ScienceDirect, the 67 SLRs gathered a total of 3573 primary studies. The classification of these primary studies according to their source is given in Table 4. The percentage of GL is the lowest compared to the other sources of primary studies.
Springer SLRs: There were a total of 476 primary studies in the 14 SLRs retrieved from Springer Link, of which 23 were from grey sources.
In summary, out of 6307 primary studies in 138 SLRs, 582 (9.23%) were classified as GL. 4920 primary studies (78%) were from the four major databases (ScienceDirect, IEEE Xplore, ACM digital library, Springer Link).
We have noticed that most of the grey literature included as primary studies in SLRs consists of conference proceedings and technical reports. In order to further analyze the extent of GL use in SLRs, we define certain indicators.
Table 6 shows that 76.09% (105 SLRs) of the total SLRs have used GL for their primary studies. Table 6 also presents the frequency of GL use in primary studies per database.
We see from Table 7 that 582 primary studies were identified as GL out of 6307 primary studies. Table 7 also presents the frequency of GL citing in primary studies per database.

3) Intensity of GL use
The intensity of GL use is calculated by dividing the total grey primary studies by the total SLRs with grey primary studies. Table 8 shows the intensity of GL use indicator for each database. We see that the intensity of GL use in 105 SLRs is 5.54.
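The three indicators can be recomputed from the overall totals reported in this section. The following is a small sketch: the variable names are hypothetical labels chosen for the illustration, while the four input counts are taken from the text.

```python
TOTAL_SLRS = 138        # SLRs selected for data synthesis
SLRS_WITH_GL = 105      # SLRs that include at least one grey primary study
TOTAL_PRIMARY = 6307    # primary studies across all SLRs
GREY_PRIMARY = 582      # primary studies classified as GL


def percent(part, whole):
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


share_of_slrs_using_gl = percent(SLRS_WITH_GL, TOTAL_SLRS)   # Table 6 indicator
frequency_of_gl_citing = percent(GREY_PRIMARY, TOTAL_PRIMARY)  # Table 7 indicator
intensity_of_gl_use = round(GREY_PRIMARY / SLRS_WITH_GL, 2)    # Table 8 indicator

print(share_of_slrs_using_gl, frequency_of_gl_citing, intensity_of_gl_use)
```

Note that the intensity divides by the 105 SLRs that actually use GL, not by all 138 SLRs, so it describes the average number of grey studies among GL-using reviews.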

4) Total Grey Evidence Found Using Systematic Mapping
A total of 6307 primary studies included in 138 SLRs were investigated. We found that 582 primary studies are from grey sources. The percentage of grey evidence is around 9.23% in the selected 138 SLRs of Software Engineering. Figure 3 shows the extent to which grey literature has been used in SLRs in Software Engineering (SE).
While the inclusion of GL in synthesizing evidence is important, the GL source should be traceable. During this study, we noticed a small percentage of GL without proper bibliographical control (such as missing date of write-up and missing company name). We recommend that the GL should have at least the following information: name(s) of authors, date of write-up and name of sponsoring company.

6) Forms of GL cited
The distribution analysis of GL with respect to forms of document is shown in Table 9. The GL is classified into 7 categories: conference papers, technical reports, theses/dissertations, workshop/seminar papers, guidelines/lecture notes and preprints. These categories are described briefly below:
• Conference papers: The conference papers not indexed in the four major databases (ScienceDirect, IEEE Xplore, ACM digital library, Springer Link) are taken as GL.
• Technical reports: Includes reports such as research reports, internal progress and review reports and scientific reports.
• Theses/dissertations: Includes academic theses done at undergraduate and postgraduate levels.
• Workshop/seminar papers: Includes working papers from research groups and committees, typically presented in workshops and seminars.
• Guidelines/lecture notes: Includes company white papers and guides to help readers understand and solve a problem.
• Preprints: Includes drafts of scientific papers that have not yet been published in a peer-reviewed scientific journal.
We see that conference papers are the most cited (43%) GL document type in SLRs, followed by technical reports (25.2%) and theses/dissertations (12.4%).
Table 10 shows the number of grey primary studies by origin type. We classify the origin of grey primary studies as being produced by universities, international organizations, research institutes/labs/scientific societies, government organizations and others. We see that universities and research institutes/labs/scientific societies are the biggest producers of GL documents, covering ~68% of the total grey primary studies. We also noticed that the grey studies produced by universities, international organizations and research institutes/labs/scientific societies contain well-formed bibliographical details and are highly accessible. We found 12 (~2%) grey primary studies that did not provide a date of publication.
The breakdown of grey primary studies by year of publication is given in Table 11. The majority of grey primary studies included in SLRs are from the recent past. Almost 48% (280) of the included grey primary studies were published in the last 5 years.
The more granular breakdown of the primary studies of each information source is tabulated in Table 12.
IEEE SLRs: There were a total of 48 SLRs retrieved from IEEE that consisted of 2018 primary studies. We searched the 2018 primary studies in Google Scholar (GS). A total of 1946 primary studies were found using GS and 72 primary studies were not found. Overall 96% of primary studies were found using Google Scholar. The results of Google Scholar findings are tabulated below in Table 13.
ACM SLRs: We retrieved 9 SLRs consisting of a total of 240 primary studies. There were a total of 27 grey sources used as primary studies in the SLRs selected from the ACM database. We searched for the 240 primary studies on Google Scholar and were able to find 229 of them. So, overall we were able to find about 95% of the total primary studies of ACM SLRs using Google Scholar. The results of the Google Scholar findings are shown in Table 14.
Springer Link SLRs: There were a total of 476 primary studies extracted from 14 SLRs of the Springer Link database. 23 primary studies were found to be from grey sources. We searched for the 476 primary studies on Google Scholar and were able to find 468 of them. So, overall we were able to find about 98% of the total primary studies of Springer Link SLRs using Google Scholar. The results of the Google Scholar findings are shown in Table 15.
Summary of Google Scholar Results: We searched for the 6307 primary studies in Google Scholar and found 6026 of them. Only 281 primary studies were not found using Google Scholar. The GS hit percentage is 95.5, which rounds to 96 percent. Going into more detail, we noticed that, of the 281 primary studies that were not found by GS, a large share were from grey sources: around 38.4% of the primary studies that were not found in Google Scholar were grey literature. We believe that this is because grey literature is volatile in nature. It can also be because grey literature is sometimes not published in electronic formats or is not published on the web at all.
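The Google Scholar hit percentages can be rechecked from the counts reported above. In this sketch the dictionary layout and names are hypothetical, and the remainder left over after subtracting the three reported databases from the overall totals is derived by arithmetic only, as an assumption of the sketch rather than a figure stated in the text.

```python
# (total primary studies, found in Google Scholar), as reported in the text
reported = {
    "IEEE Xplore": (2018, 1946),
    "ACM digital library": (240, 229),
    "Springer Link": (476, 468),
}
overall_total, overall_found = 6307, 6026


def hit_rate(total, found):
    """Percentage of primary studies found, rounded to one decimal."""
    return round(100 * found / total, 1)


for name, (total, found) in reported.items():
    print(name, hit_rate(total, found), "not found:", total - found)

# Remainder implied by the overall totals (derived, not reported in the text).
rest_total = overall_total - sum(t for t, _ in reported.values())
rest_found = overall_found - sum(f for _, f in reported.values())
print("remaining database:", rest_total, rest_found)
print("overall:", hit_rate(overall_total, overall_found),
      "not found:", overall_total - overall_found)
```

The derived remainder of 3573 primary studies matches the ScienceDirect total given earlier, which is a useful consistency check on the per-database counts.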

IV. DISCUSSION
The Internet is an obvious choice for searching GL as it attracts a much broader audience [33]. Open access journals are increasing in number and are another source of GL. An increasing amount of data is generated on informal platforms, such as researchers publishing personal opinions, reports and articles on social media, personal websites and blogs. Therefore, to utilize this information in a proper manner, we suggest simple strategies to categorize GL based on various attributes. These strategies are a result of our experience and knowledge gained while investigating grey evidence in SLRs in SE.
The strategies presented in this section have their pros and cons. Therefore a hybrid approach has to be used when searching for GL, e.g., a combination of multiple strategies identified below:
• Filtering web content based on page views: Page view is the count of views by visitors on a web page. A popular web page is assumed to be viewed by a large number of visitors. Once such a count is available, an informed decision can be reached whether to include/exclude a web page. This measure has some obvious limitations. A new web page will not have a high count, while a high count does not necessarily correlate with high quality content. Moreover, such a count might not be available on every web page.
• Filtering web content based on user comments: For evaluating content in online blogs, discussion boards and bulletins, one can count the number of user comments as an indication of the interest a particular post has generated. Again, one cannot entirely judge the importance of content by the count of user comments, as some comments might only be responses to earlier comments made by others (not relevant to the post).
• Number of citations: If a certain document/report is cited extensively by other authors, this can provide a measure of the importance of such a document/report. A highly cited source may be included while a low cited source may warrant a full-text read to ascertain quality.
• Filtering GL based on type: There are certain types of GL which are of greater interest than others, such as conference proceedings are more likely to contain  Similarly literature from certain research labs might be of high quality. Therefore the GL needs to be categorized based on types. One such categorization is based on SE SLRs and is given in Table 9.
• Filtering GL based on authors: While performing an SLR, it is sometimes obvious that a few authors publish more than others. Consequently it might be of interest to look for GL from such authors (scanning their web pages and resources from their research groups).
• Filtering GL based on affiliations: Our study indicates that 67.7% of GL is contributed by universities, research institutes, labs and scientific societies. This means that it is useful to search for GL in these sources. This step can be performed as a secondary step after filtering GL based on prominent authors.
• Filtering GL based on research methodology: Depending on the research question of an SLR, certain research methodologies will be excluded; for example, one might only be interested in experimental evidence and thus surveys and case studies will be excluded.
• Filtering GL chronologically: One of the advantages of GL is that new data is available quickly. Therefore sorting GL based on date can lead researchers to capture trends and allow them an insight into innovations. Research gaps can be identified quickly, setting foundations for interesting future research ideas.
All the strategies presented above have their own pros and cons. The recommendation is to use a hybrid approach when applying these strategies. An example combination of these strategies is as follows:
1) Search the string/keyword.
2) Categorize by grey literature type (conference proceedings, theses, reports etc.).
3) Categorize by number of hits or number of citations.
There are many different combinations which can be adopted in order to fetch quality data from the Internet. It is up to the researchers to select the combination of strategies that suits their research requirements.
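The example combination above can be sketched as a small three-step pipeline. Everything in this sketch is illustrative: the `GreyItem` record, the `hybrid_filter` function, the sample items and the citation threshold are hypothetical, and sending low-cited items to a full-text check (rather than discarding them) follows the "number of citations" strategy described earlier.

```python
from dataclasses import dataclass


@dataclass
class GreyItem:
    title: str
    gl_type: str      # e.g. "conference paper", "technical report", "thesis"
    citations: int


def hybrid_filter(items, keyword, wanted_types, min_citations):
    """Apply the three example steps: keyword search, type, citation count."""
    # 1) Search the keyword (here: a simple case-insensitive title match).
    matches = [it for it in items if keyword.lower() in it.title.lower()]

    # 2) Categorize by grey literature type, keeping only the wanted types.
    by_type = {}
    for it in matches:
        if it.gl_type in wanted_types:
            by_type.setdefault(it.gl_type, []).append(it)

    # 3) Rank by citations; low-cited items are flagged for a full-text read.
    result = {}
    for gl_type, group in by_type.items():
        ranked = sorted(group, key=lambda it: it.citations, reverse=True)
        result[gl_type] = {
            "include": [it for it in ranked if it.citations >= min_citations],
            "full_text_check": [it for it in ranked if it.citations < min_citations],
        }
    return result


# Hypothetical sample data for illustration only.
sample = [
    GreyItem("Test automation in practice", "technical report", 40),
    GreyItem("A thesis on test automation", "thesis", 3),
    GreyItem("Test automation survey", "technical report", 5),
    GreyItem("Requirements elicitation notes", "technical report", 90),
]
selected = hybrid_filter(sample, "test automation",
                         {"technical report", "thesis"}, min_citations=10)
for gl_type, buckets in selected.items():
    print(gl_type, [it.title for it in buckets["include"]],
          [it.title for it in buckets["full_text_check"]])
```

Other combinations (for instance filtering by author or affiliation first) would only change the order and predicates of the steps.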

A. ASSESSMENT OF GL QUALITY
While inclusion of GL can help protect us from publication bias, its quality has to be assessed. GL usually does not undergo rigorous peer review; therefore its quality must be assessed against a minimum number of preset criteria. We have come up with a list of quality assessment criteria (a checklist) designed for GL, along with the motivations for including them (Table 16). The criteria are based on our experience of searching GL during this study and are by no means complete. Furthermore, we have not yet evaluated the validity of the quality criteria, which is planned as a future study.

TABLE 16. Quality assessment criteria for GL
No. | Criterion | Motivation
… | … | User comments can point to importance or research contribution.
5 | Do majority of comments on the document support its quality? | A high-quality document should receive more positive comments.
6 | Have the authors published elsewhere? | Prominent authors in a field are more likely to have published elsewhere.
7 | Can the results be reproduced? | To ensure enough methodological details are provided.

A. VALIDITY THREATS
This study is conducted using the guidelines for performing SLRs [23], though on the scale of a systematic mapping study as we asked general questions (i.e., what do we know about use of GL in SE SLRs?). The search strategy was initially piloted on a small number of studies to ensure maximum coverage. The search strategy was not limited to electronic databases but also included searching for relevant studies in the reference lists of included papers, asking researchers about any SLRs we might have missed and using Google Scholar. A validity threat is that we did not search in electronic databases other than ACM digital library, IEEE Xplore, Science Direct and Springer Link. We intend to add more databases in the future extension of this mapping study into a detailed SLR. We defined explicit inclusion/exclusion criteria but did not perform quality assessment because we only included SLRs following standard guidelines [23] and also because our research questions were not posed to evaluate research outcomes. Quality assessment will however be required once this mapping study is extended to an SLR where we would be interested in specific research outcomes. The data extraction in our case was lengthy but not complex. On a few occasions it was not easy to find the primary studies of a particular SLR. In that case, two of the researchers matched their outcomes and resolved differences. The validity of data synthesis was ensured by cross-checking, i.e., the data extracted by one researcher was checked for any mistakes by the other researchers. The categorization of GL in the case of conference proceedings was tricky since we did not know about the review policy of some of the conferences. We assumed that conference proceedings not included in the four major electronic databases are GL. We know that this is not the case with every conference proceeding in SE, but this threat was minimized using the authors' knowledge of SE research. However, in the future SLR we intend to come up with a more detailed mechanism for categorizing conference proceedings as GL.
According to Hasteer et al. [29] and Dybå et al. [30], IEEE Xplore, ACM Digital Library, Springer and Elsevier/Science Direct cover the most relevant journals, conferences and workshop proceedings within SE. Nevertheless, we acknowledge that adding more databases (including Scopus) will increase the validity of the study.
Grey literature is a new and emerging area in the Software Engineering field [24]. Researchers are exploring and proposing methods to utilize grey literature. Some suggest methods for utilizing quality blogs while others suggest utilizing quality online literature in research work. To the best of our knowledge, no study in SE has tried to calculate the magnitude of this grey evidence. Therefore, we have not included a separate related work section in this study; however, some of the important contributions related to grey literature are mentioned earlier in Section I.

B. CONCLUSION
The subsections below summarize and conclude the results of our study.

1) Grey Literature Results
Despite the known importance of GL in SLRs, we found that the level of grey literature evidence is 9%. Thus, most of the literature included as primary studies in SLRs is published and peer-reviewed. GL has gained more importance in "Health and Medical Science" research because of the sensitivity of research topics concerning human health and life. The inclusion of grey trials is necessary to limit publication bias in Health Science [34]. In the field of SE, we found that researchers undertake SLRs with overwhelming use of peer-reviewed articles. In the following, we state our answers to the previously stated research questions.
RQ1: What is the extent of usage of GL in SLRs in SE? RQ1.1: What strategies can be used to categorize GL (non-peer reviewed) and how to assess its quality?
After investigating 6307 primary studies during the systematic mapping, we found that the percentage of grey evidence is 9% in our selected SLRs. Among the total of 6307 primary studies, 582 were classified as grey literature. While analyzing these 582 grey studies, we noticed that most of the grey literature consisted of conference proceedings and technical reports (68%). The research results in these reports and proceedings are more detailed and specific than in journals, and these results are available months before the official publication in traditional databases.
Our results regarding the evidence of GL in SE SLRs suggest that, on average, there is a minimal level of GL evidence (8.61%) when compared with the four major electronic databases (IEEE Xplore, ACM Digital Library, ScienceDirect, Springer Link) and other journals/books. The comparison of GL with the other sources of primary studies for the four major electronic databases is given in Figure 4.
The average percentages of primary studies sourced from IEEE Xplore, ACM Digital Library, ScienceDirect, Springer Link and other journal(s)/books are 33.97, 15.96, 14.65, 14.64 and 12.17, respectively. A Kruskal-Wallis test comparing the samples from each primary study source showed that at least one sample median is different from the others (p=0.004, α=0.05). A multiple comparisons test (Tukey-Kramer, α = 0.05) showed that the primary studies from IEEE Xplore are significantly different from those belonging to GL. No other pairs of primary study sources differed significantly. This is shown in Figure 6, where the vertical dotted lines indicate differences in mean ranks of different sources, i.e., IEEE Xplore and GL have significantly different mean ranks.
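To make the test above concrete, the following is a minimal sketch of how a tie-corrected Kruskal-Wallis H statistic can be computed from per-source samples. The function name `kruskal_wallis_h` and the three toy samples are hypothetical and serve only as illustration; they are not the per-SLR percentages analyzed in this study, and the sketch does not reproduce the reported p-value.

```python
from itertools import chain


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples.

    Assumes at least two distinct values overall; if every value is
    identical, the tie correction is zero and the division fails.
    """
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                             # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    # H = 12 / (N (N + 1)) * sum(R_k^2 / n_k) - 3 (N + 1)
    rank_sum_term = sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)

    correction = 1 - tie_term / (n**3 - n)
    return h / correction


# Hypothetical toy samples (NOT the study's data): three sources.
toy_samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(toy_samples)
print(f"H = {h_stat:.2f}")
```

For k groups, H is compared against a chi-square distribution with k - 1 degrees of freedom; with the three toy groups, the critical value at α = 0.05 is about 5.99, so an H above that value would lead to rejecting the hypothesis of equal medians.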
We also collected three other measures of GL evidence in primary studies: frequency of GL use, frequency of GL citing and intensity of GL use. These three measures for the four major electronic databases are given in Figure 7.
We see that overall 76.09% of SLRs (105 out of 138) in SE have included one or more GL studies as primary studies. Among the 6307 primary studies across all SLRs, 582 are classified as GL, making the frequency of GL citing 9.23% (the average across the four databases is 8.61%). The intensity of GL use indicates that each SLR contains about 5 GL primary studies on average (total intensity of GL use being 5.54). The ranking of GL tells us that conference papers are the most used form (43.3%), followed by technical reports (28.52%). Universities, research institutes, labs and scientific societies together make up 67.7% of the GL used, indicating that these are useful sources for searching GL.
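The three overall measures can be recomputed from the reported counts, as in the short sketch below. The function name `gl_measures` is hypothetical, and the intensity formula (GL studies divided by the number of SLRs that include GL) is an assumption inferred from the reported value of 5.54 rather than a definition quoted from the text.

```python
def gl_measures(total_slrs, slrs_with_gl, total_primary, gl_primary):
    """Return (frequency of GL use %, frequency of GL citing %, intensity)."""
    # Share of SLRs that include at least one GL primary study.
    frequency_of_use = 100 * slrs_with_gl / total_slrs
    # Share of all primary studies that are classified as GL.
    frequency_of_citing = 100 * gl_primary / total_primary
    # Assumed definition: GL studies per SLR that includes GL.
    intensity_of_use = gl_primary / slrs_with_gl
    return frequency_of_use, frequency_of_citing, intensity_of_use


# Counts reported in this study.
use, citing, intensity = gl_measures(
    total_slrs=138, slrs_with_gl=105, total_primary=6307, gl_primary=582
)
print(f"frequency of GL use:    {use:.2f}%")
print(f"frequency of GL citing: {citing:.2f}%")
print(f"intensity of GL use:    {intensity:.2f}")
```

Under these assumptions, the computed values round to the 76.09%, 9.23% and 5.54 reported above.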

RQ2: Is Google Scholar alone sufficient for searching primary studies in conducting an SLR in SE?
Searching for research literature (especially in Software Engineering) is time-consuming, and this effort increases considerably in the case of an SLR. Our study aims to address this problem by answering RQ2. A systematic mapping study was performed in which, in total, 138 SLRs (6307 primary studies) were extracted from various databases and searched for in Google Scholar. The results showed that Google Scholar was able to retrieve 96% of the primary studies of these SLRs. Most of the primary studies that were not found in Google Scholar belonged to grey sources. Moreover, during our research, we observed that literature not found with Google Scholar could be found with a simple direct Google search. Thus, it can be argued that the combination of Google Scholar and Google can increase the chances of finding the maximum number of primary studies.
Looking at the results in more detail, Google Scholar was able to retrieve more than 90% of the primary studies of the SLRs. Most of the primary studies that were not found using Google Scholar were from grey sources. The primary studies not found in Google Scholar were heterogeneous in their characteristics, and therefore we could not infer much about the types of studies that Google Scholar is generally unable to retrieve. During our Google Scholar analysis, we noticed that some of the primary studies not found in Google Scholar were retrievable through Google. Only a few primary studies were found in neither Google Scholar nor Google. All of these were conference and workshop papers, either published before the year 2000 or belonging to specific conference proceedings. Collectively, we were able to find most but not all of the primary studies using the combination of Google Scholar and Google.
Possible future work for the study is to bridge the gap between academia and the GL utilization process. In this study, we have suggested a preliminary quality evaluation checklist (Table 16), which can be further enhanced and used to assess the quality of grey literature.