A Research Agenda on Open Data Impact Process for Open Innovation

Open data and open innovation are two topics currently attracting the attention of academics. However, no previous study has considered these fields in combination using a bibliometric approach. Thus, the aim of this paper is to understand the relationship between open innovation and open data. Two research questions have been formulated: 1) What are the main topics studied in the literature that combine both lines of research? and 2) How can the open innovation paradigm be integrated in the open data impact process? To address the first question, a co-word analysis is used to identify the main topics investigated in the open innovation and open data literature. To answer the second research question, the resulting topics are grouped and analyzed using a model of the open data impact process. Finally, some future research lines to analyze the open data impact process for open innovation are presented. For example, future research could focus on questions such as (1) What kind of applications can be created through the reuse of open data? and (2) How do open innovation processes influence the reuse of open data?


I. INTRODUCTION
Open data and open innovation are two interesting phenomena to study. Open data is freely available for use by agents and helps to develop the innovation potential of public and private organizations. Open innovation is based on the openness of the inputs and outputs of the innovation and has changed the paradigm of innovation. The

II. THEORETICAL BACKGROUND
Open data is data made freely available for use by anyone (e.g., governments, organizations, researchers) without copyright restrictions. In that sense, open data can be a source for innovation [1], [2]. Open data helps develop the innovation potential of governments, businesses and entrepreneurs that can provide economic, social and scientific gains [3]-[5]. Additionally, some authors highlight the new opportunities for innovation in public and private sectors that big and open linked data have created [6], [7]; for example, facilitating the generation of new software applications by interconnecting data from different sources on the web [8].
Open innovation has changed the paradigm of innovation on the basis of the openness of the inputs and outputs of innovation [9]. Under the open innovation paradigm, organizations commercialize internal and external ideas by deploying them inside and outside [10]-[12]. Following Gassmann and Enkel [13] and Enkel et al. [14], there are three types of openness: inbound (insourcing external ideas and technologies to enhance products' values), outbound (outsourcing internal resources for refining, exploiting and bringing them to market) and coupled (a combination of the inbound and outbound processes). Open innovation has received more research attention in the private sector, but public sector organizations are also developing open innovation initiatives [15]-[17].
Open data offers access to external data that come mainly from public organizations. Recently, Smith and Sandberg [18] highlighted that outbound open innovation can be enabled by open government data and that it is beneficial to society. Governments and public agencies are liberating their data and they want open data to be used to solve problems and to create and improve products and services [19], [20], generating new business opportunities based on open data [21] and fostering entrepreneurial initiatives [22]. But access to open data in itself does not produce innovation [23]. Therefore, it is necessary to know how to develop open innovation using open data [24].
Previous studies have conducted literature reviews on open data [25]-[28]. There are also literature reviews on open innovation [29]-[38]. Some of these studies have identified an interesting relationship between the terms ''open data'' and ''open innovation''. Herala et al. [26]  open innovation paradigm and the open data impact process (section IV).

III. TOPICS STUDIED IN THE LITERATURE THAT COMBINE OPEN DATA AND OPEN INNOVATION

A. SELECTION OF DOCUMENTS AND KEYWORDS
The first phase was to identify the most relevant papers by targeting journals and conferences. We selected documents that study open data-driven open innovation using the Scopus database. The database includes journal articles, conference proceedings, and books, and it has been used by other researchers such as Gupta et al. [40] to develop literature reviews. Documents were searched by ''Article title, Abstract, Keywords'', for all years and all access types. The date range is all years up to 2018. We used the following search terms:
In the second phase, to conduct the co-word analysis, we considered the keywords of the selected documents (Fig. 1): a total of 332 keywords (Phase 2, step 2.1). Next, to develop the word co-occurrence networks, ''SciMAT'' (v. 1.1.04) bibliometric software was used [41]. Synonyms were grouped to filter the keywords (e.g., ''e-government'' and ''government 2.0''), and words that can be singular or plural were converted to singular form (e.g., ''e-service'' and ''e-services''). Applying these filtering criteria resulted in 301 keywords (Phase 2, step 2.2).
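The keyword filtering step described above (grouping synonyms and converting plurals to singular form) can be illustrated with a minimal Python sketch. It is not SciMAT's own procedure: the synonym table, the suffix rules and all function names are hypothetical, and only the two examples given in the text are taken from the paper.

```python
# Hypothetical synonym table: maps a variant to the label kept for analysis.
# Only the pair named in the text is included; the full list is not given.
SYNONYMS = {"government 2.0": "e-government"}


def singularize(keyword: str) -> str:
    """Reduce a trailing plural with naive suffix rules.

    This is an assumption for illustration, not a full English
    inflection model (it would mishandle words such as "series").
    """
    if keyword.endswith("ies"):
        return keyword[:-3] + "y"      # "smart cities" -> "smart city"
    if keyword.endswith(("ss", "is", "us")):
        return keyword                 # "business", "analysis", "status"
    if keyword.endswith("s"):
        return keyword[:-1]            # "e-services" -> "e-service"
    return keyword


def normalize(keyword: str) -> str:
    """Lower-case, collapse whitespace, singularize, then merge synonyms."""
    cleaned = " ".join(keyword.lower().split())
    singular = singularize(cleaned)
    return SYNONYMS.get(singular, singular)


def filter_keywords(raw_keywords):
    """Return the sorted list of distinct keywords after normalization."""
    return sorted({normalize(k) for k in raw_keywords})


raw = ["e-service", "E-Services", "e-government", "Government 2.0", "smart cities"]
print(filter_keywords(raw))  # ['e-government', 'e-service', 'smart city']
```

In practice the synonym groups would be curated by hand, since automatic suffix rules cannot decide whether two labels denote the same concept.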

B. THE CO-WORD ANALYSIS TECHNIQUE
Co-word analysis is a technique that uses co-occurrence models of word pairs in a set of documents to identify the relationships between the ideas that appear in the knowledge areas. In accordance with Choi et al. [42], keywords are considered to be important for analyzing various literature topics. As such, the presence of associations enables us to identify the relationships between the various topics that the words represent [43].
VOLUME 8, 2020
To perform the analysis, we calculated the co-occurrence matrix and the equivalence index [44]. These enabled the application of techniques such as the simple centers algorithm [45], which identifies keyword subgroups that have important associations; a maximum network size of 15 and a minimum size of two were established.
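The quantities involved can be sketched in Python under stated assumptions. The equivalence index is taken in its usual form, e_ij = c_ij^2 / (c_i * c_j), where c_ij is the number of documents containing both keywords and c_i, c_j are the individual keyword counts. The density and centrality functions use the scaling commonly applied in strategic diagrams (100 times the internal links divided by theme size, and 10 times the external links). The toy documents, the hand-made themes and the cut-off values passed to `classify` are hypothetical; the simple centers algorithm itself is not implemented here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count keyword occurrences and pairwise co-occurrences over documents."""
    occ, co = Counter(), Counter()
    for keywords in docs:
        unique = sorted(set(keywords))
        occ.update(unique)
        co.update(combinations(unique, 2))  # pair keys are alphabetically ordered
    return occ, co


def equivalence_index(occ, co):
    """e_ij = c_ij^2 / (c_i * c_j); values lie in (0, 1]."""
    return {(i, j): c * c / (occ[i] * occ[j]) for (i, j), c in co.items()}


def density(theme, e):
    """Internal cohesion: 100 * (sum of links inside the theme) / theme size."""
    internal = sum(v for (i, j), v in e.items() if i in theme and j in theme)
    return 100 * internal / len(theme)


def centrality(theme, e):
    """External ties: 10 * sum of links with exactly one keyword in the theme."""
    external = sum(v for (i, j), v in e.items() if (i in theme) != (j in theme))
    return 10 * external


def classify(cent, dens, cent_cut, dens_cut):
    """Place a theme in a quadrant; the cut-off values are an assumption."""
    high_c, high_d = cent >= cent_cut, dens >= dens_cut
    if high_c and high_d:
        return "motor theme"
    if high_c:
        return "basic and transversal theme"
    if high_d:
        return "developed and isolated theme"
    return "emerging or disappearing theme"


# Hypothetical toy corpus: each inner list holds one document's keywords.
docs = [
    ["open data", "open innovation", "co-creation"],
    ["open data", "open innovation"],
    ["open data", "e-government"],
    ["e-government", "smart city"],
]
occ, co = cooccurrence(docs)
e = equivalence_index(occ, co)
theme = {"open data", "open innovation", "co-creation"}  # chosen by hand
print(round(density(theme, e), 2), round(centrality(theme, e), 2))
```

A clustering step would normally produce the themes from the equivalence index; here one theme is supplied by hand so that the two measures can be inspected in isolation.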
Using this technique, topic networks can be produced. For each network, Callon et al. [44] suggest the calculation of density and centrality, which enables the clustering of topics into four different types: motor themes, basic and cross-sector themes, emerging or disappearing themes, and well-developed and isolated themes.
-''Innovation'': this is a motor theme. The term can be defined as follows: ''An innovation is a new or improved product or process (or combination thereof) that differs significantly from the unit's previous products or processes and that has been made available to potential users (product) or brought into use by the unit (process)'' [46]. When analyzing the subnetwork (Fig. 3 . ''Open data'' is used within an ''open innovation'' process that involves various stakeholders (''co-creation'') and applies an evaluation methodology for the applications developed that displays the users' evaluations (''evaluation result'') [48], [49]. In addition, there is a relationship between these terms and the development of ''digital services'' based on ''open data'' using ''open innovation'' processes [50]. Fig. 3 illustrates other relationships with the main term such as ''open standards'', which act as information facilitators [51]. Moreover, ''small and medium enterprises'' is linked to knowledge management practices and their effect on driving innovation processes in these types of companies [52]. ''User interfaces'' and ''goal-matching service'' are related to ''innovation'' processes in the development of web applications that promote cooperation and greater efficiency in contacts between public organizations (''goal-matching service''). These applications have a negotiation user interface such as a videoconferencing platform that facilitates resolution of potential conflicts and promotes cooperation between organizations [53], [54].
Regarding the term ''innovation management'', the management of technological innovation in small- and medium-size companies is analyzed and modelled [55].
-''Economic and social effects'': this is a motor theme. The term refers to the economic and social effects that can be derived from open data or open innovation policies. If we analyze its subnetwork (Fig. 3), relationships of significant intensity between the main term and ''public data'' and ''economic effect'' can be found in the context of public entities sharing data in a ''public data'' format so that it can be used to create new companies and business models, and/or to improve public services and public policies [56], [57].
-''Linked open data'': this is a basic and transversal theme. The term refers to data that is publicly available on the web under an open license that allows the exchange of knowledge using semantic web technologies such as the Uniform Resource Locator (URL) or the Resource Description Framework [58], [59]. An analysis of its subnetwork (Fig. 3) reveals a relationship of significant intensity between the main term and ''public collaboration'' due to the development of web platforms that use ''linked open data'' made available by various public agents (''public collaboration'') to organize, create or discover certain public objectives and resolve potential conflicts [53], [54].
-''E-government'': this is a basic and transversal theme. The term can be defined as follows: ''The use of information and communication technologies, and particularly the Internet, as a tool to achieve better government'' [60]. An analysis of its subnetwork (Fig. 4) reveals a moderately intense relationship between the main term and ''smart city'', since the use of information and communication technologies, whether for ''e-government'' or ''smart city'', allows the provision of e-services to citizens, thus contributing to a new dynamic in the relationship between the city and its citizens [61].
-''Business model'': this is an emerging or disappearing theme. The term has a wide variety of definitions in the literature but, in general, describes the value of an organization as a set of interrelated elements that produce and capture value for its customers [62]. An analysis of its subnetwork (Fig. 4) reveals a relationship of moderate intensity between the main term and ''information management'' due to the importance of ''information management'' in supporting the ''business model'' in a digital environment, especially when combined with linked open data in a platform [63].
-''Priority journal'': this is a more developed and isolated theme. The term refers to important journals in a given field. An analysis of its subnetwork (Fig. 4)  innovation processes, and promoting public-private cooperation [64], [65].
-''Living lab methodology'': this is a more developed and isolated theme. According to the European Commission [66] ''A living lab is a user-driven open innovation ecosystem based on a business-citizens-government partnership that enables users to take an active role in the research, development and innovation process''. The concept/methodology is multidisciplinary and has various areas of application. Although it started in Europe within the new information and communication technologies, it has spread to other areas such as health, security, or sustainable energy sources [67]. An analysis of its subnetwork (Fig. 4) reveals a relationship of significant intensity between the main term and the terms ''user-driven innovation'', ''open data innovation'' and ''need-driven innovation''. This is because the ''living lab methodology'' promotes ''open data innovation'' processes through co-creative innovation, involving the product or service users in its development (''user-driven innovation''), which requires the promotion of innovation (''need-driven innovation'') [68], [69].
Abella et al. [70] propose a model to analyze the open data impact process. Considering that model and the results of our co-word analysis, we developed Table 1, which presents our classification of topics. The main topics are placed in the four phases of the process: 1. Candidate data; 2. Published data; 3. Reused data; and 4. Impact. For each topic, the document authors and year of publication are also presented. We then analyzed what was studied under each topic and associated this with the appropriate model phase.

IV. OPEN DATA IMPACT PROCESS FOR OPEN INNOVATION
The first phase (Candidate data) encompasses the different sources of open data. The main topic by number of documents (20) is open government data. The studies on this topic have examined aspects such as the determinants of innovation using open government data [18], [71]; the use of this type of data [72]; the creation of portals to encourage companies and citizens to create e-services [73], [74]; the difficulties that citizens may encounter in using them [75]; the impact of these data on competitiveness [76]; and their economic impact [56]. The topic of e-government (13 documents) also stands out. These studies focus on the willingness of stakeholders (citizens, companies, public entities) to innovate with open data [77], [78]; the reuse of these data by companies and citizens to create e-services [73]; the improvement of services provided to citizens through applications that collect information from them [48]; the difficulties in carrying out an open government agenda [75]; or the impact of digital technology for improving the efficiency and productivity of public administration [79].
The topic of open science (six documents) includes studies of making academic or scientific information available to society in order to facilitate its reuse in innovation [64], [80]-[82]. On the other hand, the openness concept topic (six documents) includes studies of the development and contextualisation of this concept [83], [84]; other aspects related to open innovation management in small and medium-size enterprises [52], [55], [85]; or the influence of open policies on standardisation activities [86]. The topic of linked open data includes four documents analyzing the development of platforms that use this type of data [53], [54], [63]. Regarding the topic of big data (two documents), the studies examine the importance of big data and its associated technology that makes data available for use in open innovation processes [83], or how big data collected through social media can be important for open innovation activities [87]. The topic of smart cities (two documents) includes studies of open collaboration within the smart city ecosystem [61], [88].
The second phase (Published data) encompasses the forms and locations of open data publishing. The topic of cooperation with different agents to obtain data is the most prevalent, appearing in 11 documents. These studies analyze the collaboration needed to conduct open innovation processes among different agents such as public organizations, companies, universities and citizens [48], [89], [90]; universities and companies [64], [69], [91]; or between companies [52]. Another prevalent topic is open data portals (five documents). These studies examine the portals created to share open data in cities [51], and how they can foster open innovation policies that promote the participation and collaboration of different agents in service creation [73], [74]. Lastly, the topic of web data platforms (four documents) includes studies of the aspects of business models, such as the creation of a revenue model that encourages the use of open data through platforms [63] or the implementation of web platforms that use linked open data to promote collaboration among various stakeholders, including individuals or organizations [53], [54].
The third phase (Reused data) encompasses the reuse of open data in open innovation activities. The applications topic is the most prevalent with 28 documents. The reuse of data enables the development of digital services through co-creation processes with other agents such as public organizations, universities, and companies [48], [89], [90]. The development of mobile applications based on open data stands out and is mainly related to aspects such as transportation and mobility [72], [89], as well as the provision of information to locate certain places of interest or services in the city [49], [77]. These studies also focus on mobile applications for voting and for promoting different initiatives proposed by citizens to develop services in the city, and on applications that create social networks for the neighbours of a specific area to promote neighbourly collaboration [48]. There are also applications that show available desks in libraries for students [89] or health-related applications focusing on nutrition [68], [69]. Furthermore, these studies examine web systems that enable information-sharing by various stakeholders through linked open data to resolve conflicts and facilitate open innovation processes [92]. Lastly, the crowdsourcing topic appears in one document that studies the crowdsourcing phenomenon and open data in smart cities, which are cities with open environments and government-citizen collaborations that foster the creation of open innovation processes such as new e-services that serve the needs of citizens, since they participate in creating these services [88].
The fourth phase (Impact) addresses the effects of reusing open data and the innovation that has been created. The topic of economic impact appears in two documents that study the economic effects of using open data [56], [57]. The impacts on competitiveness are analyzed in two documents: one examines how open-access data can improve competitiveness in a knowledge-based economy [76], and the other proposes indices that use open data to measure the innovation of various countries around the world [93].
Studies classified in the first phase (Candidate data) analyze the sources of open data. The existing literature includes some partial studies, but no in-depth studies have been conducted to identify all of these sources and their characteristics. In addition, being an open data source does not necessarily make a source a good candidate for reuse, so it is important to analyze and define open data quality. Another interesting aspect for future research, mentioned by Smith and Sandberg [18], is the analysis of the barriers that arise when innovating with open government data. Also included in this phase is the subject of smart cities, which is already receiving considerable attention in the literature, but still offers much potential for further research [94]. For example, we could ask: What data can be obtained in smart cities to foster open innovation? Is there any means or tool that can help optimise and improve the capture of open data in these cities? In this phase, future studies could examine outbound open innovation, that is, how to select internal data from different agents (public organizations, smart cities, etc.) to be converted into open data. Additionally, the agents of the open data reusers' ecosystems who can develop that type of open innovation could be identified.
In the second phase (Published data), we find studies that analyze where and how the data can be published, paying special attention to web platforms and open data portals. Here, several topics of interest could become new research areas, such as the features that an open data portal should have to publish information that is useful for innovation, and how to publish the data in a homogeneous format to enable comparisons between portals. Taking this a step further, it is worth noting the point made by Zhu and Freeman [95] that portals are efficient in providing data, but still need to improve in supporting user engagement. The study of outbound open innovation is an interesting topic in this phase as well. In particular, how to publish open data to perform outbound open innovation can be analyzed. Another theme is identifying the agents of the open data reusers' ecosystems who can collaborate in the process of outbound open innovation.
In the third phase (Reused data), the major challenge is to implicitly identify the products, services, and businesses that are created from the reuse of open data. Although the existing literature analyzes the reuse of open data, researchers have difficulty identifying and collecting information about applications developed from open data, and about the businesses that can be created. It would be very interesting to develop a model or proposal to publish these data both in the open data portals and in the applications and businesses developed from these data. In addition, this phase reveals the need to delve further into the study of co-creation and citizen participation in the creation, design, and redesign of public services from open data, as noted in the future lines of research sections of studies by Chan [73], Chatfield and Reddick [74], and Hellberg and Hedström [75]. Also worth mentioning is the need for longer-term studies that analyze the impact of citizen engagement when participating in certain innovation processes [61]. This brings us to the study of inbound and coupled open innovation as an interesting topic. Specifically, future work could analyze how to reuse external open data to innovate by creating products and services. We note again the theme of identifying the agents of the open data reusers' ecosystems who use open data, in this case, for inbound and coupled open innovation.
Finally, we identified a smaller number of studies related to the fourth phase (Impact). As such, more studies are needed to quantify in detail the economic impact of reusing open data by private or public sector organizations, as recommended by Noda et al. [56]. It would also be interesting to analyze the impact of open government data on competitiveness in the technology, economic, political, or social fields [76].
This paper presents some theoretical and practical contributions. For academic purposes, a classification and analysis of the role of open innovation in the open data process are developed, which could be the basis for future research works. Moreover, this paper provides useful information for public or private organizations that reuse open data by proposing new alternatives to the simple reuse of data, such as collaboration in open innovation processes with other agents (so-called co-creation) to create quality e-services for users by involving them in the creation process.
Finally, the paper has some limitations. In that sense, future studies can complement our results by using other bibliometric techniques such as bibliographic coupling, co-citation analysis, or co-author analysis. This would provide additional information and alternative approaches to describe the phenomena studied in this paper.