IoT and Big Data Applications in Smart Cities: Recent Advances, Challenges, and Critical Issues

The notion of smart cities is still evolving, as its global implementation is challenged by numerous technological, economic, and governmental obstacles. Moreover, the synergy of the Internet of Things (IoT) and big data technologies could open promising horizons for smart city development that have not yet been explored. Thus, the current research aims to address the essence of smart cities. To this end, the concept of smart cities is first briefly overviewed; then, their properties and specifications, as well as their generic architecture, composition, and real-world implementations, are addressed. Furthermore, possible challenges and opportunities in the field of smart cities are described. The numerous issues and challenges introduced in this study, such as analytics and the use of big data in smart cities, can support the development of applications of the above-mentioned technologies. Hence, this study paves the way for future research on the issues and challenges of big data applications in smart cities.


I. INTRODUCTION
According to the latest estimates of the United Nations, about 68% of the world's population will live in urban areas by 2050. The report indicated a dramatic rise in the urban population, from 751 million in 1950 to 4.2 billion in 2018 [1]. Bibri and Krogstie [2] argued that urban zones account for 70% of total natural resource consumption, which has led to environmental contamination, ecosystem destruction, and energy shortages. Limited access to resources is a prominent challenge in the development of cities, since cities are expected to reduce costs and unemployment while paying special attention to climate change and potable water supplies. Thus, there is an urgent need to adopt smart methods that help citizens address all of these aspects [3]. In this context, smart cities could be a solution to such problems. The ''smart city'' concept can be employed to improve the environment, economy, mobility, safety, governance, and living standards of residents [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili .
Marsal-Llacuna et al. [5] described a smart city as an urban environment that uses information and communication technologies (ICT), as well as other relevant technologies, to make ordinary city operations more efficient and to improve the quality of services (QoS) received by citizens [6]. The ultimate objective of the first smart cities was to improve the quality of life (QoL) of their citizens by balancing demand and supply across different functions [7]. Regarding QoL demands, modern smart cities place specific emphasis on sustainability and efficient energy management, transportation, health care, and governance to meet the pressing requirements of urbanization [8].
Implementing the smart city concept will drastically increase the amount of data generated. Such a huge amount of data (i.e., big data) will be at the center of the services rendered by the Internet of Things (IoT). The big data phenomenon is characterized by the volume, velocity, and variety of data generated at ever-rising rates [9], [10]. Big data allows a city to gain precise insight from huge amounts of data collected from different sources. Compared with data collected by other methods, such data are mainly unstructured [11].
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Such an enormous amount of unstructured data can be gathered and stored in clouds or data centers using distributed, fault-tolerant databases such as NoSQL (Not Only SQL) databases, which can benefit individual services or applications [12]. Programming models that process large datasets with parallel algorithms can then be used for data analytics to extract value from the stored data. Nonetheless, the application of big data analytics (BDA) in smart environments is still at an early stage, and BDA could be promising for enhancing smart city services [13]. Currently, huge amounts of data are continuously generated by various sources, including smartphones, PCs, sensors, cameras, global positioning systems, social networking platforms, commercial transactions, and games. Given the ever-growing data generation of our digitized era, efficient data storage and processing have raised new issues for conventional data mining and analytics platforms. Big data analytics is capable of extracting valuable information from an ocean of data generated by various sensors.
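One widely used parallel programming model of the kind mentioned above is MapReduce, the model popularized by Hadoop. The following is a minimal, single-machine sketch, assuming hypothetical sensor records of the form (sensor_id, value) already split into partitions as a distributed store might hold them. It shows the map, shuffle, and reduce phases for a per-sensor average; map tasks run in a thread pool only to mimic independent parallel tasks, so this illustrates the model rather than a distributed implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical readings (sensor_id, value), split into partitions
# as they might be stored on different nodes.
PARTITIONS = [
    [("temp-1", 21.0), ("temp-2", 19.5), ("temp-1", 23.0)],
    [("temp-2", 20.5), ("temp-3", 18.0)],
    [("temp-1", 22.0), ("temp-3", 17.0)],
]


def map_phase(partition):
    """Emit (key, (partial_sum, partial_count)) pairs for one partition."""
    return [(sensor, (value, 1)) for sensor, value in partition]


def shuffle(mapped_outputs):
    """Group intermediate pairs by key across all map outputs."""
    groups = defaultdict(list)
    for output in mapped_outputs:
        for key, pair in output:
            groups[key].append(pair)
    return groups


def reduce_phase(key, pairs):
    """Combine partial (sum, count) pairs into an average for one key."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return key, total / count


def mapreduce_average(partitions):
    # Map tasks are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, partitions))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(mapreduce_average(PARTITIONS))
```

Emitting (sum, count) pairs instead of raw averages is the design choice that keeps the reduce step correct regardless of how records are spread over partitions.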
This article should be beneficial to readers with a specific interest in smart cities, as it mainly concentrates on the analysis of big data applications, a major issue in the field of smart cities. The research gaps in big data studies of smart cities are identified across various sectors.
Finally, the open challenges and trends are presented to elucidate the direction of future studies on big data in smart cities. The paper also addresses various uses of big data in smart cities.
The important aims of this work are outlined as follows.
• A brief review of the main characteristics of smart cities (e.g., architecture, challenges, and opportunities) and the function of IoT in smart cities;
• A discussion of the needs and challenges of big data systems in smart cities;
• A survey of big data tools for smart cities in terms of data collection, data pre-processing, and data analysis;
• A summary of potential future research paths.
The rest of this article is organized as follows. Section II presents the state-of-the-art architecture and applications of smart cities and the concept of IoT. The issues and opportunities of smart cities are described in Section III. As big data is the most prominent issue of smart cities, the obstacles to its implementation in the smart city are addressed in Section IV, considering various case studies in this field; unsolved issues and future study topics are also discussed. Section V introduces an innovative big data framework for smart cities, while Section VI highlights future research opportunities. Finally, Section VII presents the conclusions.
Based on our literature review, this study is the first to address the issues of big data in a smart city. In this regard, the present article can pave the way for future studies aimed at developing smart cities based on the concept of big data.

A. SMART CITIES
According to Albino et al. [14], a smart city refers to a set of paradigms that extend over diverse areas: economy, people, governance, mobility, environment, and living. This concept involves various fields, including environmental monitoring, traffic analysis, utility monitoring, public transportation, and incident reporting. Collecting data from these areas helps city authorities improve infrastructure and optimize their resources [15]. A smart city has to include certain key components to centralize the data. These can take various shapes and forms, from a plain webpage to a complex, context-aware mobile application or specialized hardware [16], [17]. Furthermore, data accessibility must be guaranteed so that citizens can freely access the data as well as modify and correct it. With citizens' contributions, broader perspectives on information can be achieved, as it becomes easier to collect more data from citizens [18].
A high-level concept of a smart city and its various functionalities is depicted in Fig. 1. As can be observed from this figure, Unmanned Aerial Vehicles (UAVs) and Internet of Drones (IoD) resources have been employed extensively to deliver the expected services and applications [19].
This rapid urban growth has brought up new infrastructural challenges. City management becomes more complicated as the city expands and its services are extended. Therefore, cities have to evolve to cope with newly emerged social, economic, engineering, and environmental issues. In other words, cities should develop smart features to be able to address these issues properly.

B. ARCHITECTURE OF SMART CITIES
A smart city is intended to improve the smartness of its systems and applications. Some of its requirements and features are listed here [2], [20], [21]:
• A robust and extensive framework offering safe and open access;
• A citizen-oriented architecture;
• A considerable amount of mobile and wearable public and private data that can be stored, found, shared, and tagged, providing citizens with access to information from anywhere when necessary;
• Applications with analytical and integrated features; and finally
• A smart physical and network infrastructure that can transfer large amounts of heterogeneous data and support complex, distributed services and applications [6], [22].
The smart city considered here has a layered configuration comprising four levels, as depicted in Fig. 2.
• The physical components level hosts all devices, sensors, equipment, apparatus, and components of the different smart systems, services, and applications. Data collection at this level plays the most decisive role, since it drives the rest of the operations in a smart city. Moreover, substantial data heterogeneity has increased the challenges of data collection. The major challenges at the physical level can be summarized as:
  ◦ the large number of devices;
  ◦ heterogeneity; and
  ◦ limited processing capability and restricted energy sources.
  Based on Fig. 2, the bottom layer is responsible for sensing and data collection. This layer is composed of wireless sensor networks (WSNs), intelligent components, and data collection tools, and it collects all categories of data from various sensors and devices [23]. It exploits diverse techniques to make data collection more efficient. Its sensor network captures a wide variety of parameters, such as humidity, temperature, pressure, and light. In this regard, various sensing devices (e.g., radio frequency identification (RFID) sensors, global positioning system (GPS) terminals, Bluetooth, actuators, Zigbee, and cameras [24]) have been employed. Maximum network coverage at this layer offers excellent accessibility and smartness, as well as extended data-capturing capabilities [6].
• The communication level: this level encompasses all networking technology options for establishing communication between physical elements and applications. Because it connects data sources to the management level, the communication (transmission) layer is the most prominent component of smart city architectures. This layer brings together diverse communication networks, so that innumerable devices can be linked to a single network, with routing enabled by each device's unique address. The layer includes different types of technologies (i.e., wired, wireless, and satellite). In terms of coverage, the transmission layer can be divided into two sub-layers: the access network and the transmission network. Technologies offering short-range coverage (e.g., near field communication (NFC), Z-Wave, Bluetooth, M2M, Zigbee, and RFID) are considered part of the access network, whereas those providing broader coverage (such as low-power wide-area networks (LPWAN), 5G, 4G long-term evolution (LTE), and 3G) are classified in the transmission class.
• The Management Level: the data management layer can be regarded as the brain of the smart city, as it sits between the acquisition and application layers. The management layer performs a wide variety of data manipulation, organization, analysis, storage, and decision-making actions. An efficient data management layer is a prerequisite for a maintainable smart city, as the city's functions depend on data management. The primary function of the data layer is to preserve the integrity of data by focusing on data cleaning, development, integration, and protection. This layer can be divided into several subunits, including data integration, processing, analysis, storage, and, finally, decision-making supervision [25].
• The application layer: as the top layer of a smart city, the application layer mediates between the citizens and the data management layer. Because it interacts directly with citizens, its performance strongly influences users' perception of, and satisfaction with, the smart city. It includes diverse applications used by citizens, municipal staff, and administrators, which are connected to different departments of the smart city. This layer encompasses all of the smart systems and developments. These systems are fed with, and process, the large volumes of data delivered through the communication level. The smart applications are responsible for implementing decisions received from the data management layer. Each level exhibits a series of features and requirements. In this regard, various innovative computing paradigms have been utilized and explored.
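To make the four-level organization more concrete, here is a minimal sketch in which all names, readings, the valid range, and the decision threshold are hypothetical assumptions, not values from the cited works. Physical-level readings pass through the communication level (where each link technology is tagged as access or network transmission, following the grouping described above), are cleaned and aggregated at the management level, and drive a simple decision at the application level.

```python
from dataclasses import dataclass
from statistics import mean

# Grouping of link technologies into the two transmission sub-layers.
ACCESS = {"NFC", "Z-Wave", "Bluetooth", "Zigbee", "RFID"}
NETWORK = {"LPWAN", "5G", "4G LTE", "3G"}


@dataclass
class Reading:
    sensor_id: str
    kind: str     # e.g. "temperature"
    value: float
    link: str     # technology used to send the reading


def physical_level():
    """Hypothetical readings collected at the bottom (sensing) layer."""
    return [
        Reading("t1", "temperature", 24.5, "Zigbee"),
        Reading("t2", "temperature", 26.0, "LPWAN"),
        Reading("t3", "temperature", 999.0, "5G"),  # faulty sensor value
    ]


def communication_level(readings):
    """Tag each reading with the sub-layer of its link technology."""
    tagged = []
    for r in readings:
        if r.link in ACCESS:
            sublayer = "access"
        elif r.link in NETWORK:
            sublayer = "network transmission"
        else:
            sublayer = "unknown"
        tagged.append((sublayer, r))
    return tagged


def management_level(tagged, valid_range=(-40.0, 60.0)):
    """Drop out-of-range values (assumed range) and average per kind."""
    low, high = valid_range
    by_kind = {}
    for _, r in tagged:
        if low <= r.value <= high:
            by_kind.setdefault(r.kind, []).append(r.value)
    return {kind: mean(values) for kind, values in by_kind.items()}


def application_level(summary, threshold=25.0):
    """Turn the management-level summary into an action (assumed threshold)."""
    temperature = summary.get("temperature")
    if temperature is None:
        return "no data"
    return "increase cooling" if temperature > threshold else "no action"


if __name__ == "__main__":
    summary = management_level(communication_level(physical_level()))
    print(summary, application_level(summary))
```

Keeping each level as a separate function mirrors the layered view: the application level never touches raw readings, only the summary produced by the management level.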
As big data offers the possibility of solving smart city challenges, the relevant challenges should be mentioned first. A limited number of studies have dealt with these issues, and few of them have consolidated the existing works into a framework offering an efficient approach to managing big data in a smart city [26]. Here, we present recommended effective methods for coping with big data issues in a systematic review. Considering the current importance of this topic, the present article can offer major insights for practitioners and researchers in this field [27].

C. IoT IN SMART CITIES
The emergence of the Internet of Things (IoT) brought the abstract concept of the smart city closer to reality. Smart cities are intended to provide their citizens with superior healthcare, safety, convenience, and wisdom; in other words, they aim to improve the quality of life. IoT can be regarded as one of the most effective technologies driving significant change in the world [28], [29]. IoT extends conventional networks to connect a myriad of interconnected devices. The concept of IoT has been further reinforced by advanced technologies such as wireless sensor networks (WSNs) and machine-to-machine (M2M) communication [6].
The resources of a smart city are managed and controlled by intelligent information systems that must consider the food, energy, and water nexus. Optimizing supply and using resources efficiently require the development of IoT-based systems that address these issues and handle the big data originating from such systems [26]. Considerable efforts have been devoted to developing smart houses [30], smart transportation [31], traffic management [32], waste disposal systems [33], energy management systems [34], and healthcare [35], as well as various other facilities that together contribute to the establishment of a smart city. Fig. 3 illustrates the idea of IoT, one of the most effective approaches for enhancing quality of life and social welfare while taking human and environmental issues into account.
As a newly emerged concept, IoT uses mobile devices, public services, transportation resources, and home appliances as its data collection tools. In such an environment, everyday electronic devices (e.g., emergency alarms, wristwatches, garage doors, and vending machines) and household appliances such as freezers, water heaters, washing machines, microwave ovens, refrigerators, dishwashers, stoves, and air conditioners are connected to an IoT network and can be controlled remotely.
Citizens are continuously and increasingly demanding novel services that support a high-quality lifestyle. These opportunities have led cities to evolve; however, numerous challenges could affect the daily lives of citizens [8]. As the heart of this evolution, technology has remarkably altered our world and lives. A digitalized world of interconnected objects, people, and devices has drastically influenced our jobs, travel, societies, and interactions with our environment, profoundly affecting fields such as urban systems, environmental monitoring, management platforms, and healthcare [36].
The smart city has gained popularity over other urban development models (information city, digital city, and telecity), as it encompasses the concepts of all of these approaches [37]. A smart city is an implementation of the IoT [38] and thus inherits its underlying operational mechanisms. As shown in Fig. 4, IoT offers vital building blocks of a smart city.

III. SMART CITIES OPPORTUNITIES AND CHALLENGES
Despite the extensive application of the smart city concept, its further development requires resolving the remaining challenges. In this section, the issues and potential opportunities for the real-world application of a smart city are discussed. The challenges are identified through an extensive review of recent studies on smart cities, while the opportunities are presented by surveying works and experiences in this area.

A. OPPORTUNITIES FOR SMART CITIES
The concept of a smart city is still developing, and experiments and implementations are restricted to developed countries. Nonetheless, the promising advantages of smart cities can be applied to any urban region, as shown in Fig. 5.
Thus, more research works on cost-effective design and implementation can enhance the popularity of smart cities around the globe. The incorporation of renewable energy resources is a promising method to guarantee sustainable city functions and successful management of scarce non-renewable energy sources.
Smart devices are producing an immense amount of data that requires enormous data storage facilities. With the generation of Big Data, traditional data processing approaches are no longer adequate for the architecture of modern smart cities. Thus, Big Data analytics should be integrated into the smart city environment. This issue has been addressed in a limited number of studies; however, most of these works remain proposals rather than practical implementations. In this context, the development and application of Big Data analytics [39] in real smart cities could be an interesting research topic. The secure processing of sensitive data is an urgent need in connected environments. Doubts about the security of sensitive data will discourage citizens from using the ICT services of smart cities, hindering the robustness of city operations. Therefore, introducing global security measures for smart cities is a high priority requiring deeper investigation. Device heterogeneity is another interesting research field. A smart city integrates diverse subsystems in its application layer to provide prompt and consistent services. The issue of aggregation at the application layer is another potential research area [6].

B. CHALLENGES FOR SMART CITIES
The realization of a smart city will face diverse issues during the design, deployment, construction, and operation steps. Design and operational costs, device heterogeneity, the collection and analysis of huge amounts of data, data security, and sustainability are just a few of the alarming challenges. Fig. 6 shows some of the major issues in the design of a smart city. Approaches proposed to resolve these challenges are discussed in the following. Most cities are challenged by urban waste collection and continuously try to develop more efficient waste management systems. Therefore, a smart waste management system is of crucial significance in a smart city [40].
The ability of smart cities to resolve environmental problems (particularly waste management) is a prominent question that should be explored academically [41]. Transforming a smart city into a robust zero-waste one requires four interrelated major approaches: waste prevention, upstream waste separation, timely waste collection, and suitable recycling of collected waste. The objective is to design and develop an IoT-enabled waste management system that emphasizes associating waste management measures across the whole product life cycle.
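Timely waste collection in an IoT-enabled system is often driven by fill-level sensors in bins. The sketch below is an illustrative assumption rather than the design of [40] or [41]: hypothetical bins report a fill ratio, and bins at or above an assumed threshold are scheduled for collection, fullest first.

```python
from typing import List, NamedTuple


class Bin(NamedTuple):
    bin_id: str
    fill_ratio: float  # 0.0 (empty) to 1.0 (full), from a fill-level sensor


def bins_to_collect(bins: List[Bin], threshold: float = 0.8) -> List[str]:
    """Return IDs of bins at or above the (assumed) threshold, fullest first."""
    due = [b for b in bins if b.fill_ratio >= threshold]
    due.sort(key=lambda b: b.fill_ratio, reverse=True)
    return [b.bin_id for b in due]


if __name__ == "__main__":
    readings = [Bin("A", 0.35), Bin("B", 0.92), Bin("C", 0.81), Bin("D", 0.79)]
    print(bins_to_collect(readings))
```

A real system would also plan collection routes and predict fill rates; the threshold rule only captures the "collect when needed rather than on a fixed schedule" idea.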
Heterogeneity [42] is another major issue in the development of a smart city. Smart cities encompass multi-vendor and multi-functional sensors, tools, and appliances. Implementing the smart city concept requires integrating these heterogeneous items at the application layer. However, platform incompatibility due to heterogeneity hinders integration and interoperation at the application layer. Despite the challenges of facilitating universal access, smart cities focus on designing, identifying, and purchasing hardware and software that enable the integration of such heterogeneous subsystems.
Design and maintenance costs [43] are a key challenge in realizing the notion of a smart city. Design cost can be defined as the financial capital spent to deploy a smart city; thus, the lower the design cost, the higher the probability of its realization. Operational expenses are related to regular procedures and support. Minimal operational costs guarantee the sustainability of service provision with no extra burden on the municipality. Nonetheless, optimizing costs over the lifespan of a smart city has remained an unsolved challenge.
Smart technologies could offer interesting solutions for various urban challenges, including transportation, waste, and environmental management. Security and crime prevention, however, are often left unaddressed. Furthermore, when researchers propose a new smart security technology [44], they seldom address its implementation or its possible impacts on conventional policing and the urban planning process [45].
Failure management is also a major aspect of any smart city. Failures are to be expected after natural disasters and system collapses (e.g., infrastructure breakdown and network inaccessibility). Sustainable design offers prompt recovery strategies to cope with failures and retain normal urban functionality. However, identifying and applying such recovery strategies increases design and operational costs. The objective is to implement failure recovery measures at the lowest cost and with the highest efficiency.
With rising greenhouse gas emissions and waste accumulating in seas, forests, and cities, smart cities are in a critical situation to reduce environmental damage [46]. Energy-efficient structures such as buildings, air quality monitoring, and renewable energy sources can help cities reduce their adverse ecological effects. Protecting the environment and resources for future generations by lowering carbon emissions [47] and using resources efficiently is among the main issues of modern smart cities, with a decisive role in policies intended to mitigate the direct and indirect consequences of the city economy on sustainability. In this regard, modern cities are concentrating on renewable energy supplies to reduce carbon emissions and ensure their sustainability.
Air quality monitoring throughout the city [48] can be used to identify periods of low air quality, detect pollution sources, and help data analysts develop preventive measures. Air quality sensors could contribute to establishing a foundation for reducing air pollution even in highly populated cities, which could save many lives, as pollution-related diseases threaten millions of lives every year.
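Identifying periods of low air quality from a sensor stream can be as simple as smoothing and thresholding. The following sketch assumes hourly PM2.5 readings in µg/m³, a 3-hour trailing moving average, and a threshold of 35 µg/m³; the window, the threshold, and the readings are illustrative assumptions, not recommendations from [48].

```python
from collections import deque


def low_air_quality_hours(pm25, window=3, threshold=35.0):
    """Return hour indices whose trailing moving average exceeds threshold.

    pm25: hourly PM2.5 readings (µg/m³), oldest first.
    Only hours with a full window of readings are evaluated.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    flagged = []
    for hour, value in enumerate(pm25):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            flagged.append(hour)
    return flagged


if __name__ == "__main__":
    readings = [12, 18, 30, 41, 52, 47, 28, 15]
    print(low_air_quality_hours(readings))
```

Smoothing before thresholding avoids flagging isolated spikes caused by sensor noise, at the cost of reacting a few hours later to a real pollution episode.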
A smart city connects the physical, social, information technology, and business infrastructures to leverage the collective intelligence of the city. The connectivity issue [49] involves three main stakeholders: operators, who should offer capable coverage solutions for various sites throughout the city within reasonable limits; venue owners, who need adequate coverage capacity to ensure that they do not lose business due to poor mobile connectivity; and, last but not least, local governments, which have to coordinate these groups to guarantee the coverage needed for developing sustainable and competitive smart city infrastructures.
Urban spaces may seriously challenge coverage. First, the materials used in constructing densely populated smart cities are often reflective, thus hindering radio frequency (RF) propagation. Basements and parking lots are another example, since their reinforced concrete structures can block RF signals. Some buildings may support RF propagation, but because they are located in dense urban regions, neighboring buildings can act as shields, leading to poor coverage.
In this context, an efficient, inexpensive, and scalable solution capable of supporting M2M and IoT use cases, smart city scenarios, and mobile users is highly required. Moreover, networks have to offer robust and secure public communications for emergency services to further promote the safety of users in such 'smart' environments.
Because of the large amounts of data they generate, smart devices require large data storage facilities. In this context, the generation of Big Data has rendered conventional data processing approaches inadequate. Therefore, examining Big Data analytical systems and integrating them into smart cities is crucial. A limited number of studies have addressed this issue, and most of them have remained proposals that have not been implemented in real-world situations. Thus, the development and testing of Big Data analytical systems in real smart cities could be an interesting research topic. The security of sensitive data is an urgent requirement in interconnected environments.
Several newly emerged technologies, such as IoT, radio frequency identification (RFID), and upcoming Internet technologies, can greatly assist the development of smarter cities [50]-[53]. The growing presence of these technologies has led to the generation of a huge amount of data [54]. As reported in [55], about 2.5 quintillion bytes of data are created daily, and 90% of the world's data had been generated in the preceding two years. With proper management and analysis, such a huge amount of data (i.e., big data) could drastically influence operations in smart cities [39]. Valuable information from big data analytical systems can rapidly transform cities into artificial ecosystems of mutually dependent, interrelated, and smart digital entities [56], [57].

IV. BIG DATA
A. APPLICATIONS OF BIG DATA IN THE SMART CITY
Thanks to the implementation of big data technology in smart cities, data can be efficiently stored and processed into knowledge (i.e., information) in order to develop smart city services. Furthermore, big data can help expand services and resources. In this regard, advanced tools and approaches can enable efficient and effective data analysis. Such tools and approaches can promote cooperation and communication between entities offering services to various sections of a smart city, in addition to improving customer satisfaction and business opportunities. Table 1 summarizes common big data uses in a smart city.

B. RECENT ADVANCES ON BIG DATA
Ongoing studies in this area have emphasized the significance of big data in the development of smart cities. Moreover, some of these works have addressed the challenges of adopting big data in the notion of the smart city [70].
Q. Zhang et al. [71] proposed a processing model called Firework, a novel computing paradigm that enables distributed data processing and sharing in an IoT-based collaborative edge environment. Firework combines distributed data by providing virtual data views to end users through predefined interfaces, which are represented by a set of functions and datasets. Firework aims to minimize data access latency by moving processing closer to the data producers in edge networks. Firework has several participants, who register their datasets and the corresponding functions as data views. Any stakeholder can access these data views and combine several of them into a new one to conduct in-depth systematic data analysis.
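The data-view idea described for Firework can be illustrated with a toy registry. This sketch is hypothetical and does not reproduce Firework's actual interfaces or its edge placement of computation: each participant registers a dataset together with a function as a named view, and a stakeholder composes existing views into a new one. The class name ViewRegistry, the datasets, and the view names are all invented for the example.

```python
from typing import Any, Callable, Dict, Iterable, List


class ViewRegistry:
    """Toy registry of data views: each view is a function applied to a dataset."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, dataset: Iterable, func: Callable) -> None:
        """A participant publishes func(dataset) as a named data view."""
        self._views[name] = lambda: func(dataset)

    def query(self, name: str) -> Any:
        """Evaluate a view on demand (views are computed lazily)."""
        return self._views[name]()

    def compose(self, name: str, sources: List[str], combine: Callable) -> None:
        """Define a new view that combines the outputs of existing views."""
        self._views[name] = lambda: combine(*(self.query(s) for s in sources))


def build_demo_registry() -> ViewRegistry:
    # Hypothetical datasets owned by two different participants.
    traffic = [{"road": "R1", "speed": 12}, {"road": "R2", "speed": 48}]
    air = [{"road": "R1", "pm25": 44}, {"road": "R2", "pm25": 16}]

    registry = ViewRegistry()
    # Each participant exposes only a filtered view of its own data.
    registry.register("congested", traffic,
                      lambda rows: [r for r in rows if r["speed"] < 20])
    registry.register("polluted", air,
                      lambda rows: [r for r in rows if r["pm25"] > 35])
    # A stakeholder combines both views: roads that are congested and polluted.
    registry.compose(
        "hotspots", ["congested", "polluted"],
        lambda a, b: sorted({r["road"] for r in a} & {r["road"] for r in b}),
    )
    return registry


if __name__ == "__main__":
    print(build_demo_registry().query("hotspots"))
```

Exposing functions over data rather than raw datasets is what lets each owner control what is shared, which is the motivation the text gives for data views.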
B. Ahlgren and colleagues [72] addressed the importance of IoT in delivering services that improve citizens' lives (e.g., transportation and energy efficiency). They emphasized that IoT-based systems have to rely on open data and standards (i.e., interfaces and protocols) to promote third-party innovation and avoid vendor lock-in. Accordingly, they planned and established the GreenIoT platform in Sweden to determine the benefits of open platforms and open data in developing smart cities. Guidelines should also be designed for the procurement of user-friendly, open IoT systems (such as common data formats and open application programming interfaces (APIs)).
B. Cheng and coworkers [73] developed an edge analytics platform (GeeLytics) capable of real-time data processing both at the network edges and in the cloud. This approach addresses the need for geo-distributed, low-latency analytics over the huge amount of IoT data. GeeLytics aims to support dynamic stream processing strategies with respect to the features of heterogeneous edge and cloud nodes as well as the capacity of the system.
E. Ahmed and M. Rehmani [74] analyzed human behaviors using big data and analytical frameworks within a social IoT model. The authors proposed an approach comprising three operational areas and also explored the ecosystem of smart cities and big data. They concluded that collaborative filtering schemes could help in accurately analyzing human behavior.
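Collaborative filtering predicts one user's behavior from the behavior of similar users. The minimal user-based sketch below uses cosine similarity over a hypothetical table of how often citizens use city services. The data, the service names, and the choice of cosine similarity are assumptions for illustration only, not the method of [74].

```python
import math

# Hypothetical usage counts: citizen -> {service: times used per week}.
# "u1" has no recorded bike usage, so that value is to be predicted.
USAGE = {
    "u1": {"bus": 5, "parking": 1},
    "u2": {"bus": 4, "bike": 1, "parking": 0},
    "u3": {"bus": 0, "bike": 0, "parking": 6},
}
SERVICES = ["bus", "bike", "parking"]


def cosine(a, b):
    """Cosine similarity of two usage profiles (missing entries count as 0)."""
    va = [a.get(s, 0) for s in SERVICES]
    vb = [b.get(s, 0) for s in SERVICES]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def predict(user, service, usage=USAGE):
    """Similarity-weighted average of other users' usage of `service`."""
    num = den = 0.0
    for other, profile in usage.items():
        if other == user or service not in profile:
            continue  # skip the target user and users with no observation
        sim = cosine(usage[user], profile)
        num += sim * profile[service]
        den += abs(sim)
    return num / den if den else 0.0


if __name__ == "__main__":
    print(f"predicted weekly bike use for u1: {predict('u1', 'bike'):.2f}")
```

Because u1's profile is much closer to u2 (a frequent bus user) than to u3 (a frequent driver), the prediction is pulled toward u2's bike usage.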
H. R. Arkian and colleagues [75] introduced a fog-based data analytics methodology with a cost-effective resource provisioning technique for IoT crowdsensing applications. This scheme, known as MIST, aims to reduce the service latency of conventional cloud computing. According to the empirical results, the fog-based MIST method outperformed conventional cloud computing by supporting a larger number of real-time applications. This line of research can be further expanded through the following approaches: (a) adding optional sensing modules to the fog layer, (b) reinforcing the architecture with privacy-preserving data analytics, and (c) taking the mobility of data producers and consumers into account in the resource provisioning section.
M. M. Rathore and coworkers [76] proposed a system capable of resolving several issues of a smart city environment, such as enabling objects to react to their context, minimizing data collection costs, and gaining insight from data collected and processed in real time. Their system had a four-tier configuration, in which the bottom tier was responsible for collecting data. Intermediate tier 1 enabled communication between base stations, sensors, the Internet, and relays, while intermediate tier 2 was in charge of data management and processing using the Hadoop framework. Finally, the top tier applied data analysis approaches to generate results. According to their findings, the system exhibited higher scalability and efficiency in terms of throughput and processing time than existing systems. However, their system lacked an intelligent decision-making approach for handling big data in an IoT environment.
F. Alam et al. [77] applied a number of data mining algorithms to IoT data, namely a deep learning artificial neural network (DLANN), a previously developed ANN, support vector machine (SVM), Bayesian, K-nearest neighbors (KNN), linear discriminant analysis (LDA), and decision tree algorithms (C4.5 and C5.0). The execution time, classification robustness, and confusion matrices of these algorithms were compared. In terms of classification robustness, DLANN, ANN, C4.5, and C5.0 outperformed SVM, Bayesian, KNN, and LDA, while C4.5, C5.0, and ANN achieved similar classification precision. Bayesian and LDA had the best execution times, with LDA slightly faster than Bayesian. The authors planned to examine larger IoT datasets in more detail. Overall, neural networks provided the best performance among all the algorithms, owing to their ability to achieve the highest accuracy [78], [79].
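The evaluation criteria used in [77] (execution time, classification performance, and confusion matrices) can be illustrated with a small standard-library sketch for one of the listed algorithms, k-nearest neighbors. The toy readings, labels, and k = 3 are hypothetical and do not reproduce the experimental setup of [77].

```python
import math
import time
from collections import Counter

# Hypothetical labelled IoT readings: (temperature, humidity) -> state
train = [((20.0, 30.0), "normal"), ((21.0, 35.0), "normal"),
         ((22.0, 33.0), "normal"), ((35.0, 80.0), "alarm"),
         ((36.0, 85.0), "alarm"), ((34.0, 78.0), "alarm")]
test = [((21.5, 31.0), "normal"), ((35.5, 82.0), "alarm"),
        ((30.0, 60.0), "alarm")]

def knn_predict(x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples of true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

start = time.perf_counter()
predictions = [knn_predict(x) for x, _ in test]
elapsed = time.perf_counter() - start          # execution time of the prediction step

truth = [y for _, y in test]
cm = confusion_matrix(truth, predictions, ["normal", "alarm"])
accuracy = sum(cm[i][i] for i in range(len(cm))) / len(test)
print(cm, accuracy, f"{elapsed:.6f}s")
```

Running the same harness with other classifiers swapped in for `knn_predict` gives the kind of side-by-side comparison described above.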
D. Mourtzis et al. [80] showed that adopting IoT in manufacturing can modernize outdated mechanisms and greatly increase data production, turning industrial data into industrial big data. Such big data are useless without analytics capability; implementing data analytics enables companies to adopt novel data-driven approaches to cope with competitive pressure. They also illustrated a simple implementation of the IoT paradigm in a company with about 100 machines.
Alotaibi and colleagues [81] reviewed related concepts such as big data and big data analytics systems, as well as the role of big data in healthcare and healthcare supply chain management (SCM). They also examined the influence of Twitter data on SCM, addressed the opportunities and challenges of big data-enabled healthcare supply chains, and recommended topics for future study. They concluded that big data holds substantial promise for healthcare supply chains, which warrants deeper investigation.
Das and Griffin [82] systematically reviewed sources of bias and strategies for resolving them by surveying published papers and interviewing experts. They analyzed topic frequencies, and the reliability of the coded concepts was evaluated by two independent trained coders. Trends in the unstructured textual content were explored with a text mining pipeline that identified patterns and biases. The findings suggest keeping transportation experts and the public central to determining suitable objectives and metrics for assessing transportation safety, developing novel approaches to relate big data to the total population, solving difficult problems with the help of big data, and progressing toward new trends and techniques.
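Agreement between two independent coders is commonly quantified with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The source does not state which reliability statistic [82] used, so the sketch below is illustrative only, with hypothetical topic codes.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected if each coder assigned categories independently
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical category for every item
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical topic codes assigned to six abstracts by two coders
a = ["bias", "bias", "method", "data", "method", "bias"]
b = ["bias", "data", "method", "data", "method", "bias"]
print(round(cohens_kappa(a, b), 3))
```

Here the coders agree on five of six items (p_o ≈ 0.833), but because chance alone would produce p_e ≈ 0.333, kappa is lower than raw agreement.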
N. Abdul Ghani et al. [83] reviewed recent projects to broaden insight into big social media analytics. The authors classified the literature according to several prominent features and compared potential big data analytics technologies. They further discussed the applicability of big social media analytics systems in light of the surveyed state of the art and described unsolved research challenges.
C. Lim and coworkers presented the results of diverse cases of big data application in cities, together with their own projects carried out with government organizations to develop smart cities. In particular, their review classified the use of urban data into four reference models and identified six major issues in transforming data into knowledge. Drawing on related studies, they also proposed five considerations for coping with issues that arise when the reference models are implemented in real-world cases. Together, the reference models and considerations form a framework for applying data in smart cities. The article contributes to urban planning and policy for a modern data-rich economy [84].
Abaker Targio Hashem et al. [85] described the latest communication techniques and smart applications that can be used in smart cities. They addressed the role of big data analytics in supporting smart cities, emphasizing the fundamental role of big data in transforming urban populations at various levels, and proposed a business model for a smart city to identify business and technological issues.
The challenges of big data in smart cities fall into two groups. The first group consists of challenges inherent to big data itself, such as Volume, Velocity, Veracity, and Value, which big data brings into any domain it enters; this study refers to them as Big Data Challenges in Smart Cities. The second group arises when big data and related technologies are implemented in smart cities, for instance Variety, Variability, Validity, and Vulnerability; this study categorizes them as Big Data Implementation Challenges in Smart Cities. The following sections describe these two categories of challenges in detail.

1) INTEGRATION CHALLENGES OF BIG DATA IN SMART CITIES
Data generated by many different sources give rise to what is known as big data. Such data sources surround us, as depicted in Fig. 7. Applications such as social media, digital pictures and videos, commercial transactions, advertisements, and games have recently accelerated data generation. Big data (BD) is a natural consequence of the proliferation of digital artifacts and applications: modern digital technologies such as sensors, mobile devices, and social media have penetrated our lives, producing an unprecedented amount of data. "Big Data" can be defined as an ever-growing dataset that is difficult to manage with conventional relational database management systems. BD is often characterized by ten Vs: Volume, Velocity, Veracity, Value, Variety, Variability, Validity, Vulnerability, Volatility, and Visualization [86]–[88], as illustrated in Fig. 8.
• The first "V" refers to the exponentially growing volume of intelligent transportation systems (ITS) data. The volume of BD far exceeds that of conventional operational databases or data warehouses, which mostly reach the order of gigabytes (GB) or terabytes (TB). Such huge amounts of BD are quantified in newer units such as the petabyte (10^15 bytes) and the exabyte (10^18 bytes).
• The second "V" stands for the velocity of ITS data, which varies widely. Data may be generated continuously and collected in real time or at regular intervals. The velocity of BD indicates the rate at which data stream into the host platforms. Besides the high incoming data rate, velocity raises important questions about data aging, for instance how long the data remain valid. Sometimes streaming data have to be analyzed in real time; for example, real-time analysis of video streams from traffic surveillance cameras is crucial for predicting traffic jams and preventing bottlenecks.
• The third "V" refers to veracity, that is, the trustworthiness of ITS data. Collecting transportation data reliably and in a timely manner has been one of the serious challenges in the ITS community.
• The fourth "V" stands for the value of ITS data, which depends heavily on the data's age, sampling frequency, and purpose. For example, in a collision-prevention application, data only a few minutes old may no longer be valuable, whereas route planning applications can make use of non-real-time data. Value can also be regarded as a measure of the ability to extract meaningful information and actionable business insight [89].
• As the fifth "V", variety denotes the complexity of BD formats. Besides conventional structured data, BD often includes semi-structured data (such as text data and image streams) and unstructured data (such as geospatial data streams), accounting for about 20% and 80% of the data, respectively. It is believed that the term "big" will lose its meaning over time and that the self-evident concept of "data" will be extended to encompass all data types.
• As the sixth "V" of ITS data, variability covers several aspects. First, it denotes inconsistency in the data, which must be detected by anomaly and outlier detection methods before any meaningful analytics is possible. Second, it can refer to the multitude of data dimensions resulting from different data types and sources. Third, it can indicate inconsistency in the speed at which big data are loaded into databases.
• The seventh "V" of ITS data refers to validity. Like veracity, validity concerns whether the data are reliable for the intended application. As reported by Forbes, scientists spend about 60% of their time cleaning their data before any analysis. Big data analytics systems are only as beneficial as their underlying data; thus, proper data management practices should be adopted to ensure consistent data quality, common definitions, and metadata.
• The eighth "V" of ITS data is vulnerability. Big data raise new security issues: a breach of big data is, in effect, a big breach.
• The ninth "V" of ITS data refers to volatility. Before the emergence of big data, organizations tended to store their data indefinitely, since a few terabytes did not incur high storage costs and could even be kept in the live database without causing performance issues. Classical data settings may not even have had a data archiving policy. However, the high velocity and volume of big data require careful consideration of volatility: rules should be established for data currency and availability, as well as for ensuring rapid information retrieval when necessary.
• Finally, as the tenth "V" of ITS data, visualization refers to the challenge of presenting big data visually. Current big data visualization tools face several challenges stemming from the limitations of in-memory technology and from poor scalability, functionality, and response time. Conventional graphs cannot meaningfully plot a billion data points, which calls for other forms of data representation, such as clustering the data or using tree maps, sunbursts, parallel coordinates, circular network diagrams, or cone trees. Combined with the multitude of variables arising from the variety and velocity of big data and the complex relationships among them, this makes developing a meaningful visualization a difficult task.
55474 VOLUME 9, 2021
FIGURE 9. The relationship between sensing (as a service paradigm) and smart cities and big data.
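Several of these Vs interact in stream processing: velocity means readings arrive continuously, value and volatility mean old readings stop being useful, and variability calls for outlier detection. The sketch below is a minimal time-windowed monitor that expires stale readings and flags values far from the window mean using a z-score. The 60-second window, threshold of 3, minimum sample count, and traffic-speed data are hypothetical choices, not values from the source.

```python
import math
from collections import deque
from typing import Optional

class WindowedMonitor:
    """Keeps readings newer than `window_s` seconds and flags z-score outliers.

    Timestamps are assumed to arrive in non-decreasing order.
    """

    def __init__(self, window_s: float = 60.0, z_threshold: float = 3.0, min_samples: int = 5):
        self.window_s = window_s          # hypothetical validity period of a reading
        self.z_threshold = z_threshold    # hypothetical outlier cut-off
        self.min_samples = min_samples    # readings needed before outliers are judged
        self.readings = deque()           # (timestamp, value) pairs, oldest first

    def _expire(self, now: float) -> None:
        # Data aging: drop readings older than the window.
        while self.readings and now - self.readings[0][0] > self.window_s:
            self.readings.popleft()

    def add(self, timestamp: float, value: float) -> bool:
        """Add a reading; return True if it is an outlier relative to the window."""
        self._expire(timestamp)
        is_outlier = False
        if len(self.readings) >= self.min_samples:
            values = [v for _, v in self.readings]
            mu = sum(values) / len(values)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
            is_outlier = sigma > 0 and abs(value - mu) / sigma > self.z_threshold
        if not is_outlier:
            self.readings.append((timestamp, value))  # outliers stay out of the baseline
        return is_outlier

    def mean(self, now: float) -> Optional[float]:
        """Mean of the readings still valid at time `now`, or None if none remain."""
        self._expire(now)
        if not self.readings:
            return None
        return sum(v for _, v in self.readings) / len(self.readings)

# Usage with hypothetical traffic speeds (km/h), one reading per second
monitor = WindowedMonitor(window_s=60.0)
for t, speed in enumerate([50, 52, 49, 51, 50, 48]):
    monitor.add(float(t), float(speed))
print(monitor.add(6.0, 5.0))   # a sudden drop far outside the recent spread
print(monitor.mean(200.0))     # every reading is older than 60 s at t = 200
```

Keeping flagged outliers out of the window prevents a burst of faulty readings from shifting the baseline, at the cost of also excluding genuine sudden changes until they persist long enough to be re-examined.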
Because these data comprise huge amounts of fast-moving, dynamic, and heterogeneous information, they cannot be collected, stored, processed, and analyzed with conventional approaches.
Despite this, understanding of how data are used in smart cities remains limited. Some works have addressed the use of big data in smart cities to identify challenges in specific applications, including transportation, public safety, and sustainability [90]–[95]. However, few studies have focused on general knowledge about employing big data in smart cities regardless of the application area [96].
As noted above, BD is a product of technological advances, not of applications or users' requirements. Smart cities (SCs) [96], on the other hand, are intended to resolve the problems of modern cities. The migration of rural populations and the concentration of suburban populations in cities have posed serious challenges to citizens as well as city authorities, including waste management, energy, education, transportation, health, water resources, crime, and unemployment [35]. SCs have been developed to address these challenges through the use of ICT. By definition, an SC should possess six smart characteristics: smart economy, people, governance, mobility, environment, and living [59]. As Fig. 9 shows, SCs and BD, despite their different origins, converge toward a common objective, and sensing as a service can serve as a bridge between them, with numerous academic and industrial examples.

2) IMPLEMENTATION CHALLENGES OF BIG DATA IN SMART CITIES
Designing, developing, and operating big data systems in smart cities can be highly challenging. Since a smart city is a highly dynamic and evolving environment, these challenges should be eliminated or at least minimized [97], [98].
As mentioned before, the types and amounts of data have increased immensely in modern cities, offering versatile opportunities for establishing data-based smart cities. As with other large-scale transformations, moving toward a data-based smart city is not a simple task, so the issues that arise from using urban big data in smart city projects should be identified. This section is therefore devoted to the problems that emerge when data are transformed into information in a smart city [84], [99]. Fig. 10 shows the stages from data collection to knowledge discovery in a smart city, as well as the six major issues.
• Quality management: Adequate quality of urban data is a prerequisite for deriving reliable smart city information.
• Integration of diverse data: This process involves integrating data from various sources. As Fig. 10 shows, different data can be gathered from various sources in a modern city. The key point is connecting these diverse data to provide citizens and city authorities with reliable knowledge and high-quality information. Since different organizations use different data structures, this task can be highly demanding.
• Consideration of privacy issues: Privacy protection is essential for data-based smart cities to produce valid and sustainable value.
• Consideration of the demands of citizens, visitors, and employees: Determining which information is useful plays an essential role in developing a data-based smart city, as the usefulness of the delivered information is directly associated with customers' satisfaction with the service.
• Improving the geographic information delivery method: Much urban big data analysis aims to deliver the identified information with reference to a geographic element (e.g., a district or building). In developing a data-based smart city, the knowledge content should be visualized clearly to increase its acceptance by the audience.
• Design of city services: This aspect involves designing smart city services that deliver knowledge derived from urban big data, which offers diverse information [84], [100]. The proper design of data-based smart city services is therefore vital, as it integrates the results of data analysis, idea formation, and knowledge content design.
Fig. 10 illustrates big data management in a smart city, indicating the issues that arise when implementing big data and the potential solutions to resolve them. Decisions informed by these solutions can improve operations in a smart city.
Recorded data sometimes lack essential structure, for example when they contain missing values; such data can still be useful after effective and timely processing [101]. Table 2 lists some of the major challenges of implementing big data in a smart city.
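Two of the issues above, integrating diverse data and managing quality, often come down in practice to mapping differently structured records onto one schema and flagging gaps such as missing values. The sketch below assumes two hypothetical sources, a transport feed of dictionaries with speed in km/h and an environmental CSV with temperature in Fahrenheit; the field names, units, and schema are illustrative and are not a method from [84] or [101].

```python
import csv
import io

# Hypothetical source 1: dictionaries with speed in km/h
transport_feed = [
    {"sensorId": "T1", "ts": "2021-04-01T08:00", "speed_kmh": 42.0},
    {"sensorId": "T2", "ts": "2021-04-01T08:00", "speed_kmh": None},
]
# Hypothetical source 2: CSV text with temperature in degrees Fahrenheit
env_csv = "station,time,temp_f\nE1,2021-04-01T08:00,68\nE2,2021-04-01T08:00,\n"

def from_transport(rec):
    """Map a transport record onto the common schema."""
    return {"source": "transport", "id": rec["sensorId"], "time": rec["ts"],
            "metric": "speed_kmh", "value": rec["speed_kmh"]}

def from_env(row):
    """Map an environmental CSV row onto the common schema, converting to Celsius."""
    raw = row["temp_f"].strip()
    value = round((float(raw) - 32) * 5 / 9, 2) if raw else None  # empty cell -> None
    return {"source": "environment", "id": row["station"], "time": row["time"],
            "metric": "temp_c", "value": value}

def integrate():
    """Return (complete records, records with missing values flagged for repair)."""
    unified = [from_transport(r) for r in transport_feed]
    unified += [from_env(r) for r in csv.DictReader(io.StringIO(env_csv))]
    complete = [r for r in unified if r["value"] is not None]
    missing = [r for r in unified if r["value"] is None]
    return complete, missing

complete, missing = integrate()
print(len(complete), "usable records;", [r["id"] for r in missing], "need repair")
```

Flagging incomplete records rather than silently dropping them leaves room for the "effective and timely processing" mentioned above, such as imputation or re-querying the source.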

V. BIG DATA FRAMEWORK FOR SMART CITIES
Smart cities produce huge amounts of data originating from various sources, so their infrastructures must be able to store, process, and analyze such ever-growing data. Several prominent issues have to be considered when designing a big-data framework for a smart city. Firstly, the framework should store diverse data (structured, unstructured, and semi-structured) effectively [56], [102]. Secondly, it should be capable of processing both real-time and historical data. Thirdly, it should offer flexible data storage and processing to handle unexpected load spikes. Finally, it must be able to share the processed results across diverse applications and services in a scalable manner.
Considering these requirements, a conceptual big-data framework is presented in Fig. 11. The framework is divided into four zones: data acquisition (Zone 1), data preprocessing (Zone 2), data storage (Zone 3), and data analytics (Zone 4). The zones are interconnected so that the output of each zone serves as the input to the next. As the first, physical layer, data acquisition encompasses various types of sensors and objects interlinked through diverse networking technologies. The sensors generate the required data, and communication takes place over wired or wireless links (RFID, WiFi, Zigbee, and Bluetooth). A gateway connects Zones 1 and 2, and multiple gateways can be incorporated to obtain a more robust framework [103].
Zone 2 preprocesses the raw data from Zone 1. These raw data must be preprocessed before storage to reduce their large volume and to remove duplicate or uncertain information. Typical preprocessing techniques include data cleaning, integration, and compression.
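Two of the preprocessing techniques named above, cleaning and compression, together with duplicate removal, can be sketched as a single Zone 2 step. The plausible temperature range, the error code -999, the record format, and the use of JSON with zlib are illustrative assumptions; the source does not prescribe specific techniques.

```python
import json
import zlib

# Hypothetical raw readings from Zone 1: (sensor id, timestamp, temperature in °C)
raw = [
    ("s1", 1000, 21.5), ("s1", 1000, 21.5),    # duplicate transmission
    ("s2", 1000, 22.1), ("s3", 1000, -999.0),  # -999 used here as a sensor error code
    ("s1", 1060, 21.7), ("s2", 1060, 22.0),
]

PLAUSIBLE_RANGE = (-40.0, 60.0)  # assumed valid range for outdoor temperature

def preprocess(readings):
    """Clean, de-duplicate, and compress readings; return (compressed bytes, kept count)."""
    lo, hi = PLAUSIBLE_RANGE
    seen = set()
    cleaned = []
    for sensor, ts, value in readings:
        if not lo <= value <= hi:
            continue                 # cleaning: discard implausible (uncertain) values
        key = (sensor, ts)
        if key in seen:
            continue                 # de-duplication on (sensor, timestamp)
        seen.add(key)
        cleaned.append({"sensor": sensor, "ts": ts, "value": value})
    payload = json.dumps(cleaned).encode("utf-8")
    return zlib.compress(payload), len(cleaned)

blob, kept = preprocess(raw)
restored = json.loads(zlib.decompress(blob))   # what Zone 3 would decompress later
print(kept, len(restored))
```

On such a tiny sample compression gains are negligible; the benefit appears with large, repetitive sensor batches.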
Data storage takes place in Zone 3, which stores and manages immense datasets. Data storage systems comprise two parts: storage infrastructure and data management software. The former encompasses the storage devices together with the networking devices that link them, while the latter is the other essential component of a data storage system.
Zone 4 applies different data analytics approaches to extract valuable information from large datasets. These techniques can be divided into three groups, (i) descriptive, (ii) predictive, and (iii) prescriptive, as explained in Table 3. Table 3 also lists possible solutions to the issues arising in big data analytics of large wireless networks, organized by the zones described above. Many of these research challenges have been resolved, but many remain unsolved. The next section describes future directions in big data analytics for large-scale wireless networks, which are summarized in Fig. 12.
FIGURE 11. Big data framework for a smart city.
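To make the three analytics categories concrete, here is a minimal sketch on hypothetical hourly vehicle counts at one intersection: descriptive analytics summarizes what happened, predictive analytics fits an ordinary least-squares linear trend to forecast the next hour, and prescriptive analytics turns the forecast into an action with a threshold rule. The counts, capacity, and rule are hypothetical, and a linear trend is only the simplest possible predictor.

```python
from statistics import mean

counts = [310, 340, 365, 395, 420]   # hypothetical vehicles per hour, hours 0..4
CAPACITY = 430                       # hypothetical vehicles per hour the intersection handles

# Descriptive: summarize what happened
summary = {"mean": mean(counts), "peak": max(counts)}

# Predictive: least-squares line y = a + b*t, evaluated at the next hour
t = list(range(len(counts)))
t_bar, y_bar = mean(t), mean(counts)
b = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, counts))
     / sum((ti - t_bar) ** 2 for ti in t))
a = y_bar - b * t_bar
forecast = a + b * len(counts)

# Prescriptive: recommend an action based on the forecast
if forecast > CAPACITY:
    action = "extend green phase / reroute traffic"
else:
    action = "keep current timing"

print(summary, round(forecast, 1), action)
```

Each stage consumes the previous one's output, mirroring how Zone 4 builds on data delivered by the earlier zones.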

A. DISTRIBUTED DATA PROCESSING MODELS
Despite considerable effort devoted to developing distributed data processing models for large-scale wireless networks, numerous issues remain unsolved.
• Stream data processing: Given the huge amount of real-time data (e.g., sensor data from WSNs), storing and processing all of the data in memory is practically impossible. Therefore, conventional data analysis algorithms that require access to the entire dataset will not work in this scenario [56].
• In-network processing: Given the enormous number of wireless nodes distributed throughout large-scale wireless networks, integrating their data is a vital step in data processing. However, fusing data across distributed networks significantly increases the communication cost. One approach to this problem is in-network processing, in which data are processed at the individual nodes throughout the network rather than on centralized servers. The computational costs can be reduced further by clustering the nodes [115], although this raises new challenges, such as selecting clusters that meet big data requirements.
• Distributed processing: MapReduce [116] offers several advantages, including simplicity, fault tolerance, and scalability. However, it is less efficient than other parallel processing models: conventional parallel computing models (e.g., Message Passing Interface (MPI), OpenMP (Open Multi-Processing), FPGA-oriented programming, and GPU-based parallel processing such as NVIDIA's CUDA) perform better than MapReduce and similar models. Integrating MapReduce-type models with parallel computing models could therefore improve performance. Moreover, the strengths of particular processing platforms can be exploited to accelerate existing data analysis algorithms; for instance, FPGA-based convolutional neural networks (CNNs) [96] and FPGA-based deep learning have been explored in this regard.
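The stream-processing and in-network-processing points above share a useful technique: summaries that are computed in a single pass and can be merged. The sketch below uses Welford's online update for the mean and variance at each node and the pairwise merge formula of Chan et al. at a cluster head, so each node sends three numbers instead of its raw readings. This standard statistical technique is offered for illustration and is not proposed in the cited works; the node readings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RunningStats:
    """One-pass stream summary: count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Welford's update: O(1) memory, raw readings need not be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries (e.g., at a cluster head) without raw data.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything summarized so far."""
        return self.m2 / self.n if self.n else 0.0

# Hypothetical readings observed by two sensor nodes
node_a, node_b = RunningStats(), RunningStats()
for x in [20.0, 22.0, 24.0]:
    node_a.update(x)
for x in [26.0, 28.0]:
    node_b.update(x)

cluster = node_a.merge(node_b)   # same result as processing all five readings centrally
print(cluster.n, cluster.mean, cluster.variance)
```

Because `merge` is associative, the same summaries can be combined hierarchically (node, cluster head, sink), which is also how combiners reduce traffic in MapReduce-style jobs.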

B. BIG DATA ANALYTICS TO OPTIMISE THE NETWORK
Big data analytics (BDA) can be employed to optimize large-scale wireless networks. Several unsolved challenges in using BDA for network optimization are discussed here.
• Network resource management: By applying BDA to network data, network administrators can predict network resource demands. As noted in [117], congestion in parts of a city can readily be predicted when social events such as a marathon take place.
• Content-centric networks: As suggested in many previous works [118], caching popular content at base stations can markedly reduce real-time traffic and hence improve network performance. Deciding what content to cache is a new problem. Caching information can be obtained by analyzing application data, but privacy concerns and the heterogeneity of data from diverse applications make it difficult to obtain precise user information.
• Network self-adaptation/self-optimization: BDA can contribute substantially to network self-adaptation or self-optimization in self-organizing networks (SONs) [119]. Fan et al. [120] developed a self-optimization approach that integrates a fuzzy neural network with reinforcement learning and was able to meet the coverage and capacity criteria of SONs. However, further studies are required in this emerging field of research.
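For the content-centric networking item, the simplest popularity-based policy keeps the k most requested items at a base station. The sketch below assumes a hypothetical request log, cache capacity, and upcoming request sequence; practical caching schemes must also handle shifting popularity and the privacy issues noted above.

```python
from collections import Counter

def select_cache(request_log: list[str], capacity: int) -> list[str]:
    """Return the `capacity` most requested content IDs (ties keep first-seen order)."""
    return [item for item, _ in Counter(request_log).most_common(capacity)]

def hit_ratio(requests: list[str], cache: list[str]) -> float:
    """Fraction of requests that could be served directly from the base-station cache."""
    if not requests:
        return 0.0
    cached = set(cache)
    return sum(r in cached for r in requests) / len(requests)

# Hypothetical request history observed at one base station
history = ["news", "map", "video1", "news", "video1", "news", "music", "map", "video2"]
cache = select_cache(history, capacity=2)

upcoming = ["news", "video1", "map", "news", "video2"]
print(cache, hit_ratio(upcoming, cache))
```

Every hit is a request that does not traverse the backhaul, which is the traffic reduction the item above refers to.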

C. SECURITY AND PRIVACY CONCERNS
As two prominent challenges in BDA of wireless networks, security and privacy are tightly related but differ in some respects [121]: security involves data confidentiality, integrity, and availability, whereas privacy guarantees appropriate data usage and prevents the disclosure of a user's private information without his or her consent. Possible research themes in BDA security and privacy within wireless networks are:
• Security in data acquisition: During data acquisition, eavesdropping (wiretapping) may occur, leading to information leakage. Many studies have addressed the confidentiality of wireless networks, and encryption schemes can be applied in wireless networks [121]. However, the energy and computational limitations of smart objects make these techniques impractical for IoT. Thus, novel lightweight protection schemes should be developed for IoT [122].
• Privacy and security in data storage: Personal confidential data can be leaked if data storage is breached, which highlights the importance of data protection in this phase. Fortunately, applying encryption algorithms is more feasible in the data storage phase than in the data acquisition (transmission) phase. However, enforcing privacy-preserving operations in the data storage phase can be difficult, particularly when the service is provided by a third-party entity [123]. Mobile Edge Computing (MEC) [124] can resolve this issue by offloading data from the untrusted third party to a trusted MEC server near the user.
• Privacy in data analytics: Balancing privacy against the efficiency of data analysis is a major challenge. Private documents can be protected by encrypting them and storing them on a secure server (or in a cloud); on the other hand, analyzing encrypted documents is time-consuming, which reduces the efficiency of data analytics [56]. Numerous unsolved issues in data publishing [125], data mining output, and distributed data privacy [126] require deeper investigation.
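One established way to publish aggregate statistics while limiting what they reveal about any individual is differential privacy. The source does not name it; it is offered here only to illustrate the privacy-versus-utility tension described above. The sketch implements the Laplace mechanism for a count query, whose sensitivity is 1, so the noise scale is 1/epsilon; the count and epsilon values are hypothetical.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1): noise scale = 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)      # fixed seed so the run is repeatable
true_count = 1280            # hypothetical number of people counted in one district
for eps in (0.1, 1.0, 10.0):
    answers = [private_count(true_count, eps, rng) for _ in range(1000)]
    mae = sum(abs(a - true_count) for a in answers) / len(answers)
    print(f"epsilon={eps}: mean absolute error {mae:.2f}")   # roughly 1/epsilon
```

Smaller epsilon gives a stronger privacy guarantee but noisier published values, so the choice of epsilon is a policy decision as much as a technical one.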

D. MOBILE EDGE COMPUTING FOR BDA
Because of the intrinsic limitations of wireless nodes, such as power restrictions and limited computational ability, computing tasks are often deployed to a remote cloud server with far greater computational ability and fewer resource limitations. However, drawbacks such as high latency, performance bottlenecks, context unawareness, and privacy exposure limit cloud computing [127]. MEC (or fog computing) can be employed as a complement to cloud computing systems to overcome these drawbacks. MEC offloads computation from the remote cloud to diverse MEC servers at base stations, IoT gateways, and WiFi access points (APs) near the end users. Therefore, computation-intensive, delay-tolerant jobs can be executed in remote clouds, whereas less intensive, delay-critical, and context-aware jobs are offloaded to edge servers.
• Computing task allocation: Wireless networks contain heterogeneous computing resources. Computing tasks must therefore be allocated across these various devices, and allocation and coordination become difficult in distributed large-scale networks.
• Lightweight BDA schemes: As noted in [127], AlexNet (a typical convolutional neural network) can incur high communication costs between servers and edge nodes. The resource restrictions of edge servers and mobile devices have motivated the development of lightweight BDA schemes and compressed BDA models, whose hardware architecture, optimization, data compression, distributed computing, and machine learning aspects require further investigation [103].
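The offloading rule described above (delay-critical jobs at the edge, heavy delay-tolerant jobs in the cloud) can be expressed with a simple latency model: completion time = upload time + network round-trip delay + compute time. The sketch below makes a greedy per-task placement under that model. All bandwidths, round-trip times, processing speeds, and tasks are hypothetical, and practical MEC schedulers also account for energy, server load, and cost.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    uplink_mbps: float      # bandwidth from the device to the site
    rtt_s: float            # network round-trip delay in seconds
    cpu_gcycles_s: float    # processing speed in gigacycles per second

@dataclass
class Task:
    name: str
    input_mb: float         # data to upload, in megabytes
    gcycles: float          # computation required, in gigacycles
    deadline_s: float

def completion_time(task: Task, site: Site) -> float:
    """Estimated time to upload the input, cross the network, and compute at `site`."""
    upload = task.input_mb * 8 / site.uplink_mbps   # megabytes -> megabits
    return upload + site.rtt_s + task.gcycles / site.cpu_gcycles_s

def place(task: Task, sites: list[Site]) -> tuple[str, float]:
    """Pick the site with the earliest completion time and note a missed deadline."""
    best = min(sites, key=lambda s: completion_time(task, s))
    t = completion_time(task, best)
    label = best.name if t <= task.deadline_s else f"{best.name} (deadline missed)"
    return label, t

edge = Site("edge", uplink_mbps=100.0, rtt_s=0.005, cpu_gcycles_s=10.0)
cloud = Site("cloud", uplink_mbps=20.0, rtt_s=0.080, cpu_gcycles_s=100.0)

tasks = [
    Task("collision-warning", input_mb=0.1, gcycles=0.5, deadline_s=0.1),
    Task("daily-traffic-model", input_mb=50.0, gcycles=5000.0, deadline_s=600.0),
]
for task in tasks:
    print(task.name, place(task, [edge, cloud]))
```

With these numbers the small, delay-critical task finishes first at the nearby edge server, while the compute-heavy task is faster in the cloud despite the longer upload, matching the division of labor described above.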

VII. CONCLUSION
The ever-growing number of connected devices in cities has led to rapidly increasing volumes of data, attracting the attention of numerous researchers from diverse fields of study. This article presents a comprehensive review of the indispensable role of big data in smart cities.
To this end, the enabling technologies were described, and an architecture was proposed for managing big data in a smart city. The essential role of big data analytics in smart city applications was also addressed, and different use cases were explained. Several unsolved challenges were described to suggest possible directions for future research. Finally, it was concluded that big data can play a decisive role in extracting valuable information and in decision-making processes. Nonetheless, research on the role of big data in smart cities is at the beginning of its journey, and resolving the challenges identified here can facilitate its practical use.