SensorsConnect Framework: World-Wide Web for Internet of Things

Abstract:

The widespread adoption of the Internet of Things (IoT) has led to a surge in smart sensing devices connected to the Internet. While IoT enables machines, embedded systems, and appliances to access the Internet, they do not interact with it as humans do through the World Wide Web (WWW). Unlike humans, IoT devices lack a unified framework like the WWW for collaboration and data sharing. This is primarily due to 1) separate infrastructure often required for IoT security and privacy and 2) challenges of limited connectivity, device heterogeneity, and evolving technology. This paper presents SensorsConnect, a system that connects IoT devices in a WWW-like framework, enabling real-time sensing data searches across a broad IoT context. It defines the architecture, core processes, and major challenges of SensorsConnect, along with strategies to address these challenges. A motivating scenario illustrates its potential impact in real-life situations, such as finding a drive-thru coffee shop or crossing a country border. Using real-time road status and service occupancy data, SensorsConnect enhances Google Maps service recommendations, not only minimizing customer wait times but also distributing workload more evenly across service points. Performance evaluation shows that SensorsConnect reduces average service times by 46% at drive-thru locations and 31% at border crossings compared to Google Maps. This promising application could improve public access to live data, supporting real-time decisions and enhancing quality of life.

Published in: IEEE Access (Volume: 12)

Page(s): 168500 - 168516

Date of Publication: 13 November 2024

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2024.3496892

SECTION I.

Introduction

The development of the Internet, accompanied by the World Wide Web (WWW) [1], has profoundly transformed human life. Over the past few decades, we have become increasingly dependent on these revolutionary technologies in almost every aspect of our daily lives. The Internet has reshaped how we communicate, do business, collect information, and even seek education, overcoming geographical boundaries and boosting international connectivity. Meanwhile, the WWW, with its network of connected data, has given us remarkable access to extensive resources of knowledge, entertainment, and services, all within a few clicks.

The birth of the Internet opened the way for global data exchange, and the WWW, at its inception, boosted content sharing through unified protocols [2] (HTML and HTTP) that were simple and readily adopted by the Internet community. Building on these fundamentals, the Internet of Things (IoT) has moved beyond mere data exchange to connect not only individuals but also the vast range of everyday objects that surround us. The number of IoT devices installed around the world [3] is estimated to exceed 29 billion by 2030, twice the number installed today. However, these IoT systems [4] are fragmented: most are deployed with devices connected to their own IoT stacks (bubbles) but isolated from the outside world. Although this fragmentation is partly inherent and, in many cases, systems are isolated for privacy or security reasons, alternative design approaches have been neglected for systems whose data could, by nature, be beneficially made public.

Projecting the human experience of using the Internet onto the experience of connected devices, it becomes evident that while things are now Internet-connected, they lack a standardized framework like the WWW for sharing data and documents. The concept of a global network of interconnected computers began in the 1960s, but the Internet’s significant impact on human life only became evident in the 1990s with the introduction of the WWW. Initially introduced by Tim Berners-Lee [2] to facilitate document sharing, the WWW has since become integral to key activities like learning, trading, communication, transportation, and entertainment. Likewise, a WWW-like framework for connected devices could profoundly transform their interoperability. For instance, smart cities relying on seamlessly connected devices, without the need for numerous APIs for fragmented IoT systems, could become a reality. In this context, our paper introduces SensorsConnect, designed to establish a cohesive connection among IoT devices, similar to the WWW’s role in connecting humans via the Internet.

Continuing this metaphor, after the Internet became widely accessible to the public and organizations and individuals gained the ability to create globally linked websites, finding specific information became a significant challenge. Search engines [5] addressed this need, making it easier to find desired information with remarkable speed. These engines have had a profound impact on the human experience by enabling rapid access to targeted information. As content within WWW communities became increasingly unstructured, crawling techniques emerged as a foundational method for building search engines. Historically, in the WWW framework, hyperlinks [2] on web pages played a crucial role in this development phase. The design of crawling algorithms is fundamentally based on these embedded hyperlinks, which navigate through web pages to link, index, sort, and cluster content, making it accessible via search prompts. Recognizing a similar challenge in locating intent-specific sensing data, we introduce SensorsConnect to address search challenges natively within the IoT context.

The remainder of the paper is organized as follows. Section II discusses related work in this domain. Section III presents the SensorsConnect architecture. Section IV provides an overview of the key processes of the framework. Section V presents a motivating scenario to demonstrate the usability and feasibility of the proposed framework and measures its impact compared to state-of-the-art methods. Section VI outlines research paths for SensorsConnect. Lastly, Section VII provides concluding remarks.

SECTION II.

Related Work

A. IoTCrawler

The fragmentation of IoT systems has produced intranets of things, much like the Web in its infancy. IoTCrawler [6] is an EU-funded project that aims to provide a search engine framework for data produced by IoT applications, based on traditional search techniques such as crawling, indexing, ranking, and discovery. This framework integrates existing IoT systems, enabling the generated data to be indexed, searched, and queried efficiently. IoTCrawler addresses the challenges associated with IoT devices that generate large amounts of heterogeneous data, which often vary in data formats, communication protocols, and semantics. The authors claim that IoT developers spend most of their time on the integration process. IoTCrawler was inspired by the Semantic Sensor Network (SSN) [7], which enhances Sensor Web Enablement (SWE) [8] by incorporating a semantic layer. However, the SSN ontology, consisting of 41 concepts and 39 objects, is not suitable for IoT systems with resource constraints, as it demands significant computing power and storage. Although complex models can provide more detailed object queries, they are often challenging to implement and deploy, and their processing requirements make them impractical for resource-limited environments. Therefore, IoT models must consider the inherent limitations and dynamic nature of IoT systems.

IoT models need to define the relationships and properties that facilitate interoperability between IoT systems. This creates a design dilemma between simplicity and complexity in developing an effective IoT data model. To address this, IoTCrawler adopted IoT-lite [9], a streamlined version of SSN. Additionally, for data streaming, IoTCrawler implemented IoT-Stream [10].

The creators of IoTCrawler have built a robust framework for searching the sensing data created by IoT systems. However, the IoTCrawler framework is complex for the IoT community to adopt, on top of the burden of integrating heterogeneous IoT devices. Going back in time, when Berners-Lee [2] introduced the World Wide Web (WWW) concept, the initial purpose was to share documents globally. The WWW framework initially had three components: the Hypertext Transfer Protocol (HTTP), the HyperText Markup Language (HTML), and the Web browser. The first Web page [11] included simple plain text and embedded text hyperlinks. In other words, the WWW framework started simple and clearly defined three components that the Internet community could easily adopt and use to share documents, leaving out other protocols that might have complicated the framework at release time. The Internet community then went through a learning curve that helped the WWW framework evolve into what we know today. Similarly, the IoT community should first agree on a unified and simple framework to create a sharable IoT world.

B. Sensor Web Enablement

The Open Geospatial Consortium (OGC) [8] introduced Sensor Web Enablement (SWE), a framework that enables discovering, accessing, and integrating sensor data from different sources into web-based applications. SWE made extensive efforts to provide standards for managing and sharing sensor data, facilitating the collection of data from a wide range of sensors and its distribution to many observation platforms. However, in 2006, IoT devices had not yet emerged, and Internet technology was oriented toward humans, which biased the design toward human needs. For this reason, Web technologies and standards, like HTML, influenced the design of the SWE framework. For instance, Sensor Markup Language (SensorML) [8], one of the SWE standards, is based on Extensible Markup Language (XML).

Hypertext Markup Language (HTML) is a markup language built to describe web page layouts delivered over the Internet, with elements that reflect human needs, such as head, body, footer, and paragraphs. XML extends this idea by allowing custom elements that HTML does not include, which is why SensorML adopted XML to define the data model of sensors. Given that the goal was for users to observe sensors at the front end, it made sense to use languages such as HTML and XML, which can be styled with cascading style sheets and made interactive with JavaScript for human needs. These formats are efficient when the interpreter is a web browser that presents the content to human users on relatively capable devices. However, for the resource-constrained devices used in IoT systems, the JSON format has become the dominant choice for data transfer and manipulation because it is efficient to parse and store. In addition, SensorML [12] provides a complex data model that contains heavy details about sensors, for two reasons. First, it was initiated by the Earth Observation (EO) community [8], [13] and NASA, which are interested in such sophisticated details as sensor tolerance, sampling rate, and sensor manufacturing. Second, OGC may not have anticipated that connected devices would grow massively and that less complex standards might be more efficient and easier to adopt. Beyond these issues, the components used to build such a framework architecture have evolved since then: when SWE was released, web systems followed a client-server pattern, whereas web design and IoT systems have since evolved to add layers such as edge and cloud. Therefore, we need to design the SensorsConnect architecture based on the recent technology used in IoT systems and to satisfy the needs of several interacting parties: humans and IoT devices.

C. Mobile CrowdSensing

Recently, the world has become crowded with devices with a myriad of sensing capabilities. Ganti et al. [14] introduced the Mobile CrowdSensing (MCS) concept that uses existing sensors on user devices to collect data and generate valuable information. The proliferation of user devices, such as smartphones, tablets, and smart wearable devices, enabled the realization of the MCS concept. The MCS collectively collects and shares data between participants and extracts information of common interest using individual mobile devices.

Although the proliferation of user devices has strengthened MCS, it is also its main curse. Offering MCS services to users as an incentive to share their data seems compelling: service providers save the cost of deploying and managing sensing devices, and users get the services for free. Nonetheless, hidden costs in the form of challenges and limitations hinder its widespread adoption. The following outlines the significant challenges for MCS from the user and service provider perspectives.

1) User Perspective

Privacy: Privacy is a significant concern for users participating in MCS. For instance, some applications use motion sensors, such as accelerometers and gyroscopes, to detect users’ activities or monitor the surrounding world. Collecting such data could breach user privacy by leaking private information about their daily activities that they may not want to share with service providers.
Power: MCS treats user devices as data sources that must send updates continuously, which dramatically reduces battery run time.
Cost: Utilizing network bandwidth to transfer the data is a tangible cost in addition to the hidden cost of exploiting the collected data.

2) Provider Perspective

Data Redundancy and Scarcity: Though the proliferation of user devices shores up MCS, the heterogeneous distribution of these devices in the sensing environment causes data redundancy in some places and scarcity in others [15].
Data Credibility: Service providers have no control over user devices; hence, the reliability and accuracy of the collected data are likely to be compromised [16].
Complexity: Extracting the information from the collected data is challenging in MCS due to the dynamic state and heterogeneity of its data sources [17].

Therefore, SensorsConnect aims to reduce reliance on user data by providing unified interfaces [18] for infrastructure-based sensing deployed in the public domain.

D. Collaborative IoT

The concept of the Collaborative Internet of Things (C-IoT) [19] emerged within the evolving IoT landscape as a way to address the fragmentation of IoT systems, which often function as isolated silos. C-IoT introduced a paradigm shift, moving from individual, standalone IoT systems to a networked community of collaborative IoT systems. The authors [19] proposed a collaborative model for building C-IoT architecture to replace multiple isolated solutions. In this model, IoT systems within a C-IoT framework utilize shared infrastructure to exchange resources and data, enhancing resource efficiency and reducing data redundancy. However, despite its potential, the C-IoT concept has not yet seen widespread adoption within the IoT community.

Behmann and Wu [19] proposed the C-IoT concept from the perspective of fields and domains. They classified the IoT domains into individual (citizens), industry (businesses and organizations), and infrastructure (city and government). In addition, the field classifications, such as healthcare, transportation, agriculture, and supply chain, are commonly known. The conceptual idea of C-IoT is presented as a target diagram [19] with three circles that represent the three domains and triangles intersecting these circles to illustrate the collaboration of applications within the same fields.

An example of implementing this vision was when citizens’ phones, business stores, and the minister of health collaborated to flatten the COVID-19 spreading curve in the healthcare field using the COVID Alert app [20]. Nonetheless, the COVID Alert app failed to meet expectations even though it cost the federal government [21] in Canada about 20 million. Moreover, some provinces and territories, like British Columbia, Alberta, Nunavut, and Yukon, refused to use the app. The prominent challenges were limited adoption and privacy issues. As a result, around 96 percent [22] of the citizens who tested positive did not use the app, and Health Canada consequently decided to stop relying on it.

Aside from this vision, Montori et al. [23] introduced a C-IoT architecture, SenSquare, for smart cities and environmental monitoring. SenSquare was founded on the MCS concept, deploying user devices as the primary data source. The focus of SenSquare is to address data availability, not data quality, and to support the development of services by end users. However, SenSquare has not seen wide adoption because it relies on end-user devices and assumes that end users can build their own services. Even though neither SenSquare nor the C-IoT perspective discussed earlier has made an immense impact, they presented some notions that we can embed in SensorsConnect, such as having a shareable infrastructure and giving business entities, not end users, the privilege of building services. Furthermore, both works share the same root cause of failure: they rely primarily on user data, which IoT systems have no control over, so there is no guarantee of availability.

E. Public Sensing

Public Sensing (PS), which deploys and utilizes sensing devices spread across public places, offers a suitable solution to the problems faced by MCS. PS as a concept appeared before MCS; however, it was not widely adopted for two main reasons. First, the technologies were not mature enough to realize the PS concept at that time, causing academics and industries to prefer the deployment of MCS. Second, there was no clear definition or standard architecture for deploying public sensing as a service.

Multiple attempts have been made to define public sensing (PS). Al-Fagih et al. [24] described PS as a service based on versatile data sources within smart cities, such as cell phones, radio frequency identification tags, and sensors on roads, buildings, and living spaces. Younes and Elmougy in [25] consider PS as sensor networks constructed by the prevalence of smartphones. Perera et al. [26] introduce a framework for the Sensing as a service concept. They believe Sensing as a service stems from Smart City and IoT concepts. The authors anticipated the proliferation of sensing sources by 2020. Their article presents four conceptual layers for sensing as a service model: sensors/sensor owners, sensor publishers, extended service providers, and sensor data consumers. This conceptual architecture explained the sensing process; however, it overlooked the interactions of users and service providers. The authors of these articles [25], [27] considered PS a concept that utilizes only smartphones as data sources, similar to the MCS idea. Zhang et al. [28] explicitly mixed the PS term with MCS. Since then, the scientific community has used the MCS term that conveys utilizing user devices as data sources.

Today, the evolution of the PS concept in the literature can be seen as a fork in the road with two distinct paths: one path treats public sensing and mobile crowd sensing (MCS) interchangeably, relying on user-generated data, while the other focuses on utilizing installed devices in the surrounding environment. The widespread proliferation of user devices has significantly accelerated the growth of MCS over the past decade. However, the growth of PS, which depends on environmental devices, has been hampered by factors such as limited connectivity, device heterogeneity, and technological immaturity.

Currently, the advancements in technology and the emergence of IoT standards are reshaping the IoT landscape, potentially facilitating a broader PS adoption. While MCS faces various challenges, many of these could be avoided by embracing the PS concept through the use of existing environmental devices. To address this, we introduce SensorsConnect, an infrastructure-based sensing solution that implicitly adopts the PS concept within the widespread context of IoT systems.

SECTION III.

SensorsConnect Architecture

The primary objectives of SensorsConnect are as follows:

Hiding the heterogeneity of sensing devices by providing a cloud-unified interface.
Acting as a collaborative framework that makes data sharing between service providers in SensorsConnect resemble document sharing between Internet users on the WWW, by defining a unified data model that the service providers agree on.
Providing a search engine for real-time sensing data through a unified interface, so that end users can improve decision-making by leveraging the availability of sensing data.

SensorsConnect favors infrastructure-based sensing over mobile sensing (user devices), given that most businesses nowadays already have sensing devices installed on their premises for different purposes. Moreover, sensing devices [29] have become cheaper, so their installation cost has dropped dramatically compared with prices when MCS was introduced. The core value proposition, in that case, lies in enabling these businesses to hook up their sensing platforms easily and flexibly to a global sensing ecosystem that benefits their users and the general public directly or indirectly.

MCS exploited the proliferation of user devices to avoid the cost of deploying sensing devices. In contrast, deploying infrastructure-based sensing can be more beneficial for the individual and the community because its overall expense is lower than that of exploiting user devices. For instance, counting people entering a mega store requires a single device at each entrance. However, relying on user devices, as the popular times service in the recent Google Maps update does, requires frequent updates from each user device in the store. Using this mega store example, the following arguments explain in depth why prioritizing infrastructure-based sensing in our framework is more efficient than MCS.

It requires significantly lower network bandwidth than MCS, as only one device per entrance now updates the backend.
Using counters at the entrances reduces the overall power consumption required to calculate the number of users. In contrast, MCS requires higher power consumption because it combines the power consumed by thousands of user devices that continuously send their data with that of the computing devices that handle the collected data and estimate the number of users.
It can achieve higher accuracy than MCS since not all users have smartphones with navigation sensors. In addition, some customers with smartphones in the store might not have installed the application, which affects the accuracy of the estimated crowdedness value; this is the same reason the COVID Alert app failed [22]. In other words, MCS usually relies on estimates based on the presence of devices running the application inside the store. In contrast, SensorsConnect can obtain more reliable measurements from the devices installed on the store premises.
The risk of breaching user privacy is reduced, assuming infrastructure-based sensing replaces reliance on users’ data. SensorsConnect can still violate privacy in some situations, even though its devices are installed in public zones, but this is preferable to collecting data directly from user devices.
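
The bandwidth argument above can be made concrete with a back-of-envelope calculation. The sketch below is only illustrative: the number of entrances, the number of shoppers running the application, the update rate, and the payload size are all hypothetical values chosen for the example, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateSource:
    """A group of identical devices that report to the backend."""
    devices: int            # number of reporting devices
    updates_per_hour: int   # updates each device sends per hour
    payload_bytes: int      # size of a single update


def hourly_traffic_bytes(source: UpdateSource) -> int:
    """Total bytes sent to the backend in one hour."""
    return source.devices * source.updates_per_hour * source.payload_bytes


# Hypothetical mega store: 4 entrances with one counter each,
# versus 2000 shoppers whose phones report once per minute.
entrance_counters = UpdateSource(devices=4, updates_per_hour=60, payload_bytes=200)
user_phones = UpdateSource(devices=2000, updates_per_hour=60, payload_bytes=200)

infra_bytes = hourly_traffic_bytes(entrance_counters)
mcs_bytes = hourly_traffic_bytes(user_phones)
ratio = mcs_bytes / infra_bytes

print(f"infrastructure: {infra_bytes} B/h, MCS: {mcs_bytes} B/h, ratio: {ratio:.0f}x")
```

With equal update rates and payload sizes, the traffic ratio reduces to the ratio of device counts, which is why a handful of fixed counters compares so favorably with thousands of phones.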

MCS will remain a partial source of data collection in SensorsConnect until the infrastructure becomes mature enough to abandon reliance on user devices entirely. Although privacy must still be handled, users would in practice be less vulnerable to data privacy breaches. Service providers depend on the availability and distribution of infrastructure devices to collect real-time data and offer their services; in such a case, the distribution of the data sources is used to overcome data redundancy and scarcity. Furthermore, SensorsConnect mitigates the data credibility problems and the risk of data manipulation that MCS faces. For instance, Weckert [30] manipulated Google Maps by creating a fake traffic jam using 99 smartphones in a handcart. We also expect the complexity of extracting information in SensorsConnect to be lower than in MCS because there is no need to deal with the dynamic state of mobile sensing devices; instead, information can be extracted directly on the edge device without complex algorithms in the cloud.

Figure 1 presents the SensorsConnect architecture that we recently introduced [31]. It consists of five main layers: Perception, Edge, Cloud, Business, and User Interface. In Figure 1, the layers with three stacked containers (Perception, Edge and Cloud) have multiple instances in the framework. In the following, we define each layer in detail, describe its functionality, and highlight the significant challenges to be addressed.

FIGURE 1. SensorsConnect architecture.

A. Perception Layer

The Perception layer collects and delivers sensing data to the Edge layer in appropriate formats. From the computing capability perspective, we classify the devices of the Perception layer into two classes: limited computing power devices, such as devices that collect the surrounding weather conditions, detect door motions, or read RFID cards; and high computing power devices, such as surveillance cameras that count people or traffic cameras that monitor road conditions. Figure 1 shows a few examples of the sensors that SensorsConnect can hook up. For example, the framework can connect car charging stations to help find and book a charging spot, or hook up RFID scanners in stores and warehouses to publish the currently available products in each store and warehouse or across the whole market. Also, integrating parking devices makes looking for a spot downtown during rush hours easier, given that the framework facilitates data sharing between parking lots and navigation apps.

1) Challenges

Heterogeneity of IoT devices: Myriad IoT platforms, sensing devices, and data types pose the heterogeneity challenge in the Perception layer. Although IoT devices commonly use standard data transfer protocols, such as MQTT, AMQP, CoAP, and DDS, and SensorsConnect can provide an adaptor for each, IoT vendors define in-house standards for payloads (messaging standard). The WWW, by contrast, was released with HTTP as its data transfer protocol and HTML as its standard payload format.
Security: Sensing devices are closely integrated into our daily lives, providing crucial information to emergency response, traffic flows, and autonomous vehicle systems. In some cases, securing these devices can be life-critical. However, many sensing devices [32], [33], [34] cannot execute complex security algorithms due to their processing, memory, and storage constraints, making them particularly vulnerable to cyber-attacks. SensorsConnect can mitigate this issue by using the EdgeChain framework [35], which offers a secure method to connect low-computing devices with the system.

B. Edge Layer

The Edge layer manages and processes data collected by the Perception layer, acting as a data adapter by integrating four key components to handle this data effectively. Given the current fragmentation and heterogeneity of sensing devices, a heterogeneous API component is essential. It provides a range of APIs, including those supporting common IoT protocols and vendor-specific APIs, to facilitate the connection of various sensing devices. The ML/AI engine component performs feature extraction tasks directly at the Edge layer, helping to distribute the computational workload between the Cloud and Edge, while minimizing unnecessary data transmissions. Local data storage offers the necessary memory space for processing, extracting, and preparing the collected data. To deliver data to the Cloud layer, standardized methods are required to ensure seamless collaboration between IoT systems. Therefore, the Edge layer includes a Unified Interoperable Driver for IoT (UIDI) [18], which structures the data into a standardized format, making it easily accessible and usable by other IoT systems without the need for additional processing.
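
As a rough illustration of how the four Edge components could fit together, the sketch below wires per-vendor adapters (heterogeneous API), a bounded buffer (local data storage), a trivial aggregation standing in for the ML/AI engine, and a final step that emits one standardized JSON record. Everything in it is an assumption made for the example: the vendor payload formats, the `EdgeNode` class, and the record fields are hypothetical and do not reproduce the actual UIDI specification [18].

```python
import json
from collections import deque
from typing import Callable, Dict


# Heterogeneous API: one adapter per (hypothetical) vendor payload format.
def parse_vendor_a(raw: str) -> dict:
    # Assumed JSON payload: {"dev": "cam-1", "people": 3, "t": 1700000000}
    msg = json.loads(raw)
    return {"device_id": msg["dev"], "value": msg["people"], "timestamp": msg["t"]}


def parse_vendor_b(raw: str) -> dict:
    # Assumed CSV payload: "door-7,1700000005,2"
    device_id, ts, count = raw.split(",")
    return {"device_id": device_id, "value": int(count), "timestamp": int(ts)}


ADAPTERS: Dict[str, Callable[[str], dict]] = {
    "vendor_a": parse_vendor_a,
    "vendor_b": parse_vendor_b,
}


class EdgeNode:
    def __init__(self, site_id: str, window: int = 100) -> None:
        self.site_id = site_id
        self.buffer: deque = deque(maxlen=window)  # local data storage

    def ingest(self, vendor: str, raw: str) -> None:
        """Normalize a vendor-specific payload and buffer it locally."""
        self.buffer.append(ADAPTERS[vendor](raw))

    def extract_feature(self) -> dict:
        """Stand-in for the ML/AI engine: reduce raw readings to one summary."""
        readings = list(self.buffer)
        if not readings:
            raise ValueError("no readings buffered")
        return {
            "occupancy": sum(r["value"] for r in readings),
            "observed_at": max(r["timestamp"] for r in readings),
        }

    def to_unified_record(self) -> str:
        """Stand-in for the unified driver: emit one standardized JSON record."""
        feature = self.extract_feature()
        record = {
            "site_id": self.site_id,
            "metric": "occupancy",
            "value": feature["occupancy"],
            "observed_at": feature["observed_at"],
            "sources": sorted({r["device_id"] for r in self.buffer}),
        }
        return json.dumps(record, sort_keys=True)


node = EdgeNode("store-42")
node.ingest("vendor_a", '{"dev": "cam-1", "people": 3, "t": 1700000000}')
node.ingest("vendor_b", "door-7,1700000005,2")
record = json.loads(node.to_unified_record())
print(record)
```

Only the single summarized record leaves the edge node, which reflects the stated goal of minimizing unnecessary data transmissions to the Cloud.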

1) Challenges

Standardization: A crucial challenge in the Edge layer is the lack of standardization, as it connects to the Perception layer, which consists of heterogeneous devices. The heterogeneous API component must handle a variety of communication protocols, frameworks, and development tools so that it can integrate different IoT systems into SensorsConnect. The Edge layer uses the UIDI [18] protocol to standardize data transmitted to the Cloud, addressing heterogeneity in data and promoting collaboration. Nevertheless, considerable work remains to cover all existing platforms, protocols, etc.
Latency: Latency [36], [37] is another hurdle to achieving real-time responses in SensorsConnect. On the one hand, SensorsConnect reduces transmission latency by minimizing the data sent, focusing only on necessary features and extracting information from raw data. On the other hand, the feature extraction process itself introduces delays that contribute to overall latency. Performing feature extraction efficiently without affecting real-time responsiveness requires adequate computing resources, which may be unavailable in many cases [38].
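
The latency trade-off described above can be sketched with a simple model in which end-to-end delay is the extraction time plus the time to transmit the payload. The model and all numbers below (uplink rate, payload sizes, extraction times) are assumptions chosen for illustration; real deployments would also include propagation and queuing delays.

```python
def end_to_end_latency_s(payload_bytes: int, uplink_bps: float,
                         extraction_s: float = 0.0) -> float:
    """Extraction delay plus transmission delay, in seconds (simplified model)."""
    return extraction_s + (payload_bytes * 8) / uplink_bps


UPLINK_BPS = 4_000_000  # hypothetical 4 Mbit/s uplink

# Option 1: ship a raw 500 kB camera frame to the cloud, no edge processing.
raw_upload = end_to_end_latency_s(500_000, UPLINK_BPS)

# Option 2: extract a 200-byte feature on a capable edge device (0.3 s).
fast_edge = end_to_end_latency_s(200, UPLINK_BPS, extraction_s=0.3)

# Option 3: same extraction on an underpowered edge device (1.5 s).
slow_edge = end_to_end_latency_s(200, UPLINK_BPS, extraction_s=1.5)

print(f"raw: {raw_upload:.4f} s, fast edge: {fast_edge:.4f} s, slow edge: {slow_edge:.4f} s")
```

Under these assumed numbers, edge extraction wins only when the edge device is fast enough; on the underpowered device, the extraction delay outweighs the transmission savings.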

C. Cloud Layer

The Cloud layer serves as the framework's central component, connecting all other layers: User Interface, Business, Edge, and Perception. It offers a variety of interfaces through a set of APIs designed to accommodate different Business needs. This layer implements resource management APIs that manage resources across various layers, such as IoT devices in the Perception layer and data management in the Cloud layer. It also implements APIs for connecting, installing, and configuring IoT devices, as well as external APIs that integrate outsourced APIs, like Google APIs [39], with SensorsConnect.

Furthermore, IoT systems integrated into the framework possess computing clusters in the Cloud layer. These clusters allow IoT systems to execute multiple functions for different services on shared computing instances, facilitating collaboration between integrated IoT systems. These cloud functions include managing and controlling IoT devices in the Perception layer, connecting and collaborating with other integrated IoT systems, and carrying out prediction tasks. Thus, this setup enables collaboration between integrated IoT systems and encourages data sharing among IoT systems.

This layer deploys multiple databases to manage and maintain historical data, real-time/fresh data, and cache data. The primary role of storing real-time and historical data is to enable the system to respond to real-time queries, conduct data analysis, and build predictive models. The cache server optimizes the retrieval process to reduce the processing times for repeated queries.
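
A minimal in-memory sketch of how the three stores might cooperate is given below. It is not the paper's implementation: the `CloudStore` class, the time-to-live policy, and the invalidate-on-write rule are assumptions made for the example, and a real deployment would use dedicated database and cache servers.

```python
import time
from typing import Dict, List, Optional, Tuple


class CloudStore:
    """Toy model of the real-time, historical, and cache stores."""

    def __init__(self, cache_ttl_s: float = 5.0) -> None:
        self.realtime: Dict[str, Tuple[float, float]] = {}           # key -> (ts, value)
        self.historical: Dict[str, List[Tuple[float, float]]] = {}   # key -> [(ts, value)]
        self.cache: Dict[str, Tuple[float, float]] = {}              # key -> (cached_at, value)
        self.cache_ttl_s = cache_ttl_s
        self.cache_hits = 0

    def write(self, key: str, value: float, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self.realtime[key] = (ts, value)                          # latest reading
        self.historical.setdefault(key, []).append((ts, value))  # append-only history
        self.cache.pop(key, None)                                 # drop stale cached answer

    def query(self, key: str, now: Optional[float] = None) -> Optional[float]:
        ts = time.time() if now is None else now
        cached = self.cache.get(key)
        if cached is not None and ts - cached[0] <= self.cache_ttl_s:
            self.cache_hits += 1          # repeated query served from the cache
            return cached[1]
        latest = self.realtime.get(key)
        if latest is None:
            return None
        self.cache[key] = (ts, latest[1])
        return latest[1]


store = CloudStore(cache_ttl_s=5.0)
store.write("lot-1/free_spots", 12, now=100.0)
first = store.query("lot-1/free_spots", now=101.0)    # read from the real-time store
second = store.query("lot-1/free_spots", now=102.0)   # repeated query, cache hit
store.write("lot-1/free_spots", 9, now=103.0)         # new reading invalidates the cache
third = store.query("lot-1/free_spots", now=103.5)
print(first, second, third, store.cache_hits)
```

Invalidating the cache on every write keeps answers fresh at the cost of fewer cache hits; a time-to-live alone would be cheaper but could serve slightly stale readings.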

1) Challenges

SensorsConnect aims to respond to queries in real-time. However, the Cloud layer faces significant data challenges, including heterogeneity, lack of standard data modelling, data management, data availability, scalability, and unbalanced load during peak times.

Data challenges: The main challenge in the Cloud layer revolves around data management [40], which is a multi-dimensional issue encompassing the following aspects:
Heterogeneity: SensorsConnect shall support various services across different domains, such as transportation, healthcare, smart cities, smart buildings, and e-commerce, each with distinct data structures and requirements. This diversity leads to heterogeneous data within the SensorsConnect infrastructure. Finding a universal data model that accommodates all these data types is challenging. There are two primary data models: relational and non-relational. The non-relational data model is more suitable for SensorsConnect because it outperforms the relational model in key areas:
1. Flexible: Relational data models have predefined schemas, which work well for structured systems like banks, schools, or HR management. In contrast, the SensorsConnect architecture requires a non-relational data model due to its flexible schema, which can adapt to the evolving needs of the system.
2. Scalable: Non-relational models can easily scale horizontally (adding more instances or rows) and vertically (adding new attributes or columns). In contrast, relational models face challenges with vertical scaling due to their fixed schema defined during the design stage.
Availability: Data availability is vital in the Cloud layer, as users expect prompt responses from sensing devices. While data availability relies on low latency between the Perception and Edge layers, external factors like network conditions can also affect overall delays. If real-time data is unavailable, SensorsConnect can provide estimated responses based on historical data, which may not be accurate. Additionally, if the framework lacks sensing devices within the scope of a specific query, it can estimate values using data from similar available sensors or outsource data from external services not integrated within SensorsConnect.
Scalability: Data growth raises another challenge in the Cloud layer. The issue is not only storing large volumes of data but also managing and maintaining this data growth while ensuring a high quality of service and answering real-time queries from sensing devices.
Shareability: The framework aims to foster a collaborative environment where services can share data rather than repeatedly requesting the same data from user devices. A unified data model is essential to support collaboration between service providers. SensorsConnect encourages service providers to adopt UIDI by offering access to shared data from other services.
Barriers of Collaboration: Service providers within SensorsConnect are encouraged to share collected data. Additionally, SensorsConnect aims to identify potential services that do not yet exist but could fulfill emerging needs. However, discovering these new services and estimating their potential benefits is challenging.

D. Business Layer

The Business layer allows hosted entities, such as enterprises, organizations, and service providers, to access the framework and provides them with the necessary management tools. It consists of four components: master console, analytics dashboard, pricing tools, and business interface. The master console provides hosted entities access to their installed devices across different layers, including instances of cloud functions, databases in Cloud, and IoT devices in the Perception layer.

Also, the analytics dashboard offers entities interactive visualization capabilities to monitor the status of installed devices and view real-time query volumes and historical traffic, helping track any anomalous events. For example, in the points of sale for a specific service, the Analytics Dashboard can predict the required production volume and suggest customer offers based on analyzing historical customer data. Additionally, Business Interface provides Cross APIs to facilitate the integration of additional services if needed. For instance, if a service provider requires users to make additional interactions, such as booking parking spots or completing purchases, Cross APIs must route these requests to their system.

The Pricing Tool defines three pricing schemes: Pay-as-you-go (PAYG), On Demand (OD), and Spot Market (SM). It recommends one or multiple schemes based on business needs, tracks usage, and suggests pricing plan updates when necessary. Cloud computing [41] has become the fifth utility, after electricity, gas, telephony, and water. Thus, it is crucial to help business entities select suitable pricing schemes [42] that minimize expenses.

1) Challenges

Service Expandability: Some businesses may need to perform transactions, such as purchases or bookings, that are not directly supported by the framework, making service integration challenging due to the heterogeneity of services. For example, Google Assistant [43] directs users to the service provider’s application to handle such needs. SensorsConnect aims to offer service integration APIs that help entities integrate services requiring further actions after receiving real-time queries, so that interactions can be completed without directing users to external apps.
Legacy compatibility: Some entities may have legacy infrastructures incompatible with modern systems, complicating migration. For example, a business might have an existing traditional Relational Database Management System (RDBMS). SensorsConnect must provide transformation functions to facilitate the migration of such databases to Cloud.

E. User Interface Layer

The User Interface layer is the front end of the framework, where users can access its services. This layer provides a unified user interface for all integrated IoT systems, personalizes and manages user content, and discovers and recommends services based on user interests [44], [45]. It fulfills these roles using the following components.

The query handler enables users to access SensorsConnect services using multi-modal queries. Additionally, privileged users can access extended functionalities, which enable them to manage, maintain, monitor, and administer SensorsConnect services.

Additionally, the Personalization component delivers personalized services to users based on their interests and contexts. It also adapts the user interface according to user preferences. For instance, if a user prefers voice commands, the interface can be customized to be voice-friendly. This can include features such as a welcoming voice message to initiate a conversation with the user.

Also, the Context Manager holds all data describing the user and consists of three types of data:

User preferences are collected during user interactions. For example, a user might prefer riding buses instead of taking taxis.
Current context is predicted using sensing devices built into the user device. For instance, the system might predict that a user is in a vehicle by measuring their speed with the built-in Global Positioning System (GPS).
User habits are recognized by analyzing user activities over longer periods, such as a day, a week, or a month. For example, it might be observed that the user usually drinks coffee in the morning on their way to work.

Lastly, Service Discovery is responsible for discovering new services for each user by analyzing their behaviour, preferences, and context.
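The current-context example above (inferring that a user is in a vehicle from GPS speed) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 25 km/h threshold, and the two-fix speed estimate are hypothetical and are not taken from the SensorsConnect design.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed between two (lat, lon, unix_seconds) fixes."""
    dt = fix_b[2] - fix_a[2]
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    metres = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    return metres / dt * 3.6  # m/s -> km/h


def likely_in_vehicle(fix_a, fix_b, threshold_kmh=25.0):
    """Assumed rule: sustained speed above the threshold suggests a vehicle."""
    return speed_kmh(fix_a, fix_b) >= threshold_kmh
```

A real Context Manager would presumably smooth over many fixes and combine other sensors; a single pair of fixes is used here only to keep the sketch short.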

1) Challenges

Query understanding: This is a crucial aspect of the User Interface layer, as the efficiency in handling and interpreting queries directly impacts how quickly SensorsConnect can retrieve accurate responses. Query understanding is a core component of search engines and is typically measured by how quickly users get the results they intend to see. While search engines use sophisticated performance metrics and algorithms to rank results based on the likelihood of matching the query [46], [47], virtual assistants like Google Assistant, Siri, and Alexa use chatbot technology to handle user queries. Unlike search engines, which use crawling methods to find results [48], these assistants often use structured response methods and fixed patterns.
Recent advancements have seen improvements in chatbots, such as ChatGPT and Gemini, which generate responses more like human conversation. However, models like GPT-3 are trained on historical data and may not answer queries related to current events. To address this, tools like ChatGPT and Gemini have integrated external sources to provide responses based on the latest web data. SensorsConnect abstracts and structures IoT data using the unified interface UIDI, enabling the use of lightweight versions of large language models (LLMs) for query understanding. Retrieval-augmented generation (RAG) [49] is another method that provides contextual information to improve response accuracy. Nevertheless, RAG can be resource-intensive, especially with the need for frequent updates of real-time IoT data.

SECTION IV.

SensorsConnect Key Process

The SensorsConnect architecture operates through two main workflows: Collect-Store and Query-Respond. The Collect-Store workflow gathers data from IoT devices installed in public or private settings that are subscribed to SensorsConnect and stores this data for future or real-time access. The Query-Respond workflow manages user queries by first receiving requests via a unified interface, then retrieving relevant real-time sensing data, and lastly preparing a response tailored to user preferences.

A. Collect-Store

SensorsConnect implements a generic data pipeline with three stages: Data Collection, Data Preparation, and Data Storing.

1) Data Collection

Sensors in the Perception layer collect the states of the measured phenomena from the ambient environment. The data collection process involves:

Signal conditioning: The collected sensors’ signals may need some adjustment, such as noise filtration, signal amplification, and signal levelling.
Signal conversion: Sensors and transducers convert changes in physical phenomena into changes in electrical parameters, such as resistance, current, or voltage. Then, analog-to-digital converters turn the analog signals that capture the sensed phenomena into digital signals.
Signal Transmission: After having a digital version of the phenomenon of interest, sensing devices send the collected data to the Edge layer via heterogeneous APIs.
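The conditioning and conversion steps above can be illustrated with a small software analogue. This is a hedged sketch rather than the framework's firmware: in practice these stages usually run in analog hardware or on a microcontroller, and the moving-average filter, the gain, the 3.3 V reference, and the 10-bit resolution are all assumptions chosen for illustration.

```python
def moving_average(samples, window=3):
    """Noise filtration: mean over a trailing window of samples."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def amplify(samples, gain):
    """Signal amplification by a constant gain."""
    return [gain * s for s in samples]


def adc(voltage, v_ref=3.3, bits=10):
    """Analog-to-digital conversion: quantize a voltage in [0, v_ref]."""
    clamped = min(max(voltage, 0.0), v_ref)  # signal levelling to ADC range
    levels = (1 << bits) - 1                 # 1023 for a 10-bit converter
    return round(clamped / v_ref * levels)


def condition_and_convert(raw_volts, gain=2.0, window=3):
    """Filter, amplify, then digitize a sequence of raw sensor voltages."""
    filtered = moving_average(raw_volts, window)
    return [adc(v) for v in amplify(filtered, gain)]
```

The resulting integer codes are what a sensing device would then transmit to the Edge layer.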

2) Data Preparation

Data preparation consists of an Extract, Transform, and Load (ETL) process running at the Edge layer. Data extraction involves cleaning the collected raw data and identifying important features and information from the data streams. Then, data transformation reformats the extracted features, following a structured format. Lastly, data loading stores the formatted data in the real-time database.
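A minimal sketch of such an ETL step is shown below, assuming raw readings arrive as JSON lines. The field names (`id`, `val`, `ts`), the output schema, and the use of a plain dictionary as the real-time database are hypothetical choices for illustration; the paper does not specify them.

```python
import json
from datetime import datetime, timezone


def extract(raw_line):
    """Clean one raw reading; return None for malformed or incomplete input."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if not all(key in record for key in ("id", "val", "ts")):
        return None
    return record


def transform(record):
    """Reformat the extracted fields into one structured document."""
    return {
        "sensor_id": str(record["id"]),
        "value": float(record["val"]),
        "timestamp": datetime.fromtimestamp(
            record["ts"], tz=timezone.utc).isoformat(),
    }


def load(document, realtime_db):
    """Store the formatted document, keyed by sensor, in the real-time store."""
    realtime_db[document["sensor_id"]] = document


def run_etl(raw_lines, realtime_db):
    """Run extract -> transform -> load; return the number of loaded records."""
    loaded = 0
    for line in raw_lines:
        record = extract(line)
        if record is None:
            continue  # drop readings that fail cleaning
        load(transform(record), realtime_db)
        loaded += 1
    return loaded
```

Malformed readings are silently dropped here; a deployed pipeline would more likely log or quarantine them.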

3) Data Storing

Data storing in SensorsConnect revolves around keeping data in a structured manner that supports efficient storage and retrieval of real-time data, so that it is ready for users’ queries. When the data management component receives collected data, it saves it in the real-time database and moves the existing record to the historical database. With this approach, data management improves utilization efficiency by keeping the real-time database as light as possible and storing historical data in cold storage, given that historical data has a lower access rate and cost than real-time data.
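The save-and-archive behaviour described above can be sketched with two in-memory containers. The class and method names are hypothetical, and the dictionary and list merely stand in for the real-time database and the cold historical store.

```python
class SensorStore:
    """Keeps only the latest record per sensor in the real-time store."""

    def __init__(self):
        self.realtime = {}    # sensor_id -> latest record
        self.historical = []  # append-only archive (stand-in for cold storage)

    def save(self, sensor_id, value, timestamp):
        """Archive the existing record, then store the new one as current."""
        previous = self.realtime.get(sensor_id)
        if previous is not None:
            self.historical.append(previous)
        self.realtime[sensor_id] = {
            "sensor_id": sensor_id,
            "value": value,
            "timestamp": timestamp,
        }

    def latest(self, sensor_id):
        """Return the current record for a sensor, or None if unknown."""
        return self.realtime.get(sensor_id)

    def history(self, sensor_id):
        """Return archived records for a sensor, oldest first."""
        return [r for r in self.historical if r["sensor_id"] == sensor_id]
```

Because the real-time store holds one record per sensor, its size grows with the number of sensors rather than with the number of readings.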

B. Query-Respond

Users can submit queries to SensorsConnect, which responds using real-time data collected from sensing devices. When real-time data is unavailable, SensorsConnect utilizes machine learning and artificial intelligence algorithms to infer the needed information from historical data stored in the Cloud layer. Figure 2 illustrates the main processes in the Query-Respond cycle. SensorsConnect provides a graphical user interface (GUI) that consolidates integrated services, allowing users to customize their accounts to include only their preferred services. Additionally, users can submit requests through text and voice interfaces. For the voice and text interfaces, the Query-Respond cycle involves three key steps: query reception, query processing, and query retrieval.
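The fallback from real-time to historical data can be sketched as below. Note the substitution: the paper refers to machine learning and artificial intelligence algorithms for the inference step, whereas this sketch uses a plain mean of past values as a stand-in. The function name, the record layout, and the 60-second freshness limit are assumptions.

```python
from statistics import mean


def answer(sensor_id, now, realtime, historical, max_age_s=60):
    """Return (value, source) for a query issued at time `now` (seconds)."""
    record = realtime.get(sensor_id)
    if record is not None and now - record["timestamp"] <= max_age_s:
        return record["value"], "real-time"
    # Stand-in for the prediction model: average of archived values.
    past = [r["value"] for r in historical if r["sensor_id"] == sensor_id]
    if past:
        return mean(past), "estimated"
    return None, "unavailable"
```

Returning the source alongside the value lets the interface tell users when an answer is an estimate, which matters because estimates from historical data may not be accurate.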

FIGURE 2.

Query-respond process.


Query understanding is a vital component of web search engines. Search engines return ranked results that may match the user intent. The framework adopts similar approaches. However, it seeks to provide only one or two results that should match the user intent. Retrieving one or two results matching the user intent will be much easier in SensorsConnect than in a traditional web search engine, because the collected data is unified and stored in a structured format. In contrast, a web search engine uses crawling techniques over unstructured data from websites [48] to find matched results. The vast structured data collected from sensing devices creates a parallel web content space. This can potentially replace crawling and indexing techniques, which retrieve a list of ranked results to avoid misinterpretation of users’ queries. Thus, modern Natural Language Processing (NLP) techniques can retrieve only the desired response seamlessly. Figure 2 refers to this data space as the chatbot data cloud.

SensorsConnect deploys a chatbot agent to act as customer service to assist users in getting their services using text and voice queries. The query using the text interface goes through 3 processes [50]:

Tokenization splits a query into words to parse and perform Part of Speech (POS) tagging for each phrase in the sentences of the query.
Entity extraction identifies entities in a text, such as places, persons’ names, companies’ names, values, and percentages.
Intent matching uses the extracted entities to know the user intent using the training phrases and then sends the extracted entities, intents, and the query text to the LLM agent.

Then, using the extracted information alongside the query text, the LLM agent uses the available tools [51] in the server to retrieve the real-time sensing data required to generate the query response.
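The three text-query steps can be illustrated with a deliberately simple rule-based sketch. Everything here is an assumption made for illustration: the training phrases, the place and service vocabularies, and the word-overlap scoring are hypothetical, POS tagging is omitted, and intent matching is done on tokens rather than on the extracted entities. A production chatbot would rely on trained NLP models instead.

```python
import re

# Hypothetical training phrases per intent.
TRAINING_PHRASES = {
    "find_service": ["where can i get", "find the nearest", "closest"],
    "border_wait": ["how long is the wait", "border wait time"],
}
# Hypothetical entity vocabularies (a tiny gazetteer).
KNOWN_PLACES = {"toronto", "niagara"}
KNOWN_SERVICES = {"coffee", "parking", "border"}


def tokenize(query):
    """Split a query into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def extract_entities(tokens):
    """Pick out tokens that name known places or services."""
    return {
        "places": [t for t in tokens if t in KNOWN_PLACES],
        "services": [t for t in tokens if t in KNOWN_SERVICES],
    }


def match_intent(tokens):
    """Choose the intent whose training phrase shares the most words."""
    token_set = set(tokens)
    best_intent, best_score = None, 0
    for intent, phrases in TRAINING_PHRASES.items():
        score = max(len(token_set & set(p.split())) for p in phrases)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


def understand(query):
    """Bundle what would be handed to the LLM agent: text, entities, intent."""
    tokens = tokenize(query)
    return {
        "text": query,
        "entities": extract_entities(tokens),
        "intent": match_intent(tokens),
    }
```

The dictionary returned by `understand` mirrors the hand-off described above, in which the extracted entities, the intent, and the query text are sent on together.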

SensorsConnect integrates functions to convert speech to text on top of the text interface stack to support the voice query. These functions are as follows:

After the microphone gets the user’s voice, the signal passes through a pre-processing stage that includes noise filtration, amplification, and analog to digital conversion [52].
Automatic Speech Recognition (ASR) [53] converts the voice signal into text.
The recognized text is fed into the same cycle of the text query to generate a response in a text format.
This response in a text format passes through a text-to-speech converter based on the Wavenet model [54].
Finally, the generated signal [52] is converted to analog, filtered, and amplified before being emitted by the speaker.

SECTION V.

Motivating Scenario

SensorsConnect unlocks new possibilities by leveraging publicly available data to enhance existing services and introduce new ones. This section demonstrates how SensorsConnect can improve customer experience in recommendation engines by reducing service time—a scenario showcasing just one of its many applications. However, this example should not limit readers’ vision for how SensorsConnect could reshape the IoT landscape if widely adopted. Similar to the World Wide Web (WWW) [2], which began as a tool for sharing documents via hyperlinks and has since expanded far beyond that original purpose, SensorsConnect holds transformative potential for IoT.

By connecting sensing devices installed on service premises, SensorsConnect can enhance recommender engines. To illustrate, we compare SensorsConnect’s performance to Google Maps in a scenario where service recommendations are based on real-time occupancy data. Currently, Google Maps suggests service locations based on travel times, which consider distance and estimated road traffic. However, without occupancy data, the nearest location may not be the best choice, as it could be highly crowded, resulting in longer wait times.

In our hypothetical deployment, we assume that sensing devices at service locations are integrated into the SensorsConnect framework. For simplicity, we refer to the recommendation engine powered by SensorsConnect as SensorsConnectR, and Google Maps’ engine as GoogleR. With real-time access to occupancy data from these devices, SensorsConnectR can recommend services based on both travel time and occupancy, potentially reducing wait times for customers and balancing workloads across service points. The following section presents a statistical analysis evaluating the potential advantages of SensorsConnectR over GoogleR using datasets from two services: finding a drive-thru coffee shop and crossing the USA-Canada border.

A. Modeling The Motivating Scenario

Google Maps has dramatically evolved in the last decade, and one of its main services is providing users with live updates about traffic conditions. In a real-life scenario, a user searches by text, voice, or shortcut icons to find services, such as groceries, gas stations, or hotels. Google Maps indicates that a place is crowded by estimating the number of individuals present using data collected from their smartphones. However, GoogleR suggests the 20 nearest locations that match the user’s request and sorts them in ascending order by travel time without considering occupancy factors. Since GoogleR depends only on travel time, a user can spend a longer time getting their service if the closest place is relatively busier than the others. SensorsConnectR can utilize the sensing devices installed on service premises, such as surveillance cameras or door sensors, to report instantaneously how busy the services are. Then, SensorsConnectR sorts the suggested list using the travel time and the calculated wait time for each listed service. Thus, SensorsConnectR potentially enhances the user experience compared to GoogleR and distributes the customer load between service locations.

We have developed two algorithms to assess the performance of SensorsConnectR compared to GoogleR. Algorithm 1 illustrates how GoogleR works, simulating it using the Google developer APIs [39]. The procedures of GoogleR are as follows: First, the user writes and submits a query for a particular service they are interested in. Second, after the query is submitted, it goes to the query understanding [48] component to determine the user intent. Third, the system returns the 20 nearest matched places using the Nearby API [39]. Fourth, using the Distance Matrix API [39], GoogleR estimates the travel times based on traffic congestion for each route of the matched results. Equation 1 shows the formula to estimate the travel time $t_{i}$ for route $i$, where $v_{s}$ is the average speed for road segment $s$, $l_{s}$ is the length of road segment $s$, and $n$ is the number of road segments on the route.

$\begin{equation*} t_{i} = \sum _{s=1}^{n} \frac {l_{s}}{v_{s}} \tag {1}\end{equation*}$
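Equation 1 can be checked with a short worked example. The function name and the segment values below are hypothetical; lengths are taken in kilometres and speeds in kilometres per hour, so the result is in hours.

```python
def travel_time_hours(segments):
    """Equation 1: sum of l_s / v_s over the road segments of one route.

    segments: iterable of (length_km, avg_speed_kmh) pairs.
    """
    total = 0.0
    for length_km, speed_kmh in segments:
        if speed_kmh <= 0:
            raise ValueError("average speed must be positive")
        total += length_km / speed_kmh
    return total


# Hypothetical route with three segments, each taking 0.05 h (3 minutes).
route = [(2.0, 40.0), (3.0, 60.0), (1.0, 20.0)]
route_minutes = travel_time_hours(route) * 60  # about 9 minutes
```

A congested segment lowers $v_{s}$ and therefore raises its term in the sum, which is how traffic conditions enter the estimate.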

This function returns a travel time array, $T_{(m,1)}$, that contains the $m$ estimated travel times for the matched results.

$\begin{equation*} T_{(m,1)} = \begin{bmatrix} t_{1} \;\; t_{2} \;\; \ldots \;\; t_{m} \end{bmatrix}^{T} \tag {2}\end{equation*}$

Finally, GoogleR sorts the list of results in ascending order by travel time and presents them.

$\begin{equation*} R_{g}=Sort(T_{(m,1)}) \tag {3}\end{equation*}$


Algorithm 1 GoogleR

procedure Recommending based on travel time
    Get the user query
    Determine the user intent
    procedure Nearby Search [39]
        Get matched places' IDs for the 20 closest places
    end procedure
    procedure Distance Matrix [39]
        Calculate the travel time for each matched place
    end procedure
    Sort the places in ascending order by the travel time
end procedure

Algorithm 2 explains the procedures of SensorsConnectR. The first four steps are the same as in GoogleR; SensorsConnectR then adds a step that estimates the wait time for each matched place. This step returns the wait time array, $W_{(m,1)}$, where $c_{1}, c_{2}, \ldots, c_{m}$ and $s_{1}, s_{2}, \ldots, s_{m}$ are the numbers of customers and the average service times per customer, respectively, for places $1, 2, \ldots, m$.

$\begin{equation*} W_{(m,1)} = \begin{bmatrix} c_{1} s_{1} \;\; c_{2} s_{2} \;\; \ldots \;\; c_{m} s_{m} \end{bmatrix}^{T} \tag {4}\end{equation*}$

Algorithm 2 SensorsConnectR

procedure Recommending based on travel time and wait time
    Get the user query
    Determine the user intent
    procedure Nearby Search [39]
        Get matched places' IDs for the 20 closest places
    end procedure
    procedure Distance Matrix [39]
        Calculate the travel time for each matched place
    end procedure
    procedure Wait-time Matrix
        Estimate the wait time for each matched place
    end procedure
    Sort the places in ascending order by the service time
end procedure

Then, the algorithm calculates the overall expected service time, $T_{s}$ , for each place, where $T_{s}$ equals the estimated travel time plus the estimated wait time as shown in equation 5.

$\begin{equation*} {T_{s}}_{(m,1)} = T_{(m,1)}+ W_{(m,1)} \tag {5}\end{equation*}$

Finally, it sorts the results in ascending order by the estimated overall service time $T_{s}$ .

$\begin{equation*} R_{SensorsConnect}=Sort({T_{s}}_{(m,1)}) \tag {6}\end{equation*}$
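Equations 3 to 6 can be put side by side in a short sketch that ranks the same candidate places both ways. The `Place` class, the function names, and the three sample places are hypothetical, and the travel times are supplied directly instead of being fetched from the Nearby and Distance Matrix APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    travel_min: float   # t_i: estimated travel time (Equation 1)
    customers: int      # c_i: customers currently waiting
    service_min: float  # s_i: average service time per customer

    @property
    def wait_min(self):
        """Entry of W in Equation 4: c_i * s_i."""
        return self.customers * self.service_min

    @property
    def overall_min(self):
        """Entry of T_s in Equation 5: travel time plus wait time."""
        return self.travel_min + self.wait_min


def rank_google_r(places):
    """Equation 3: sort by travel time only."""
    return sorted(places, key=lambda p: p.travel_min)


def rank_sensorsconnect_r(places):
    """Equation 6: sort by overall service time."""
    return sorted(places, key=lambda p: p.overall_min)


# Hypothetical candidates: A is nearest but crowded.
places = [
    Place("A", travel_min=5, customers=10, service_min=3),  # overall 35
    Place("B", travel_min=9, customers=2, service_min=3),   # overall 15
    Place("C", travel_min=14, customers=1, service_min=3),  # overall 17
]
```

With these sample values, ranking by travel time puts A first, while ranking by overall service time puts B first, because the 30-minute queue at A outweighs its shorter trip.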

B. Drive-Thru Coffee Shops

Getting a coffee using the drive-thru during rush hours can take 15 to 20 minutes, causing customer dissatisfaction [55]. Customers often depend on their heuristic experience to find the least busy location to get their morning coffee, but sometimes their choices may not be optimal. To find the drive-thru in the downtown Toronto area with the fastest service time, SensorsConnectR can utilize vehicle counting sensors, or extract the count from the surveillance cameras installed to monitor the drive-thru lines at coffee shops, to calculate the expected wait time to serve customers. Google Maps estimates the current occupancy of places using the presence of user devices [56]. Its popular times field presents the average hourly occupancy of a business over the most recent six weeks for each day of the week. To be more realistic, we scraped popular times data$^{2}$ from Google Maps for real coffee shops in the downtown Toronto area using a Python script$^{3}$ with the Selenium API [57].

Assume a user queries where they can get a coffee. Figure 3 illustrates a scenario in which a user relying on GoogleR spends more time than one relying on SensorsConnectR. The location recommended by GoogleR, which considers only travel time, has a service time of about 35 minutes. In contrast, the location recommended by SensorsConnectR, which takes into account both travel time and wait time, has an approximate service time of 15 minutes. Therefore, SensorsConnectR utilizes the installed devices in drive-thru coffee shops alongside the trip time to recommend the optimum place with the lowest overall service time. We utilize the Nearby API and Distance Matrix API [39] to locate nearby drive-thru coffee shops and determine the travel times from the user’s location to the nearby coffee shops. To assess the performance of SensorsConnectR compared to GoogleR, we calculated the average time spent by 20 different users at 20 locations getting coffee during the weekday open hours, using both recommenders. We assumed that the popular times value divided by 10 equals the number of customers and that the average service time for the drive-thru option is between 2 and 4 minutes [58].
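The two modelling assumptions above translate into simple arithmetic. The sketch below shows only the bounds they imply; the function names and the sample popularity value of 80 are hypothetical, and how the study picked a service time within the 2 to 4 minute range, or whether it rounded the customer count, is not stated in the text.

```python
def estimated_customers(popularity):
    """Stated assumption: popular times value divided by 10."""
    return popularity / 10


def wait_time_range_min(popularity, service_min=(2.0, 4.0)):
    """Lower and upper wait-time bounds, in minutes, for one location."""
    customers = estimated_customers(popularity)
    return customers * service_min[0], customers * service_min[1]


# A location with a popular times value of 80 implies 8 customers,
# so the wait is between 16 and 32 minutes under these assumptions.
low, high = wait_time_range_min(80)
```

Adding the travel time to such a wait estimate gives the overall service time used to rank locations.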

FIGURE 3.

A drive-thru query in downtown Toronto shows that the nearest drive-thru could have a longer wait time.


Figure 4 shows the average time spent getting coffee on weekdays using the two recommenders. Compared to GoogleR, our recommender reduces the average daily service time by 37% to 59%, depending on the weekday, with an overall average reduction of around 46%. The curves in Figure 5 represent the average service times using the two recommenders on weekdays (Monday to Friday) during working hours. These curves generally show that the coffee shops are significantly crowded from morning until noon. If all the nearby coffee shops are empty, the closest one would be the best choice, as travel time would be the dominant factor in the overall service time calculation. Thus, the SensorsConnectR and GoogleR curves converge in non-busy times and diverge in peak times. Figure 6 shows the average time taken to get served on weekends (Saturday and Sunday) during working hours. Because most people do not work on the weekend, the coffee shops are less crowded than during the working days, with only two peak operating hours. Thus, on weekends, the two recommenders have, on average, lower overall service times than on working days. These two line graphs indicate that SensorsConnectR is most effective during peak hours, reducing average service times by 81.8

FIGURE 4.

The average service times for drive-thru during weekdays.


FIGURE 5.

The average service times for drive-thru over the working hours during working days.


FIGURE 6.

The average service times for drive-thru over the working hours during weekends.


C. USA-Canada Border Crossing

The USA-Canada border is one of the busiest border crossings in the world. We conducted the study using datasets of historical border wait times at the USA-Canada borders [59]. The flow of non-commercial travellers [59] is more likely to experience long waiting times than the commercial flow. Therefore, we conducted this study to assess how SensorsConnectR could enhance the border-crossing experience for the busier flow and evenly distribute travellers at borders, assuming that real-time traffic data is available from the public sensors installed at the USA-Canada borders. The study includes seven USA-Canada borders around the Niagara area, and we selected 20 distributed locations inside the USA near Niagara Falls to represent different travellers’ queries. As in the drive-thru coffee shop study, we used the Google Distance Matrix API to estimate trip durations for these 20 locations, measuring cross-border times 24/7 using the two recommenders, given the wait times at the borders.

The multi-bar chart in Figure 7 shows the average service time for both recommenders during the weekdays. As mentioned in the coffee shop case study, if the occupancy factors of places are equal, both recommenders suggest the nearest location. Since most of the borders are empty during most working hours on working days, the two recommenders’ average wait times are almost identical except on Friday. According to [60], the traveller flow is typically heavy on Fridays and weekends. Thus, using SensorsConnectR on these days can be effective and minimize the total travel time, as a border farther from the query location becomes more likely to have a lower waiting time than a closer one. Since SensorsConnectR can detect unoccupied borders during rush hours, it can reduce crossing times, which can be crucial for many travellers. Figure 8 shows the average time required to cross the border using the two recommenders for the 20 travellers throughout the day on working days. The recommenders often suggested the same border from 12 am to 1 pm since the nearest border for the chosen 20 locations is empty. Similarly, Figure 9, which presents results for the weekends, shows that both recommenders typically selected the nearest border from 4 am to 11 am, a shorter period than the unoccupied period on working days, given that the weekends are busier. In the remaining period, SensorsConnectR generally provides suggestions with lower average crossing times at the border compared to GoogleR. In conclusion, this analysis indicates that using SensorsConnectR in the border crossing study reduces travellers’ times by 31 percent on average during rush hours. In other words, SensorsConnectR can help flatten the travel time curve at peak hours by distributing travellers based on border occupancies to avoid creating congestion zones.

FIGURE 7.

The average service times for USA-Canada borders during weekdays.


FIGURE 8.

The average service times for USA-Canada borders over the working hours during working days.


FIGURE 9.

The average service times for USA-Canada borders over the working hours during weekends.


SECTION VI.

SensorsConnect Open Research Issues

The adoption of a new framework by the IoT community is a key dilemma. Thus, it is crucial to define open research issues in SensorsConnect, which must have clear directions and objectives to keep the IoT community informed and ensure the project's continuity. This section defines the research paths researchers can pursue to contribute to SensorsConnect. The Sankey graph shown in Figure 10 depicts research paths for each layer of the SensorsConnect architecture.

FIGURE 10.

The research paths.
