Unlocking Edge Intelligence Through Tiny Machine Learning (TinyML)

Machine Learning (ML) on the edge is key to enabling a new breed of IoT and autonomous system applications. The departure from the traditional cloud-centric architecture means that new deployments can be more power-efficient, provide better privacy, and reduce inference latency. At the core of this paradigm is TinyML, a framework that allows the execution of ML models on low-power embedded devices. TinyML allows pre-trained ML models to be imported to the edge, providing ML-as-a-Service (MLaaS) to IoT devices. This article presents a TinyMLaaS (TMLaaS) architecture for future IoT deployments. The TMLaaS architecture inherently presents several design trade-offs in terms of energy consumption, security, privacy, and latency. We also present how the TMLaaS architecture can be implemented, deployed, and maintained for large-scale IoT deployments. The feasibility of implementing the TMLaaS architecture is demonstrated with the help of a case study.

A. MOTIVATION

The symbiosis of AI and the Internet-of-Things (IoT) is natural. IoT provides a perception layer for many smart-city applications, and ML augments sensing capabilities by deriving intelligence from data collected through this perception layer. A typical IoT device will have a micro-controller unit (MCU), potentially with an integrated low-power radio transceiver for wireless connectivity. Each MCU is furnished with a limited amount of memory and is typically designed to operate on a coin cell for several years. These low-cost IoT devices have a small footprint, as they are often designed to be unobtrusive. In the current IoT architecture, data from the end-nodes/IoT devices is typically transmitted and aggregated at the gateways. The transmission of data is supported by a variety of connectivity technologies, ranging from unlicensed WiFi and Low-Power Wide-Area Networking (LPWAN) technologies such as LoRa and Sigfox to licensed cellular radio, e.g. LTE for Machines (LTE-M) and Narrow-Band (NB) IoT. The gateways are then connected to cloud platforms via a broadband internet connection, which allows storage, processing and inference on the collected data.

The current architecture not only incurs a power cost to facilitate wireless connectivity but also introduces non-deterministic delay. Fig. 2 shows the probability density function of the latency of a Message Queuing Telemetry Transport (MQTT) packet published by an IoT device, e.g. a temperature sensor, for a subscriber such as a remote monitoring tablet. Each packet is 140 bytes in size and latency is calculated at the application layer. It is obvious that there is a minimum fixed latency (which itself is a function of the load on the server, time of day, etc.) and a power-law distribution for the random delay.
This scenario does not include the time taken to perform ML to derive inference. However, it shows that the round-trip latency of cloud-centric solutions is unpredictable. Therefore, it is highly desirable to move the inference capabilities to the edge.

Another disadvantage of the cloud-centric architecture is that raw data from the sensor has to traverse to the cloud via the gateway and the internet. Any compromise in security leads to privacy violations. Also, there is no transparency on how data is being used and which applications are authorized to use it. Moving inference closer to the user, i.e. to the edge, can provision an architecture whereby raw data is never transmitted to the cloud. Only inferences derived from the raw data are transmitted to the cloud and stored for application-specific actions.

Lastly, in the current architecture gateways need to be connected to the internet. Provisioning such connectivity incurs both capital expenditure (CAPEX) and operational expenditure (OPEX), with management overheads associated with maintaining infrastructure. Local inference capabilities reduce the reliance on connectivity, i.e. they allow the provision of services in areas where internet connectivity is intermittent or does not exist at all.
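The latency behaviour described above (a fixed minimum plus a power-law-distributed random delay) can be reproduced with a small simulation. The base latency and Pareto shape parameter below are illustrative assumptions, not the measured values behind Fig. 2.

```python
import random
import statistics

# Illustrative model of cloud round-trip latency: a fixed minimum
# service time plus a power-law (Pareto) distributed random delay.
# Both parameters are assumptions chosen for illustration only.
BASE_LATENCY_MS = 40.0   # hypothetical fixed minimum latency
PARETO_ALPHA = 1.8       # hypothetical tail index of the random delay

def simulated_rtt_ms() -> float:
    """One simulated MQTT publish-to-subscribe round trip in ms."""
    return BASE_LATENCY_MS + random.paretovariate(PARETO_ALPHA)

random.seed(0)
samples = sorted(simulated_rtt_ms() for _ in range(10_000))
median = statistics.median(samples)
p99 = samples[int(0.99 * len(samples))]
print(f"median: {median:.1f} ms, 99th percentile: {p99:.1f} ms")
```

The heavy tail makes the 99th percentile far exceed the median, which is precisely the unpredictability that motivates moving inference to the edge.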

The goal of this article is to provide a comprehensive overview of the TinyML-as-a-Service (TMLaaS) architecture and to demonstrate its usefulness through a practical case study. We also outline practical considerations in implementing the proposed architecture.

A two-stage neural architecture search approach has been proposed that first optimizes the search space to fit the resource constraints, and then specializes the network architecture within the optimized search space. This allows the model to adapt to different QoS and resource constraints. Tree-based learning approaches [12] have been studied in the context of reducing computational load and storage requirements. Such models offer acceptable accuracy; however, they are limited in terms of their scalability. Hybrid approaches such as [13] combine DS-CNN neural architectures and tree-based learning to offer solutions that attain accuracy by using neural networks for feature extraction and computational efficiency by using a shallow Bonsai decision tree to perform the classification. A TinyML framework can exploit neural architecture search (NAS) to design models that can meet the stringent MCU memory, latency, and energy constraints. An example of such an approach can be found in [14].

The rest of the paper is organized as follows: In Section II, we present a brief account of existing ML approaches and the literature on implementing these models on resource-constrained MCUs. In Section III, the design and implementation challenges of interoperability and performance characterization to unlock the full potential of ML solutions are discussed, and the TFLite Micro-based design and workflow implementation of TinyML solutions is detailed. Section IV explains the MLaaS architecture in detail along with its operational cycle, termed MLOps.
Section V presents the ease of development and deployment of the proposed TMLaaS architecture with the help of a simple case study. In Section VI, emerging ML modalities, road maps and open issues are discussed, with a focus on Transfer Learning and Federated Learning. Section VII concludes the paper, while Section VIII details the future work.

TinyML is an emerging field at the intersection of embedded systems and ML. Effectively, TinyML provides tools to develop ML models which can be executed on resource-limited devices. The process flow for TinyML deployment starts with the collection of data from the hardware where an inference engine is required. The data can either be logged on onboard storage or directly imported into user-friendly tools, e.g. Edge Impulse Studio.

A typical TinyML workflow will include training of the ML model using the TensorFlow framework on a high-performance computer (e.g. using TFLite within a Jupyter Notebook) or on a cloud server (e.g. using Google Colab). The TensorFlow model is then converted to the TFLite format using a converter. The converted model can then be exported as a C byte array. On the MCU, the application utilizes the TinyML library, which exposes the OpResolver application programming interface (API). In contrast to TensorFlow, TFLite Micro supports only a limited number of operations for implementing NNs. The application developer utilizes OpResolver to specify which operators need to be linked into the final binaries of the IoT firmware. This in turn minimizes the file size of the compiled firmware. Within the firmware, a contiguous chunk of memory, the ''arena'', needs to be allocated for the model's input, output, and intermediate tensors.

In order to understand the operation of ML-based embedded applications, it is important to understand the ML systems development life cycle and aspects of continuous high-quality operation. This corresponds to a complete ecosystem of ML-embedded solutions from development through to delivery, including system construction, integration, testing, releasing, deployment and infrastructure management [21].
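Returning to the workflow above, the final export step (turning the converted TFLite flatbuffer into a C byte array, conventionally done with `xxd -i`) can be sketched as follows. The helper name `to_c_array` and the placeholder blob are illustrative; in practice the input would be the real converted model file.

```python
def to_c_array(model_bytes: bytes, name: str = "g_model") -> str:
    """Format a binary TFLite model as a C byte array, similar to `xxd -i`."""
    lines = [f"const unsigned char {name}[] = {{"]
    # Emit 12 bytes per line as hex literals.
    for i in range(0, len(model_bytes), 12):
        chunk = model_bytes[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(model_bytes)};")
    return "\n".join(lines)

# Placeholder blob standing in for a converted .tflite model:
header = to_c_array(b"\x1c\x00\x00\x00TFL3", "g_model")
print(header)
```

The resulting array can be compiled directly into the IoT firmware, which is how the model reaches the MCU without any file system.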
Deriving from this definition, MLOps is a set of engineering practices that aim at unifying ML system development and operation with the objective of ensuring continuous integration (CI), continuous development (CD) and continuous testing (CT). In principle, MLOps is the equivalent of DevOps, i.e. a combination of practices and tools that increases the ability of the software provider to deliver applications and services at a rapid pace. The following steps provide a footprint of activities that may be used as a reference classification for MLOps: (i) Sampling: For embedded applications, it is of utmost importance to have samples of data arriving at an accurate rate for processing. This is particularly important for real-time embedded applications using data from multiple sources; (ii) Analysis: This involves the classification of data based on the model schema and its requirements. It also includes the metadata, which can be of higher value than the raw data and is hence a key enabler for federated learning; (iii) Structure: The data needs to be formatted in accordance with the ML model input structure. It needs to be partitioned into training and validation sets; (iv) Training: The TinyML model is trained using the structured data; (v) Evaluation and Validation: The model is evaluated and validated against test data and benchmarks; (vi) Deployment: The validated model is deployed; (vii) CI: This refers to multiple integrations on a daily basis. These can include algorithmic as well as visualization/firmware updates; (viii) CD: Based on the CI, development is also continuous, with updates based on changing application dynamics or algorithms. New models can be developed and trained.
Network bandwidth, latency and privacy are key aspects of TinyML performance that may require a CD approach to guarantee QoS; (ix) CT: Continuous validation and testing of the updated models.
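Step (iii), Structure, can be sketched as a deterministic partition of the formatted data into training and validation sets. The 80/20 ratio and fixed seed below are illustrative choices; in an MLOps pipeline they would be part of the tracked configuration.

```python
import random

def partition(samples, validation_fraction=0.2, seed=42):
    """Shuffle and split samples into (training, validation) sets.

    A fixed seed keeps the split reproducible across pipeline runs,
    which matters for the continuous-testing (CT) step.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = partition(range(100))
print(len(train), len(val))  # 80 20
```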

In this section, we aim to demonstrate the ease of development and deployment of the proposed TMLaaS architecture with the help of a simple case study. The case study showcases how some of the features previously outlined can be implemented in practice, as well as presenting a benchmark of different ML models on the hardware. Through this case study, we explore the feasibility of implementing the most commonly used classical classification and deep neural network (DNN) techniques.

The implementation of the case study in terms of hardware and software components is shown in Fig. 5. We employ a NodeMCU (ESP8266) as an IoT device connected over an I2C interface to the IMU chip (MPU-6050). The ESP8266 is set up as a WiFi client device and streams the six raw features from the IMU to the gateway. The gateway firmware scans for the data but also provides a sub-set of the functionalities outlined in Fig. 3. For the gateway, we employ the HELTEC LoRa Wireless Stick. The gateway has dual connectivity, i.e. WiFi and LoRa. The gateway implements: i) Device Manager: handles the data streamed by all devices, and also manages connectivity between devices and the gateway. ii) Lightweight Web Server: hosts a web application written using React, a JavaScript framework. The application provides functionality to provision either WiFi or LoRa connectivity to the cloud. The web application also provides visualization of the outcome of the local inference process on a per-device basis, and interacts with the web server, which is fundamentally a part of the firmware, through a RESTful API. iii) TMLaaS Engine: implemented in the firmware using the EloquentTinyML wrapper on TFLite. This allows the execution of TFLite on the ESP32 MCU present on the HELTEC board.
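As a rough illustration of the Device Manager's role (tracking each device's latest feature vector and local inference result so the web application can visualize them per device), consider the following sketch. All class and method names here are hypothetical and do not reflect the actual firmware API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DeviceRecord:
    features: List[float] = field(default_factory=list)  # latest IMU sample
    inference: Optional[str] = None                      # latest local result

class DeviceManager:
    """Tracks per-device state on the gateway (hypothetical sketch)."""

    def __init__(self) -> None:
        self._devices: Dict[str, DeviceRecord] = {}

    def ingest(self, device_id: str, features: List[float]) -> None:
        """Store the six raw IMU features streamed by a device."""
        self._devices.setdefault(device_id, DeviceRecord()).features = features

    def record_inference(self, device_id: str, label: str) -> None:
        """Attach the TMLaaS engine's latest classification for a device."""
        self._devices[device_id].inference = label

    def snapshot(self) -> Dict[str, DeviceRecord]:
        """Per-device view, e.g. for the web application's visualization."""
        return dict(self._devices)

mgr = DeviceManager()
mgr.ingest("esp8266-01", [0.1, 0.2, 9.8, 0.0, 0.0, 0.01])
mgr.record_inference("esp8266-01", "idle")
print(mgr.snapshot()["esp8266-01"].inference)
```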
The implementation process for TMLaaS functionality comprises three steps. First, we collect the training data to train the classifiers offline.

Although the proposed TMLaaS architecture is itself in its infancy, there are two key ML modalities and relevant open challenges that need to be addressed by the community. In our subsequent discussion, we outline what these ML modalities are and highlight their importance for enabling the pervasive deployment of the TMLaaS architecture.

A typical TinyML workflow involves training a model using high-performance computing platforms and then compiling it through the TinyML framework so that it can be interpreted on the edge device for making an inference. As an example, the model recently developed by OpenAI to solve the Rubik's Cube not only required 1K desktop computers with several graphical processing unit (GPU) accelerators but also consumed 2.8 Gigawatt-hours of electricity. The evolving trajectory to expedite TMLaaS at the edge is to supply pre-trained models through a repository (e.g. TensorFlow Hub). These pre-trained models can then be imported by edge devices for inference. It is not always possible to train and re-train to build TinyML models because (a) training on large datasets is computationally expensive; and (b) the input data for certain use cases is not available, although trained models often are. Additionally, Transfer Learning (TL) provides the capability of applying knowledge gained while solving one problem to different but semantically similar problems. Therefore, a pre-trained model for a semantically similar task can easily be downloaded from the hub/repository to the edge device and then employed for inference. TL scenarios may be Inductive (i.e., source and target problem domains are the same but the tasks are different), Unsupervised (i.e., a lack of labeled data in the target domain), or Transductive (i.e., semantic similarities in the source and target problems but different domains for the inputs). Since the TMLaaS architecture is still in its infancy, the adoption of on-the-fly pre-compiled ML models through TL remains an open research area.
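The reuse pattern behind TL can be illustrated with a toy, dependency-free sketch: a one-parameter logistic model is trained on a data-rich source task, and only its bias (the "head") is re-trained on a few target samples while the learned weight stays frozen. The data, learning rate, and `freeze_w` flag are all illustrative assumptions, not part of any TinyML framework API.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(data, w, b, lr=0.1, epochs=50, freeze_w=False):
    """Logistic regression via stochastic gradient descent on (x, y) pairs.

    With freeze_w=True only the bias is updated, mimicking a frozen
    feature extractor whose head is re-trained on the target task.
    """
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w * x + b) - y
            if not freeze_w:
                w -= lr * err * x
            b -= lr * err
    return w, b

# Source task: plenty of labeled data, decision threshold at x = 0.
random.seed(1)
source = [(x, 1.0 if x > 0 else 0.0)
          for x in (random.uniform(-3, 3) for _ in range(200))]
w_src, b_src = train(source, w=0.0, b=0.0)

# Target task: only five labels, same direction but threshold at x = 1.
target = [(x, 1.0 if x > 1 else 0.0) for x in (-1.0, 0.0, 0.5, 1.5, 2.0)]
w_tl, b_tl = train(target, w=w_src, b=b_src, freeze_w=True)

print(f"weight shared: {w_tl == w_src}; bias {b_src:.3f} -> {b_tl:.3f}")
```

The point of the sketch is only the mechanics: the transferred parameter is reused unchanged while a small amount of target data adapts the remainder, which is what makes TL attractive when edge devices cannot afford full retraining.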
As a part of the future work in this area, several challenges need to be addressed for both TL and FL-based approaches before they can play their significant role in unleashing wide-scale penetration of TMLaaS solutions. The three key research challenges in the TL area are: a. How can the edge device evaluate the trustworthiness of a pre-trained model, i.e., how can an edge device be sure that the source domain data set was representative and did not have any biases? b. How can we build a trustworthy dissemination protocol for sharing retrained TL models between edge devices? c. The third and most important design question is how we develop an interpretable TL model, i.e., a transition towards explainable ML models for the TMLaaS architecture. From the FL perspective, some of the important design questions include: a. TinyML minimizes the energy cost paid in the transmission of data to the cloud by allowing edge execution of the ML model. However, for federated averaging at least the local updates of the model need to be transmitted to the cloud. While this does not compromise privacy, it has energy and connectivity costs. A typical FL update workflow requires bandwidth that far exceeds the capability of current LPWAN technologies. Therefore, it is these trade-offs that will dictate when and how these two frameworks should be utilized in conjunction. b. It is possible to compress FL models using ML compression techniques. TinyML frameworks, e.g. TensorFlow Lite, need to explore how communication-efficient model compression can be accomplished for FL.
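The federated-averaging step mentioned above can be sketched in a few lines: each device's locally updated weights are averaged, weighted by the size of its local data set. The updates and data-set sizes below are illustrative values for three hypothetical edge devices.

```python
from typing import List

def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """FedAvg: combine client models, weighting each by its local data size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical edge devices with different amounts of local data.
updates = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
sizes = [100, 100, 200]
global_model = federated_average(updates, sizes)
print(global_model)  # approximately [0.35, 0.45]
```

Each round of this exchange transmits full (or compressed) weight vectors rather than raw data, which is exactly the bandwidth cost that strains LPWAN links.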

In summary, both TL and FL will play an instrumental role in the proliferation of the TMLaaS architecture. The architecture itself, when viewed through the lens of these ML modalities, provides a rich design space for further research. We hope that the highlighted design issues will trigger community interest in exploring some of these open research areas further.