StateOS: A Memory-Efficient Hybrid Operating System for IoT Devices

The increasing significance of operating systems (OSs) in the development of the Internet of Things (IoT) has emerged in the last decade. An event-driven OS is memory efficient and suitable for resource-constrained IoT devices and wireless sensors, although the program’s control flow, which is determined by events, is not always obvious. A multithreaded OS with sequential control flow is often considered clearer. However, this approach is memory consuming. A hybrid OS seeks to combine the strengths of the event-driven approach with multithreaded approach. An event-driven cooperative threaded OS represents a hybrid approach that supports concurrency by explicitly yielding control to another thread. Although this approach is memory efficient, as cooperative threads are not preemptive, it may not provide sufficient real-time performance. This article proposes a memory-efficient hybrid OS, called StateOS, for resource-constrained IoT devices. It is an event-driven cooperative threaded OS with partial real-time performance. StateOS implements a hybrid task scheduler that combines two cooperative threaded subsystems as kernel processes on a priority-based preemptive scheduler. This approach provides adequate real-time performance for IoT devices at a low memory cost.


I. INTRODUCTION
Internet of things (IoT) research has been very active over the past decade.This technology is expected to change people's daily lives by becoming part of the surrounding ambient objects [1].In 2020, the number of IoT connections exceeded that of non-IoT connections for the first time by 12 billion [2].
Most of the deployed IoT devices are based on wireless sensors.These devices share similar restrictions as the nodes of wireless sensor networks (WSN), such as restricted resources, distant deployment, unreliable network connections, and dynamic network topology.Therefore, existing WSN operating systems (OSs), such as TinyOS [3] and Contiki [4], are also utilized in IoT devices.
The typical OS for resource-constrained IoT or WSN devices supports either an event-driven or a thread-based programming model.In an event-driven model, programs are collections of event handlers, and the execution of an event handler is triggered by events.This approach is well-suited for data-centric IoT applications.Event-driven OSs are memory efficient and, thus, attractive for use in resource-constrained IoT platforms.However, programming a complex system with an event-driven model may be challenging because of the manual control of the stack, the lack of blocking functions, and the events that determine the flow of the program [5], [6], [7], [8].
A thread-based model allows for sequential control flows of a thread.This model is attractive from a programmer's perspective, as the programming pattern is intuitive for the human mind.A typical multithreaded OS manages concurrent threads with preemptive task scheduling, in which the execution of a thread can interleave.Preemptive task scheduling offers certain advantages, such as automatic task switching and automatic stack management.In a preemptive multithreaded OS, each thread requires individual stack memory allocation.This entails a memory consumption problem in resource-constrained devices that may compromise their overall performance.
A hybrid model is a compromise solution for a memoryefficient, multithreaded OS.Many previous proposals merged event-driven systems and multithreaded systems in different combinations to obtain a balance between memory consumption and performance.Cooperative threaded programming, exemplified by Protothreads [5], is a hybrid model that supports cooperative threads in an event-driven system.These threads are specifically programmed to voluntarily hand over processor control to another thread at the yield point to enable concurrency between the threads.However, an event-driven cooperative threaded system can have problems with real-time requirements [7] because a time-sensitive task cannot obtain processor control until the current task reaches the yield point.
This study contributes to the literature by proposing a memory-efficient hybrid OS, StateOS, that offers an adequate real-time performance.StateOS implements macro-based application interfaces for programming event-driven WSN applications in a threaded fashion, a hybrid task scheduler that supports cooperative and preemptive task management, a hybrid memory management module that can alleviate fragmentation problems, semi-automated stack management interfaces for cooperative task management, and cross-layer network architecture to reduce communication overheads.With these features, this approach provides a memory-efficient OS for resource-constrained wireless devices to support increasingly complex IoT tasks.
StateOS is based on a cooperative threaded programming approach and a hybrid task-scheduling solution.The OS implements a hybrid task scheduler to support cooperative and preemptive task scheduling.A priority-based preemptive context switcher manages two kernel processes with different priorities.This allows the process with higher priority to preempt the other.In both kernel processes, a cooperative task scheduler is implemented to manage the threads cooperatively.As a result, this hybrid approach can provide adequate real-time performance.
StateOS is oriented toward resource-constrained IoT and WSN devices.Therefore, it is designed to be a lightweight and modularized system that utilizes a microkernel architecture.In addition, the network protocol structure follows a cross-layer design to reduce the memory cost of network communication.
The remainder of this paper is structured as follows.Related research is reviewed in Section II.The system architecture and the kernel are proposed in Sections III and IV, respectively.A code example is shown in Section V, and the platform implementations are presented in Section VI.An evaluation of the proposed approach is given in Section VII.Finally, discussion and conclusion are provided in Section VIII.

II. RELATED WORK
Existing OSs for resource-restricted systems, such as WSN and IoT devices, support an event-driven programming model, a multithreaded programming model, or a hybrid model that combines event-driven and multithreaded models.
TinyOS was one of the earliest OSs to address the unique restrictions of WSN devices.It follows monolithic kernel architecture that is efficient but challenging to understand and maintain.Contiki applies a modularized design.The kernel implements an event scheduler that dispatches the event to the executing task.OpenWSN is an event-driven OS that focuses on providing network stack services.It retains a simple system design with a basic monolithic kernel architecture.SOS implements a modularized system structure using weakly linked components.The interactions between these components are accomplished by event-driven messages.
The event-driven programming style can be challenging [7], [11] due to associated programming difficulties, such as eventdetermined control flows and manual stack management.On the other hand, OSs that support a multithreaded programming model provide the programmer with a more familiar programming experience for sequential flow control, proactive task management, and automatic stack management.Typical examples of multithreaded OSs are MANTIS OS [12], RIOT [13], FreeRTOS [14], Zephyr [15], and Mbed OS [16].
MANTIS OS is a multithreaded OS designed for WSN microsensor platforms.It implements a layered architecture based on a lightweight preemptive kernel.RIOT is a multithreaded OS that aims to provide a Linux-like programming experience.It has a microkernel architecture with a preemptive scheduler.FreeRTOS is a popular real-time OS for small embedded systems and has been ported to IoT platforms.It supports multithreaded programming using a preemptive task scheduler.In contrast to the OSs above that support traditional IoT platforms with 8-bit microcontroller units (MCUs), Zephyr and Mbed OS, by default, are used on platforms with 32-bit MCUs [15], [16].Zephyr has two kernel implementations: a microkernel for less-constrained devices and a nanokernel for resource-limited devices.It supports multithreaded programming through different strategies, including prioritybased, cooperative, earliest-deadline-first, preemptive, and nonpreemptive scheduling.Mbed OS is a preemptive multithreaded OS that supports real-time software execution.It implements a kernel based on CMSIS-RTOS RTX [17] which is designed for Cortex-M processor-based platforms.
For certain resource-constrained devices, multithreaded OSs are heavyweight.However, complex IoT applications still prefer a multithreaded OS if the device supports it.Many programmers find a multithreaded OS to be more familiar and clearer for programming than an event-driven OS.
A hybrid OS is a compromise approach that combines eventdriven and multithreaded systems.It aims to provide a memoryefficient OS with a thread-based programming style.Previous studies have attempted to create a balance between resource consumption and performance by assembling event-driven and multithreaded systems in different approaches.
One hybrid approach, exemplified by Protothreads [5] and TinyThreads [6], implements cooperative threaded APIs in an event-driven system.Event-driven tasks are explicitly programmed to perform yield operations as cooperative threads.Protothreads provide macro-based abstractions and allow a thread to yield when performing blocking operations.A protothread is stackless, so it is memory efficient but requires manual stack management.TinyThreads is a library extension of TinyOS.It supports cooperative threads by implementing a cooperative scheduler within a TinyOS kernel task.TinyThreads allocates individual thread stacks, which makes them heavier than Protothreads.This approach does not support real-time performance because there is no preemption between the threads.
TinyMOS implements a TinyOS subsystem in the primary thread of a MANTIS OS kernel.The subsystem manages eventdriven tasks that can spawn slave threads to perform long-term operations.TOSThreads is the official designated multithreaded solution for TinyOS.It has a preemptive thread scheduler that maintains TinyOS as a subsystem in a high-priority thread.Long-term tasks are processed by application threads.SenSpire OS has a preemptive kernel that maintains two hierarchical event-driven subsystem threads.Including the interrupt routine, this forms a three-level event-driven system.Non-event-driven threads are low-priority threads that process long-term application tasks.Event-Bus has event-driven subsystems based on a preemptive scheduler.A subsystem maintains multiple cooperative subroutines to support a message-based, eventdriven model called the publish-subscribe model.OpenSwarm 1 implements a hybrid kernel and natively supports preemptive and cooperative scheduling.The preemptive scheduler manages the thread-based program, and short reactive tasks are handled by the event handler functions.
This hybrid approach is flexible because the programmer can choose a suitable programming model for different tasks.However, this hybrid system structure inherits problems from both system models, such as the event-driven task needing to be run-to-complete and the preemptive threads being memoryconsuming.
HybridKernel [23] takes an alternative approach that combines the event-driven cooperative threaded model and the preemptive multithreaded model.HybridKernel allows the creation of multiple event-driven Protothreads subsystems as preemptive threads.This approach solves the lack of the realtimeness of Protothreads by allowing preemption between the subsystems.Similar to Protothreads, HybridKernel requires manual stack management, which is a potential burden for the programmer.HybridKernel demands a fixed-sized stack memory allocation for each preemptive thread, which is similar to other preemptive multithreaded solutions.The stack memory allocation is a heuristic and involves stack overflow risk.As a result, programmers are inclined to allocate redundant stack memory, causing memory waste.
StateOS is related to the works above and addresses the problems revealed by the HybridKernel approach.StateOS's hybrid approach combines cooperative threaded and preemptive systems to provide partial real-time support to threads, which is similar to that offered by HybridKernel.Memory efficiency is achieved by introducing a memory-efficient hybrid task scheduler that consumes only one additional stack memory.In addition, cooperative stack management processes are semiautomatized by the task APIs.

III. STATEOS: AN OVERVIEW
StateOS is intended to provide a cooperative threaded OS with real-time capability for sensor-based IoT devices and wireless sensors.This OS implements a microkernel architecture, cross-layer network protocol design, and hybrid task scheduler to address resource constraint-related issues.
StateOS supports cooperative threaded programming through macro-based task APIs.These APIs are mainly designed for programming system modules and high-performance applications.This native programming model is not advocated as novicefriendly because of the system-specific language and system knowledge requirements.StateOS extensively supports a state machine-based visual programming model, statecharts, as a novice-friendly alternative approach.Statecharts are supported by statechart middleware and action libraries, in addition to StateOS.Readers are kindly referred to [11] for detailed information.
The StateOS architecture is depicted in Figure 1.The kernel implements a microkernel architecture with essential functions, including task APIs, task management services, and resource management functions.
The cross-layer management entity manages system services and sensor services as modules.These services are configurable according to the application requirements.The database  maintains the system's global information, such as the system's dynamic parameters and network details.It is openly accessible to other components for performance optimization and system diagnosis.
The protocol stack collects the network protocol programs and provides network communication services.It provides cross-layer interfaces for efficient network communication.
The diagnosis toolbox is an optional component that collects diagnostic instruments, such as a debug message printer (via cable or radio), diagnosis shell, radio signal evaluator, executive time analyzer, and memory logger.These tools are intended to aid in the debugging process.
Hardware heterogeneity is handled by hardware abstraction interfaces.The implementation of these interfaces is platformdependent.Up to now, we have supported some MCUs of the ARM M0 series and the Microchip XMEGA A series.Moreover, hardware implementations include some useful sensors and radios.
StateOS applies a cross-layer network communication design, depicted in Figure 2, for resource efficiency and optimized network performance.This cross-layer design was proposed by [24], who distinguished inter-module communications as asynchronous messages and synchronous function calling.
Asynchronous messages are used to carry primitive data messages that cross vertically adjacent layers in a requestresponse manner.Similar to the traditional open systems interconnection module, network messages between application layers traverse all protocol layers with the necessary overhead.In contrast, system services, especially network services (e.g., clock synchronization and traffic control), can directly access the lower layers to reduce the overhead problem.
The control and management communications between parallel components are handled by horizontally synchronous function calls.Network services can control protocol layers for optimized network performance.Furthermore, applications and protocol programs can also efficiently access system services and databases.

IV. THE KERNEL
StateOS implements a lightweight kernel with reduced functionality.The kernel implements essential system functions, as shown in Figure 3, including task APIs, suspended task handlers, a task scheduler, and resource management services.
There are various suspended task handlers, such as mailbox, delayed task handler, mutex, and generic event handler.The suspended task is assigned to the corresponding handler based on the cause of the suspension.
Cooperative threads in the kernel are called tasks, which are managed by a lifecycle state model, as depicted in Figure 4.The state transition is driven by task APIs and the task scheduler.A task can be created by task-creating functions.This newly created task is in the state new.Depending on the function, a new task can be issued to the task scheduler and is queued for execution in the state ready.Alternatively, a new task can be sent to a suspended task handler as an event-driven task and labeled by the state suspended.An event-driven task can be triggered by events, and its state changes to ready.A task is in the state running during execution.A running task can cooperatively release processor control by performing yield or suspend operations, which change its state to ready or suspended, respectively.Finally, a finished task is in the state end before being destroyed.

A. Task APIs
The kernel implements macro-based task APIs that support the cooperative threaded programming model.Typical APIs are categorized in Table I to illustrate their related functionalities.
The implementation of a task starts with the declaration of a task prototype interface and ends with a particular task return interface, TASK END.Altogether, they complete a switch-case structure as the base of local continuation, whereas a yield point is implemented by a case statement that is labeled with the line number of the source code.The other task return interface, TASK EXIT, is used to terminate a task in the middle of the process.
Code 1 primarily expands an example task implementation.The macro TASK a basic prototype interface, as expanded in lines 13-18, which implements a part of the switchcase primitive structure.The variadic task argument a is automatically initialized when the task begins, as in lines 15. Between lines 22-27, it expands a basic yield operation YIELD that contains a local continuation structure as a yield point (lines [23][24][25][26].The macros VAR SAVE and VAR RESTORE in the YIELD expansion function preserve the local variable b across the yield point.In the end, the macro TASK END, as expanded in lines 31-34, completes the primitive structure of the task.In addition to the macro TASK, task APIs provide interfaces that extend the macro TASK for particular purposes.The macro prefixed by STARTUP declares a startup task that is executed when the system starts.A mail handling task is declared by the macro MAILBOX MK, which can automatically respond to system messages. Task flow control APIs include task yield operations and suspend operations.These operations are extensions based on the YIELD operation (as introduced previously).The yield operations send a running task back to the scheduler and repeatedly estimate the condition until it is satisfied.Alternatively, a task with a suspended operation is dispatched to the associated suspended handler.For example, the operation TASK WAIT DELAY suspends a task in the delayed task handler, and a mutex lock failure by the operation TASK MUTEX LOCK suspends the task in the mutex handler.
Semi-automatic local variables are preserved by task flow control APIs.Unlike other stackless approaches, it is safe to use local variables across the local continuation structure in StateOS.However, task flow control APIs cannot detect the presence of local variables.Therefore, they must be introduced to the corresponding yield/suspend operations as variadic arguments, as shown in Code 1, line 6.
The APIs provide a synchronized task-calling mechanism that allows a task to call a subroutine task and wait until it is completed.This task-calling process is similar to calling a C function.The macro TASK CALL creates and introduces a new task to the kernel and then suspends the running task until the called task is finished.Two pairs of parentheses follow the macro TASK CALL.They are for the variadic task arguments and local variables because C language does not allow multiple variadic arguments in one set of parentheses.The subroutine task issued by TASK CALL is capable of passing the return value to the caller.The caller should fetch this return value through the macro TASK CALL RETVAL.
A task is typically created and introduced to the kernel by a task-creating function.The functions os add task and os add pree task can send a task directly to the scheduler as a regular or a preemptive task, respectively.The other taskcreating functions can dispatch a newly created task to the relevant suspended handlers as an event-driven task.
Information about a task is maintained in a data structure, namely, a Task Control Block (TCB).A TCB includes the task implementation address, state, command, identity, name, preemption, priority, task arguments, etc. Task APIs implement TCB management functions for more flexible flow control.For example, the function tcb add followup can link multiple TCBs in a daisy chain.When the previous task in the chain is completed, the following one is automatically invoked, and the function tcb clone clones an existing TCB.

B. Suspended task handlers
Suspended tasks are event-driven because they are typically resumed by a specific event or signal.These tasks are suspended in the corresponding handlers until they are reactivated.
The mailbox system provides an interlayer communication approach between the system services and applications.A potential mail recipient can register a mail handler (with the macro MAILBOX MK) in the mailbox system.Mail delivery can trigger the handler as a high-priority task.A typical use of the mailbox is to forward the radio messages received from the protocol stack to applications.
Delayed tasks are suspended in delayed-task handlers.This module is associated with a hardware timer that is used to count down task delays.A delayed-task queue sorts these tasks by their delay time.A timer event dispatches expired tasks to the scheduler.
The mutex module implements a locking mechanism for safe resource access.The failure of a mutex lock attempt causes task suspension.These suspended tasks are sorted by their priority in the associated mutex entries and wait for the required resources.Once the mutex is unlocked, relevant tasks are resumed.
The event handler manages generic event-suspended tasks.These tasks are registered in the associated event entries.

C. Task scheduler
The kernel ready tasks using a hybrid strategy.The task scheduler, as depicted in Figure 5, comprises two task queues: pree-queue and coop-queue.A context switcher empowers the pree-queue to preempt the coop-queue.In each task queue, a task routine is implemented as a scheduler to manage the tasks in a priority-based cooperative manner.This structure achieves hybrid task scheduling through preemptive task management between the task queues and cooperative task management between the tasks in the same queue.
Figure 6 demonstrates the task scheduling implementation in response to the hardware interrupt.The system interrupt INT0 interrupts task T0 and inserts a preemptive task T1 in the pree-queue.When INT0 returns, the context switcher issues a context switch to the pree-queue, and T1 is executed.In this demonstration, it is presumed that T1 is the only task in the pree-queue at the moment.Therefore, the completion of task T1 empties the pree-queue and causes another context switch that continues task T0.
Task T2 is the subsequent task that is executed when T0 ends.Task T2 invokes preemptive task T3, causing an immediate context switch.The interrupt INT1 interrupts the execution of T3 and inserts another preemptive task, T4 into the pree-queue.However, task T4 must wait until T3 is finished because it follows cooperative task management in the same task queue.Task T2 is resumed once T4 is completed.
Thus far, this proposed hybrid strategy is naïve because a blocked pree-queue task can prevent the coop-queue tasks, and a blocked high-priority task can prevent a lower-priority task.Therefore, a design principle is established that a preemptive task shall not contain a blocking operation.However, blocking a preemptive task is not prohibited.In such a case, a blocked preemptive task waives this privilege by descending to the coop-queue when it yields.In addition, inside a task queue, any yield operation downgrades the task to the lowest priority, and the task is rescheduled at the end of the task queue.
The pree-queue and the coop-queue are, de facto, two kernel processes scheduled by the context switcher.Thus, both queues require individual memory stacks.The preemptive tasks in the pree-queue typically require small memory stacks, as they are short-lived.Therefore, the stack memory assigned to the preequeue can be small (e.g., 128 or 256 bytes).As shown in Figure 7, the pree-queue stack is allocated to the RAMEND (the end of random-access memory).In this way, the wasted memory used to initialize the system is reused as the preequeue stack.The coop-queue uses a native memory stack, and the stack size is dynamic until it overlaps with the stack .heap.

D. Resource management
Resource management modules are essential for managing hardware resources.The power manager controls the system's power-saving level based on the kernel status.Primitive events/signals from hardware are managed/filtered by the interrupt monitor.The system timer provides system timing functions, such as the system tick service and the primitive timer event.
StateOS applies dynamic memory management using a memory allocator.The memory allocator distinguishes the memory allocating requests as long-term and short-lived requests and applies different algorithms to each.This strategy effectively alleviates internal and external fragmentation problems.
The traditional heap-based malloc algorithm (as implemented in the standard C library) allocates memory in a dynamically growing data segment.This algorithm is efficient in processing long-term requests.However, frequent short-lived requests can cause the external fragmentation problem [25].In contrast, buddy system [26] divides the memory pool into fix-sized memory blocks to alleviate the external fragmentation problem [27].However, when long-term memory requests occupy the memory pool, this can lead to an internal fragmentation problem.
StateOS allocates separate memory sections to fulfill shortlived and long-term requests.shown in Figure 8, the dynamically growing memory pool .heapsatisfies long-term requests using a heap-based malloc algorithm, and the static memory pool .buddyimplements a variant of the buddy system for short-lived requests.This hybrid strategy improves efficiency with regard to dynamic memory management by alleviating fragmentation problems.
The buddy variant maintains a binary tree that monitors the states of the memory blocks.The operations of the binary tree have a time complexity O(logN ), where N is the number of memory blocks.The worst-case scenario is allocating/freeing memory that is smaller than a memory block because the operation has to traverse the tree to reach the bottom leaf.It is inefficient to repeatedly issue small memory requests to a buddy allocator because the algorithm processing time diminishes the system's performance.To alleviate this problem, a memory recycling system is implemented to temporarily hold the recently freed small memory chunks without restoring them to the binary tree.These memory chunks can be quickly reassigned to new requests.However, holding these memory spaces in the long term may cause an unbalanced binary tree and increase internal fragmentation.For this reason, the recycled memory blocks are dumped periodically at the system's convenience.

V. A CODE EXAMPLE
An example code, shown as Code 2, is a code snippet that initializes a radio transceiver.It contains the radio-initializing task radio init task and the startup task start demo.
The task implementation begins with the task prototype interface TASK, followed by the return type, task name, and variadic arguments.The task prototype TASK(int, radio init task, int, timeout), in line 1, contains the return value type int, the task name radio init task, and an argument int, timeout.The reader may notice that the argument's type int and label timeout are separated by a comma.This is because the type and label of an argument are treated as a pair of parameters in the prototype.
The task STARTUP TASK(start demo), as in line 17, implements a startup task.This task calls the subroutine task radio init task in line 19 and is blocked until the called task is completed.
The task radio init task contains the local variable ts across the yield operation TASK WAIT WHILE, as in line 10.This local variable must be introduced to the yield operation for preservation; otherwise, the value of the variable is no longer guaranteed when the task resumes.
The task invoked by the synchronized task-calling macro TASK CALL can return the result as a return value.In the example, the task radio init task has the return value of an integer int.There are two return operations in the example, as can be seen in lines 13 and 14.The return value is read by the macro TASK CALL RETVAL.

}
This example demonstrates that StateOS's cooperative task APIs provide a practical approach to cooperative threaded programming.It is flexible, memory efficient, and easy to use.

VI. IMPLEMENTATION
StateOS has been implemented as an IoT solution on different platforms.Figure 9a shows the baseboard and several sensor modules, including a relay-based magnet sensor, a pressure sensor, and a motion sensor.The baseboard includes an XMEGA256A3Bu MCU and an AT86RF215M transceiver.Using this modularized design, an IoT sensor device can be assembled simply by attaching the sensor module to the baseboard.These sensor modules are supported by the sensor service modules of StateOS.
Network management is achieved using autonomous network services, which include a network scheduler and clock synchronization.The network scheduler synchronizes the radio activities of all IoT devices on the network.This reduces a device's power consumption by minimizing the radio's active period.The clock synchronization service fine-tunes the device's local clock according to the central clock.
The database of the cross-layer management entity maintains a network status list neighbor-table by recording broadcasts from neighboring devices.The table collects information about these devices, which includes their battery level, signal strength, link quality, and traffic throughput.This information can be used by network services, such as topology management and traffic control services.Furthermore, the neighbor-table is uploaded to the server for network diagnosis.The LoRa [28] gateway board, shown in Figure 9b, is an extension of the baseboard that supports LoRa communication using an RN2483 module.It is mainly used to upload gateway data to the service.The LoRa protocol has limited bandwidth, especially over a poor-quality network link.Thus, StateOS further implements a data aggregation module and a compression module to reduce LoRa network traffic.
Some applications demand increased bandwidth that can exceed the LoRa capacity.Therefore, for larger network traffic throughput, we implemented heavy-duty gateway boards, as shown in Figure 9c.This platform is mounted with four LoRa-E5 modules and an ESP8266 Wi-Fi module.This gateway has its own MCU (which also runs StateOS), ARM M0+ ATSAML21G18B, for network management.

VII. EVALUATION
In this section, StateOS is evaluated based on its technical properties, scalability, and performance.The technical properties of an IoT OS include the kernel architecture, scheduling strategy, programming paradigm, programming language, and real-time capability.The scalability of an OS is measured by comparing the data memory consumption and program memory engagement.The performance is determined by calculating the processing time of the kernel operations.
Traditional resource-constrained IoT/WSN platforms are powered by an 8-bit MCU, and the typical IoT/WSN-oriented OSs were originally designed under the 8-bit computing architecture.Therefore, in the scalability and performance subsections, evaluations are performed by comparing operating systems that support 8-bit processor families (e.g., AVR and PIC processors) with comparable performance metrics.The evaluation data for StateOS were obtained on the platform specified in Table II.Evaluation data for other OSs were obtained in literature research.an OS's overall architecture and modularity.StateOS applies a microkernel architecture for a small kernel size and a modularized structure.The system modules are loosely coupled, which achieves a flexible and robust architecture.

A. Technical properties
StateOS implements a hybrid task-scheduling strategy to support cooperative threaded programming at a small memory cost while maintaining adequate real-time capability.The cooperative threaded interfaces in StateOS are provided by the system-specific language.It can be challenging for novice programmers.Therefore, StateOS extensively supports statecharts as a state machine-based visual programming model.

B. Scalability
The scalability of an OS is evaluated by memory usage for handling concurrent tasks/threads.The evaluation is conducted with a methodology similar to [23].In the evaluation, we run 16 cooperative tasks on StateOS, which are contained by two kernel threads (the pree-queue and the coop-queue).
A typical StateOS configuration takes 46 bytes of static data memory, which is its kernel's memory footprint that includes the control blocks of two kernel threads (the pree-queue and the coop-queue).The task scheduler typically allocates 128 bytes of stack memory for pree-queue context saving.Additionally, the buddy memory module requires extra management memory of 19 bytes and binary tree memory of M/B bytes, where M is the memory pool size, and B is the memory block size.It is typical to configure a dynamic memory pool of 1024 bytes with a block size of 8 bytes.Thus, the buddy module takes 147 (19+1024/8) bytes of heap memory as the management cost.Furthermore, the memory recycling system can be optimized to take 20 bytes to implement ten recycling entries.In summary, a functional StateOS requires 341 (46+128+147+20) bytes of data memory (the memory footprint of the hardware implementation is not counted).
In StateOS, a TCB takes a minimum of 22 bytes of memory.Therefore, a running StateOS with 16 concurrent tasks consumes 693 (22 * 16 + 341) bytes of memory, which includes 352 bytes of dynamic memory from 16 TCBs and 341 bytes of system memory consumption.However, this estimation is based on the minimum task profile, with no arguments nor local variables, and the results are suggestive of estimating the system's memory usage.
StateOS takes a minimum of 13K bytes of flash memory, which primarily involves kernel implementation.However, a typical configuration of StateOS consumes more memory to satisfy the application requirements.For example, the StateOS implementation in Section VI consumes 59K bytes of flash   [11] b Contiki supports cooperative threaded programming using Protothreads [5] .c Contiki supports multithreaded programming by a library on top of the event-based kernel [5].d HybridKernel supports cooperative threaded programming using Protothreads [23] .e TinyOS accepts cooperative threaded programming using the library extension TinyThreads [6] .f There are many multithreaded solutions for TinyOS.The official one is TOSThreads.g CSpire is an object-oriented programming language that extends C++ [20].memory, which includes the kernel (13K bytes), hardware implementation (15K bytes), network services (24K bytes), sensor services (2K bytes), and miscellaneous components (5K bytes).
In Table IV, we compare the multiple task overhead of StateOS to a multithreaded solution (MANTIS OS) and three hybrid solutions (Contiki with Protothreads, TinyOS with TOSthreads, and HybridKernel).In this evaluation, the hybrid solutions (including StateOS) are evaluated with 16 cooperative components and two preemptive components, and the multithreaded solution executes 16 preemptive components.To distinguish between processes and threads in this evaluation, we define threads as being cooperative and consuming memory from TCBs, whereas processes are preemptive and consume memory from Process Control Blocks (PCB).The results suggest that StateOS is a memory-efficient approach to implementing multitask systems.

C. Performance
The performance of StateOS is evaluated based on the processing time of the task APIs.Most task operations involve dynamic memory management.The approximate processing time for a short-lived and small-sized memory allocation is 40 µs and for memory free is 41 µs.This performance can be promoted by the memory recycling system to obtain a memory allocation of 16 µs and a memory free of 30 µs.
When the scheduler dispatches a task, it takes 12 µs to establish the task.In addition, preparing an 8 or 16-bit task argument takes less than 1 µs.However, preparing a 32-bit argument consumes a higher amount of processing time of 3 µs.The task yield operations take an average of 4 µs to release the processor control.The other task flow control operations, such as TASK WAIT DELAY and TASK CALL, can have a processing time of 40-60 µs.Furthermore, if local variables are saved during the operation, the processing time increases because of the dynamic memory operations.
The cooperative task-switching procedure includes a flow control operation, a task establishment operation, and possible stack management operations.In summary, the processing time required for a cooperative task-switching operation can be a minimum2 of 16 µs and a maximum of more than 100 µs.Cooperative task-switching operations are usually issued by tasks with low urgency levels.Therefore, its processing speed is sufficient for such tasks.
Compared to cooperative task switching, preemptive context switching between the coop-queue and the pree-queue is faster.It takes about 4 µs to switch between the task queues, which is quick enough for a time-sensitive task to be executed in time.
The results for the comparison of the scheduling overhead of task switch operations in cooperative and preemptive scheduling are shown in Table V.The overhead values are platformdependent.Therefore, the comparison is feasible if the results are unified with the CPU clock cycles.
Cooperative switch time is the scheduling overhead between consecutive cooperative tasks.StateOS APIs provide semiautomatic stack management and versatile task controls (e.g., task concatenation, task callback, and mutex) that are processed between tasks.This user-friendly approach requires more executive time than simple task switches.However, typical cooperative tasks can tolerate longer latency in exchange for flexibility.b TinyOS has a first-in-first-out task queue with a run-to-the-end strategy.We had trouble searching the literature results for the task switch time between the consecutive tasks in the task queue.However, a task switch operation has a few instructions about memory copying and function calling, similar to event posting and command calling operations (both of them take 10 clock cycles).Therefore, we assume that the cooperative switch in TinyOS is around 10 clock cycles.c Contiki is evaluated with Protothreads.d HybridKernel manages cooperative tasks using Protothreads.The results are taken from Protothreads.
Real-time capability can be estimated by the performance of preemptive task switch operations.This reflects the guaranteed maximum time of the system's responsiveness.Of the results for the solutions shown in the table, StateOS takes the least amount of preemptive task-switching time.This shows that StateOS provides sufficient real-time performance for IoT/WSN applications.

VIII. DISCUSSION AND CONCLUSION
This paper presents an embedded OS targeting IoT and wireless sensor devices with distinctive features.The kernel provides a set of macro-based cooperative task APIs that allows for the modeling of event-driven systems in a threaded paradigm.The hybrid task scheduler supports a mix of various scheduling algorithms, such as cooperative, priority-based, and preemptive scheduling.It allows the programmer to balance resource usage and real-time performance based on the application's specifications.The dynamic memory allocator in StateOS implements two separate strategies for long-and short-lived allocation requests.This method can alleviate the fragmentation problem and improve the robustness of the system in the long term.Automatic stack management is typically the privilege of multithreaded OSs.The users of event-driven and cooperative OSs have to manually pass the parameters between tasks and protect local variables.StateOS' APIs provide semi-automatic stack management that automatizes the process of parameter passing and local variable preservation.
A design principle of StateOS is memory efficiency.To this end, several technologies are applied, including a crosslayer communication structure, modularized services, and microkernel architecture.Therefore, the memory occupation of the system is configurable, depending on the application's requirements.
StateOS works perfectly with a statechart visual programming framework [11] that supports modeling and programming wireless sensor programs using graphic statechart diagrams.This combination provides an alternative visual programming approach that can aid developers in creating IoT applications efficiently.
The hybrid approach to kernel design is, in fact, a compromised approach that combines event-driven and multithreaded systems.The system may consume more memory and executive time than event-driven solutions.On the other hand, multithreaded systems need no attention from users for stack management.Compared to this, StateOS implements a semi-automatic stack management solution, requiring users to manually identify local variables.Furthermore, StateOS adopts a dynamic memory management strategy.It has a trade-off of the overhead of managing memory spaces.
StateOS was initially designed for WSN-based solutions, where the gateway handles internet protocol (IP)-based network traffics.Our following works include extending the network stack to support low-power IP protocols such as 6LoWPAN [34], allowing individual access to WSN nodes through an IPbased IoT network directly.It will enable the use of IP-based IoT application protocols, such as Thread [35] and Matter [36], and emerge StateOS as a part of the modern IoT ecosystem.Moreover, the hardware implementations are limited to a few MCU models.We will extend the hardware implementations to other popular IoT MCUs and platforms in the following works.
In conclusion, we proposed a hybrid approach to program- ming IoT and wireless sensor applications in a threaded paradigm with less memory consumption.We expect the proposed solution to be a viable instrument that aids modern IoT application development.

Figure 1 .
Figure 1.StateOS architecture.The components represented using a dashed line are the configurable modules.

Figure 2 .Figure 3 .
Figure 2. StateOS cross-layer network architecture This article has been accepted for publication in IEEE Internet of Things Journal.This is the author's version which has not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/JIOT.2023.3234106This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/

Code 1 . 16 _TCB_PT_
Example of the simplified primary expansion of the task API.All irrelevant details have been omitted. 1 // The original macro-based task implementation 2 TASK(void, my_task, int, a) 3 { 4 int b = 0; 5 6 TASK_YIELD(b); 7 printf("%d\n", a + b); The primarily expanded implementation 13 task_retval_t my_task(tcb_arg_t * _args) 14 { 15 int a = _tcb_argv(_args); This article has been accepted for publication in IEEE Internet of Things Journal.This is the author's version which has not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/JIOT.2023.3234106This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4

Figure 6 .Figure 7 .
Figure 6.Task-scheduling demonstration, where the label T0 and T2 denote regular tasks from the coop-queue; T1, T3, and T4 denote the preemptive tasks from the pree-queue; and INT0 and INT1 denote the hardware interrupts .

Figure 9 .
Figure 9.The StateOS platform examples include (a) a baseboard with sensor modules, (b) a LoRa gateway board, and (c) a heavy-duty LoRa gateway board.
This article has been accepted for publication in IEEE Internet of Things Journal.This is the author's version which has not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/JIOT.2023.3234106This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/ Figure 4. Task lifecycle state model.A task is created in the state new.The state ready indicates that a task is scheduled in the task queue and is ready to be executed.An executing task is in the state running.When the processing is completed, the task is in the state end.A task in the state suspended is suspended in the suspended task handlers and can be triggered to resume by an event.The italic font indicates variadic arguments, i.e., it accepts any number of arguments of any type.
a This table collects the typical and generalized task APIs.b c The return interfaces accepts at most one argument retVal.
Table III lists the technical properties of different IoT OSs.The kernel architecture choice significantly influences

Table II SPECIFICATIONS
OF THE EVALUATED STATEOS PLATFORM Statecharts are supported by the statechart framework a This comparison is conducted by executing 16 concurrent and two preemptive parts in the system.Some OSs may require an additional thread or stack in the kernel.b The StateOS' pree-queue stack is counted as static memory.c Contiki with Protothreads and a multithreaded library.d Contiki kernel requires an extra stack.e TinyOS with TOSThreads requires an extra kernel thread hosting the event system.Thus, it counts as three preemptive parts in total.f MANTIS OS executes 16 preemptive threads because it supports no cooperative tasks.
a TinyOS is evaluated with TOSThreads.
This article has been accepted for publication in IEEE Internet of Things Journal.This is the author's version which has not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/JIOT.2023.3234106 This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/