Lightweight Cryptography Algorithms for Resource-Constrained IoT Devices: A Review, Comparison and Research Opportunities

The Internet of Things (IoT) is becoming more common and popular due to its wide range of applications in various domains. IoT devices collect data from the real environment and transfer it over networks. Deploying IoT in the real world poses many challenges, spanning everything from tiny sensors to servers. Security is considered the number one challenge in IoT deployments, as most IoT devices are physically accessible in the real world and many of them are limited in resources (such as energy, memory, processing power and even physical space). In this paper, we focus on these resource-constrained IoT devices (such as RFID tags, sensors, smart cards, etc.), as securing them in such circumstances is a challenging task. The communication from such devices can be secured by means of lightweight cryptography, a lighter version of cryptography. More than fifty lightweight cryptography (plain encryption) algorithms are available in the market, each with a focus on specific application(s), and another 57 algorithms have recently been submitted by researchers to the NIST competition. To provide a holistic view of the area, in this paper we compare the existing algorithms in terms of implementation cost, hardware and software performance, and attack resistance properties. We also discuss the demand and a direction for new research in the area of lightweight cryptography to optimize the balance amongst cost, performance and security.


I. INTRODUCTION
A. IoT OVERVIEW
Internet of Things (IoT) has already become a dominant research area because of its applications in various domains such as smart transport & logistics, smart healthcare, smart environment, smart infrastructure (smart cities, smart homes, smart offices, smart malls, Industry 4.0), smart agriculture and many more. Many researchers and industry experts have given various definitions of IoT depending on their applications and implementation area, but in simple words, IoT is a network of connected things, each with a unique identification, able to collect and exchange data over the Internet with or without human interaction [1]-[5]. In any IoT solution or application, IoT devices are the key elements. These IoT devices can be divided into two main categories (Figure 1): rich in resources, such as servers, personal computers, tablets and smartphones, and limited in resources (resource-constrained), such as industrial sensors or sensor nodes, RFID tags and actuators [6]. In this paper, we focus on the second category of IoT devices. These connected devices are becoming more popular due to their use in various applications and will flood the market with the emergence of IoT [6], leading to an enormous data exchange rate amongst them [7].

The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.

B. SECURITY CONCERNS OF RESOURCE-CONSTRAINED IoT DEVICES: CHALLENGES AND SECURITY REQUIREMENTS
When billions of smart (connected) devices work across a diverse set of platforms, especially when shifting from servers to sensors, they pose various unprecedented challenges to their owners or users [6], such as security & privacy, interoperability, longevity & support, technologies and many more [8]. Also, IoT devices are easily accessible and exposed to many security attacks [9] as they interact directly with the physical world to collect confidential data or to control physical environment variables, which makes them an attractive target for attackers [10]. All these circumstances make cybersecurity a major challenge in IoT devices, with demands for confidentiality, data integrity, authentication & authorization, availability, privacy & regulation standards and regular system updates [8]. Figure 2 depicts IoT security challenges and the corresponding security requirements.
In this scenario, cryptography could be one of the effective measures to guarantee confidentiality, integrity and authentication & authorization of the data traversing IoT devices [7]. It could also be a solution to secure data stored on, or traversing, the network. However, conventional PC-based cryptography algorithms do not fit into resource-constrained IoT devices due to their high resource demands. A lighter version of these solutions, lightweight cryptography, can address these challenges to secure the communication in resource-constrained IoT devices.

C. KEY CHALLENGES WHILE IMPLEMENTING CONVENTIONAL CRYPTOGRAPHY IN RESOURCE-CONSTRAINED IoT DEVICES
The key challenges while implementing conventional cryptography in IoT devices (Figure 3) are as follows [11]:
• Limited memory (registers, RAM, ROM)
• Reduced computing power
• Small physical area to implement the assembly
• Low battery power (or no battery)
• Real-time response
Most IoT devices (such as RFIDs and sensors) are small in size and are equipped with limited resources: small memory (RAM, ROM) to store and run the application, low computing power to process the data, limited battery power (or no battery in the case of passive RFID tags) [6], and a small physical area to fit the assembly in [6], [11]. Moreover, most IoT devices deal with real-time applications, where a quick and accurate response with essential security using the available resources is a challenging task [12], [13]. IoT device designers face several risks and challenges, including energy capacity [14] and data security [9].
In these circumstances, if conventional cryptography standards are applied to IoT devices (mainly RFIDs and sensors), their performance may not be acceptable [6]. The above issues with conventional cryptography are addressed by its sub-discipline, lightweight cryptography, which introduces lightweight features such as small memory, small processing power, low power consumption and real-time response even on resource-constrained devices [6].
Another important aspect of lightweight cryptography is that it is not just applicable to resource-constrained devices (RFID tags, sensors, etc.), but readily applicable to other devices rich in resources that it directly or indirectly interacts with (such as servers, PCs, tablets, smartphones, etc.) [6].

D. MOTIVATION AND CONTRIBUTION
Recently, many algorithms have been proposed for LWC by researchers. Besides, many works have revealed security attacks on particular LWC algorithm(s) [15]-[31]. A number of published papers have made a fair comparison of hardware and/or software implementations of these algorithms on different platforms as well as in different circumstances [9], [32]-[39]. Most of these works have considered algorithms which are applicable in certain domains or suitable for certain applications. However, a holistic view of the proposed LWC algorithms in terms of their hardware-software performance along with cryptanalysis is missing in these works. The authors in [40] have reviewed a list of different LWC algorithms with their performance on different platforms, but an inclusive view of their applications and of the key lightweight demands of cost (memory, physical area, battery, power) and performance (quick response), along with the security concerns, is missing. Also, [40] does not include a number of key algorithms, e.g., KeeLoq and Midori. In addition, it just provides a list of attacks on LWC algorithms without any security comparison, and thus a clear view of various security attacks on different LWC algorithms is missing.
More recently, [41] discusses the algorithms submitted to the NIST competition (round 2) that are compliant with the LWC Hardware API (proposed by NIST in 2019) and evaluates them on FPGA platforms (Xilinx, Intel, and Lattice). The paper considers only two performance metrics, throughput and speed (clock cycles/byte), which could be its limitation, as others (block/key size, memory, gate area, power & energy requirements) are missing. Also, these algorithms are still competing through several rounds (32 of the 57 round 1 candidates are competing in the 2nd round).
As a unique aspect of this paper, we clearly classify the key characteristics of LWC algorithms (missing in the existing survey papers) proposed by the leading research groups [6], [42] in the field of cryptography, along with how LWC satisfies these properties (Table 2). Secondly, our paper compares 41 existing symmetric key lightweight cryptography (plain encryption) algorithms over 7 performance metrics (Block/Key size, Memory, Gate Area, Latency, Throughput, Power & Energy requirements along with hardware and software efficiency) as recommended by the NIST report for resource-constrained IoT devices [6]. These LWC algorithms are widely adopted by industry, and the article reveals the top ten amongst them based on their mapping (metrics). These analyses could be useful to researchers/scientists in choosing the right algorithm based on their application requirement(s). Also, demonstrating various real-world IoT applications along with their key lightweight requirements and their best-suited LWC options is a unique contribution in the field of lightweight cryptography. In addition, our paper evaluates various attacks on different LWC algorithms in a grid form. Such a comparison makes it easier for users to identify the security strength of any LWC algorithm as well as to identify common attacks on LWC algorithms. A recent call from NIST [43] (to create new LWC algorithms for easy and efficient implementation on resource-constrained circuitry) and the results derived from the study (none of the algorithms meets all the criteria of lightweight in terms of cost and performance along with strong security) strongly encourage exploring the existing list of LWC algorithms from different perspectives for further research.

E. PAPER OUTLINE
Considering the significance of IoT security, this article takes an inclusive view of symmetric key lightweight cryptography algorithms and i) defines hardware and software performance metrics based on the identified key characteristics of LWC and gives a broad classification of LWC based on internal structure (Section II), ii) presents a comprehensive study of existing LWC algorithms along with their performance, cryptanalysis and real-time use cases (Section III), iii) outlines open research challenges and recommends future research directions (Section IV), and finally iv) concludes (Section V).

II. LIGHTWEIGHT CRYPTOGRAPHY FOR RESOURCE-CONSTRAINED IoT DEVICES
A. CHARACTERISTICS OFFERED BY LWC
The three main characteristics of lightweight cryptography algorithms and their offerings are listed in Table 2 [9], [11]. As shown in Table 2, physical cost, performance and security are the main characteristics to look into while implementing cryptography on any resource-constrained IoT device. Each of these characteristics is further broken down: physical space occupied, memory demand and energy consumption as the cost to implement; processing power in terms of latency and throughput as performance (speed); and block/key length and different attack models, including side-channel & fault-injection attacks, as the security measure. The first two characteristics are satisfied by LWC algorithms by offering simple round functions on a tiny block (≤ 64 bit) using a tiny key (≤ 80 bit) with simple key scheduling. The last but important characteristic, security, is fulfilled by the adoption of one of the six internal structures (SPN, FN, GFN, ARX, NLFSR, Hybrid) to provide immunity against security attacks.

B. HARDWARE AND SOFTWARE PERFORMANCE METRICS
Based on the first two characteristics (physical cost and performance) offered by any LWC algorithm, hardware- and software-specific resource requirements can be measured in terms of memory requirements, gate area, latency, throughput, and power and energy consumption as follows:

1) MEMORY REQUIREMENTS
Memory is generally measured in KB [40]. RAM is required to store intermediate values that can be used in computations, and ROM is required to store the program/algorithm and static data, such as the algorithm key, S-box (in some cases), etc. [6].

2) GATE AREA
It is the physical area required to implement/run the algorithm on a board/circuit, measured in µm². This space can be specified using logic blocks for FPGA or using gate equivalents (GE) for ASIC (1 GE = the area of one two-input NAND gate) [6]. Normally, 200 to 2000 GE (out of the 1000 to 10,000 GE available in total) are allocated for security in an economical RFID tag [44].
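To make the GE budget concrete, the following minimal sketch filters a few ciphers by the 2000 GE upper bound mentioned above. The GE figures are the ones quoted later in this review for compact implementations; the dictionary layout and the function name `fits_budget` are illustrative assumptions, not part of any cited work.

```python
# GE figures as quoted in this review for compact implementations.
# The selection of ciphers here is illustrative, not exhaustive.
GE_FIGURES = {
    "AES-128": 2400,
    "PRESENT-80": 1570,
    "LED-64": 966,
    "Piccolo-80": 432,
    "KATAN": 802,
    "KTANTAN": 462,
    "TEA": 3872,
    "CLEFIA-128": 2488,
}


def fits_budget(figures, budget_ge):
    """Return the cipher names whose GE count is within budget, smallest first."""
    names = [name for name, ge in figures.items() if ge <= budget_ge]
    return sorted(names, key=figures.get)


if __name__ == "__main__":
    for name in fits_budget(GE_FIGURES, 2000):
        print(f"{name}: {GE_FIGURES[name]} GE")
```

Such a filter only screens on area; a real selection would weigh throughput, energy and attack resistance as well.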

3) LATENCY
In terms of hardware performance, latency is the time taken to produce the ciphertext from the original text [6], whereas the number of clock cycles per block (during encryption) defines the software latency.

4) THROUGHPUT
Throughput, in hardware, can be measured in terms of plaintext processed per time unit (bits per second) at a frequency of 100 kHz, whereas in software, it is the average amount of plaintext processed per CPU clock cycle at a frequency of 4 MHz [45].
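As a rough illustration of how latency and throughput relate, the sketch below converts a block size and a cycle count into bits per second at a given clock. The relation throughput = block size × clock frequency / cycles per block is a standard back-of-the-envelope estimate and an assumption here (it ignores I/O and key setup); the function name is hypothetical.

```python
def throughput_bps(block_bits, cycles_per_block, clock_hz):
    """Estimate throughput in bits per second.

    Assumes one block is completed every `cycles_per_block` clock cycles
    and ignores I/O and key-setup overhead.
    """
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * clock_hz / cycles_per_block


if __name__ == "__main__":
    # Hardware-style estimate: 64-bit block, 32 cycles per block, 100 kHz clock.
    print(throughput_bps(64, 32, 100_000))
    # Software-style estimate: 64-bit block, 3955 cycles per block, 4 MHz clock.
    print(throughput_bps(64, 3955, 4_000_000))
```

The two calls use the 100 kHz and 4 MHz reference frequencies named above; the cycle counts are example inputs.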

5) POWER REQUIREMENTS
The amount of power required by the circuit to process the algorithm can be measured in µW.

6) ENERGY CONSUMPTION
Energy consumption per bit can be calculated as follows [40]:
$$\text{Energy per bit} = \frac{\text{Latency} \times \text{Power}}{\text{Block size}}$$
Here, latency is in terms of software implementation.

7) EFFICIENCY
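Efficiency gives performance over resource requirements. For hardware, it can be calculated as follows [40]:
$$\text{Hardware efficiency} = \frac{\text{Throughput}}{\text{Complexity}}$$
Here, complexity means physical space. Similarly, software efficiency can be determined as follows [40]:
$$\text{Software efficiency} = \frac{\text{Throughput}}{\text{Code size}}$$
Here, code size is the algorithm size.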
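The metrics in this subsection can be sketched in a few lines. The forms used here are assumptions in the spirit of the descriptions above: energy per bit as latency × power divided by block size, hardware efficiency as throughput per unit of area, and software efficiency as throughput per unit of code size. Converting a cycle count into seconds via the clock frequency, and the chosen units (Kbps, KGE, KB), are also assumptions of this sketch; all function names are hypothetical.

```python
def energy_per_bit_uj(latency_cycles, power_uw, block_bits, clock_hz):
    """Energy per bit in microjoules.

    Assumption: latency is given in clock cycles per block and is converted
    to seconds with the clock frequency, so that uW x s = uJ.
    """
    latency_s = latency_cycles / clock_hz
    return power_uw * latency_s / block_bits


def hardware_efficiency(throughput_kbps, area_ge):
    """Throughput per unit of area, in Kbps per KGE (assumed units)."""
    return throughput_kbps / (area_ge / 1000)


def software_efficiency(throughput_kbps, code_bytes):
    """Throughput per unit of code size, in Kbps per KB (assumed units)."""
    return throughput_kbps / (code_bytes / 1024)


if __name__ == "__main__":
    # Example inputs only; they are not measurements from any cited work.
    print(energy_per_bit_uj(4000, 100.0, 64, 4_000_000))
    print(hardware_efficiency(200, 1320))
    print(software_efficiency(100, 1024))
```

Higher efficiency values indicate more throughput for the same area or code size, which is how the ratios are meant to be read.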

C. STRUCTURE WISE CLASSIFICATION OF LWC
Cryptographic algorithms can be classified into two main categories, symmetric key and asymmetric key ciphers (Figure 4). A symmetric key cipher uses a single key for both encryption and decryption of the data, whereas an asymmetric cipher uses two different keys to encrypt and to decrypt the data [46]. Symmetric key cryptography is safe and comparatively fast; its only downside is the sharing of the key between the communicating parties without compromising it [32]. This can be overcome by pre-sharing the key through a trusted third party. Symmetric key cryptography also ensures confidentiality, data integrity and authentication of the data (using an authenticated encryption (AEAD) mode). Asymmetric cryptography uses two private-public key pairs. It ensures confidentiality and integrity by making use of the public key of the receiver and further ensures authentication by using the sender's private key (as a digital signature) to encrypt the data. At the other end, the receiver decrypts it by using the sender's public key first and then his/her own private key [46]. The only disadvantage of asymmetric encryption is its large key, which increases the complexity and slows down the process [32].
In a block cipher, both encryption and decryption take place on a fixed-size block (64 bits or more) at a time, whereas a stream cipher continuously processes the input elements bit by bit (or word by word) [46]. There are two fundamental properties of any cipher, confusion and diffusion, introduced by Claude Shannon [35], [40] to strengthen the cipher. Confusion makes the relationship between the ciphertext and the key as complex as possible using substitution (S-box), whereas diffusion dissipates the statistical structure of the plaintext over the bulk of the ciphertext using permutation [35], [46]. A stream cipher uses only the confusion property, whereas a block cipher uses both confusion and diffusion, with a simple design compared to the stream one. Reversing the encryption process to extract the original text is hard in a block cipher, whereas a stream cipher performs XOR function(s) to encrypt the data, which can easily be reverted to its original form. In contrast, a hash is a one-way mathematical function that transforms data of unspecified length into a bit string of specified length (a short string), which cannot be inverted.
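The remark that XOR-based stream encryption is easily reverted can be seen in a minimal sketch: applying the same keystream twice returns the plaintext. The keystream generator below (SHA-256 over key, nonce and a counter) is only a stand-in chosen so the example runs with the standard library; it is an assumption of this sketch and not a lightweight stream cipher from the literature.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 over key, nonce and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the keystream (same operation)."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    message = b"sensor reading 42"
    ciphertext = xor_stream(key, nonce, message)
    print(ciphertext.hex())
    print(xor_stream(key, nonce, ciphertext))
```

Because encryption and decryption are the same XOR, reusing a keystream for two messages leaks their XOR, which is why a fresh nonce per message is assumed.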
For the above reasons, a block cipher is preferred over a stream cipher in resource-constrained IoT devices. This paper concentrates on block ciphers, mainly symmetric lightweight block ciphers. Such a cipher uses one of the following structures:
• Substitution-Permutation Network (SPN)
• Feistel Network (FN)
• Generalised Feistel Network (GFN)
• Add-Rotate-XOR (ARX)
• Nonlinear-Feedback Shift Register (NLFSR)
• Hybrid
A Substitution-Permutation network (SPN) tweaks the data through a set of substitution boxes and permutation tables and prepares them for the following round. A Feistel network (FN) breaks the input block into two equal halves and applies diffusion in each round to just one half. In addition, the two halves are swapped at the beginning of each round. The generalized Feistel network (GFN) is an extrapolated version of the classic Feistel network. It splits the input block into a number of sub-blocks and applies the Feistel function to every pair of sub-blocks, followed by a cyclic shift proportional to the number of sub-blocks [47]. ARX performs encryption and decryption using addition, rotation and XOR functions without making use of an S-box. An ARX implementation is fast and compact but is limited in security properties compared to SPN and Feistel ciphers. The nonlinear feedback shift register (NLFSR), which applies to both stream and block ciphers, utilizes the building blocks of stream ciphers, whose current state is derived from the prior state through a nonlinear feedback value [20]. A hybrid cipher combines any three types (SPN, FN, GFN, ARX, NLFSR) or even mixes block and stream properties to improve specific characteristics (for example, throughput, energy, GE, etc.) based on its application requirements.
Out of these structures, SPN and FN are the most popular choices due to their flexibility of implementation, based on application requirements [40]. Although Feistel structures are incorporated easily into low-average-power hardware (because the round function is absent in one half of the state), they usually require more rounds than SPN structures for safety reasons [48]. When there is a choice between fewer SPN rounds and more Feistel rounds with the same level of security and similar energy costs, SPN could be a smarter choice [48].
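The Feistel idea described above can be sketched as a toy 32-bit cipher. The point of the sketch is that the round function is applied to only one half per round and does not need to be invertible for decryption to work. The round function, the sub-keys and the 16-bit half size are hypothetical choices for illustration and offer no security.

```python
MASK16 = 0xFFFF


def round_fn(half, subkey):
    """Hypothetical round function; deliberately not invertible."""
    return ((half * 0x9E37 + subkey) ^ (half >> 3)) & MASK16


def feistel_encrypt(block32, subkeys):
    left, right = (block32 >> 16) & MASK16, block32 & MASK16
    for k in subkeys:
        # Only one half passes through the round function; halves then swap.
        left, right = right, left ^ round_fn(right, k)
    return (left << 16) | right


def feistel_decrypt(block32, subkeys):
    left, right = (block32 >> 16) & MASK16, block32 & MASK16
    for k in reversed(subkeys):
        # Undo one round using the same round function, sub-keys in reverse.
        left, right = right ^ round_fn(left, k), left
    return (left << 16) | right


if __name__ == "__main__":
    subkeys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]  # illustrative values
    ct = feistel_encrypt(0xDEADBEEF, subkeys)
    print(hex(ct), hex(feistel_decrypt(ct, subkeys)))
```

Decryption reuses `round_fn` unchanged and only reverses the sub-key order, which is one reason Feistel designs need little extra hardware for decryption.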

III. EXISTING LWC ALGORITHMS
More than fifty symmetric LWC algorithms (plain encryption) have been proposed by academia, proprietary vendors and government bodies with a focus on reducing cost (memory, processing power, physical area (GE), energy consumption) and enhancing hardware and software performance (latency, throughput). However, many of them do not address security attacks explicitly and only care about performance and/or implementation cost [13]. The structure-wise categorisation of these algorithms is summarised in Table 3. The following subsections present these LWC algorithms category by category.

A. STRUCTURE WISE LWC ALGORITHMS
1) SUBSTITUTION PERMUTATION NETWORK (SPN)
AES [49], standardized by NIST, is a classic example of an SPN-based algorithm; it operates on a 128-bit block with 128-, 192- and 256-bit key variants [50]. The minimum GE requirement recorded for AES is around 2400 GE (23% smaller than the usual one) [50], which is still heavy for some small-scale real-time applications [35]. It performs comparatively efficiently when supplied with additional resources [38].
Another algorithm, among the most hardware and software efficient and approved under ISO/IEC 29192-2:2012, is PRESENT. It is based on a substitution-permutation network and uses a 64-bit block with two key variants, 80-bit and 128-bit, with GE requirements of 1570 and 1886, respectively [51]. The minimum GE requirement noted for a version of PRESENT is approx. 1000 GE (encryption only) [52], whereas it takes 2520-3010 GE to provide an adequate level of security [35]. It is a hardware-efficient algorithm that uses 4-bit S-boxes (the substitution layer replaces eight S-boxes with a single S-box), whereas in software the permutation layer takes a large number of cycles, which calls for an improved version [32], [35], [40], [51], [53].
GIFT [54], an improved version of PRESENT, was presented at CHES 2017. It offers a lighter S-box with smaller physical space. It also uses fewer rounds and gives high throughput along with a simpler and faster key schedule. There are two versions of GIFT: GIFT-64, with 28 rounds and a 64-bit block, and GIFT-128, with 40 rounds and a 128-bit block. Both use a 128-bit key. The lighter version, GIFT-64, has been found more vulnerable than GIFT-128 [55], [56]. Very few documents have been found on micro-controller implementations of GIFT [57], [58].
SKINNY [59]
RECTANGLE is an ultra-lightweight block cipher that can be used in various applications. With small changes to the SPN structure, the rounds are reduced to 25 (compared to 31 rounds in PRESENT) to meet the competitive environment [53].
TWINE achieves a similarly good overall standing to PRESENT and also overcomes many of its implementation issues. It operates on a 64-bit input with two key variants, 80-bit and 128-bit [60]. It requires around 2000 GE and a larger circuit size per throughput compared to AES [12]. In a speed comparison, AES is faster than TWINE when 1 KB or more of ROM is available; when only 512 bytes of ROM are available, AES cannot be implemented, whereas TWINE can and works 250% faster than PRESENT [12].
Midori was designed with a focus on low/tight energy budgets, for instance, medical implants. It comes in two versions, Midori64 and Midori128. Both use a 128-bit key on two different block sizes, 64-bit and 128-bit, through 16 and 20 iterations, respectively [48], [61].
mCrypton (miniature of Crypton) [62] is a cost- and energy-efficient lightweight edition of Crypton [63], suitable for both hardware and software deployments. It performs 13 iterations on a 64-bit block using a variety of keys (64-bit, 96-bit and 128-bit).
NOEKEON [64] works on the same block and key size, 128 bits, via 16 iterations. The cipher was rejected by the NESSIE project due to its weak resistance against attacks [65].
ICEBERG [66] is optimized for reconfigurable hardware deployment, with the property that the key can be modified at each clock cycle without compromising quality. Here, the round keys are derived on-the-fly. It operates on a 64-bit input with a 128-bit key via 16 iterations, demanding 5800 GE at a throughput of 400 Kb/s [67].
PUFFIN-2 [68] is a compact edition of PUFFIN (2303 GE) [69]. It uses an 80-bit key to perform 34 iterations on 64-bit data using a serialized SPN structure. It requires only 1083 GE for both encryption and decryption.
PRINCE is a lightweight algorithm that is efficient in both hardware and software [70]; it operates on a 64-bit input using a 128-bit key over 12 rounds [71]. The smallest hardware implementation demands 2953 GE at a throughput of 533.3 Kb/s. It shows a low energy consumption of 5.53 µJ/bit [72].
PRIDE [70] exhibits low latency and low energy demand, using a 128-bit key to perform 20 iterations on a 64-bit input.
PRINT [73] is a domain-specific cipher designed for two applications: PRINT-48 for IC-printing applications, which uses an 80-bit key to perform 48 iterations on a 48-bit input (402 GE), and PRINT-96 for EPC encryption, which uses a 160-bit key to perform 96 iterations on a 96-bit input (726 GE). It uses 3-bit operations, and because an odd number of bit operations is not feasible, actual deployment of the algorithm is not yet ready.
To obtain efficient hardware and software footprints, LED [75] borrows features from PRESENT (S-box), a lighter version of AES (row-wise data processing) [50] and PHOTON (mix-column approach) [76]. The absence of key scheduling is a unique feature of LED. This approach reduces the chip area but increases security risks such as related-key attacks [77]. It processes a 64-bit input using various key sizes, namely 64-bit (966 GE), 80-bit (1040 GE), 96-bit (1116 GE) and 128-bit (1265 GE), over either 32 or 48 rounds [75].
PICARO [78] is a novel cipher with a good balance between performance and security (through an adequate choice of S-box). It has 4 different masking levels with faster hardware performance compared to AES. It uses a 128-bit key through 12 rounds and shows high resistance to side-channel attacks.
Zorro [79] is based on AES, suitable for embedded systems and more efficient than PICARO. It uses the same size of block and key (128-bit) through 24 rounds.
EPCBC (Electronic Product Code Block Cipher) [80] is a lightweight cipher inspired by PRESENT; it supports a 96-bit key with 48-bit and 96-bit input blocks and performs 32 iterations. The most compact version needs 1008 GE. The optimized sub-key generation technique of EPCBC enhances its immunity against related-key differential attacks.
I-PRESENT [81] is an involutive version of PRESENT inspired by PRINCE and NOEKEON. It takes a similar size of block and key and performs 30 rounds with two additional 4 × 4 S-boxes (16 times). The most compact hardware implementation requires about 2769 GE (encryption and decryption).

2) FEISTEL NETWORK (FN)
The lightweight variant of DES (Data Encryption Standard) is known as DESL. It works on the same block size (64-bit) and key size (56-bit), and uses the same number of rounds, as DES. The reduced number of S-boxes (from eight to only one [82]) and multiplexers [83] used in DESL distinguishes it from DES. It demands 1850 GE, which is 20% more compact than DES (2310 GE) [83]. DESL also discards the initial and final permutations of DES to make it lighter [84]. DESXL is another lightweight edition of DES, with a key-whitening feature to strengthen the cipher, and demands 2170 GE [83]. It performs the same number of cycles and uses the same block size as DESL but a larger key, 184-bit (k = 56, k1 = 64, k2 = 64) [84].
Tiny Encryption Algorithm (TEA) is suitable for very small, computationally weak and low-cost hardware [85]. It applies a 128-bit key to a 64-bit input over 32 rounds [86], with a GE requirement of 3872 [87]. Its simple key scheduling is vulnerable to brute-force attack [88], [89]. Another limitation of the TEA structure is its three equivalent keys for decryption, which makes it vulnerable to attackers [88]. The improved version of TEA is XTEA, which uses the same size of key and block but with more iterations (64 rounds), demanding 3490 GE [90]. It offers more complex key scheduling with a small change in the shift, XOR and addition functions [91]. XTEA was further modified into XXTEA [92] to provide immunity against the related-key rectangle attack (on 36 rounds) [91].
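TEA is small enough to sketch in full, which also makes the equivalent-key weakness easy to see. The code below follows the widely circulated reference routine (32 cycles, constant 0x9E3779B9) as understood here; it is a sketch for study, not a vetted implementation, and the key and plaintext values are arbitrary. Flipping the most significant bit of both k0 and k1 (or of both k2 and k3) leaves the ciphertext unchanged, because the two flips cancel under XOR.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9


def tea_encrypt(v0, v1, key):
    """Encrypt one 64-bit block (two 32-bit words) with a 4-word key."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


def tea_decrypt(v0, v1, key):
    """Invert tea_encrypt by running the cycles backwards."""
    k0, k1, k2, k3 = key
    s = (DELTA * 32) & MASK32
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK32
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK32
        s = (s - DELTA) & MASK32
    return v0, v1


if __name__ == "__main__":
    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # arbitrary key
    twin = (key[0] ^ 0x80000000, key[1] ^ 0x80000000, key[2], key[3])
    ct = tea_encrypt(0x11111111, 0x22222222, key)
    print(ct == tea_encrypt(0x11111111, 0x22222222, twin))
    print(tea_decrypt(*ct, key))
```

The effective key space is therefore smaller than 128 bits, which is part of the motivation for the revised key handling in XTEA.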
Camellia [93] is a cipher recognised by ISO/IEC, IETF, NESSIE and CRYPTREC. It was designed by Nippon Telegraph and Telephone Corporation and Mitsubishi Electric Corporation. Camellia offers a similar level of security by processing the same size of key and block as AES, with two round variants, 18 and 24. It is known for its fast software implementations [94], whereas the hardware implementation requires 6511 GE.
SEA [96] is designed for tiny IoT devices, especially memory-constrained devices [97], with the concept of on-the-fly key generation [96]. It uses a 96-bit key on two recommended block sizes, 96-bit and 8-bit, with a requirement of 3758 GE [97] for the most lightweight hardware version. The optimised software implementation demands 426 bytes with an encryption cycle count of 41604 on 8-bit micro-controllers [98].
KASUMI [99] takes a 64-bit input and performs 8 iterations using a 128-bit key. It demands 3437 GE for deployment on hardware [100]. It is mainly designed for GSM, UMTS and GPRS systems.
MIBS [101] takes a 64-bit input and performs 32 iterations using two key variants, 64-bit (1396 GE) and 80-bit (1530 GE). It is a Feistel-based structure that makes use of the S-box from mCrypton [62] and uses PRESENT's key extraction technique to derive the sub-keys.
LBlock [102] is an ultra-lightweight cipher that performs 32 iterations on a 64-bit input with an 80-bit key. The smallest hardware deployment needs 1320 GE for a throughput of 200 Kb/s, whereas the most efficient software implementation takes 3955 clock cycles to encrypt a single block (on an 8-bit microcontroller).
Designed and developed by the government of the Soviet Union (1989), GOST in its lightweight version operates on a 64-bit input with a 256-bit key over 32 rounds. The S-box in this version is adopted from PRESENT [103], with a demand of 651 GE.
ITUbee [104] is a software-efficient cipher with a code size of 586 bytes and 2937 cycles (the most compact version of encryption). It uses the same size of key and block (80-bit). Here, key scheduling is replaced by round-dependent constants to reduce software overhead.
FeW [105] processes a 64-bit input with two key variants, 80-bit and 128-bit, over 32 rounds. It makes use of the S-box of Hummingbird-2 and follows the key expansion process of PRESENT. No cryptanalytic attack has been found on FeW [105].

3) GENERALISED FEISTEL NETWORK (GFN)
Introduced by Sony Corporation and approved by NIST, CLEFIA offers a 128-bit block with a choice of 128-, 192- and 256-bit keys through 18, 22 and 26 rounds, respectively [106], [107]. It shows high performance and strong immunity against various attacks [40], [106], [108], [109], at a comparatively high cost, as the most compact version requires 2488 GE (encryption only) for a 128-bit key [107]. The strong immunity of CLEFIA against security attacks is owed to its dual confusion and diffusion properties. On the other hand, this demands more memory and limits its use in ultra-small applications [35].
Piccolo [110] is another ultra-lightweight cryptography algorithm suitable for extremely constrained devices (RFID, sensors, etc.). It processes a 64-bit input over 25 or 31 rounds, using an 80-bit or a 128-bit key, respectively. The smallest hardware deployment (80-bit key) requires 432 GE and an additional 60 GE to perform decryption.
TWIS [111], derived from CLEFIA, takes an equal-size block and key (128-bit) and performs 10 iterations. It is vulnerable to a differential distinguisher with probability one [112].
TWINE [60], derived from LBlock, performs 36 iterations on a 64-bit state with two key options, 80-bit and 128-bit. The most compact hardware implementation requires 1866 GE. TWINE uses nibble permutation instead of the bit permutation (for sub-key generation) of LBlock. Also, it uses a single S-box instead of the ten S-boxes of LBlock.
HISEC [113] performs 15 iterations on a 64-bit input with an 80-bit key, demanding 1695 GE. It shows good resistance against different attacks, and its characteristics are similar to those of PRESENT except for the bit permutation.

4) ADD-ROTATE-XOR (ARX)
SPECK [95], a sibling of SIMON designed by the NSA, is a software-oriented cipher. It supports the same block and key sizes as SIMON and performs 22, 23, 26, 27, 28, 29, 32, 33 or 34 iterations. The most compact hardware implementation recorded uses a 48-bit block with a 96-bit key and requires 884 GE, whereas the most efficient software implementation requires 599 cycles with 186 bytes of ROM for a 64-bit block with a 128-bit key [95].
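An ARX round of the kind SPECK uses can be sketched with nothing but modular addition, rotation and XOR. The rotation amounts (8 and 3) are those commonly given for SPECK variants with 32-bit words and are stated here as an assumption to be checked against [95]; the key schedule is omitted, and the round keys below are hypothetical values, so this is not a conforming SPECK implementation.

```python
WORD = 32
MASK = (1 << WORD) - 1


def ror(x, r):
    return ((x >> r) | (x << (WORD - r))) & MASK


def rol(x, r):
    return ((x << r) | (x >> (WORD - r))) & MASK


def arx_round(x, y, k):
    """One round: rotate, add modulo 2^32, XOR with the round key, then mix."""
    x = ((ror(x, 8) + y) & MASK) ^ k
    y = rol(y, 3) ^ x
    return x, y


def arx_round_inv(x, y, k):
    """Exact inverse of arx_round."""
    y = ror(y ^ x, 3)
    x = rol(((x ^ k) - y) & MASK, 8)
    return x, y


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = arx_round(x, y, k)
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        x, y = arx_round_inv(x, y, k)
    return x, y


if __name__ == "__main__":
    round_keys = [(0x01020304 * (i + 1)) & MASK for i in range(27)]  # hypothetical
    ct = encrypt(0x3B726574, 0x7475432D, round_keys)
    print([hex(w) for w in ct], decrypt(*ct, round_keys) == (0x3B726574, 0x7475432D))
```

No table lookups are needed, which keeps code size small and avoids S-box memory, in line with the software orientation noted above.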
IDEA [114], designed by Lai and Massey, applies a 128-bit key to a 64-bit input over 8.5 iterations and is mainly used for high-speed networks [115]. It uses 16-bit unsigned integers and performs data operations such as XOR, addition and modular multiplication without using an S-box or P-box. It is known for its best performance on embedded systems (such as PGP v2.0) with a memory need of 596 bytes at a throughput of 94.8 Kb/s (the smallest software version) [116].
HIGHT [117], an ultra-lightweight algorithm, processes 64-bit data using a 128-bit key over 32 rounds. It performs a compact round function (no S-boxes) using simple computational operations. The most compact version requires 2608 GE for a throughput of 188 Kbps [118].
BEST-1 [119], an ultra-lightweight cipher, targets Wireless Sensor Networks and RFID tags. It takes a 64-bit input with a 128-bit key through 12 rounds on 8-bit processors, demanding 2200 GE. The core functions of BEST-1 are mod 2^8 addition and subtraction, bitwise shift and XOR.
LEA [120] is a software-oriented cipher and was introduced by the ETRIK for common 32-bit processors. It processes a 128-bit input and performs 24, 28 and 32 iterations using 128-bit, 192-bit and 256-bit keys, respectively. On the ARM platform, LEA performs at 326.94 cycles/byte with a storage demand of 590 bytes (code) and 32 bytes for execution. The most compact version requires 3826 GE for a throughput of 76.19 Mbps [121].

5) NONLINEAR-FEEDBACK SHIFT REGISTER (NLFSR)
With a focus on the automobile industry, KeeLoq [22] was designed by Gideon Kuhn for keyless authentication (remote access) in cars [122]. It takes 32-bit input with a 64-bit key and performs 528 rounds. Although KeeLoq was developed in the 1980s, the first cryptanalysis report was issued only in February 2007 by Bogdanov [123].
The KATAN/KTANTAN [124] cipher family, inspired by KeeLoq, applies an 80-bit key to various block sizes (32-bit, 48-bit and 64-bit) through 254 iterations. The ciphers can be executed on small-scale hardware (KATAN 802 GE and KTANTAN 462 GE), as they are mainly designed for RFID tags and sensor networks. They follow a linear structure (LFSR) instead of the NLFSR of KeeLoq. KATAN has a very simple key schedule compared to KeeLoq, whereas KTANTAN performs no key generation operations at all, which reduces the GE requirement. As the key remains unchanged once initialized, the applications of KTANTAN are limited. KTANTAN-48 (588 GE) is more appropriate for RFID tags. In software, both show poor performance (low throughput and high energy consumption) due to heavy use of bit manipulation [98].
Halka [125] performs well in both hardware and software. It takes 64-bit input with an 80-bit key and performs 24 iterations. The multiplicative-inverse-based 8-bit S-boxes combined with an LFSR make Halka more secure than PRESENT. It demands 138 GE (7% less GE than PRESENT) [125]. Its software performance is also 3 times more efficient than that of PRESENT [125].
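The difference between linear and nonlinear feedback mentioned in this category can be illustrated with a toy 8-bit shift register. The tap positions, the AND term and the `key_bit` parameter below are arbitrary choices for illustration and do not correspond to KeeLoq, KATAN or Halka; the point is only that a single AND term in the feedback breaks linearity over XOR.

```python
def bit(state, i):
    """Return bit i of the register state."""
    return (state >> i) & 1


def lfsr_step(state):
    """Toy 8-bit LFSR: the feedback bit is an XOR of selected taps."""
    fb = bit(state, 0) ^ bit(state, 2) ^ bit(state, 3) ^ bit(state, 4)
    return (state >> 1) | (fb << 7)


def nlfsr_step(state, key_bit=0):
    """Toy 8-bit NLFSR: the feedback adds an AND term (nonlinear).

    key_bit is a hypothetical per-round key input XORed into the feedback.
    """
    fb = bit(state, 0) ^ bit(state, 3) ^ (bit(state, 1) & bit(state, 2)) ^ key_bit
    return (state >> 1) | (fb << 7)


def is_linear(step, nbits=8):
    """Check step(a ^ b) == step(a) ^ step(b) for all pairs of states."""
    size = 1 << nbits
    return all(step(a ^ b) == step(a) ^ step(b)
               for a in range(size) for b in range(size))


if __name__ == "__main__":
    print("LFSR linear: ", is_linear(lfsr_step))
    print("NLFSR linear:", is_linear(nlfsr_step))
```

A register whose update is linear can be described by a system of linear equations in the state bits, which is why designs in this family rely on nonlinear terms or key-dependent inputs for their security.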

6) HYBRID
Hummingbird [126] is an ultra-lightweight algorithm that introduces a hybrid structure (block and stream). It takes 16-bit input with a 256-bit key and performs 20 iterations. It was found vulnerable to several attacks [127].
Hummingbird-2 [128], designed for low-end microcontrollers, takes 64-bit input (initial vector) with a 128-bit key. It performs well on both platforms (hardware and software) and also satisfies the ISO 18000-6C protocol. It gives better performance than PRESENT (on 4-bit microcontrollers) but has a few drawbacks: 1) initialization is necessary before encryption (or decryption) due to its stream property; 2) the encryption and decryption functions differ, so the full version is 70% heavier than the encryption-only version. Moreover, its performance degrades when processing small messages.
PRESENT-GRP [35] works on 64-bit input with a 128-bit key and performs 31 iterations. It uses the substitution-permutation technique from PRESENT along with a group (GRP) operation for additional confusion properties (in place of the permutation table). The hardware implementation of PRESENT (1884 GE) is slightly better than that of PRESENT-GRP (2125 GE). Similarly, PRESENT is more efficient than PRESENT-GRP in software.
According to the graph (Figure 5), the software efficiency competition is won by SPECK, followed by SIMON and then PRIDE. ITUbee, LEA, IDEA and AES also show better software efficiency compared to the other LWC algorithms.
The memory (RAM and ROM) requirements of the various LWC algorithms can be studied from the graph in Figure 6, which shows the ten most memory-efficient LWC algorithms. The competition is again won by SPECK and SIMON, with less than 200 bytes of ROM and zero bytes of RAM, closely followed by PRIDE.
The other important software metrics, latency and throughput, are again led by SPECK and SIMON, with the lowest latency (408 and 594 cycles/block) and the highest throughput (470.5 and 323 Kb/s), closely followed by PRIDE. ITUbee and IDEA also secure places in the list of the top ten performers (Figure 7).
In terms of hardware efficiency, Midori is at the top of the list, followed by Piccolo as runner-up, which differs only marginally from GOST. Figure 8 visualizes the ten most hardware-efficient LWC algorithms.
SEA leads the key-wise and block-wise hardware efficiency competition with a very small block size (only 8 bits), followed by Hummingbird-2 with a block of double that size (and the largest key in this top-10 list) and then by KATAN/KTANTAN with a block 4 times bigger than that of the leader (Figure 9). From the graph (Figure 10), we can say that KTANTAN demands the smallest area (462 GE) to implement, with a minor difference from PRINT (41 GE more). SPECK and SIMON appear in the top five with requirements of less than 900 GE. All of these results were recorded on either 0.13 µm or 0.18 µm technologies.
In terms of energy consumption, Midori shows the lowest energy requirement (1.61 µJ/bit), followed by Piccolo, PRINCE, TWINE and RECTANGLE with small differences amongst them (Figure 11).
In summary, SIMON and SPECK shine with the most efficient software implementations but disappear from the top-10 list of hardware-efficient LWC algorithms. Also, derived versions of AES such as PRESENT and derived lighter versions of DES such as DESL/DESLX, as well as CLEFIA, are widely recognised by the standardising bodies for their high security. Overall, none of the LWC algorithms meets all the efficiency metrics of the hardware and software requirements, and each shows distinct performance in different circumstances.
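As a rough illustration of the group operation mentioned for PRESENT-GRP, the sketch below implements the GRP bit permutation as it is commonly described in the literature on bit permutation instructions: data bits whose control bit is 0 are collected on the left, and bits whose control bit is 1 are collected on the right, each group keeping its original order. The function name `grp` and the list-based representation are illustrative assumptions; the control words actually used by PRESENT-GRP are not reproduced here.

```python
def grp(data, control):
    """GRP permutation on equal-length sequences.

    Elements of data whose control bit is 0 move to the left group,
    elements whose control bit is 1 move to the right group, and the
    relative order inside each group is preserved.
    """
    if len(data) != len(control):
        raise ValueError("data and control must have the same length")
    left = [d for d, c in zip(data, control) if c == 0]
    right = [d for d, c in zip(data, control) if c == 1]
    return left + right


if __name__ == "__main__":
    data = list("abcdefgh")             # symbolic bits, leftmost first
    control = [1, 0, 0, 1, 1, 0, 1, 0]  # one control bit per data bit
    print("".join(grp(data, control)))
```

Since the output is always a rearrangement of the input bits, a sequence of GRP steps with suitable control words can realize a bit permutation without storing a permutation table.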
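The relation between the latency and throughput figures quoted above can be checked with simple arithmetic: throughput equals block size times clock frequency divided by cycles per block. The sketch below assumes a 64-bit block and interprets Kb/s as 1000 bits per second; neither the block size nor the clock frequency behind these figures is stated in this section, so the implied frequency is an inference, not a reported value.

```python
def throughput_kbps(block_bits, cycles_per_block, clock_hz):
    """Throughput in Kb/s for a cipher that needs cycles_per_block cycles."""
    return block_bits * clock_hz / cycles_per_block / 1000.0


def implied_clock_hz(block_bits, cycles_per_block, kbps):
    """Clock frequency implied by a reported latency and throughput pair."""
    return kbps * 1000.0 * cycles_per_block / block_bits


if __name__ == "__main__":
    # (cycles/block, Kb/s) pairs quoted in the text; 64-bit block is assumed.
    reported = {"SPECK": (408, 470.5), "SIMON": (594, 323.0)}
    for name, (cycles, kbps) in reported.items():
        mhz = implied_clock_hz(64, cycles, kbps) / 1e6
        print(f"{name}: implied clock of about {mhz:.2f} MHz")
```

Under these assumptions, both pairs point to roughly the same clock frequency (about 3 MHz), which suggests that the two sets of figures were measured under comparable conditions.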

C. CRYPTANALYSIS OF LWC ALGORITHMS
Along with performance and cost, security is an essential measure for any lightweight cryptography algorithm. The attack resistance of an algorithm can be measured through cryptanalysis, which aims at detecting vulnerabilities by attempting various attacks and decryption techniques [38]. The four main types of cryptanalysis of block ciphers are [38], [51], [53], [131]: differential, linear, integral and algebraic cryptanalysis. Differential cryptanalysis analyses how differences between inputs affect the differences between the corresponding outputs. Its special types are higher-order, truncated, impossible and boomerang. Linear cryptanalysis, introduced by Mitsuru Matsui, postulates a linear approximation between bits of the plaintext, ciphertext and key, based on the piling-up lemma. Integral cryptanalysis is especially pertinent to block ciphers with substitution-permutation networks. It is also known by two other names, Square attack and saturation attack. Algebraic cryptanalysis is based on equation-solving algorithms and has proven effective on lightweight versions due to their simple structure (fewer rounds with lower algebraic complexity).
These cryptanalyses operate in the ciphertext-only, known-plaintext, chosen-plaintext and chosen-ciphertext settings, alongside MITM, brute-force and side-channel attacks. Differential Fault Attacks, a type of side-channel attack, analyze the internal structure and find an exploitable place to attack the algorithm [132], [133]. Table 5 presents the security analysis of various LWC algorithms in grid form. The study shows that almost all existing lightweight block cipher solutions suffer from various attacks, especially related-key attacks, followed by various differential and MITM attacks. Moreover, the lighter versions (with reduced rounds) are more vulnerable to various attacks than their standard counterparts.
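To make the differential and linear notions described in this subsection concrete, the sketch below computes a difference distribution table (DDT) for a 4-bit S-box and evaluates the piling-up lemma. The S-box values are those published for PRESENT; the helper names `ddt`, `differential_uniformity` and `piling_up_bias` are illustrative, and the sketch covers only a single S-box, not an attack on a full cipher.

```python
# 4-bit S-box as published for PRESENT (input 0x0 maps to 0xC, and so on).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def ddt(sbox):
    """Difference distribution table.

    table[dx][dy] counts the inputs x for which
    sbox[x] XOR sbox[x XOR dx] equals dy.
    """
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for dx in range(n):
        for x in range(n):
            dy = sbox[x] ^ sbox[x ^ dx]
            table[dx][dy] += 1
    return table


def differential_uniformity(sbox):
    """Largest DDT entry over all nonzero input differences."""
    table = ddt(sbox)
    return max(max(row) for row in table[1:])


def piling_up_bias(biases):
    """Piling-up lemma: bias of the XOR of independent binary variables.

    For n variables with biases e_1..e_n the combined bias is
    2^(n-1) * e_1 * ... * e_n.
    """
    result = 0.5
    for e in biases:
        result *= 2 * e
    return result


if __name__ == "__main__":
    print("differential uniformity:", differential_uniformity(PRESENT_SBOX))
    print("combined bias of two approximations with bias 1/4:",
          piling_up_bias([0.25, 0.25]))
```

A lower differential uniformity means that no input difference maps to a particular output difference too often, which is one of the properties a designer weighs when choosing a small S-box.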

D. STANDARDIZATION OF LWC ALGORITHMS
The organizations and research groups that are actively contributing to the field of cryptography to improve the lightweight standards for resource-constrained devices are as follows:
• National Institute of Standards and Technology, USA (NIST)
• International Organization of Standardization and the International Electrotechnical Commission (ISO/IEC)
• Cryptography Research and Evaluation Committees, Japan (Cryptrec)
• European Network of Excellence in Cryptology (Ecrypt)
• National Security Agency of USA (NSA)
• CryptoLUX (University of Luxembourg)
PRESENT [51] and CLEFIA [106] are the only two algorithms approved by the ISO/IEC 29192 standard, whereas AES, CLEFIA, TDES, Camellia, PRESENT, PRINCE, Piccolo, LED, TWINE, SIMON & SPECK and Midori are targeted by Cryptrec.

E. REAL-TIME USE CASES: APPLICATIONS & THEIR LIGHTWEIGHT DEMANDS
The wide range of IoT applications in various fields creates demand for lightweight cryptography algorithms with different requirements [174]. Smart home appliances such as smart TVs, smart fridges, smart kettles and smart bulbs demand small memory and little processing. The best-suited algorithms in this scenario are SIMON, SPECK, Piccolo and TWINE. Because RFID tags offer tiny physical space and little or no power backup, SIMON, SPECK, Piccolo and PRINCE are the best options for logistics applications. Smart agriculture is an emerging field that demands compact implementation, fewer processing cycles and little power consumption, with plenty of sensors in remote locations. SIMON, SPECK, PRESENT and TWINE fulfil the requirements of smart agriculture.
A person under medical treatment in a hospital or at a residence could be monitored for pulse count and the levels of pressure, sugar and oxygen in the blood using IoT sensors. Here, the security and privacy of the transmitted data are crucial, along with tiny circuitry, little processing power, limited batteries (in the case of an implanted device) and quick response time. In this constrained environment, SIMON, SPECK, Piccolo, PRESENT and Midori are the best-suited solutions to secure communication in health care applications, as their compact hardware and software implementations can deliver a real-time response for both in-body (implanted) and out-of-body (wearable) devices.
In industrial systems (Industry 4.0), sensors could be attached to equipment at various places that are not easily accessed by operators and transmit data wirelessly over specific distances. In this setting, real-time processing with adequate security is the key element, and energy consumption is less of a concern. Midori and PRINCE show the best performance in such a demanding scenario. In the era of 5G technology, the automobile industry demands not only in-vehicle communication but also communication with infrastructure such as traffic signals and road signs (V2X). This communication demands a prompt response (low latency) on tiny circuitry with high security. Midori, PRINCE, PRESENT and SIMON are the right choices for the auto industry. KeeLoq is another powerful LWC algorithm for secure remote keyless entry in cars and buildings [171].

IV. OPEN RESEARCH CHALLENGES AND RESEARCH DIRECTIONS
The ideal algorithm should maintain a proper balance among cost, performance and security (Figure 12). Any two of these three can be easily optimized, whereas achieving all of them together is challenging [38]. For example, increasing the number of rounds [131] or the key size degrades algorithm performance. This balance could be achieved by a design focus on lower memory and computing power requirements, leading to a smaller Gate Equivalent (physical area) footprint and low power (energy) consumption without compromising strong security [35]. Based on the above study, we have identified the following research issues, which require further attention to make LWC algorithms effective in IoT security:
1) One of the two fundamental properties of cryptography, confusion, could be achieved by choosing an efficient and adequate number of S-boxes to demonstrate a proper balance between performance and security [78]. Designing simple and fast but strong confusion (substitution) is therefore a challenge, i.e., how can the number of S-boxes be reduced, given the memory (to store them) and computing power (to produce them) that they demand, while maintaining the same security level? (Motivation: PRESENT is designed from AES and replaces eight S-boxes with just one. Similarly, many researchers have derived lighter versions from the standard cryptography algorithms with a few modifications that reduce substitution-permutation, with a counter-effect on the security level.) How to replace S-boxes with other confusion techniques that offer the same level of security with less memory and processing overhead is still an open problem.
2) Making key scheduling lighter with a smaller key size and adequate strength, i.e., how can random sub-keys be generated from the provided initial key for all n rounds?
3) An increase in the number of rounds adversely affects performance and cost, i.e., how can the number of rounds be decreased (or increased) without compromising performance or the security level?
We are currently working on substitution-permutation methods, with a main focus on the S-box, to design a generic lightweight cryptography algorithm with the right blend of the three main characteristics, namely cost, performance and security.
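The memory side of the S-box question raised in these research issues can be quantified with simple arithmetic. The sketch below assumes a plain lookup-table implementation, in which an S-box with n input bits and m output bits stores 2^n entries of m bits each; hardware designs that realize the S-box as Boolean logic are costed differently (in GE) and are not covered. The function name `sbox_table_bits` is illustrative.

```python
def sbox_table_bits(n_in, n_out):
    """Storage in bits for a lookup-table S-box with n_in input bits
    and n_out output bits: 2^n_in entries of n_out bits each."""
    return (1 << n_in) * n_out


if __name__ == "__main__":
    for label, n in (("8-bit S-box (AES-sized)", 8),
                     ("4-bit S-box (PRESENT-sized)", 4)):
        bits = sbox_table_bits(n, n)
        print(f"{label}: {bits} bits = {bits // 8} bytes")
```

Halving the S-box width from 8 to 4 bits shrinks the table from 256 bytes to 8 bytes, which shows why small S-boxes are attractive for constrained devices and why the accompanying loss of confusion per lookup has to be compensated elsewhere in the design.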

V. CONCLUSION
Due to the exponential growth in the number of IoT devices in various domains, IoT security is one of the main concerns. As a consequence, there is a need for lightweight algorithms that offer trade-offs amongst cost, performance and security. For resource-constrained IoT devices, lightweight cryptography is an effective way to secure communication by transforming the data. In this paper, the well-defined LWC characteristics (cost, performance and security) by NIST are compared, and further research gaps and open research challenges are highlighted. From the literature review, PRESENT and CLEFIA are the block ciphers approved by the ISO/IEC 29192 standard for security reasons along with acceptable performance and cost.
On the other side, SIMON and SPECK impress with their most compact implementations. In general, none of the LWC algorithms fulfils all the criteria of the hardware and software performance metrics, but each performs at its best in its specified environment. However, new attacks are reported as new LWC algorithms emerge, which is an inevitable and never-ending process. The war between cybersecurity experts and attackers always opens a door of opportunities for new research in the field of cybersecurity, especially lightweight cryptography.

FIGURE 1. Two main categories of IoT Devices.

TABLE 1. List of Abbreviations and Acronyms.

TABLE 4. Hardware and Software performances of LWC algorithms.

TABLE 5. Security Analysis of LWC Algorithms.