A Secure Data Infrastructure for Personal Manufacturing Based on a Novel Key-Less, Byte-Less Encryption Method

We are witnessing the advent of personal manufacturing, where home users and small and medium enterprises manufacture products locally, at the point and time of need. The impressively fast adoption of these technologies indicates this approach to manufacturing can become a key enabler of the real-time economy of the future. In this paper, we contribute a secure and dependable infrastructure and architecture for that new paradigm. Our solution leverages physical limitations of the computational process into a defense strategy that makes distributed file storage and transfer highly secure. The main idea is to replace asymmetric or public-key encryption functions with an unkeyed, collision, second preimage, and preimage resistant cryptographic hash function. Such a cryptosystem does not have an inverse function $H^{-1}$. We challenge each block hash against the full hash table to recreate the original message. To illustrate the approach, we describe secured protocols that provide a number of desirable properties during both data storage and streaming. Similar to proof-of-work blockchain consensus algorithms, we parameterized the solution based on the amount of infrastructure available. Experiments show the proposed method can recalculate hashes for a 3-dimensional live matrix of $256^3$ at an average of 14 revisions per second, and one revision every 5 minutes for a bigger matrix of $4096^3$. The increase in cloud infrastructure cost is insignificant compared to the level of protection offered.


I. INTRODUCTION
We are witnessing the advent of personal manufacturing, where home users and small and medium enterprises use devices such as 3D printers, CNC mills, laser jets, and robotics to manufacture products locally, at the point and time of need. The impressively fast adoption of these technologies strongly indicates that this novel approach to manufacturing can become a key enabler for the real-time economy of the future, i.e., a possible paradigm shift in manufacturing toward personal manufacturing. In such a paradigm, people and organizations would not buy a ready-made product. Instead, they would obtain raw material and produce products using their own or locally accessible automated manufacturing (AM) machinery.
The associate editor coordinating the review of this manuscript and approving it for publication was Raúl Lara-Cabrera.
With the growing popularity of AM, robotic process automation (RPA), self-driving cars, automated medical devices, video and hologram streaming, and the Internet of Things (IoT) in general, the need to securely store and transfer streamable file types such as machine instructions and manufacturing files is becoming increasingly important.
Thus, the requirements for modern secure distributed file storage and transfer are changing, and efficient methods of secured cloud storage and streaming are becoming a compelling need. However, securing cloud file storage and transfer is a challenging task [1]. The nature and properties of modern file types impose certain constraints on how secure distributed file storage and transfer methods should operate. One such constraint is the need to repeatedly access streamable files line by line or layer by layer without inconsistencies, delay, or compromising security through exposure of the whole file at once. In this paper, we address this problem and introduce a possible solution based on an efficient approach that utilizes technical limitations of the cloud and leverages them into a security control and defense strategy.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
The main idea is to replace asymmetric or public-key encryption functions with an unkeyed, collision, second preimage, and preimage resistant cryptographic hash function. Such a cryptosystem has no inverse function $H^{-1}$ and no key to decrypt the hash and get the message back, unless we pre-calculate a full hash table. We challenge each block hash against the full hash table to recreate the original message. To illustrate this approach, we have constructed secured protocols that provide a number of desirable properties to secure machine codes at rest and during delivery to the stream-consuming device.
The previous generation [2]-[4] of our solution has been implemented and proven over several years as a mechanism to securely deliver content to 3D printers from the cloud. Today, the 3DPrinterOS cloud has more than 84 000 users who have generated over three million CAD designs and machine codes. Users have produced more than 950 000 physical parts on 28 000 3D printers in 100 countries [2]; these values double every six months [2]. The technology is licensed to Bosch [3], Kodak [4], and other popular desktop 3D printer manufacturers. The solution described in this paper completely reworks the first [5] and second [2]-[4] generations of this secure content delivery mechanism and extends it to any type of manufacturing machine or complex IoT device with command, control, and telemetry.
The main contributions of this paper are: a) a novel, key-less, byte-less encryption method, ready for application to AM; b) an approach that leverages the physical limitations of the computational process [6] into a defense strategy; c) a threat model and security analysis of the proposed approach.
The main use case is the transfer of machine codes from secured cloud storage to a network-connected manufacturing machine. Other potential applications include streaming of a) video; b) holographic video; c) voice communication; d) medical data; e) business file data; f) telemetry, including command and control data to and from self-driving cars.
The remainder of this paper is organized as follows. In Section II, we introduce additional background and discuss the topics addressed in this paper. In Section III, we analyze and discuss why existing cloud file storage and transfer solutions such as digital rights management (DRM), video streaming, and 3D model streaming fail to adequately address critical constraints and security problems. In Section IV, we explore a relatively new paradigm of cloud security, live matrix, proactive and passive cloud nodes, and our protocol. In Section V, we describe the proposed cloud application infrastructure and architecture in detail; in Section VI, we discuss the strengths and weaknesses of this approach. In Section VII, we describe the setup used to evaluate the proposed method through experiments with a local cloud of machines. Finally, Section VIII concludes the paper by summarizing the results and indicating issues to be addressed in future work.

II. SETTING THE SCENE
This section prepares the reader for the proposed solution, which is described starting in Section IV.
A. STREAMING VERSUS CONVENTIONAL SECURE FILE STORAGE/TRANSFER
1) ARGUMENT: IMPORTANCE OF MACHINE INSTRUCTIONS
Seventy years ago, in the so-called ''paper age,'' most products' technical drawings were prepared on paper. Imagine that an attacker obtained pictures of the paper sketches of an innovative product. In the best-case scenario (for the attacker), it would take many years to find or even build the production technology, train engineers, and set up a factory and production lines to produce prototypes and then a real product. In the worst case, there would be no way to build the product from copies of the sketches, as the ''secret sauce'' required to build it is somewhere down the production line, inside the heads and hands of the engineers working at a specific factory. A good example is rocket fuel: even with all the sketches of a rocket's structure and shape, one still needs to identify and prepare the fuel.
About thirty years ago, we entered the digital age, with the use of computer-aided design (CAD), computer-aided engineering (CAE), computer-aided manufacturing (CAM), computer-aided process planning (CAPP), computer-aided quality assurance (CAQ), production planning and control (PPC), and enterprise resource planning (ERP) tools [7]. However, these tools were initially used primarily to create a virtual representation of a product in order to take measurements, manage bills of materials (BOMs), and run simulations that facilitate quicker changes to a product's structure and shape during prototype testing cycles. Much manual work was still required, including post-processing and manual surface finishing. People are accustomed to using very basic solutions, like digital rights management (DRM), to secure CAD/CAM/CAE designs.
In the past, if such a DRM-protected CAD/CAM/CAE design was compromised, the barriers discussed above would still slow the rate of the product's production and distribution. Compared to the ''paper age'' example, with decades required to produce the product, in the digital age, it might take only six months to figure out the details, find production facilities, and produce a marketable product.
In the personal manufacturing age, CAD/CAM/CAE designs intended for AM already have the ''secret sauce'' baked in. In other words, the proprietary information required to produce the market-ready product is inside the file. If such a design is compromised, the attacker can reach the market with a production-quality product in just a few days, if not hours. Designs intended for AM and 3D printing contain all of the information needed to manufacture a real production-quality product according to exact specifications: make and model of the manufacturing device, layer infill direction, tolerances, surface finish, materials, speeds, temperatures, and durability, taking into account force distribution and dispensation. With recent advances in AM technology, it is possible to manufacture a real working part or a usable product from a CAD/CAM/CAE design in just a few hours.

2) ARGUMENT: AN AM MACHINE IS A THIN CLIENT
The amount of information contained within modern CAD/CAM/CAE files for AM creates a load on the whole supporting infrastructure and requires substantial computing power. There is no way to put a supercomputer into each AM machine.
Over time, there has been a trend in AM to move as much calculation to the cloud as possible due to the low cost of cloud computing power. Initially, slicing for 3D printers was performed on the workstation built into a 3D printer (e.g., [8], [9]). Then, slicing software moved to engineers' workstations [10]. Now, slicing has moved to the cloud [2], with machine code streamed to the AM machine.
The next important step is to stream stepper motor pulses from the cloud directly to the AM machine; firmware is moving to the cloud. As with software and faster computing, this move improves hardware operation, with substantial increases in quality and speed. For instance, Okwudire et al. [11] sent low-level stepper motor commands from a server to simplified firmware, which interpreted simple commands and proxied them to the stepper motor drivers. They measured an increase in printing quality and speed.
AM machines should have a thin client built in, not a workstation [8], [9]. This thin client will interpret commands and send back the current status and metrics. If the AM machine does not reach a certain temperature or speed, the cloud needs to know, so that it can update its manufacturing execution system (MES) and notify users about the delay. This approach will reduce costs and eliminate the need for local software updates. Moreover, the greater calculation complexity possible in the cloud enables faster, smoother operation of local AM machines.
To explain why, we must first outline the basic steps that every contemporary AM machine firmware performs: a) read machine code into memory; b) interpret machine code into movements between coordinates; c) plan path through coordinates; d) calculate accelerations and decelerations with lookaheads taking into account inertia and potential forces; e) project movements to the stepper motor axis; f) ensure the motors and toolhead follow the programmed trajectory.
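Step d) is a good example of the computation involved. As a hedged illustration (a textbook constant-acceleration model, not the planner of Marlin, Klipper, or any other specific firmware), the sketch below computes how long a single move takes under a symmetric trapezoidal velocity profile. Real planners also blend consecutive moves using lookahead, which is omitted here; the move length, speed limit, and acceleration in the example are hypothetical.

```python
import math

def move_time(distance_mm: float, v_max: float, accel: float) -> float:
    """Duration (s) of one move that starts and ends at rest.

    Symmetric trapezoidal profile: accelerate at `accel` (mm/s^2) up to
    `v_max` (mm/s), cruise, then decelerate at `accel` back to zero. If the
    move is too short to reach `v_max`, the profile becomes a triangle.
    """
    t_accel = v_max / accel                 # time to reach v_max from rest
    d_accel = v_max ** 2 / (2 * accel)      # distance covered while accelerating
    if 2 * d_accel >= distance_mm:
        # Triangular profile: accelerate over half the distance, then brake.
        return 2 * math.sqrt(distance_mm / accel)
    cruise = distance_mm - 2 * d_accel      # distance travelled at v_max
    return 2 * t_accel + cruise / v_max

# A long move reaches the speed limit; a short one never does.
print(move_time(100.0, 50.0, 1000.0))  # ≈ 2.05 s
print(move_time(1.0, 50.0, 1000.0))    # ≈ 0.063 s
```

A microcontroller has to evaluate something like this, plus junction-speed lookahead, for every segment in real time, which is the load the following paragraphs argue should move off the device.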
It is difficult to achieve excellent manufacturing quality when performing such processing on microcontrollers. Most firmware performs only minimal prediction of the toolhead path. As a result, movement of the toolhead creates excessive vibration and noise, and the toolhead sometimes hits the wall of the machine. These phenomena cause manufacturing quality to drop with any increase in manufacturing speed, despite the machines' excellent and frequently over-engineered hardware. The problem lies in the microcontrollers, which spend most of their computing time calculating the trajectory. The less computing time the microcontroller spends on planning, and the more on operating the hardware, the better the manufacturing speed and quality.
To move the toolhead one millimeter, a stepper motor must perform a certain number of steps. For example, with a 0.9 degree per step stepper motor (400 full steps per revolution), moving the toolhead one millimeter may take 25 full steps [12]. Operated this way, the motor produces jerky movements and generates substantial vibration. Moreover, the movements will be slow because of the inability to accelerate and decelerate efficiently; if configured to operate at high speeds, the machine will skip steps, resulting in missed manufacturing tolerances and overall lower product quality. The same motor operated with so-called micro-stepping, set at 1/32 of a step, moves much more smoothly, but requires 32 times as many steps, i.e., 800 steps per millimeter [12]. However, not every microcontroller can sustain this rate of feeding steps into the motor driver. For context, a 16 MHz ATmega microcontroller with Marlin firmware achieves fewer than 10 000 steps per second (10 kHz) [13].
Moving path planning out of the firmware to a nearby computer increases manufacturing speed and quality, as demonstrated by the team behind the Klipper project [14]. The same 16 MHz ATmega microcontroller described above, operated with Klipper firmware [14], achieves 151 000 steps per second (151 kHz). It also drives the motors more smoothly, with fewer errors, and improves manufacturing quality. The Step Benchmarks table [14] shows that the same hardware can achieve more than ten times the step rate with the right software and more computing power. To achieve such improvements, we will ultimately stream encoded physical signal commands from the cloud to AM machines. The method proposed in this paper is ready for these types of applications.
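The step rates above translate directly into a speed ceiling for an axis. A minimal sketch, assuming a hypothetical axis that needs 25 full steps per millimeter of travel and runs at 1/32 micro-stepping (800 micro-steps per millimeter); real values depend on the motor, pulleys or lead screw, and driver. Because Marlin's figure is reported as "fewer than" 10 kHz, its result is an upper bound.

```python
def max_feed_rate(step_rate_hz: float, steps_per_mm: float) -> float:
    """Highest toolhead speed (mm/s) one axis can reach before the
    controller runs out of step pulses."""
    return step_rate_hz / steps_per_mm

# Hypothetical drive: 25 full steps/mm at 1/32 micro-stepping.
STEPS_PER_MM = 25 * 32  # 800 micro-steps per millimeter (assumption)

marlin = max_feed_rate(10_000, STEPS_PER_MM)    # < 10 kHz reported for Marlin [13]
klipper = max_feed_rate(151_000, STEPS_PER_MM)  # 151 kHz reported for Klipper [14]
print(f"Marlin:  {marlin:.2f} mm/s")   # 12.50 mm/s
print(f"Klipper: {klipper:.2f} mm/s")  # 188.75 mm/s
```

Under these assumptions, the same microcontroller goes from a speed limit of about 12.5 mm/s to almost 190 mm/s at the same micro-stepping resolution, which is why offloading planning matters.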

3) ARGUMENT: LARGE FILE SIZES
To explain why AM machine codes should be streamed rather than downloaded and stored, we use the example of a very simple 3D design, an annular cylinder, created in OpenSCAD software [15], [16].
The file for a given object will have a different size depending on which stage of manufacturing it is prepared for, and will involve different representations of the 3D object. We have depicted data file sizes at different stages of digital design for automated manufacturing in Fig. 1. As it shows, file size increases exponentially when moving from a less systematically specified representation of the object to the more specific representation needed to produce the production part.
FIGURE 1.
File size at different stages of digital design for automated manufacturing. From left: CAD design; an STL file prepared for manufacturing; machine codes for a specific AM machine make and model; command sequence for AM machine stepper motors. File size exponentially increases from the less systematically specified representation of the object to the more specific representation required to produce the physical part.

FIGURE 2.
Representations of a) data encrypted with a static key; b) three attack vectors on static key encryption; c) dynamic key encryption with constantly changing data states: the state changes more quickly than the time required to physically extract the data.

In Step 1, the initial CAD design can be a few lines of code that mathematically represent a part. In Step 2, the STL file prepared for manufacturing is a set of triangles in space representing the CAD design; in addition to the overall shape of the object, it contains information on manufacturing tolerances: the higher the precision (i.e., the lower the tolerances), the bigger the file. In Step 3, machine code is produced from that file; this code is specific to a certain AM machine make and model. In addition to the shape of the object, it contains information about each individual layer the 3D printer will build to create the object. Each layer requires a certain number of toolhead movements, and each movement has an associated speed and information about the amount and speed of material extrusion. In Step 4, the command sequence file for the stepper motors represents all of the signals that go to the stepper motor driver to execute the machine code. It includes calculations of acceleration and deceleration and takes into account inertia, timing, and many other factors. This is the exact recipe for how the part is produced; changes in this last stage of preparation will affect the tolerances, quality, and speed of manufacture.
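To make the precision/size trade-off of Step 2 concrete, here is a hedged sketch. The binary STL layout (80-byte header, 4-byte triangle count, 50 bytes per triangle) is standard, but the mesh model is an idealized annular cylinder and the tessellation levels are arbitrary; OpenSCAD's actual export may triangulate differently.

```python
import struct

# Binary STL: 80-byte header, uint32 triangle count, then per triangle
# 12 little-endian float32 values (normal + 3 vertices) and a uint16.
HEADER_BYTES = 80 + struct.calcsize("<I")
BYTES_PER_TRIANGLE = struct.calcsize("<12fH")  # 50 bytes

def annular_cylinder_triangles(segments: int) -> int:
    """Triangle count of an idealized annular cylinder mesh with `segments`
    facets around the circumference (the role OpenSCAD's $fn plays).
    Outer wall, inner wall, top ring and bottom ring each consist of
    `segments` quads, i.e. 2 * segments triangles."""
    return 4 * 2 * segments

def binary_stl_size(triangles: int) -> int:
    return HEADER_BYTES + BYTES_PER_TRIANGLE * triangles

for fn in (32, 256, 2048):  # finer tessellation = higher precision
    tris = annular_cylinder_triangles(fn)
    print(f"$fn={fn}: {tris} triangles, {binary_stl_size(tris)} bytes")
```

In this idealized model, the size grows linearly with the number of facets, and the number of facets needed to approximate the circular walls within a given tolerance grows as that tolerance shrinks.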

4) ARGUMENT: THE WHOLE FILE IS NOT NEEDED AT ONCE
In a past experiment [17], we found that a CAD file of a computed tomography (CT) scan of the human brain required about 2 GB, the corresponding medium-quality AM machine codes 6 GB, and print time for the full-size brain was 96 h. For a high-quality 3D print of the human brain, the machine code would be 36 GB, requiring approximately two weeks of manufacturing time on a 3D printer. The 3D printer did not need the entire file at once, as the manufacturing process takes time, and it was possible to transfer the file in smaller segments.
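Streaming in segments also keeps the required bandwidth modest. A back-of-the-envelope sketch using the figures above, assuming decimal gigabytes and treating "approximately two weeks" as exactly 14 days:

```python
GB = 10**9  # decimal gigabytes (assumption: the text does not say GB vs GiB)

def required_rate(size_bytes: float, duration_s: float) -> float:
    """Average bytes per second needed to keep the printer fed if the
    file is streamed over the whole build instead of downloaded up front."""
    return size_bytes / duration_s

medium = required_rate(6 * GB, 96 * 3600)        # 6 GB over 96 h
high = required_rate(36 * GB, 14 * 24 * 3600)    # 36 GB over ~two weeks
print(f"medium quality: {medium / 1e3:.1f} kB/s")  # 17.4 kB/s
print(f"high quality:   {high / 1e3:.1f} kB/s")    # 29.8 kB/s
```

These are averages over the whole build; actual demand varies from layer to layer.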

B. PHYSICAL LIMITATION OF COMPUTATIONAL PROCESSES
This subsection gives a basic example of how the physical limitations of computational processes, and different types of bottlenecks, can be turned into a defense strategy in the cloud.
Let us use an analogy from the physical world. Let distilled water represent the data we want to protect. A bottle of distilled water is put on a table (Fig. 2 a). One way to obtain the water without opening the lock on the bottle is to drill a small hole and let the water leak out (Fig. 2 b). Our storage solution can be compared to constantly changing bottles and a robot that pours the water from one bottle to another, adding and removing chemicals through different chains of chemical reactions to protect the water (Fig. 2 c). In this scenario, the liquid actually being poured is, for example, sometimes an acid and sometimes an alkali. An attacker can still start drilling a hole in the bottle, but the bottle stays still and steady for only a minute before the robotic arm starts to pour its contents into a different bottle and add other chemicals to change the state of the liquid. Only the robot knows how many chemical transformations, and in what sequence, would lead back to the original distilled water.
If an attacker starts to drill holes into the bottles to steal the liquid, that attack requires time. If drilling a hole takes 5 minutes, and the bottle is only available in a steady condition for 1 minute, then this is a clear bottleneck: a physical limitation. Now imagine that the attacker uses a faster way to drill a hole. It still takes time, and there is another physical limitation: the diameter of the hole (in our approach, the network connection between the nodes and between the hacked node and the secured cloud node). Now the attacker starts to get the liquid. But if it takes, say, an hour to obtain all of the liquid from the bottle, it will do the attacker no good, because this exact bottle holds the liquid for only 1 minute before the bottle is changed and the composition of the liquid changes. The attacker has obtained a small amount of an unknown liquid, with no information about how to turn it back into its original form. By drilling subsequent holes and getting small amounts of liquid at different stages of the chemical chain or recipe, the attacker will end up with a mysterious mixture with a complex chemical composition and will not know how to turn it back into its original form. The attacker may have substantial time and computational power to analyze the liquid and to use brute force to reconstruct the original. But this is nearly impossible, as at some later time even the robot will not know what happened in the past; it does not have enough storage to keep versions of all of the obsolete recipes and chemical reactions. The faster the robot performs its manipulations, the harder it is to access a bottle long enough to drill holes or pump out its contents.
How does the solution described above translate to a computer problem? The metaphor of robotic arms and chemicals in bottles shows that it is hard to steal information that is constantly moving and transforming.
This is the physical limitation. We compare our metaphorical example with what our solution does in Fig. 2:
1) The bottle of water with a lock on the lid corresponds to a data store with data at rest, encrypted with a key (Fig. 2 a).
2) The drill bit, the key ring with different master keys and lock picks, and the hacksaw attacking the locked bottle correspond to the various attack vectors used against encrypted data at rest (Fig. 2 b). This comparison represents an encrypted file in storage: once an attacker gets a copy of the storage or the file, cracking it is only a matter of time.
3) The robotic arm corresponds to the live matrix (Fig. 2 c). Our solution shuffles the data faster than an attacker can download it from the cloud, due to the physical limitations of computer systems, for example, the network interface.
4) The bottle with added chemicals corresponds to the data state in our solution (Fig. 2 c). The data state is static for a short amount of time and then changes. Within this short time period, it is hard to extract the full file successfully, so the attacker ends up with partial data extracted at different states.
5) Broken bottles correspond to expired data states (Fig. 2 c). An expired data state that has not yet been removed from computer memory can no longer be used to retrieve the data; thus, attacking it does not help crack the data store.
6) The drill bit corresponds to an attack vector (Fig. 2 c). Any attack vector requires some amount of time to extract data. Before an attacker can extract the data, its state becomes obsolete, and the attack must start over from the beginning. Any attack through a computer system will face a physical limitation if the secured storage uses these physical limitations as a defense mechanism.
7) The bottle holding the current chemical solution and the bottle into which the robotic arm pours it correspond to the current data state r and the next data state r + 1 (Fig. 2 c).
8) The queue of bottles with different chemical solutions prepared according to a recipe corresponds to the upcoming data states r + 2, r + 3, r + 4, . . . , r + n, prepared according to the playbook (Fig. 2 c).
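The bottleneck argument can be written as a simple bound: the share of one data state an attacker can copy before it expires. A minimal sketch with hypothetical numbers (state size, link speed, and state lifetime are not taken from the text); it assumes the attacker's only constraint is the bandwidth of a single link and ignores compression or several compromised nodes working in parallel.

```python
def max_exfiltrated_fraction(state_bytes: float, link_bytes_per_s: float,
                             state_lifetime_s: float) -> float:
    """Upper bound on the share of one data state an attacker can copy
    before that state expires, limited only by link bandwidth."""
    return min(1.0, link_bytes_per_s * state_lifetime_s / state_bytes)

# Hypothetical numbers: a 6 GB state, a 100 Mbit/s link, 60 s per state.
fraction = max_exfiltrated_fraction(6e9, 100e6 / 8, 60)
print(f"{fraction:.1%} of a state per lifetime")  # 12.5%
```

This is the computer analogue of the bottle that is steady for 1 minute while draining it takes much longer: as long as the bound stays well below 1, the attacker only ever holds fragments of states that are about to expire.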

C. PUBLIC/PRIVATE KEY ENCRYPTION
Why not simply use public/private key encryption to protect manufacturing files? This approach is unfortunately prone to attacks in a manner similar to DRM. If a manufacturing file is encrypted with a static key, and the file is transferred and collected by the attacker, then decrypting it is only a matter of time.
One attack could use software such as the network security research tool Fiddler [18], a web debugging proxy. Fiddler can intercept public-key-encrypted traffic, e.g., HTTPS traffic: when installed on a machine, it acts as a man-in-the-middle proxy with a locally trusted root certificate, so it can decrypt all HTTPS communication to and from that computer. It is relatively trivial to use an approach like this to decrypt the files being transferred, without even compromising the software receiving them. To compromise our solution, an attacker using Fiddler would need to understand the in-memory live matrix data structure and how it is being calculated, and only then could potentially perform an attack. This scenario is much more complicated to execute than an attack on a public-key encrypted file transferred over HTTPS.

III. RELATED WORK
In this section, we present work that we consider to be close to the requirements described above and categorize relevant papers into six subcategories for a more systematic discussion. We start with some general considerations of cloud security, and then go more deeply into specific solutions, like point-to-point and point-to-multipoint secured communication, cloud secured storage, DRM, video streaming, and 3D streaming.

A. CLOUD SECURITY RISKS, REQUIREMENTS, AND MITIGATION
In [19], Brunette et al. provide a comprehensive analysis of possible issues in cloud security and how to mitigate them. They present a solid approach to assessing existing cloud applications and provide a requirements base for the design of secure cloud solutions. That work provides notable recommendations. However, from our perspective, a next level, an integral solution, is necessary. To achieve stronger security for cloud storage and file transfer, we need a change in philosophy and a new paradigm, live matrix, which we describe in Section IV-C.

B. POINT-TO-POINT AND POINT-TO-MULTIPOINT SECURED COMMUNICATION
We examined related research on peer-to-peer, point-to-point, and point-to-multipoint communication. First, most such solutions use lower layers of the OSI model, mostly layer 3, the network layer. This improves the speed and throughput of the communication. At the same time, it makes most of the protocols proprietary and exotic, which may make them hard to implement widely for AM machines. In contrast, the solution we propose in this paper is network-layer and protocol agnostic, as the only information transferred is cryptographic hashes. Our solution could nevertheless benefit from a lower OSI layer, and streaming hashes over a lower layer is a topic worthy of future research and experiments.
Second, the main efforts in the literature focus on resolving peers and on finding and re-routing when a peer is disconnected. These mechanisms can complement the solution described in this paper. Many approaches to point-to-point and point-to-multipoint communication security employ basic private/public key encryption, which does not prevent the exposure of intellectual property.
Mastorakis [20] and Mastorakis et al. [21] discuss peer-to-peer file sharing application designs and implementations that run on top of Named Data Networking (NDN). Security is built into the NDN architecture, which requires cryptographically signing every packet in the network, and NDN distributes data encryption keys as encrypted NDN data. Because it implements security at the protocol level, NDN offers good protection against negligence, in contrast to TCP/IP, where applications are responsible for security. Although NDN is considered by some to be the future of the Internet [22], it is still a work in progress and not yet ready for production-grade deployment.

C. CLOUD SECURED FILE STORAGE AND STREAMING
In their cryptographic protocol [23], Jaatun et al. present an approach that is similar to ours. They distribute file segments across a redundant array of independent net-storages in the computing cloud. The main thrust of their solution is the distribution of data across different cloud providers. Thus, the individual data deposits do not expose enough information about the owner and the file to make them vulnerable. In addition, the data must be reassembled in order to return the file to the user. In our approach, we similarly distribute file parts to many machines in the cloud; however, we do not set a specific constraint on the form and number of cloud providers: our approach can utilize physical computing machines, virtual machines, Docker containers from one or several providers, etc.
Miller et al. [24] propose several robust security schemes for distributed file systems. They split files into blocks and encrypt each block with asymmetric keys. Similar to [24], we split a file into segments and encrypt each segment with its own key. But we go beyond this and propose continuous re-encryption of file segments with constantly changing keys. Moreover, we may constantly re-encrypt the symmetric keys with which the data segments are encrypted. In our approach, re-encryption happens constantly on all cloud nodes, within a preset file, computational, or cost limit.
In [25], Ateniese et al. describe improved proxy re-encryption schemes for keys and apply them to secured distributed storage. We apply a similar approach in our solution, but to file segments, and not just keys. Furthermore, we re-encrypt continuously, regardless of reads from and writes to storage. Cloud computing infrastructure prices drop each year; thus, such a re-encryption approach is feasible for use with millions of files.
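Continuous per-segment re-encryption with constantly changing keys can be sketched as follows. This is a hedged toy, not the authors' scheme and not proxy re-encryption in the sense of [25]: it uses SHAKE-256 from Python's `hashlib` as a keystream generator (chosen only because it is in the standard library), with a fresh random 32-byte key per segment per round. A real deployment would use a vetted authenticated cipher, and with this construction a key must never be reused for different data.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher: SHAKE-256 as an extendable-output keystream.
    return hashlib.shake_256(key).digest(length)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def reencrypt(segment_ct: bytes, old_key: bytes, new_key: bytes) -> bytes:
    """Strip the old keystream and apply a fresh one in a single XOR pass."""
    n = len(segment_ct)
    return xor(segment_ct, xor(keystream(old_key, n), keystream(new_key, n)))

# Split a hypothetical machine-code file into 16-byte segments, one key each.
data = b"G1 X10 Y10 E0.5\nG1 X20 Y10 E1.0\nG1 X20 Y20 E1.5\n"
segments = [data[i:i + 16] for i in range(0, len(data), 16)]
keys = [secrets.token_bytes(32) for _ in segments]
cts = [xor(s, keystream(k, len(s))) for s, k in zip(segments, keys)]

for _ in range(3):  # re-encryption rounds, independent of reads and writes
    new_keys = [secrets.token_bytes(32) for _ in segments]
    cts = [reencrypt(c, k, nk) for c, k, nk in zip(cts, keys, new_keys)]
    keys = new_keys  # the previous round's keys are discarded

recovered = b"".join(xor(c, keystream(k, len(c))) for c, k in zip(cts, keys))
assert recovered == data
```

The plaintext never appears as an intermediate value during a round, although any node holding both the old and new keys could still compute it.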

D. DIGITAL RIGHTS MANAGEMENT
There are many practical DRM-like approaches that are widely used in cloud storage and transfer. These include ECFS [26] and others mentioned in the same paper. In DRM, a file is usually encrypted using a symmetric or asymmetric key or a key combination before it is stored or transferred. In order to access the file, the data consumer needs the key. When an attacker obtains the key by, for example, buying the protected content once, brute force, or social engineering, the file can be used or redistributed indefinitely. DRM methods are usually lightweight and can be functional without any need for intensive cloud computing power. From our perspective, DRM methods are too vulnerable by their nature (Sec. II-C).

E. VIDEO STREAMING
Numerous existing streaming approaches [27]-[31] work efficiently and consistently for video and music. Even though some of the protocols have consistency checks, they are not expected to deliver every single byte; insignificant data loss or delay caused by network problems is expected. However, this could be an issue for sensitive data, like CAD designs. For example, in the case of streaming designs to automatic manufacturing machines such as 3D printers or CNC mills, data transfer should be consistent and lossless: loss of a single byte while streaming is unacceptable, as this can lead to an AM machine malfunction or a defective product. At the same time, the streaming should be highly secured, which is not usually a requirement for media streaming protocols. In this paper, we show how to securely stream encrypted file segments directly from a highly secure distributed file storage.
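Lossless delivery of this kind is usually enforced with per-segment integrity checks. A minimal sketch, assuming each segment is accompanied by its SHA-256 digest and that the digests reach the consumer over an authenticated channel (digests alone detect accidental loss or corruption, not tampering by someone who can also replace the digests). The segment contents are hypothetical G-code lines.

```python
import hashlib

def digest_segments(segments):
    """Sender side: SHA-256 digest for each segment, sent alongside it."""
    return [hashlib.sha256(s).hexdigest() for s in segments]

def first_bad_segment(received, digests):
    """Receiver side: index of the first missing or corrupted segment,
    or None if every segment arrived intact."""
    for i, expected in enumerate(digests):
        if i >= len(received) or hashlib.sha256(received[i]).hexdigest() != expected:
            return i
    return None

layers = [b"G1 X0 Y0\n", b"G1 X5 Y0\n", b"G1 X5 Y5\n"]
digests = digest_segments(layers)
damaged = [layers[0], layers[1][:-1], layers[2]]  # one byte lost in segment 1
print(first_bad_segment(layers, digests))   # None
print(first_bad_segment(damaged, digests))  # 1
```

A consumer would stop feeding the machine and re-request the flagged segment instead of executing incomplete instructions.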

F. 3D MODEL STREAMING
In [32], Lin et al. describe a method to encode 3D models into a JPEG stream in order to transfer 3D designs. However, the solution is not comprehensive and has clear limitations.
In prior research [33], we theoretically described live matrix as a paradigm applied to secured 3D content delivery.
Our prior work is purely theoretical, and so lacks technical details and a real implementation of the method. This paper's contribution is to extend the initial idea with the necessary details for implementation and to technically broaden it to any type of secure file storage and transfer. Furthermore, we describe a threat model and conduct a thorough security analysis. It is worth mentioning that we eliminate the transcoding of files for streaming introduced in a prior work [5], [34].
In previous work [5], [34], we explained in detail the necessity for secured streaming of 3D files and discussed methods to enforce 3D file copyrights. Our previous approach targeted a small niche case: securing 3D design transfer to 3D printers. That solution is very machine code-centric and lacks tight coupling with the secured storage. Furthermore, it is vulnerable at the point of extracting a 3D design from the storage and re-encoding it for streaming. In the current paper, we propose a much more secure and consistent end-to-end method to store and stream files, regardless of file type, and without the need to re-encode the file for streaming.

IV. PROPOSED APPROACH
Relying on the principles and paradigms described below, we describe a working solution for highly secure distributed file storage and transfer.

For the cryptosystem [35]

$$D_d(E_e(m)) = m, \qquad (1)$$

where E is an encryption function, e is an encryption key, D is a decryption function, d is a decryption key, and m is a message: if d = e, we have symmetric encryption, and if d ≠ e, we have a public-key (asymmetric-key) cryptosystem. The main feature of such a cryptosystem is that knowledge of the static decryption key alone is sufficient to decrypt the message. For an unkeyed cryptographic hash function H that is collision, second-preimage, and preimage resistant [35], there is no such inverse function $H^{-1}$ and no key d or e with which to decrypt the hash h and get the message m back:

$$h = H(m), \qquad \nexists\, H^{-1}: \; H^{-1}(h) = m.$$

In other words, a key for a hash does not exist. The only way to retrieve the original message is to hash all possible candidates and compare the hashes one by one. For example, if we know that the original message consists of five distinct symbols from the 256-symbol (8-bit extended) ASCII table [36], then, given a strong cryptographic hash function [37], the only way to obtain the original message is to look its hash up in a precomputed table, the so-called brute-force method [38]. To do so, we would need to create a hash table with ${}^{256}P_5 = 256!/251! \approx 1.06 \times 10^{12}$ elements, roughly a trillion, and then look up the original message by its hash. This makes a brute-force attack computationally expensive, requiring substantial computing power.
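The table size and the brute-force procedure described above can be illustrated with a short Python sketch. SHA3-256 from `hashlib` stands in for a generic strong hash function, and the 26-letter alphabet and the message `cad` are toy assumptions chosen so the search finishes instantly. The enumeration uses permutations (distinct symbols), matching the ${}^{256}P_5$ count.

```python
import hashlib
import itertools
import math
from typing import Optional

# Table size for messages of 5 distinct symbols drawn from 256 byte values:
# 256!/251! = 1,057,145,886,720, i.e., roughly one trillion entries.
TABLE_SIZE = math.perm(256, 5)


def brute_force(target: bytes, alphabet: bytes, length: int) -> Optional[bytes]:
    """Recover a message of `length` distinct symbols by hashing every candidate."""
    for candidate in itertools.permutations(alphabet, length):
        message = bytes(candidate)  # tuple of ints -> bytes
        if hashlib.sha3_256(message).digest() == target:
            return message
    return None  # no candidate hashes to the target


# Toy run: 26P3 = 15,600 candidates, so this finishes almost instantly.
target = hashlib.sha3_256(b"cad").digest()
print(brute_force(target, b"abcdefghijklmnopqrstuvwxyz", 3))  # prints b'cad'
```

The same loop over ${}^{256}P_5$ candidates would need about a trillion hash evaluations, which is the cost the approach relies on.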
Our solution is based on the complexity of retrieving the original message from its hash. For the method to work, the complexity of the potential message set must be kept within a certain threshold, so that just enough computing power is available to perform the required calculations within each time frame.
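The threshold can be made concrete with a small calculation. The sketch below is ours, not the authors' procedure: it assumes the candidate set has ${}^{|A|}P_m$ members (permutations of m distinct symbols), one hash per candidate, a constant hash rate, and a hypothetical 60-second period. The hash rates are the ones reported in Section VII for the 8-GPU server and the laptop receiver.

```python
import math


def max_segment_length(alphabet_size: int, hash_rate: float, period_s: float) -> int:
    """Largest segment length m whose candidate set can be hashed within one period.

    Assumes alphabet_size P m candidates, one hash each, at a constant
    hash_rate (hashes per second).
    """
    m = 0
    while m < alphabet_size and math.perm(alphabet_size, m + 1) / hash_rate <= period_s:
        m += 1
    return m


# Hypothetical 60 s period; hash rates taken from Section VII.
print(max_segment_length(256, 248e6, 60))  # 4  (8-GPU server)
print(max_segment_length(256, 18e6, 60))   # 3  (laptop receiver)
```

Because sender and receiver must build the same table in each period, the slower side bounds the choice of m.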
The solution relies on logic similar to that behind RSA SecurID tokens [39]. In that case, the same function with the same cryptographic seed runs both on the RSA server and on the token (a small battery-powered piece of hardware) in the user's pocket. To log in to the system, the user must enter a username, a password, and the code from the RSA token. The code on the RSA token expires every minute, at which point a new code is generated and shown on the screen; once a code has expired, there is no way to reuse it. The proposed solution does something similar by recalculating a hash table and parts of the file on a regular basis, for example, every minute. After a minute, another hash table is calculated to accommodate file parts; the previous hash table expires and is deleted from memory. The process repeats continuously.
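The shared time-based behavior can be sketched in the style of TOTP (RFC 6238): both sides derive the same value from a shared seed and the index of the current time window. SecurID's own algorithm is proprietary, so HMAC-SHA-256 here is our stand-in, and the seed, period, and code length are illustrative assumptions.

```python
import hashlib
import hmac

PERIOD_S = 60  # assumed code lifetime, as in the one-minute example


def time_window(unix_time: float, period_s: int = PERIOD_S) -> int:
    """Integer index of the time window that contains unix_time."""
    return int(unix_time // period_s)


def window_code(seed: bytes, unix_time: float, period_s: int = PERIOD_S) -> str:
    """Short code derived from a shared seed and the current window (TOTP-style)."""
    t = time_window(unix_time, period_s)
    mac = hmac.new(seed, t.to_bytes(8, "big"), hashlib.sha256).digest()
    return mac.hex()[:8]  # 8 hex digits, for display only


seed = b"shared-seed"  # hypothetical secret provisioned to both server and token
# Server and token clocks 30 s apart but inside the same window agree.
print(window_code(seed, 1_700_000_000) == window_code(seed, 1_700_000_030))  # True
```

Once the window index advances, the derived value changes, which is the property the live matrix borrows for its hash tables.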
In an abstract way, the solution works as follows:
1) A is the (finite) set of symbols from the ASCII table;
2) S is the (finite) set of file segments;
3) each file segment $s \in S$ has a fixed length of m bytes;
4) t is a time variable and k is a cryptographic salt;
5) G is the (finite) set of permutations of members of A with sample size m, so that $|G| = {}^{|A|}P_m$;
6) Sender and receiver side: for each member $g_n$ of G, together with time t and salt k, we calculate a corresponding hash $h_{g_n,t,k} = H(g_n, t, k)$ using the hash function H. The hash is stored in a hash table $T_t$ along with the original member $g_n$.
7) Sender and receiver side: when time is incremented, table $T_t$ expires at the moment $t \leftarrow t + \Delta t$; Step 6 is repeated, and a new table $T_{t+1}$ is calculated, so that each member $g_n$ is represented by a new hash $h_{g_n,t+1,k}$.
8) Sender side: for each file segment s, the sender takes the hash of s from the current table $T_t$ and transmits it to the receiver as a hint.
9) Receiver side: when a hint h arrives, the receiver challenges it. The receiver performs a lookup against its local version of table $T_t$ using function L. If such an element is found, L returns a file segment s; otherwise, the value is undefined:
$$L(h, T_t) = \begin{cases} s, & \text{if } (h, s) \in T_t, \\ \text{undefined}, & \text{otherwise}. \end{cases}$$
10) The file is successfully received when all of its hints have been positively challenged against the hash tables $T_t, T_{t+1}, \ldots, T_{t+n}$.
VOLUME 8, 2020
In step 6, on the AM machine side, the same hash table with the same potential elements of set G must be generated in advance, taking into account exactly the same timing and salt (just as RSA SecurID tokens run the same time-based function on the server and on the hardware token).
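The steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: SHA3-256 from `hashlib` stands in for H (it is the standardized SHA-3 variant, which pads differently from the original Keccak-256 used in Section VII), the hash input is the concatenation g || t || k, m = 2 bytes so that each table has only 65,536 rows, and G is enumerated with `itertools.product` (all m-byte strings, repetition allowed) rather than with permutations, so that segments with repeated bytes can also be recovered. The sender computes each hint directly, which yields the same value as looking s up in $T_t$.

```python
import hashlib
import itertools
from typing import Dict, List, Optional

M = 2                  # segment length m in bytes (assumed; tiny so T_t stays small)
ALPHABET = range(256)  # A: every byte value


def hint(segment: bytes, t: int, k: bytes) -> bytes:
    """h_{g,t,k} = H(g || t || k); SHA3-256 stands in for Keccak-256."""
    return hashlib.sha3_256(segment + t.to_bytes(8, "big") + k).digest()


def build_table(t: int, k: bytes) -> Dict[bytes, bytes]:
    """Step 6: hash every candidate g for window t and salt k; map hash -> g."""
    table = {}
    for combo in itertools.product(ALPHABET, repeat=M):
        g = bytes(combo)
        table[hint(g, t, k)] = g
    return table


def send(data: bytes, t: int, k: bytes) -> List[bytes]:
    """Step 8: one hint per m-byte segment (same value as looking s up in T_t)."""
    if len(data) % M:
        raise ValueError("this sketch assumes len(data) is a multiple of M")
    return [hint(data[i:i + M], t, k) for i in range(0, len(data), M)]


def receive(hints: List[bytes], table: Dict[bytes, bytes]) -> List[Optional[bytes]]:
    """Steps 9-10: challenge each hint; None stands for 'undefined'."""
    return [table.get(h) for h in hints]


k = b"license-1234"      # hypothetical one-time license code used as salt
t = 28_333_333           # window index agreed by both sides
data = b"G1 X10 Y20 \n"  # 12 bytes of toy machine code
hints = send(data, t, k)
print(b"".join(receive(hints, build_table(t, k))) == data)  # True
expired = receive(hints, build_table(t + 1, k))  # next window: lookups are expected to fail
```

Only the 32-byte hints travel over the wire; once the receiver has discarded $T_t$, a recorded hint no longer maps to any segment.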
In step 9, when a hint arrives on the AM machine side, we look it up in the current hash table and retrieve (or fail to retrieve) the corresponding file segment. We recreate the file from the successfully found segments. This lookup is not a decryption function in the sense of (1): there is no key as such in terms of that equation, and no static bytes of the file are transferred. In the case of TLS/SSL, the actual encrypted bytes of the file are transferred; in our solution, only hints, which expire, are transferred. A minute later, it is no longer possible to obtain a real byte of the file from such a hint, just as an expired RSA SecurID code can no longer be used.
The next three sub-subsections explain important considerations regarding our approach.

1) SENDER AND RECEIVER SYNCHRONIZATION
Our approach is agnostic to the synchronization method. Time t could be logical or physical time; in our experiments, we use physical time (UTC). Distributed machines can synchronize their clocks against time servers. A minor clock difference would not usually put the sender and receiver out of sync unless the difference is larger than the live matrix state expiration time. For high-latency networks, or when clocks are slightly out of sync, the live matrix expiration time could be increased so that a previous live matrix state is always available. At any moment, two live matrices are available: the current one and the previous one.
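Keeping the current and the previous state can be sketched with a bounded queue. The class and method names are hypothetical, SHA3-256 stands in for the hash function, and one-byte segments keep each table at 256 rows; the point is only that a hint one window late is still accepted while a hint two windows late is not.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, Optional


def hint(segment: bytes, t: int, k: bytes) -> bytes:
    return hashlib.sha3_256(segment + t.to_bytes(8, "big") + k).digest()


def build_table(t: int, k: bytes) -> Dict[bytes, bytes]:
    # Hypothetical one-byte segments keep the table at 256 rows.
    return {hint(bytes([b]), t, k): bytes([b]) for b in range(256)}


class Receiver:
    """Keeps the current and the previous live-matrix state; older states are dropped."""

    def __init__(self, k: bytes):
        self.k = k
        self.tables: Deque[Dict[bytes, bytes]] = deque(maxlen=2)

    def advance(self, t: int) -> None:
        self.tables.append(build_table(t, self.k))  # evicts the oldest table

    def challenge(self, h: bytes) -> Optional[bytes]:
        for table in reversed(self.tables):  # try the current state first, then the previous one
            if h in table:
                return table[h]
        return None


rx = Receiver(b"salt")
rx.advance(10)
rx.advance(11)
late = hint(b"A", 10, b"salt")  # hint from a sender whose clock lags one window
print(rx.challenge(late))       # b'A': still accepted from the previous state
rx.advance(12)
print(rx.challenge(late))       # None: two windows late, the state has expired
```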

2) COMPUTATIONAL AND BANDWIDTH OVERHEAD
In general, cipher text is at least as long as the corresponding plain text, i.e., encryption has a non-negative length overhead.
Our solution has a larger positive overhead than well-known stream ciphers [44]-[50], block ciphers [51]-[54], and plain text in terms of computation and bandwidth.
In stream ciphers [55], the cipher text commonly has an insignificant positive length overhead compared to the plain text. A key is used to generate a keystream, which is then combined with the plain text using an XOR operation to produce the cipher text. This does not significantly affect the amount of information transferred or the computation needed, as XOR is computationally inexpensive.
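The XOR construction can be illustrated with a toy keystream. Here SHAKE-128, an extendable-output function from `hashlib`, generates the keystream from a key and a nonce; this is an illustration of the principle only, not a vetted cipher (production systems use ciphers such as ChaCha20 or AES in CTR mode), and the key and nonce values are placeholders.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """n keystream bytes from SHAKE-128 (illustration only, not a vetted cipher)."""
    return hashlib.shake_128(key + nonce).digest(n)


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts and decrypts alike, because XOR with the same keystream is self-inverse."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


plaintext = b"G1 X10 Y20"
ciphertext = xor_crypt(b"k" * 16, b"nonce-01", plaintext)
print(len(ciphertext) == len(plaintext))                           # True: no length overhead
print(xor_crypt(b"k" * 16, b"nonce-01", ciphertext) == plaintext)  # True
```

Apart from a nonce, if one is transmitted, the cipher text is exactly as long as the plain text.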
In block ciphers [56], padding is frequently added to the plain text to make its length a multiple of the block size, increasing the bandwidth overhead. Block ciphers also add computational overhead for encrypting each block, compared to plain text.
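PKCS#7 padding (RFC 5652) is one common scheme and shows where the block-cipher bandwidth overhead comes from. The 16-byte default corresponds to the AES block size; the padding functions below are a minimal sketch and do not perform any encryption themselves.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append n bytes of value n so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes) -> bytes:
    n = padded[-1]
    if n == 0 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


print(len(pkcs7_pad(b"abc")))     # 16: 13 padding bytes added
print(len(pkcs7_pad(b"x" * 16)))  # 32: a full extra block when already aligned
```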
Our solution has a parametric trade-off between computational complexity and security. By increasing the live matrix recalculation frequency, we increase the security level as well as the calculation complexity.
The bandwidth overhead of our solution depends on the selected live matrix size and cryptographic hash function. The closer the segment length m is to the output length of the hash function in bytes, the lower the bandwidth overhead. The hash function output length should be close to m, but not smaller than m; otherwise, by the pigeonhole principle, distinct segments would be forced to share hints. Because each m-byte segment is replaced by a hint of $n_H$ bytes, where $n_H$ is the output length of H, the relative bandwidth overhead can be calculated as

$$O = \frac{n_H - m}{m}.$$

The computational overhead of our approach is lower than that of block ciphers and, depending on the stream cipher algorithm, can be even smaller than that of stream ciphers. Hash function calculation is less expensive than AES and DES [57], [58]. Another possibility, which is highly application-dependent (e.g., not very practical for IoT and self-driving cars), is to scale hash calculation using GPU and ASIC implementations of hash functions [59], [60]. However, block and stream encryption is difficult to implement using a GPU- and ASIC-based approach.
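A quick calculation shows what the overhead means in bytes. The sketch assumes that each m-byte segment is replaced by one hint of $n_H$ bytes, that $n_H = 32$ (the Keccak-256 output length) and m = 4 (the segment size used in Section VII), and it ignores framing and protocol headers.

```python
def relative_overhead(m: int, hash_len: int) -> float:
    """(n_H - m) / m: extra bytes sent per payload byte."""
    return (hash_len - m) / m


def transferred_bytes(file_size: int, m: int, hash_len: int) -> int:
    """Bytes on the wire: one hash_len-byte hint per (possibly partial) m-byte segment."""
    n_segments = (file_size + m - 1) // m  # ceiling division
    return n_segments * hash_len


print(relative_overhead(4, 32))              # 7.0, i.e., 700 %
print(transferred_bytes(20_000_000, 4, 32))  # 160000000 bytes for a 20 MB file
print(relative_overhead(32, 32))             # 0.0 when m equals the hash length
```

This is consistent with the later observation that overhead grows as the segment becomes small relative to the hash output.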

3) CRYPTOGRAPHIC SALT
Salt k is not static. It changes with time, and could be an access code for a one-time manufacturing license, a PIN code, or part of a private key. Further, parameters other than k affect the setup; even if k is compromised, an attacker would still need to figure out the algorithmic setup and parameters.

B. PHYSICAL LIMITATIONS OF THE COMPUTATIONAL PROCESS
Total security does not exist. Breaking into any system is just a matter of the time and money required to exploit its weaknesses. Cloud computing processes huge amounts of data in parallel, a capability that can be used against attacks. However, the storage, network, and computing power of the cloud have physical limits on writing and reading files, transferring files over the network, calculating hashes, and encrypting or decrypting information. The philosophy behind our solution is to set an attacker against a computing cloud and leverage the physical limitations of the computational process [6] as security controls. Similar to proof-of-work blockchain consensus algorithms [61], we parameterize the solution based on the amount of available infrastructure. The more computational power used, the harder and more expensive it becomes to carry out an attack.
Henceforth, we consider a hacker to be a human individual, a group of hackers with special tools, or an automated script or bot with sufficient computing power. A hacker is unlikely to know all parameters and exact details of our secured cloud implementation, and it would take a considerable amount of time to find and exploit its weaknesses; such attempts can be countered with detective cybersecurity strategies. If the hacker is equipped with comparable computing power, then the physical limitations of the computational process come into play.
There is physical latency at all levels of hardware and software during computational processes. To reduce latency, L1 and L2 cache memory is located very close to the processor [62]. The more distant a resource is from the processor, the higher the latency. For example, the network interface is usually the main bottleneck for distributed systems [63], and in Linux, the operating system limits on open ports and I/O descriptors can also be a bottleneck [64]. Our approach is to turn these limitations and bottlenecks into a defense strategy.

C. LIVE MATRIX
A live matrix is a multidimensional data structure whose data constantly change state. The state may change millions of times per time frame $t_n$, depending on the computing resources allocated. We refer to the states during time frames $t_1, t_2, \ldots, t_n$ as revisions $r_1, r_2, \ldots, r_n$. The data in a live matrix are recalculated between revisions. Ideally, the state of the data in such a structure changes faster than the data can be extracted from it: the period $t_n$ during which the live matrix holds one state is much smaller than the time $t_e$ needed to extract and store a single revision $r_i$ of the matrix ($t_n \ll t_e$), so the matrix is a constantly moving target (Fig. 3). The data in the matrix are consistent only within one revision and become obsolete between revisions; thus, timing is crucial. Any extracted copy of the whole live matrix structure or of its file segments represents an inconsistent revision $r_{incons}$ and quickly becomes obsolete. The matrix keeps multiple file segments, which reside in many locations of the matrix structure. These are encrypted and/or hashed using standard algorithms, e.g., AES-256, 3DES, SHA-2, or SHA-3, and located in the matrix at a certain index.
Taking into account the nature of the data to be hashed, i.e., the instructions for controlling a manufacturing machine, a self-driving car, an IoT device, or any other data, a special-purpose hash function can be designed. Matrix cells that are not occupied by useful data can be populated with fake data, i.e., random data that resemble the original file.
Distributed cloud storage can consist of one or multiple nodes α, β, γ, etc. Each node may contain one or multiple multidimensional matrices $a_{1,1} \ldots a_{m,n}$, $b_{1,1} \ldots b_{m,n}$, etc. (e.g., Fig. 4). The density of each matrix can be set from 0% to 100%. For example, if the density is 10%, then only 10% of the values are real file segments; the rest are synthetically generated data or information very similar to the actual data. When a file is streamed from one location to another, the receiving location should also run a live matrix initialized with the matching encryption seed (based on time or other factors) and with a matching algorithmic setup. The stream comprises hashes of the file segments, called hints. The information transferred in the stream is not the encrypted file parts, and there is no key to decrypt the streamed hints (unless a single, frozen state of the live matrix is considered a key). Because the actual bytes of the file are never transferred, there is no key as such and no function to decrypt hints, only a lookup against the live matrix on the receiving end. We call this key-less, byte-less information transfer (Fig. 5).
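The condition $t_n \ll t_e$ can be checked with rough numbers. This sketch assumes 4-byte cells (the segment size in Section VII), the 14 revisions per second reported there for a $256^3$ matrix, and a hypothetical 1 Gbit/s link; it counts only transfer time, so the real $t_e$ would be larger once storage I/O is included.

```python
def revision_period_s(revisions_per_s: float) -> float:
    """Lifetime t_n of one revision."""
    return 1.0 / revisions_per_s


def extraction_time_s(side: int, cell_bytes: int, link_bits_per_s: float) -> float:
    """Lower bound on t_e: time just to move one side^3 revision over the link."""
    return side ** 3 * cell_bytes * 8 / link_bits_per_s


t_n = revision_period_s(14)           # 14 revisions/s for a 256^3 matrix (Section VII)
t_e = extraction_time_s(256, 4, 1e9)  # 4-byte cells, hypothetical 1 Gbit/s link
print(round(t_n, 3), round(t_e, 3))   # 0.071 0.537
```

Under these assumptions, a full copy takes roughly seven revision periods, so any snapshot mixes several revisions.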
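One way to picture a live matrix revision with decoy data is sketched below. The paper does not specify how segments are placed, so the placement scheme is our assumption: a hypothetical playbook derives a distinct cell index for each segment from the seed, the revision number, and the segment number with SHA3-256, re-hashing with a counter on collisions. Cells are addressed by a flat index; the 3D coordinates would be (idx // side², (idx // side) % side, idx % side). Random bytes from `os.urandom` serve as decoys here, whereas the text suggests decoys that resemble the original file.

```python
import hashlib
import os
from typing import List


def placement(seed: bytes, revision: int, n_segments: int, side: int) -> List[int]:
    """Hypothetical playbook: a distinct flat cell index per segment for one revision."""
    cells = side ** 3
    if n_segments > cells:
        raise ValueError("matrix too small for the file")
    used = set()
    indices = []
    for i in range(n_segments):
        ctr = 0
        while True:  # re-hash with a counter until an unused cell is found
            d = hashlib.sha3_256(seed + revision.to_bytes(8, "big")
                                 + i.to_bytes(8, "big") + ctr.to_bytes(4, "big")).digest()
            idx = int.from_bytes(d[:8], "big") % cells
            if idx not in used:
                break
            ctr += 1
        used.add(idx)
        indices.append(idx)
    return indices


def build_revision(segments: List[bytes], seed: bytes, revision: int, side: int, m: int) -> List[bytes]:
    """Fill all cells with random decoys, then place the real segments per the playbook."""
    matrix = [os.urandom(m) for _ in range(side ** 3)]
    for seg, idx in zip(segments, placement(seed, revision, len(segments), side)):
        matrix[idx] = seg
    return matrix


def extract(matrix: List[bytes], seed: bytes, revision: int, n_segments: int, side: int) -> List[bytes]:
    return [matrix[idx] for idx in placement(seed, revision, n_segments, side)]


m, side, seed = 4, 8, b"matrix-seed"  # 8^3 = 512 cells of 4 bytes
data = bytes(range(40))                # 10 segments, density 10/512 (about 2 %)
segments = [data[i:i + m] for i in range(0, len(data), m)]
matrix = build_revision(segments, seed, revision=1, side=side, m=m)
print(b"".join(extract(matrix, seed, 1, len(segments), side)) == data)  # True
```

Without the seed and revision number, the real segments are indistinguishable in position from the decoys, and the positions change with every revision.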
When information is transferred to or from a non-cloud device (usually a data stream-consuming device with limited computing power, like a laptop, a manufacturing machine, a smart car, etc.) there is no physical possibility to keep more than a couple of revisions of live matrices on that side. It is thus impossible for such a device to decrypt the stream even thirty seconds later. This makes it impossible to ''replay'' the stream if an attacker records a fragment or even the whole stream.

D. PROACTIVE AND PASSIVE CLOUD NODES
A proactive cloud node is a cloud node or group of nodes that is autonomous to a certain extent. Proactive nodes do not expose any inbound TCP/UDP ports or APIs over a local or public network. Proactive nodes are the initiators of any communication between proactive and passive cloud nodes. Passive nodes cannot initiate the communication with proactive nodes; they need to wait for a request from one of the proactive nodes. Passive nodes only reply over the local network to requests incoming from proactive nodes.
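The one-way initiation rule can be sketched in-process. The class names, the `poll` request, and the job string are hypothetical, and method calls stand in for network traffic; in a deployment, the proactive node would open outbound connections to passive nodes over the local network and would have no listening socket of its own.

```python
from collections import deque
from typing import Deque, List, Optional


class PassiveNode:
    """Never opens a connection; it only answers requests it receives."""

    def __init__(self) -> None:
        self._outbox: Deque[str] = deque()  # results waiting to be collected

    def enqueue_result(self, item: str) -> None:
        self._outbox.append(item)

    def handle(self, request: str) -> Optional[str]:
        if request == "poll" and self._outbox:
            return self._outbox.popleft()
        return None


class ProactiveNode:
    """Has no inbound interface; it initiates every exchange with passive nodes."""

    def __init__(self, peers: List[PassiveNode]) -> None:
        self.peers = peers
        self.received: List[str] = []

    def poll_once(self) -> None:
        for peer in self.peers:
            reply = peer.handle("poll")
            if reply is not None:
                self.received.append(reply)


job_runner = PassiveNode()
job_runner.enqueue_result("stream job 42 finished")
storage = ProactiveNode([job_runner])
storage.poll_once()
print(storage.received)  # ['stream job 42 finished']
```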
Proactive nodes are used to store file segments, users' public keys, encryption keys, and streaming playbooks. Passive nodes store metadata and run jobs, e.g., streaming data outside the cloud or delivering data at the right time to the right place, such as a manufacturing machine or a self-driving car. The differentiation between proactive and passive cloud node types supports an important data security principle: segmentation. Moreover, proactive cloud nodes rely on detective control mechanisms [65] to analyze activity, events, logs, the history of node communication, etc. Proactive cloud nodes can implement detective controls ranging from basic methods to the most sophisticated ones, using artificial intelligence, honeypotting, intrusion detection systems, etc. [66], [67].

E. PROTOCOL
A simplified protocol is presented in Figs. 7, 8, 9, and 10. All communication between cloud nodes is encrypted, although this is not explicitly shown in the figures for the sake of clarity. Fig. 7 describes file upload by the user and secure storage of that file in the cloud. Figs. 8 and 9 describe storage maintenance over time and live matrix recalculation. There are two options to achieve the recalculation of storage: Fig. 8 depicts the use of a newly created set of keys, while Fig. 9 depicts utilization of the homomorphic properties of encryption methods. In the latter case, each file segment is recalculated by performing a homomorphic operation on it, so no additional key generation or exchange is necessary. The secured streaming protocol is depicted in Fig. 10.
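The paper does not name the homomorphic scheme used for the recalculation in Fig. 9. As one well-known example of the idea, the sketch below uses textbook ElGamal re-randomization: multiplying a ciphertext by a fresh encryption of 1 yields a new ciphertext of the same segment without generating or exchanging keys. The group parameters (the Mersenne prime $2^{127}-1$ with base 3) are a toy choice and are not secure; the implementation in Section VII used secp256k1 keys, and elliptic-curve ElGamal supports the same re-randomization.

```python
import secrets
from typing import Tuple

# Toy group, NOT secure: arithmetic modulo the Mersenne prime 2**127 - 1 with base 3.
P = 2**127 - 1
G = 3


def keygen() -> Tuple[int, int]:
    x = secrets.randbelow(P - 2) + 1  # private key in [1, P-2]
    return x, pow(G, x, P)            # (private, public)


def encrypt(pub: int, m: int) -> Tuple[int, int]:
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(pub, r, P)) % P


def rerandomize(pub: int, ct: Tuple[int, int]) -> Tuple[int, int]:
    """Multiply by a fresh encryption of 1: new ciphertext, same plaintext, no new keys."""
    s = secrets.randbelow(P - 2) + 1
    c1, c2 = ct
    return (c1 * pow(G, s, P)) % P, (c2 * pow(pub, s, P)) % P


def decrypt(priv: int, ct: Tuple[int, int]) -> int:
    c1, c2 = ct
    return (c2 * pow(pow(c1, priv, P), -1, P)) % P  # modular inverse needs Python 3.8+


priv, pub = keygen()
segment = int.from_bytes(b"G1 X", "big")  # a 4-byte file segment as an integer < P
old = encrypt(pub, segment)
new = rerandomize(pub, old)               # periodic recalculation step
print(new != old, decrypt(priv, new) == segment)  # True True
```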

V. IMPLEMENTATION
The highly secure distributed file storage and transfer solution setup (Fig. 6) consists of four types of nodes: a) The command and control node is responsible for storing command and control metadata. For example, whenever it is time to run a periodic job to re-encrypt file segments with a different set of keys, the file segments node and the keys and playbook node communicate through this node. b) The file segments node keeps the file segments in live matrices; performs scheduled or on-demand recalculation of hashes or re-encryption of file segments; analyzes the behavior of the command and control node and the distribution nodes; and makes corresponding decisions, for example, whether to support a streaming session initiated by the distribution node. Moreover, this node has controls that measure the speed of data consumption and compare it with realistic consumption rates. If the distribution node requests or consumes data faster than expected, an alarm state is triggered for a particular streaming session, or perhaps for all sessions, depending on the setup and the desired protection level. c) The keys and playbook node is responsible for secure storage of keys and playbooks. Playbooks describe the sequence of segments in the file segments node. Without the right key, it is impossible to decrypt a file segment; conversely, without the right playbook, it is virtually impossible to locate and extract the desired data from storage. Depending on the setup, the keys and playbook node can deliver the right keys at the right time to the right place (e.g., to an AM machine that has already received a secured stream from the distribution node, has recreated the file segments from hints using a locally running live matrix, and now needs to decrypt the data in the file segments to produce the part). In the alarm state, this node stops the streaming process and stops issuing keys. d) The distribution node runs content distribution jobs to transfer files to external destinations, such as other secured clouds, AM machines, or self-driving cars. This node isolates different streaming jobs, optimizes streaming speed based on the data transfer rate, and performs data delivery checks in the stream. It also participates in the authorization scheme for external clouds and stream consumption devices. The setup can be extended so that each node type is a sub-cloud of multiple machines implementing distributed live matrices (Fig. 4).
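The consumption-rate control can be sketched as a sliding-window counter. The class, its parameters, and the thresholds are hypothetical; the paper does not specify how realistic consumption rates are modeled. Once a session requests more segments within the window than a realistic consumer would, the alarm trips and the session stays stopped.

```python
import time
from collections import deque
from typing import Deque, Optional


class ConsumptionMonitor:
    """Hypothetical sliding-window check: alarm if a session requests segments too fast."""

    def __init__(self, max_segments: int, window_s: float):
        self.max_segments = max_segments
        self.window_s = window_s
        self.requests: Deque[float] = deque()  # timestamps of recent requests
        self.alarm = False

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.alarm:
            return False  # once tripped, the session stays stopped
        while self.requests and now - self.requests[0] >= self.window_s:
            self.requests.popleft()  # forget requests outside the window
        self.requests.append(now)
        if len(self.requests) > self.max_segments:
            self.alarm = True  # faster than a realistic consumer: stop streaming and key issuance
            return False
        return True


mon = ConsumptionMonitor(max_segments=3, window_s=1.0)
print([mon.allow(now) for now in (0.0, 0.2, 0.4, 0.6)])  # [True, True, True, False]
print(mon.allow(10.0))  # False: the session stays stopped after the alarm
```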

VI. SECURITY EVALUATION
Our threat modeling and security analysis are based on several well-defined threat frameworks from Behl and Behl [68], [69], Behl [70], and Saripalli and Walters [71]. The latter provides the list of ''Threat events compromising cloud security'' [71], which our distributed storage and transfer solution is intended to address. These are: a) Isolation failure: failure to effectively separate storage, memory, and routing; b) Malicious insider at cloud provider: a cloud provider's employee maliciously alters or corrupts customer data; c) Intercepting data in transit: failure in cryptographic techniques leads to data sniffing, spoofing, and man-in-the-middle attacks during transit; d) Data Leakage on Up/Down: interception of data between the customer and the cloud provider leads to leakage of data to third parties; e) Loss of encryption keys: exposure of a customer's secret keys to malicious parties.
We have evaluated the relevant threats and created a threat model, summarized, along with the corresponding mitigation, in Table 1. We have derived the most important attack vectors from our threat model and provide an analysis.
If an attacker is able to pose as an authorized user, he still cannot download the data unless he digitally signs and submits a transaction to stream data to a data consumer.
There is no single point of compromise. If an attacker is able to access one of the node types, he still cannot extract the data; an attacker needs access to at least two different node types to decrypt the data. File segments nodes and keys and playbook nodes are proactive and do not expose any TCP ports. Thus, there is no way for the attacker to log in to these nodes unless he gains access to a virtual machine or the physical hardware and scans memory to get the contents of the running application.
If an attacker is able to obtain a playbook file, it is still only one instance. The attacker will not have access to every modified instance of the playbook. Without continuous updates, the attacker will not have access to the data.
Even if an attacker captures the data stream during a streaming session, it rapidly becomes obsolete, unless the attacker has obtained a seed to start and run the live matrix. During a single streaming session, the data come from different live matrix revisions, so to decrypt the data, the corresponding live matrix states must be obtained. This can be done by compromising the server side or the data consumer side. This is still difficult; for instance, if the attacker gains access to the data consumer side, he needs to be present from the very beginning of the stream and record the low-level machine code as it is transmitted. How well this side is protected depends on the exact data consumer implementation. For example, in the case of holographic video streaming, 3D printing, and other types of AM, data that have already been consumed must be disposed of immediately after consumption. Additionally, if the attacker obtains one full unencrypted sequence of machine code, this sequence could be used only on the exact make and model of the manufacturing machine, which makes it harder to distribute and violate the copyright.
In our previous research [5], [33], [34], we assumed that data, once taken from some kind of secured storage, are decrypted and then encrypted with a different method for delivery to the data consumer. In that case, the distribution node is also a point of attack: the attacker can obtain a file or a stream on the server during re-encryption between storage and streaming. In the current approach, however, there is no need to transcode the data.
In TLS file transfer, a certain key is used to encrypt data; if an attacker obtains the key, he can decrypt the file. Such keys are often reused by the services, or changed infrequently, making them vulnerable to collision and brute-forcing over time. In our solution, the complexity to brute-force the keys increases exponentially: the file is split into thousands of segments and the live matrix is constantly recalculated.
If an attacker obtains access to the data consumption device, he can receive file parts over a long period of time. However, there is no way to get all the files from the storage: during one session, only one file can be obtained, and only over a comparatively long period of time. For example, producing a part using automated manufacturing can take days, and movies can last for hours. For an attacker acting this way, extracting data from the secured storage would be inefficient time-wise and would not allow getting all the data from the secured cloud storage. If an attacker starts to request more file parts within a shorter time frame than a certain threshold, the distribution node stops providing data. If such an attack is performed on a consuming device or a data channel, stopping the stream makes it impossible to get the rest of the file.
If an attacker carries out an attack on the secured cloud or a stream to a data consumer, secure cloud nodes collect the data and compare the metrics with those in configuration files. All abnormal activities, events, and logs can trigger an alarm state. A hacker would need to carry out a comprehensive analysis for a considerable period to figure out which changes in communication would cause an alarm state. By that time, the hacker would likely be detected, and mitigation procedures executed.

VII. PERFORMANCE EVALUATION
In our lab-level implementation, we used the distributed database Cassandra [72] to implement live matrices, Apache Spark [73] for near-real-time distributed data processing, with the Java programming language to implement operations over matrices, and Apache Kafka [74] to maintain a queue of service requests and streaming jobs.
We stored four-byte file segments, encrypted with public keys, in Cassandra column families. For public/private key generation, we used the secp256k1 elliptic curve [75]. We used the last 20 bytes of the public key to uniquely index encrypted file segments in the Cassandra column families. We used Apache Spark to re-encrypt the file segments and recalculate new indexes in Cassandra every two minutes, and then provided the updated version of the playbook to the keys and playbook node. For hashing, we used the Keccak-256 [37] hash function.
We used a local cloud of four bare-metal physical machines to run the software. Two machines (one for the file segments node and one for the distribution node) each had eight AMD Radeon RX580 GPUs with 8 GB GDDR5 memory each, an i7 CPU, 16 GB RAM, and a 128 GB SSD. The other two machines, used for the command and control node and the keys and playbook node, respectively, had an Intel Celeron processor, 4 GB RAM, and a 32 GB SSD. We set the GPUs to compute mode and flashed them with modified firmware for higher hash rates. Most GPUs produced about 31.5 Mhash/s, while a few underperforming GPUs reached only 28.5 Mhash/s. We achieved an average total hash rate of 248 Mhash/s on each of the machines equipped with eight GPUs.
First, we tested secured streaming between two cloud machines. We securely streamed a 20 MB file over the local network from the file segments node to the distribution node. We were able to recalculate hashes for the three-dimensional live matrix of $256^3$ at an average of 14 revisions per second. In another test, we were able to recalculate a bigger matrix of $4096^3$ at an average of one revision every 5 minutes. Second, we carried out a test between a cloud node and a data consumer node, for which we needed one more machine. The external stream-receiving side, intended to emulate a single user consuming the stream, was a laptop with an i7 processor, 8 GB RAM, a 256 GB SSD, and an AMD Radeon RX570 GPU with 2 GB RAM. The GPU of this machine was able to produce 18 Mhash/s. For this test, we used only one GPU on the distribution node, with timing matched to the calculation speed of the receiving machine. We were able to recalculate hashes for the three-dimensional live matrix of $256^3$ states at an average rate of one revision per second. In another test, we performed a streaming session on the bigger live matrix of $4096^3$, and we were able to calculate a new state on average every 64 minutes. The results are reflected in Table 2.
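A back-of-the-envelope check relates these revision rates to the measured hash rates. The sketch assumes one hash per matrix cell per revision and a constant hash rate, and it ignores storage and network costs; under those assumptions, the estimates land close to the reported figures.

```python
def seconds_per_revision(side: int, hash_rate: float) -> float:
    """Time to hash every cell of a side^3 live matrix once at a constant hash rate."""
    return side ** 3 / hash_rate


server, laptop = 248e6, 18e6  # hash rates reported above, in hashes per second
print(1 / seconds_per_revision(256, server))    # about 14.8 revisions/s (measured: 14)
print(seconds_per_revision(4096, server) / 60)  # about 4.6 minutes (measured: about 5)
print(seconds_per_revision(256, laptop))        # about 0.93 s (measured: about 1 per second)
print(seconds_per_revision(4096, laptop) / 60)  # about 63.6 minutes (measured: 64)
```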
We performed additional tests with different file sizes: 41 MB, 119 MB, 583 MB, and 1.1 GB. The results showed a linear dependency for overhead traffic and overhead server costs. In future research, we will seek to reduce overhead traffic.
The tests showed that overhead increases with smaller matrix sizes. This results from the change in the ratio between the matrix segment size and the hash function output length in bytes. We recommend using proven SHA cryptographic functions, even if this creates a bigger overhead. If minimizing bandwidth is important, then hash functions with a shorter output should be selected.
We also confirmed that the cloud can adapt to the computing power capacity available on the consumer's end and produce a stream that could be consumed with less computing power. With the increase in computing power needed to calculate live matrix revisions, the power needed by a hacker to try to decode the stream would increase exponentially. Our results show that catching such a stream would be an ever-moving target. Even in the case of success, the information would become obsolete very quickly, making it hard to carry out any analysis to decrypt the file being transferred.

VIII. CONCLUSION
In this paper, we described and evaluated an approach that leverages the physical limitations of the computational process into a defense strategy to make cloud file storage and transfer highly secure. The method was designed to fulfill multiple important requirements for the use cases we discussed. The data transfer is lossless, so this method will work not only for delivering machine code to manufacturing machines, but also for many other applications, including audio and video streaming. The most notable features of our approach are: a) The solution is tightly coupled with secured storage, so there is no need for re-encryption in order to stream to remote data consumers, like other clouds or AM machines; b) By its nature, this solution keeps the data in partitions, and streaming also implies partition tolerance on file transfer. The data are segmented, and there are security controls based on the physical limitations of the computational process: it is not physically possible to extract and consume all the data within a reasonable time frame; c) If multiple machines are used for each type of node, then there is no single point of failure in case of intrusion or fault; d) It may be used for peer-to-peer information transfer, though this requires the live matrix engine to be installed on the peer machines participating in the information transfer; e) The solution can send bi-directional streams; on the receiving side, the live matrix could be used for the reverse stream, for example, to transmit telemetry or interactive feedback.
In future work, we will concentrate on data storage fault tolerance mechanisms and intelligent adaptiveness to available computing power and network bandwidth.