Synthetic Cells Extract Semantic Information From Their Environment

Building on a recently reported operational definition of semantic information by Kolchinsky and Wolpert (2018), we present a research program at the intersection of synthetic biology, molecular communication, and information theory. In particular, we first describe and emphasize the role of “synthetic cells” as a new (bio)technological platform for theoretical and applied investigations; next, we simulate a smart drug delivery scenario in which synthetic cells extract semantic information from their environment (made of cancerous cells, which provide a signal molecule that triggers the production of a cytotoxic drug by the synthetic cell).

absent in the Shannon definition [1], [2]. The first approach to semantic information was made by Carnap and Bar-Hillel in 1952 [3]. Their theory aimed to calculate the amount of semantic information encoded by a sentence in a particular language. Such an approach is restricted to the study of language and cannot be applied to physical scenarios. We recently commented on the possible role of MacKay's theory of semantic information [4], [5], highlighting those approaches in which systems (agents) are embedded in situations, and their internal states and/or performance, or even their existence, depend on the information exchanged with the environment. In this view, the recent proposal put forward by Kolchinsky and Wolpert (hereinafter, KW) seems particularly relevant [6]. Here we briefly sketch a possible application of the KW proposal in synthetic biology, and in particular to bottom-up Synthetic Cells (SCs).

B. The Kolchinsky-Wolpert Approach
In 2018 KW proposed an elegant quantitative measure of the semantic information received by any physical system influenced by an external environment [6]. Following the definition given by Rovelli in [7], KW defined semantic information as the information a physical system has about its environment that is causally necessary for the system to maintain its own existence over time. As such, semantic information is the subset of syntactic information that is causally necessary for the system to preserve its state, or existence. Below, we describe the KW model and report the most relevant formulas for evaluating semantic information (see [6] for the complete mathematical description of the problem).
To develop their model, i) KW considered a system $\mathcal{X}$ with state space $X$ described by physical variables $x$, and an environment $\mathcal{Y}$ with state space $Y$ described by physical variables $y$, subject to stochastic, discrete-time, first-order Markovian dynamics (i.e., the probability of each state depends only on the previous state). For the sake of simplicity, only discrete and finite state spaces were analyzed in [6] even if, as explicitly observed there, continuous state spaces can also be considered. In their framework, KW assumed that at the initial time t = 0 a correlation exists between the system and the environment, which is described by a joint distribution $p(x_0, y_0)$. This correlation makes it possible to compute the mutual information, measured in bits, between the system and the environment

$$I_p(X_0; Y_0) = \sum_{x_0, y_0} p(x_0, y_0) \log_2 \frac{p(x_0, y_0)}{p(x_0)\, p(y_0)} \tag{1}$$

where $x_0$ and $y_0$ are the physical variables composing the state spaces $X$ and $Y$, respectively, and $p(x_0)$ and $p(y_0)$ are the marginal distributions over the states of the system and of the environment, respectively, at the initial time t = 0. In [6] it was further assumed that the system and the environment are subject to coupled (possibly stochastic) dynamics until time $t^*$, which defines the timescale of interest. Then ii), KW defined the degree of existence of the system as its ability to remain in a low-entropy state. In agreement with the principles of statistical physics, to quantify such an ability KW made use of the viability (V) function, which is defined as the negative of the Shannon entropy S of the distribution $p(x_t)$ over the states of system $\mathcal{X}$ at time t as

$$V(p(x_t)) := -S(p(x_t)) = \sum_{x_t} p(x_t) \log_2 p(x_t) \tag{2}$$

Next iii), KW found a way to scramble the correlations between $\mathcal{X}$ and $\mathcal{Y}$ in the initial configuration (t = 0) by introducing a set of counterfactual partial interventions.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Each partial intervention is induced by a particular coarse-graining function $\phi(y)$ that specifies the distinctions the system can make about the given fine-grained (non-coarse-grained) environment [8]. The role of the coarse-graining function is to map each fine-grained state to another possible state. Thus, an intervention modifies the distribution $p(x_0, y_0)$, and consequently also the initial mutual information. The intervened joint initial distribution induced by a given coarse-graining function $\phi(y)$ is given by

$$\hat{p}^{\phi}(x_0, y_0) := \hat{p}^{\phi}(x_0 \mid y_0)\, p(y_0)$$

where

$$\hat{p}^{\phi}(x_0 \mid y_0) := p(x_0 \mid \phi(y_0)) = \frac{\sum_{y_0' : \phi(y_0') = \phi(y_0)} p(x_0, y_0')}{\sum_{y_0' : \phi(y_0') = \phi(y_0)} p(y_0')}$$

defines the conditional probability of the communication channel through the environment. The sum over $y_0'$ means that the coarse-graining function may map several different states to a single one, so that the system cannot resolve the full granularity of the state space of the environment. It is worth observing that the most destructive intervention, called "fully scrambled", leaves the system and the environment fully uncorrelated, so that $\hat{p}^{\mathrm{full}}(x_0, y_0) = p(x_0)\, p(y_0)$. In this particular case the initial mutual information is equal to zero, since the system is not able to distinguish the states of the environment. Finally iv), the pairs of viability V and mutual information $I_p$, estimated for all the intervened distributions, were compared with those of the actual (non-intervened) distribution. The optimal intervention was defined by KW as the one achieving the same value of viability as the non-intervened case at the time of interest t while giving the smallest value of mutual information

$$\hat{p}^{\mathrm{opt}}(x_0, y_0) = \arg\min_{\hat{p}^{\phi} :\, \phi \in \Phi} I_{\hat{p}^{\phi}}(X_0; Y_0) \quad \text{s.t.} \quad V(\hat{p}^{\phi}(x_t)) = V(p(x_t))$$

where $\Phi$ defines the set of all possible coarse-graining functions.
The stored semantic information is then defined as the initial mutual information under the optimal intervention

$$I_S := I_{\hat{p}^{\mathrm{opt}}}(X_0; Y_0)$$

This operation identifies the minimum level of distinctions about the environment that the system requires to preserve its existence, and thus isolates the causal relation among all the existing relations between the two entities. Of note, the optimal intervention strictly depends on the time instant t at which the viability is computed.
To recapitulate, the value of $I_p$ under the optimal intervention was defined by KW as the stored semantic information exchanged between the environment and the system, since the syntactic information retained under the optimal intervention is the only information needed to preserve the existence of the system in the environment. KW add the word "stored" to distinguish this quantity, which is derived by scrambling the relation between environment and system at t = 0 and computed through the mutual information at t = 0, from the observed semantic information, which is the semantic information dynamically exchanged between the environment and the system and is measured by KW through transfer entropy. From now on, by semantic information we mean the stored semantic information, i.e., the mutual information under KW's optimal intervention.
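As a minimal sketch of the quantities defined in this subsection, the following Python fragment computes the initial mutual information of a joint distribution and the intervened distribution induced by a coarse-graining function. The function names (`mutual_information`, `intervene`), the dictionary representation, and the toy joint distribution (five equiprobable environment levels perceived without error) are assumptions made for illustration and are not taken from [6].

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Mutual information I(X0;Y0) in bits for a joint given as {(x, y): prob}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def intervene(joint, phi):
    """Intervened joint p_hat(x, y) = p(x | phi(y)) * p(y) for coarse-graining phi."""
    py = defaultdict(float)          # p(y)
    p_block = defaultdict(float)     # p(phi(y))
    p_x_block = defaultdict(float)   # p(x, phi(y))
    for (x, y), p in joint.items():
        py[y] += p
        p_block[phi(y)] += p
        p_x_block[(x, phi(y))] += p
    xs = {x for x, _ in joint}
    out = {}
    for y, p_y in py.items():
        for x in xs:
            p = p_x_block.get((x, phi(y)), 0.0) / p_block[phi(y)] * p_y
            if p > 0:
                out[(x, y)] = p
    return out

# Toy example (assumption): five equiprobable levels, perceived perfectly (x = y).
joint = {(s, s): 0.2 for s in range(1, 6)}

i_actual = mutual_information(joint)                                   # log2(5), about 2.32 bits
merged = intervene(joint, lambda y: 1 if y in (1, 2) else y)           # levels 1 and 2 merged
i_merged = mutual_information(merged)                                  # about 1.92 bits
i_full = mutual_information(intervene(joint, lambda y: 0))             # fully scrambled, about 0
```

In this toy case, merging two of the five levels removes 0.4 bits of mutual information, while the fully scrambled intervention removes all of it. Selecting the optimal intervention would additionally require comparing the viability of each intervened distribution with that of the actual one, which depends on the dynamics.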

C. Bottom-Up Synthetic Cells
Bottom-up SCs can be defined as man-made molecular systems partially resembling biological cells in both structure and function [9]. The bottom-up approach consists in the assembly of complex cell-like systems starting from simple components, such as proteins, nucleic acids, lipids, and sugars, into environment-responsive structures. The resulting SCs are not biological cells manipulated ad hoc, but biochemical systems built from scratch. The bottom-up technology allows the construction of non-trivial SCs but, despite these advancements, these systems are not yet alive. They are best considered as molecular machines rather than minimal organisms [10]. Several cytomimetic functions have been implemented in SCs: sending and receiving chemical signals [11], producing chemical energy (ATP) [12], allowing nutrients to enter through large pores [13], and producing proteins [14].
SCs represent an important new synthetic biology tool with intriguing roles in applied research. One of the most obvious applications is their potential use as "smart" agents for drug/protein/gene delivery [15]. In this context, SCs can be described as programmable vehicles that interact with a biological milieu. During the "travel" toward the target location, and in its vicinity, SCs are supposed to engage in chemical communication and perception in order to control their own internal operations, for example the production of a therapeutic compound [16], its release, or even their own self-destruction [15]. Hypothetical SCs capable of behaving in this manner are on the synthetic biology agenda, but they are still too complex for current technology. These limitations, however, do not prevent investigations based on modeling, such as the one discussed in this paper. Moreover, numerical simulations can guide future experimental design by predicting how SCs respond to environmental variables.

D. Plan of the Work
The scenarios we are going to discuss combine semantic information theories with the potential interest in SCs as smart drug delivery agents. We will describe SCs as platforms to investigate semantic information, with the long-term goal of placing simulations and experiments side by side. This study can be considered one of the first attempts to introduce, and find an application for, the concept of semantic information in systems where the transmission of information is based on molecules, i.e., molecular communication-based systems. We present preliminary numerical results for a simple but effective SC model inspired by (but not identical to) the elegant KW theory. Convinced that the possible connection between SCs and semantic information is very intriguing, we will explore it in depth in future investigations, while this initial report aims at stimulating discussion in the field and attracting the attention of experimentalists and theoreticians.

II. SEMANTIC INFORMATION IN SC SCENARIOS
We devised a scenario that allows the KW approach to be applied to an SC characterized by a specific internal behavior. The SC acts as an agent ($\mathcal{X}$) embedded in an environment ($\mathcal{Y}$). SCs can be engineered to complete specific tasks that could, for instance, be exploited therapeutically. In the case we considered, the SC is meant to interact with chemicals hypothetically produced by Cancerous Cells (CCs) present in the environment, and to respond by establishing a dependent dynamics that ends with the production, by the SC, of toxin molecules that kill the CCs (Fig. 1). The most important aspects to be considered for the identification of semantic information in models are i) the choice of an agent whose behavior can be affected by observing the environment, ii) the definition of the dynamics, and iii) the choice of a measure that discriminates meaningful from meaningless information, here KW's viability V. With reference to the last point, KW relate the concept of viability to the ability of the agent to remain far from the equilibrium state. In this specific case, we consider an SC to be in an equilibrium state when it is unable to perform the task for which it was designed. Accordingly, for our purposes it is convenient to define an SC as "viable" or "dead" when it is or is not able, respectively, to release toxin molecules to kill CCs. The next subsections describe the SC we devised and simulated. The choice of simulated data is purely arbitrary; however, the same model could be generalized.

A. Environment
The environment $\mathcal{Y}$ is described by a single state variable $Y_S = \{0, \ldots, 5\}$, which defines the level of signaling (S) molecules released in the environment by the CCs. The value $Y_S = 0$ corresponds to the absence of signal in the environment, which means that CCs are not present or the signaling molecules have been degraded (see Section II-C).

Fig. 1. Graphical representation of the simulated scenario. A Synthetic Cell continuously releases toxin molecules, whose production rate changes according to the amount of signaling molecules in the environment.

B. SC Structure
The SC behavior includes four operations: perception of the S levels in the environment; internalization of S; de novo production of toxin molecules; and release of toxin molecules. Accordingly, the SC state space $X$ consists of three separate degrees of freedom, indicated as $X = X_{S_{in}} \times X_{S_{per}} \times X_{p_{tox}}$. Specifically, $X_{S_{per}} = \{0, \ldots, 5\}$ is the variable associated with the perception of S in the outer environment, and biologically represents the "activation" of the receptors of S, i.e., those that lie on the SC membrane and detect signals from the environment. For simplicity, we assume that $X_{S_{per}}$ is always equal to $Y_S$ (perfectly responding receptors). $X_{S_{in}} = \{0, \ldots, 5\}$ is the amount of S internalized by the SC, which is directly related to $X_{S_{per}}$ as described in Section II-C. Finally, $X_{p_{tox}} = \{0, \ldots, 8\}$ is the amount of toxin molecules inside the SC. These molecules are supposed to be membrane permeable, so that they are passively released by the SC into the environment ($X_{p_{tox}}$ decreases).

C. SC Dynamics in the Environment
By analogy with traditional drug delivery systems, in our model we suppose that SCs already carry a toxin payload at t = 0. Moreover, as smart drug delivery agents, SCs display a behavior that is affected (controlled) by the presence of CCs via their chemical exudates. Specifically, when exposed to CC-secreted S molecules, SCs start a de novo production of toxin thanks to internal molecular machinery ($X_{p_{tox}}$ increases). Such internal dynamics increases the probability that CCs die because of their augmented overall exposure to the toxin. As mentioned, as long as $X_{p_{tox}} > 0$, an SC constantly releases toxin into the environment, even in the absence of CCs. The state $X_{p_{tox}} = 0$, which corresponds to SC death, is assigned a high entropy and is therefore associated with a reduction of system viability (see Section I-B for the computation of viability). The imposed model dynamics are such that:
1) at t = 0 the amount of S in the environment is uniformly distributed over $\{1, \ldots, 5\}$ (i.e., S cannot be absent in the system); the SC has perfect information about the amount of S in the environment (i.e., $X_{S_{per}} = Y_S$); $X_{S_{in}} = 0$; $X_{p_{tox}} = 5$;
2) the level of $Y_S$ remains constant up to $t^*$ unless i) S is degraded, ii) S is transported away (e.g., by the flux in a blood vessel), or iii) the information flow between environment and SC is interrupted, as in the case of degradation of the receptors on the SC membrane; these phenomena set $Y_S = 0$ and can occur with a small fixed probability equal to 0.1 at each step;
3) $X_{S_{in}}$ increases by 1 unit per time step as long as $X_{S_{per}} > X_{S_{in}}$; this corresponds to the internalization of S;
4) the toxin, if present in the SC, is continuously released at a fixed rate of 1 level per time step;
5) the toxin is produced de novo according to the level of S internalized: this operation links the response of the SC to the presence of CCs; when S is present, it leads to an increase in the number of toxin molecules in the SC and, since release proceeds at the fixed rate of 1 level per time step, in the total amount released into the environment, i.e., in the overall exposure of CCs to toxin;
6) the SC instantaneously produces the toxin in an amount equal to $Y_S$ if $Y_S - X_{S_{in}} \le 1$, i.e., if enough S molecules have been internalized with respect to what is present in the environment;
7) once the toxin is produced, we assume that the external molecules of S in the environment cannot affect the SC again, so $Y_S = X_{S_{per}} = 0$;
8) when $X_{p_{tox}} = 0$, the SC enters the above-defined "dead" state because it cannot accomplish its task, and it is assigned an internal entropy of 100 bits: this corresponds to the KW equilibrium macrostate, with a large entropy [6];
9) simulations were carried out for $t^* = 14$ time steps and the viability in (2) was evaluated at t = 6.
Of note, the threshold values, the number of levels for each variable, and the simulation time are purely arbitrary, since this kind of SC does not yet exist. However, once experimental data become available, it will be possible to use them in this model and compare the simulation results with reality.
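The rules above can be sketched as a short trajectory sampler. This is an illustrative reading of the model, not the code used for the reported results: the function name `simulate`, the order of operations within a time step (signal loss, internalization, production, release), and the capping of the toxin level at 8 are assumptions. The order was chosen so that signal levels 1 and 2 trigger toxin production at the same time step, as stated in Section III.

```python
import random

T_STAR = 14     # number of simulated time steps (rule 9)
P_LOSS = 0.1    # per-step probability that the signal is lost (rule 2)
PTOX_MAX = 8    # highest toxin level of X_ptox (Section II-B); capping is an assumption

def simulate(y0, p_loss=P_LOSS, steps=T_STAR, rng=None):
    """Sample one trajectory and return the toxin levels X_ptox at t = 0..steps."""
    rng = rng or random.Random()
    y_s = y0        # signal level in the environment, Y_S
    x_in = 0        # internalized signal, X_Sin (rule 1)
    x_ptox = 5      # initial toxin payload (rule 1)
    history = [x_ptox]
    for _ in range(steps):
        # Rule 2: the signal may be degraded, washed away, or become undetectable.
        if y_s > 0 and rng.random() < p_loss:
            y_s = 0
        x_per = y_s  # perfectly responding receptors: X_Sper = Y_S
        # Rule 3: internalize one unit of S per step while X_Sper > X_Sin.
        if x_per > x_in:
            x_in += 1
        # Rules 5-7: instantaneous production of Y_S toxin levels, then signal off.
        if y_s > 0 and y_s - x_in <= 1:
            x_ptox = min(PTOX_MAX, x_ptox + y_s)
            y_s = 0
        # Rule 4: passive release of one toxin level per step, if any is present.
        if x_ptox > 0:
            x_ptox -= 1
        history.append(x_ptox)  # X_ptox == 0 marks the "dead" state (rule 8)
    return history

# Deterministic checks with signal loss switched off (p_loss = 0).
low = simulate(1, p_loss=0.0)
high = simulate(5, p_loss=0.0)
no_signal = simulate(3, p_loss=1.0)   # signal lost at the first step
```

Under these assumptions, an SC exposed to the highest signal level still holds toxin at t = 6, whereas an SC that loses the signal immediately is depleted by t = 5. Estimating the distribution $p(x_t)$ would require either many sampled trajectories or an exact propagation of the joint distribution.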

III. RESULTS
As mentioned, given the internal dynamics of an SC, the KW approach was used to compute the semantic information exchanged between environment and SC. Specifically, coarse-graining functions grouping the possible states of $Y_S$ determined all the possible interventions [6]. Next, for all the intervened and actual distributions, the mutual information $I_p$ and the V function were computed. Figure 2 shows simulation results for the SC scenario described in Section II. Specifically, Fig. 2a shows the viability vs. time curve for the actual distribution and for the fully scrambled intervened distribution. The distribution $p(x_t)$ used in (2) to evaluate the viability was obtained by marginalizing, over all variables other than $x_t$, the multivariate joint probability

$$p(x_t, y_t, \ldots, x_0, y_0) = p(x_0, y_0) \prod_{k=1}^{t} p(x_k, y_k \mid x_{k-1}, y_{k-1})$$

which follows from the first-order Markovianity of the problem. Each term in the product was computed as

$$p(x_k, y_k \mid x_{k-1}, y_{k-1}) = p(x_k \mid x_{k-1}, y_{k-1})\, p(y_k \mid x_k, x_{k-1}, y_{k-1})$$

where $p(x_k \mid x_{k-1}, y_{k-1})$ is the response of the system to the previous state of itself and of the environment, while $p(y_k \mid x_k, x_{k-1}, y_{k-1})$ is the response of the environment to the previous state of itself and of the system, as well as to the current state of the system. The vertical dashed line corresponds to the time instant t = 6. Of note, when approaching the analysis of semantic information according to the method of KW, particular attention must be given to the timescale of the analysis because of the time dependency of V: the timescale must be chosen according to i) the experimental data, ii) the SC functioning and goal, and iii) the expected behavior of the SC, so as to maximize the semantic information content that can be computed.
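The marginalization described above can be sketched, under stated assumptions, as an exact propagation of the joint distribution through a first-order Markov kernel, followed by a sum over the environment variable. The helper names (`propagate`, `marginal_x`, `viability`) and the two-variable `toy_kernel` are hypothetical and much simpler than the SC model of Section II; they only illustrate the mechanics.

```python
import math
from collections import defaultdict

def propagate(joint, kernel, steps):
    """Apply a first-order Markov kernel `steps` times to a joint {(x, y): prob}.

    kernel(x, y) must return {(x_next, y_next): prob}, i.e. p(x_k, y_k | x_{k-1}, y_{k-1}).
    """
    for _ in range(steps):
        nxt = defaultdict(float)
        for (x, y), p in joint.items():
            for state, q in kernel(x, y).items():
                nxt[state] += p * q
        joint = dict(nxt)
    return joint

def marginal_x(joint):
    """Marginal p(x_t), obtained by summing the joint over the environment state."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    return dict(px)

def viability(px):
    """Viability as the negative Shannon entropy (in bits) of p(x_t)."""
    return sum(p * math.log2(p) for p in px.values() if p > 0)

def toy_kernel(x, y, p_loss=0.1):
    """Hypothetical kernel: the system keeps its level while the signal is present."""
    # System response p(x_k | x_{k-1}, y_{k-1}): lose one level once the signal is gone.
    x_next = x if y == 1 else max(x - 1, 0)
    # Environment response: the signal is lost with probability p_loss and never returns.
    if y == 1:
        return {(x_next, 1): 1 - p_loss, (x_next, 0): p_loss}
    return {(x_next, 0): 1.0}

joint_t2 = propagate({(2, 1): 1.0}, toy_kernel, steps=2)
px_t2 = marginal_x(joint_t2)   # system at level 2 with prob. 0.9, level 1 with prob. 0.1
v_t2 = viability(px_t2)        # about -0.469 bits
```

In this toy kernel the environment response does not use the current system state; the factorization in the text allows that dependence, which the SC model needs when toxin production switches the signal off.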
At the end of the dynamics the viability values for the actual and the intervened distributions coincide, because the constant release of toxin molecules leads to $X_{p_{tox}} = 0$ at the end of the simulation regardless of any connection with the environment. Before that point, however, we can observe a decrease of the viability of the intervened case with respect to the actual case. This finding is consistent with the KW results and with what was expected when designing the model: the coarse-graining at t = 0 decreases the ability of the SC to interpret the information transmitted by the environment (i.e., the presence of S), which can increase the production of the toxin and hence the viability of the SC. The decrease of viability caused by coarse-graining hints at a premature toxin production that leads to the dead macrostate. Although early production may seem positive, it is not, since we want toxin production to be consistent with the presence of S. In Fig. 2b, the information vs. viability curve shows that the semantic information of the SC in the environment is $I_S = 1.92$ bits, compared with a total mutual information of $I_p = 2.32$ bits. The value corresponds to the mutual information obtained for two optimal interventions, which are given by the coarse-graining of S levels {1, 2} and {0, 1, 2}, respectively. The presence of two optimal interventions is due to the nature of the level S = 0, which corresponds to the absence of S in the environment and is impossible at t = 0. For this reason, the level S = 0 carries neither mutual information nor viability, and we can therefore consider the coarse-graining of S levels {1, 2} as the optimal intervention. These results give some hints about the dynamics of the SC and how it perceives the environmental variables. The SC described here cannot distinguish between the environmental levels (or concentrations) of S in {1, 2} because of the dynamics of the system.
In particular, since the SC can produce the toxin if $Y_S - X_{S_{in}} \le 1$, the two S levels of the optimal intervention lead to the production of the toxin at the same time step. This means that not all the information extracted by the SC from the environment is semantic: part of it does not affect SC behavior. Other results could be obtained by changing the behavior of the SC, for example by inducing the release of toxin in the presence of S (data not shown for lack of space).
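The two reported values can be checked by hand. The short derivation below assumes, as in rule 1 of Section II-C, that at t = 0 the signal level is uniform over five values and perfectly perceived, so that the only uncertain component of the SC state is the perceived level. Under the coarse-graining of levels {1, 2}, the marginal of $X_0$ is unchanged, while the SC becomes maximally uncertain (1 bit) whenever the environment is in one of the two merged levels, which happens with probability 2/5.

```latex
\begin{align*}
I_p(X_0; Y_0) &= H(X_0) - H(X_0 \mid Y_0) = \log_2 5 - 0 \approx 2.32 \text{ bits}, \\
H_{\hat{p}^{\phi}}(X_0 \mid Y_0) &= \Pr\{Y_S \in \{1, 2\}\} \cdot \log_2 2 = \tfrac{2}{5} \cdot 1 = 0.40 \text{ bits}, \\
I_S = I_{\hat{p}^{\phi}}(X_0; Y_0) &= \log_2 5 - 0.40 \approx 1.92 \text{ bits}.
\end{align*}
```

The difference of 0.40 bits is the part of the syntactic information that, in this model, does not affect the viability of the SC.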
Although simplistic, this model could be regarded as a first step towards a predictive model for assessing the behavior of SCs in response to variations of environmental parameters. The final aim is to i) use realistic data describing the behavior of SCs in the model, ii) predict the adaptation of SCs to environmental variables and, hence, iii) quantify the amount of semantic information exchanged by SC and environment in order to assess the sensitivity of the SC in accomplishing the goal for which it was designed, in this case killing CCs.

IV. CONCLUDING REMARKS
In this letter we have shown how to quantify KW semantic information in an SC-based scenario. We suggest that the development of this model (and similar ones) will further enrich the fields of synthetic biology and molecular communications (and their fecund intersection), fostering future theoretical and experimental developments. In our simple yet informative model, an SC is situated in an environment where it engages in chemical communication with cancerous cells, and the amount of information that is causally necessary to complete its task has been computed. The basic steps of the procedure (which we intend to optimize in future studies) include the definition of an SC mechanism, an environment in which the SC is situated, a viability function, and the timescale of the simulated dynamics. In particular, we intend to investigate more complex and realistic models, and to explore alternative ways to compute SC viability for task-guided systems.