Digital Twins Composition via Markov Decision Processes
Master of Science in Engineering in Computer Science
The use of Digital Twins is key in Industry 4.0, in the Industrial Internet of Things, and in the engineering and manufacturing business space. For this reason, they are becoming of particular interest for several fields of Artificial Intelligence (AI) and Computer Science (CS). In this work, we focus on the orchestration of Digital Twins. We manage this orchestration using Markov Decision Processes (MDPs): given a specification of the behaviour of the target service, we build a controller, known as an orchestrator, that uses existing stochastic services to satisfy the requirements of the target service. The solution to this MDP induces an orchestrator that coincides with the exact solution if a composition exists. Otherwise, it provides an approximate solution that maximizes the expected discounted sum of values of user requests that can be serviced. We formalize stochastic service composition, present a proof-of-concept implementation, and discuss a case study in an Industry 4.0 scenario.
Keywords: Service Composition · The Roman Model · Digital Twins · Industry 4.0 · Smart Manufacturing
Introduction
The continuous evolution of technologies in the fields of communication, networking, storage and computing, applied to the more traditional world of industrial automation in order to increase productivity and quality, ease workers’ lives, and open new business opportunities, has created so-called smart manufacturing, or Industry 4.0. Digital Twins (DTs) are up-to-date digital descriptions of physical objects and their operating status. Modern information systems and industrial machines may natively come with their digital twin; in other cases, especially when the approach is applied to already established factories and production processes, digital twins are obtained by wrapping actors that are already in place. The main goal is to establish a tight integration between the physical world and the virtual world, in order to make production more efficient, reliable, flexible and faster. The Digital Twin is an ideal tool to accomplish the purpose of Industry 4.0, since it enables a massive exchange of data that can be interpreted by analytical tools in order to improve decision making.
Inspired by the research on automatic orchestration and composition of software artifacts, such as Web services, (Catarci et al., 2019) proposed to apply service composition techniques, based on the classical Roman model, to the orchestration of Digital Twins in smart manufacturing.
Nevertheless, there is an inherent limitation of the approach based on the classical Roman model: the assumption that the available services, i.e. the services that can be used to realize the target service, behave deterministically. This assumption is often unrealistic, because in practice the underlying physical system, modelled as a set of services, may show non-deterministic behaviour due to the complexity of the domain or to an inherent uncertainty in its dynamics. In these cases, the deterministic service model is not expressive enough to capture crucial facets of the system being modelled.
Moreover, the above-mentioned techniques work only when the target is fully realizable, i.e. the specification can either be satisfied or not, with no middle ground. In the context of Industry 4.0 this is seldom the case; instead, a technique would be preferred that, rather than returning no answer, returns the “best-possible” solution under the actual circumstances.
The work in (Brafman et al., 2017) addresses this issue by recasting service composition in a stochastic setting, where the target specification is probabilistic and a best-effort orchestrator is computed by solving an MDP.
In this paper, we marry this vision by employing service composition techniques to orchestrate digital twins. We propose a generalization of the stochastic service composition proposed in (Brafman et al., 2017), in which not only the target but also the available services may behave stochastically and carry rewards on their state transitions.
Preliminaries
MDPs.
A Markov Decision Process (MDP) $\mathcal{M} =
\langle S,A,T,R\rangle$ contains a set $S$ of states, a set $A$ of actions, a
transition function $T : S\times A \to Prob(S)$ that returns for
every state $s$ and action $a$ a distribution over the next state,
and a reward function $R : S\times A\to \mathbb{R}$
that specifies the reward (a real value) received by the agent when transitioning
from state $s$ to state $s'$ by applying action $a$. A
solution to an MDP is a function, called a policy, assigning
an action to each state, possibly with a dependency on past
states and actions. The value of a policy $\rho$ at state $s$, denoted
$v_\rho(s)$, is the expected sum of (possibly discounted by a factor $\lambda$, with $0\leq \lambda < 1$) rewards when starting at state $s$ and
selecting actions based on $\rho$. Typically, the MDP is assumed
to start in an initial state $s_0$, so policy optimality is evaluated
w.r.t. $v_\rho(s_0)$. Every MDP has an optimal policy $\rho^*$. In discounted cumulative settings, there exists an optimal policy that is Markovian, $\rho : S \to A$, i.e., $\rho$ depends only on the current state, and deterministic (Puterman, 1994).
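To make the connection with the solution technique below concrete, the following is a minimal value-iteration sketch for computing such an optimal Markovian policy; the dictionary-based encoding of $S$, $A$, $T$ and $R$ is a hypothetical choice for illustration, not part of the formal model.

```python
def value_iteration(S, A, T, R, lam=0.9, eps=1e-6):
    """Compute an optimal value function and a deterministic Markovian policy.

    Hypothetical encoding: S and A are collections of hashable labels,
    T[s][a] is a dict {next_state: probability}, R[s][a] is a real reward,
    and lam is the discount factor (0 <= lam < 1).
    """
    V = {s: 0.0 for s in S}

    def q(s, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + lam * sum(p * V[s2] for s2, p in T[s][a].items())

    while True:
        delta = 0.0
        for s in S:
            best = max(q(s, a) for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract a greedy (Markovian, deterministic) policy from V.
    policy = {s: max(A, key=lambda a: q(s, a)) for s in S}
    return V, policy
```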
The Roman Model in stochastic settings.
The problem of service composition, i.e. the ability to generate new, more useful services from existing ones, has been considered in the literature for over a decade (Hull et al., 2008; Medjahed et al., 2011; De Giacomo et al., 2014).
Unfortunately, it is not always possible to synthesize a service that fully conforms with the requirement specification. This zero-one situation, where we can either synthesize a perfect solution or fail, is often not desirable in practice. Rather than returning no answer, we may want a notion of the “best-possible” solution. A model with this notion has been developed in (Brafman et al., 2017).
Problem
Before stating the problem, we give preliminary definitions. A stochastic service is a tuple $\tilde{\mathcal{S}} = \langle \Sigma_s, \sigma_{s0}, F_s, A, P_s, R_s\rangle$, where $\Sigma_s$ is the finite set of service states, $\sigma_{s0}\in \Sigma_s$ is the initial state, $F_s\subseteq\Sigma_s$ is the set of the service’s final states, $A$ is the finite set of the service’s actions, $P_s: \Sigma_s\times A \to Prob(\Sigma_s)$ is the transition function, and $R_s : \Sigma_s\times A\to \mathbb{R}$ is the reward function. In short, a stochastic service is the stochastic variant of the service defined in the classical Roman model, and it can be seen as an MDP itself.
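For concreteness, a stochastic service can be encoded directly from this tuple; the following Python sketch (with hypothetical field names) is the representation assumed by the later code fragments.

```python
from dataclasses import dataclass
from typing import Dict, Set

State = str
Action = str
Dist = Dict[State, float]  # probability distribution over successor states

@dataclass
class StochasticService:
    """A stochastic service <Sigma_s, sigma_s0, F_s, A, P_s, R_s>."""
    states: Set[State]
    initial_state: State
    final_states: Set[State]
    actions: Set[Action]
    transitions: Dict[State, Dict[Action, Dist]]   # P_s
    rewards: Dict[State, Dict[Action, float]]      # R_s
```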
A target service, as defined in (Brafman et al., 2017), is a tuple $\tilde{\mathcal{T}} = \langle \Sigma_t, \sigma_{t0}, F_t, A, P_t, R_t\rangle$, where $\Sigma_t$ is the finite set of target states, $\sigma_{t0}\in\Sigma_t$ is the initial state, $F_t\subseteq\Sigma_t$ is the set of final states, $A$ is the finite set of actions, $P_t(\sigma_t, a)$ is the probability that the user requests action $a$ in state $\sigma_t$ (the corresponding target transition $\sigma_t \xrightarrow{a} \sigma_t'$ being deterministic), and $R_t : \Sigma_t\times A\to \mathbb{R}$ is the reward function.
A stochastic system service $\tilde{\mathcal{Z}}$ of a community of stochastic services $\tilde{\mathcal{C}} = \{\tilde{\mathcal{S}}_1, \ldots, \tilde{\mathcal{S}}_n\}$ is a stochastic service $\tilde{\mathcal{Z}} = \langle \Sigma_z, \sigma_{z0}, F_z, A_z, P_z, R_z\rangle$ defined as follows: $\Sigma_z = \Sigma_1 \times \cdots \times \Sigma_n$; $\sigma_{z0} = (\sigma_{10}, \ldots, \sigma_{n0})$; $F_z = \{(\sigma_1, \ldots, \sigma_n) \mid \sigma_i\in F_i, 1\le i \le n\}$; $A_z = A \times \{1, \ldots, n\}$ is the set of pairs $(a, i)$ formed by a shared action $a$ and the index $i$ of the service that executes it; $P_z(\sigma' \mid \sigma, (a, i)) = P_i(\sigma_i' \mid \sigma_i, a)$, for $\sigma = (\sigma_1, \ldots, \sigma_n)$, $\sigma' = (\sigma_1', \ldots, \sigma_n')$ and $a\in A_i(\sigma_i)$, with $\sigma_i\in\Sigma_i$ and $\sigma_j' = \sigma_j$ for $j\neq i$; $R_z(\sigma, (a, i)) = R_i(\sigma_i, a)$ for $\sigma \in \Sigma_z$ and $a\in A_i(\sigma_i)$.
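Under the encoding sketched above, the system service can be computed as an asynchronous product of the community; the fragment below is a possible construction, where a system action is the pair $(a, i)$ and only the $i$-th component moves.

```python
from itertools import product

def system_service(services):
    """Asynchronous product of a community of StochasticService objects.

    System states are tuples with one component per service; executing (a, i)
    moves only the i-th component, with the probability P_i and reward R_i of
    that service (indices are 0-based here for convenience).
    """
    states = set(product(*(s.states for s in services)))
    initial = tuple(s.initial_state for s in services)
    finals = {z for z in states
              if all(z[i] in s.final_states for i, s in enumerate(services))}
    transitions, rewards = {}, {}
    for z in states:
        transitions[z], rewards[z] = {}, {}
        for i, srv in enumerate(services):
            for a, dist in srv.transitions.get(z[i], {}).items():
                successors = {z[:i] + (s2,) + z[i + 1:]: p for s2, p in dist.items()}
                transitions[z][(a, i)] = successors
                rewards[z][(a, i)] = srv.rewards[z[i]][a]
    return states, initial, finals, transitions, rewards
```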
We define the set of joint histories of the target and the system service as $H_{t,z} = \Sigma_t\times\Sigma_z \times (A\times \Sigma_t\times\Sigma_z)^*$. A joint history $h_{t,z}=\sigma_{t,0}\sigma_{z,0}a_1\sigma_{t,1}\sigma_{z,1}a_2\ldots$ is an element of $H_{t,z}$. The projection of $h_{t,z}$ onto the target (resp. system) component is $\pi_t(h_{t,z}) = h_t$ (resp. $\pi_z(h_{t,z}) = h_z$).
An orchestrator $\gamma: \Sigma_t\times\Sigma_z\times A \to \{1, \ldots, n\}$ is a mapping from a state of the target-system service and a user action $(\sigma_t, \sigma_z, a)\in \Sigma_t\times\Sigma_z\times A$ to the index $j\in\{1, \ldots, n\}$ of the service that must handle it. Crucially, since stochasticity also comes from the services, the orchestrator does affect the probability of a history $h_{t,z}$. Moreover, in general, there are several system histories associated with a given target history.
Let $P_\gamma(h) = \prod_{i=0}^{|h|} P_t\big(\sigma_{t,i}, a_{i+1}\big)\, P_z\big(\sigma_{z,i+1}\mid \sigma_{z,i},\langle a_{i+1},\gamma(\sigma_{t,i}, \sigma_{z,i}, a_{i+1})\rangle\big)$ be the probability of a (joint) history $h = \sigma_{t0}\sigma_{z0} \langle a_1,j_1 \rangle \sigma_{t1}\sigma_{z1} \langle a_2,j_2 \rangle \ldots$ under orchestrator $\gamma$.
Intuitively, at every step we take into account the probability, determined by $P_t$, that the user does action $a_{i+1}$ in the target state $\sigma_{t,i}$, in conjunction with the probability, determined by $P_z$, that the system service does the transition $\sigma_{z,i}\xrightarrow{(a_{i+1}, j)} \sigma_{z,i+1}$, where $j$ is the choice of the orchestrator $\gamma$ at step $i$, i.e. $j = \gamma(\sigma_{t,i}, \sigma_{z,i}, a_{i+1})$.
The value of a joint history under orchestrator $\gamma$ is the sum of discounted rewards, from both the target and the system services: $v_\gamma(h) = \sum\limits_{i=0}^{|h|} \lambda^i \bigg( R_t\big(\sigma_{t,i}, a_{i+1}\big) + R_z\big( \sigma_{z,i}, \langle a_{i+1}, \gamma(\sigma_{t,i}, \sigma_{z,i}, a_{i+1})\rangle \big) \bigg).$
Intuitively, we take into account both the reward that comes from the execution of action $a_{i+1}$ in the target service and the reward associated with the execution of that action in the service $j$ chosen by orchestrator $\gamma$.
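As a sanity check of these two definitions, the following sketch computes $P_\gamma(h)$ and $v_\gamma(h)$ for a single joint history, assuming the hypothetical encodings above plus a target object exposing `action_dist[t][a]` $= P_t(t, a)$ and `rewards[t][a]` $= R_t(t, a)$.

```python
def history_probability_and_value(history, orchestrator, target, sys_T, sys_R, lam=0.9):
    """Probability P_gamma(h) and discounted value v_gamma(h) of a joint history.

    `history` alternates joint states and actions:
    [(t0, z0), a1, (t1, z1), a2, ...]; `orchestrator(t, z, a)` returns the
    index of the service chosen to execute action a.
    """
    prob, value = 1.0, 0.0
    t, z = history[0]
    steps = history[1:]
    for i in range(0, len(steps) - 1, 2):
        a, (t_next, z_next) = steps[i], steps[i + 1]
        j = orchestrator(t, z, a)
        # P_t(t, a) * P_z(z' | z, (a, j)) for this step
        prob *= target.action_dist[t][a] * sys_T[z][(a, j)].get(z_next, 0.0)
        # lambda^step * (R_t(t, a) + R_z(z, (a, j)))
        value += (lam ** (i // 2)) * (target.rewards[t][a] + sys_R[z][(a, j)])
        t, z = t_next, z_next
    return prob, value
```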
Now we can define the expected value of an orchestrator as $v(\gamma) = \mathbb{E}_{h_{t,z}\sim P_\gamma} \big[ v_\gamma(h_{t,z}) \cdot \mathit{realizable}(\gamma, \pi_t(h_{t,z})) \big]$, where $\mathit{realizable}(\gamma,\pi_t(h_{t,z}))$ is $1$ if $h_t = \pi_t(h_{t,z})$ is realizable in $\gamma$ (i.e. all the possible target histories are processed correctly), and $0$ otherwise. That is, $v(\gamma)$ is the expected value of histories realizable in $\gamma$. Finally, we define an optimal orchestrator as $\gamma^* = \arg\max_{\gamma'} v(\gamma')$.
It can be shown that, under certain assumptions (i.e. the target is realizable, every history has strictly positive value, and the target’s rewards are always greater than the services’ rewards), optimality of the orchestrator implies that the target is realized by the orchestrator.
Solution technique
The solution technique is based on finding an optimal policy for the composition MDP. The composition MDP is a function of the system service and the target service, defined as $\tilde{\mathcal{M}}(\tilde{\mathcal{Z}}, \tilde{\mathcal{T}}) = \langle S_{\tilde{\mathcal{M}}}, A_{\tilde{\mathcal{M}}}, T_{\tilde{\mathcal{M}}}, R_{\tilde{\mathcal{M}}} \rangle$, where: $S_{\tilde{\mathcal{M}}} = \Sigma_z \times \Sigma_t \times A \cup \{s_{M0}\}$; $A_{\tilde{\mathcal{M}}} = \{a_{M0}, 1, \ldots, n\}$; $T_{\tilde{\mathcal{M}}}(s_{M0}, a_{M0}, (\sigma_{z0}, \sigma_{t0}, a)) = P_t(\sigma_{t0}, a)$; $T_{\tilde{\mathcal{M}}}((\sigma_z, \sigma_t, a), i, (\sigma_z', \sigma_t', a')) = P_t(\sigma_t', a') \cdot P_z(\sigma_z' \mid \sigma_z, \langle a, i \rangle)$ if $P_z(\sigma_z' \mid \sigma_z, \langle a, i \rangle) > 0$ and $\sigma_t \xrightarrow{a} \sigma_t'$, and $0$ otherwise; $R_{\tilde{\mathcal{M}}}((\sigma_z, \sigma_t, a), i) = R_t(\sigma_t, a) + R_z(\sigma_z, \langle a, i \rangle)$ if $(a, i) \in A_z(\sigma_z)$, and $0$ otherwise.
This definition is closely related to the construction proposed in (Brafman et al., 2017); the main difference is that the transition function now also accounts, through $P_z$, for the stochastic behaviour of the available services.
To summarize, given the specifications of the set of stochastic services and of the target service, we first compute the composition MDP, then find an optimal policy for it, and finally deploy that policy as an orchestrator, dispatching each request to the service chosen by the computed policy.
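Putting the pieces together, a possible reduction is sketched below: it builds the portion of the composition MDP reachable from the dummy initial state, using the same hypothetical encodings as the earlier fragments (with `target.successor[(t, a)]` giving the deterministic target transition); the result can then be fed to a standard MDP solver such as the value-iteration sketch in the preliminaries, adapted to per-state action sets.

```python
def composition_mdp(sys_init, sys_T, sys_R, target, n):
    """Build the (reachable part of the) composition MDP.

    MDP states are triples (sigma_z, sigma_t, a) plus a dummy initial state
    s_M0; MDP actions are service indices (0-based here, 1..n in the formal
    definition) plus a dummy initial action a_M0.
    """
    S0, A0 = "s_M0", "a_M0"
    T = {S0: {A0: {(sys_init, target.initial_state, a): p
                   for a, p in target.action_dist[target.initial_state].items()}}}
    R = {S0: {A0: 0.0}}
    frontier = list(T[S0][A0])
    while frontier:
        s = frontier.pop()
        if s in T:
            continue
        z, t, a = s
        t_next = target.successor[(t, a)]
        T[s], R[s] = {}, {}
        for i in range(n):
            if (a, i) in sys_T.get(z, {}):        # service i can execute a in state z
                # If the target has no further requests in t_next, succ is empty (terminal).
                succ = {(z2, t_next, a2): pz * pt
                        for z2, pz in sys_T[z][(a, i)].items()
                        for a2, pt in target.action_dist.get(t_next, {}).items()}
                T[s][i], R[s][i] = succ, target.rewards[t][a] + sys_R[z][(a, i)]
                frontier.extend(succ)
            else:
                # The formal definition assigns probability and reward 0 to illegal
                # delegations; a zero-reward self-loop is used here as a stand-in.
                T[s][i], R[s][i] = {s: 1.0}, 0.0
    return T, R
```

An optimal policy for this MDP can then be read off as an orchestrator: in a state $(\sigma_z, \sigma_t, a)$ the chosen MDP action is precisely the index of the service to which $a$ is delegated.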
Use case
Consider the following scenario: there is an industrial process of ceramics production in which a product must be processed sequentially in different ways. Each sub-task can be completed by a set of available services. The tasks to be carried out in order to complete the industrial process are: provisioning, moulding, drying, first baking, enamelling, painting, second baking and shipping. Such tasks can be accomplished by different types of machines or by human workers. Each available service that can perform a task can be seen as a finite state machine with a probability and a reward associated to each action. There could be multiple services for the same task, e.g. multiple versions of a machine (a new one and an old one), a human worker that can perform the required task, and so on.
When a task is assigned to an available service, it incurs a task cost in terms of the time and resources needed to complete the operation on that specific service. Usually, in terms of task cost, machines are cheaper than human workers, because they can perform their task much faster. However, machines have a certain probability of breaking while performing their job. In such a case, the machine must be repaired as soon as the operation has been carried out, which incurs a repair cost for that specific machine.
From the above description of the use case scenario, it is clear that the composition technique must be able to handle the stochasticity of the available services’ transitions, as well as their rewards and costs. Indeed, an optimal orchestration depends on several parameters, such as the task costs, the breaking probabilities and the repair costs of each candidate service for a certain task. Therefore, it is not straightforward to determine a priori which service a certain task should be assigned to. For example, even if the task cost of a machine is low, its breaking probability might be high, and once the repair cost is taken into account it might be preferable to assign a human worker to that task. We argue that our model fits this use case very well: the problem reduces to an instance of the stochastic service composition described above, in which each service captures the task cost, the breaking probability, and the repair cost.
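As an illustration (with purely invented numbers), two candidate services for the painting task could be encoded with the StochasticService sketch introduced earlier: a cheap machine that may break and must then be repaired, and a slower but reliable human worker.

```python
# Hypothetical painting machine: low task cost, but a 5% breaking probability
# and a repair cost when broken. All numbers are illustrative only.
painting_machine = StochasticService(
    states={"ready", "broken"},
    initial_state="ready",
    final_states={"ready"},
    actions={"paint", "repair"},
    transitions={
        "ready": {"paint": {"ready": 0.95, "broken": 0.05}},
        "broken": {"repair": {"ready": 1.0}},
    },
    rewards={
        "ready": {"paint": -2.0},      # task cost
        "broken": {"repair": -10.0},   # repair cost
    },
)

# Hypothetical human painter: higher task cost, but never breaks.
human_painter = StochasticService(
    states={"ready"},
    initial_state="ready",
    final_states={"ready"},
    actions={"paint"},
    transitions={"ready": {"paint": {"ready": 1.0}}},
    rewards={"ready": {"paint": -5.0}},
)
```

Whether the optimal orchestrator delegates painting requests to the machine or to the human then depends on the interplay between task cost, breaking probability and repair cost, exactly as discussed above.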
Conclusion
In this paper, we have proposed an extension to previous work on stochastic service composition, in which also the services are allowed to have stochastic behaviour and rewards on their state transitions. We formally specified the problem and proposed a solution based on a reduction to MDPs. Furthermore, we motivated the contribution by showing how it is well-suited for a realistic Industry 4.0 scenario. As future work, we would like to investigate several improvements, such as including exception handling, having separate reward specifications for the target, employing high-level formalisms to express non-Markovian rewards (e.g. LTL), and employing learning techniques to learn a model of the target behaviour from data.
Luciana Silo (silo@diag.uniroma1.it, luciana.silo12@gmail.com)