# Uncertainty Propagation Toolbox#

The uncertainty_propagation module quantifies and propagates parametric uncertainties through optimization and simulation problem based on IDAES models. The module has two core features:

1. Given a parameter estimation model and data, calculate the least squares best fit parameter values and estimate their covariance.

2. Given a process model and the covariance for its parameters, estimate the variance of the optimal solution and the objective function.

Consider the optimization problem:

\begin{align*} \mathrm{minimize} ~~ & f(x,p) \\ \mathrm{s.t.} \quad ~ & c(x,p) = 0 \\ & x_{lb} \leq x \leq x_{ub} \end{align*}

Here $$x \in \mathbb{R}^{n\ \times\ 1}$$ are the decision variables, $$p \in \mathbb{R}^{m\ \times\ 1}$$ are the parameters, $$f(x,p):\ \mathbb{R}^{n\ \times\ 1}\ \times \mathbb{R}^{m\ \times\ 1} \rightarrow \mathbb{R}$$ is the objective function, $$c(x,p) = \{c_1(x,p), \ldots, c_k(x,p)\}\ :\ \mathbb{R}^{n\ \times\ 1}\ \times \mathbb{R}^{m\ \times\ 1} \rightarrow \mathbb{R}^{k\ \times\ 1}$$ are the constraints, and $$x_{lb}$$ and $$x_{ub}$$ are the lower and upper bounds, respectively.

Let $$x^*$$ represent the optimal solution given parameters $$p^*$$. In many process systems engineering problems, $$p^*$$ is estimated from data and has some uncertainty represented with covariance matrix $$\Sigma_p$$. This toolbox estimates the uncertainty in the optimal solution $$x^*$$ and objective function value $$f(x^*, p^*)$$ induced by uncertainty in $$p^*$$.

Based on a first-order error propagation formula, the variance of the objective function is:

$\text{Var}[f(x^*,p^*)] = \left(\frac{\partial f}{\partial p} + \frac{\partial f}{\partial x}\frac{\partial x}{\partial p} \right) \Sigma_p \left(\frac{\partial f}{\partial p} + \frac{\partial f}{\partial x}\frac{\partial x}{\partial p} \right)^T$

Likewise, the variance in constraint $$k$$ is:

$\text{Var}[c_k(x^*,p^*)] = \left(\frac{\partial c_k}{\partial p} + \frac{\partial c_k}{\partial x}\frac{\partial x}{\partial p} \right) \Sigma_p \left(\frac{\partial c_k}{\partial p} + \frac{\partial c_k}{\partial x}\frac{\partial x}{\partial p} \right)^T$

Note that $$\text{Var}[c_k(x^*,p^*)] \approx 0$$ because the constraints remain feasible for a small perturbation in $$p$$, i.e., $$c(x,p) = 0$$.

All gradients are calculated with k_aug [1]. More specifically, $$\frac{\partial f}{\partial p}, \frac{\partial f}{\partial x}, \frac{\partial c_1}{\partial p}, \frac{\partial c_1}{\partial x}, \ldots, \frac{\partial c_k}{\partial p}, \frac{\partial c_k}{\partial x}$$ evaluated at $$(x^*, p)$$ are computed via automatic differentiation whereas $$\frac{\partial x}{\partial p}$$ are computed via nonlinear programming sensitivity theory.

The covariance matrix $$\Sigma_p$$ is either user supplied or obtained via regression (with Pyomo.Parmest).

Dependencies

k_aug [1] is required to use uncertainty_propagation module. The k_aug solver executable is easily installed via the idaes get-extensions command.

## Basic Usage#

This toolbox has two core functions:

1. propagate_uncertainty: Given an IDAES (Pyomo) process model with parameters $$p$$ and covariance $$\Sigma_p$$, estimate $$\text{Var}[f(x^*,p)]$$.

2. quantify_propagate_uncertainty: Given an IDAES (Pyomo) regression model and data, first estimate parameters $$p$$ and covariance $$\Sigma_p$$. Then given a second IDAES (Pyomo) process model, estimate $$\text{Var}[f(x^*,p)]$$.

The following example shows the usage of quantify_propagate_uncertainty for a reaction kinetic problem from Rooney and Biegler [2]. Consider the mathematical model:

$y_{i}(\theta_1, \theta_2, t_i) = \theta_1 (1 - e^{-\theta_2 t_i}), \quad i = 1, ..., n$

Here $$y_i$$ is the concentration of the chemical product at time $$t_i$$ and $$p = (\theta_1, \theta_2)$$ are the fitted model parameters.

# Required imports
>>> from idaes.apps.uncertainty_propagation.uncertainties import quantify_propagate_uncertainty
>>> import pyomo.environ as *
>>> import pandas as pd

# Define rooney_biegler_model
>>> def rooney_biegler_model(data):
model = ConcreteModel()
model.asymptote = Var(initialize = 15)
model.rate_constant = Var(initialize = 0.5)

def response_rule(m, h):
expr = m.asymptote * (1 - exp(-m.rate_constant * h))
return expr
model.response_function = Expression(data.hour, rule = response_rule)

return model

# Define rooney_biegler_model_opt
>>> def rooney_biegler_model_opt(data):
model = ConcreteModel()
model.asymptote = Var(initialize = 15)
model.rate_constant = Var(initialize = 0.5)
model.obj = Objective(expr=model.asymptote*(1-exp(-model.rate_constant*10))
, sense=minimize)
return model

# Define data
>>> data = pd.DataFrame(data=[[1,8.3],[2,10.3],[3,19.0],[4,16.0],[5,15.6],[7,19.8]],
columns=['hour', 'y'])

# Define variable_name
>>> variable_name = ['asymptote', 'rate_constant']

# Define SSE_obj_function
>>> def SSE_obj_function(model, data):
expr = sum((data.y[i] - model.response_function[data.hour[i]])**2 for i in data.index)
return expr

# Run quantify_propagate_uncertainty
>>> results = quantify_propagate_uncertainty(rooney_biegler_model, rooney_biegler_model_opt,
data, variable_name, obj_function)


The Python function rooney_biegler_model generates a Pyomo regression model using the input Pandas DataFrame data. This model contains the attributes asymptote and rate_constant which are the parameters $$p$$ to be estimated by minimizing the sum of squared errors (SSE). The list variable_name contains strings with these attribute names.

Similarly, the Python function rooney_biegler_model_opt returns a concrete Pyomo model which is the process optimization problem. This specific example has no degrees of freedom once $$p$$ is specified; the objective function computes the product concentration at time $$t=10$$.

The function quantify_propagate_uncertainty returns the object results which contains several important attributes:

• results.theta_names contains the names of parameters $$p$$

• results.theta contains the estimate values for parameters $$p$$

• results.gradient_f contains the gradient $$\frac{\partial f}{\partial x},~ \frac{\partial f}{\partial p}$$

• results.gradient_c contains the Jacobians $$\frac{\partial c}{\partial x},~ \frac{\partial c}{\partial p}$$

• results.dsdp contains the Jacobians $$\frac{\partial x}{\partial p},~\frac{\partial p}{\partial p}$$

• results.propagation_f contains the estimate variance of the objective function

Important Notes:

1. The uncertain parameters $$p$$ should be declared as Var in Pyomo.

2. The uncertain parameters $$p$$ should not be fixed in Pyomo. Instead, set the upper and lower bounds equal. If they are fixed, k_aug` will give an error message that the optimization problem has too few degrees of freedom.

## Available Functions#

idaes.apps.uncertainty_propagation.uncertainties.quantify_propagate_uncertainty(model_function, model_uncertain, data, theta_names, obj_function=None, tee=False, diagnostic_mode=False, solver_options=None, covariance_n=None)[source]#

This function calculates error propagation of the objective function and constraints. The parmest uses ‘model_function’ to estimate uncertain parameters. The uncertain parameters in ‘model_uncertain’ are fixed with the estimated values. The function ‘quantify_propagate_uncertainty’ calculates error propagation of objective function and constraints in the ‘model_uncertain’.

The following terms are used to define the output dimensions: Ncon = number of constraints Nvar = number of variables (Nx + Ntheta) Nx = the number of decision (primal) variables Ntheta = number of uncertain parameters.

Parameters:
• model_function (function) – A python Function that generates an instance of the Pyomo model using ‘data’ as the input argument

• model_uncertain (function or Pyomo ConcreteModel) – Function is a python/ Function that generates an instance of the Pyomo model

• data (pandas DataFrame, list of dictionary, or list of json file names) – Data that is used to build an instance of the Pyomo model and build the objective function

• theta_names (list of strings) – List of Var names to estimate

• obj_function (function, optional) – Function used to formulate parameter estimation objective, generally sum of squared error between measurements and model variables, by default None

• tee (bool, optional) – Indicates that ef solver output should be teed, by default False

• diagnostic_mode (bool, optional) – If True, print diagnostics from the solver, by default False

• solver_options (dict, optional) – Provides options to the solver (also the name of an attribute), by default None

• covariance_n (int, optional) – Number of datapoints to use in the objective function to calculate the covariance matrix. If omitted, defaults to len(data)

Returns:

results object containing the all information including

• results.obj: float

Real number. Objective function value for the given obj_function

• results.theta: dict

Size Ntheta python dictionary. Estimated parameters

• results.theta_names: list

Size Ntheta list. Names of parameters

• results.cov: numpy.ndarray

Ntheta by Ntheta matrix. Covariance of theta

Length Nvar array. Gradient vector of the objective function with respect to the (decision variables, parameters) at the optimal solution

Ncon by Nvar size sparse matrix. Gradient vector of the constraints with respect to the (decision variables, parameters) at the optimal solution.

• results.dsdp: scipy.sparse.csr.csr_matrix

Ntheta by Nvar size sparse matrix. Gradient vector of the (decision variables, parameters) with respect to parameters (=theta_name). number of rows = len(theta_name), number of columns= len(col)

• results.propagation_c: numpy.ndarray

Length Ncon array. Error propagation in the constraints, dc/dp*cov_p*dc/dp + (dc/dx*dx/dp)*cov_p*(dc/dx*dx/dp)

• results.propagation_f: numpy.float64

Real number. Error propagation in the objective function, df/dp*cov_p*df/dp + (df/dx*dx/dp)*cov_p*(df/dx*dx/dp)

• results.col: list

Size Nvar. List of variable names. Note that variables names includes both decision variable and uncertain parameter names. The order can be mixed.

• results.row: list

Size Ncon+1. List of constraints and objective function names

Return type:

tuple

Raises:
• TypeError – When tee entry is not Boolean

• TypeError – When diagnostic_mode entry is not Boolean

• TypeError – When solver_options entry is not None and a Dictionary

• Warnings – When an element of theta_names includes a space

idaes.apps.uncertainty_propagation.uncertainties.propagate_uncertainty(model_uncertain, theta, cov, theta_names, tee=False, solver_options=None)[source]#

This function calculates gradient vector, expectation, and variance of the objective function and constraints of the model for given estimated optimal parameters and covariance matrix of parameters. It calculates error propagation of the objective function and constraints by using gradient vector and covariance matrix.

The following terms are used to define the output dimensions: Ncon = number of constraints Nvar = number of variables (Nx + Ntheta) Nx = the number of decision (primal) variables Np = number of uncertain parameters.

Parameters:
• model_uncertain (function or Pyomo ConcreteModel) – Function is a python/ Function that generates an instance of the Pyomo model

• theta (dict) – Size Ntheta python dictionary. Estimated parameters

• cov (numpy.ndarray) – Ntheta by Ntheta matrix. Covariance matrix of parameters

• theta_names (list of strings) – Size Ntheta. List of estimated l theta names

• tee (bool, optional) – Indicates that ef solver output should be teed, by default False

• solver_options (dict, optional) – Provides options to the solver (also the name of an attribute), by default None

Returns:

results object containing the all information including

Length Nvar array. Gradient vector of the objective function with respect to the (decision variables, parameters) at the optimal solution

Ncon by Nvar size sparse matrix. Gradient vector of the constraints with respect to the (decision variables, parameters) at the optimal solution.

• results.dsdp: scipy.sparse.csr.csr_matrix

Ntheta by Nvar size sparse matrix. Gradient vector of the (decision variables, parameters) with respect to parameters (=theta_name). number of rows = len(theta_name), number of columns= len(col)

• results.propagation_c: numpy.ndarray

Length Ncon array. Error propagation in the constraints, dc/dp*cov_p*dc/dp + (dc/dx*dx/dp)*cov_p*(dc/dx*dx/dp)

• results.propagation_f: numpy.float64

Real number. Error propagation in the objective function, df/dp*cov_p*df/dp + (df/dx*dx/dp)*cov_p*(df/dx*dx/dp)

• results.col: list

Size Nvar. List of variable names. Note that variables names includes both decision variable and uncertain parameter names. The order can be mixed.

• results.row: list

Size Ncon+1. List of constraints and objective function names

Return type:

tuple

Raises:

Exception – if model_uncertain is neither ‘ConcreteModel’ nor ‘function’.

idaes.apps.uncertainty_propagation.uncertainties.clean_variable_name(theta_names)[source]#

This function removes all ‘ and spaces in theta_names. Note that the current theta_est(calc_cov=True) of parmest in Pyomo doesn’t allow ‘ and spaces in the variable names. Once a future version of Parmest fixes this issue, this function can be depreciated.

Parameters:

theta_names (list of strings) – List of Var names

Returns:

• It returns the following variables

• - theta_names_out (list of strings) – List of Var names after removing all ‘ and spaces

• - var_dic (dict) – Dictionary with keys converted theta_names and values original theta_names

## Examples#

Two examples are provided to illustrate the detailed usage of uncertainty_propagation. In each case, a Jupyter notebook with explanations as well as an equivalent Python script is provided.

## References#

[1] David Thierry (2020). k_aug, dthierry/k_aug

[2] Rooney, W. C. and Biegler, L. T. (2001). Design for model parameter uncertainty using nonlinear confidence regions. AIChE Journal, 47(8), 1794-1804.