Generating Polynomial Models with PySMO¶
The pysmo.polynomial_regression method learns polynomial models from data. Presented with a small number of samples generated from experiments or computer simulations, the approach determines the most accurate polynomial approximation by comparing the accuracy and performance of polynomials of different orders and basis function forms.
pysmo.polynomial_regression considers three types of basis functions
 univariate polynomials,
 seconddegree bivariate pilynomials, and
 userspecified basis functions.
Thus, for a problem with \(m\) sample points and \(n\) input variables, the resulting polynomial is of the form
Basic Usage¶
To generate a polynomial model with PySMO, the pysmo.polynomial_regression class is first initialized,
and then the method poly_training
is called on the initialized object:
# Required imports
>>> from idaes.surrogates.pysmo import polynomial_regression
>>> import pandas as pd
# Load dataset from a csv file
>>> xy_data = pd.read_csv('data.csv', header=None, index_col=0)
# Initialize the PolynomialRegression class, extract the list of features and train the model
>>> pr_init = polynomial_regression.PolynomialRegression(xy_data, xy_data, maximum_polynomial_order=3, *kwargs)
>>> features = pr_init.get_feature_vector()
>>> polyfit = pr_init.poly_training()
 xy_data is a twodimensional python data structure containing the input and output training data. The output values MUST be in the last column.
 maximum_polynomial_order refers to the maximum polynomial order to be considered when training the surrogate.
Optional Arguments
 multinomials  boolean option which determines whether bivariate terms are considered in polynomial generation.
 training_split  option which determines fraction of training data to be used for training (the rest will be for testing). Default is 0.8.
 number_of_crossvalidations  Number of crossvalidations during training. Default number is 3.
pysmo.polynomial_regression Output¶
The result of the pysmo.polynomial_regression
method (polyfit in above example) is a python object containing information
about the optimal polynomial order, the polynomial coefficients and different error and qualityoffit metrics such as
the meansquarederror (MSE) and the \(R^{2}\) coefficientoffit. A Pyomo expression can be generated from the
object simply passing a list of variables into the function generate_expression:
# Create a python list from the headers of the dataset supplied for training
>>> list_vars = []
>>> for i in features.keys():
>>> list_vars.append(features[i])
# Pass list to generate_expression function to obtain a Pyomo expression as output
>>> print(polyfit.generate_expression(list_vars))
Prediction with pysmo.polynomial_regression models¶
Once a polynomial model has been trained, predictions for values at previously unsampled points :math:x_unsampled can be evaluated by calling the
poly_predict_output()
method on the resulting model object and the unsampled points:
# Create a python list from the headers of the dataset supplied for training
>>> y_unsampled = pr_init.poly_predict_output(rbf_fit, x_unsampled)
Flowsheet Integration¶
The result of the polynomial training process can be passed directly into a process flowsheet as an objective or a constraint. The following code snippet demonstrates how an output polynomial model may be integrated directly into a Pyomo flowsheet as an objective:
# Required imports
>>> import pyomo.environ as pyo
>>> from idaes.surrogates.pysmo import polynomial_regression
>>> import pandas as pd
# Create a Pyomo model
>>> m = pyo.ConcreteModel()
>>> i = pyo.Set(initialize=[1, 2])
# Create a Pyomo variable with indexed by the 2Dset i with initial values {0, 0}
>>> init_x = {1: 0, 2: 0}
>>> def x_init(m, i):
>>> return (init_x[i])
>>> m.x = pyo.Var(i, initialize=x_init)
# Train a simple polynomial model on data available in csv format, resulting in the Python object polyfit
>>> xy_data = pd.read_csv('data.csv', header=None, index_col=0)
>>> pr_init = polynomial_regression.PolynomialRegression(xy_data, xy_data, maximum_polynomial_order=3)
>>> features = pr_init.get_feature_vector()
>>> polyfit = pr_init.poly_training()
# Use the resulting polynomial as an objective, passing in the Pyomo variable x
>>> m.obj = pyo.Objective(expr=polyfit.generate_expression([m.x[1], m.x[2]]))
# Solve the model
>>> instance = m
>>> opt = pyo.SolverFactory("ipopt")
>>> result = opt.solve(instance, tee=True)
Further details about pysmo.polynomial_regression may be found by consulting the examples or reading the paper […]
Available Methods¶

class
idaes.surrogate.pysmo.polynomial_regression.
FeatureScaling
[source]¶ A class for scaling and unscaling input and output data. The class contains two main methods:
data_scaling
anddata_unscaling

static
data_scaling
(data)[source]¶ data_scaling
performs columnwise minimax scaling on the input dataset.Parameters: data – The input data set to be scaled. Must be a numpy array or dataframe. Returns:  tuple containing:
 scaled_data : A 2D Numpy Array containing the scaled data. All array values will be between [0, 1].
 data_minimum : A 2D row vector containing the columnwise minimums of the input data.
 data_maximum : A 2D row vector containing the columnwise maximums of the input data.
Return type: (tuple) Raises: TypeError
– Raised when the input data is not a numpy array or dataframe

static
data_unscaling
(x_scaled, x_min, x_max)[source]¶ data_unscaling
performs columnwise unscaling on the a minmaxscaled input dataset.Parameters:  x_scaled (NumPy Array) – Data to be unscaled. Data values should be between 0 and 1.
 x_min (NumPy vector) – \(n \times 1\) vector containing the actual minimum value for each column. Must contain same number of elements as the number of columns in x_scaled.
 x_max (NumPy vector) – \(n \times 1\) vector vector containing the actual minimum value for each column. Must contain same number of elements as the number of columns in x_scaled.
Returns: A 2D numpy array containing the scaled data, \(x_{min} + x_{scaled} * (x_{max}  x_{min})\)
Return type: NumPy Array
Raises: IndexError
– Raised when the dimensions of the arrays are inconsistent.

static

class
idaes.surrogate.pysmo.polynomial_regression.
PolynomialRegression
(original_data_input, regression_data_input, maximum_polynomial_order, number_of_crossvalidations=None, no_adaptive_samples=None, training_split=None, max_fraction_training_samples=None, max_iter=None, solution_method=None, multinomials=None, fname=None, overwrite=False)[source]¶ The PolynomialRegression class performs polynomial regression on a training data set.
The class must first be initialized by calling PolynomialRegression. Regression is then carried out by calling
poly_training
. For a given dataset with \(n\) features \(x_{1},x_{2},\ldots,x_{n}\), Polyregression is able to consider three types of basis functions:
 Mononomial terms (\(x_{i}^{p},p \leq 10\)) for all individual features. The maximum degree to be considered can be set by the user (maximum_polynomial_order)
 All first order interaction terms \(x_{1}x_{2}\), \(x_{1}x_{3}\) etc. This can be turned on or off by the user (set multinomials)
 User defined input features, e.g. \(\sin(x_{1})\). These must be Pyomo functions and should be provided as a list by the user calling
set_additional_terms
method before the polynomial training is done.
Example:
# Initialize the class and set additional terms >>> d = PolynomialRegression(full_data, training_data, maximum_polynomial_order=2, max_iter=20, multinomials=1, solution_method='pyomo') >>> p = d.get_feature_vector() >>> d.set_additional_terms([...extra terms...]) # Train polynomial model and predict output for an test data x_test >>> results = d.poly_training() >>> predictions = d.poly_predict_output(results, x_test)
Parameters:  regression_data_input (NumPy Array of Pandas Dataframe) – The dataset for regression training. It is expected to contain the features (X) and output (Y) data, with the output values (Y) in the last column.
 original_data_input (NumPy Array of Pandas Dataframe) – If regression_data_input was drawn from a larger dataset by some sampling approach, the larger dataset may be provided here. When additional data is not available, the same data supplied for training_data can be supplied  this tells the algorithm not to carry out adaptive sampling.
 maximum_polynomial_order (int) – The maximum polynomial order to be considered.
Further details about the optional inputs may be found under the
__init__
method.
__init__
(original_data_input, regression_data_input, maximum_polynomial_order, number_of_crossvalidations=None, no_adaptive_samples=None, training_split=None, max_fraction_training_samples=None, max_iter=None, solution_method=None, multinomials=None, fname=None, overwrite=False)[source]¶ Initialization of PolynomialRegression class.
Parameters:  regression_data_input (NumPy Array of Pandas Dataframe) – The dataset for regression training. It is expected to contain features and output data, with the output values (Y) in the last column.
 original_data_input (NumPy Array of Pandas Dataframe) – If regression_data_input was drawn from a larger dataset by some sampling approach, the larger dataset may be provided here.
 maximum_polynomial_order (int) – The maximum polynomial order to be considered.
Keyword Arguments:  number_of_crossvalidations (int) – The number of polynomial fittings and crossvalidations to be carried out for each polynomial function/expression. Must be a positive, nonzero integer. Default=3.
 training_split (float) – The training/test split to be used for regression_data_input. Must be between 0 and 1. Default = 0.75
 solution_method (str) –
The method to be used for solving the least squares optimization problem for polynomial regression. Three options are available:
 ”MLE” : The mle (maximum likelihood estimate) method solves the least squares problem using linear algebra. Details of the method may be found in Forrester et al.
 ”BFGS” : This approach solves the least squares problem using scipy’s BFGS algorithm.
 ”pyomo”: This option solves the optimization problem in pyomo with IPOPT as solver. This is the default option.
 multinomials (bool) – This option determines whether or not multinomial terms are considered during polynomial fitting. Takes 0 for No and 1 for Yes. Default = 1.
Returns: self object containing all the input information.
Raises: ValueError
–  The input datasets (original_data_input or regression_data_input) are of the wrong type (not Numpy arrays or Pandas Dataframes)Exception
– * maximum_polynomial_order is not a positive, nonzero integer or maximum_polynomial_order is higher than the number of training samples availableException
– * solution_method is not ‘mle’, ‘pyomo’ or ‘bfgsException
–  multinomials is not binary (0 or 1)Exception
–  training_split is not between 0 and 1Exception
–  number_of_crossvalidations is not a positive, nonzero integerException
–  max_fraction_training_samples is not between 0 and 1Exception
–  no_adaptive_samples is not a positive, nonzero integerException
–  max_iter is not a positive, nonzero integerwarnings.warn
–  When the number of crossvalidations is too high, i.e. number_of_crossvalidations > 10

get_feature_vector
()[source]¶ The
get_feature_vector
method generates the list of regression features from the column headers of the input dataset.Returns: An indexed parameter list of the variables supplied in the original data Return type: Pyomo IndexedParam Example:
# Create a small dataframe with three columns ('one', 'two', 'three') and two rows (A, B) >>> xy_data = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])], orient='index', columns=['one', 'two', 'three']) # Initialize the **PolynomialRegression** class and print the column headers for the variables >>> f = PolynomialRegression(xy_data, xy_data, maximum_polynomial_order=1, multinomials=True, training_split=0.8) >>> p = f.get_feature_vector() >>> for i in p.keys(): >>> print(i) one two

poly_predict_output
(results_vector, x_data)[source]¶ The
poly_predict_output
method generates output predictions for input data x_data based a previously generated polynomial fitting.Parameters:  results_vector – Python object containing results of polynomial fit generated by calling the poly_training function.
 x_data – Numpy array of designs for which the output is to be evaluated/predicted.
Returns: Output variable predictions based on the polynomial fit.
Return type: Numpy Array

poly_training
()[source]¶ The
poly_training
method trains a polynomial model to an input dataset. It calls the core method which is called in the PolynomialRegression class (polynomial_regression_fitting). It accepts no user input, inheriting the information passed in class initialization.Returns:  Python Object (results) containing the results of the polynomial regression process including:
 the polynomial order (self.polynomial_order)
 polynomial coefficients (self.optimal_weights_array), and
 MAE and MSE errors as well as the \(R^{2}\) (results.errors).
Return type: tuple

set_additional_terms
(term_list)[source]¶ set_additional_terms
accepts additional userdefined features for consideration during regression.Parameters: term_list (list) – List of additional terms to be considered as regression features. Each term in the list must be a Pyomosupported intrinsic function. Example:
# To add the sine and cosine of a variable with header 'X1' in the dataset as additional regression features: >>> xy_data = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])], orient='index', columns=['X1', 'X2', 'Y']) >>> A = PolynomialRegression(xy_data, xy_data, maximum_polynomial_order=5) >>> p = A.get_feature_vector() >>> A.set_additional_terms([ pyo.sin(p['X1']) , pyo.cos(p['X1']) ])

class
idaes.surrogate.pysmo.polynomial_regression.
ResultReport
(optimal_weight_vector, polynomial_order, multinomials, mae_error, mse_error, R2, adjusted_R2, number_of_iterations, results_vector, additional_features_array, final_regression_data, df_coefficients, extra_terms_coeffs, extra_terms_feature_vector, extra_terms_expressions)[source]¶ 
generate_expression
(variable_list)[source]¶ The
generate_expression
method returns the Pyomo expression for the polynomial model trained.The expression is constructed based on a supplied list of variables variable_list and the output of
poly_training
.Parameters: variable_list (list) – List of input variables to be used in generating expression. This can be the a list generated from the results of get_feature_vector
. The user can also choose to supply a new list of the appropriate length.Returns: Pyomo expression of the polynomial model based on the variables provided in variable_list. Return type: Pyomo Expression

References:¶
[1] Forrester et al.’s book “Engineering Design via Surrogate Modelling: A Practical Guide”, https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470770801