Generating Radial Basis Function (RBF) models with PySMO

The pysmo.radial_basis_function package has the capability to generate different types of RBF surrogates from data based on the basis function selected. RBFs models are usually of the form where

\[\begin{equation} y_{k}=\sum_{j=1}^{\Omega}w_{j}\psi\left(\Vert x_{k}-z_{j}\Vert\right)\qquad k=1,\ldots,m\quad\label{eq:RBF-expression} \end{equation}\]

where \(z_{j}\) are basis function centers (in this case, the training data points), \(w_{j}\) are the radial weights associated with each center \(z_{j}\), and \(\psi\) is a basis function transformation of the Euclidean distances.

PySMO offers a range of basis function transformations \(\psi\), as shown in the table below.

List of available RBF basis transformations, \(d = \parallel x_{k}-z_{j}\parallel\)

Transformation type

PySMO option name

\(\psi(d)\)

Linear

‘linear’

\(d\)

Cubic

‘cubic’

\(d^{3}\)

Thin-plate spline

‘spline’

\(d^{2}\ln(d)\)

Gaussian

‘gaussian’

\(e^{\left(-d^{2}\sigma^{2}\right)}\)

Multiquadric

‘mq’

\(\sqrt{1+\left(\sigma d\right)^{2}}\)

Inverse mMultiquadric

‘imq’

\(1/{\sqrt{1+\left(\sigma d\right)^{2}}}\)

Selection of parametric basis functions increase the flexibility of the radial basis function but adds an extra parameter (\(\sigma\))to be estimated.

Basic Usage

To generate an RBF model with PySMO, the pysmo.radial_basis_function class is first initialized, and then the function training is called on the initialized object:

# Required imports
>>> from idaes.surrogate.pysmo import radial_basis_function
>>> import pandas as pd

# Load dataset from a csv file
>>> xy_data = pd.read_csv('data.csv', header=None, index_col=0)

# Initialize the RadialBasisFunctions class, extract the list of features and train the model
>>> rbf_init = radial_basis_function.RadialBasisFunctions(xy_data, *kwargs)
>>> features = rbf_init.get_feature_vector()
>>> rbf_fit = rbf_init.training()
  • xy_data is a two-dimensional python data structure containing the input and output training data. The output values MUST be in the last column.

Optional Arguments

  • basis_function - option to specify the type of basis function to be used in the RBF model. Default is ‘gaussian’.

  • regularization - boolean which determines whether regularization of the RBF model is considered. Default is True.

    • When regularization is turned on, the resulting model is a regressing RBF model.

    • When regularization is turned off, the resulting model is an interpolating RBF model.

pysmo.radial_basis_function Output

The result of pysmo.radial_basis_function (rbf_fit in above example) is a python object containing information about the problem set-up, the optimal radial basis function weights \(w_{j}\) and different error and quality-of-fit metrics such as the mean-squared-error (MSE) and the \(R^{2}\) coefficient-of-fit. A Pyomo expression can be generated from the object simply passing a list of variables into the function generate_expression:

# Create a python list from the headers of the dataset supplied for training
>>> list_vars = []
>>> for i in features.keys():
>>>     list_vars.append(features[i])

# Pass list to generate_expression function to obtain a Pyomo expression as output
>>> print(rbf_init.generate_expression(list_vars))

Similar to the pysmo.polynomial_regression module, the output of the generate_expression function can be passed into an IDAES or Pyomo module as a constraint, objective or expression.

Prediction with pysmo.radial_basis_function models

Once an RBF model has been trained, predictions for values at previously unsampled points x_unsampled can be evaluated by calling the predict_output() function on the unsampled points:

# Create a python list from the headers of the dataset supplied for training
>>> y_unsampled = rbf_init.predict_output(x_unsampled)

Further details about pysmo.radial_basis_function module may be found by consulting the examples or reading the paper […]

Available Methods

class idaes.surrogate.pysmo.radial_basis_function.FeatureScaling[source]

A class for scaling and unscaling input and output data. The class contains two main methods: data_scaling_minmax and data_unscaling_minmax

static data_scaling_minmax(data)[source]

data_scaling_minmax performs column-wise min-max scaling on the input dataset.

Parameters

data – The input data set to be scaled. Must be a numpy array or dataframe.

Returns

tuple containing:
  • scaled_data : A 2-D Numpy Array containing the scaled data. All array values will be between [0, 1].

  • data_minimum : A 2-D row vector containing the column-wise minimums of the input data.

  • data_maximum : A 2-D row vector containing the column-wise maximums of the input data.

Return type

(tuple)

Raises

TypeError – Raised when the input data is not a numpy array or dataframe

static data_unscaling_minmax(x_scaled, x_min, x_max)[source]

data_unscaling_minmax performs column-wise un-scaling on the a minmax-scaled input dataset.

Parameters
  • x_scaled (NumPy Array) – Data to be un-scaled. Data values should be between 0 and 1.

  • x_min (NumPy vector) – \(n \times 1\) vector containing the actual minimum value for each column. Must contain same number of elements as the number of columns in x_scaled.

  • x_max (NumPy vector) – \(n \times 1\) vector vector containing the actual minimum value for each column. Must contain same number of elements as the number of columns in x_scaled.

Returns

A 2-D numpy array containing the scaled data, \(x_{min} + x_{scaled} * (x_{max} - x_{min})\)

Return type

NumPy Array

Raises

IndexError – Raised when the dimensions of the arrays are inconsistent.

class idaes.surrogate.pysmo.radial_basis_function.RadialBasisFunctions(XY_data, basis_function=None, solution_method=None, regularization=None, fname=None, overwrite=False)[source]

The RadialBasisFunctions class generates a radial basis function fitting for a training data set.

The class must first be initialized by calling RadialBasisFunctions. Regression is then carried out by calling the method training.

For a given dataset with n features \(x_{1},\ldots,x_{n}\), RadialBasisFunctions is able to consider six types of basis transformations:
  • Linear (‘linear’)

  • Cubic (‘cubic’)

  • Gaussian (‘gaussian’)

  • Multiquadric (‘mq’)

  • Inverse multiquadric (‘imq’)

  • Thin-plate spline (‘spline’)

training selects the best hyperparameters (regularization parameter \(\lambda\) and shape parameter \(\sigma\), where necessary) by evaluating the leave-one-out cross-validation error for each (\(\lambda,\sigma\)) pair.

It should be noted that the all the training points are treated as centres for the RBF, resulting in a square system.

Example:

 # Initialize the class
>>> d = RadialBasisFunctions(training_data, basis_function='gaussian', solution_method='pyomo', regularization=True))
>>> p = d.get_feature_vector()

# Train RBF model and predict output for an test data x_test
>>> d.training()
>>> predictions = d.predict_output(x_test)
Parameters

XY_data (Numpy Array or Pandas Dataframe) – The dataset for RBF training. XY_data is expected to contain the features (X) and output (Y) data, with the output values (Y) in the last column.

Further details about the optional inputs may be found under the __init__ method.

__init__(XY_data, basis_function=None, solution_method=None, regularization=None, fname=None, overwrite=False)[source]

Initialization of RadialBasisFunctions class.

Parameters

XY_data (Numpy Array or Pandas Dataframe) – The dataset for RBF training. XY_data is expected to contain feature and output information, with the output values (y) in the last column.

Keyword Arguments
  • basis_function (str) –

    The basis function transformation to be applied to the training data. Two classes of basis transformations are available for selection:

    • Fixed basis transformations, which require no shape parameter \(\sigma\) :

      1. ’cubic’ : Cubic basis transformation

      2. ’linear’ : Linear basis transformation

      3. ’spline’ : Thin-plate spline basis transformation

    • Parametric basis transformations which require a shape parameter:

      1. ’gaussian’ : Gaussian basis transformation (Default)

      2. ’mq’ : Multiquadric basis transformation

      3. ’imq’ : Inverse multiquadric basis transformation

  • solution_method (str) –

    The method to be used for solving the RBF least squares optimization problem. Three options are available:

    1. ’algebraic’ : The explicit algebraic method solves the least squares problem using linear algebra.

    2. ’BFGS’ : This approach solves the least squares problem using SciPy’s BFGS algorithm.

    3. ’pyomo’ : This option solves the optimization problem in Pyomo with IPOPT as solver. This is the default.

  • regularization (bool) – This option determines whether or not the regularization parameter \(\lambda\) is considered during RBF fitting. Default setting is True.

Returns

self object with the input information

Raises
  • ValueError – The input dataset is of the wrong type (not a NumPy array or Pandas Dataframe)

  • Exception

    • basis_function entry is not valid.

  • Exception

    • solution_method is not ‘algebraic’, ‘pyomo’ or ‘bfgs’.

  • Exception

    • \(\lambda\) is not boolean.

Example:

# Specify the gaussian basis transformation
>>> d = RadialBasisFunctions(XY_data, basis_function='gaussian')
generate_expression(variable_list)[source]

The generate_expression method returns the Pyomo expression for the RBF model trained.

The expression is constructed based on the supplied list of variables variable_list and the results of the previous RBF training process.

Parameters

variable_list (list) – List of input variables to be used in generating expression. This can be the a list generated from the output of get_feature_vector. The user can also choose to supply a new list of the appropriate length.

Returns

Pyomo expression of the RBF model based on the variables provided in variable_list

Return type

Pyomo Expression

get_feature_vector()[source]

The get_feature_vector method generates the list of regression features from the column headers of the input dataset.

Returns

An indexed parameter list of the variables supplied in the original data

Return type

Pyomo IndexedParam

Example:

# Create a small dataframe with three columns ('one', 'two', 'three') and two rows (A, B)
>>> xy_data = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])], orient='index', columns=['one', 'two', 'three'])

# Initialize the **RadialBasisFunctions** class with a linear kernel and print the column headers for the variables
>>> f = RadialBasisFunctions(xy_data, basis_function='linear')
>>> p = f.get_feature_vector()
>>> for i in p.keys():
>>>     print(i)
one
two
predict_output(x_data)[source]

The predict_output method generates output predictions for input data x_data based a previously generated RBF fitting.

Parameters

x_data (NumPy Array) – Designs for which the output is to be evaluated/predicted.

Returns

Output variable predictions based on the rbf fit.

Return type

Numpy Array

static r2_calculation(y_true, y_predicted)[source]

r2_calculation returns the \(R^{2}\) as a measure-of-fit between the true and predicted values of the output variable.

Parameters
  • y_true (NumPy Array) – Vector of actual values of the output variable

  • y_predicted (NumPy Array) – Vector of predictions for the output variable based on the surrogate

Returns

\(R^{2}\) measure-of-fit between actual and predicted data

Return type

float

training()[source]

Main function for RBF training.

To train the RBF:
  1. The best values of the hyperparameters (\(\sigma, \lambda\)) are selected via LOOCV.

  2. The necessary basis transformation at the optimal hyperparameters is generated.

  3. The condition number for the transformed matrix is calculated.

  4. The optimal radial weights are evaluated using the selected optimization method.

  5. The training predictions, prediction errors and r-square coefficient of fit are evaluated by calling the methods error_calculation and r2_calculation

  6. A results object is generated by calling the ResultsReport class.

The LOOCV error for each (\(\sigma, \lambda\)) pair is evaluated by calling the function loo_error_estimation_with_rippa_method.

The pre-defined shape parameter set considers 24 irregularly spaced values ranging between 0.001 - 1000, while the regularization parameter set considers 21 values ranging between 0.00001 - 1.

Returns

self object (results) containing the all information about the best RBF fitting obtained, including:
  • the optimal radial weights (results.radial_weights),

  • when relevant, the optimal shape parameter found \(\sigma\) (results.sigma),

  • when relevant, the optimal regularization parameter found \(\lambda\) (results.regularization),

  • the RBF predictions for the training data (results.output_predictions), and

  • the \(R^{2}\) value on the training data (results.R2)

Return type

tuple

References:

[1] Forrester et al.’s book “Engineering Design via Surrogate Modelling: A Practical Guide”, https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470770801

[2] Hongbing Fang & Mark F. Horstemeyer (2006): Global response approximation with radial basis functions, https://www.tandfonline.com/doi/full/10.1080/03052150500422294

[3] Rippa, S. (1999): An algorithm for selecting a good value for the parameter c in radial basis function interpolation, https://doi.org/10.1023/A:1018975909870

[4] Mongillo M.A. (2011) Choosing Basis Functions and Shape Parameters for Radial Basis Function Methods, https://doi.org/10.1137/11S010840