ALAMOPY: ALAMO Python#

The purpose of ALAMOPY (Automatic Learning of Algebraic MOdels PYthon wrapper) is to provide a wrapper for the software ALAMO which generates algebraic surrogate models of black-box systems for which a simulator or experimental setup is available. Consider a system for which the outputs z are an unknown function f of the system inputs x. The software identifies a function f, i.e., a relationship between the inputs and outputs of the system, that best matches data (pairs of x and corresponding z values) that are collected via simulation or experimentation.

Installing ALAMO#

ALAMO (Automatic Learning of Algebraic MOdels) is an optional dependency developed and licensed by The Optimization Firm: https://www.minlp.com/alamo-modeling-tool. The provided link includes further information on obtaining a license, installing the tool, obtaining the BARON (Branch-And-Reduce Optimization Navigator) solver that ALAMO leverages, and specific examples in the user manual and installation guide. Alternatively, users may access the user guide directly here: https://minlp.com/downloads/docs/alamo%20manual.pdf.

During installation, it is recommended that Windows 10 users check that the ALAMO path is set. Additionally, users must place the ALAMO license file in the folder where ALAMO is installed.

More details on ALAMO options may be found in the user guide documentation linked above. If users encounter specific error codes while running the ALAMOPy tool in IDAES, the user guide contains detailed descriptions of each termination condition and error message.

Basic Usage#

ALAMOPY’s main functions are alamopy.AlamoTrainer, which calls ALAMO to train surrogates from passed data, and alamopy.AlamoSurrogate, which populates an IDAES SurrogateObject with the ALAMO model. This object may then be passed directly to other IDAES methods for visualization or flowsheet integration (see the sections for Visualization and Examples below).

Data can be read in or simulated using available Python packages. The main arguments of the alamopy.AlamoTrainer Python function are inputs and outputs, which are 2D arrays of data with associated variable labels. Once a trained surrogate object exists, alamopy.AlamoSurrogate takes the model expressions, variable labels and input bounds as arguments. For example,

# import the trainer and surrogate classes from IDAES
# (import path shown for recent IDAES versions and may vary)
from idaes.core.surrogate.alamopy import AlamoTrainer, AlamoSurrogate

# after reading or generating a DataFrame object called `data_training`
trainer = AlamoTrainer(input_labels=['x1', 'x2'], output_labels=['z1', 'z2'], training_dataframe=data_training)
trainer.config.[Alamo Option] = [Valid Option Choice]  # see below for more details
success, alm_surr, msg = trainer.train_surrogate()

surrogate_expressions = trainer._results['Model']
input_labels = trainer._input_labels
output_labels = trainer._output_labels
xmin, xmax = [0.1, 0.8], [0.8, 1.2]
input_bounds = {input_labels[i]: (xmin[i], xmax[i]) for i in range(len(input_labels))}

alm_surr = AlamoSurrogate(surrogate_expressions, input_labels, output_labels, input_bounds)

where [Alamo Option] is a valid keyword argument that can be passed to the ALAMO Python function, subject to a range of available [Valid Option Choice] values, to customize the basis function set, the names of output files, and other options available in ALAMO.

Users may save their trained surrogate objects by serializing them to JSON, and load them into a different script, notebook or environment. For example,

# to save a model
alm_surr.save_to_file('alamo_surrogate.json', overwrite=True)

# to load a model
surrogate = AlamoSurrogate.load_from_file('alamo_surrogate.json')

Data Arguments#

The following arguments are required by AlamoTrainer:

  • input_labels: user-specified labels given to the inputs

  • output_labels: user-specified labels given to the outputs

  • training_dataframe: dataframe (Pandas) object containing training dataset

# after reading or generating a DataFrame object called `data_training`
trainer = AlamoTrainer(input_labels=['x1', 'x2'], output_labels=['z1', 'z2'], training_dataframe=data_training)
trainer.config.[Alamo Option] = [Valid Option Choice]  # see below for more details
success, alm_surr, msg = trainer.train_surrogate()
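The training DataFrame can come from any source. As a hedged sketch, one might generate synthetic training data with pandas, sampling assumed test functions for the outputs; the column names must match the input and output labels passed to the trainer:

```python
import numpy as np
import pandas as pd

# sample 25 points in assumed input ranges
rng = np.random.default_rng(seed=42)
x1 = rng.uniform(0.1, 0.8, size=25)
x2 = rng.uniform(0.8, 1.2, size=25)

# z1 and z2 are invented output functions, for illustration only
data_training = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "z1": x1 * x2,
    "z2": x1 + np.log(x2),
})
print(data_training.shape)  # (25, 4)
```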

The following arguments are required by AlamoSurrogate:

  • surrogate_expressions: Pyomo expression object(s) generated by training the surrogate model(s)

  • input_labels: user-specified labels given to the inputs

  • output_labels: user-specified labels given to the outputs

  • input_bounds: minimum/maximum bounds for each input variable, used to constrain the training search space

surrogate_expressions = trainer._results['Model']
input_labels = trainer._input_labels
output_labels = trainer._output_labels
xmin, xmax = [0.1, 0.8], [0.8, 1.2]
input_bounds = {input_labels[i]: (xmin[i], xmax[i]) for i in range(len(input_labels))}

alm_surr = AlamoSurrogate(surrogate_expressions, input_labels, output_labels, input_bounds)

Available Basis Functions#

The following basis functions are allowed during regression:

  • constant, linfcns, expfcns, logfcns, sinfcns, cosfcns: 0-1 option to include constant, linear, exponential, logarithmic, sine, and cosine basis functions. For example,

trainer.config.constant = 1
trainer.config.linfcns = 1
trainer.config.expfcns = 1
trainer.config.logfcns = 1
trainer.config.sinfcns = 1
trainer.config.cosfcns = 1

This results in basis functions = k, x1, exp(x1), log(x1), sin(x1), cos(x1)

  • monomialpower, multi2power, multi3power: list of monomial, binomial, and trinomial powers. For example,

trainer.config.monomialpower = [2, 3, 4]
trainer.config.multi2power = [1, 2, 3]
trainer.config.multi3power = [1, 2, 3]

This results in the following basis functions:

  • Monomial functions = x^2, x^3, x^4

  • Binomial functions = x1*x2, (x1*x2)^2, (x1*x2)^3

  • Trinomial functions = (x1*x2*x3), (x1*x2*x3)^2, (x1*x2*x3)^3

  • ratiopower: list of ratio powers. For example,

trainer.config.ratiopower = (1,2,3)

This results in basis functions = (x1/x2), (x1/x2)^2, (x1/x2)^3
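As an illustration of how these option lists map to basis terms, the following sketch (not part of ALAMOPY, and for two inputs only) expands monomial and ratio power lists into the term strings ALAMO would consider:

```python
# Illustrative helper: expand option lists into basis-term strings
# for two inputs x1 and x2 (not an ALAMOPY function).
def expand_basis(monomialpower=(), multi2power=(), ratiopower=()):
    terms = []
    for p in monomialpower:
        terms += [f"x1^{p}", f"x2^{p}"]
    for p in multi2power:
        terms.append(f"(x1*x2)^{p}" if p > 1 else "x1*x2")
    for p in ratiopower:
        terms.append(f"(x1/x2)^{p}" if p > 1 else "x1/x2")
    return terms

print(expand_basis(monomialpower=[2, 3], ratiopower=[1, 2]))
# ['x1^2', 'x2^2', 'x1^3', 'x2^3', 'x1/x2', '(x1/x2)^2']
```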

ALAMO Regression Options#

  • modeler: fitness metric to be used for model building (1-8)

      1. BIC: Bayesian information criterion

      2. MallowsCp: Mallow’s Cp

      3. AICc: the corrected Akaike’s information criterion

      4. HQC: the Hannan-Quinn information criterion

      5. MSE: mean square error

      6. SSEp: sum of square errors plus a penalty proportional to the model size (Note: convpen is the weight of the penalty)

      7. RIC: the risk information criterion

      8. MADp: the maximum absolute deviation plus a penalty proportional to model size (Note: convpen is the weight of the penalty)

  • screener: regularization method used to reduce the number of potential basis functions before optimization (0-2)

      0. none: don’t use a regularization method

      1. lasso: use the LASSO (Least Absolute Shrinkage and Selection Operator) regularization method

      2. SIS: use the SIS (Sure Independence Screening) regularization method

  • maxterms: maximum number of terms to be fit in the model, surrogates will use fewer if possible

  • minterms: minimum number of terms to be fit in the model, a value of 0 means no limit is imposed

  • convpen: when MODELER is set to 6 or 8 the size of the model is weighted by CONVPEN.

  • sismult: non-negative number of basis functions retained by the SIS screener

  • simulator: a Python function (a callable object, not a string) to be used as a simulator by ALAMO

  • maxiter: maximum number of ALAMO iterations

  • maxtime: max length of total execution time in seconds

  • datalimitterms: limit model terms to number of measurements (True/False)

  • numlimitbasis: eliminate infeasible basis functions (True/False)

  • exclude: list of inputs to exclude during building

  • ignore: list of outputs to ignore during building

  • xisint: list of inputs that should be treated as integers

  • zisint: list of outputs that should be treated as integers
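A hedged example combining several of these options (the values shown are illustrative choices, not recommendations, and assume a trainer with two outputs as above):

```python
trainer.config.modeler = 1        # BIC fitness metric
trainer.config.screener = 2       # SIS regularization
trainer.config.maxterms = [5, 5]  # one term limit per output
trainer.config.maxtime = 600.0    # seconds
```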

Scaling and Metrics Options#

  • xfactor: list of scaling factors for input variables

  • xscaling: sets XFACTORS equal to the range of each input (True/False)

  • scalez: scale output variables (True/False)

  • ncvf: number of folds for cross validation

  • tolrelmetric: relative tolerance for outputs

  • tolabsmetric: absolute tolerance for outputs

  • tolmeanerror: convergence tolerance for mean errors in outputs

  • tolsse: absolute tolerance on SSE (sum of squared errors)

  • mipoptca: absolute tolerance for MIP

  • mipoptcr: relative tolerance for MIP

  • linearerror: use a linear objective instead of squared error (True/False)

  • GAMS: complete path to GAMS executable, or name if GAMS is in the user path

  • solvemip: solve MIP with an optimizer (True/False)

  • GAMSSOLVER: name of preferred GAMS solver to solve ALAMO mip quadratic subproblems

  • builder: use a greedy heuristic (True/False)

  • backstepper: use a greedy heuristic to build down a model by starting from the least squares model and removing one variable at a time (True/False)
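A hedged sketch of scaling and cross-validation settings (illustrative values):

```python
trainer.config.xscaling = True  # scale inputs by their ranges
trainer.config.scalez = True    # scale output variables
trainer.config.ncvf = 5         # 5-fold cross validation
```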

File Options#

  • print_to_screen: send ALAMO output to stdout (True/False)

  • alamo_path: path to ALAMO executable (if not in path)

  • filename : file name to use for ALAMO files, must be full path of a .alm file

  • working_directory: full path to working directory for ALAMO to use

  • overwrite_files: overwrite (delete) existing files when re-generating (True/False)

ALAMOPY results dictionary#

The results of ALAMO regression are stored as a Python dictionary on the trainer object. The data can be accessed using the dictionary keys listed below. For example,

# once the trainer object `trainer` has been defined, configured and trained
regression_results = trainer._results
surrogate_expressions = trainer._results['Model']
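For illustration only, a hypothetical results dictionary with invented values might look like the following; the real dictionary is populated by train_surrogate():

```python
# hypothetical contents with invented values, showing the access pattern
regression_results = {
    "Model": {"z1": "z1 == 3.9 * x1^2 - 4.0 * x1 + 1.0"},
    "ModelSize": 3,
    "R2": 0.998,
    "SSE": 1.2e-4,
}
print(regression_results["ModelSize"])  # 3
```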

Fitness metrics#

  • trainer._results['ModelSize']: number of terms chosen in the regression

  • trainer._results['R2']: R2 value of the regression

  • Objective value metrics: trainer._results['SSE'], trainer._results['RMSE'], trainer._results['MADp']

Regression description#

  • trainer._results['AlamoVersion']: version of ALAMO

  • trainer._results['xlabels'], trainer._results['zlabels']: the labels used for the inputs/outputs

  • trainer._results['xdata'], trainer._results['zdata']: arrays of input/output data

  • trainer._results['ninputs'], trainer._results['nbas']: number of inputs/basis functions

Performance Metrics#

Three types of regression problems are used: ordinary linear regression (OLR), classic linear regression (CLR), and mixed-integer programming (MIP). Performance metrics include the number of problems of each type and the time spent on each, as well as the time spent on other operations and the total time.

  • trainer._results['numOLRs'], trainer._results['OLRtime'], trainer._results['numCLRs'], trainer._results['CLRtime'], trainer._results['numMIPs'], trainer._results['MIPtime']: number of each type of regression problem solved and the time spent on each

  • trainer._results['OtherTime']: time spent on other operations

  • trainer._results['TotalTime']: total time spent on the regression

Custom Basis Functions#

Custom basis functions can be added to the built-in functions to expand the functional forms available. In ALAMO, this can be done with the following syntax

NCUSTOMBAS #
BEGIN_CUSTOMBAS
x1^2 * x2^2
END_CUSTOMBAS

To use this advanced capability in ALAMOPY, the following configuration option is set. Note that it is necessary to use the labels (xlabels) assigned to the input parameters.

trainer.config.custom_basis_functions = ["x1^2 * x2^2", "...", "..." ...]
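Custom basis strings can also be built programmatically from the input labels; the three labels below are assumptions extending the earlier two-input examples:

```python
from itertools import combinations

# pairwise squared-product terms for each pair of inputs
input_labels = ["x1", "x2", "x3"]
custom = [f"{a}^2 * {b}^2" for a, b in combinations(input_labels, 2)]
print(custom)  # ['x1^2 * x2^2', 'x1^2 * x3^2', 'x2^2 * x3^2']
```

The resulting list would then be assigned to trainer.config.custom_basis_functions.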

Visualization#


For visualizing ALAMO-trained surrogates via parity and residual plots, see Visualizing Surrogate Model Results.

ALAMOPY Examples#

For an example of optimizing a flowsheet containing an ALAMO-trained surrogate model, see [Autothermal Reformer Flowsheet Optimization with ALAMO Surrogate Object](IDAES/examples-pse).