ALAMOPY.ALAMO Options

This page describes the ALAMOPY options in more detail and the relationship between ALAMO and ALAMOPY.

Installing ALAMO

ALAMO (Automatic Learning of Algebraic MOdels) is an optional dependency developed and licensed by The Optimization Firm: https://www.minlp.com/alamo-modeling-tool. The provided link includes further information on obtaining a license, installing the tool, obtaining the BARON (Branch-And-Reduce Optimization Navigator) solver which ALAMO leverages, and specific examples through a user manual and installation guide. Alternatively, users may directly access the user guide here: https://minlp.com/downloads/docs/alamo%20manual.pdf.

During installation, it is recommended that Windows 10 users check that the ALAMO path is set. Additionally, users must place the ALAMO license file in the folder where ALAMO is installed.
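
As a quick sanity check, users can confirm from Python that the ALAMO executable is visible on the system path. The snippet below is a minimal sketch and assumes the executable is named alamo; adjust the name if your installation differs:

import shutil

# prints the resolved executable path if found, otherwise a warning
print(shutil.which("alamo") or "ALAMO executable not found on PATH")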

More details on ALAMO options may be found in the user guide linked above. If users encounter specific error codes while running the ALAMOPY tool in IDAES, the user guide contains detailed descriptions of each termination condition and error message.

Basic ALAMOPY.ALAMO options

Data Arguments

The following arguments are required when instantiating the AlamoTrainer class:

  • input_labels: user-specified labels given to the inputs

  • output_labels: user-specified labels given to the outputs

  • training_dataframe: dataframe (Pandas) object containing training dataset

# after reading or generating a Pandas DataFrame object called `data_training`
from idaes.core.surrogate.alamopy import AlamoTrainer

trainer = AlamoTrainer(input_labels=['x1', 'x2'], output_labels=['z1', 'z2'], training_dataframe=data_training)
trainer.config.linfcns = 1  # any ALAMO option may be set via trainer.config; see below for details
success, alm_surr, msg = trainer.train_surrogate()
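
The returned success flag and msg string can be used for basic error handling before the surrogate is used further; a minimal sketch:

# stop early if ALAMO did not return a usable model
if not success:
    raise RuntimeError(f"ALAMO training failed: {msg}")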

The following arguments are required when instantiating the AlamoSurrogate class:

  • surrogate_expressions: Pyomo expression object(s) generated by training the surrogate model(s)

  • input_labels: user-specified labels given to the inputs

  • output_labels: user-specified labels given to the outputs

  • input_bounds: minimum/maximum bounds for each input variable to constrain the search space

# extract the trained expressions, labels, and bounds, then build the surrogate object
from idaes.core.surrogate.alamopy import AlamoSurrogate

surrogate_expressions = trainer._results['Model']
input_labels = trainer._input_labels
output_labels = trainer._output_labels
xmin, xmax = [0.1, 0.8], [0.8, 1.2]
input_bounds = {input_labels[i]: (xmin[i], xmax[i]) for i in range(len(input_labels))}

alm_surr = AlamoSurrogate(surrogate_expressions, input_labels, output_labels, input_bounds)
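
Once constructed, the surrogate object can be evaluated at new input points. The following is a minimal sketch, assuming the evaluate_surrogate method of the IDAES surrogate API, which accepts a Pandas DataFrame of input values:

import pandas as pd

# hypothetical new input points, labeled with the same input labels used in training
new_inputs = pd.DataFrame({'x1': [0.2, 0.5], 'x2': [0.9, 1.1]})
predictions = alm_surr.evaluate_surrogate(new_inputs)  # DataFrame of predicted outputs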

Available Basis Functions

The following basis functions are allowed during regression:

  • constant, linfcns, expfcns, logfcns, sinfcns, cosfcns, grbfcns: 0-1 option to include constant, linear, exponential, logarithmic, sine, cosine, and Gaussian radial basis functions. For example,

trainer.config.constant = 1
trainer.config.linfcns = 1
trainer.config.expfcns = 1
trainer.config.logfcns = 1
trainer.config.sinfcns = 1
trainer.config.cosfcns = 1
trainer.config.grbfcns = 1

This results in basis functions = k, x1, exp(x1), log(x1), sin(x1), cos(x1), exp(-(ε||x1||)^2)

  • rbfparam: multiplicative constant ε used in the Gaussian radial basis functions

  • monomialpower, multi2power, multi3power: list of monomial, binomial, and trinomial powers. For example,

trainer.config.monomialpower = [2, 3, 4]
trainer.config.multi2power = [1, 2, 3]
trainer.config.multi3power = [1, 2, 3]

This results in the following basis functions:

  • Monomial functions = x^2, x^3, x^4

  • Binomial functions = x1*x2, (x1*x2)^2, (x1*x2)^3

  • Trinomial functions = (x1*x2*x3), (x1*x2*x3)^2, (x1*x2*x3)^3

  • ratiopower: list of ratio powers. For example,

trainer.config.ratiopower = [1, 2, 3]

This results in basis functions = (x1/x2), (x1/x2)^2, (x1/x2)^3

ALAMO Regression Options

  • modeler: fitness metric to be used for model building (1-8)

      1. BIC: Bayesian information criterion

      2. MallowsCp: Mallows’ Cp

      3. AICc: the corrected Akaike’s information criterion

      4. HQC: the Hannan-Quinn information criterion

      5. MSE: mean square error

      6. SSEp: sum of square error plus a penalty proportional to the model size (Note: convpen is the weight of the penalty)

      7. RIC: the risk information criterion

      8. MADp: the maximum absolute deviation plus a penalty proportional to the model size (Note: convpen is the weight of the penalty)

  • screener: regularization method used to reduce the number of potential basis functions pre-optimization (0-2)

      0. none: don’t use a regularization method

      1. lasso: use the LASSO (Least Absolute Shrinkage and Selection Operator) regularization method

      2. SIS: use the SIS (Sure Independence Screening) regularization method

  • maxterms: maximum number of terms to be fit in the model; surrogates will use fewer terms if possible

  • minterms: minimum number of terms to be fit in the model; a value of 0 means no limit is imposed

  • convpen: when MODELER is set to 6 or 8, the size of the model is weighted by CONVPEN

  • sismult: non-negative number of basis functions retained by the SIS screener

  • simulator: a Python function to be used as a simulator for ALAMO; must be passed as a function object, not a string

  • maxiter: maximum number of ALAMO iterations

  • maxtime: maximum total execution time in seconds

  • datalimitterms: limit model terms to number of measurements (True/False)

  • numlimitbasis: eliminate infeasible basis functions (True/False)

  • exclude: list of inputs to exclude during building

  • ignore: list of outputs to ignore during building

  • xisint: list of inputs that should be treated as integers

  • zisint: list of outputs that should be treated as integers
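
As an illustration, several of these regression options can be set together on the trainer's config block before calling train_surrogate(); the values below are arbitrary examples rather than recommendations:

trainer.config.modeler = 1    # use BIC as the fitness metric
trainer.config.screener = 2   # pre-screen basis functions with SIS
trainer.config.maxiter = 10   # limit the number of ALAMO iterations
trainer.config.maxtime = 600  # limit total execution time to 600 seconds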

Scaling and Metrics Options

  • xfactor: list of scaling factors for input variables

  • xscaling: sets XFACTOR equal to the range of each input (True/False)

  • scalez: scale output variables (True/False)

  • ncvf: number of folds for cross validation

  • tolrelmetric: relative tolerance for outputs

  • tolabsmetric: absolute tolerance for outputs

  • tolmeanerror: convergence tolerance for mean errors in outputs

  • tolsse: absolute tolerance on SSE (sum of squared errors)

  • mipoptca: absolute tolerance for MIP

  • mipoptcr: relative tolerance for MIP

  • linearerror: use a linear objective instead of squared error (True/False)

  • GAMS: complete path to GAMS executable, or name if GAMS is in the user path

  • solvemip: solve MIP with an optimizer (True/False)

  • GAMSSOLVER: name of preferred GAMS solver for solving ALAMO’s MIP quadratic subproblems

  • builder: use a greedy heuristic to build up the model one variable at a time (True/False)

  • backstepper: use a greedy heuristic to build down a model by starting from the least squares model and removing one variable at a time (True/False)
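
Scaling and metrics options are set the same way as the other config options; the values below are illustrative only:

trainer.config.xscaling = True  # set scaling factors from the range of each input
trainer.config.scalez = True    # scale the output variables
trainer.config.ncvf = 5         # use 5-fold cross validation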

File Options

  • print_to_screen: send ALAMO output to stdout (True/False)

  • alamo_path: path to ALAMO executable (if not in path)

  • filename: file name to use for ALAMO files; must be the full path of a .alm file

  • working_directory: full path to working directory for ALAMO to use

  • overwrite_files: overwrite (delete) existing files when re-generating (True/False)
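
A short sketch of the file options in use; the paths shown are placeholders to be replaced with locations on the user's system:

trainer.config.alamo_path = "/path/to/alamo"            # placeholder executable path
trainer.config.filename = "/path/to/my_problem.alm"     # placeholder .alm file path
trainer.config.working_directory = "/path/to/workdir"   # placeholder directory
trainer.config.overwrite_files = True
trainer.config.print_to_screen = False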

ALAMOPY results dictionary

The results from alamopy.alamo are returned as a Python dictionary. The data can be accessed using the dictionary keys listed below. For example,

# once the trainer object `trainer` has been defined, configured and trained
regression_results = trainer._results
surrogate_expressions = trainer._results['Model']

Fitness metrics

  • trainer._results['ModelSize']: number of terms chosen in the regression

  • trainer._results['R2']: R2 value of the regression

  • Objective value metrics: trainer._results['SSE'], trainer._results['RMSE'], trainer._results['MADp']
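
For example, the fitness metrics can be read directly from the results dictionary after training:

# report the size and quality of the fitted surrogate
print("Model size:", trainer._results['ModelSize'])
print("R2:", trainer._results['R2'])
print("SSE:", trainer._results['SSE'], "RMSE:", trainer._results['RMSE'])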

Regression description

  • trainer._results['AlamoVersion']: version of ALAMO

  • trainer._results['xlabels'], trainer._results['zlabels']: the labels used for the inputs/outputs

  • trainer._results['xdata'], trainer._results['zdata']: array of xdata/zdata

  • trainer._results['ninputs'], trainer._results['nbas']: number of inputs/basis functions

Performance specs

Three types of regression problems are used: ordinary linear regression (olr), classic linear regression (clr), and a mixed-integer program (mip). Performance metrics include the number of problems of each type solved and the time spent on each type of problem. Additionally, the time spent on other operations and the total time are included.

  • trainer._results['numOLRs'], trainer._results['OLRtime'], trainer._results['numCLRs'], trainer._results['CLRtime'], trainer._results['numMIPs'], trainer._results['MIPtime']: number of each type of regression problem solved and the corresponding time

  • trainer._results['OtherTime']: time spent on other operations

  • trainer._results['TotalTime']: total time spent on the regression

Custom Basis Functions

Custom basis functions can be added to the built-in functions to expand the available functional forms. In ALAMO, this can be done with the following syntax:

NCUSTOMBAS #
BEGIN_CUSTOMBAS
x1^2 * x2^2
END_CUSTOMBAS

To use this advanced capability in ALAMOPY, the following configuration option is set. Note that the custom functions must be written in terms of the labels assigned to the input variables.

trainer.config.custom_basis_functions = ["x1^2 * x2^2"]  # list of strings; add further custom functions as needed