Full-Factorial Sampling
The pysmo.sampling.UniformSampling class carries out uniform (full-factorial) sampling. This can be done in two modes:
- the samples can be selected from a user-provided dataset, or
- the samples can be generated from a set of provided bounds.
Available Methods
- class idaes.core.surrogate.pysmo.sampling.UniformSampling(data_input, list_of_samples_per_variable, sampling_type=None, xlabels=None, ylabels=None, edges=None)
A class that performs uniform sampling. Depending on the settings, the algorithm either generates the uniform samples and then returns the points from an input dataset that lie closest to them (by Euclidean distance), or returns uniform samples generated directly within a supplied data range.
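The selection mode can be pictured with a minimal NumPy sketch. It assumes the inputs occupy every column except the last (the output y) and computes distances on the raw values; the helper name select_nearest is hypothetical and this is not pysmo's implementation. For each grid point it picks the dataset row with the smallest Euclidean distance, keeping each picked row only once, so fewer rows than grid points can come back.

```python
import numpy as np


def select_nearest(data, grid):
    """Hypothetical helper: pick the dataset row closest to each grid point.

    data: (m, n + 1) array, inputs in the first n columns, output y in the last
    grid: (k, n) array of full-factorial points in the same input space
    """
    x = data[:, :-1]  # input columns only; y is not used for distances
    # Pairwise Euclidean distances between grid points and data rows, shape (k, m).
    dists = np.linalg.norm(grid[:, None, :] - x[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)  # nearest data row for each grid point
    # Keep each selected row once, even if several grid points share it.
    return data[np.unique(nearest)]


rng = np.random.default_rng(0)
data = rng.random((200, 3))  # 2 inputs + 1 output, random stand-in data
g1, g2 = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 3), indexing="ij")
grid = np.column_stack([g1.ravel(), g2.ravel()])  # 4 x 3 = 12 grid points
picked = select_nearest(data, grid)
print(picked.shape)  # at most 12 rows, 3 columns
```

If the variables have very different ranges, scaling them (for example to [0, 1]) before computing distances keeps one variable from dominating the distance.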
Full-factorial samples are generated by dividing the range of each variable into equally spaced points and then forming all possible combinations of these points across the variables.
The number of points to be sampled for each variable must be specified in a list.
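The full-factorial construction can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than pysmo's code: the helper name full_factorial_from_bounds is hypothetical, the bounds are assumed to be given as [lower_bounds, upper_bounds], and points are placed edge to edge with numpy.linspace.

```python
import itertools

import numpy as np


def full_factorial_from_bounds(bounds, points_per_variable):
    """Hypothetical helper: full-factorial samples inside the given bounds.

    bounds: [[lower_1, ..., lower_n], [upper_1, ..., upper_n]]  (assumed layout)
    points_per_variable: one integer > 1 per variable
    """
    lower = np.asarray(bounds[0], dtype=float)
    upper = np.asarray(bounds[1], dtype=float)
    # Equally spaced levels for each variable, edge to edge (both bounds included).
    levels = [
        np.linspace(lo, hi, n)
        for lo, hi, n in zip(lower, upper, points_per_variable)
    ]
    # Cartesian product of the levels: every combination is one sample.
    return np.array(list(itertools.product(*levels)))


samples = full_factorial_from_bounds([[0.0, 10.0], [1.0, 20.0]], [3, 2])
print(samples.shape)  # (6, 2)
```

The number of samples is the product of the entries in the list, so [3, 2] yields 6 points and [10, 5] yields 50.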
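To use: call the class with the required inputs, and then call the sample_points function.
Example:
# To select 50 samples on a (10 x 5) grid in a 2D space, where data holds two inputs and one output (last column):
>>> from idaes.core.surrogate.pysmo.sampling import UniformSampling
>>> b = UniformSampling(data, [10, 5], sampling_type="selection")
>>> samples = b.sample_points()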
- __init__(data_input, list_of_samples_per_variable, sampling_type=None, xlabels=None, ylabels=None, edges=None)
Initialization of the UniformSampling class. Two inputs are required (data_input and list_of_samples_per_variable); sampling_type is optional and defaults to “creation”.
- Parameters
data_input (NumPy array, Pandas DataFrame or list) –
The input data set or range to be sampled.
When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy array or a Pandas DataFrame, and the sampling_type option must be set to “selection”. If xlabels and ylabels are not supplied, a single output variable (y) is assumed to be in the last column.
When the aim is to generate a set of samples from a data range, data_input must be a list containing two lists of equal length that hold the variable bounds, and the sampling_type option must be set to “creation”. In this case, the range is assumed to contain no output variable information.
list_of_samples_per_variable (list) – The list containing the number of points (subdivisions) for each variable. Each variable must be represented by an integer greater than 1.
sampling_type (str) – Option that determines whether the algorithm selects samples from an existing dataset (“selection”) or generates samples from a supplied range (“creation”). Default is “creation”.
- Keyword Arguments
xlabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the independent/input variables. Only used in “selection” mode. Default is None.
ylabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the dependent/output variables. Only used in “selection” mode. Default is None.
edges (bool) – Boolean variable controlling where the points are placed. A value of True (default) places the points equally spaced from edge to edge; otherwise they are placed at the centres of equal-sized bins filling the unit cube.
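The two settings of edges differ only in where the points fall along each variable. A minimal sketch on the unit interval, assuming "bin centres" means n bins of equal width; the helper name unit_levels is hypothetical and not part of pysmo:

```python
import numpy as np


def unit_levels(n, edges=True):
    """Hypothetical helper: n equally spaced levels on the unit interval [0, 1].

    edges=True  -> first and last points sit on 0 and 1 (edge to edge)
    edges=False -> points sit at the centres of n equal-width bins
    """
    if edges:
        return np.linspace(0.0, 1.0, n)
    # Centre of bin i is (i + 0.5) / n, for i = 0, ..., n - 1.
    return (np.arange(n) + 0.5) / n


print(unit_levels(5, edges=True))   # values 0, 0.25, 0.5, 0.75, 1
print(unit_levels(4, edges=False))  # values 0.125, 0.375, 0.625, 0.875
```

A level t on the unit interval maps to a variable's range as lower + t * (upper - lower). With edges=True the bounds themselves are sampled; with edges=False every point stays strictly inside them.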
- Returns
self: the initialized object, holding the input information
- Raises
ValueError – The data_input is the wrong type
ValueError – When list_of_samples_per_variable is of the wrong length, is not a list or contains elements other than integers
IndexError – When invalid column names are supplied in xlabels or ylabels
Exception – When edges entry is not Boolean
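The input rules listed for the constructor can be collected into one check. This sketch is hypothetical (the function check_uniform_inputs is not part of pysmo) and, unlike the documented class, raises TypeError rather than a generic Exception for a non-Boolean edges value; it also returns the total number of full-factorial points.

```python
import math
import numbers


def check_uniform_inputs(n_variables, list_of_samples_per_variable, edges=True):
    """Hypothetical validator mirroring the documented rules; not pysmo's code.

    Returns the total number of full-factorial points if all inputs are valid.
    """
    samples = list_of_samples_per_variable
    if not isinstance(samples, list) or len(samples) != n_variables:
        raise ValueError("list_of_samples_per_variable must be a list with one entry per variable")
    for s in samples:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(s, bool) or not isinstance(s, numbers.Integral) or s <= 1:
            raise ValueError("each entry must be an integer greater than 1")
    if not isinstance(edges, bool):
        # The documented class raises a generic Exception here; this sketch uses TypeError.
        raise TypeError("edges must be True or False")
    return math.prod(samples)


print(check_uniform_inputs(2, [10, 5]))  # 50
```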