The pysmo.sampling.UniformSampling method carries out uniform (full-factorial) sampling. This can be done in two modes:
The samples can be selected from a user-provided dataset, or
The samples can be generated from a set of provided bounds.
- class idaes.core.surrogate.pysmo.sampling.UniformSampling(data_input, list_of_samples_per_variable, sampling_type=None, xlabels=None, ylabels=None, edges=None)
A class that performs Uniform Sampling. Depending on the settings, the algorithm either returns samples from an input dataset which have been selected using Euclidean distance minimization after the uniform samples have been generated, or returns samples from a supplied data range.
Full-factorial samples are based on dividing the space of each variable evenly and then generating all possible variable combinations.
The number of points to be sampled per variable needs to be specified in a list.
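The full-factorial idea described above can be sketched with the Python standard library alone. This is a minimal illustration, not PySMO's implementation: the helper names `linspace` and `full_factorial` are hypothetical, and the bounds format simply mirrors the two-list range described for data_input.

```python
from itertools import product


def linspace(lower, upper, count):
    """Return `count` evenly spaced values from lower to upper, inclusive."""
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]


def full_factorial(bounds, samples_per_variable):
    """Build every combination of the per-variable levels.

    bounds: [[lower_1, ..., lower_d], [upper_1, ..., upper_d]]
    samples_per_variable: one integer greater than 1 per variable
    """
    lower, upper = bounds
    levels = [
        linspace(lo, hi, n)
        for lo, hi, n in zip(lower, upper, samples_per_variable)
    ]
    # The Cartesian product of the levels gives the full-factorial design.
    return [list(point) for point in product(*levels)]


# Hypothetical 2D range: x1 in [0, 1], x2 in [10, 20]; 3 and 2 levels.
grid = full_factorial([[0.0, 10.0], [1.0, 20.0]], [3, 2])
print(len(grid))  # 3 * 2 = 6 points
```

The total number of samples is the product of the entries in the list, which is why a [10, 5] list yields 50 points.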
To use: call the class with inputs, and then call the sample_points function:
# To select 50 samples on a (10 x 5) grid in a 2D space:
>>> b = rbf.UniformSampling(data, [10, 5], sampling_type="selection")
>>> samples = b.sample_points()
- __init__(data_input, list_of_samples_per_variable, sampling_type=None, xlabels=None, ylabels=None, edges=None)
Initialization of UniformSampling class. Three inputs are required.
data_input (NumPy Array, Pandas Dataframe or list) –
The input data set or range to be sampled.
When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and sampling_type option must be set to “selection”. A single output variable (y) is assumed to be supplied in the last column if xlabels and ylabels are not supplied.
When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal lengths which contain the variable bounds and sampling_type option must be set to “creation”. It is assumed that the range contains no output variable information in this case.
list_of_samples_per_variable (list) – The list containing the number of points to sample (subdivisions) for each variable. Each dimension (variable) must be represented by an integer greater than 1.
sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate samples from a supplied range (“creation”). Default is “creation”.
- Keyword Arguments
xlabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the independent/input variables. Only used in “selection” mode. Default is None.
ylabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the dependent/output variables. Only used in “selection” mode. Default is None.
edges (bool) – Boolean variable representing how the points should be selected. A value of True (default) indicates the points should be equally spaced edge to edge; otherwise they will be placed at the centres of the bins filling the unit cube.
- Returns
self function containing the input information
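The difference between the two settings of the edges option can be sketched on the unit interval. This is an illustration under stated assumptions, not PySMO's code: the helper `unit_levels` is hypothetical, and the bin-centre formula assumes equal-width bins.

```python
def unit_levels(count, edges=True):
    """Return `count` sample positions on the unit interval [0, 1].

    edges=True: equally spaced from edge to edge, endpoints included.
    edges=False: centres of `count` equal-width bins (an assumed reading
    of "centres of the bins filling the unit cube").
    """
    if edges:
        return [i / (count - 1) for i in range(count)]
    return [(i + 0.5) / count for i in range(count)]


print(unit_levels(5, edges=True))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(unit_levels(5, edges=False))  # [0.1, 0.3, 0.5, 0.7, 0.9]
```

With edges=True the extreme values of each variable are always sampled; with edges=False no sample lies on the boundary.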
sample_points generates or selects full-factorial designs from an input dataset or data range.
- Returns
A NumPy Array or Pandas Dataframe containing the sample points generated or selected by full-factorial sampling.
- Return type
NumPy Array or Pandas Dataframe
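In “selection” mode, the description above says that samples are picked from the dataset by Euclidean distance minimization once the uniform points exist. A minimal sketch of that step follows. It assumes the last column holds the single output y, and the helper `select_nearest` is hypothetical; details such as data scaling or handling of repeated picks are not covered and may differ in PySMO.

```python
import math


def select_nearest(data, targets):
    """For each target point, pick the dataset row whose inputs are closest.

    data: rows of [x_1, ..., x_d, y]; the last column is the output.
    targets: points of [x_1, ..., x_d], for example a full-factorial grid.
    """
    selected = []
    for target in targets:
        # Compare only the input columns (everything except the last).
        closest = min(data, key=lambda row: math.dist(row[:-1], target))
        selected.append(closest)
    return selected


# Hypothetical dataset with two inputs and one output per row.
data = [
    [0.1, 0.2, 5.0],
    [0.9, 0.1, 7.0],
    [0.2, 0.8, 3.0],
    [0.8, 0.9, 9.0],
]
# A 2 x 2 edge-to-edge grid on the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
samples = select_nearest(data, corners)
```

Each returned row keeps its output value, so the selected samples can be used directly as training data.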
Loeven et al., “A Probabilistic Radial Basis Function Approach for Uncertainty Quantification”, https://pdfs.semanticscholar.org/48a0/d3797e482e37f73e077893594e01e1c667a2.pdf