Full-Factorial Sampling

The pysmo.sampling.UniformSampling class carries out uniform (full-factorial) sampling. This can be done in two modes:

  • The samples can be selected from a user-provided dataset, or

  • The samples can be generated from a set of provided bounds.

Available Methods

class idaes.surrogate.pysmo.sampling.UniformSampling(data_input, list_of_samples_per_variable, sampling_type=None, edges=None)

A class that performs uniform sampling. Depending on the settings, the algorithm either returns samples selected from an input dataset (after the uniform samples have been generated, the dataset points closest to them by Euclidean distance are chosen), or returns samples generated directly within a supplied data range.

Full-factorial samples are based on dividing the range of each variable into evenly spaced levels and then generating all possible combinations of those levels.

  • The number of points to be sampled per variable needs to be specified in a list.
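The construction just described can be illustrated with a minimal sketch of "creation" mode using only the Python standard library. The helper name full_factorial and its argument layout (separate lists of lower and upper bounds) are hypothetical and not the pysmo API, and the ordering of the returned points may differ from pysmo's. Each variable's range is split into evenly spaced levels that include both bounds, and itertools.product forms every combination, so the number of samples is the product of the per-variable point counts ([10, 5] gives 50 points).

```python
from itertools import product


def full_factorial(lower, upper, points_per_variable):
    """Return every combination of evenly spaced levels within the bounds.

    Hypothetical helper for illustration only (not pysmo's implementation).
    Levels run edge to edge, so each variable's bounds are included.
    """
    if not (len(lower) == len(upper) == len(points_per_variable)):
        raise ValueError("bounds and point counts must have the same length")
    levels = []
    for lo, hi, n in zip(lower, upper, points_per_variable):
        # n evenly spaced values from lo to hi inclusive (requires n > 1)
        step = (hi - lo) / (n - 1)
        levels.append([lo + i * step for i in range(n)])
    # Cartesian product of the per-variable levels: one list per design point
    return [list(point) for point in product(*levels)]


samples = full_factorial([0.0, 10.0], [1.0, 20.0], [3, 2])
print(len(samples))  # 6
print(samples[:2])   # [[0.0, 10.0], [0.0, 20.0]]
```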

To use: instantiate the class with the required inputs, then call the sample_points method.

Example:

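# To select 50 samples on a (10 x 5) grid in a 2D space:
>>> from idaes.surrogate.pysmo import sampling
>>> # data: NumPy array or Pandas DataFrame with two input columns and the output in the last column
>>> b = sampling.UniformSampling(data, [10, 5], sampling_type="selection")
>>> samples = b.sample_points()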
__init__(data_input, list_of_samples_per_variable, sampling_type=None, edges=None)

Initialization of the UniformSampling class. Two inputs are required (data_input and list_of_samples_per_variable); sampling_type and edges are optional.

Parameters
  • data_input (NumPy array, Pandas DataFrame or list) –

    The input data set or range to be sampled.

    • When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy array or a Pandas DataFrame, and the sampling_type option must be set to “selection”. The output variable (Y) is assumed to be supplied in the last column.

    • When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal length that hold the variable bounds, and the sampling_type option must be set to “creation”. In this case, the bounds are assumed to contain no output variable information.

  • list_of_samples_per_variable (list) – The number of points (levels) to sample along each variable, given as a list with one entry per variable. Each entry must be an integer greater than 1.

  • sampling_type (str) – Option that determines whether the algorithm selects samples from an existing dataset (“selection”) or generates samples from a supplied range (“creation”). Default is “creation”.
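In “selection” mode the returned samples are rows of the supplied dataset rather than the grid points themselves. The sketch below illustrates the Euclidean-distance step under stated assumptions: the helper name select_nearest is hypothetical, the grid points are assumed to have already been placed within the range of the dataset's inputs, and only the input columns (all but the last, which holds the output) are compared. It is not pysmo's code; in particular, this simple version can return the same row for two different grid points.

```python
import math


def select_nearest(dataset, grid_points):
    """For each grid point, pick the dataset row closest in Euclidean distance.

    Hypothetical helper for illustration only. Each row of `dataset` holds the
    input values followed by the output in the last column, so only the
    leading input columns are used in the distance calculation.
    """
    n_inputs = len(grid_points[0])
    selected = []
    for target in grid_points:
        # math.dist computes the Euclidean distance between two points
        best = min(dataset, key=lambda row: math.dist(row[:n_inputs], target))
        selected.append(best)
    return selected


data = [[0.1, 0.2, 5.0], [0.9, 0.8, 7.0], [0.4, 0.6, 6.0]]
print(select_nearest(data, [[0.0, 0.0], [1.0, 1.0]]))
# [[0.1, 0.2, 5.0], [0.9, 0.8, 7.0]]
```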

Keyword Arguments

edges (bool) – Boolean variable that determines how the points are placed. A value of True (default) means the points are equally spaced from edge to edge; otherwise they are placed at the centres of the bins filling the unit cube.
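The difference between the two edges settings can be seen in a single dimension of the unit cube. In this minimal sketch (the helper name unit_levels is hypothetical, not part of pysmo), edges=True spreads n points evenly from 0 to 1 inclusive, while edges=False splits [0, 1] into n equal-width bins and takes the centre of each, so no point lies on the boundary.

```python
def unit_levels(n, edges=True):
    """Return n level positions in [0, 1] for one variable (hypothetical helper).

    edges=True: equally spaced from edge to edge, including 0 and 1 (needs n > 1).
    edges=False: centres of n equal-width bins covering [0, 1].
    """
    if edges:
        return [i / (n - 1) for i in range(n)]
    return [(i + 0.5) / n for i in range(n)]


print(unit_levels(5))               # [0.0, 0.25, 0.5, 0.75, 1.0]
print(unit_levels(4, edges=False))  # [0.125, 0.375, 0.625, 0.875]
```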

Returns

self: the initialized object containing the input information.

Raises
  • ValueError – The data_input is the wrong type

  • ValueError – When list_of_samples_per_variable is of the wrong length, is not a list or contains elements other than integers

  • Exception – When edges entry is not Boolean
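The checks on list_of_samples_per_variable listed above can be sketched as follows. This is an illustrative assumption, not pysmo's validation code: the function name check_points_per_variable is hypothetical, and n_variables stands for the number of input variables (for example, the number of columns minus one in “selection” mode, or the length of each bounds list in “creation” mode).

```python
def check_points_per_variable(points_per_variable, n_variables):
    """Hypothetical validation mirroring the documented ValueError conditions."""
    if not isinstance(points_per_variable, list):
        raise ValueError("list_of_samples_per_variable must be a list")
    if len(points_per_variable) != n_variables:
        raise ValueError("need exactly one entry per input variable")
    for n in points_per_variable:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(n, bool) or not isinstance(n, int) or n <= 1:
            raise ValueError("each entry must be an integer greater than 1")
    return points_per_variable


check_points_per_variable([10, 5], 2)  # passes and returns [10, 5]
```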

sample_points()

sample_points generates or selects full-factorial designs from an input dataset or data range.

Returns

A NumPy array or Pandas DataFrame containing the sample points generated or selected by full-factorial sampling.

Return type

NumPy array or Pandas DataFrame
