Latin Hypercube Sampling (LHS)

LHS is a stratified random sampling method originally developed for efficient uncertainty assessment. LHS partitions the parameter space into bins of equal probability with the goal of attaining a more even distribution of sample points in the parameter space that would be possible with pure random sampling.

The pysmo.sampling.LatinHypercubeSampling method carries out Latin Hypercube sampling. This can be done in two modes:

  • The samples can be selected from a user-provided dataset, or

  • The samples can be generated from a set of provided bounds.

Available Methods

class idaes.surrogate.pysmo.sampling.LatinHypercubeSampling(data_input, number_of_samples=None, sampling_type=None)[source]

A class that performs Latin Hypercube Sampling. The function returns LHS samples which have been selected randomly after sample space stratification.

It should be noted that no minimax criterion has been used in this implementation, so the LHS samples selected will not have space-filling properties.

To use: call class with inputs, and then run sample_points method.

Example:

# To select 10 LHS samples from "data"
>>> b = rbf.LatinHypercubeSampling(data, 10, sampling_type="selection")
>>> samples = b.sample_points()
__init__(data_input, number_of_samples=None, sampling_type=None)[source]

Initialization of LatinHypercubeSampling class. Two inputs are required.

Parameters
  • data_input (NumPy Array, Pandas Dataframe or list) –

    The input data set or range to be sampled.

    • When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and sampling_type option must be set to “selection”. The output variable (y) is assumed to be supplied in the last column.

    • When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal lengths which contain the variable bounds and sampling_type option must be set to “creation”. It is assumed that no range contains no output variable information in this case.

  • number_of_samples (int) – The number of samples to be generated. Should be a positive integer less than or equal to the number of entries (rows) in data_input.

  • sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate sample from a supplied range (“creation”). Default is “creation”.

Returns

self function containing the input information

Raises
  • ValueError – The input data (data_input) is the wrong type.

  • Exception – When number_of_samples is invalid (not an integer, too large, zero, or negative)

sample_points()[source]

sample_points generates or selects Latin Hypercube samples from an input dataset or data range. When called, it:

  1. generates samples points from stratified regions by calling the lhs_points_generation,

  2. generates potential sample points by random shuffling, and

  3. when a dataset is provided, selects the closest available samples to the theoretical sample points from within the input data.

Returns

A numpy array or Pandas dataframe containing number_of_samples points selected or generated by LHS.

Return type

NumPy Array or Pandas Dataframe

References

[1] Loeven et al paper titled “A Probabilistic Radial Basis Function Approach for Uncertainty Quantification” https://pdfs.semanticscholar.org/48a0/d3797e482e37f73e077893594e01e1c667a2.pdf

[2] Webpage on low discrepancy sampling methods: http://planning.cs.uiuc.edu/node210.html

[3] Swiler, Laura and Slepoy, Raisa and Giunta, Anthony: “Evaluation of sampling methods in constructing response surface approximations” https://arc.aiaa.org/doi/abs/10.2514/6.2006-1827