Latin Hypercube Sampling (LHS)
LHS is a stratified random sampling method originally developed for efficient uncertainty assessment. LHS partitions the parameter space into bins of equal probability, with the goal of attaining a more even distribution of sample points in the parameter space than would be possible with pure random sampling.
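The stratification idea can be illustrated with a minimal sketch on the unit hypercube. This is not the PySMO implementation: the function name lhs_unit_samples and the seed argument are hypothetical, and only the Python standard library is used. Each dimension is split into number-of-samples bins of equal width, one point is drawn in each bin, and the bins are then paired at random across dimensions.

```python
import random


def lhs_unit_samples(n_samples, n_dims, seed=None):
    """Return n_samples points in [0, 1)^n_dims with one point per bin in every dimension.

    Illustrative helper only; the name and signature are assumptions.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-probability bins.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that bins are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into a list of points.
    return [list(point) for point in zip(*columns)]


points = lhs_unit_samples(5, 2, seed=0)
for point in points:
    print(point)
```

Projecting the five points onto either axis places exactly one point in each of the intervals [0, 0.2), [0.2, 0.4), and so on, which pure random sampling does not guarantee.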
The pysmo.sampling.LatinHypercubeSampling class carries out Latin Hypercube sampling. This can be done in two modes:
 The samples can be selected from a user-provided dataset, or
 The samples can be generated from a set of provided bounds.
Available Methods

class idaes.surrogate.pysmo.sampling.LatinHypercubeSampling(data_input, number_of_samples=None, sampling_type=None)
A class that performs Latin Hypercube Sampling. The function returns LHS samples which have been selected randomly after sample space stratification.
It should be noted that no minimax criterion has been used in this implementation, so the LHS samples selected will not have space-filling properties.
To use: call the class with inputs, and then run the sample_points method.
Example:
# To select 10 LHS samples from "data"
>>> from idaes.surrogate.pysmo import sampling
>>> b = sampling.LatinHypercubeSampling(data, 10, sampling_type="selection")
>>> samples = b.sample_points()

__init__(data_input, number_of_samples=None, sampling_type=None)
Initialization of LatinHypercubeSampling class. Two inputs are required.
Parameters:
 data_input (NumPy Array, Pandas Dataframe or list) – The input data set or range to be sampled.
   When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and the sampling_type option must be set to “selection”. The output variable (y) is assumed to be supplied in the last column.
   When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal lengths which contain the variable bounds, and the sampling_type option must be set to “creation”. It is assumed that the range contains no output variable information in this case.
 number_of_samples (int) – The number of samples to be generated. Should be a positive integer less than or equal to the number of entries (rows) in data_input.
 sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate samples from a supplied range (“creation”). Default is “creation”.
Returns: self object containing the input information
Raises:
 ValueError – The input data (data_input) is the wrong type.
 Exception – When number_of_samples is invalid (not an integer, too large, zero, or negative)
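As a rough illustration of the “creation” input format, the sketch below generates stratified samples from a bounds list. It is not the library's code: lhs_from_bounds is a hypothetical helper, and the ordering of the two inner lists (lower bounds first, then upper bounds) is an assumption, since the description above only states that the list holds two lists of equal length.

```python
import random


def lhs_from_bounds(bounds, n_samples, seed=None):
    """Generate n_samples LHS points inside the box described by bounds.

    bounds is assumed to be [lower_bounds, upper_bounds]; this ordering is
    an assumption made for the example.
    """
    lower, upper = bounds
    if len(lower) != len(upper):
        raise ValueError("The two bound lists must have equal lengths.")
    rng = random.Random(seed)
    columns = []
    for lo, hi in zip(lower, upper):
        # Stratify the unit interval, then shuffle the strata.
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(unit)
        # Rescale each unit-interval draw to the variable's range.
        columns.append([lo + u * (hi - lo) for u in unit])
    return [list(point) for point in zip(*columns)]


# Two input variables: x1 in [0, 1] and x2 in [10, 20]; no output column.
bounds = [[0.0, 10.0], [1.0, 20.0]]
samples = lhs_from_bounds(bounds, 4, seed=1)
for row in samples:
    print(row)
```

Each variable's range is divided into four equal strata here, and every stratum contributes exactly one value.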

sample_points()
sample_points generates or selects Latin Hypercube samples from an input dataset or data range. When called, it:
 generates sample points from stratified regions by calling lhs_points_generation,
 generates potential sample points by random shuffling, and
 when a dataset is provided, selects the closest available samples to the theoretical sample points from within the input data.
Returns: A NumPy Array or Pandas Dataframe containing number_of_samples points selected or generated by LHS.
Return type: NumPy Array or Pandas Dataframe
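The selection step can be sketched as a nearest-neighbour search. The code below is an illustration under stated assumptions, not the PySMO source: lhs_select is a hypothetical name, inputs are scaled to the unit hypercube using the column minima and maxima, closeness is measured by Euclidean distance, and each row can be picked at most once. The last column is treated as the output variable y, as described for “selection” mode. A ValueError is raised for an invalid sample count, which is a choice made for this example.

```python
import math
import random


def lhs_select(data, n_samples, seed=None):
    """Pick the rows of data whose inputs lie closest to theoretical LHS points."""
    if not isinstance(n_samples, int) or not 0 < n_samples <= len(data):
        raise ValueError("n_samples must be a positive integer no larger than len(data).")
    rng = random.Random(seed)
    n_inputs = len(data[0]) - 1  # the last column is assumed to hold y
    lows = [min(row[j] for row in data) for j in range(n_inputs)]
    highs = [max(row[j] for row in data) for j in range(n_inputs)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]  # guard constant columns

    # Theoretical LHS targets in the unit hypercube: stratify, then shuffle.
    columns = []
    for _ in range(n_inputs):
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    targets = list(zip(*columns))

    available = list(range(len(data)))
    chosen = []
    for target in targets:
        def distance(i):
            scaled = [(data[i][j] - lows[j]) / spans[j] for j in range(n_inputs)]
            return math.dist(scaled, target)

        best = min(available, key=distance)  # closest row not yet selected
        available.remove(best)
        chosen.append(data[best])
    return chosen


# A 5 x 5 grid of inputs (x1, x2) with y = x1 + x2 in the last column.
data = [[x1, x2, x1 + x2] for x1 in range(5) for x2 in range(5)]
selected = lhs_select(data, 5, seed=0)
for row in selected:
    print(row)
```

Because the returned rows come from the dataset, they only approximate the theoretical LHS points; how closely depends on how densely the data cover the input space.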

References
[1] Loeven et al paper titled “A Probabilistic Radial Basis Function Approach for Uncertainty Quantification” https://pdfs.semanticscholar.org/48a0/d3797e482e37f73e077893594e01e1c667a2.pdf
[2] Webpage on low discrepancy sampling methods: http://planning.cs.uiuc.edu/node210.html
[3] Swiler, Laura and Slepoy, Raisa and Giunta, Anthony: “Evaluation of sampling methods in constructing response surface approximations” https://arc.aiaa.org/doi/abs/10.2514/6.20061827