Halton Sampling
Halton sampling is a low-discrepancy sampling method. It is a deterministic sampling method based on the Halton sequence, a sequence constructed from a set of coprime bases. The Halton sequence is an n-dimensional extension of the Van der Corput sequence; each individual sequence is based on the radical inverse function defined on a prime number.
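As an illustration of the radical inverse function mentioned above, the following is a minimal sketch in plain Python. It is not the PySMO implementation; the function name radical_inverse and the choice to start the sequence at index 1 are assumptions made for this example.

```python
def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        # Peel off the least significant digit and give it the largest weight.
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


# First four elements of the Van der Corput sequence in base 2
# (indices 1 to 4; the starting index is an assumption of this sketch).
van_der_corput_base2 = [radical_inverse(i, 2) for i in range(1, 5)]
print(van_der_corput_base2)  # [0.5, 0.25, 0.75, 0.125]
```

For example, 3 is written as 11 in base 2, which mirrors to 0.11 in base 2, that is 0.75.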
The pysmo.sampling.HaltonSampling method carries out Halton sampling. This can be done in two modes:
 The samples can be selected from a user-provided dataset, or
 The samples can be generated from a set of provided bounds.
The Halton sampling method is only available for low-dimensional problems (\(n \leq 10\)). At higher dimensions, the performance of the sampling method has been shown to degrade.
Available Methods

class idaes.surrogate.pysmo.sampling.HaltonSampling(data_input, number_of_samples=None, sampling_type=None)
A class that performs Halton Sampling.
Halton samples are generated by reversing (flipping) the digits of each index number after converting it to a prime base.
To generate \(n\) samples in a \(p\)-dimensional space, the first \(p\) prime numbers are used as the bases.
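The digit-reversal construction can be sketched as follows. This is a hedged, standalone illustration rather than the PySMO code: the helper names first_primes, digits_in_base and halton_points are hypothetical, and starting the sequence at index 1 is an assumption.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def digits_in_base(number, base):
    """Digits of `number` in `base`, least significant digit first."""
    digits = []
    while number > 0:
        number, digit = divmod(number, base)
        digits.append(digit)
    return digits


def halton_points(n, p):
    """First n Halton points in the p-dimensional unit hypercube."""
    bases = first_primes(p)
    points = []
    for i in range(1, n + 1):  # starting at index 1 is an assumption
        point = []
        for base in bases:
            # The flipped digits are placed after the radix point:
            # d0/base + d1/base**2 + d2/base**3 + ...
            digits = digits_in_base(i, base)
            point.append(sum(d / base ** (k + 1) for k, d in enumerate(digits)))
        points.append(point)
    return points


points = halton_points(4, 2)  # four points in 2D, using bases 2 and 3
for point in points:
    print(point)
```

Each coordinate uses a different prime base, so the first column is the base-2 sequence and the second column is the base-3 sequence.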
Note
This method is limited to low-dimensional problems (less than 10 variables). At higher dimensions, the performance of the sampling method has been shown to degrade.
To use: call the class with the inputs, and then call the sample_points function.
Example:
# For the first 10 Halton samples in a 2D space (data is an existing dataset):
>>> from idaes.surrogate.pysmo import sampling
>>> b = sampling.HaltonSampling(data, 10, sampling_type="selection")
>>> samples = b.sample_points()

__init__(data_input, number_of_samples=None, sampling_type=None)
Initialization of the HaltonSampling class. Two inputs are required.
Parameters:
 data_input (NumPy Array, Pandas Dataframe or list) – The input data set or range to be sampled.
  When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and the sampling_type option must be set to “selection”. The output variable (Y) is assumed to be supplied in the last column.
  When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal lengths which contain the variable bounds, and the sampling_type option must be set to “creation”. It is assumed that the range contains no output variable information in this case.
 number_of_samples (int) – The number of samples to be generated. Should be a positive integer less than or equal to the number of entries (rows) in data_input.
 sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate samples from a supplied range (“creation”). Default is “creation”.
Returns: self function containing the input information.
Raises:
 ValueError – The data_input is the wrong type.
 Exception – When the number_of_samples is invalid (not an integer, too large, zero or negative).
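To make the “creation” input format concrete, the sketch below maps points from the unit hypercube onto a range given as two lists of equal length. It is an illustration under stated assumptions, not the PySMO code: the helper name scale_to_bounds is hypothetical, and treating the first list as the lower bounds and the second as the upper bounds is an assumption.

```python
def scale_to_bounds(unit_points, bounds):
    """Linearly map points in [0, 1]^p onto the supplied variable ranges."""
    lower, upper = bounds  # assumed order: [lower bounds, upper bounds]
    if len(lower) != len(upper):
        raise ValueError("The two bound lists must have equal lengths.")
    return [
        [lo + u * (hi - lo) for u, lo, hi in zip(point, lower, upper)]
        for point in unit_points
    ]


# Two variables: x1 in [0, 1] and x2 in [10, 20].
bounds = [[0.0, 10.0], [1.0, 20.0]]
unit_points = [[0.5, 0.25], [0.25, 0.75]]
scaled = scale_to_bounds(unit_points, bounds)
print(scaled)  # [[0.5, 12.5], [0.25, 17.5]]
```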

sample_points()
The sample_points method generates the Halton samples. The steps followed here are:
 Determine the number of features in the input data.
 Generate the list of primes to be considered by calling prime_number_generator from the sampling superclass.
 Create the first number_of_samples elements of the Halton sequence for each prime.
 Create the Halton samples by combining the corresponding elements of the Halton sequences for each prime.
 When in “selection” mode, determine the closest corresponding point in the input dataset using Euclidean distance minimization. This is done by calling the nearest_neighbours method in the sampling superclass.
Returns: A NumPy array or Pandas dataframe containing number_of_samples Halton sample points.
Return type: NumPy Array or Pandas Dataframe
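The “selection” step described above can be sketched as a nearest-neighbour lookup. This is not the nearest_neighbours method of the sampling superclass, whose details are not shown here. The helper name nearest_rows is hypothetical, and the sketch assumes the dataset inputs are already scaled to the unit hypercube so that they are comparable with the Halton points.

```python
import math


def nearest_rows(dataset, targets):
    """For each target point, pick the dataset row with the closest inputs."""
    selected = []
    for target in targets:
        # Distances use the input columns only; the last column is the output (Y).
        best = min(dataset, key=lambda row: math.dist(row[:-1], target))
        selected.append(best)
    return selected


# Each row is [x1, x2, y], with inputs assumed to lie in [0, 1].
dataset = [
    [0.1, 0.1, 1.0],
    [0.5, 0.3, 2.0],
    [0.9, 0.8, 3.0],
    [0.3, 0.7, 4.0],
]
halton_targets = [[0.5, 1.0 / 3.0], [0.25, 2.0 / 3.0]]
selected = nearest_rows(dataset, halton_targets)
print(selected)  # [[0.5, 0.3, 2.0], [0.3, 0.7, 4.0]]
```

Note that this simple version can return the same row for two different targets; how duplicates are handled is left open in this sketch.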

References
[1] Loeven et al., “A Probabilistic Radial Basis Function Approach for Uncertainty Quantification”: https://pdfs.semanticscholar.org/48a0/d3797e482e37f73e077893594e01e1c667a2.pdf
[2] Webpage on low discrepancy sampling methods: http://planning.cs.uiuc.edu/node210.html