Halton Sampling
Halton sampling is a deterministic, low-discrepancy sampling method based on the Halton sequence, which is constructed from a set of co-prime bases. The Halton sequence is an n-dimensional extension of the van der Corput sequence: each dimension of the sequence is a radical inverse function defined on a different prime number.
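As a short worked illustration of the radical inverse (the symbols \(\phi_b\), \(d_k\) and \(b_j\) are notation introduced here, not taken from the PySMO documentation): the digits of the index \(i\) in base \(b\) are mirrored about the radix point.

```latex
% Base-b expansion of the index i and its radical inverse
i = \sum_{k=0}^{m} d_k \, b^{k}, \qquad
\phi_b(i) = \sum_{k=0}^{m} d_k \, b^{-(k+1)}

% i-th point of the n-dimensional Halton sequence, with b_j the j-th prime
x_i = \bigl(\phi_{b_1}(i), \, \phi_{b_2}(i), \, \dots, \, \phi_{b_n}(i)\bigr)

% Worked example for i = 6
6 = 110_2 \;\Rightarrow\; \phi_2(6) = \tfrac{0}{2} + \tfrac{1}{4} + \tfrac{1}{8} = \tfrac{3}{8}, \qquad
6 = 20_3 \;\Rightarrow\; \phi_3(6) = \tfrac{0}{3} + \tfrac{2}{9} = \tfrac{2}{9}
```

In two dimensions, the point with index 6 is therefore \((3/8, 2/9)\).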
The pysmo.sampling.HaltonSampling class carries out Halton sampling. This can be done in two modes:
- The samples can be selected from a user-provided dataset, or
- The samples can be generated from a set of provided bounds.
The Halton sampling method is only available for low-dimensional problems (\(n \leq 10\), where \(n\) is the number of input variables). At higher dimensions, the performance of the sampling method has been shown to degrade.
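The second mode can be pictured as mapping points from the unit hypercube onto the supplied bounds. The following is a minimal sketch of that idea only, assuming a linear rescaling and a bounds list that gives the lower bounds first and the upper bounds second. `scale_to_bounds` is a hypothetical helper written for illustration, not a PySMO function, and the unit points are arbitrary values.

```python
def scale_to_bounds(unit_points, bounds):
    """Map points from the unit hypercube onto [lower, upper] for each variable."""
    # Assumed layout: bounds = [[lower bounds...], [upper bounds...]]
    lower, upper = bounds
    if len(lower) != len(upper):
        raise ValueError("The two bound lists must have equal length.")
    return [
        [lo + u * (hi - lo) for u, lo, hi in zip(point, lower, upper)]
        for point in unit_points
    ]


# Two variables: x1 in [0, 1] and x2 in [10, 20]
bounds = [[0.0, 10.0], [1.0, 20.0]]
unit_points = [[0.5, 0.25], [0.25, 0.75]]  # illustrative points in [0, 1)^2
samples = scale_to_bounds(unit_points, bounds)
# samples == [[0.5, 12.5], [0.25, 17.5]]
```

A variable already bounded by [0, 1] is left unchanged, which is why the first coordinate of each sample equals its unit value.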
Available Methods
class idaes.surrogate.pysmo.sampling.HaltonSampling(data_input, number_of_samples=None, sampling_type=None)
A class that performs Halton Sampling.
Halton samples are based on reversing (flipping) the digits of each sample index when it is written in a prime base.
To generate \(n\) samples in a \(p\)-dimensional space, the first \(p\) prime numbers are used as the bases, one per dimension.
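A minimal pure-Python sketch of this construction is shown below. It is not the PySMO implementation: the function names are hypothetical, and starting the sequence at index 1 (which skips the origin) is an assumption.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # A candidate is prime if no smaller prime divides it.
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, weight = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * weight
        weight /= base
    return result


def halton_points(n_samples, n_dims, start=1):
    """First `n_samples` Halton points in the unit hypercube [0, 1)^n_dims."""
    bases = first_primes(n_dims)
    return [
        [radical_inverse(i, b) for b in bases]
        for i in range(start, start + n_samples)
    ]


points = halton_points(4, 2)
# points[0] is (1/2, 1/3) and points[1] is (1/4, 2/3)
```

Each column of `points` is a van der Corput sequence in one prime base. Pairing the columns row by row gives the multi-dimensional samples.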
Note
Use of this method is limited to low-dimensionality problems (no more than 10 variables). At higher dimensions, the performance of the sampling method has been shown to degrade.
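To use: call the class with its inputs, then call the sample_points method.
Example:
# For the first 10 Halton samples in a 2-D space
# (data is an existing dataset with the output in the last column):
>>> from idaes.surrogate.pysmo import sampling
>>> b = sampling.HaltonSampling(data, 10, sampling_type="selection")
>>> samples = b.sample_points()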
__init__(data_input, number_of_samples=None, sampling_type=None)
Initialization of HaltonSampling class. Two inputs are required.
Parameters:
- data_input (NumPy Array, Pandas Dataframe or list) – The input data set or range to be sampled.
  - When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and the sampling_type option must be set to “selection”. The output variable (Y) is assumed to be supplied in the last column.
  - When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal length which contain the variable bounds, and the sampling_type option must be set to “creation”. It is assumed that the range contains no output variable information in this case.
- number_of_samples (int) – The number of samples to be generated. Should be a positive integer less than or equal to the number of entries (rows) in data_input.
- sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate samples from a supplied range (“creation”). Default is “creation”.
Returns: The HaltonSampling object (self) containing the input information.
Raises:
- ValueError – The data_input is the wrong type.
- Exception – When the number_of_samples is invalid (not an integer, too large, zero or negative).
sample_points()
The sample_points method generates the Halton samples. The steps followed here are:
- Determine the number of features in the input data.
- Generate the list of primes to be considered by calling prime_number_generator from the sampling superclass.
- Create the first number_of_samples elements of the Halton sequence for each prime.
- Create the Halton samples by combining the corresponding elements of the Halton sequences for each prime.
- When in “selection” mode, determine the closest corresponding point in the input dataset using Euclidean distance minimization. This is done by calling the nearest_neighbours method in the sampling superclass.
Returns: A NumPy Array or Pandas Dataframe containing number_of_samples Halton sample points.
Return type: NumPy Array or Pandas Dataframe
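The final “selection” step can be sketched as a nearest-neighbour lookup. The code below is an illustration under stated assumptions, not the library's nearest_neighbours method: it assumes the input columns are min-max scaled to [0, 1] before distances are computed, treats the last column as the output, and does nothing to prevent the same row being chosen twice. `min_max_scale` and `select_nearest` are hypothetical names.

```python
import math


def min_max_scale(rows):
    """Scale each column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # Guard against constant columns, which would give a zero span.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def select_nearest(data, unit_points):
    """For each unit-cube point, return the data row whose scaled inputs are closest."""
    inputs = [row[:-1] for row in data]  # last column is assumed to be the output
    scaled = min_max_scale(inputs)
    chosen = []
    for point in unit_points:
        best = min(range(len(data)), key=lambda i: math.dist(scaled[i], point))
        chosen.append(data[best])
    return chosen


# Four rows of (x1, x2, y)
data = [
    [0.0, 0.0, 1.0],
    [0.0, 10.0, 2.0],
    [5.0, 0.0, 3.0],
    [5.0, 10.0, 4.0],
]
# Third and fourth 2-D Halton points (bases 2 and 3)
unit_points = [[0.75, 1 / 9], [0.125, 4 / 9]]
selected = select_nearest(data, unit_points)
# selected == [[5.0, 0.0, 3.0], [0.0, 0.0, 1.0]]
```

Because the lookup returns existing rows, the output values in the last column are carried along with the selected inputs.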