Hammersley Sampling

Hammersley sampling is a low-discrepancy sampling method based on the Hammersley sequence. The Hammersley sequence is the same as the Halton sequence except in the first dimension, where the points are equally spaced.

The pysmo.sampling.HammersleySampling method carries out Hammersley sampling. This can be done in two modes:

  • The samples can be selected from a user-provided dataset, or

  • The samples can be generated from a set of provided bounds.
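For the second mode, the data_input description in this page states that the bounds are given as a list of two equal-length lists. The sketch below shows one way points on the unit cube could be mapped onto such bounds. It assumes the first list holds the lower bounds and the second the upper bounds, and that the mapping is a plain linear rescaling; neither detail is confirmed here, and `scale_to_bounds` is a hypothetical helper, not part of PySMO.

```python
def scale_to_bounds(unit_points, bounds):
    """Map points from [0, 1]^d onto a box described by two bound lists.

    Assumption: bounds = [lower_bounds, upper_bounds]. The ordering is not
    stated in the documentation and is chosen here for illustration only.
    """
    lower, upper = bounds
    if len(lower) != len(upper):
        raise ValueError("bound lists must have equal lengths")
    return [
        [lo + u * (hi - lo) for u, lo, hi in zip(point, lower, upper)]
        for point in unit_points
    ]


# Two variables: x1 in [0, 10] and x2 in [100, 200].
bounds = [[0.0, 100.0], [10.0, 200.0]]
scaled = scale_to_bounds([[0.0, 0.5], [0.25, 0.25], [1.0, 1.0]], bounds)
```

Here `scaled` holds one row per sample, with each column stretched to its own variable range.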

The Hammersley sampling method is only available for low-dimensional problems (\(n \leq 10\)). At higher dimensions, the performance of the sampling method has been shown to degrade.

Available Methods

class idaes.core.surrogate.pysmo.sampling.HammersleySampling(data_input, number_of_samples=None, sampling_type=None, xlabels=None, ylabels=None)

A class that performs Hammersley Sampling.

Hammersley samples are generated in a similar way to Halton samples: by reversing (flipping) the digits of each sample index when it is written in a prime base.

To generate \(n\) samples in a \(p\)-dimensional space, the first \(\left(p-1\right)\) prime numbers are used to generate the samples. The first dimension is obtained by uniformly dividing the region into no_samples points.
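The digit-reversal idea can be illustrated with a short, dependency-free sketch. It is not the PySMO implementation: `radical_inverse` and `hammersley_points` are hypothetical names, and the first dimension is spaced as i/n with indices starting at 0, which is one common convention and an assumption here.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point.

    Example: 6 is 110 in base 2; reversed behind the point it becomes
    0.011 (binary) = 0.375.
    """
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def hammersley_points(n_samples, primes):
    """Return n_samples points in the unit cube of dimension len(primes) + 1.

    Assumption: the first coordinate is i / n_samples; the library may
    place the equally spaced first-dimension points differently.
    """
    return [
        [i / n_samples] + [radical_inverse(i, p) for p in primes]
        for i in range(n_samples)
    ]


# Four points in 2-D: one equally spaced dimension plus one prime (2).
points = hammersley_points(4, primes=[2])
```

With a single prime the result is two-dimensional; passing the first \(\left(p-1\right)\) primes gives a \(p\)-dimensional point set.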

Note

This method is limited to low-dimensional problems (no more than 10 variables, \(n \leq 10\)). At higher dimensionalities, the performance of the sampling method has been shown to degrade.

To use: call the class with the required inputs, then call the sample_points method.

Example:

# For the first 10 Hammersley samples in a 2-D space:
>>> from idaes.core.surrogate.pysmo.sampling import HammersleySampling
>>> b = HammersleySampling(data, 10, sampling_type="selection")
>>> samples = b.sample_points()

__init__(data_input, number_of_samples=None, sampling_type=None, xlabels=None, ylabels=None)

Initialization of HammersleySampling class. Two inputs are required.

Parameters:
  • data_input (NumPy Array, Pandas Dataframe or list) –

    The input data set or range to be sampled.

    • When the aim is to select a set of samples from an existing dataset, the dataset must be a NumPy Array or a Pandas Dataframe and sampling_type option must be set to “selection”. A single output variable (y) is assumed to be supplied in the last column if xlabels and ylabels are not supplied.

    • When the aim is to generate a set of samples from a data range, the dataset must be a list containing two lists of equal lengths which contain the variable bounds and sampling_type option must be set to “creation”. It is assumed that the range contains no output variable information in this case.

  • number_of_samples (int) – The number of samples to be generated. Should be a positive integer less than or equal to the number of entries (rows) in data_input.

  • sampling_type (str) – Option which determines whether the algorithm selects samples from an existing dataset (“selection”) or attempts to generate samples from a supplied range (“creation”). Default is “creation”.

Keyword Arguments:
  • xlabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the independent/input variables. Only used in “selection” mode. Default is None.

  • ylabels (list) – List of column names (if data_input is a dataframe) or column numbers (if data_input is an array) for the dependent/output variables. Only used in “selection” mode. Default is None.

Returns:

self, the initialized object containing the input information.

Raises:

  • ValueError – The input data (data_input) is the wrong type/dimension, or number_of_samples is invalid (too large, zero, or negative).

  • TypeError – number_of_samples is not the right type, or the sampling_type entry is not a string.

  • IndexError – Invalid column names are supplied in xlabels or ylabels.

sample_points()

The sample_points method generates the Hammersley sample points. The steps followed here are:

  1. Determine the number of features \(n_{f}\) in the input data.

  2. Generate the list of \(\left(n_{f}-1\right)\) primes to be considered by calling prime_number_generator.

  3. Divide the space [0, number_of_samples-1] into number_of_samples places to obtain the first dimension for the Hammersley sequence.

  4. For the other \(\left(n_{f}-1\right)\) dimensions, create the first number_of_samples elements of the Hammersley sequence for each of the \(\left(n_{f}-1\right)\) primes.

  5. Create the Hammersley samples by combining the corresponding elements of the Hammersley sequences created in steps 3 and 4.

  6. When in “selection” mode, determine the closest corresponding point in the input dataset using Euclidean distance minimization. This is done by calling the nearest_neighbours method in the sampling superclass.
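Step 2 needs the first \(\left(n_{f}-1\right)\) primes. A minimal sketch of such a generator is given below, using trial division; `first_primes` is a hypothetical stand-in, and the library's own prime_number_generator may be implemented differently.

```python
def first_primes(count):
    """Return the first `count` prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # Only primes up to sqrt(candidate) need to be tested as divisors.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


# A problem with n_f = 5 features needs n_f - 1 = 4 primes.
n_features = 5
bases = first_primes(n_features - 1)
```

Since the method is restricted to at most 10 variables, no more than nine primes are ever required, so a simple trial-division loop is sufficient for this purpose.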

Returns:

A NumPy Array or Pandas Dataframe containing number_of_samples Hammersley sample points.

Return type:

NumPy Array or Pandas Dataframe
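In “selection” mode, the points returned are rows of the input dataset rather than the generated points themselves. The sketch below illustrates the nearest-point matching described in step 6 under stated assumptions: the input columns are min-max scaled to [0, 1] before distances are computed, only input columns are present, and the same row may be picked more than once. `min_max_scale` and `select_nearest` are hypothetical helpers; the actual nearest_neighbours method may differ on each of these points.

```python
import math


def min_max_scale(rows):
    """Scale each column of `rows` linearly onto [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # A constant column has zero span; use 1.0 to avoid dividing by zero.
    spans = [(max(col) - lo) or 1.0 for col, lo in zip(columns, lows)]
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def select_nearest(dataset, unit_samples):
    """For each unit-cube sample, return the closest dataset row.

    Closeness is Euclidean distance in the scaled space (an assumption).
    """
    scaled = min_max_scale(dataset)
    chosen = []
    for target in unit_samples:
        best = min(
            range(len(dataset)),
            key=lambda i: math.dist(scaled[i], target),
        )
        chosen.append(dataset[best])
    return chosen


dataset = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0], [4.0, 50.0]]
selected = select_nearest(dataset, [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
```

Each entry of `selected` is an unmodified row of `dataset`, which is why the number of samples requested in this mode cannot exceed the number of rows available.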

References

[1] Loeven et al paper titled “A Probabilistic Radial Basis Function Approach for Uncertainty Quantification” https://pdfs.semanticscholar.org/48a0/d3797e482e37f73e077893594e01e1c667a2.pdf

[2] Webpage on low discrepancy sampling methods: http://planning.cs.uiuc.edu/node210.html

[3] Holger Dammertz’s webpage titled “Hammersley Points on the Hemisphere” which discusses Hammersley point set generation in two dimensional spaces, http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html