More Information about PySMO’s Sampling Methods
The sampling methods in PySMO can either generate samples from variable bounds or select samples from a user-provided dataset. To use any of the methods, first initialize the class with the required parameters, then call its sample_points method.
Note
The result of the sampling process will be a NumPy array or a Pandas DataFrame, depending on the format of the input data.
Examples
Sample generation from scratch
The following code snippet shows basic usage of the package for generating samples from a set of bounds:
# Required imports
>>> from idaes.core.surrogate.pysmo import sampling as sp
# Lower bounds (first list) and upper bounds (second list) of the 3-D space to be sampled
>>> bounds_list = [[0, 0, 0], [1.2, 0.1, 1]]
# Initialize the Halton sampling method and generate 10 samples
>>> space_init = sp.HaltonSampling(bounds_list, sampling_type='creation', number_of_samples=10)
>>> samples = space_init.sample_points()
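Halton sampling builds each coordinate from the radical inverse of the sample index in a different prime base, then scales the resulting unit-cube point to the bounds. The plain-Python sketch below illustrates that idea for the same bounds_list, assuming the primes 2, 3 and 5 for the three dimensions and an index starting at 1. It is not PySMO's implementation, and HaltonSampling may differ in details such as the starting index; the helpers radical_inverse and halton_points are hypothetical.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of index about the radix point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton_points(bounds, n_samples, primes=(2, 3, 5, 7, 11, 13)):
    """Hypothetical helper: n_samples Halton points scaled to bounds = [lower, upper]."""
    lower, upper = bounds
    if len(lower) > len(primes):
        raise ValueError("add more primes for higher-dimensional spaces")
    points = []
    for i in range(1, n_samples + 1):  # start at 1 to skip the all-zero point
        # One prime base per dimension gives a point in the unit cube
        unit = [radical_inverse(i, primes[d]) for d in range(len(lower))]
        # Linearly map each unit coordinate onto [lower, upper]
        points.append([lo + u * (hi - lo) for lo, hi, u in zip(lower, upper, unit)])
    return points


bounds_list = [[0, 0, 0], [1.2, 0.1, 1]]
samples = halton_points(bounds_list, 10)
```

Because the sequence is deterministic, calling halton_points again with the same arguments always returns the same ten points.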
Sample selection from a dataset
The following code snippet shows basic usage of the package for selecting sample points from an existing dataset:
# Required imports
>>> from idaes.core.surrogate.pysmo import sampling as sp
>>> import pandas as pd
# Load dataset from a csv file
>>> xy_data = pd.read_csv('data.csv', header=None, index_col=0)
# Initialize the CVT sampling method and select 25 samples
>>> space_init = sp.CVTSampling(xy_data, sampling_type='selection', number_of_samples=25)
>>> samples = space_init.sample_points()
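In selection mode the returned samples must be rows of the dataset itself. One common way to achieve this is to generate target points in the input space and then pick the dataset row nearest to each target. The sketch below shows that nearest-neighbour step as an assumption about the general approach, not a description of CVTSampling's internals. The helper select_nearest is hypothetical; it measures Euclidean distance on the input columns only and, as a choice of this sketch, never picks the same row twice.

```python
import math


def select_nearest(dataset, targets):
    """Hypothetical helper: for each target point, return the closest unused dataset row.

    Distance is Euclidean and uses only the first len(target) columns (the inputs),
    so trailing output columns are carried along but ignored when matching.
    """
    if len(targets) > len(dataset):
        raise ValueError("cannot select more rows than the dataset contains")
    available = list(range(len(dataset)))
    selected = []
    for target in targets:
        n_inputs = len(target)
        best = min(available, key=lambda i: math.dist(dataset[i][:n_inputs], target))
        available.remove(best)  # never pick the same row twice
        selected.append(dataset[best])
    return selected


# Hypothetical dataset: two inputs followed by one output per row
data = [[0.0, 0.0, 1.0], [1.0, 1.0, 2.0], [0.5, 0.5, 3.0], [0.9, 0.1, 4.0]]
targets = [[0.1, 0.1], [0.95, 0.9]]
chosen = select_nearest(data, targets)
```

Here the first target lands on the row [0.0, 0.0, 1.0] and the second on [1.0, 1.0, 2.0]; each selected row keeps its output value, which is what makes selection useful for building surrogate training sets.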
Note
In the above case, it is assumed that the dataset has only one output variable and that it is in the last column. When there are multiple output columns, the xlabels and ylabels options must be specified, as in the following example.
The following code snippet shows basic usage of the package for subsampling a dataset with multiple output variables:
# Required imports
>>> from idaes.core.surrogate.pysmo import sampling as sp
>>> import pandas as pd
# Load dataset from a csv file whose header row names the columns x1, x2, y1, y2
>>> xy_data = pd.read_csv('data.csv', header=0, index_col=0)
# Initialize the CVT sampling method and select 25 samples
>>> space_init = sp.CVTSampling(xy_data, sampling_type='selection', number_of_samples=25, xlabels=['x1', 'x2'], ylabels=['y1', 'y2'])
>>> samples = space_init.sample_points()
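The multi-output example refers to the input columns x1, x2 and the output columns y1, y2 by name, so data.csv needs a header row with those names, and pd.read_csv must read that row as the header (for example header=0 rather than header=None, which would number the columns instead). The sketch below builds CSV text in that layout with the standard csv module; the data and the helper make_demo_csv are hypothetical, and the leading unnamed column is the one consumed by index_col=0.

```python
import csv
import io
import random


def make_demo_csv(n_rows=50, seed=0):
    """Hypothetical helper: CSV text with an index column, inputs x1, x2 and outputs y1, y2."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["", "x1", "x2", "y1", "y2"])  # blank first cell: unnamed index column
    for i in range(n_rows):
        x1, x2 = rng.random(), rng.random()
        # Made-up outputs; any two functions of the inputs would do
        writer.writerow([i, x1, x2, x1 + x2, x1 * x2])
    return buffer.getvalue()


csv_text = make_demo_csv()
```

Saving csv_text as data.csv gives a file that pd.read_csv('data.csv', header=0, index_col=0) loads with the columns x1, x2, y1 and y2.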
Warning
The user must take care to be explicit when subsampling multi-output data:
When both xlabels and ylabels are specified, any columns not present in either list will be dropped from the dataset.
If only one of xlabels or ylabels is specified, all the columns not present in the provided list are automatically assigned to the unspecified option (ylabels or xlabels, respectively).
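The column-assignment rules in this section can be summarized as a small function. resolve_labels is a hypothetical helper written for illustration, not part of PySMO; it mirrors the documented behavior: with no labels the last column is the single output, with both lists any other columns are dropped, and with one list the remaining columns go to the other option. Raising an error for labels that are not in the dataset is an assumption of this sketch.

```python
def resolve_labels(columns, xlabels=None, ylabels=None):
    """Hypothetical helper: split dataset columns into (input labels, output labels)."""
    columns = list(columns)
    unknown = [c for c in list(xlabels or []) + list(ylabels or []) if c not in columns]
    if unknown:
        raise ValueError(f"labels not found in dataset: {unknown}")
    if xlabels is None and ylabels is None:
        # Default: a single output variable in the last column
        return columns[:-1], columns[-1:]
    if xlabels is not None and ylabels is not None:
        # Both given: columns in neither list are dropped
        return list(xlabels), list(ylabels)
    if xlabels is not None:
        # Only inputs given: every other column becomes an output
        return list(xlabels), [c for c in columns if c not in xlabels]
    # Only outputs given: every other column becomes an input
    return [c for c in columns if c not in ylabels], list(ylabels)


cols = ["x1", "x2", "y1", "y2", "note"]
inputs, outputs = resolve_labels(cols, xlabels=["x1", "x2"])
```

In this usage the unlisted column "note" ends up among the outputs, which is exactly the kind of surprise the warning above is about; passing both lists would drop it instead.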
Note
If the input data to be subsampled is a NumPy array, xlabels and ylabels should be lists of column indices.
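A NumPy array has no column names, so each label is an integer column index. The snippet below, using hypothetical data, shows how such index lists pick out the input and output columns with ordinary NumPy integer-array indexing; the commented PySMO call follows the same pattern as the CVTSampling examples, with integer labels in place of names.

```python
import numpy as np

# Hypothetical data: columns 0-1 are inputs, columns 2-3 are outputs
data = np.array([
    [0.0, 0.1, 1.0, 2.0],
    [0.5, 0.2, 1.5, 2.5],
    [1.0, 0.3, 2.0, 3.0],
])
xlabels = [0, 1]  # column indices of the input variables
ylabels = [2, 3]  # column indices of the output variables

# Integer index lists select whole columns
x = data[:, xlabels]
y = data[:, ylabels]

# With PySMO, the same index lists would be passed as labels, e.g.
# sp.CVTSampling(data, sampling_type='selection', number_of_samples=2,
#                xlabels=xlabels, ylabels=ylabels)
```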
Characteristics of sampling methods available in PySMO
Method          Deterministic  Stochastic  Low-discrepancy  Space-filling  Geometric
LHS                            ✓                            ✓
Full-factorial  ✓                                                          ✓
Halton          ✓                          ✓
Hammersley      ✓                          ✓
CVT                            ✓                            ✓              ✓
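To make the Stochastic and Space-filling entries concrete for LHS: Latin hypercube sampling splits each dimension into as many equal-width strata as there are samples, places one point at a random position inside each stratum, and randomly pairs the strata across dimensions. The sketch below illustrates that textbook construction in plain Python with the hypothetical helper latin_hypercube; it is not PySMO's LHS implementation.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Hypothetical helper: textbook Latin hypercube design scaled to bounds = [lower, upper]."""
    rng = random.Random(seed)
    lower, upper = bounds
    columns = []
    for lo, hi in zip(lower, upper):
        # One random position inside each of n_samples equal-width strata of [0, 1)
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(unit)  # random pairing of strata across dimensions
        columns.append([lo + u * (hi - lo) for u in unit])
    # Transpose the per-dimension columns into a list of points
    return [list(point) for point in zip(*columns)]


design = latin_hypercube([[0.0, 0.0], [1.0, 10.0]], 5, seed=1)
```

Different seeds give different designs (the stochastic property), while every stratum of every dimension is covered exactly once (space-filling in each one-dimensional projection). A deterministic method such as Halton, by contrast, returns the same points on every call.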