Profiler Options
Options that can be specified when running the data profiler.
-
class dataprofiler.profilers.profiler_options.BaseOption
Bases: object
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
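Because properties returns a copy, changing the returned dict does not change the option itself. The sketch below illustrates this, assuming the copy is a deep copy of the instance's attribute dict; ToyBooleanOption and ToyContainer are hypothetical classes, not DataProfiler code.

```python
import copy


class ToyBooleanOption:
    """Hypothetical stand-in for an option class; not DataProfiler code."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    @property
    def properties(self):
        # Assumption: deep-copy the attribute dict so edits never reach the option.
        return copy.deepcopy(self.__dict__)


class ToyContainer(ToyBooleanOption):
    """Hypothetical option that nests another option, like NumericalOptions.min."""

    def __init__(self):
        super().__init__()
        self.min = ToyBooleanOption()


opt = ToyContainer()
props = opt.properties
props["is_enabled"] = False
props["min"].is_enabled = False  # the nested option was copied too
print(opt.is_enabled, opt.min.is_enabled)  # True True
```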
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
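A minimal sketch of the set contract, assuming nested options are addressed with dot-separated keys such as "min.is_enabled" (our understanding of the library, not something this page states). ToyOption and its subclasses are hypothetical and much simpler than the real classes; in this sketch unknown keys raise AttributeError and non-dict input raises ValueError.

```python
class ToyOption:
    """Hypothetical base class modelling the documented set() contract."""

    def set(self, options):
        # Only a dict is accepted; it may hold every option or just a subset.
        if not isinstance(options, dict):
            raise ValueError("The options must be a dictionary.")
        for key, value in options.items():
            *path, attr = key.split(".")
            target = self
            for name in path:  # walk e.g. "min.is_enabled" down to self.min
                child = getattr(target, name, None)
                if not isinstance(child, ToyOption):
                    raise AttributeError(f"Option '{key}' does not exist.")
                target = child
            if not hasattr(target, attr):
                raise AttributeError(f"Option '{key}' does not exist.")
            setattr(target, attr, value)


class ToyBooleanOption(ToyOption):
    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ToyNumericalOptions(ToyBooleanOption):
    def __init__(self):
        super().__init__()
        self.min = ToyBooleanOption()
        self.max = ToyBooleanOption()


opts = ToyNumericalOptions()
opts.set({"min.is_enabled": False})  # a subset: only min changes
print(opts.is_enabled, opts.min.is_enabled, opts.max.is_enabled)  # True False True
```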
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
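The raise_error flag switches validate between raising and returning. This sketch assumes that errors are collected first and then raised together as a single ValueError; the exception type, the message text, and the _collect_errors helper are all hypothetical.

```python
class ToyBooleanOption:
    """Hypothetical option used to model the validate() contract."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    def _collect_errors(self, path="ToyBooleanOption"):
        # Gather every problem rather than stopping at the first one.
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append(f"{path}.is_enabled must be a Boolean.")
        return errors

    def validate(self, raise_error=True):
        errors = self._collect_errors()
        if not raise_error:
            return errors  # the caller inspects the list
        if errors:
            raise ValueError("\n".join(errors))
        return None


bad = ToyBooleanOption(is_enabled="yes")
print(bad.validate(raise_error=False))  # ['ToyBooleanOption.is_enabled must be a Boolean.']
```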
-
class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled=True)
Bases: dataprofiler.profilers.profiler_options.BaseOption
Boolean option.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.HistogramOption(is_enabled=True, bin_count_or_method='auto')
Bases: dataprofiler.profilers.profiler_options.BooleanOption
Options for histograms.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
bin_count_or_method (Union[str, int, list(str)]) – the number of bins, or the method(s) used to calculate the histogram bins
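NumPy's histogram functions accept either an integer bin count or a string estimator name for their bins argument, which suggests what bin_count_or_method can hold. The hypothetical helper below checks a value against the documented type, Union[str, int, list(str)], assuming the valid names are NumPy's estimators; DataProfiler's actual list and error messages may differ.

```python
# Assumed accepted method names: the string bin estimators understood by
# numpy.histogram_bin_edges. DataProfiler's own list may differ.
VALID_METHODS = {"auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt"}


def check_bin_count_or_method(value, path="HistogramOption.bin_count_or_method"):
    """Hypothetical helper: return a list of error strings (empty if valid)."""
    message = f"{path} must be a positive int, a method name, or a list of method names."
    if isinstance(value, bool):  # bool subclasses int, so reject it explicitly
        return [message]
    if isinstance(value, int):  # a fixed bin count
        return [] if value > 0 else [message]
    methods = [value] if isinstance(value, str) else value
    if isinstance(methods, list) and methods and all(
        isinstance(m, str) and m in VALID_METHODS for m in methods
    ):
        return []
    return [message]


for candidate in ("auto", 10, ["fd", "sturges"], 0, "median"):
    print(repr(candidate), check_bin_count_or_method(candidate) == [])
```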
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.BaseInspectorOptions(is_enabled=True)
Bases: dataprofiler.profilers.profiler_options.BooleanOption
Base options for all the columns.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
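A sketch of how is_prop_enabled might resolve a name, assuming a nested BooleanOption reports its own is_enabled, a plain bool is returned directly, and an unknown name raises AttributeError. ToyBooleanOption and ToyInspectorOptions are hypothetical classes.

```python
class ToyBooleanOption:
    """Hypothetical stand-in for BooleanOption."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ToyInspectorOptions(ToyBooleanOption):
    """Hypothetical column options with nested boolean sub-options."""

    def __init__(self):
        super().__init__()
        self.min = ToyBooleanOption()
        self.variance = ToyBooleanOption(is_enabled=False)

    def is_prop_enabled(self, prop):
        if not hasattr(self, prop):
            raise AttributeError(f"Property '{prop}' does not exist.")
        value = getattr(self, prop)
        # A nested option reports its own flag; a plain bool is returned as-is.
        if isinstance(value, ToyBooleanOption):
            return value.is_enabled
        return bool(value)


opts = ToyInspectorOptions()
print(opts.is_prop_enabled("min"), opts.is_prop_enabled("variance"))  # True False
```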
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.NumericalOptions
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Numerical Stats Mixin.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
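The property behaves like an any() over the individual stats, and the Variables entry notes that it can also enable/disable all numeric stats at once. A minimal model of both directions, with hypothetical class names; the setter behaviour is inferred from that Variables entry rather than taken from the library source.

```python
NUMERIC_STATS = ("min", "max", "sum", "variance", "histogram_and_quantiles")


class ToyBooleanOption:
    """Hypothetical stand-in for BooleanOption."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ToyNumericalOptions(ToyBooleanOption):
    """Hypothetical model of NumericalOptions' numeric-stats switches."""

    def __init__(self):
        super().__init__()
        for stat in NUMERIC_STATS:
            setattr(self, stat, ToyBooleanOption())

    @property
    def is_numeric_stats_enabled(self):
        # True as soon as any single numeric stat is switched on.
        return any(getattr(self, stat).is_enabled for stat in NUMERIC_STATS)

    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Assumed behaviour: flip every numeric stat at once.
        for stat in NUMERIC_STATS:
            getattr(self, stat).is_enabled = value


opts = ToyNumericalOptions()
opts.is_numeric_stats_enabled = False
opts.sum.is_enabled = True
print(opts.is_numeric_stats_enabled)  # True: one stat (sum) is enabled
```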
-
property properties
Includes at least:
is_enabled – turns the column on or off.
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.IntOptions
Bases: dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Int Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Includes at least:
is_enabled – turns the column on or off.
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.PrecisionOptions(is_enabled=True, sample_ratio=None)
Bases: dataprofiler.profilers.profiler_options.BooleanOption
Options for precision.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
sample_ratio (float) – ratio of valid float samples used in determining precision. This ratio overrides any defaults.
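A hypothetical check for sample_ratio, assuming None means "use the defaults" and any other value must be a number between 0 and 1. The rounding rule in samples_for_precision is likewise an illustrative assumption, not the library's behaviour.

```python
import math


def check_sample_ratio(sample_ratio, path="PrecisionOptions.sample_ratio"):
    """Hypothetical check: None keeps the defaults, otherwise a number in [0, 1]."""
    if sample_ratio is None:
        return []
    message = f"{path} must be a float between 0 and 1."
    if isinstance(sample_ratio, bool) or not isinstance(sample_ratio, (int, float)):
        return [message]
    return [] if 0 <= sample_ratio <= 1 else [message]


def samples_for_precision(n_valid_floats, sample_ratio):
    """Hypothetical: number of valid float values a ratio would select, rounded up."""
    return math.ceil(n_valid_floats * sample_ratio)


print(check_sample_ratio(0.25), samples_for_precision(1000, 0.25))  # [] 250
```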
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.FloatOptions
Bases: dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Float Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Includes at least:
is_enabled – turns the column on or off.
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.TextOptions
Bases: dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Text Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
vocab (BooleanOption) – boolean option to enable/disable vocab
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Includes at least:
is_enabled – turns the column on or off.
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.DateTimeOptions
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Datetime Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.OrderOptions
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Order Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.CategoricalOptions
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Categorical Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.DataLabelerOptions
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Data Labeler Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
data_labeler_dirpath (str) – path of the directory from which to load the data labeler
max_sample_size (int) – integer that sets the maximum sample size
data_labeler_object (BaseDataLabeler) – DataLabeler object used in the profiler
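A hypothetical validation sketch for the DataLabelerOptions fields documented here, assuming data_labeler_dirpath must be a string and max_sample_size a positive integer whenever they are set. The function name and messages are invented for illustration, and the directory path is a placeholder.

```python
def check_data_labeler_options(data_labeler_dirpath=None, max_sample_size=None,
                               path="DataLabelerOptions"):
    """Hypothetical validation of data_labeler_dirpath and max_sample_size."""
    errors = []
    if data_labeler_dirpath is not None and not isinstance(data_labeler_dirpath, str):
        errors.append(f"{path}.data_labeler_dirpath must be a string.")
    if max_sample_size is not None and (
        isinstance(max_sample_size, bool)  # bool subclasses int; reject it
        or not isinstance(max_sample_size, int)
        or max_sample_size <= 0
    ):
        errors.append(f"{path}.max_sample_size must be a positive integer.")
    return errors


print(check_data_labeler_options("/tmp/labeler", 0))
# ['DataLabelerOptions.max_sample_size must be a positive integer.']
```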
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.TextProfilerOptions(is_enabled=True, is_case_sensitive=True, stop_words=None)
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions
Constructs the TextProfilerOptions object with default values.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
is_case_sensitive (bool) – option set for case sensitivity.
stop_words (Union[None, list(str)]) – option set for stop words.
words (BooleanOption) – option set for word update.
vocab (BooleanOption) – option set for vocab update.
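To show what is_case_sensitive and stop_words control, the hypothetical word counter below applies both to a short string. It treats stop_words=None as "no stop words"; the library may instead fall back to a default list, and its real tokenization is likely more involved than str.split().

```python
from collections import Counter


def count_words(text, is_case_sensitive=True, stop_words=None):
    """Hypothetical word counter showing what the two options control."""
    stop = set(stop_words or [])
    if not is_case_sensitive:
        stop = {w.lower() for w in stop}
    counts = Counter()
    for word in text.split():
        key = word if is_case_sensitive else word.lower()
        if key not in stop:  # skip stop words after case folding (if any)
            counts[key] += 1
    return counts


sample = "The cat and the Cat"
print(count_words(sample))
print(count_words(sample, is_case_sensitive=False, stop_words=["the", "and"]))
```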
-
is_prop_enabled(prop)
Checks whether a property is enabled.
- Parameters
prop (str) – name of the option to check
- Returns
whether the property is enabled
- Return type
bool
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.StructuredOptions
Bases: dataprofiler.profilers.profiler_options.BaseOption
Constructs the StructuredOptions object with default values.
- Variables
int (IntOptions) – option set for int profiling.
float (FloatOptions) – option set for float profiling.
datetime (DateTimeOptions) – option set for datetime profiling.
text (TextOptions) – option set for text profiling.
order (OrderOptions) – option set for order profiling.
category (CategoricalOptions) – option set for category profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
-
property enabled_profiles
Returns a list of the enabled column profilers.
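A minimal sketch of enabled_profiles, assuming it simply lists the option sets whose is_enabled flag is True, in declaration order. ToyStructuredOptions is hypothetical; its profiler names mirror the Variables list of StructuredOptions.

```python
class ToyBooleanOption:
    """Hypothetical stand-in for an option set with an is_enabled flag."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ToyStructuredOptions:
    """Hypothetical container with one option set per column profiler."""

    PROFILERS = ("int", "float", "datetime", "text", "order", "category", "data_labeler")

    def __init__(self):
        for name in self.PROFILERS:
            setattr(self, name, ToyBooleanOption())

    @property
    def enabled_profiles(self):
        # Keep the declaration order so the result is deterministic.
        return [name for name in self.PROFILERS if getattr(self, name).is_enabled]


opts = ToyStructuredOptions()
opts.data_labeler.is_enabled = False
opts.order.is_enabled = False
print(opts.enabled_profiles)
# ['int', 'float', 'datetime', 'text', 'category']
```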
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.UnstructuredOptions
Bases: dataprofiler.profilers.profiler_options.BaseOption
Constructs the UnstructuredOptions object with default values.
- Variables
text (TextProfilerOptions) – option set for text profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
-
property enabled_profiles
Returns a list of the enabled profilers.
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Sets the options. Pass a dict that contains all of the appropriate options or a subset of them; the corresponding option values are updated. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)
-
class dataprofiler.profilers.profiler_options.ProfilerOptions
Bases: dataprofiler.profilers.profiler_options.BaseOption
Initializes the ProfilerOptions object.
- Variables
structured_options (StructuredOptions) – option set for structured dataset profiling.
unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.
-
property properties
Returns a copy of the option properties.
- Returns
dictionary of the option’s properties as attr: value pairs
- Return type
dict
-
set(options)
Overrides BaseOption.set because the type (structured/unstructured) may need to be specified when the same option exists in both self.structured_options and self.unstructured_options.
- Parameters
options (dict) – dictionary of options to set
- Returns
None
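A sketch of why ProfilerOptions overrides set, assuming that a key prefixed with structured_options. or unstructured_options. targets only that side, while an unprefixed key is applied to every side that defines it. The flat ToyOptionSet and ToyProfilerOptions are hypothetical simplifications of the real nested option objects.

```python
class ToyOptionSet:
    """Hypothetical option set: a flat map of dotted option names to values."""

    def __init__(self, values):
        self.values = dict(values)


class ToyProfilerOptions:
    """Hypothetical model of routing set() keys to structured/unstructured options."""

    SIDES = ("structured_options", "unstructured_options")

    def __init__(self):
        self.structured_options = ToyOptionSet(
            {"int.is_enabled": True, "data_labeler.is_enabled": True})
        self.unstructured_options = ToyOptionSet(
            {"text.is_enabled": True, "data_labeler.is_enabled": True})

    def set(self, options):
        for key, value in options.items():
            head, _, rest = key.partition(".")
            if head in self.SIDES:
                targets, name = [getattr(self, head)], rest  # explicit side only
            else:
                # Unprefixed key: apply it wherever the option exists (assumption).
                targets, name = [getattr(self, s) for s in self.SIDES], key
            matched = [t for t in targets if name in t.values]
            if not matched:
                raise AttributeError(f"Option '{key}' does not exist.")
            for target in matched:
                target.values[name] = value


opts = ToyProfilerOptions()
opts.set({"data_labeler.is_enabled": False})            # both option sets
opts.set({"structured_options.int.is_enabled": False})  # structured only
print(opts.structured_options.values)
print(opts.unstructured_options.values)
```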
-
validate(raise_error=True)
Validates that the options do not conflict or cause errors; raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, errors are raised; if False, they are returned.
- Returns
list of errors (if raise_error is False)
- Return type
list(str)