Column Profile Compilers
class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series=None, options=None, pool=None)
Bases: object

abstract property profile

diff(other, options=None)
Finds the difference between two compilers and returns the report.
- Parameters
other (BaseCompiler) – profile compiler to compare against this one.
- Returns
difference of the profiles
- Return type
dict
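The exact structure of the report returned by diff is not spelled out here. As a rough illustration only, the hypothetical helper `diff_profiles` below compares two profile dicts key by key. It assumes numeric values are reported as this minus other, equal values as "unchanged", and differing values as a [this, other] pair; it is a sketch, not DataProfiler's implementation.

```python
from typing import Any, Dict


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_profiles(this: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: key-by-key difference of two profile reports.

    A key missing on one side is treated as None.
    """
    keys = list(this) + [k for k in other if k not in this]
    result: Dict[str, Any] = {}
    for key in keys:
        a, b = this.get(key), other.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            result[key] = diff_profiles(a, b)  # recurse into nested reports
        elif _is_number(a) and _is_number(b):
            result[key] = a - b  # numeric: this minus other
        elif a == b:
            result[key] = "unchanged"
        else:
            result[key] = [a, b]  # differing non-numeric values
    return result


report = diff_profiles(
    {"data_type": "int", "statistics": {"min": 1, "max": 10}},
    {"data_type": "float", "statistics": {"min": 3, "max": 10}},
)
print(report)
# {'data_type': ['int', 'float'], 'statistics': {'min': -2, 'max': 0}}
```

Subtracting numbers rather than flagging them keeps the size of a change visible; note that under this assumed rule an equal numeric value shows up as 0, not "unchanged".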

update_profile(df_series, pool=None)
Updates the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler

abstract property
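To illustrate the contract documented above (an abstract profile property, and update_profile(df_series, pool=None) returning self), here is a minimal self-contained sketch. The classes `SketchCompiler`, `LengthCompiler`, `CountProfiler` and `MaxLengthProfiler` are hypothetical and only mimic the interface; in particular, dispatching one `apply_async` call per sub-profiler when a pool is given is an assumption, not DataProfiler's code. Any iterable of str, such as a pandas Series, can be passed as df_series.

```python
import abc
from typing import Dict, Tuple


class CountProfiler:
    """Hypothetical sub-profiler: counts the values it has seen."""

    def __init__(self) -> None:
        self.count = 0

    def update(self, values: list) -> "CountProfiler":
        self.count += len(values)
        return self


class MaxLengthProfiler:
    """Hypothetical sub-profiler: tracks the length of the longest string."""

    def __init__(self) -> None:
        self.max_len = 0

    def update(self, values: list) -> "MaxLengthProfiler":
        self.max_len = max([self.max_len] + [len(v) for v in values])
        return self


class SketchCompiler(abc.ABC):
    """Hypothetical stand-in that mimics the documented BaseCompiler interface."""

    _profiler_classes: Tuple[type, ...] = ()  # set by concrete subclasses

    def __init__(self, df_series=None, options=None, pool=None):
        self.options = options
        self._profiles: Dict[str, object] = {
            cls.__name__: cls() for cls in self._profiler_classes
        }
        if df_series is not None:
            self.update_profile(df_series, pool=pool)

    @property
    @abc.abstractmethod
    def profile(self) -> dict:
        """Each concrete compiler decides how its sub-profiles are reported."""

    def update_profile(self, df_series, pool=None):
        values = list(df_series)  # values are assumed to be str, as documented
        if pool is None:
            # Single process: update each sub-profiler in turn.
            for name, profiler in self._profiles.items():
                self._profiles[name] = profiler.update(values)
        else:
            # Submit every update first, then collect the results. With a
            # process pool the returned objects are copies, so store them back.
            pending = {
                name: pool.apply_async(profiler.update, (values,))
                for name, profiler in self._profiles.items()
            }
            for name, result in pending.items():
                self._profiles[name] = result.get()
        return self  # returning self allows chained calls


class LengthCompiler(SketchCompiler):
    _profiler_classes = (CountProfiler, MaxLengthProfiler)

    @property
    def profile(self) -> dict:
        return {
            "count": self._profiles["CountProfiler"].count,
            "max_len": self._profiles["MaxLengthProfiler"].max_len,
        }


compiler = LengthCompiler(["a", "bcd"])
print(compiler.update_profile(["hello"]).profile)  # {'count': 3, 'max_len': 5}
```

Because the sketch stores whatever each `update` returns, it behaves the same with no pool, a thread pool, or a process pool, where updated sub-profilers come back as pickled copies.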

class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series=None, options=None, pool=None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

property profile

property selected_data_type
Finds the selected data type in a primitive compiler.
- Returns
name of the selected data type
- Return type
str
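The reference only states that selected_data_type returns the name of the selected type. One plausible selection rule, assumed here purely for illustration, is to test candidate types from most to least specific and return the first one that matches at least a threshold fraction of the values. The function `select_data_type`, the candidate names, their order, the 1.0 default threshold and the single `%Y-%m-%d` date format are all hypothetical choices, not DataProfiler's.

```python
from datetime import datetime
from typing import Callable, List, Optional, Tuple


def _is_datetime(value: str) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumption: ISO dates only
        return True
    except ValueError:
        return False


def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


# Ordered from most to least specific; "text" accepts every value.
CANDIDATES: List[Tuple[str, Callable[[str], bool]]] = [
    ("datetime", _is_datetime),
    ("int", _is_int),
    ("float", _is_float),
    ("text", lambda value: True),
]


def select_data_type(values: List[str], threshold: float = 1.0) -> Optional[str]:
    """Hypothetical rule: first candidate whose match ratio reaches threshold."""
    if not values:
        return None  # nothing to decide from
    for name, matches in CANDIDATES:
        ratio = sum(matches(v) for v in values) / len(values)
        if ratio >= threshold:
            return name
    return None  # not reached while "text" accepts every value


print(select_data_type(["1", "2", "3"]))  # int
print(select_data_type(["1", "2.5"]))  # float
print(select_data_type(["2021-01-01", "abc"]))  # text
```

Ordering matters here: every int string also parses as a float, so int must be checked first for it ever to be selected.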

diff(other, options=None)
Finds the difference between two compilers and returns the report.
- Parameters
other (ColumnPrimitiveTypeProfileCompiler) – profile compiler to compare against this one.
- Returns
difference of the profiles
- Return type
dict

update_profile(df_series, pool=None)
Updates the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler

property

class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series=None, options=None, pool=None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

property profile

diff(other, options=None)
Finds the difference between two compilers and returns the report.
- Parameters
other (BaseCompiler) – profile compiler to compare against this one.
- Returns
difference of the profiles
- Return type
dict

update_profile(df_series, pool=None)
Updates the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler

property

class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series=None, options=None, pool=None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

property profile

diff(other, options=None)
Finds the difference between two compilers and returns the report.
- Parameters
other (BaseCompiler) – profile compiler to compare against this one.
- Returns
difference of the profiles
- Return type
dict

update_profile(df_series, pool=None)
Updates the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler

property

class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series=None, options=None, pool=None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

property profile

diff(other, options=None)
Finds the difference between two compilers and returns the report.
- Parameters
other (BaseCompiler) – profile compiler to compare against this one.
- Returns
difference of the profiles
- Return type
dict

update_profile(df_series, pool=None)
Updates the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler

property