CSV Data
- class dataprofiler.data_readers.csv_data.CSVData(input_file_path=None, data=None, options=None)
Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData
SpreadsheetData class to save and load spreadsheet data.
Data class for loading datasets of type CSV. The data can be supplied either in memory or via a file path. Options pertaining to the CSV may also be specified using the options dict parameter. Possible Options:
options = dict(
    delimiter= type: str
    data_format= type: str, choices: "dataframe", "records"
    selected_columns= type: list(str)
    header= type: any
)
delimiter: delimiter used to decipher the csv input file
data_format: user selected format in which to return data; can only be of the specified types
selected_columns: columns being selected from the entire dataset
header: any information pertaining to the file header
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
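A minimal sketch of how the options dict might be assembled and passed to the constructor. The helper names `build_csv_options` and `load_csv` are hypothetical and not part of DataProfiler; only the option keys and the `CSVData(input_file_path=..., options=...)` signature come from the description above. The `header` option is left out because its accepted values are not specified here.

```python
def build_csv_options(delimiter=",", data_format="dataframe", selected_columns=None):
    """Assemble the options dict described above (hypothetical helper)."""
    # The documented choices for data_format are "dataframe" and "records".
    if data_format not in ("dataframe", "records"):
        raise ValueError("data_format must be 'dataframe' or 'records'")
    options = {"delimiter": delimiter, "data_format": data_format}
    if selected_columns is not None:
        options["selected_columns"] = list(selected_columns)
    return options


def load_csv(path, **option_kwargs):
    """Load a CSV file through CSVData; requires the dataprofiler package."""
    # Imported lazily so build_csv_options works without dataprofiler installed.
    from dataprofiler.data_readers.csv_data import CSVData

    return CSVData(input_file_path=path, options=build_csv_options(**option_kwargs))
```

With DataProfiler installed, something like `load_csv("people.csv", delimiter=";", selected_columns=["name", "age"]).data` would then return the loaded data in the chosen format; the file name and column names are placeholders.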
- data_type = 'csv'
- property selected_columns
- property delimiter
- property quotechar
- property header
- property data
- property data_format
- property file_encoding
- get_batch_generator(batch_size)
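This page does not describe `get_batch_generator`. Assuming it yields the loaded data in consecutive chunks of at most `batch_size` rows, the pattern can be sketched on a plain list as below. The function `batch_generator` is a hypothetical stand-in, not the library's implementation.

```python
def batch_generator(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Five rows split into batches of two; the last batch holds the remainder.
rows = [["a", 1], ["b", 2], ["c", 3], ["d", 4], ["e", 5]]
batches = list(batch_generator(rows, 2))
```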
- info = None
- property length
Returns the length of the dataset which is loaded.
- Returns
length of the dataset
- classmethod is_match(file_path, options=None)
Test the first 1000 lines of a given file to check whether it has a valid delimited format.
- Parameters
file_path (str) – path to the file to be examined
options (dict) – delimiter read options, e.g. dict(delimiter=",")
- Returns
whether the file is a CSV file
- Return type
bool
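The kind of check that `is_match` describes could work roughly as follows. This is a standard-library sketch, not DataProfiler's implementation: the function name `looks_delimited` and the rule "every non-empty row has the same column count, and that count is greater than one" are assumptions made for illustration.

```python
import csv
from itertools import islice


def looks_delimited(lines, delimiter=",", max_lines=1000):
    """Return True if the first max_lines lines form a consistent delimited table."""
    sample = list(islice(lines, max_lines))
    if not sample:
        return False
    # csv.reader handles quoted fields that contain the delimiter.
    reader = csv.reader(sample, delimiter=delimiter)
    widths = {len(row) for row in reader if row}
    # Exactly one column count across all rows, and more than one column.
    return len(widths) == 1 and widths.pop() > 1
```

Any iterable of text lines can be passed, including an open file handle. The library call itself takes a path, as in `CSVData.is_match(file_path, options=dict(delimiter=","))`.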
- reload(input_file_path=None, data=None, options=None)
Reload the data class with a new dataset. This erases all existing data/options and replaces them with the input data/options.
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
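Because `reload` erases the existing options, any option that should still apply has to be passed again. A small sketch of that pattern follows; the wrapper `switch_dataset` is hypothetical and works with any object exposing the `reload` signature shown above.

```python
def switch_dataset(reader, new_path, delimiter=","):
    """Point an existing reader at a new file, restating the options it needs."""
    # reload() discards the previous options, so pass them again explicitly.
    reader.reload(input_file_path=new_path, options=dict(delimiter=delimiter))
    return reader
```

For example, a reader created for a semicolon-separated file would need `delimiter=";"` supplied again when it is reloaded with another semicolon-separated file.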