Quickstart¶
Installation¶
files2db is available on conda-forge, so you can install it with the following command:
conda install -c conda-forge files2db
Table that you will need¶
One excel file with 3 sheets named: Files, FieldsRules, ValuesMap, or 3 different CSV files.
The columns expected for each of these files are the following, but you can find template for each of them in the assets folder.
Files table¶
This file contains the path to each file to aggregate to the database, with information on how to read the data. You therefore need the following columns:
FilePath: the absolute path to the fileSheetName: the sheet name in case of an Excel fileSeparator: character used to separate the columns (e.g.;,\t,), needed for plain text files (e.g..csv,.txt,.tsv) (Note:,cannot be used as it is used to split the modalities in the FieldsRules table)Header: the line number where the column names are writtenLineStart: the starting line of the dataLineEnd: the ending line of the dataColStart: the starting column of the dataColEnd: the ending column of the dataToAdd: a true/false value telling if this file should be aggregated or notAsCorrection: a true/false value telling if this file is a special file to be used for data correctionmeta_*: any column starting withmeta_will be added as an additional value for each row of the file.
| FilePath | SheetName | Separator | Header | LineStart | LineEnd | ColStart | ColEnd | ToAdd | AsCorrection | meta_DateFileCreation | meta_ColE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| test_dataset/test1/fileA.xlsx | Feuil1 | 3 | 5 | 6 | C | E | 1 | 2021-12-01 | |||
| test_dataset/test1/fileB.csv | 1 | 2 | 3 | A | C | 1 | 2024.02.01 | MetaNewValue | |||
| test_dataset/test1/fileC.csv | ; | 1 | 2 | 4 | A | E | 1 | true | 2022.01.10 | MetaOldValue |
FieldsRules table¶
This file contains the different rules needed to normalize each variable in the aggregated data. More information on the different rules can be found in the documentation.
Field: name of the field to normalizeCategory: name of the field categorySep: character used to separate the modalities in the field (e.g.,,;,\t,)SepPattern: regular expression pattern to separate the modalities in the fieldKeepLink: a boolean value telling if the link to the original data should be kept after separating the modalities in the fieldDelMatch: a regular expression pattern to delete in the field (fully match)DelEnd: a regular expression pattern to delete at the end of the fieldDelIn: a regular expression pattern to delete in the field (partially match)DelStart: a regular expression pattern to delete at the start of the fieldStripFrom: a regular expression pattern to strip from the field (fully match)DataType: the data type to assign to the field (e.g.str,int,float,date)Contains: a regular expression pattern to check if the field contains a specific valueMin: the minimum value for the field (if applicable)Max: the maximum value for the field (if applicable)
| Field | Category | Sep | DelMatch | DelIn | DelStart | DelEnd | StripFrom | DataType | Contains | Min | Max | SepPattern | KeepLink |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ColA | ID | or | NA | int | int | 1 | 4 | ||||||
| ColC | Infos | _ | WRONG | prefix | suffix | other | string | (?P |
False | ||||
| DateFileCreation | Other | ???,00 | date | date |
ValuesMap table¶
This files contains modalities correspondance for each field
Field: name of the field to normalizeOriginalValues: modalities list to modify separated by a,NewValue: new value to assign to each modality listed inOriginalValue
| Field | OriginalValues | NewValue |
|---|---|---|
| ValueToMap | A,otherA | MyA |
| ValueToMap | B | MyB |
| ValueToMap2 | A,B,1 | AB1 |
Explanation:¶
- values
AandotherAin fieldValueToMapwill be modify asMyA - values
Bin fieldValueToMapwill be modify asMyB - values
A,Band1in fieldValueToMap2will be modify asAB1