Normalise dataframe for a Ped object
Arguments
- ped_df
- A data.frame with the individuals informations. The minimum columns required are: - idindividual identifiers
- dadidbiological fathers identifiers
- momidbiological mothers identifiers
- sexof the individual
 - The - famidcolumn, if provided, will be merged to the ids field separated by an underscore using the- upd_famid()function.- The following columns are also recognize and will be transformed with the - vect_to_binary()function:- deceasedstatus -> is the individual dead
- availstatus -> is the individual available
- evaluatedstatus -> has the individual a documented evaluation
- consultandstatus -> is the individual the consultand
- probandstatus -> is the individual the proband
- carrierstatus -> is the individual a carrier
- asymptomaticstatus -> is the individual asymptomatic
- adoptedstatus -> is the individual adopted
 - The values recognized for those columns are - 1or- 0,- TRUEor- FALSE.- The - fertilitycolumn will be transformed to a factor using the- fertility_to_factor()function.- infertile_choice_na,- infertile,- fertile- The - miscarriagecolumn will be transformed to a using the- miscarriage_to_factor()function.- SAB,- TOP,- ECT,- FALSE- The - dateofbirthand- dateofdeathcolumns will be transformed to a date object using the- char_to_date()function.
- na_strings
- Vector of strings to be considered as NA values. 
- missid
- A character vector with the missing values identifiers. All the id, dadid and momid corresponding to those values will be set to - NA_character_.
- try_num
- Boolean defining if the function should try to convert all the columns to numeric. 
- cols_used_del
- Boolean defining if the columns that will be used should be deleted. 
- date_pattern
- The pattern of the date 
Value
A dataframe with different variable correctly standardized
and with the errors identified in the error column
Details
Normalise a dataframe and check for columns correspondance to be able to use it as an input to create a Ped object. Multiple test are done and errors are checked.
Will be considered available any individual with no 'NA' values in the
available column.
Duplicated id will nullify the relationship of the individual.
All individuals with errors will be remove from the dataframe and will
be transfered to the error dataframe.
A number of checks are done to ensure the dataframe is correct:
Examples
df <- data.frame(
    id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    dadid = c("A", 0, 1, 3, 0, 4, 1, 0, 6, 6),
    momid = c(0, 0, 2, 2, 0, 5, 2, 0, 8, 8),
    famid = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
    sex = c(1, 2, "m", "man", "f", "male", "m", 3, NA, "f"),
    fertility = c(
      "TRUE", "FALSE", TRUE, FALSE, 1,
      0, "fertile", "infertile", 1, "TRUE"
    ),
    miscarriage = c("TOB", "SAB", NA, FALSE, "ECT", "other", 1, 0, 1, 0),
    deceased = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0),
    avail = c("A", "1", 0, NA, 1, 0, 1, 0, 1, 0),
    evalutated = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    consultand = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    proband = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    carrier = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    asymptomatic = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    adopted = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    dateofbirth = c(
         "1978-01-01", "1980-01-01", "1982-01-01", "1984-01-01",
         "1986-01-01", "1988-01-01", "1990-01-01", "1992-01-01",
         "1994-01-01", "1996-01-01"
    ),
    dateofdeath = c(
        "2000-01-01", "2002-01-01", "2004-01-01", NA, "date-not-recognize",
        "NA", "", NA, "2008/01/01", NA
    )
)
tryCatch(
    norm_ped(df),
    error = function(e) print(e)
)
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#>      id dadid momid famid     sex fertility miscarriage deceased avail
#> 1   1_1   1_A  <NA>     1    male   fertile       FALSE     TRUE    NA
#> 2   1_2  <NA>  <NA>     1  female infertile         SAB    FALSE  TRUE
#> 3   1_3   1_1   1_2     1    male   fertile       FALSE     TRUE FALSE
#> 4   1_4   1_3   1_2     1    male infertile       FALSE    FALSE    NA
#> 5   1_5  <NA>  <NA>     1  female   fertile         ECT     TRUE  TRUE
#> 6   1_6   1_4   1_5     1    male infertile       FALSE    FALSE FALSE
#> 7   1_7   1_1   1_2     1    male   fertile       FALSE     TRUE  TRUE
#> 8   2_8  <NA>  <NA>     2  female infertile       FALSE    FALSE FALSE
#> 9   2_9   2_6   2_8     2 unknown   fertile       FALSE     TRUE  TRUE
#> 10 2_10   2_6   2_8     2  female   fertile       FALSE    FALSE FALSE
#>    evalutated consultand proband carrier asymptomatic adopted dateofbirth
#> 1        TRUE       TRUE    TRUE    TRUE         TRUE    TRUE  1978-01-01
#> 2       FALSE      FALSE   FALSE   FALSE        FALSE   FALSE  1980-01-01
#> 3        TRUE       TRUE    TRUE    TRUE         TRUE    TRUE  1982-01-01
#> 4       FALSE      FALSE   FALSE   FALSE        FALSE   FALSE  1984-01-01
#> 5           1       TRUE    TRUE    TRUE         TRUE    TRUE  1986-01-01
#> 6           0      FALSE   FALSE   FALSE        FALSE   FALSE  1988-01-01
#> 7        <NA>      FALSE   FALSE      NA           NA   FALSE  1990-01-01
#> 8        <NA>      FALSE   FALSE      NA           NA   FALSE  1992-01-01
#> 9       other      FALSE   FALSE      NA           NA   FALSE  1994-01-01
#> 10          0      FALSE   FALSE   FALSE        FALSE   FALSE  1996-01-01
#>    dateofdeath
#> 1   2000-01-01
#> 2   2002-01-01
#> 3   2004-01-01
#> 4         <NA>
#> 5         <NA>
#> 6         <NA>
#> 7         <NA>
#> 8         <NA>
#> 9         <NA>
#> 10        <NA>
#>                                                                               error
#> 1                                                                one-parent-missing
#> 2  is-infertile-but-is-parent_is-aborted-but-is-parent_is-aborted-but-has-fertility
#> 3                                                                              <NA>
#> 4                                                        is-infertile-but-is-parent
#> 5                                                          is-aborted-but-is-parent
#> 6                                                                              <NA>
#> 7                                                                              <NA>
#> 8                                                        is-infertile-but-is-parent
#> 9                                                                              <NA>
#> 10                                                                             <NA>
#>    evaluated
#> 1      FALSE
#> 2      FALSE
#> 3      FALSE
#> 4      FALSE
#> 5      FALSE
#> 6      FALSE
#> 7      FALSE
#> 8      FALSE
#> 9      FALSE
#> 10     FALSE