Normalise dataframe for a Ped object
Usage
norm_ped(
ped_df,
na_strings = c("NA", ""),
missid = NA_character_,
try_num = FALSE,
cols_used_del = FALSE,
date_pattern = "%Y-%m-%d"
)
Arguments
- ped_df
A data.frame with the individuals' information. The minimum columns required are:
  - id: individual identifiers
  - dadid: biological fathers' identifiers
  - momid: biological mothers' identifiers
  - sex: sex of the individual
The famid column, if provided, will be merged into the id fields, separated by an underscore, using the upd_famid() function.
The following columns are also recognized and will be transformed with the vect_to_binary() function:
  - deceased status -> is the individual dead
  - avail status -> is the individual available
  - evaluated status -> does the individual have a documented evaluation
  - consultand status -> is the individual the consultand
  - proband status -> is the individual the proband
  - carrier status -> is the individual a carrier
  - asymptomatic status -> is the individual asymptomatic
  - adopted status -> is the individual adopted
The values recognized for those columns are 1 or 0, TRUE or FALSE.
The fertility column will be transformed to a factor using the fertility_to_factor() function. Levels: infertile_choice_na, infertile, fertile.
The miscarriage column will be transformed to a factor using the miscarriage_to_factor() function. Levels: SAB, TOP, ECT, FALSE.
The dateofbirth and dateofdeath columns will be transformed to a date object using the char_to_date() function.
- na_strings
Vector of strings to be considered as NA values.
- missid
A character vector of missing-value identifiers. All the id, dadid and momid entries matching those values will be set to NA_character_.
- try_num
Boolean defining if the function should try to convert all the columns to numeric.
- cols_used_del
Boolean defining if the columns that will be used should be deleted.
- date_pattern
The format pattern used to parse the date columns, for example "%Y-%m-%d" (the default).
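As a rough illustration of what na_strings, missid and date_pattern control, the base R sketch below mimics the described behaviour on two small vectors. It does not call norm_ped() or its helpers; the inputs (raw_dadid, raw_dates) and the missid value "0" are hypothetical, and the package's internals may differ.

```r
# Hypothetical settings for illustration; norm_ped() defaults missid to NA_character_.
na_strings <- c("NA", "")
missid <- c("0")
date_pattern <- "%Y-%m-%d"

# Hypothetical father identifiers containing empty and placeholder values.
raw_dadid <- c("0", "1", "", "NA", "3")

# Strings listed in na_strings are treated as missing values.
dadid <- raw_dadid
dadid[dadid %in% na_strings] <- NA_character_

# Identifiers equal to a missid value are also set to NA_character_.
dadid[dadid %in% missid] <- NA_character_
print(dadid)

# A date string that does not match date_pattern cannot be parsed and becomes NA.
raw_dates <- c("2000-01-01", "2008/01/01")
dates <- as.Date(raw_dates, format = date_pattern)
print(dates)
```

Only the first date matches the pattern, which is consistent with the "2008/01/01" entry ending up as NA in the example output further down.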
Value
A data frame with the variables correctly standardised and with any errors
identified reported in the error column.
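Because problems are reported in the error column, a caller can separate clean rows from flagged ones with ordinary subsetting. The sketch below uses a small hand-built data frame (normed, a hypothetical stand-in for real norm_ped() output); the error strings are copied from the example output on this page, where several codes appear to be joined with an underscore.

```r
# Hypothetical stand-in for a normalised data frame.
normed <- data.frame(
    id = c("1_1", "1_2", "1_4"),
    error = c(
        NA,
        "is-infertile-but-is-parent_is-aborted-but-is-parent",
        "is-infertile-but-is-parent"
    ),
    stringsAsFactors = FALSE
)

# Rows without an error message are kept; the others are set aside.
clean <- normed[is.na(normed$error), ]
flagged <- normed[!is.na(normed$error), ]

# Split composite messages into individual codes (assumes "_" is the separator).
codes <- strsplit(flagged$error, "_", fixed = TRUE)
names(codes) <- flagged$id
print(codes)
```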
Details
Normalise a data frame and check that its columns correspond to those expected, so that it can be used as input to create a Ped object. Multiple tests are run and errors are checked.
Any individual with no 'NA' value in the available column will be considered available.
A duplicated id will nullify the relationship of the individual.
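A minimal base R sketch of how duplicated identifiers can be detected is shown below. It only illustrates the detection step on a hypothetical ids vector; how norm_ped() then nullifies the relationships is not reproduced here.

```r
# Hypothetical identifiers with one repeated value.
ids <- c("1", "2", "2", "3")

# duplicated() flags only the second and later occurrences, so a forward
# and a backward pass are combined to flag every copy of a repeated id.
is_dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
print(ids[is_dup])
```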
All individuals with errors will be removed from the data frame and
transferred to the error data frame.
A number of checks are done to ensure the dataframe is correct:
Examples
df <- data.frame(
    id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    dadid = c("A", 0, 1, 3, 0, 4, 1, 0, 6, 6),
    momid = c(0, 0, 2, 2, 0, 5, 2, 0, 8, 8),
    famid = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
    sex = c(1, 2, "m", "man", "f", "male", "m", 3, NA, "f"),
    fertility = c(
        "TRUE", "FALSE", TRUE, FALSE, 1,
        0, "fertile", "infertile", 1, "TRUE"
    ),
    miscarriage = c("TOB", "SAB", NA, FALSE, "ECT", "other", 1, 0, 1, 0),
    deceased = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0),
    avail = c("A", "1", 0, NA, 1, 0, 1, 0, 1, 0),
    evalutated = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    consultand = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    proband = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    carrier = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    asymptomatic = c(
        "TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"
    ),
    adopted = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, NA, "NA", "other", "0"),
    dateofbirth = c(
        "1978-01-01", "1980-01-01", "1982-01-01", "1984-01-01",
        "1986-01-01", "1988-01-01", "1990-01-01", "1992-01-01",
        "1994-01-01", "1996-01-01"
    ),
    dateofdeath = c(
        "2000-01-01", "2002-01-01", "2004-01-01", NA, "date-not-recognize",
        "NA", "", NA, "2008/01/01", NA
    )
)
tryCatch(
    norm_ped(df),
    error = function(e) print(e)
)
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> id dadid momid famid sex fertility miscarriage deceased avail
#> 1 1_1 1_A 1_0 1 male fertile FALSE TRUE NA
#> 2 1_2 1_0 1_0 1 female infertile SAB FALSE TRUE
#> 3 1_3 1_1 1_2 1 male fertile FALSE TRUE FALSE
#> 4 1_4 1_3 1_2 1 male infertile FALSE FALSE NA
#> 5 1_5 1_0 1_0 1 female fertile ECT TRUE TRUE
#> 6 1_6 1_4 1_5 1 male infertile FALSE FALSE FALSE
#> 7 1_7 1_1 1_2 1 male fertile FALSE TRUE TRUE
#> 8 2_8 2_0 2_0 2 female infertile FALSE FALSE FALSE
#> 9 2_9 2_6 2_8 2 unknown fertile FALSE TRUE TRUE
#> 10 2_10 2_6 2_8 2 female fertile FALSE FALSE FALSE
#> evalutated consultand proband carrier asymptomatic adopted dateofbirth
#> 1 TRUE TRUE TRUE TRUE TRUE TRUE 1978-01-01
#> 2 FALSE FALSE FALSE FALSE FALSE FALSE 1980-01-01
#> 3 TRUE TRUE TRUE TRUE TRUE TRUE 1982-01-01
#> 4 FALSE FALSE FALSE FALSE FALSE FALSE 1984-01-01
#> 5 1 TRUE TRUE TRUE TRUE TRUE 1986-01-01
#> 6 0 FALSE FALSE FALSE FALSE FALSE 1988-01-01
#> 7 <NA> FALSE FALSE NA NA FALSE 1990-01-01
#> 8 <NA> FALSE FALSE NA NA FALSE 1992-01-01
#> 9 other FALSE FALSE NA NA FALSE 1994-01-01
#> 10 0 FALSE FALSE FALSE FALSE FALSE 1996-01-01
#> dateofdeath
#> 1 2000-01-01
#> 2 2002-01-01
#> 3 2004-01-01
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> 10 <NA>
#> error
#> 1 <NA>
#> 2 is-infertile-but-is-parent_is-aborted-but-is-parent_is-aborted-but-has-fertility
#> 3 <NA>
#> 4 is-infertile-but-is-parent
#> 5 is-aborted-but-is-parent
#> 6 <NA>
#> 7 <NA>
#> 8 is-infertile-but-is-parent
#> 9 <NA>
#> 10 <NA>
#> evaluated
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
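The identifiers in the output above are prefixed with famid (1_1, 2_8, 2_10), and the status columns are recoded to logical values. A minimal base R sketch of both steps follows. prefix_famid() and to_binary() are hypothetical stand-ins written for illustration, not the package's upd_famid() and vect_to_binary(). In particular, mapping unrecognized values to NA is an assumption; the output above shows that the package handles them differently depending on the column.

```r
# Hypothetical helper: prefix each identifier with its family id.
prefix_famid <- function(id, famid) {
    paste(famid, id, sep = "_")
}

# Hypothetical helper: recode 1/0 and TRUE/FALSE (as strings) to logical.
# Anything else, including NA, is returned as NA (an assumption).
to_binary <- function(x) {
    x <- as.character(x)
    out <- rep(NA, length(x))
    out[x %in% c("1", "TRUE")] <- TRUE
    out[x %in% c("0", "FALSE")] <- FALSE
    out
}

new_ids <- prefix_famid(id = c(1, 8, 10), famid = c(1, 2, 2))
print(new_ids)

status <- to_binary(c("TRUE", "FALSE", "1", "0", NA, "other"))
print(status)
```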