Normalise dataframe for a Ped object
Usage
norm_ped(
ped_df,
na_strings = c("NA", ""),
missid = NA_character_,
try_num = FALSE,
cols_used_del = FALSE
)
Arguments
- ped_df
A data.frame with the individuals' information. The minimum columns required are:
indId: individual identifiers -> id
fatherId: biological father identifiers -> dadid
motherId: biological mother identifiers -> momid
gender: sex of the individual -> sex
family: family identifiers -> famid
The family column, if provided, will be merged into the ids field, separated by an underscore, using the upd_famid() function.
The following columns are also recognised and will be transformed with the vect_to_binary() function:
sterilisation: sterilisation status -> steril
available: availability status -> avail
vitalStatus: whether the individual is dead -> status
affection: affection status -> affected
The values recognised for those columns are 1 or 0, and TRUE or FALSE.
- na_strings
Vector of strings to be considered as NA values.
- missid
A character vector of missing-value identifiers. All id, dadid and momid values matching one of these identifiers will be set to NA_character_.
- try_num
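Boolean defining whether the function should try to convert all the columns to numeric.
- cols_used_del
Boolean defining whether the columns that the function itself writes to (for example famid) should be deleted if they are already present in ped_df. With the default FALSE, their presence raises an error, as in the example below.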
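To make the family merge and the binary conversion described under ped_df more concrete, here is a minimal base R sketch. The helpers merge_famid_sketch() and to_binary_sketch() are hypothetical illustrations, not the package's upd_famid() or vect_to_binary(); it assumes the family identifier is used as a prefix and that any unrecognised value becomes NA.

```r
# Hypothetical sketch: NOT the implementation of upd_famid() or vect_to_binary().

# Prefix each identifier with its family id, separated by "_"
# (assumption: family id first). Missing ids stay missing instead of "2_NA".
merge_famid_sketch <- function(id, famid) {
  id <- as.character(id)
  out <- paste(famid, id, sep = "_")
  out[is.na(id)] <- NA_character_
  out
}

# Map the recognised values 1/0 and TRUE/FALSE (numbers, logicals or strings)
# to 1/0; any other value, e.g. "A", becomes NA (assumption).
to_binary_sketch <- function(x) {
  x <- toupper(trimws(as.character(x)))
  out <- rep(NA_integer_, length(x))
  out[x %in% c("1", "TRUE")] <- 1L
  out[x %in% c("0", "FALSE")] <- 0L
  out
}

merge_famid_sketch(c(1, 2, NA), c(1, 1, 2))
#> [1] "1_1" "1_2" NA

to_binary_sketch(c("TRUE", "FALSE", TRUE, 0, 1, "A", NA))
#> [1]  1  0  1  0  1 NA NA
```

Converting everything to upper-case character first lets a single lookup handle 1, "1", TRUE and "TRUE" alike, which matches the mixed encodings used in the example data below.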
Value
A dataframe with the variables correctly standardised and with the errors identified in the error column.
Details
Normalise a dataframe and check the column correspondence so that it can be used as input to create a Ped object.
Multiple tests are performed and errors are checked.
Sex is calculated based on the gender column.
The steril column needs to be a boolean: TRUE, FALSE or 'NA'.
Any individual without an 'NA' value in the available column will be considered available.
A duplicated indId will nullify the relationships of the individual.
All individuals with errors will be removed from the dataframe and transferred to the error dataframe.
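Two of the checks above, deriving sex from gender and clearing the relationships of duplicated individuals, can be sketched in base R as follows. Both helpers are hypothetical, not the package code. The accepted gender spellings are an assumption loosely based on the example data, and "nullify the relationship" is read here as setting dadid and momid to NA for every copy of a duplicated id.

```r
# Hypothetical sketch: NOT the package implementation.

# Derive sex from a free-text gender column. The accepted spellings are
# assumptions; anything unrecognised becomes "unknown".
sex_from_gender_sketch <- function(gender) {
  g <- tolower(trimws(as.character(gender)))
  out <- rep("unknown", length(g))
  out[g %in% c("1", "m", "man", "male")] <- "male"
  out[g %in% c("2", "f", "woman", "female")] <- "female"
  out
}

# One reading of "nullify the relationship": clear the parents of every
# row whose id appears more than once.
nullify_duplicated_sketch <- function(df) {
  dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)
  df$dadid[dup] <- NA_character_
  df$momid[dup] <- NA_character_
  df
}

ped <- data.frame(
  id = c("1", "2", "2", "3"),
  dadid = c(NA, NA, "1", "1"),
  momid = c(NA, NA, NA, "2"),
  gender = c(1, "f", "man", "female"),
  stringsAsFactors = FALSE
)
ped$sex <- sex_from_gender_sketch(ped$gender)
ped <- nullify_duplicated_sketch(ped)
ped$sex
#> [1] "male"   "female" "male"   "female"
ped$dadid
#> [1] NA  NA  NA  "1"
```

Using duplicated() in both directions flags every copy of a repeated id, not only the later ones, so no copy keeps a possibly wrong set of parents.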
A number of checks are done to ensure the dataframe is correct:
Examples
df <- data.frame(
indId = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
fatherId = c("A", 0, 1, 3, 0, 4, 1, 0, 6, 6),
motherId = c(0, 0, 2, 2, 0, 5, 2, 0, 8, 8),
gender = c(1, 2, "m", "man", "f", "male", "m", "m", "f", "f"),
available = c("A", "1", 0, NA, 1, 0, 1, 0, 1, 0),
famid = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
sterilisation = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, "TRUE"),
vitalStatus = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0),
affection = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0)
)
tryCatch(
norm_ped(df),
error = function(e) print(e)
)
#> <simpleError in check_columns(ped_df, cols_need, cols_used, cols_to_use, others_cols = TRUE, cols_to_use_init = TRUE, cols_used_init = TRUE, cols_used_del = cols_used_del): Columns : famid are used by the script and would be overwritten.
#> >