PhD projects
During my thesis I’ve developed several tools that are publicly available.
Pedixplorer Bioconductor package
The latest version is available in Bioconductor release 3.20.
Pedixplorer provides routines to handle family data with a Pedigree object. Its initial purpose was to create correlation structures that describe family relationships, such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, for example with the coxme function. It also includes a Pedigree drawing tool that focuses on producing compact layouts without manual intervention. Recent additions include utilities to trim the Pedigree object according to various criteria, and kinship for the X chromosome.
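To make the idea of a kinship-based correlation structure concrete, here is a minimal Python sketch of the classic recursive kinship computation on a toy pedigree. It is not Pedixplorer's implementation (the package is written in R); the pedigree, the individual names and the function `kinship_matrix` are all hypothetical, and only autosomal kinship is covered.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother).
# None marks an unknown parent, so founders have (None, None).
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "dad": ("gf", "gm"),
    "aunt": ("gf", "gm"),
    "mum": (None, None),
    "child": ("dad", "mum"),
}


def kinship_matrix(pedigree):
    """Return {(a, b): autosomal kinship coefficient} for every pair of individuals."""

    @lru_cache(maxsize=None)
    def depth(ind):
        # Founders have depth 0; a descendant is always deeper than its ancestors.
        if ind is None:
            return -1
        father, mother = pedigree[ind]
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are assumed unrelated to everyone
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))
        # Always expand the deeper individual, which cannot be an ancestor of the other.
        if depth(a) < depth(b):
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    ids = list(pedigree)
    return {(a, b): phi(a, b) for a in ids for b in ids}


if __name__ == "__main__":
    kin = kinship_matrix(PEDIGREE)
    for pair in [("dad", "aunt"), ("dad", "child"), ("aunt", "child")]:
        print(pair, kin[pair])
```

In this toy pedigree the full siblings and the parent-child pair both get a kinship of 0.25, while the aunt and the child get 0.125. A matrix of such values (usually multiplied by two) is the kind of structure that can be passed to a mixed effects model as a family correlation matrix.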
A dedicated website is available at louislenezet.github.io/Pedixplorer
Phaseimpute nf-core pipeline
nf-core/phaseimpute is a bioinformatics pipeline to phase and impute genetic data. The pipeline consists of the following five steps:
- Check chromosome names: Validates the presence of the different contigs in all variant and alignment files, ensuring data compatibility for further processing.
- Panel preparation: Performs the phasing, QC, variant filtering and variant annotation of the reference panel.
- Imputation: Imputes genotypes in the target dataset using the reference panel.
- Simulate: Generates simulated datasets from high-quality target data for testing and validation purposes.
- Concordance: Evaluates the accuracy of imputation by comparing the imputed data against a truth dataset.
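As a rough illustration of what the first and last steps check, the Python sketch below compares contig names across files and computes a simple genotype concordance. It is only a toy under stated assumptions: the function names, file names and data layout are hypothetical, the pipeline itself works on real variant and alignment files with dedicated tools, and its concordance metrics are likely more detailed than a plain match rate.

```python
def missing_contigs(contigs_by_file):
    """Map each file name to the contigs seen in other files but absent from it."""
    all_contigs = set().union(*contigs_by_file.values())
    report = {}
    for name, contigs in contigs_by_file.items():
        missing = all_contigs - set(contigs)
        if missing:
            report[name] = sorted(missing)
    return report


def genotype_concordance(truth, imputed):
    """Fraction of calls present in both datasets where the genotypes agree.

    Both arguments map (chrom, pos, sample) to an alternate-allele count (0, 1 or 2).
    """
    shared = truth.keys() & imputed.keys()
    if not shared:
        raise ValueError("no overlapping genotype calls to compare")
    matches = sum(1 for key in shared if truth[key] == imputed[key])
    return matches / len(shared)


# Hypothetical inputs, made up for the example.
CONTIGS = {"panel.vcf": ["chr1", "chr2"], "sample.bam": ["chr1"]}
TRUTH = {
    ("chr1", 100, "s1"): 0,
    ("chr1", 200, "s1"): 1,
    ("chr1", 300, "s1"): 2,
    ("chr1", 400, "s1"): 1,
}
IMPUTED = {
    ("chr1", 100, "s1"): 0,
    ("chr1", 200, "s1"): 1,
    ("chr1", 300, "s1"): 1,
    ("chr1", 500, "s1"): 2,
}

if __name__ == "__main__":
    print(missing_contigs(CONTIGS))
    print(genotype_concordance(TRUTH, IMPUTED))
```

Here only three calls are present in both datasets and two of them agree, so the toy concordance is two thirds; the contig check reports that `sample.bam` lacks `chr2`.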
You can launch the pipeline using:
nextflow run nf-core/phaseimpute \
-profile <docker/singularity/.../institute> \
--input <samplesheet.csv> \
--genome "GRCh38" \
--panel <phased_reference_panel.csv> \
--steps "panelprep,impute" \
--tools "glimpse1" \
--outdir <OUTDIR>
BioShinyModules
I participated in the St-Judes BioHackathon in 2023, where I worked on an R package providing Shiny modules that are often needed in biology Shiny applications (e.g. data import / export, common graphics, …). The aim is to give all the modules a common layout so that they are easy to use and operate. The package isn’t published yet, but we hope to make it available through Bioconductor as a proof of concept, paving the way for well-designed, reusable Shiny components.
Files to Database
My PhD project builds on an existing project of the IGDR Dog Genetics Team. Several people have therefore already worked on it, and numerous files (e.g. xlsx, csv, …) have been created over the years to record the data. However, the data had never been aggregated, and it needed to be normalized before it could be analysed.
Because the amount of data to aggregate and normalize was substantial, I wrote a Python script to automate the process. This script is now available as a Python package, but it still needs to be modified before it can be generalised to any project.
I hope to do this in the near future!
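The general idea of aggregating and normalizing scattered files into a database can be sketched as follows. This is a minimal illustration using only the Python standard library, not the actual package: the column names, file names and the functions `normalize_header`, `load_rows` and `aggregate` are hypothetical, the CSV content is held in memory so the example runs as is, and spreadsheet formats such as xlsx would need a third-party reader.

```python
import csv
import io
import sqlite3

# Hypothetical file contents with inconsistent headers and stray whitespace.
FILE_A = "Dog ID,Breed\nD001, Beagle\nD002,Boxer \n"
FILE_B = "dog id , BREED\nD003,Collie\n"


def normalize_header(name):
    """Lowercase a column name and replace runs of whitespace with underscores."""
    return "_".join(name.strip().lower().split())


def load_rows(csv_text, source):
    """Yield cleaned rows from one CSV document, tagged with their source file."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        clean = {
            normalize_header(key): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore extra cells that have no header
        }
        clean["source"] = source
        yield clean


def aggregate(sources, columns):
    """Load every source into a single in-memory SQLite table named 'records'."""
    all_columns = list(columns) + ["source"]
    conn = sqlite3.connect(":memory:")
    column_sql = ", ".join(f'"{col}" TEXT' for col in all_columns)
    conn.execute(f"CREATE TABLE records ({column_sql})")
    placeholders = ", ".join("?" for _ in all_columns)
    for source, text in sources.items():
        for row in load_rows(text, source):
            values = [row.get(col) for col in all_columns]
            conn.execute(f"INSERT INTO records VALUES ({placeholders})", values)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = aggregate({"a.csv": FILE_A, "b.csv": FILE_B}, ["dog_id", "breed"])
    for record in conn.execute("SELECT * FROM records ORDER BY dog_id"):
        print(record)
```

Keeping a `source` column makes it possible to trace every record back to the file it came from, which helps when the original files disagree with each other.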