USask-BINFO/EcoPopDL-GP

EcoPopDL-DP Framework

Overview

EcoPopDL-DP is a framework for environment-aware and population-informed genomic prediction using deep learning and ChromoMap. It addresses the challenge of predicting complex traits influenced by genotype-by-environment and genotype-by-location interactions in resource-limited breeding populations. The framework integrates genomic, population structure, and environmental data to improve predictive accuracy for complex traits such as yield, flowering time, and seed weight.

Features

  • ChromoMap: A visual-spatial representation of SNP-level genomic variation, chromosome structure, and positional relationships.
  • Deep Learning Integration: Utilizes convolutional neural networks (CNNs) for feature extraction and trait prediction.
  • Linear Mixed Model (LMM): Captures both fixed and random effects to refine predictions and improve interpretability.
  • Hybrid Framework: Combines CNN-derived features with LMM for improved genomic prediction performance.
  • Data Augmentation and Transfer Learning: Applied during CNN training to improve model generalizability and robustness.
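
To make the ChromoMap idea concrete, here is a minimal sketch of how SNP genotypes could be turned into a color-coded, chromosome-by-position pixel grid. It is an illustration only: the function name `encode_chromomap`, the color palette, and the one-row-per-chromosome layout are assumptions and are not taken from the framework's scripts.

```python
# Hypothetical palette: the framework's actual colors are not documented here.
GENOTYPE_COLORS = {
    0: (0, 0, 255),   # homozygous reference
    1: (0, 255, 0),   # heterozygous
    2: (255, 0, 0),   # homozygous alternate
}
BLANK = (0, 0, 0)     # used for missing genotypes and for row padding


def encode_chromomap(snps, width):
    """Build a pixel grid with one row per chromosome.

    snps  : iterable of (chromosome, position, genotype) tuples,
            genotype coded as 0/1/2 alternate-allele counts.
    width : number of pixels per row; longer rows are truncated,
            shorter rows are padded with BLANK.
    """
    rows = {}
    # Sort by chromosome, then by position, so pixel order follows the genome.
    for chrom, _pos, genotype in sorted(snps, key=lambda s: (s[0], s[1])):
        rows.setdefault(chrom, []).append(GENOTYPE_COLORS.get(genotype, BLANK))

    image = []
    for chrom in sorted(rows):
        row = rows[chrom][:width]
        row += [BLANK] * (width - len(row))
        image.append(row)
    return image


example_snps = [(1, 200, 2), (1, 100, 0), (2, 50, 1)]
example_image = encode_chromomap(example_snps, width=3)
```

A grid like this can be saved as an image and passed to a CNN; preserving positional order within each chromosome is what lets convolutional filters pick up local SNP patterns.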

Workflow

  1. Preliminary Data Processing:
    • Input Data: Genotypic data, phenotypic data, and environmental variables.
    • Genotypic Data Processing: Minor allele frequency (MAF) filtering and linkage disequilibrium (LD) pruning.
    • Phenotypic Data Processing: Normalization and outlier removal.
  2. Genetic Ancestry Analysis:
    • Incorporates unsupervised and supervised admixture analysis to derive genetic ancestry profiles.
    • Integrates population clusters into predictive models.
  3. Prediction Model Design:
    • ChromoMap Generation: Encodes SNPs into a color-coded image representing the genome.
    • CNN Architecture: Leverages EfficientNet-B0 for trait prediction.
    • Feature Engineering: Extracts and integrates genomic and metadata features.
    • Linear Mixed Model: Adds environmental covariates and population structure.
  4. Benchmarking:
    • Compares against baseline models such as GBLUP, RRBLUP, Bayesian Ridge Regression, Lasso Regression, and SVM.
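
The preprocessing in step 1 can be sketched in a few lines of plain Python. This is a hedged illustration rather than the repository's implementation: the function names, the 0.05 MAF threshold, and the z-score cutoff of 3.0 are assumed defaults, and LD pruning is omitted because it depends on pairwise SNP correlations within a window.

```python
from statistics import mean, stdev


def minor_allele_frequency(genotypes):
    """MAF for one SNP; genotypes are 0/1/2 alternate-allele counts, None if missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(snp_matrix, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold` (assumed default)."""
    return {
        snp_id: genotypes
        for snp_id, genotypes in snp_matrix.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


def normalize_phenotypes(values, z_cutoff=3.0):
    """Drop values with |z| > z_cutoff, then z-score the remaining values."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    kept = [v for v in values if abs(v - mu) / sd <= z_cutoff]
    mu_kept, sd_kept = mean(kept), stdev(kept)
    if sd_kept == 0:
        return [0.0] * len(kept)
    return [(v - mu_kept) / sd_kept for v in kept]


snps = {
    "snp_a": [0, 0, 0, 0],      # monomorphic, removed
    "snp_b": [0, 1, 2, 1],      # MAF 0.5, kept
    "snp_c": [2, 2, None, 2],   # fixed for the alternate allele, removed
}
filtered = filter_by_maf(snps)
scaled = normalize_phenotypes([1.0, 2.0, 3.0, 4.0, 5.0, 100.0], z_cutoff=1.5)
```

Outliers are removed before the final scaling so that a single extreme record does not distort the mean and standard deviation used for normalization.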

Installation

To install and run the EcoPopDL-DP framework:

  1. Clone the repository:
    git clone https://git.cs.usask.ca/qnm481/ecopopgp.git
    cd ecopopgp
  2. Prepare input data:
    • Ensure that genotypic, phenotypic, and environmental data are in the formats expected by the scripts.

Contact us

For any questions or inquiries, please feel free to open an issue on our repository or contact us at [email protected].

Authors

  • Thulani Hewavithana
  • Sophie Duchesne
  • Bunyamin Tar’an
  • Ian Stavness
  • Steve Shirtliffe
  • Kirstin Bett
  • Isobel A. P. Parkin
  • Lingling Jin

License

This project is licensed under the MIT License.

Contributions

We welcome contributions! To contribute:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Commit changes:
    git commit -m 'Add new feature'
  4. Push to the branch:
    git push origin feature-branch
  5. Create a pull request.
