Skip to content

Scripts used to calculated motifs kaks ratio from Pan-Pep

Notifications You must be signed in to change notification settings

jkfo002/motif_ParaAT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 

Repository files navigation

motif_ParaAT

A Python script to identify protein domains in one or more protein sequences (Pan Peptides/Pan-Pep). The script uses the HMMER software to scan query hmmer model against Pan-Pep. All peps related to the domains were extracted, and ParaAT.pl were applied to calculated the kaks ratio. The scripts allows you to identify the domain architecture of the Pan Pep and thus can provide insight into domain conservation.

Ideas from https://github.com/HongboDoll/PotatoPanNLRome

Requirements

  • Python >= 3.10
  • pandas
  • hmmer
  • seqkit
  • ParaAT 2.0
  • multiple sequence alignment like clustalw2 | t_coffee | mafft | muscle (muscle5.1 may raise error from ParaAT.pl)

Usage

usage: motif_ParaAT.py [-h] -i INPUT -o OUTPUT --output_ParaAT OUTPUT_PARAAT --hmmer HMMER 
               [--hmmsearch_path HMMSEARCH_PATH] [--ParaAT_path PARAAT_PATH] 
               [--seqkit_path SEQKIT_PATH] [--msa MSA] [--cpu CPU] 
               [--Eval EVAL] [--tmp TMP] [--random RANDOM] 

Stript used to calculated motif kaks 

options: 
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to Pep/cds dir, pep must end with .pep.fa and cds must end with .cds.fa
  -o OUTPUT, --output OUTPUT
                        Path to ParaAT prepare and summary file
  --output_ParaAT OUTPUT_PARAAT
                        Path to ParaAT result
  --hmmer HMMER         Path to hmmer model to search
  --hmmsearch_path HMMSEARCH_PATH
                        Path to hmmsearch path
  --ParaAT_path PARAAT_PATH
                        Path to ParaAT tools
  --seqkit_path SEQKIT_PATH
                        Path to seqkit tools
  --msa MSA             MSA tools for ParaAT, clustalw2 | t_coffee | mafft | muscle
  --cpu CPU             CPU number used
  --Eval EVAL           E value used
  --tmp TMP             tmp dir, default ./
  --random RANDOM       number for random selected homology pair, default 100

Input

  1. Path to Pep and cds dir. Dir contains all pep and cds files, pep and cds files must end with .pep.fa and .cds.fa respectively. genes in pep and cds file must have same ID
    eg.
    a.cds.fa a.pep.fa b.cds.fa b.pep.fa c.cds.fa c.pep.fa
    
  2. HMM model from Pfam database
    # you can get your hmmer model form Pfam.A with command
    
    grep -B 2 -A $n "ACC[[:space:]]*${pfam_id}$" ${Pfam_data} | \
    awk 'BEGIN{doPrint=1}{if (doPrint==1) {print $0}; if ($0=="//") {doPrint=0}; if ($0=="--") {doPrint=0}}' > ./03.ID_hmmer_detect/${ID}_${pfam_id}.hmm
    
  3. Option input
    Path to hmmsearch, seqkit, ParaAT, MSA tools
    

Output

  1. output path
    {pfam_id}.pep.fa, {pfam_id}.cds.fa, {pfam_id}.hmm_{random}_arrangement(homology pair), {pfam_id}_hmmer_summary.csv(hmmer results), {pfam_id}_kaks_result.xls(kaks results)
    
  2. output_ParaAT
    Path to ParaAT result
    
  3. output, output_ParaAT, tmp must in the same dir

Example

python motif_ParaAT.py -i ./test/ -o ./output/ --output_ParaAT ./ParaAT_output/ --hmmer PF00403.27.hmm

About

Scripts used to calculated motifs kaks ratio from Pan-Pep

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages