motif_ParaAT

A Python script to identify protein domains in one or more protein sequences (Pan Peptides/Pan-Pep). The script uses the HMMER software to scan query hmmer model against Pan-Pep. All peps related to the domains were extracted, and ParaAT.pl were applied to calculated the kaks ratio. The scripts allows you to identify the domain architecture of the Pan Pep and thus can provide insight into domain conservation.

Ideas from https://github.com/HongboDoll/PotatoPanNLRome

Requirements

Python >= 3.10
pandas
hmmer
seqkit
ParaAT 2.0
multiple sequence alignment like clustalw2 | t_coffee | mafft | muscle (muscle5.1 may raise error from ParaAT.pl)

Usage

usage: motif_ParaAT.py [-h] -i INPUT -o OUTPUT --output_ParaAT OUTPUT_PARAAT --hmmer HMMER 
               [--hmmsearch_path HMMSEARCH_PATH] [--ParaAT_path PARAAT_PATH] 
               [--seqkit_path SEQKIT_PATH] [--msa MSA] [--cpu CPU] 
               [--Eval EVAL] [--tmp TMP] [--random RANDOM] 

Stript used to calculated motif kaks 

options: 
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to Pep/cds dir, pep must end with .pep.fa and cds must end with .cds.fa
  -o OUTPUT, --output OUTPUT
                        Path to ParaAT prepare and summary file
  --output_ParaAT OUTPUT_PARAAT
                        Path to ParaAT result
  --hmmer HMMER         Path to hmmer model to search
  --hmmsearch_path HMMSEARCH_PATH
                        Path to hmmsearch path
  --ParaAT_path PARAAT_PATH
                        Path to ParaAT tools
  --seqkit_path SEQKIT_PATH
                        Path to seqkit tools
  --msa MSA             MSA tools for ParaAT, clustalw2 | t_coffee | mafft | muscle
  --cpu CPU             CPU number used
  --Eval EVAL           E value used
  --tmp TMP             tmp dir, default ./
  --random RANDOM       number for random selected homology pair, default 100

Input

Path to Pep and cds dir. Dir contains all pep and cds files, pep and cds files must end with .pep.fa and .cds.fa respectively. genes in pep and cds file must have same ID
```
eg.
a.cds.fa a.pep.fa b.cds.fa b.pep.fa c.cds.fa c.pep.fa
```

HMM model from Pfam database

# you can get your hmmer model form Pfam.A with command

grep -B 2 -A $n "ACC[[:space:]]*${pfam_id}$" ${Pfam_data} | \
awk 'BEGIN{doPrint=1}{if (doPrint==1) {print $0}; if ($0=="//") {doPrint=0}; if ($0=="--") {doPrint=0}}' > ./03.ID_hmmer_detect/${ID}_${pfam_id}.hmm

Option input

Path to hmmsearch, seqkit, ParaAT, MSA tools

Output

output path

{pfam_id}.pep.fa, {pfam_id}.cds.fa, {pfam_id}.hmm_{random}_arrangement(homology pair), {pfam_id}_hmmer_summary.csv(hmmer results), {pfam_id}_kaks_result.xls(kaks results)

output_ParaAT
```
Path to ParaAT result
```
output, output_ParaAT, tmp must in the same dir

Example

python motif_ParaAT.py -i ./test/ -o ./output/ --output_ParaAT ./ParaAT_output/ --hmmer PF00403.27.hmm

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
motif_ParaAT.py		motif_ParaAT.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

motif_ParaAT

Requirements

Usage

Input

Output

Example

About

Releases

Packages

Languages

jkfo002/motif_ParaAT

Folders and files

Latest commit

History

Repository files navigation

motif_ParaAT

Requirements

Usage

Input

Output

Example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages