Phables on co-assemblies yields fewer resolved phages compared to per-sample assemblies #62

Open
dujvfac opened this issue Mar 2, 2025 · 1 comment

Comments


dujvfac commented Mar 2, 2025

Hi,

I used Phables on an assembly graph from a co-assembly and the reads used for constructing that graph. In other words, I provided more than two FASTQ files in a single run of Phables.

For comparison, I also ran Phables on per-sample assemblies (each using two FASTQ files). When I pooled the results from the per-sample assemblies and compared them with the results from the co-assembly, I noticed a significant difference:

  • The co-assembly approach resulted in ~65% fewer resolved phages on average, compared to the pooled per-sample results.
  • When I performed the same comparison using another tool (VirSorter), the reduction in resolved phages was only ~20% on average.
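A minimal sketch of how a reduction figure like the ones above can be computed, assuming it is expressed relative to the pooled per-sample count. The function name and the example counts are hypothetical placeholders, not numbers from these runs.

```python
def percent_reduction(pooled_per_sample: int, co_assembly: int) -> float:
    """Relative drop in resolved phages, as a percentage of the pooled per-sample count."""
    if pooled_per_sample <= 0:
        raise ValueError("pooled per-sample count must be positive")
    return 100.0 * (pooled_per_sample - co_assembly) / pooled_per_sample


# Hypothetical counts, for illustration only
print(percent_reduction(200, 70))
```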

Based on the log file, Phables does seem to process the multiple FASTQ files. However, the discrepancy in results suggests that Phables may not be fully optimized for co-assemblies or might handle them differently than per-sample assemblies.

Could you clarify whether Phables is designed to work effectively on co-assemblies? If so, do you have any recommendations for improving phage recovery in this context?

Thanks in advance!

Owner

Vini2 commented Mar 21, 2025

Hi @dujvfac!

Thanks for your interest in Phables!

Running Phables on a co-assembly pools the coverage values across all samples before performing genome resolution. Since the co-assembly already collapses redundant genomes across samples, you will get fewer genomes. That is why Phables provides a post-processing step that maps the reads from each sample to the resolved genomes and reports read counts. These counts let you see which genomes are present in which samples.
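As an illustration of how per-sample read counts can be turned into a presence list, here is a minimal sketch. It assumes a tab-separated table with a `genome` column and one column per sample; that layout, the function name `genomes_per_sample`, and the `min_reads` threshold are assumptions made for this example, not Phables' actual output format.

```python
import csv
import io


def genomes_per_sample(tsv_text: str, min_reads: int = 1) -> dict[str, list[str]]:
    """Map each sample to the genomes with at least `min_reads` mapped reads."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Every column other than the (assumed) "genome" column is treated as a sample
    samples = [name for name in reader.fieldnames if name != "genome"]
    present = {sample: [] for sample in samples}
    for row in reader:
        for sample in samples:
            if int(row[sample]) >= min_reads:
                present[sample].append(row["genome"])
    return present


# Made-up read counts, for illustration only
example = "genome\tsample_A\tsample_B\nphage_1\t120\t0\nphage_2\t35\t48\n"
print(genomes_per_sample(example))
```

A stricter `min_reads` threshold may be worth considering so that a handful of spuriously mapped reads does not count as presence; the right cutoff depends on sequencing depth.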

If you really want to get the exact genomes in each sample, I would recommend running Phables on per-sample assemblies. You may find very similar genomes across samples, but you will still recover them separately.

Hope this helps.
