Output Files

For this example we will use the directory 01_clean_reads previously created with the clean module. We run the following Captus command to assemble our cleaned reads:

captus assemble --reads 01_clean_reads --sample_reads_target 1000000 --max_contig_gc 60.0

We include the option --sample_reads_target 1000000 to show the expected output, even though this option is not commonly used. The option --max_contig_gc 60.0 filters out contigs with GC content over 60% after assembly. The filtered contigs are always saved to removed_contigs.fasta in case the filtering has to be repeated after changing the thresholds.
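
To preview which contigs a given GC threshold would affect, GC content can be computed directly from any FASTA file. The following is a minimal sketch, not Captus's own implementation: gc_over_threshold is a hypothetical helper, and it assumes GC content is the number of G and C bases divided by the total contig length, which may differ from how Captus treats ambiguous bases.

```shell
# Hypothetical helper (not part of Captus): print the name and GC percentage
# of every contig whose GC content is strictly above a threshold.
# Assumption: GC% = 100 * (G + C) / total length, case-insensitive.
# Usage: gc_over_threshold FASTA [MAX_GC]   (MAX_GC defaults to 60.0)
gc_over_threshold() {
    fasta="$1"
    max_gc="${2:-60.0}"
    awk -v max="$max_gc" '
        # Report the contig collected so far if it exceeds the threshold
        function report() {
            if (name != "" && len > 0) {
                gc = 100 * gc_count / len
                if (gc > max) printf "%s\t%.2f\n", name, gc
            }
        }
        # Header line: flush the previous contig and start a new one
        /^>/ { report(); name = substr($1, 2); len = 0; gc_count = 0; next }
        # Sequence line: accumulate the length and the G/C count
        {
            seq = toupper($0)
            len += length(seq)
            gc_count += gsub(/[GC]/, "", seq)
        }
        END { report() }
    ' "$fasta"
}
```

Running gc_over_threshold on a FASTA file with 60.0 as the second argument prints one tab-separated line (contig name, GC percentage) per contig above 60% GC.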

After the run is finished we should see a new directory called 02_assemblies with the following structure and files:

Assemblies

1. [sample]__captus-asm

A subdirectory ending in __captus-asm is created to contain the assembly of each sample separately (S1, S2, S3, and S4 in the image).
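
Because every sample directory carries the same __captus-asm suffix, the list of assembled samples can be recovered from the directory names. A minimal sketch, assuming the output directory is 02_assemblies as in this example; list_assembled_samples is a hypothetical helper, not a Captus command.

```shell
# Hypothetical helper (not part of Captus): list sample names by stripping
# the __captus-asm suffix from each subdirectory of the output directory.
# Usage: list_assembled_samples [OUT_DIR]   (OUT_DIR defaults to 02_assemblies)
list_assembled_samples() {
    out_dir="${1:-02_assemblies}"
    for d in "$out_dir"/*__captus-asm; do
        # Skip the unexpanded pattern (no matches) and anything not a directory
        [ -d "$d" ] || continue
        basename "$d" __captus-asm
    done
}
```

Called without arguments from the directory where Captus was run, this prints one sample name per line.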


2. 00_subsampled_reads

This directory is only created when the option --sample_reads_target is used. It contains the subsampled reads in FASTQ format that were used for the assembly.
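
To compare the size of a subsampled file with the value passed to --sample_reads_target, the reads in a FASTQ file can be counted. This is a sketch under stated assumptions: count_fastq_reads is a hypothetical helper, it assumes standard four-line FASTQ records, and it accepts either gzip-compressed or plain files because the compression of the subsampled files is not stated here.

```shell
# Hypothetical helper (not part of Captus): count the reads in a FASTQ file.
# Assumption: each record spans exactly four lines.
# Usage: count_fastq_reads FILE   (FILE may be gzip-compressed or plain text)
count_fastq_reads() {
    # gzip -t succeeds only on valid gzip data, so it selects the reader
    if gzip -t "$1" 2>/dev/null; then
        gzip -dc "$1"
    else
        cat "$1"
    fi | awk 'END { print NR / 4 }'
}
```

Whether the target refers to single reads or read pairs is not stated in this section, so interpret the count accordingly.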


3. 01_assembly

This directory contains the FASTA and FASTG assembly files as well as assembly statistics and logs.


4. assembly.fasta

The main assembly file in FASTA format. It contains the contigs assembled by MEGAHIT and filtered according to --max_contig_gc and --min_contig_depth. Captus modifies the sequence headers to resemble those produced by the assembler SPAdes.
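
A quick way to check an assembly is to count its contigs and total bases. The sketch below works on any FASTA file, so it can be applied to both assembly.fasta and removed_contigs.fasta to see how much the filters discarded; fasta_summary is a hypothetical helper, not a Captus command.

```shell
# Hypothetical helper (not part of Captus): print the number of contigs and
# the total number of bases in a FASTA file (multi-line sequences supported).
# Usage: fasta_summary FASTA
fasta_summary() {
    awk '
        /^>/ { n++; next }           # each header line starts a new contig
        { total += length($0) }      # every other line is sequence
        END { printf "contigs\t%d\nbases\t%d\n", n, total }
    ' "$1"
}
```

The output is two tab-separated lines, one for the contig count and one for the base count.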


5. assembly_graph.fastg

The assembly graph in FASTG format. This file can be explored in Bandage or similar software that can plot the connections between contigs, loops, circular segments, etc. The graph is based on the original MEGAHIT assembly prior to filtering.


6. megahit_brief.log, megahit_full.log

MEGAHIT program logs. The brief version contains just the screen output from each MEGAHIT run.

7. 01_salmon_quant

This directory contains the results of mapping the reads back to the assembled contigs using Salmon. It is not created when --ignore_mapping is used.


8. salmon.log

Salmon logs, combined for the indexing and quantification steps.


9. removed_contigs.fasta

This file is created after the filtering by GC and/or depth is finished. It contains the contigs removed by the filters, in the same format as assembly.fasta (item 4).


10. contigs_depth.tsv

Table of per-contig depth statistics. It lists the contig names with the depth originally estimated by MEGAHIT and the depth recalculated with Salmon.

11. assembly_stats.tsv

Assembly statistics, before and after filtering.

12. depth_stats.tsv

Depth statistics, before and after filtering.

13. length_stats.tsv

Length statistics, before and after filtering.

14. captus-assemble_assembly_stats.tsv

Assembly statistics compiled across all samples, before and after filtering.

15. captus-assemble_depth_stats.tsv

Depth statistics compiled across all samples, before and after filtering.

16. captus-assemble_length_stats.tsv

Length statistics compiled across all samples, before and after filtering.
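
The .tsv files described in items 10 to 16 are tab-separated tables, so their columns can be listed before loading them elsewhere. A minimal sketch, assuming the first line of each file is a header row (not confirmed in this section); tsv_columns is a hypothetical helper, not a Captus command.

```shell
# Hypothetical helper (not part of Captus): print the column names of a
# tab-separated file, one per line, prefixed by the column number.
# Assumption: the first line of the file is a header row.
# Usage: tsv_columns FILE.tsv
tsv_columns() {
    awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i; exit }' "$1"
}
```

The column numbers it prints can be passed to tools such as cut -f to extract individual columns.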

17. captus-assemble_report.html

This is the final Assembly report, summarizing statistics across all samples assembled.


18. captus-assemble.log

This is the log from Captus. It contains the command used and all the information shown during the run. If the option --show_less was enabled, the log also contains the extra detailed information that was hidden during the run.


Created by Edgardo M. Ortiz (06.08.2021)
Last modified by Edgardo M. Ortiz (23.12.2024)