Output Files

For this example we will use the directory 01_clean_reads previously created with the clean module. We run the following Captus command to assemble our cleaned reads:

captus_assembly assemble --reads 01_clean_reads --sample_reads_target 1000000 --max_contig_gc 60.0

We include the option --sample_reads_target 1000000 only to show the resulting output, even though this option is not commonly needed. Additionally, the option --max_contig_gc 60.0 filters out contigs with a GC content above 60% after assembly; the file filtered_contigs.fasta is produced only when this option is used.
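To make the effect of --max_contig_gc concrete, here is a minimal Python sketch (not Captus's own code) that computes the GC percentage of each contig and splits the contigs into those that are kept and those that would be written to filtered_contigs.fasta. It assumes that GC is computed over unambiguous A/C/G/T bases only and that a contig is filtered when its GC strictly exceeds the threshold; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_percent(seq: str) -> float:
    """GC percentage over A/C/G/T only (assumption: ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def split_by_gc(contigs: Dict[str, str], max_gc: float) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (kept, filtered); contigs whose GC exceeds max_gc go to 'filtered'."""
    kept: Dict[str, str] = {}
    filtered: Dict[str, str] = {}
    for name, seq in contigs.items():
        if gc_percent(seq) > max_gc:
            filtered[name] = seq
        else:
            kept[name] = seq
    return kept, filtered


# c1 has 33.3% GC and is kept; c2 (wrapped over two lines) has 71.4% GC and is filtered.
example = ">c1\nATATGC\n>c2\nGGGCCA\nT\n"
kept, filtered = split_by_gc(read_fasta(example), max_gc=60.0)
print(sorted(kept), sorted(filtered))
```

Treating N as neutral keeps short runs of ambiguous bases from diluting the GC value; if you prefer to count every position, divide by len(seq) instead.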

After the run finishes, we should see a new directory called 02_assemblies with the following structure and files:

(Figure: Assemblies, the directory structure of 02_assemblies)

1. [sample]__captus-asm

A separate subdirectory ending in __captus-asm is created for the assembly of each sample (S1, S2, S3, and S4 in the image).


2. 00_subsampled_reads

This directory is only created when the option --sample_reads_target is used. It contains the subsampled reads in FASTQ format that were used for the assembly.
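A quick way to check the result of subsampling is to count the reads in each file of 00_subsampled_reads. The minimal Python sketch below assumes standard four-line FASTQ records (plain or gzip-compressed) and that 00_subsampled_reads sits inside each [sample]__captus-asm directory; the function name is hypothetical. Whether --sample_reads_target counts single reads or read pairs is not stated here, so interpret the counts with that in mind.

```python
import gzip
from pathlib import Path


def count_fastq_records(path: str) -> int:
    """Count FASTQ records, assuming the standard 4-line layout (no wrapped sequences)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Assumed layout: 02_assemblies/<sample>__captus-asm/00_subsampled_reads/<files>
    for fq in sorted(Path("02_assemblies").glob("*__captus-asm/00_subsampled_reads/*")):
        print(fq, count_fastq_records(str(fq)))
```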


3. 01_assembly

This directory contains the FASTA and FASTG assembly files as well as assembly statistics and logs.


4. assembly.fasta

The main assembly file in FASTA format, containing the contigs assembled by MEGAHIT. Captus modifies the sequence headers to resemble those produced by the assembler SPAdes.
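Because the headers follow the SPAdes style, basic contig information can be read straight from them. The Python sketch below assumes each header begins with the SPAdes pattern NODE_<number>_length_<bp>_cov_<coverage> and ignores anything Captus may append after that prefix; ContigInfo and parse_header are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# SPAdes-style prefix; any trailing fields after the coverage are ignored (assumption).
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


class ContigInfo(NamedTuple):
    node: int
    length: int
    coverage: float


def parse_header(header: str) -> Optional[ContigInfo]:
    """Extract node number, length and coverage, or return None if the pattern is absent."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return ContigInfo(int(match.group(1)), int(match.group(2)), float(match.group(3)))


print(parse_header(">NODE_1_length_8506_cov_71.370737"))
# ContigInfo(node=1, length=8506, coverage=71.370737)
```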


5. assembly_graph.fastg

The assembly graph in FASTG format. This file can be explored in Bandage or similar software that can plot the connections between contigs, loops, circular segments, etc.


6. filtered_contigs.fasta

This file is only created when the option --max_contig_gc is used. Captus places any contig exceeding the specified GC content in this FASTA-formatted file (same format as in 4).


7. assembly.stats.tsv, assembly.stats.t.tsv

Assembly statistics; assembly.stats.t.tsv is simply a transposed version of assembly.stats.tsv.
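What "transposed" means here can be illustrated with a short Python sketch that turns each column of a TSV table into a row. It assumes a rectangular table (every row has the same number of fields) and is meant only to illustrate the relationship between the two files, not to reproduce how Captus writes assembly.stats.t.tsv.

```python
import csv
import io
from typing import List


def transpose_tsv(text: str) -> str:
    """Swap rows and columns of a rectangular tab-separated table."""
    rows: List[List[str]] = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("TSV is not rectangular")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))  # zip(*rows) yields the columns as tuples
    return out.getvalue()


print(transpose_tsv("sample\tlength\nS1\t100\n"), end="")
# sample	S1
# length	100
```

The transposed layout is handy for a single sample, where a wide one-row table becomes a readable two-column list of statistic names and values.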

8. megahit.brief.log, megahit.full.log

MEGAHIT program logs; the brief version contains just the screen output from each MEGAHIT run.

9. captus-assembly_assemble.stats.tsv

A table of statistics in tab-separated values (TSV) format, compiled across all assembled samples.

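Captus builds captus-assembly_assemble.stats.tsv itself, but the idea of a table "compiled across all assembled samples" can be sketched in Python by walking the layout from items 1, 3 and 7: every [sample]__captus-asm/01_assembly/assembly.stats.tsv is read and each row is tagged with the sample name taken from its directory. This assumes each per-sample file has a single header row; the sample_dir key and the function name are hypothetical, and the real compiled table may have different or additional columns.

```python
import csv
from pathlib import Path
from typing import Dict, List

SUFFIX = "__captus-asm"


def collect_stats(assemblies_dir: str) -> List[Dict[str, str]]:
    """Gather rows from every <sample>__captus-asm/01_assembly/assembly.stats.tsv."""
    rows: List[Dict[str, str]] = []
    pattern = f"*{SUFFIX}/01_assembly/assembly.stats.tsv"
    for stats_path in sorted(Path(assemblies_dir).glob(pattern)):
        # .../<sample>__captus-asm/01_assembly/assembly.stats.tsv -> <sample>
        sample = stats_path.parent.parent.name[: -len(SUFFIX)]
        with open(stats_path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                rows.append({"sample_dir": sample, **row})  # hypothetical key
    return rows


if __name__ == "__main__":
    for row in collect_stats("02_assemblies"):
        print(row)
```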

10. captus-assembly_assemble.report.html

This is the final assembly report, summarizing statistics across all assembled samples.


11. captus-assembly_assemble.log

This is the log from Captus; it contains the command used and all the information shown during the run. If the option --show_less was enabled, the log also contains the extra detailed information that was hidden during the run.


Created by Edgardo M. Ortiz (06.08.2021)
Last modified by Edgardo M. Ortiz (29.05.2022)