Output Files

For this example we will use the directory 01_clean_reads previously created with the clean module. We run the following Captus command to assemble our cleaned reads:

captus_assembly assemble --reads 01_clean_reads --sample_reads_target 1000000 --max_contig_gc 60.0

We are including the option --sample_reads_target 1000000 to show the expected output even though this option will not be very commonly used. Additionally, the option --max_contig_gc 60.0 is used to filter contigs with GC content over 60% after assembly, only when this option is used the file filtered_contigs.fasta is produced.

After the run is finished we should see a new directory called 02_assemblies with the following structure and files:

Assemblies Assemblies

1. [sample]__captus-asm

A subdirectory ending in __captus-asm is created to contain the assembly of each sample separately (S1, S2, S3, and S4 in the image).


2. 00_subsampled_reads

This directory is only created when the option --sample_reads_target is used. It contains the subsampled reads in FASTQ format that were used for the assembly.


3. 01_assembly

This directory contains the FASTA and FASTG assembly files as well as assembly statistics and logs.


4. assembly.fasta

The main assembly file in FASTA format, this file contains the contigs assembled by MEGAHIT. The sequence headers are modified by Captus to resemble the headers produced by the assembler Spades.


5. assembly_graph.fastg

The assembly graph in FASTG format. This file can be explored in Bandage or similar software which are able to plot the connections between contigs, loops, circular segments, etc.


6. filtered_contigs.fasta

This file in only created when the option --max_contig_gc is used, Captus will place any contig exceeding the specified GC content in this FASTA-formatted file (same format as in 4).


7. assembly.stats.tsv, assembly.stats.t.tsv

Assembly statistics, the assembly.stats.t.tsv is just a transposed version of assembly.stats.tsv:

assembly.stats.tsv

              sample : GenusA_speciesA_CAP
           n_contigs : 1830
 pct_contigs_>=_1kbp : 33.661
 pct_contigs_>=_2kbp : 11.694
 pct_contigs_>=_5kbp : 1.749
pct_contigs_>=_10kbp : 0.984
      longest_contig : 26676
     shortest_contig : 183
        total_length : 2041738
  pct_length_>=_1kbp : 70.061
  pct_length_>=_2kbp : 42.875
  pct_length_>=_5kbp : 18.15
 pct_length_>=_10kbp : 13.836
          avg_length : 1116
       median_length : 659
                 N50 : 1648
          GC_content : 39.028
           avg_depth : 24.1
   pct_contigs_>=_1x : 100.0
   pct_contigs_>=_2x : 70.109
   pct_contigs_>=_5x : 49.563
  pct_contigs_>=_10x : 38.47

8. megahit.brief.log, megahit.full.log

MEGAHIT program logs, the brief version contains just the screen output from each MEGAHIT run.

megahit.brief.log

Captus' MEGAHIT Command:
  megahit -1 /Volumes/Shuttle500G/for_docs_output/02_assemblies/GenusA_speciesA_CAP__captus-asm/00_subsampled_reads/GenusA_speciesA_CAP_R1.fq.gz -2 /Volumes/Shuttle500G/for_docs_output/02_assemblies/GenusA_speciesA_CAP__captus-asm/00_subsampled_reads/GenusA_speciesA_CAP_R2.fq.gz --min-count 2 --k-list 31,39,47,63,79,95,111,127,143,159,175 --merge-level 20,0.95 --prune-level 2 --memory 8504035246 --num-cpu-threads 4 --out-dir /Volumes/Shuttle500G/for_docs_output/02_assemblies/GenusA_speciesA_CAP__captus-asm/01_assembly --min-contig-len 182 --tmp-dir /Users/emortiz/captus_megahit_tmp


2022-04-08 01:09:30 - MEGAHIT v1.2.9
2022-04-08 01:09:30 - Using megahit_core with POPCNT and BMI2 support
2022-04-08 01:09:30 - Convert reads to binary library
2022-04-08 01:09:31 - b'INFO  sequence/io/sequence_lib.cpp  :   77 - Lib 0 (/Volumes/Shuttle500G/for_docs_output/02_assemblies/GenusA_speciesA_CAP__captus-asm/00_subsampled_reads/GenusA_speciesA_CAP_R1.fq.gz,/Volumes/Shuttle500G/for_docs_output/02_assemblies/GenusA_speciesA_CAP__captus-asm/00_subsampled_reads/GenusA_speciesA_CAP_R2.fq.gz): pe, 720090 reads, 151 max length'
2022-04-08 01:09:31 - b'INFO  utils/utils.h                 :  152 - Real: 1.3603\tuser: 0.6603\tsys: 0.2130\tmaxrss: 64000000'
2022-04-08 01:09:31 - Start assembly. Number of CPU threads 4 
2022-04-08 01:09:31 - k list: 31,39,47,63,79,95,111,127,143,159,175 
2022-04-08 01:09:31 - Memory used: 8504035246
2022-04-08 01:09:31 - Extract solid (k+1)-mers for k = 31 
2022-04-08 01:09:41 - Build graph for k = 31 
2022-04-08 01:09:43 - Assemble contigs from SdBG for k = 31
2022-04-08 01:09:46 - Local assembly for k = 31
2022-04-08 01:09:55 - Extract iterative edges from k = 31 to 39 
2022-04-08 01:09:57 - Build graph for k = 39 
2022-04-08 01:09:58 - Assemble contigs from SdBG for k = 39
2022-04-08 01:10:00 - Local assembly for k = 39
2022-04-08 01:10:15 - Extract iterative edges from k = 39 to 47 
2022-04-08 01:10:17 - Build graph for k = 47 
2022-04-08 01:10:18 - Assemble contigs from SdBG for k = 47
2022-04-08 01:10:20 - Local assembly for k = 47
2022-04-08 01:10:39 - Extract iterative edges from k = 47 to 63 
2022-04-08 01:10:41 - Build graph for k = 63 
2022-04-08 01:10:42 - Assemble contigs from SdBG for k = 63
2022-04-08 01:10:44 - Local assembly for k = 63
2022-04-08 01:11:03 - Extract iterative edges from k = 63 to 79 
2022-04-08 01:11:05 - Build graph for k = 79 
2022-04-08 01:11:06 - Assemble contigs from SdBG for k = 79
2022-04-08 01:11:08 - Local assembly for k = 79
2022-04-08 01:11:28 - Extract iterative edges from k = 79 to 95 
2022-04-08 01:11:30 - Build graph for k = 95 
2022-04-08 01:11:31 - Assemble contigs from SdBG for k = 95
2022-04-08 01:11:33 - Local assembly for k = 95
2022-04-08 01:11:49 - Extract iterative edges from k = 95 to 111 
2022-04-08 01:11:50 - Build graph for k = 111 
2022-04-08 01:11:51 - Assemble contigs from SdBG for k = 111
2022-04-08 01:11:53 - Local assembly for k = 111
2022-04-08 01:12:08 - Extract iterative edges from k = 111 to 127 
2022-04-08 01:12:09 - Build graph for k = 127 
2022-04-08 01:12:10 - Assemble contigs from SdBG for k = 127
2022-04-08 01:12:12 - Local assembly for k = 127
2022-04-08 01:12:26 - Extract iterative edges from k = 127 to 143 
2022-04-08 01:12:26 - Build graph for k = 143 
2022-04-08 01:12:27 - Assemble contigs from SdBG for k = 143
2022-04-08 01:12:29 - Local assembly for k = 143
2022-04-08 01:12:40 - Extract iterative edges from k = 143 to 159 
2022-04-08 01:12:40 - Build graph for k = 159 
2022-04-08 01:12:41 - Assemble contigs from SdBG for k = 159
2022-04-08 01:12:43 - Local assembly for k = 159
2022-04-08 01:12:53 - Extract iterative edges from k = 159 to 175 
2022-04-08 01:12:54 - Build graph for k = 175 
2022-04-08 01:12:54 - Assemble contigs from SdBG for k = 175
2022-04-08 01:12:56 - Merging to output final contigs 
2022-04-08 01:12:56 - 1850 contigs, total 2031858 bp, min 182 bp, max 26295 bp, avg 1098 bp, N50 1616 bp
2022-04-08 01:12:56 - ALL DONE. Time elapsed: 206.394586 seconds 

9. captus-assembly_assemble.stats.tsv

Statistics tab-separated-values table compiled across all assembled samples.

Column Description
sample Name of the sample
n_contigs Number of contigs
pct_contigs_>=_1kbp Percentage of contigs over 1kbp
pct_contigs_>=_2kbp Percentage of contigs over 2kbp
pct_contigs_>=_5kbp Percentage of contigs over 5kbp
pct_contigs_>=_10kbp Percentage of contigs over 10kbp
longest_contig Length of longest contig in bp
shortest_contig Length of shortest contig in bp
total_length Cumulative length of all contigs in bp
pct_length_>=_1kbp Percentage of total assembly length in contigs over 1kbp
pct_length_>=_2kbp Percentage of total assembly length in contigs over 2kbp
pct_length_>=_5kbp Percentage of total assembly length in contigs over 5kbp
pct_length_>=_10kbp Percentage of total assembly length in contigs over 10kbp
avg_length Average contig length in bp
median_length Median contig length in bp
N50 Assembly N50 in bp
GC_content Average contig GC content
avg_depth Average contig depth
pct_contigs_>=_1x Percentage of contigs over 1x depth
pct_contigs_>=_2x Percentage of contigs over 2x depth
pct_contigs_>=_5x Percentage of contigs over 5x depth
pct_contigs_>=_10x Percentage of contigs over 10x depth
filtered_n_contigs
filtered_total_length
filtered_avg_length
filtered_gc_content

10. captus-assembly_assemble.report.html

This is the final Assembly report, summarizing statistics across all samples assembled.


11. captus-assembly_assemble.log

This is the log from Captus, it contains the command used and all the information shown during the run. If the option --show_less was enabled, the log will also contain all the extra detailed information that was hidden during the run.


Created by Edgardo M. Ortiz (06.08.2021)
Last modified by Edgardo M. Ortiz (29.05.2022)