Output Files
Imagine we start with a directory called 00_raw_reads
with the following content:
We have a samples with different data types, to distinguish them we added _CAP
to the samples where hybridization-capture was used, _WGS
for high-coverage whole genome sequencing, _RNA
for RNA-Seq reads, and _GSK
for genome skimming data (notice also the difference in file sizes). For this example, we only want to clean the samples in red rectangles corresponding to capture data. We run the following Captus
command:
captus_assembly clean --reads ./00_raw_reads/*_CAP_R?.fq.gz
Notice we are using default settings, the only required argument is the location of the raw reads. The output was written to a new directory called 01_clean_reads
. Let’s take a look at the contents:
1. [sample]_R1.fq.gz
, [sample]_R2.fq.gz
In case of paired-end input we will have a pair of files like in the image, the forward reads are indicated by _R1 and the reverse reads by _R2. Single-end input will only return forward reads. Wikipedia’s entry for the format describes it in more detail.
2. [sample].cleaning.log
This file contains the cleaning command used for bbduk.sh
as well the data shown as screen output, this and other information is compiled in the Cleaning report.
3. [sample].cleaning.stats.txt
List of contaminants found by bbduk.sh
in the input reads, sorted by abundance.
4. captus-assembly_clean.report.html
This is the final Cleaning report, summarizing statistics across all samples analyzed.
5. captus-assembly_clean.log
This is the log from Captus
, it contains the command used and all the information shown during the run. If the option --show_less
was enabled, the log will also contain all the extra detailed information that was hidden during the run.
6. 00_adaptors_trimmed
This is an intermediate directory that contains the FASTQ files without adaptors, prior to quality-trimming and filtering. The directory also stores bbduk.sh
commands and logs for the adaptor trimming stage. If the option --keep_all
was enabled the FASTQs from this intermediate are kept after the run, otherwise they are deleted.
7. 01_qc_stats_before
, 02_qc_stats_after
These directories contain the results from either Falco
or FastQC
, organized in a subdirectory per FASTQ file analyzed.
8. 03_qc_extras
This directory contains all the tab-separated-values tables needed to build the Cleaning report. We provide them separately to allow the user more detailed analyses.
Created by Edgardo M. Ortiz (06.08.2021)
Last modified by Edgardo M. Ortiz (30.05.2022)