Options
clean
To show all available options and their default values you can type in your terminal:
captus_assembly clean --help
Input
-r, --reads
With this option you provide the location of your raw FASTQ files. There are several ways to list them:
- Directory: the path to the directory containing your FASTQ files is usually the easiest way to tell Captus which files to analyze. When you provide a directory, Captus searches within all its subdirectories for files with valid FASTQ extensions.
- List of files: you can also provide the individual path to each of your FASTQ files, separated by spaces. This is useful if you want to analyze only a couple of samples within a directory that contains many other samples. Lists are also useful when your FASTQ files are located in different directories.
- UNIX pattern: another easy way to provide lists of files is using the wildcards * and ? to match many or just one character, respectively.
This argument is required.
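To check what the two wildcards select before passing a pattern to --reads, the sketch below creates a few empty files in a temporary directory and counts what each pattern expands to. The file names are hypothetical and only serve the illustration.

```shell
# Hypothetical file names, created empty in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA_R1.fq.gz" "$demo_dir/sampleA_R2.fq.gz" \
      "$demo_dir/sampleB_R1.fq.gz" "$demo_dir/sampleAB_R1.fq.gz"

# '*' matches any number of characters: sampleA, sampleB and sampleAB.
set -- "$demo_dir"/sample*_R1.fq.gz
n_star=$#

# '?' matches exactly one character: sampleA and sampleB only.
set -- "$demo_dir"/sample?_R1.fq.gz
n_qmark=$#

echo "sample*_R1.fq.gz matches $n_star files"   # 3 files
echo "sample?_R1.fq.gz matches $n_qmark files"  # 2 files

# Remove the temporary demo files.
rm -r "$demo_dir"
```

A pattern checked this way could then be given to Captus, for example `captus_assembly clean --reads ./raw_reads/sample?_R?.fq.gz`, where ./raw_reads is again a placeholder path.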
Output
-o, --out
With this option you can redirect the output directory to a path of your choice. That path will be created if it doesn’t already exist.
This argument is optional, the default is ./01_clean_reads/
--keep_all
Many intermediate files are created during the read cleanup; some are large (like FASTQ files) while others are small (like temporary logs). Captus deletes all the unnecessary intermediate files unless you enable this flag.
--overwrite
Use this flag with caution: it will replace any previous results within the output directory (for the sample names that match).
Adaptor trimming
--adaptor_set
We have bundled adaptor sequences with Captus. These options are available:
- Illumina = Adaptor set copied from BBTools.
- BGI = Including BGISEQ, DNBSEQ, and MGISEQ.
- ALL = If you are unsure of the technology used for your sequences, this combines both sets of adaptors.
This argument is optional, the default is ALL.
--rna
Enable this flag to trim poly-A tails from RNA-Seq reads.
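As an illustration of how the two adaptor trimming options combine, the sketch below assembles a command for Illumina RNA-Seq reads and prints it for inspection instead of running it. The input directory is a placeholder, and the check assumes the set names are spelled exactly as listed above.

```shell
# Placeholder values; adjust to your data.
reads_dir="./rnaseq_reads"   # hypothetical input directory
adaptor_set="Illumina"       # one of: Illumina, BGI, ALL

# Build the command only if the adaptor set is one of the listed names.
case "$adaptor_set" in
    Illumina|BGI|ALL)
        cmd="captus_assembly clean --reads $reads_dir --adaptor_set $adaptor_set --rna"
        ;;
    *)
        cmd=""
        echo "unknown adaptor set: $adaptor_set" >&2
        ;;
esac

# Print the command; run it yourself once it looks right.
echo "$cmd"
```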
Quality trimming and filtering
Here you can control PHRED quality score thresholds. BBTools uses the PHRED algorithm to trim low-quality bases or to discard low-quality reads.
--trimq
Leading and trailing read regions with average PHRED quality score below this value will be trimmed.
Many people raise this value to 20 or even higher but that usually discards lots of useful data for de novo assembly. In general, unless you have really high sequencing depth, don’t increase this threshold beyond ~16.
This argument is optional, the default is 13.
--maq
Once the trimming of low-quality bases from both ends of the reads has been completed, the average PHRED score of the entire read is recalculated and reads that do not have at least this minimum average quality are discarded.
Again, very high thresholds will throw away useful data. In general, set it to at least the value of --trimq, or just a couple of units higher.
This argument is optional, the default is 16.
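The relationship between the two thresholds can be checked before launching a run. The sketch below uses the default values stated above, warns if --maq falls below --trimq, and prints the resulting command rather than running it. The input path ./raw_reads is a placeholder.

```shell
trimq=13   # default stated above
maq=16     # default stated above

# The text above suggests keeping --maq at or slightly above --trimq.
if [ "$maq" -lt "$trimq" ]; then
    echo "warning: --maq ($maq) is lower than --trimq ($trimq)" >&2
fi

# ./raw_reads is a hypothetical input directory.
cmd="captus_assembly clean --reads ./raw_reads --trimq $trimq --maq $maq"
echo "$cmd"
```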
--ftl
Trim any base to the left of this position. For example, if you want to remove 4 bases from the left of the reads set this number to 5.
This argument is optional, the default is 0 (no ftl applied).
--ftr
Trim any base to the right of this position. For example, if you want to truncate your reads to a length of 100 bp, set this number to 100.
This argument is optional, the default is 0 (no ftr applied).
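The position convention described for --ftl and --ftr can be mimicked on a plain string with `cut`, which also counts positions from 1. This is only an illustration of the convention as worded above, using a made-up 12-base read; it is not how Captus or BBTools perform the trimming.

```shell
read_seq="NNNNACGTACGT"   # hypothetical 12-base read
ftl=5                     # keep bases from position 5 onward
ftr=10                    # keep bases up to position 10

# Drop everything to the left of position 5 (removes 4 bases).
after_ftl=$(printf '%s\n' "$read_seq" | cut -c"${ftl}"-)

# Drop everything to the right of position 10 (read truncated to 10 bases).
after_ftr=$(printf '%s\n' "$read_seq" | cut -c1-"${ftr}")

echo "ftl=$ftl: $after_ftl"   # ftl=5: ACGTACGT
echo "ftr=$ftr: $after_ftr"   # ftr=10: NNNNACGTAC
```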
QC Statistics
--qc_program
Select the program for obtaining the statistics from your FASTQ files. Both programs should return identical results, but Falco is much faster. Valid options are:
Falco
FastQC
This argument is optional, the default is Falco.
--skip_qc_stats
This flag disables the Falco or FastQC analysis. Keep in mind that the final HTML report can’t be created without the results from this analysis.
Other
--bbduk_path, --falco_path, --fastqc_path
If you have installed your own copies of bbduk.sh, Falco, or FastQC, you can provide the full path to those copies.
These arguments are optional, the defaults are bbduk.sh, falco, and fastqc respectively.
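One way to find out whether these options are needed is to check if the default command names resolve in your current environment. The sketch below assumes the defaults are looked up through PATH, which the bare default names suggest but the text does not state explicitly.

```shell
checked=0
for tool in bbduk.sh falco fastqc; do
    # 'command -v' prints the resolved location if the tool is in PATH.
    if tool_path=$(command -v "$tool" 2>/dev/null); then
        echo "$tool found at: $tool_path"
    else
        echo "$tool not found in PATH"
    fi
    checked=$((checked + 1))
done
```

For any tool reported as not found, the full path to your own copy can be supplied with the matching option (--bbduk_path, --falco_path, or --fastqc_path).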
--ram, --threads, --concurrent, --debug, --show_less
See Parallelization (and other common options)
Created by Edgardo M. Ortiz (06.08.2021)
Last modified by Edgardo M. Ortiz (29.05.2022)