HTML Report

Concept

Proper cleaning is the first step toward reliable analyses of high-throughput sequencing data. To assess the quality of the raw reads and how cleaning improves it, the clean module internally runs the widely used quality-control program FastQC, or its faster emulator Falco, on the reads before and after cleaning. Although both programs generate informative reports, they write a separate file for each sample, each read direction (for paired-end data), and each stage (before and after cleaning). This makes reviewing every report tedious, and serious problems can easily be overlooked, such as residual low-quality bases or adaptor sequences, contamination from other samples, or improperly set cleaning parameters.

Captus summarizes the information in those disparate reports into a single HTML file. All you need to do is open captus-assembly_clean.report.html in your browser (internet connection required) to get a quick overview of all your samples, both read directions (for paired-end data), and the state before and after cleaning!

Tip

The entire report is based on tables stored in the 03_qc_extras directory.
All tables and plots in the report are interactive and powered by Plotly.
Visit the following sites once to take full advantage of its interactivity:
- https://plotly.com/chart-studio-help/getting-to-know-the-plotly-modebar
- https://plotly.com/chart-studio-help/zoom-pan-hover-controls

The report comprises the following nine sections:

Summary Table
Stats on Reads/Bases
Per Base Quality
Per Read Quality
Read Length Distribution
Per Base Nucleotide Content
Per Read GC Content
Sequence Duplication Level
Adaptor Content

A brief description and interactive example of each section is given below.
By switching the tabs at the top of each plot, you can compare the plot produced by Captus with the corresponding plot from FastQC.

1. Summary Table

This table shows general cleaning statistics for each sample.

Features:

Switch the Sort by dropdown to re-sort the table by any column value.
Cells are color-coded according to value (high = green; low = pink).

Description of each column

Column	Description	Unit
Sample	Sample name	-
Input Reads	Number of reads before cleaning	-
Input Bases	Number of bases before cleaning	bp
Output Reads	Number of reads that passed cleaning	-
Output Reads%	= (`Output Reads` / `Input Reads`) * 100	%
Output Bases	Number of bases that passed cleaning	bp
Output Bases%	= (`Output Bases` / `Input Bases`) * 100	%
Mean Read Length%	= (Mean read length after cleaning / Mean read length before cleaning) * 100	%
≥Q20 Reads%	Percentage of reads with a mean Phred quality score of at least 20 after cleaning	%
≥Q30 Reads%	Percentage of reads with a mean Phred quality score of at least 30 after cleaning	%
GC%	Mean GC content in the reads after cleaning	%
Adapter%	Percentage of reads containing adaptor sequences before cleaning	%
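The derived percentage columns follow directly from the read and base counts. The sketch below reproduces that arithmetic; the function name and the counts are hypothetical, and it assumes that mean read length is simply the number of bases divided by the number of reads.

```python
def cleaning_percentages(input_reads, input_bases, output_reads, output_bases):
    """Return the derived percentage columns of the Summary Table."""
    # Assumption: mean read length = total bases / total reads.
    mean_len_before = input_bases / input_reads
    mean_len_after = output_bases / output_reads
    return {
        "Output Reads%": output_reads / input_reads * 100,
        "Output Bases%": output_bases / input_bases * 100,
        "Mean Read Length%": mean_len_after / mean_len_before * 100,
    }


# Hypothetical counts for one sample (not taken from a real report).
stats = cleaning_percentages(
    input_reads=1_000_000,
    input_bases=150_000_000,
    output_reads=900_000,
    output_bases=126_000_000,
)
for column, value in stats.items():
    print(f"{column}: {value:.2f}")
```

Under these assumptions, Mean Read Length% is equivalent to Output Bases% divided by Output Reads%, so a value well below 100 indicates that surviving reads were shortened by trimming rather than merely discarded.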

2. Stats on Reads/Bases

Captus cleans reads through two consecutive rounds of adaptor trimming (Round1, Round2) followed by quality trimming and filtering.
This plot shows changes in the number of reads (left panel) and bases (right panel) at each step of the cleaning process.

Features:

Use the buttons at the top to choose whether counts or percentages are shown.
Samples are sorted by the number or percentage of bases that passed cleaning.
Click on a legend entry to hide or show the corresponding data series.
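The relationship between the count view and the percentage view can be sketched as follows. The step labels and counts are hypothetical, and using the input count as the denominator is an assumption about how the percentages are defined, not something confirmed by the report itself.

```python
def retention_by_step(counts):
    """Express the count at each cleaning step as a percentage of the first step."""
    steps = list(counts.items())
    baseline = steps[0][1]  # assumption: percentages are relative to the input
    return {step: value / baseline * 100 for step, value in steps}


# Hypothetical read counts for one sample; step labels are illustrative only.
reads = {
    "Input": 1_000_000,
    "Round1": 990_000,
    "Round2": 985_000,
    "Cleaned": 900_000,
}
pct = retention_by_step(reads)
for step, value in pct.items():
    print(f"{step}: {value:.1f}%")
```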

There is no corresponding FastQC plot.

3. Per Base Quality

This plot shows the range of Phred quality scores at each position in the reads before and after cleaning.
For more details, see the FastQC documentation.

Feature:

Use the dropdown at the top to change the variable shown. These variables correspond to the elements of the boxplots in the FastQC report.
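To make the boxplot elements concrete, here is a minimal sketch that computes them per read position from quality strings. It assumes Phred+33 encoding and a simple linear-interpolation percentile; the function names are hypothetical, and FastQC's own calculation (which may group positions into bins) is not reproduced here.

```python
from statistics import mean, median


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of an already sorted list."""
    pos = (len(sorted_values) - 1) * fraction
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    span = sorted_values[upper] - sorted_values[lower]
    return sorted_values[lower] + span * (pos - lower)


def per_base_quality(quality_strings, offset=33):
    """Summarize Phred scores at each read position (assumes Phred+33)."""
    longest = max(len(q) for q in quality_strings)
    summary = []
    for i in range(longest):
        # Only reads long enough to cover position i contribute to it.
        column = sorted(ord(q[i]) - offset for q in quality_strings if len(q) > i)
        summary.append({
            "position": i + 1,
            "mean": mean(column),
            "median": median(column),
            "lower_quartile": percentile(column, 0.25),
            "upper_quartile": percentile(column, 0.75),
            "percentile_10": percentile(column, 0.10),
            "percentile_90": percentile(column, 0.90),
        })
    return summary


# Toy quality strings: 'I' = Q40, '5' = Q20, '#' = Q2 in Phred+33.
quals = ["IIII", "II5", "I5#5", "5"]
summary = per_base_quality(quals)
for row in summary:
    print(row)
```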

4. Per Read Quality

This plot shows the distribution of per-read mean Phred quality scores before and after cleaning.
For more details, see the FastQC documentation.
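As an illustration of what is being plotted, the sketch below computes the mean Phred score of each read, tallies the reads by that mean, and derives the share of reads at or above a threshold (following the ≥ in the Summary Table column names). It assumes Phred+33 encoding and an arithmetic mean of the scores; the function names are hypothetical.

```python
from collections import Counter


def mean_read_quality(quality_string, offset=33):
    """Arithmetic mean of the Phred scores in one read (assumes Phred+33)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def read_quality_distribution(quality_strings):
    """Count reads by mean Phred score, truncated to an integer."""
    return Counter(int(mean_read_quality(q)) for q in quality_strings)


def percent_reads_at_least(quality_strings, threshold):
    """Percentage of reads whose mean Phred score is >= threshold."""
    passing = sum(mean_read_quality(q) >= threshold for q in quality_strings)
    return passing / len(quality_strings) * 100


# Toy quality strings: 'I' = Q40, '5' = Q20, '#' = Q2 in Phred+33.
quals = ["IIII", "II55", "5555", "####"]
distribution = read_quality_distribution(quals)
q20 = percent_reads_at_least(quals, 20)
q30 = percent_reads_at_least(quals, 30)
print(distribution, q20, q30)
```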

5. Read Length Distribution

This plot shows the distribution of read lengths before and after cleaning.
For more details, see the FastQC documentation.

6. Per Base Nucleotide Content

This plot shows the proportion of each nucleotide (A, T, G, C) at each position in the reads before and after cleaning.
If a particular nucleotide is overrepresented at a certain position in the reads, you will see the color corresponding to that nucleotide; otherwise, the plot will be a uniform grayish color.
For more details, see the FastQC documentation.
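The quantity behind this plot can be sketched as a per-position tally of bases. The function name is hypothetical, and ignoring any base other than A, T, G, or C (such as N) is an assumption made for this sketch.

```python
from collections import Counter


def per_base_content(reads):
    """Percentage of A, T, G, and C at each read position."""
    longest = max(len(r) for r in reads)
    content = []
    for i in range(longest):
        counts = Counter(r[i].upper() for r in reads if len(r) > i)
        # Assumption: bases other than A/T/G/C (e.g. N) are left out of the total.
        total = sum(counts[base] for base in "ATGC")
        content.append({
            base: (counts[base] / total * 100 if total else 0.0)
            for base in "ATGC"
        })
    return content


# Toy reads of unequal length.
reads = ["ACGT", "ACGA", "ATG", "AC"]
content = per_base_content(reads)
for position, row in enumerate(content, start=1):
    print(position, row)
```

In this toy input, position 1 is entirely A, which is the kind of overrepresentation that would show up as a single dominant color in the plot.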

7. Per Read GC Content

This plot shows the distribution of GC content across the reads before and after cleaning.
Broader or bimodal peaks may indicate contamination with DNA from different organisms.
For more details, see the FastQC documentation.
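A minimal sketch of the underlying calculation: compute the GC percentage of each read, then count how many reads fall at each rounded value. The function names are hypothetical, and the denominator is assumed to be the full read length.

```python
from collections import Counter


def gc_percent(read):
    """GC content of one read as a percentage of its length."""
    read = read.upper()
    return (read.count("G") + read.count("C")) / len(read) * 100


def gc_distribution(reads):
    """Number of reads at each GC percentage, rounded to an integer."""
    return Counter(round(gc_percent(r)) for r in reads)


# Toy reads; a real sample would give a smooth, roughly unimodal curve.
reads = ["GGCC", "ATAT", "ACGT", "AGCT", "GCGA"]
distribution = gc_distribution(reads)
for gc, n in sorted(distribution.items()):
    print(f"{gc}% GC: {n} read(s)")
```

With a mixture of organisms of different GC content, the counts would cluster around more than one value, which is what a bimodal peak in the plot reflects.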

8. Sequence Duplication Level

This plot shows the percentage of sequences with different degrees of duplication before and after cleaning.
For more details, see the FastQC documentation.
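The idea of a duplication level can be sketched by counting how often each exact sequence occurs and then reporting the share of reads at each level. This is a simplified illustration with a hypothetical function name; FastQC's own module works on a sample of the reads and groups higher duplication levels into bins, which this sketch does not reproduce.

```python
from collections import Counter


def duplication_levels(reads):
    """Percentage of reads whose sequence occurs exactly k times, keyed by k."""
    copies = Counter(reads)  # sequence -> number of occurrences
    reads_per_level = Counter()
    for n in copies.values():
        # A sequence seen n times contributes n reads to duplication level n.
        reads_per_level[n] += n
    total = len(reads)
    return {
        level: count / total * 100
        for level, count in sorted(reads_per_level.items())
    }


# Toy input: one sequence seen 3 times, one seen twice, three seen once.
reads = ["AAAA", "AAAA", "AAAA", "CCCC", "GGGG", "GGGG", "TTTT", "ACGT"]
levels = duplication_levels(reads)
for level, pct in levels.items():
    print(f"{level}x: {pct:.1f}% of reads")
```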