Context
Trim Galore v2.0 ("Oxidized Edition") is a complete rewrite in Rust — a single binary with no external dependencies. It no longer shells out to Cutadapt; adapter trimming is built in.
The trimming reports still follow the same *_trimming_report.txt naming convention and include the same statistics (total reads, adapter %, quality-trimmed bp, adapter length distribution), but there is no longer embedded Cutadapt stdout — the statistics are generated directly by Trim Galore.
Current situation
MultiQC discovers Trim Galore reports through the cutadapt module, which requires "This is cutadapt" in the first 100 lines. Since v2.0 no longer uses Cutadapt, we emit a backwards-compatibility shim:
Trim Galore 2.0.0 (Oxidized Edition) — adapter trimming built in
This is cutadapt 4.0 (compatible; for MultiQC backwards compatibility)
This makes current MultiQC work, but the Software Versions table shows "Cutadapt 4.0" instead of "Trim Galore 2.0.0", which is misleading.
Proposal
Add a native search pattern for Trim Galore v2.0 reports so they are recognized directly with the correct software name and version. For example:
# search_patterns.yaml
trim_galore:
  - contents: "Trim Galore"
    contents_re: "Oxidized Edition"
    num_lines: 20
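To make the intended matching behaviour concrete, here is a minimal standalone sketch of what we have in mind. It is not MultiQC's actual matcher: the function name `looks_like_trim_galore_v2` is hypothetical, and it assumes both the literal string and the regex must be found within the first `num_lines` lines. Whether MultiQC combines `contents` and `contents_re` this way within a single pattern would need checking.

```python
import re
from typing import Iterable

# Hypothetical helper, illustrating the intended semantics of the pattern above.
OXIDIZED_RE = re.compile(r"Oxidized Edition")


def looks_like_trim_galore_v2(lines: Iterable[str], num_lines: int = 20) -> bool:
    """Return True if "Trim Galore" and "Oxidized Edition" both occur
    somewhere in the first `num_lines` lines (assumed AND semantics)."""
    found_literal = False
    found_regex = False
    for index, line in enumerate(lines):
        if index >= num_lines:
            break
        if "Trim Galore" in line:
            found_literal = True
        if OXIDIZED_RE.search(line):
            found_regex = True
        if found_literal and found_regex:
            return True
    return False


header = [
    "SUMMARISING RUN PARAMETERS",
    "=========================",
    "Input filename: nextera_100K.fastq.gz",
    "Trimming mode: single-end",
    "Trim Galore version: 2.0.0 (Oxidized Edition)",
]
print(looks_like_trim_galore_v2(header))  # True
```

A plain Cutadapt report, which contains neither string, would not match, so the existing cutadapt discovery should be unaffected.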
The statistics section uses the same format as Cutadapt >= 1.7 (same regexes for Total reads processed:, Reads with adapters:, bp stats, adapter length distribution table), so the existing parsing logic can be reused — only file discovery and version extraction need to change.
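As a rough illustration of the reuse claim, the sketch below pulls the summary counts out of a v2.0 report with simple regexes. The regexes and the dictionary keys (`r_processed`, `bp_written`, and so on) are our own approximations for this example and are not copied from the MultiQC cutadapt module.

```python
import re
from typing import Dict

# Illustrative regexes for the "=== Summary ===" block; key names are hypothetical.
SUMMARY_PATTERNS = {
    "r_processed": r"Total reads processed:\s+([\d,]+)",
    "r_with_adapters": r"Reads with adapters:\s+([\d,]+)",
    "r_written": r"Reads written \(passing filters\):\s+([\d,]+)",
    "bp_processed": r"Total basepairs processed:\s+([\d,]+) bp",
    "quality_trimmed": r"Quality-trimmed:\s+([\d,]+) bp",
    "bp_written": r"Total written \(filtered\):\s+([\d,]+) bp",
}


def parse_summary(text: str) -> Dict[str, int]:
    """Extract integer counts from the summary block, dropping thousands separators."""
    stats: Dict[str, int] = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1).replace(",", ""))
    return stats


REPORT_SNIPPET = """\
=== Summary ===
Total reads processed: 100,000
Reads with adapters: 37,813 (37.8%)
Reads written (passing filters): 99,922 (99.9%)
Total basepairs processed: 5,100,000 bp
Quality-trimmed: 4,374 bp (0.1%)
Total written (filtered): 4,964,599 bp (97.3%)
"""

stats = parse_summary(REPORT_SNIPPET)
print(stats["r_processed"], stats["r_with_adapters"])  # 100000 37813
```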
The version can be extracted from:
Trim Galore version: 2.0.0 (Oxidized Edition)
which appears in the report header (line 5).
The sample name can be extracted from:
Input filename: nextera_100K.fastq.gz
which appears on line 3.
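A minimal sketch of both extractions, assuming the header lines keep exactly the `Trim Galore version:` and `Input filename:` prefixes shown above. The function name `parse_header` is hypothetical, and turning the filename into a sample name (for example stripping `.fastq.gz`) is left to MultiQC's usual cleaning.

```python
import re
from typing import Optional, Tuple

# Anchored per line so the shim line "Trim Galore 2.0.0 (Oxidized Edition) ..."
# further down the report is not picked up by mistake.
VERSION_RE = re.compile(r"^Trim Galore version:[ \t]*(\d+(?:\.\d+)*)", re.MULTILINE)
INPUT_RE = re.compile(r"^Input filename:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_header(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, input_filename); either is None if its line is absent."""
    version = VERSION_RE.search(text)
    filename = INPUT_RE.search(text)
    return (
        version.group(1) if version else None,
        filename.group(1) if filename else None,
    )


HEADER = """\
SUMMARISING RUN PARAMETERS
=========================
Input filename: nextera_100K.fastq.gz
Trimming mode: single-end
Trim Galore version: 2.0.0 (Oxidized Edition)
"""

print(parse_header(HEADER))  # ('2.0.0', 'nextera_100K.fastq.gz')
```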
We will keep the "This is cutadapt" shim in our reports for backwards compatibility with older MultiQC releases, so this change would be purely additive.
Test report
Below is a complete v2.0 trimming report (100K Nextera reads) that can be used as a test fixture:
nextera_100K.fastq.gz_trimming_report.txt
SUMMARISING RUN PARAMETERS
=========================
Input filename: nextera_100K.fastq.gz
Trimming mode: single-end
Trim Galore version: 2.0.0 (Oxidized Edition)
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'CTGTCTCTTATA' (user-specified or auto-detected)
Maximum trimming error rate: 0.1
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length single-end: 20 bp
Output file will be GZIP compressed
Trim Galore 2.0.0 (Oxidized Edition) — adapter trimming built in
This is cutadapt 4.0 (compatible; for MultiQC backwards compatibility)
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a CTGTCTCTTATA nextera_100K.fastq.gz
Processing reads on 1 core in single-end mode ...
=== Summary ===
Total reads processed: 100,000
Reads with adapters: 37,813 (37.8%)
Reads written (passing filters): 99,922 (99.9%)
Total basepairs processed: 5,100,000 bp
Quality-trimmed: 4,374 bp (0.1%)
Total written (filtered): 4,964,599 bp (97.3%)
=== Adapter 1 ===
Sequence: CTGTCTCTTATA; Type: regular 3'; Length: 12; Trimmed: 37813 times.
No. of allowed errors:
1-9 bp: 0; 10-12 bp: 1
Overview of removed sequences
length count expect max.err error counts
1 19383 25000.0 0 19383
2 6510 6250.0 0 6510
3 2169 1562.5 0 2169
4 997 390.6 0 997
5 871 97.7 0 871
6 943 24.4 0 943
7 1020 6.1 0 1020
8 973 1.5 0 973
9 1120 0.4 0 1120
10 1010 0.1 1 1010
11 717 0.0 1 717
12 473 0.0 1 473
13 324 0.0 1 324
14 276 0.0 1 276
15 271 0.0 1 271
16 163 0.0 1 163
17 88 0.0 1 88
18 75 0.0 1 75
19 53 0.0 1 53
20 63 0.0 2 63
21 38 0.0 2 38
22 49 0.0 2 49
23 39 0.0 2 39
24 38 0.0 2 38
25 29 0.0 2 29
26 12 0.0 2 12
27 6 0.0 2 6
28 4 0.0 2 4
29 2 0.0 2 2
30 9 0.0 3 9
31 16 0.0 3 16
32 5 0.0 3 5
33 2 0.0 3 2
34 4 0.0 3 4
37 3 0.0 3 3
38 1 0.0 3 1
39 5 0.0 3 5
40 4 0.0 4 4
41 3 0.0 4 3
42 2 0.0 4 2
43 5 0.0 4 5
44 3 0.0 4 3
45 9 0.0 4 9
46 1 0.0 4 1
48 7 0.0 4 7
49 5 0.0 4 5
50 1 0.0 5 1
51 12 0.0 5 12
RUN STATISTICS FOR INPUT FILE: nextera_100K.fastq.gz
=============================================
100000 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp: 78 (0.1%)
Happy to help with a PR if that would be useful.