Feature request: Native support for Trim Galore v2.0 (Oxidized Edition) reports #3529

@FelixKrueger

Context

Trim Galore v2.0 ("Oxidized Edition") is a complete rewrite in Rust — a single binary with no external dependencies. It no longer shells out to Cutadapt; adapter trimming is built in.

The trimming reports still follow the same *_trimming_report.txt naming convention and include the same statistics (total reads, adapter %, quality-trimmed bp, adapter length distribution), but there is no longer embedded Cutadapt stdout — the statistics are generated directly by Trim Galore.

Current situation

MultiQC discovers Trim Galore reports through the cutadapt module, which requires "This is cutadapt" in the first 100 lines. Since v2.0 no longer uses Cutadapt, we emit a backwards-compatibility shim:

Trim Galore 2.0.0 (Oxidized Edition) — adapter trimming built in
This is cutadapt 4.0 (compatible; for MultiQC backwards compatibility)

This keeps current MultiQC releases working, but the Software Versions table then reports "Cutadapt 4.0" instead of "Trim Galore 2.0.0", which is misleading.
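To illustrate the problem, here is a minimal sketch of a discovery check of this kind. It is not MultiQC's actual code: the function names and the version regex are assumptions made for illustration, and only the marker string and the 100-line window come from the description above.

```python
import re
from itertools import islice

MARKER = "This is cutadapt"


def looks_like_cutadapt(lines, num_lines=100):
    """Return True if the marker appears within the first `num_lines` lines."""
    return any(MARKER in line for line in islice(lines, num_lines))


def reported_cutadapt_version(lines):
    """Return the version that follows the marker, or None (hypothetical helper)."""
    for line in lines:
        match = re.search(r"This is cutadapt ([\d.]+)", line)
        if match:
            return match.group(1)
    return None


# Two lines taken from a v2.0 report: the real version and the shim.
header = [
    "Trim Galore version: 2.0.0 (Oxidized Edition)",
    "This is cutadapt 4.0 (compatible; for MultiQC backwards compatibility)",
]

print(looks_like_cutadapt(header))        # the shim satisfies the check
print(reported_cutadapt_version(header))  # but the version read is the shim's
```

A check like this accepts the report only because of the shim line, and any version read from that line is the placeholder "4.0" rather than the Trim Galore version.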

Proposal

Add a native search pattern for Trim Galore v2.0 reports so they are recognized directly with the correct software name and version. For example:

# search_patterns.yaml
trim_galore:
  - contents: "Trim Galore"
    contents_re: "Oxidized Edition"
    num_lines: 20

The statistics section uses the same format as Cutadapt >= 1.7 (same regexes for Total reads processed:, Reads with adapters:, the base-pair statistics, and the adapter length distribution table), so the existing parsing logic can be reused. Only file discovery and version extraction need to change.
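As a rough sketch of what parsing this section involves, the snippet below pulls the summary counts and the length table out of a report. The regexes, dictionary keys, and function names are assumptions written for this example; they are not copied from the MultiQC cutadapt module.

```python
import re

# Illustrative patterns only; the key names are hypothetical.
SUMMARY_PATTERNS = {
    "r_processed": r"Total reads processed:\s*([\d,]+)",
    "r_with_adapters": r"Reads with adapters:\s*([\d,]+)",
    "r_written": r"Reads written \(passing filters\):\s*([\d,]+)",
    "bp_processed": r"Total basepairs processed:\s*([\d,]+) bp",
    "quality_trimmed": r"Quality-trimmed:\s*([\d,]+) bp",
    "bp_written": r"Total written \(filtered\):\s*([\d,]+) bp",
}


def parse_summary(text):
    """Return the summary counts found in `text` as integers."""
    stats = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1).replace(",", ""))
    return stats


def parse_length_table(text):
    """Return {removed_length: count} from 'Overview of removed sequences'."""
    counts = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["length", "count"]:
            in_table = True  # header row found, data rows follow
            continue
        if in_table:
            if len(fields) < 2 or not fields[0].isdigit():
                break  # first non-data line ends the table
            counts[int(fields[0])] = int(fields[1])
    return counts


sample = (
    "Total reads processed:              100,000\n"
    "Reads with adapters:                 37,813 (37.8%)\n"
    "Quality-trimmed:                    4,374 bp (0.1%)\n"
    "length\tcount\texpect\tmax.err\terror counts\n"
    "1\t19383\t25000.0\t0\t19383\n"
    "2\t6510\t6250.0\t0\t6510\n"
)
print(parse_summary(sample))
print(parse_length_table(sample))
```

Because the v2.0 report keeps these lines unchanged, a parser of this shape should not need to know whether Cutadapt or Trim Galore produced them.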

The version can be extracted from:

Trim Galore version: 2.0.0 (Oxidized Edition)

which appears in the report header (line 5).
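A minimal sketch of that extraction, assuming the header line keeps exactly this wording. The regex and the function name are hypothetical and not part of MultiQC.

```python
import re

# Matches the header line and captures a dotted numeric version such as 2.0.0.
VERSION_RE = re.compile(r"^Trim Galore version:\s*(\d+(?:\.\d+)*)", re.MULTILINE)


def trim_galore_version(report_text):
    """Return the Trim Galore version string from a report, or None."""
    match = VERSION_RE.search(report_text)
    return match.group(1) if match else None


report_head = (
    "SUMMARISING RUN PARAMETERS\n"
    "=========================\n"
    "Input filename: nextera_100K.fastq.gz\n"
    "Trimming mode: single-end\n"
    "Trim Galore version: 2.0.0 (Oxidized Edition)\n"
)
print(trim_galore_version(report_head))
```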

The sample name can be extracted from:

Input filename: nextera_100K.fastq.gz

which appears on line 3.
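One possible way to read the sample name is sketched below. The suffix stripping is an assumption made for this example; MultiQC applies its own sample-name cleaning, which is not reproduced here, and the names used are hypothetical.

```python
import re

INPUT_RE = re.compile(r"^Input filename:\s*(.+)$", re.MULTILINE)
# Assumed FASTQ suffixes to drop; adjust to whatever cleaning is preferred.
FASTQ_SUFFIX_RE = re.compile(r"\.(?:fastq|fq)(?:\.gz)?$")


def sample_name(report_text):
    """Return the input filename without its FASTQ suffix, or None."""
    match = INPUT_RE.search(report_text)
    if match is None:
        return None
    return FASTQ_SUFFIX_RE.sub("", match.group(1).strip())


report_head = (
    "SUMMARISING RUN PARAMETERS\n"
    "=========================\n"
    "Input filename: nextera_100K.fastq.gz\n"
    "Trimming mode: single-end\n"
)
print(sample_name(report_head))
```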

We will keep the "This is cutadapt" shim in our reports for backwards compatibility with older MultiQC releases, so this change would be purely additive.

Test report

Below is a complete v2.0 trimming report (100K Nextera reads) that can be used as a test fixture:

nextera_100K.fastq.gz_trimming_report.txt
SUMMARISING RUN PARAMETERS
=========================
Input filename: nextera_100K.fastq.gz
Trimming mode: single-end
Trim Galore version: 2.0.0 (Oxidized Edition)
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'CTGTCTCTTATA' (user-specified or auto-detected)
Maximum trimming error rate: 0.1
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length single-end: 20 bp
Output file will be GZIP compressed


Trim Galore 2.0.0 (Oxidized Edition) — adapter trimming built in
This is cutadapt 4.0 (compatible; for MultiQC backwards compatibility)
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a CTGTCTCTTATA nextera_100K.fastq.gz
Processing reads on 1 core in single-end mode ...

=== Summary ===

Total reads processed:              100,000
Reads with adapters:                 37,813 (37.8%)
Reads written (passing filters):     99,922 (99.9%)

Total basepairs processed:      5,100,000 bp
Quality-trimmed:                    4,374 bp (0.1%)
Total written (filtered):       4,964,599 bp (97.3%)

=== Adapter 1 ===

Sequence: CTGTCTCTTATA; Type: regular 3'; Length: 12; Trimmed: 37813 times.

No. of allowed errors:
1-9 bp: 0; 10-12 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
1	19383	25000.0	0	19383
2	6510	6250.0	0	6510
3	2169	1562.5	0	2169
4	997	390.6	0	997
5	871	97.7	0	871
6	943	24.4	0	943
7	1020	6.1	0	1020
8	973	1.5	0	973
9	1120	0.4	0	1120
10	1010	0.1	1	1010
11	717	0.0	1	717
12	473	0.0	1	473
13	324	0.0	1	324
14	276	0.0	1	276
15	271	0.0	1	271
16	163	0.0	1	163
17	88	0.0	1	88
18	75	0.0	1	75
19	53	0.0	1	53
20	63	0.0	2	63
21	38	0.0	2	38
22	49	0.0	2	49
23	39	0.0	2	39
24	38	0.0	2	38
25	29	0.0	2	29
26	12	0.0	2	12
27	6	0.0	2	6
28	4	0.0	2	4
29	2	0.0	2	2
30	9	0.0	3	9
31	16	0.0	3	16
32	5	0.0	3	5
33	2	0.0	3	2
34	4	0.0	3	4
37	3	0.0	3	3
38	1	0.0	3	1
39	5	0.0	3	5
40	4	0.0	4	4
41	3	0.0	4	3
42	2	0.0	4	2
43	5	0.0	4	5
44	3	0.0	4	3
45	9	0.0	4	9
46	1	0.0	4	1
48	7	0.0	4	7
49	5	0.0	4	5
50	1	0.0	5	1
51	12	0.0	5	12


RUN STATISTICS FOR INPUT FILE: nextera_100K.fastq.gz
=============================================
100000 sequences processed in total
Sequences removed because they became shorter than the length cutoff of 20 bp:	78 (0.1%)

Happy to help with a PR if that would be useful.
