
Reading gzip headers can lead to OOM for adversarial inputs #149945

@KowalskiThomas

Bug report

Bug description:

In short

_read_until_null (added a few days ago in dd94457) appends every byte it reads from a stream to a bytearray with no upper bound. Any gzip stream that sets FNAME or FCOMMENT together with any other flag (which bypasses the early exit on the flag == FNAME fast path) and never emits a NUL byte will cause the header bytearray to grow until memory is exhausted. This happens before any decompression takes place, so existing mitigations for zip-bomb-like inputs, which bound the decompressed output size, do not apply.
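To make the input shape concrete, here is a minimal sketch of such a header. The flag values come from RFC 1952; build_header_without_nul and read_until_null_unbounded are hypothetical names used only for illustration, and the reader is a stand-in written from the description above, not the code from dd94457. The payload is finite here so the script terminates; a stream that never ends would keep the buffer growing.

```python
import io
import struct

# Flag bits as defined in RFC 1952.
FHCRC, FNAME = 0x02, 0x08


def build_header_without_nul(name_len: int) -> bytes:
    """Fixed 10-byte gzip header followed by a file name with no NUL terminator."""
    fixed = struct.pack(
        "<2sBBIBB",
        b"\x1f\x8b",    # magic
        8,              # compression method: deflate
        FNAME | FHCRC,  # FNAME plus one other flag, so flag != FNAME
        0,              # mtime
        0,              # extra flags
        255,            # OS: unknown
    )
    return fixed + b"A" * name_len


def read_until_null_unbounded(fp, append_to: bytearray) -> None:
    # Hypothetical stand-in for the unbounded behaviour described above.
    while True:
        s = fp.read(1)
        append_to += s
        if not s or s == b"\000":
            return


payload = build_header_without_nul(100_000)
fp = io.BytesIO(payload)
fp.read(10)  # skip the fixed part of the header
header = bytearray()
read_until_null_unbounded(fp, header)
print(len(header))  # 100000: the buffer grows with the input, nothing caps it
```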

Proposed fix

Cap _read_until_null at a reasonable per-field limit and raise BadGzipFile if exceeded:

_MAX_GZIP_HEADER_FIELD_SIZE = 65535  # consistent with max FEXTRA length (16-bit)

def _read_until_null(fp, append_to: bytearray) -> None:
    for _ in range(_MAX_GZIP_HEADER_FIELD_SIZE + 1):
        s = fp.read(1)
        append_to += s
        if not s or s == b'\000':
            return
    raise BadGzipFile('Header field exceeds maximum size '
                      f'({_MAX_GZIP_HEADER_FIELD_SIZE} bytes)')

(65535 bytes was picked to match the maximum FEXTRA field size (a 16-bit length prefix), making all variable-length header fields consistent.)
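As a rough check of the boundary behaviour, the sketch below exercises the proposed function against in-memory streams. It assumes the 65535-byte limit stated in the note above; try_field is a hypothetical helper added only for this demonstration, and gzip.BadGzipFile is the public exception class available since Python 3.8.

```python
import gzip
import io

_MAX_GZIP_HEADER_FIELD_SIZE = 65535  # assumed limit, matching the note above


def _read_until_null(fp, append_to: bytearray) -> None:
    # Read at most limit + 1 bytes: the field itself plus its NUL terminator.
    for _ in range(_MAX_GZIP_HEADER_FIELD_SIZE + 1):
        s = fp.read(1)
        append_to += s
        if not s or s == b"\000":
            return
    raise gzip.BadGzipFile('Header field exceeds maximum size '
                           f'({_MAX_GZIP_HEADER_FIELD_SIZE} bytes)')


def try_field(data: bytes) -> str:
    """Hypothetical helper: report whether a field is accepted or rejected."""
    buf = bytearray()
    try:
        _read_until_null(io.BytesIO(data), buf)
    except gzip.BadGzipFile:
        return "rejected"
    return f"accepted ({len(buf)} bytes)"


print(try_field(b"name.txt\x00"))            # accepted (9 bytes)
print(try_field(b"A" * 65535 + b"\x00"))     # accepted (65536 bytes)
print(try_field(b"A" * 65536 + b"\x00"))     # rejected
```

With this shape, a field of exactly 65535 bytes plus its terminator still passes, and the buffer never holds more than 65536 bytes per field before the error is raised.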

I have a branch in my fork that adds this mitigation.

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

Labels

3.15: pre-release feature fixes, bugs and security fixes
3.16: new features, bugs and security fixes
stdlib: Standard Library Python modules in the Lib/ directory
type-bug: An unexpected behavior, bug, or error
