
Commit 7a087ae

docs: Update HttpCrawler docs about mimetype handling (#3356)
1 parent 565fc34 commit 7a087ae

4 files changed

Lines changed: 10 additions & 9 deletions


packages/cheerio-crawler/src/internals/cheerio-crawler.ts

Lines changed: 2 additions & 2 deletions
@@ -120,8 +120,8 @@ export type CheerioRequestHandler<
  * ]
  * ```
  *
- * By default, `CheerioCrawler` only processes web pages with the `text/html`
- * and `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+ * By default, `CheerioCrawler` only processes web pages with the `text/html`, `application/xhtml+xml`, `text/xml`, `application/xml`,
+ * and `application/json` MIME content types (as reported by the `Content-Type` HTTP header),
  * and skips pages with other content types. If you want the crawler to process other content types,
  * use the {@apilink CheerioCrawlerOptions.additionalMimeTypes} constructor option.
  * Beware that the parsing behavior differs for HTML, XML, JSON and other types of content.
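The default gating rule described in this comment can be sketched as follows. This is a minimal, hypothetical illustration, not Crawlee's implementation: `DEFAULT_MIME_TYPES`, `parseMediaType` and `isProcessedByDefault` are names introduced here. The sketch assumes that header parameters such as `; charset=utf-8` are ignored, that matching is case-insensitive, and that a missing `Content-Type` header means the page is skipped.

```typescript
// Hypothetical constant: the default MIME types listed in the updated doc comment.
const DEFAULT_MIME_TYPES: ReadonlySet<string> = new Set([
    'text/html',
    'application/xhtml+xml',
    'text/xml',
    'application/xml',
    'application/json',
]);

/**
 * Extracts the bare media type from a Content-Type header value,
 * e.g. "text/html; charset=utf-8" -> "text/html".
 * Returns undefined when the header is missing or empty (an assumption of this sketch).
 */
function parseMediaType(contentType: string | undefined): string | undefined {
    if (!contentType) return undefined;
    const [head = ''] = contentType.split(';');
    const mediaType = head.trim().toLowerCase();
    return mediaType.length > 0 ? mediaType : undefined;
}

/** Returns true if a response with this Content-Type would be processed under the defaults. */
function isProcessedByDefault(contentType: string | undefined): boolean {
    const mediaType = parseMediaType(contentType);
    return mediaType !== undefined && DEFAULT_MIME_TYPES.has(mediaType);
}

console.log(isProcessedByDefault('text/html; charset=utf-8')); // true
console.log(isProcessedByDefault('application/json')); // true
console.log(isProcessedByDefault('image/png')); // false
```

Stripping the parameters before the lookup matters because servers commonly append a charset, and an exact string comparison against `text/html; charset=utf-8` would otherwise never match.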

packages/http-crawler/src/internals/http-crawler.ts

Lines changed: 4 additions & 3 deletions
@@ -137,7 +137,8 @@ export interface HttpCrawlerOptions<Context extends InternalHttpCrawlingContext
 
 /**
  * An array of [MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types)
- * you want the crawler to load and process. By default, only `text/html` and `application/xhtml+xml` MIME types are supported.
+ * you want the crawler to load and process. By default, only `text/html`, `application/xhtml+xml`, `text/xml`, `application/xml`,
+ * and `application/json` MIME types are supported.
  */
 additionalMimeTypes?: string[];
 
@@ -291,8 +292,8 @@ export type HttpRequestHandler<
  * ]
  * ```
  *
- * By default, this crawler only processes web pages with the `text/html`
- * and `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+ * By default, this crawler only processes web pages with the `text/html`, `application/xhtml+xml`, `text/xml`, `application/xml`,
+ * and `application/json` MIME content types (as reported by the `Content-Type` HTTP header),
  * and skips pages with other content types. If you want the crawler to process other content types,
  * use the {@apilink HttpCrawlerOptions.additionalMimeTypes} constructor option.
  * Beware that the parsing behavior differs for HTML, XML, JSON and other types of content.
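Conceptually, `additionalMimeTypes` widens the set of accepted content types. The sketch below assumes, as the option name suggests, that the listed types are added to the defaults rather than replacing them. `buildAllowedMimeTypes` and `isAllowed` are hypothetical helpers, not part of the `HttpCrawler` API. In a real crawler you would instead pass a list such as `['application/pdf']` to the `additionalMimeTypes` constructor option.

```typescript
// Hypothetical constant mirroring the default list from the doc comment.
const DEFAULT_MIME_TYPES = [
    'text/html',
    'application/xhtml+xml',
    'text/xml',
    'application/xml',
    'application/json',
] as const;

/**
 * Hypothetical helper: builds the accepted set from the defaults plus any
 * `additionalMimeTypes`. Entries are trimmed and lowercased so matching is case-insensitive.
 */
function buildAllowedMimeTypes(additionalMimeTypes: string[] = []): Set<string> {
    return new Set(
        [...DEFAULT_MIME_TYPES, ...additionalMimeTypes].map((t) => t.trim().toLowerCase()),
    );
}

/** Checks a raw Content-Type header value against an accepted set, ignoring parameters. */
function isAllowed(contentType: string, allowed: Set<string>): boolean {
    const [mediaType = ''] = contentType.split(';');
    return allowed.has(mediaType.trim().toLowerCase());
}

const defaults = buildAllowedMimeTypes();
const withPdf = buildAllowedMimeTypes(['application/pdf']);

console.log(isAllowed('application/pdf', defaults)); // false
console.log(isAllowed('application/pdf', withPdf)); // true
console.log(isAllowed('text/xml', withPdf)); // true
```

Using a `Set` keeps each lookup constant-time and makes re-listing a default type (for example `text/html`) harmless.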

packages/jsdom-crawler/src/internals/jsdom-crawler.ts

Lines changed: 2 additions & 2 deletions
@@ -139,8 +139,8 @@ export type JSDOMRequestHandler<
  * ]
  * ```
  *
- * By default, `JSDOMCrawler` only processes web pages with the `text/html`
- * and `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+ * By default, `JSDOMCrawler` only processes web pages with the `text/html`, `application/xhtml+xml`, `text/xml`, `application/xml`,
+ * and `application/json` MIME content types (as reported by the `Content-Type` HTTP header),
  * and skips pages with other content types. If you want the crawler to process other content types,
  * use the {@apilink JSDOMCrawlerOptions.additionalMimeTypes} constructor option.
  * Beware that the parsing behavior differs for HTML, XML, JSON and other types of content.
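The closing warning, that parsing behavior differs by content type, can be illustrated with a dispatch on the media type. This is a simplified, hypothetical sketch, not how `JSDOMCrawler` or the other crawlers actually parse responses. `ParsedBody` and `parseBody` are names introduced here. The sketch groups the four HTML/XML types as markup, parses `application/json` with `JSON.parse`, and leaves everything else as raw bytes. It assumes UTF-8 bodies instead of honoring the `charset` parameter, and a real implementation may also distinguish HTML from XML parsing modes.

```typescript
// Hypothetical result type: which parsing path a response body would take.
type ParsedBody =
    | { kind: 'markup'; text: string } // HTML/XHTML/XML: text handed to a markup parser
    | { kind: 'json'; value: unknown } // JSON: parsed into a JavaScript value
    | { kind: 'raw'; bytes: Uint8Array }; // any other type: left as raw bytes

const MARKUP_TYPES: ReadonlySet<string> = new Set([
    'text/html',
    'application/xhtml+xml',
    'text/xml',
    'application/xml',
]);

/** Chooses a parsing path from a bare media type (assumes UTF-8 text bodies). */
function parseBody(mediaType: string, body: Uint8Array): ParsedBody {
    if (MARKUP_TYPES.has(mediaType)) {
        return { kind: 'markup', text: new TextDecoder().decode(body) };
    }
    if (mediaType === 'application/json') {
        // JSON.parse throws on malformed input; a real crawler would need to handle that.
        return { kind: 'json', value: JSON.parse(new TextDecoder().decode(body)) };
    }
    return { kind: 'raw', bytes: body };
}

const encoder = new TextEncoder();
console.log(parseBody('application/json', encoder.encode('{"ok":true}')).kind); // "json"
console.log(parseBody('text/html', encoder.encode('<p>hi</p>')).kind); // "markup"
console.log(parseBody('application/pdf', new Uint8Array([37, 80])).kind); // "raw"
```

The discriminated union makes the difference explicit at the type level: a handler must check `kind` before it can access `text`, `value` or `bytes`.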

packages/linkedom-crawler/src/internals/linkedom-crawler.ts

Lines changed: 2 additions & 2 deletions
@@ -131,8 +131,8 @@ export type LinkeDOMRequestHandler<
  * ]
  * ```
  *
- * By default, `LinkeDOMCrawler` only processes web pages with the `text/html`
- * and `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+ * By default, `LinkeDOMCrawler` only processes web pages with the `text/html`, `application/xhtml+xml`, `text/xml`, `application/xml`,
+ * and `application/json` MIME content types (as reported by the `Content-Type` HTTP header),
  * and skips pages with other content types. If you want the crawler to process other content types,
  * use the {@apilink LinkeDOMCrawlerOptions.additionalMimeTypes} constructor option.
  * Beware that the parsing behavior differs for HTML, XML, JSON and other types of content.
