Skip to content

Update to v1.5.6#3

Open
GChuf wants to merge 424 commits into
pfsense:masterfrom
simplepie:master
Open

Update to v1.5.6#3
GChuf wants to merge 424 commits into
pfsense:masterfrom
simplepie:master

Conversation

@GChuf
Copy link
Copy Markdown

@GChuf GChuf commented Feb 4, 2021

Mostly bug and issue fixes, some new features.

Redmine issue: https://redmine.pfsense.org/issues/11508

Art4 and others added 30 commits March 20, 2023 09:46
Type hint prevents anything else being passed.
As pointed out by PHPStan, it did not expect them:

    Method SimplePie\Sanitize::pass_file_data() invoked with 5 parameters, 0-4 required

This has been broken since the support for passing curl options was introduced in
51f9f08
Needed for PHPStan level 2 to pass.

This is not complete (there are more magic properties) and the types are conservative (e.g. path accepts setting `null` which will be converted to an empty string) but this class is going away in the future and this is sufficient for our internal use.
…#821)

PHPStan would complain:

    Method Mock_CacheLegacy::get_handler() should return SimplePie\Cache\Base but return statement is missing.

Let’s throw an exception to properly terminate the method body.
Because we support `psr/simple-cache` 1.0, which gets installed on PHP 7,
and `Psr\SimpleCache\InvalidArgumentException` interface does not extend
`Throwable` in that version, PHPStan would complain:

    PHPDoc tag @throws with type Psr\SimpleCache\InvalidArgumentException is not subtype of Throwable

Let’s declare we actually throw a `Throwable` using an intersection type.
* Fix coding style

* php-cs-fixer: Allow risky rules

We are already using them and without this, php-cs-fixer will fail to run.

* composer: Add scripts for php-cs-fixer

So that we can just run `composer fix` to change the code to match coding style.

* ci: Run coding style checks

To avoid merging changes with broken syntax.
…gh static (#823)

PHPStan will rightly complain:

    Unsafe access to private property SimplePie\Tests\Fixtures\Cache\BaseCacheWithCallbacksMock::$touchCallback through static::…

It only works because there is no subclass. While at it, let’s make the class final.
PHPStan will not know about those and complain:

    Call to protected static method change_encoding_mbstring() of class SimplePie\Misc.
    Call to protected static method change_encoding_iconv() of class SimplePie\Misc.
    Call to protected static method change_encoding_uconverter() of class SimplePie\Misc.

We could use `@method` PHPDoc annotation but at this point explicit methods are not that much longer.
* SimplePie: Use factory method for creating Items

* Item: Accept Sanitize object

`SimplePie::$sanitize` is a private member so we should not access it.
It only works because the visibility is currently described using PHPDoc annotation.
The PR introducing this breakage sneaked through before coding style check was added to CI.
Split SimplePie\File class into HTTP client and response
There are correctness issues as well as disorganized commit history.
The fixed changes are re-applied in the commits that follow.
Dividing a number may return a float, which will cause a `TypeError` once we add type hints.

The value was `1` originally, when timeouts were introduced along with `SimplePie_File` in
5bf1814

We could consider hardcoding it again but we still want to keep it proportional to the master timeout
to allow people with slow connection to increase it, as introduced in
fe7e333

In fact, reducing the timeout probably does not make sense in the first place.
Even if we assume majority of cached resources will not get outdated,
waiting for server to reply `HTTP 304 Not Modified` can still take time
(domain name resolution can be particularly slow).
And if it turns out that the cached version is outdated,
the web server will try to send an updated version.
Having too low timeout can mean the request will be terminated,
only to be started again with the original timeout.

The timeout reduction was presumably done to prevent
fetching a cached resource from an unresponsive server without `force_cache_fallback`
from essentially doubling the timeout due to re-running the failed request.

Let’s use the original timeout and prevent repeat requests instead.
- Drop unused variable
- Add white space for clarity
PSR-7 `MessageInterface` expects values of each header is always a list.
But concatenating multiple instances of the same header like we do now
is a lossy operation so we cannot confidently split them.

To allow us to use PSR-7-like semantics internally,
let’s add PSR-7 mode to the HTTP parser that will avoid the imploding.
`Client` is sort of like `ClientInterface` from PSR-18 but the `request()` method expects request parameters as discrete arguments rather than wrapped in PSR-7 `RequestInterface` like `ClientInterface::sendRequest()` does.

`Response` is a pared-down `ResponseInterface` from PSR-7. There are also other notable differences from PSR-7:
- Method names use snake_case.
- The class also provides the URI of the final HTTP request after redirects through `get_final_requested_uri()` and permanent URI of the requested resource through `get_permanent_uri()`
- Body is returned as a string rather than stream in `get_body_content()`.
- `get_headers()` does not have to preserve the exact case in which headers were originally specified.

https://www.php-fig.org/psr/psr-7/
https://www.php-fig.org/psr/psr-18/

Acked-by: Jan Tojnar <jtojnar@gmail.com>
Implementing the PSR-7 inspired `Response` interface gets us one step
closer to pluggable HTTP implementation.

Most of the complexity comes from the fact that PSR-7 `MessageInterface`
expects values of each header to always be a list, while `File` stored
each as a single strings. But we cannot change the type of the `$headers`
property without breaking backwards compatibility (especially for `File`
subclasses that modify the headers), and we cannot just confidently split
the header lines in the methods, since the concatenating multiple instances
of the same header like `HTTP\Parser` currently does is a lossy operation.

As a result, we have to check if `$headers` property changed every time
we access the internal header state.

Co-authored-by: Jan Tojnar <jtojnar@gmail.com>
Alkarex and others added 30 commits September 30, 2025 10:32
This will slightly simplify the code.
It is an ancient technology for using custom fonts based on Flash.
It will not work in any modern browser.
The test runner was removed in 0fcde72.
Multifeed has been deprecated since ff1b513.
This is no-op outside multi-feed mode, which has already been deprecated.
The CURLOPT_ENCODING option is obsolete since cURL 7.21.6, in favor of
CURLOPT_ACCEPT_ENCODING.

I've put the updated option name behind a cURL version check, as
SimplePie supports older PHP versions which in turn support older cURL
versions which don't have the updated option name yet.

See:
https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
These files aren't needed in the Composer package.
PHP 7.2 requires at least libcurl 7.10.5 so the check is always true on all supported PHP versions:
https://github.com/php/php-src/blob/php-7.2.0/ext/curl/config.m4#L34

PHP 7.3 bumps the minimum to 7.15.5 but, even when we drop support for PHP < 7.4, that won’t help us with cleanups:
https://github.com/php/php-src/blob/php-7.3.0/ext/curl/config.m4#L32

PHP 8.0 further bumps this to 7.29.0, which would allow us to remove the remaining conditionals, but we still support 7.2.0 for now and will support 7.4.0 for a while.
https://github.com/php/php-src/blob/php-8.0.0/ext/curl/config.m4#L7
Fixes date parsing behavior when encountering invalid or localized date formats.

Before (on PHP 7.4):
- Invalid dates could throw errors/warnings due to `strict_types`

Before (on PHP 8.*):
- Invalid dates returned current date when not properly parsed (`date` uses current time when passed null)

After:
- Invalid dates gracefully return null
- Consistent behavior for different PHP versions

Changes:
- Improved error handling in date parsing
- Added test coverage for RFC 2822 formatted dates
- Added test for custom format parameter
Currently, we only use these to test code paths that do not involve parsing
but it would be nice to be able to reuse the tests for other simple tests.

Unfortunately, `Parser::parse` will actually fail with `error_string == "Unknown"`.

That is because `xml_get_error_code()` returns 201, which is not one of XML_ERROR_* codes:
https://www.php.net/manual/en/xml.error-codes.php

It looks like it is actually `XML_NS_ERR_UNDEFINED_NAMESPACE` leaking from libxml2.
I can confirm it with `libxml_use_internal_errors(true);` and dumping `libxml_get_errors()`:

    Namespace prefix media on thumbnail is not defined

Since we never use it anywhere, we can just drop the element.
- Improve typing and fix issues that will trip up PHPStan 2.0.
- Fix insufficient regex escaping in `SimplePie::get_links()`.
- Add editor config for .neon files.
- Add PHPStan results cache to .gitignore.
- Check our PHPStan extension.
- Use `assertNull` in tests instead of `assertSame`.
`get_headers()` actually return `parsed_headers` property which we did not previously set.
…_url

Make it clearer what request it is from and match the getter method name.
It is slightly cleaner.
…ted_url

Make it clearer what request it is from and match the getter method name.
In 1.9.0, we extended `Sniffer` to take any `Response`, not just `File`, and had `Locator` pass it `Response`s. Unfortunately, `Sniffer` is not final so that change might have broken existing third-party subclasses only accepting `File`. (Though it will at least not fail with fatal contravariance error since PHP does not enforce Liskov Substitution Principle for constructors.)

Let’s go back to `Sniffer` only accepting `File`.
This should make it easier to parse for humans.
On PHP < 8.1.21, we will not be able to disable `CURLOPT_ACCEPT_ENCODING` by setting it to `NULL`:
php/php-src#11433
`curl_setopt` with such values will be a no-op so the secound `curl_exec` will still fail with `CURLE_BAD_CONTENT_ENCODING` and return an empty body.
Even worse, it will still report 200 in `CURLINFO_HTTP_CODE` so `FileClient` will not throw an exception because the `status_code` property is not 0.

Let’s extract the code so that we can create a fresh `CurlHandle` and later make the `CURLOPT_ACCEPT_ENCODING` conditional.

Unfortunately, since PHP 8.0 changed `curl_init` to return `CurlHandle` instead of `resource`, and PHPStan has no simple way to make the return type depend on PHP version, we have to resort to an ugly hack using `typeAliases`.
Setting the CURLOPT_ENCODING to 'none' is not officially supported by
cURL. Setting it in case of cURL receiving an invalid Content-Encoding
header from the server and re-trying would not fix the issue.

The current implementation already sets cURL up to send an Accept-Encoding
header with all supported encodings a couple of lines above this change.
If the fetch still returns a BAD_CONTENT_ENCODING error, the server already
ignored the Accept-Encoding headers once.

This change, instead of sending 'none' in a re-try, disables cURL's content
encoding handling (in practice, handling compression).

Co-authored-by: Jan Tojnar <jtojnar@gmail.com>
Solve warning in the case of <media:thumbnail />

Downstream: fix FreshRSS/FreshRSS#8614
channels, duration were wrongly parsed as string instead of int.

```
PHP Fatal error:  Uncaught TypeError: SimplePie\Enclosure::__construct(): Argument #12 ($duration) must be of type ?int, string given in simplepie/src/Enclosure.php:199
```

And add tests (there were none for those attributes)

Downstream issue:
* FreshRSS/FreshRSS#8701
```
PHP Fatal error: Uncaught TypeError:
SimplePie\Enclosure::__construct(): Argument #20 ($player) must be of type ?string, array given
in /opt/freshrss/lib/simplepie/simplepie/src/Enclosure.php:199
```

This happens when there is no`url` parameter.

Example of feed:
* https://feeds.feedburner.com/crunchyroll/rss/anime?lang=deDE

Downstream issue:
* FreshRSS/FreshRSS#8892
Only atom:icon and a square rss2:image can be used to represent a favicon.
The existing get_image_url() is more for logos / banners, not favicon.
Example of real-world feeds:
* https://feedpress.me/frandroid
* https://www.lesnumeriques.com/rss.xml

Downstream:
* FreshRSS/FreshRSS#5518
* FreshRSS/FreshRSS#8633
…huge feed (#977)

PHP 8.4.0 introduced a new option[1], which makes it behave differently.

>  XML_OPTION_PARSE_HUGE (int)
>     Available as of PHP 8.4.0. When libxml2 < 2.7.0 is used (e.g. on PHP 7.x), this option is enabled by default and cannot be disabled.

The default value of this new option is `0`[2]. And this option is only effective when `libxml2` is used. `libexpat` totally ignores it[3].

This patch sets this option to `1`, the old default value, when this option is defined. So everything is consistent.

Yes, this brings more risks of potential DoS attack, but it should be defended somewhere else. Because the whole XML string is already downloaded into the memory when parsing, bars could be placed during downloading, no matter which PHP version and XML lib is used.

Downstream issue:
FreshRSS/FreshRSS#8516
FreshRSS/FreshRSS#8710

1. https://www.php.net/manual/en/xml.constants.php#constant.xml-option-parse-huge
2. https://github.com/php/php-src/blob/90744ec52a50e0eacab47d5a51d82c1b26607b76/ext/xml/xml.c#L1116
3. https://wiki.php.net/rfc/xml_option_parse_huge

Signed-off-by: Kidd Lee <LeeXiaolan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.