feat(rewrite): add OpenGraph and Twitter Card preview rules #4295
Open
ChrisJr404 wants to merge 1 commit into
Adds two new content rewrite rules, `add_open_graph` and `add_twitter_card`, that prepend values pulled from the scraped page's `<head>` meta tags to the entry content. This is useful for sites whose RSS body is sparse but whose linked page exposes rich preview metadata (Bluesky, Mastodon link posts, social previews of single-page apps, ...).

The scraper now buffers the fetched HTML once and exposes the collected OG/Twitter values via a new `ScrapeResult.Metadata` map alongside the existing extracted content. The processor passes the map down to the rewrite layer through a new `RewriteContext` struct so individual rules can consume it without re-fetching the page.

Both rules accept either bare property suffixes (`description`, `image`, `title`, ...) or fully-qualified keys (`og:description`, `twitter:image`). With no arguments they default to `description` + `image`. When the scraper is disabled or the requested keys are missing the rules are no-ops.

Closes miniflux#4291.
Closes #4291.
What
Adds two content rewrite rules that pull values from the scraped page's `<head>`:

- `add_open_graph("description", "image", ...)`: reads `og:*` meta tags
- `add_twitter_card("description", "image", ...)`: reads `twitter:*` meta tags

Both accept either bare suffixes (`description`, `image`, `title`, `site_name`, ...) or fully-qualified keys (`og:description`, `twitter:image`). Called without arguments they default to `description` + `image`.

When `description` and `image` are both available the rule renders a `<figure>` with the image and a caption; otherwise it falls back to a paragraph. Other suffixes are rendered as a labelled paragraph (`<p><strong>site_name:</strong> Example</p>`). All metadata values are HTML-escaped before being written into the entry content.

Why
Some sites lean so heavily on JS that scraping returns very little, but their `<head>` exposes rich preview metadata. Bluesky is the example in the issue: an RSS item points at a post but only carries a short snippet, while the linked page has `og:description`, `og:image`, `twitter:description`, `twitter:image`, etc. The new rules let users opt into using those values for the entry body.

Example feed-side configuration (custom rewrite rules field):
or for a Twitter-Card-only site:
How
The scraper already fetches the page once when the crawler is enabled. The change buffers the fetched HTML so it can be parsed twice (once for the existing readability/custom-rules extraction, once for `<head>` meta tags) without an extra HTTP request. The collected map is exposed on a new `ScrapeResult` struct (replacing the old multi-return signature on `ScrapeWebsite`) and threaded into the rewrite layer through a new `RewriteContext`.

When the crawler is disabled or no requested key is present the rules are no-ops, so existing feeds that do not opt in are unaffected.
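
For reviewers who want the data flow at a glance, here is a minimal sketch of the buffer-once, parse-twice idea. It is not the code in this PR: only `ScrapeResult`, its `Metadata` map, and `RewriteContext` are named above, while the `Content` field, `scrape`, `scrapeString`, `extractContent`, and `extractMetadata` are hypothetical names chosen for illustration. The metadata pass uses `encoding/xml` in its lenient HTML mode purely to keep the sketch inside the standard library; a real HTML parser would be more robust on real pages.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// ScrapeResult and RewriteContext are named in the PR description; every
// field other than Metadata is an assumption made for this sketch.
type ScrapeResult struct {
	Content  string
	Metadata map[string]string
}

type RewriteContext struct {
	Metadata map[string]string
}

// extractContent is a stand-in for the readability/custom-rules pass.
func extractContent(r io.Reader) (string, error) {
	raw, err := io.ReadAll(r)
	return string(raw), err
}

// extractMetadata collects og:* and twitter:* values from <meta> tags in
// <head>. The first value for a key wins and blank content is ignored.
func extractMetadata(r io.Reader) map[string]string {
	metadata := make(map[string]string)
	decoder := xml.NewDecoder(r)
	decoder.Strict = false // lenient mode documented for HTML-like input
	decoder.AutoClose = xml.HTMLAutoClose
	decoder.Entity = xml.HTMLEntity
	for {
		token, err := decoder.Token()
		if err != nil {
			return metadata // io.EOF, or markup the lenient decoder rejects
		}
		switch t := token.(type) {
		case xml.EndElement:
			if strings.EqualFold(t.Name.Local, "head") {
				return metadata // nothing of interest after </head>
			}
		case xml.StartElement:
			if !strings.EqualFold(t.Name.Local, "meta") {
				continue
			}
			var key, content string
			for _, attr := range t.Attr {
				value := strings.TrimSpace(attr.Value)
				switch strings.ToLower(attr.Name.Local) {
				case "property", "name":
					if strings.HasPrefix(value, "og:") || strings.HasPrefix(value, "twitter:") {
						key = value
					}
				case "content":
					content = value
				}
			}
			if _, seen := metadata[key]; key != "" && content != "" && !seen {
				metadata[key] = content
			}
		}
	}
}

// scrape drains the body once and gives each pass its own reader over the
// same bytes, so no second HTTP request is needed.
func scrape(body io.Reader) (*ScrapeResult, error) {
	buffered, err := io.ReadAll(body)
	if err != nil {
		return nil, err
	}
	content, err := extractContent(bytes.NewReader(buffered))
	if err != nil {
		return nil, err
	}
	return &ScrapeResult{
		Content:  content,
		Metadata: extractMetadata(bytes.NewReader(buffered)),
	}, nil
}

// scrapeString is a convenience wrapper for examples and tests.
func scrapeString(page string) (*ScrapeResult, error) {
	return scrape(strings.NewReader(page))
}

func main() {
	page := `<html><head><meta property="og:description" content="A post">` +
		`<meta name="twitter:image" content="https://example.org/a.png"></head>` +
		`<body><p>Body</p></body></html>`
	result, err := scrapeString(page)
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	// The processor would hand the map to the rewrite layer like this.
	ctx := RewriteContext{Metadata: result.Metadata}
	fmt.Println(ctx.Metadata["og:description"], ctx.Metadata["twitter:image"])
}
```

Passing a struct rather than a fourth return value keeps the call sites readable, which matches the reasoning given for `ScrapeResult` in this PR.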
Notes / open questions for reviewers
- Rule naming (`add_open_graph` / `add_twitter_card`) follows the existing `add_*` naming. Happy to rename if you prefer something more compact.
- The default (`description` + `image`) was chosen to match the Bluesky-style use case in the issue. Easy to extend the defaults or expose a third helper that pulls everything available.
- `ScrapeWebsite` return type changed from three values to a `ScrapeResult` struct since metadata makes a fourth value awkward; the only callers are inside `internal/reader/processor` so no external API is affected.

Tests
- `internal/reader/scraper/metadata_test.go`: covers OpenGraph extraction, Twitter Cards using both `name` and `property` attributes, ignoring unrelated meta, first-value-wins on duplicates, and rejection of empty/whitespace content.
- `internal/reader/rewrite/preview_meta_test.go`: covers prepending image+description, default arg fallback, fully-qualified keys, family-mismatch rejection, no-metadata no-op, missing-property no-op, the labelled-paragraph fallback, and HTML escaping of attacker-controlled meta values.
- `content_rewrite_test.go` updated for the new `ApplyContentRewriteRules` signature.

`go test ./...` and `go vet ./...` pass locally.
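
The rule behaviour exercised by the rewrite tests can be illustrated with a small standalone sketch. This is not the implementation in the PR: `resolveKeys` and `prependPreview` are hypothetical names, the exact markup is assumed, and two details are guesses labelled in the comments (how a lone image is rendered, and that any qualified key from another family counts as a mismatch).

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// resolveKeys maps rule arguments to fully-qualified keys for one family
// ("og" or "twitter"). Hypothetical helper, not taken from the PR.
func resolveKeys(family string, args []string) []string {
	if len(args) == 0 {
		args = []string{"description", "image"} // documented default
	}
	prefix := family + ":"
	var keys []string
	for _, arg := range args {
		arg = strings.TrimSpace(arg)
		switch {
		case arg == "":
			// Skip empty arguments.
		case strings.HasPrefix(arg, prefix):
			keys = append(keys, arg) // already fully qualified
		case strings.Contains(arg, ":"):
			// Assumption: a qualified key from another family is dropped.
		default:
			keys = append(keys, prefix+arg) // bare suffix
		}
	}
	return keys
}

// prependPreview returns content with a preview block in front of it, or
// content unchanged when none of the requested keys has a value.
func prependPreview(content, family string, metadata map[string]string, args ...string) string {
	var description, image string
	var labelled strings.Builder
	for _, key := range resolveKeys(family, args) {
		value := metadata[key] // reading a nil map is safe and yields ""
		if value == "" {
			continue
		}
		// Every value is escaped because the page author controls it.
		switch suffix := strings.TrimPrefix(key, family+":"); suffix {
		case "description":
			description = html.EscapeString(value)
		case "image":
			image = html.EscapeString(value)
		default:
			fmt.Fprintf(&labelled, "<p><strong>%s:</strong> %s</p>",
				html.EscapeString(suffix), html.EscapeString(value))
		}
	}
	var preview string
	switch {
	case description != "" && image != "":
		preview = fmt.Sprintf(`<figure><img src="%s" alt=""><figcaption>%s</figcaption></figure>`, image, description)
	case image != "":
		preview = fmt.Sprintf(`<p><img src="%s" alt=""></p>`, image) // assumed fallback shape
	case description != "":
		preview = "<p>" + description + "</p>"
	}
	return preview + labelled.String() + content
}

func main() {
	metadata := map[string]string{
		"og:description": "A <b>post</b>",
		"og:image":       "https://example.org/a.png",
		"og:site_name":   "Example",
	}
	fmt.Println(prependPreview("<p>body</p>", "og", metadata))
	fmt.Println(prependPreview("<p>body</p>", "og", metadata, "site_name"))
	fmt.Println(prependPreview("<p>body</p>", "og", metadata, "twitter:image"))
}
```

Escaping at the point where the value is written, rather than at extraction time, keeps the metadata map raw for any other consumer while still guarding the entry content.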