feat: set HTML lang attribute from feed-declared language #4330

Open
bramd wants to merge 5 commits into miniflux:main from bramd:feature/feed-language

bramd commented May 14, 2026

Description

Sets the HTML lang attribute on rendered article markup based on the language declared by the feed (or by the individual entry when it overrides the feed), so user agents can apply the right hyphenation, spell-check dictionary, and (for assistive tech / TTS) the right voice when reading articles.

  • Parse language declarations at both feed and entry level from:
    • RSS: <language> on the channel, <dc:language> on items
    • Atom 1.0 / 0.3: xml:lang on <feed> and <entry>
    • JSON Feed: top-level "language" and per-item "language"
  • Persist on new feeds.language and entries.language columns (migrations appended at the end of the array).
  • Expose via the existing Feed/Entry JSON marshalling ("language" field on /v1/feeds and /v1/entries).
  • Emit lang="..." on the entry title (<h1> in detail view, <h2> in every list view) and on the entry content <article>. The entry-level language takes precedence over the feed-level language; if both are empty no lang= attribute is emitted at all.
  • The refresh path treats language as feed-/entry-declared metadata and always trusts the latest fetched value.
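
As a rough illustration of where the XML-based declarations listed above live, here is a minimal sketch using Go's `encoding/xml`. The struct and function names (`rssDoc`, `atomDoc`, `parseRSS`, `parseAtom`) and the sample documents are hypothetical and heavily trimmed; they are not the PR's actual parser types. JSON Feed is left out for brevity, and values are returned raw, without any normalization.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Hypothetical, trimmed-down types for illustration only.
type rssDoc struct {
	Channel struct {
		Language string    `xml:"language"` // <language> on the channel
		Items    []rssItem `xml:"item"`
	} `xml:"channel"`
}

type rssItem struct {
	// Matches only <dc:language> in the Dublin Core namespace.
	Language string `xml:"http://purl.org/dc/elements/1.1/ language"`
}

type atomDoc struct {
	Language string      `xml:"lang,attr"` // xml:lang on <feed>
	Entries  []atomEntry `xml:"entry"`
}

type atomEntry struct {
	Language string `xml:"lang,attr"` // xml:lang on <entry>
}

// Made-up sample feeds: one item/entry overrides the language, one does not.
const rssSample = `<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><language>nl-NL</language>
<item><title>a</title><dc:language>fr-FR</dc:language></item>
<item><title>b</title></item>
</channel></rss>`

const atomSample = `<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US">
<entry xml:lang="de"><title>a</title></entry>
<entry><title>b</title></entry>
</feed>`

func parseRSS(data string) (rssDoc, error) {
	var doc rssDoc
	err := xml.Unmarshal([]byte(data), &doc)
	return doc, err
}

func parseAtom(data string) (atomDoc, error) {
	var doc atomDoc
	err := xml.Unmarshal([]byte(data), &doc)
	return doc, err
}

func main() {
	r, err := parseRSS(rssSample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("rss feed=%q\n", r.Channel.Language)
	for i, it := range r.Channel.Items {
		fmt.Printf("rss item %d=%q\n", i, it.Language)
	}

	a, err := parseAtom(atomSample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("atom feed=%q\n", a.Language)
	for i, e := range a.Entries {
		fmt.Printf("atom entry %d=%q\n", i, e.Language)
	}
}
```

Items and entries that declare no language of their own come back as the empty string, which is what lets a later step fall back to the feed-level value.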

The value is normalized at parse time (trim + lower-case + _ -> -) so it is directly usable as an HTML lang attribute. No strict BCP-47 validation is performed: many real feeds use loose values, and silently dropping them yields worse downstream behaviour than passing them through.
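
A minimal sketch of that normalization, assuming a helper named `normalizeLanguage` (the name is hypothetical; the PR's actual function in internal/model/language.go may differ in name and detail):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLanguage is a hypothetical helper, not the PR's actual code.
// It trims surrounding whitespace, lower-cases the value, and replaces
// underscores with hyphens. It deliberately does no BCP-47 validation.
func normalizeLanguage(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	return strings.ReplaceAll(s, "_", "-")
}

func main() {
	for _, raw := range []string{" en_US ", "NL-nl", "pt_BR\n", ""} {
		fmt.Printf("%q -> %q\n", raw, normalizeLanguage(raw))
	}
}
```

Loose values such as `en_US` are passed through in cleaned-up form rather than rejected, matching the stated choice not to drop them.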

Motivation

Today Miniflux already parses RSS's <language> element but discards it — it never reaches the model, the storage layer, or the rendered HTML. Atom's xml:lang is not parsed at all. As a result, every rendered article carries the user's UI locale on <html lang="..."> even when its content is in a different language. This change makes a single read of the feed's own language declaration flow all the way to the rendered surface, and exposes it on the API for clients that want to use it.

Testing

  • New unit tests cover the normalizer (internal/model/language_test.go), RSS parsing with and without <language> and <dc:language> (internal/reader/rss/parser_test.go), Atom parsing with and without xml:lang at both feed and entry level (internal/reader/atom/atom_10_test.go), and JSON Feed parsing with and without "language" at both feed and item level (internal/reader/json/parser_test.go).
  • go test ./... and make lint both pass cleanly.
  • End-to-end verified on a local dev build against a real Postgres DB cloned from production:
    • Both migrations apply cleanly (feeds.language and entries.language columns appear with not null default '').
    • Subscribed to/refreshed real RSS feeds (Tweakers nl-nl, NOS nl) and a real Atom feed (Invidious en-us); the API returns the expected feed-level "language" value.
    • Per-entry: with a hand-set entries.language='fr-fr' on a feed whose feed.language='nl-nl', the API and rendered HTML use the entry-level value, confirming the precedence path.
    • Rendered HTML in the web reader carries the resolved language on the corresponding <h1>, <h2>, and <article> elements.
    • Negative case verified: an entry/feed without any language declaration renders the element with no lang= attribute at all (not lang="").
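
The precedence and negative cases checked above can be sketched with `html/template`. This is an illustration under assumptions: `entryView`, `resolveLanguage`, and the inline template are hypothetical and do not reproduce Miniflux's real templates or model types.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// entryView is a hypothetical view model for this sketch.
type entryView struct {
	Title         string
	EntryLanguage string
	FeedLanguage  string
}

// resolveLanguage prefers the entry-level value and falls back to the
// feed-level value; it returns "" when neither is set.
func resolveLanguage(entryLang, feedLang string) string {
	if entryLang != "" {
		return entryLang
	}
	return feedLang
}

// The lang attribute is emitted only when a language was resolved, so an
// undeclared language yields <h1> rather than <h1 lang="">.
var titleTmpl = template.Must(template.New("title").
	Funcs(template.FuncMap{"resolveLanguage": resolveLanguage}).
	Parse(`{{ $lang := resolveLanguage .EntryLanguage .FeedLanguage }}<h1{{ if $lang }} lang="{{ $lang }}"{{ end }}>{{ .Title }}</h1>`))

func renderTitle(v entryView) (string, error) {
	var sb strings.Builder
	if err := titleTmpl.Execute(&sb, v); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	views := []entryView{
		{Title: "Bonjour", EntryLanguage: "fr-fr", FeedLanguage: "nl-nl"},
		{Title: "Hallo", FeedLanguage: "nl-nl"},
		{Title: "Untitled"},
	}
	for _, v := range views {
		out, err := renderTitle(v)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Keeping the fallback in one small function means the `<h1>`, `<h2>`, and `<article>` call sites can share the same rule.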

Breaking changes

None. The new columns have a NOT NULL DEFAULT '', so existing rows are unaffected and the API fields are empty until the next refresh populates them.
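
For readers unfamiliar with this kind of additive migration, here is a hedged sketch of what such a step could look like. The function name `addLanguageColumns`, the `execer` interface, and the `text` column type are assumptions made for illustration; only the `NOT NULL DEFAULT ''` constraint comes from the description above. A recording fake stands in for the database so the sketch runs on its own.

```go
package main

import (
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx this sketch needs; *sql.Tx satisfies it.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// addLanguageColumns is a hypothetical migration step. The column type
// (text) is an assumption. Because both columns carry a default, rows that
// already exist get the empty string and need no backfill.
func addLanguageColumns(tx execer) error {
	_, err := tx.Exec(`
		ALTER TABLE feeds ADD COLUMN language text NOT NULL DEFAULT '';
		ALTER TABLE entries ADD COLUMN language text NOT NULL DEFAULT '';
	`)
	return err
}

// recorder is a fake execer that records statements instead of running them.
type recorder struct{ queries []string }

func (r *recorder) Exec(query string, args ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return nil, nil
}

func main() {
	rec := &recorder{}
	if err := addLanguageColumns(rec); err != nil {
		panic(err)
	}
	fmt.Println(rec.queries[0])
}
```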

Out of scope (intentional)

  • Feed description rendering. Feed.Description is only exposed via an editable <textarea> in the feed-settings page, not rendered as readable text anywhere, so there's no surface to attach a lang= to.
  • OPML. Outlines don't carry language metadata; no change.
  • Sanitizer. Untouched — lang= is emitted on template-level outer elements, outside the sanitized content region.

Related issues

(none — feel free to link if there's an existing tracking issue.)


bramd marked this pull request as ready for review May 14, 2026 14:50

jvoisin (Collaborator) left a comment

  • Isn't language also per-item instead of only per-feed? Is this something that was purposefully left out from this pull-request?
  • What about the json feed format?

This is a pretty cool change, thank you for thinking about it and implementing it!

Comment thread internal/reader/handler/handler.go Outdated
Comment thread internal/model/language.go Outdated
bramd force-pushed the feature/feed-language branch from 668af90 to 43971cc on May 15, 2026 14:15

bramd (Author) commented May 15, 2026

Thanks for the quick read! Both addressed in the force-pushed update:

  • Per-entry language: now parsed from Atom 1.0/0.3 <entry xml:lang>, RSS <dc:language>, and JSON Feed per-item "language", with the entry value taking precedence over the feed value at render time. Added entries.language column + tests for each format. I'd initially kept this out because in real-world feeds I've seen it's almost always declared at the top level only — per-item language tags are pretty rare in the wild — and I wanted to keep the first cut minimal. Happy to have it in now though. While in there I noticed Atom 0.3 wasn't reading feed-level xml:lang either, so that's now wired up too for parity.
  • JSON Feed: feed-level and per-item language both wired through the adapter + tests. Leaving this out was just an oversight on my part — apologies for missing it.
  • Inline nits applied (strings.ToLower(strings.TrimSpace(s)) ordering, dropped the explanatory comment in handler.go).

PR body updated to reflect the wider scope.

jvoisin (Collaborator) left a comment

Excellent pull-request

Comment thread internal/database/migrations.go
bramd added 2 commits May 17, 2026 20:04
…m, and JSON Feed

Reads the language declared by each feed and entry at parse time, persists
it on new `feeds.language` and `entries.language` columns, and exposes both
via the existing Feed/Entry JSON marshalling.

Sources:
- RSS feed: <language>
- RSS item: <dc:language>
- Atom 1.0 feed/entry: xml:lang
- Atom 0.3 feed/entry: xml:lang
- JSON Feed feed/item: "language"

Values are normalized at parse time (trim + lower-case + _ -> -) so they
are directly usable as an HTML lang attribute. No strict BCP-47 validation
is performed: many real feeds use loose values, and silently dropping them
yields worse downstream behaviour than passing them through.

The refresh path treats language as feed/entry-declared metadata and always
trusts the latest fetched value.

Renders lang="..." on the entry title (<h1> in detail view, <h2> in every
list view) and on the entry content <article>. The attribute prefers the
entry-level language and falls back to the feed-level language; if both
are empty, no lang= attribute is emitted (rather than lang="").
bramd force-pushed the feature/feed-language branch from 43971cc to 8c181b1 on May 17, 2026 18:06

bramd (Author) commented May 17, 2026

@jvoisin Migrations are collapsed in one tx and conflicts are resolved by merging with main. Thanks for all the quick feedback on this PR, Go is not one of the languages I usually code in.

Comment thread internal/database/migrations.go Outdated
The merge of main into the branch collided the language migration with
the new DROP INDEX migration from bdd7f4f, producing a single malformed
function with an unterminated SQL string. Restore the index-drop
migration unchanged and append the language migration as the new last
element of the migrations array.
bramd requested a review from fguillot June 9, 2026 15:47