Skip to content

Refactor to ES modules#334

Draft
vict0rsch wants to merge 131 commits intomasterfrom
refactor-es-modules
Draft

Refactor to ES modules#334
vict0rsch wants to merge 131 commits intomasterfrom
refactor-es-modules

Conversation

@vict0rsch
Copy link
Copy Markdown
Owner

Large PR to take a step towards more robust SWE & testing practices by refactoring the entire codebase to use ES Modules

vict0rsch added 30 commits July 20, 2025 16:19
Don't bundle, expose, re-bundle but use `octokit` as a dependency
Because of the debug bundle used in DEV mode
Not complete, difficult to disambiguate functional or Github Cache issues
vict0rsch and others added 30 commits November 23, 2025 02:10
* chore: cherry-pick non-build changes from build-extension-js PR #337

- fix: chemrxiv parsing (add doi pattern)
- feat: puppeteer-extra with stealth plugin
- test: add manual console scripts for URL testing
- chore: Prettier formatting (trailing commas, ternary alignment)
- fix: ar5iv modal cancel button (JS listener instead of onclick)
- fix: popup URL detection for /action/ paths
- fix: .catch() on initSyncAndState for graceful failure
- fix: try/catch around sampleAsciiArt with fallback
- fix: try/catch around makePaper with logError

Made-with: Cursor

* feat: migrate build system from Rollup to WXT

- Replace Rollup + web-ext + Python release script with WXT
- Create wxt.config.js with manifest, aliases, and HTML include plugin
- Restructure entry points into src/entrypoints/ (WXT convention)
- Install jquery/select2 via npm, create jquery-setup.js for globals
- Move static assets to public/ (CSS, data JSON, theme.js, icons)
- Update chrome.runtime.getURL paths for WXT output structure
- Wrap background.js in defineBackground() for WXT compatibility
- Wrap content_script.js in defineContentScript() with CSS imports
- Import debug.js module in all HTML entry points for PMDebug global
- Update test paths to use .output/chrome-mv3/ directory
- Simplify CI: single npm run build produces Chrome + Firefox
- Update package.json scripts for WXT dev/build/zip commands
- Remove rollup.config.js, committed bundles, vendored jQuery/select2
- Add .output/ and .wxt/ to .gitignore

Made-with: Cursor

* fix: resolve ESM timing issues in popup tests after WXT migration

With WXT's ESM bundling, popup module loading is async, so tests
that reload the popup must wait for full initialization before
interacting with UI elements.

- Add window.__pmPopupReady flag set at end of popupMain() to signal
  when all click handlers are attached (including early return path)
- Add .catch() on fire-and-forget popupMain() call to set flag on error
- Update test helpers (waitForPopupReady, setPreference,
  setupPopupWithMockedTab, setupPageWithData, setPreferencesAndReload)
  to wait for __pmPopupReady instead of just networkidle0 or PMDebug
- Fix dark mode test: check for dark.css instead of dark.min.css
  (WXT serves unminified CSS in public/)
- Fix select2 initialization in jquery-setup.js for ESM default export

Made-with: Cursor

* docs: rewrite contributing.md for WXT build system

- Replace all Rollup references with WXT equivalents
- Add "What is WXT?" section explaining the build tool
- Document new build commands (dev, build, zip for both browsers)
- Add step-by-step "Loading the extension" instructions for Chrome/Firefox
- Explain WXT entry point convention (src/entrypoints/)
- Document jquery-setup.js, HTML includes, static assets in public/
- Update file structure tree to reflect current layout
- Simplify PMDebug docs (available in all pages, not dev-only)
- Clean up Tests section (remove duplication, update commands)
- Fix setPreferencesAndReload to wait for popup readiness after reload

Made-with: Cursor

* chore: run prettier on src/ and fix invalid HTML

Format all files in src/ with prettier. Fix invalid HTML that
prettier couldn't parse: remove void element end tags (</input>),
replace <p> tags wrapping block elements (<ul>, <ol>) with <div>.

Made-with: Cursor

* chore: change build output directory from .output to dist/

Made-with: Cursor

* refactor: eliminate entrypoint code duplication

src/background/background.js and src/content_scripts/content_script.js
now export initBackground() and initContentScript() respectively.
The WXT entrypoints are reduced to thin wrappers that call these
functions, making the originals the single source of truth.

Also fills the contributing.md Prettier section and updates the
entrypoints documentation to reflect the new structure.

Made-with: Cursor

* fix: three critical runtime bugs

- bibMatcher: fix broken options link (remove relative path prefix so
  WXT-flattened output resolves correctly)
- parsers/makePaper: re-throw 'Unknown paper source' errors instead of
  swallowing them silently, preserving the intentional failure signal
- parsers/findCellPii: guard against undefined venue when no Cell
  journal matches the parsed ISSN (prevents TypeError on .split())

Made-with: Cursor

* chore: clean up dead code, deps, and minor fixes

- Remove 12 Rollup/Babel/web-ext devDependencies (dead since rollup.config.js deleted)
- Delete duplicate src/data/ JSON files (public/data/ is the runtime source)
- Delete dead src/shared/js/theme.js (replaced by public/theme.js with correct WXT paths)
- Derive __DEV__ from WXT build command instead of hardcoding false
- Tighten popup URL guard: pathname.startsWith("/action") instead of broad href.includes
- Sanitize path separators in download filename to prevent directory traversal

Made-with: Cursor

* chore: apply Prettier formatting

Made-with: Cursor

* chore: ignore logs

* fix: missing css & feedback in dev mode

* fix: regenerate package-lock.json for npm ci compatibility

Made-with: Cursor
The lockfile was generated with npm 11 (Node 25) which uses different
peer-dependency semantics. npm 10 (Node 22, used in CI) couldn't
resolve it, causing `npm ci` to fail with missing @emnapi/* packages.

Made-with: Cursor
* fix: Array.prototype.last mutating original array

The .last() implementation called this.reverse() which mutates the
array in place. Every call permanently reversed the array, corrupting
data when called more than once. Use index-based access instead.

Made-with: Cursor

* fix: OpenReview regex, dead code, and MV3 executeScript in background.js

- Fix regex capturing only last character of OpenReview IDs by moving
  the + quantifier inside the capture group.
- Remove unreachable dead code in fetchPWCData (after early return).
- Replace undefined setTitleCode/setFaviconCode with MV3-compatible
  chrome.scripting.executeScript using func + args.
- Remove unused identifier variable in pushSyncPapers.

Made-with: Cursor

* fix: variable declarations and scoping bugs across files

- data.js: add const to major/minor in versionToSemantic (implicit globals)
- parsers.js: replace undefined error() with logError() in extractCrossrefData
- content_script.js: add const to abstractH2 in huggingfacePapers
- options.js: add i parameter to validateImportPaper, add const to pwcMatch

Made-with: Cursor

* fix: mergePapers addDate logic and contentScriptCallbacks default

- Fix inverted addDate comparison in syncMerge: use < instead of >
  to keep the oldest add date as intended by the comment.
- Add done() to default contentScriptCallbacks to prevent TypeError
  when callers don't provide it.

Made-with: Cursor

* fix: prepareOverwriteData map destructuring bug

Object.entries().map((k, v) => ...) was using the index as v instead
of the value. Fix to destructure as ([k, v]) and filter empty arrays.

Made-with: Cursor

* fix: prevent infinite loop in makeTitle

Replace while(1) with a bounded loop (max 20 iterations) to prevent
the function from running indefinitely for the lifetime of the tab.

Made-with: Cursor

* security: sanitize HTML in content_script.js to prevent XSS

- Add escapeHtml utility to functions.js for shared use.
- Escape paper.venue and bibtex year in makePaperMemoryHTMLDiv,
  displayPaperVenue, and huggingfacePapers.
- Validate codeLink protocol and escape in displayPaperCode to
  prevent javascript: URI injection.
- Escape paper.id in updateCompleteSecretHTML content attribute.

Made-with: Cursor

* security: sanitize HTML in popup templates and options page

- templates.js: escape paper.title, note, codeLink, author, and tag
  names before HTML insertion in getMemoryItemHTML and getPopupEditFormHTML.
- options.js: escape title/authors in getAutoTagHTML value attributes,
  paper.title in addPreprintUpdate, URLs in import feedback, update
  content values in startMatching, and error objects in sync feedback.

Made-with: Cursor

* security: sanitize HTML in functions.js, data.js, and bibMatcher.js

- functions.js: escape URL and title in copyHyperLinkToClipboard,
  escape error stack lines in stringifyError.
- data.js: escape validation warning keys and values in
  prepareOverwriteData.
- bibMatcher.js: escape paper titles, citation keys, venues, and
  sources in updateMatchedTitles and matchBibliography status output.

Made-with: Cursor

* security: add sender validation and download sanitization in background.js

- Validate sender.id matches extension ID on all messages to reject
  messages from external extensions/web pages.
- Improve filename sanitization to strip all filesystem-unsafe chars,
  collapse consecutive dots, and limit length.
- Validate pdfUrl protocol before initiating download.

Made-with: Cursor

* security: fix PAT logging and autoTag ReDoS

- Redact GitHub PAT from console log output in options.js.
- Replace regex-based autoTag matching with case-insensitive string
  includes to eliminate ReDoS risk from user-supplied patterns.

Made-with: Cursor

* fix: address code review and security review findings

Made-with: Cursor

* ci: add build step before running tests

The WXT migration removed pre-committed bundles, so
dist/chrome-mv3 must be built before Puppeteer can load the extension.

Made-with: Cursor
* chore: ignore env files

* build: update firefox id

* docs: add firefox source code review

* build: add wxt submit scripts and bump wxt to 0.20.25

Add submit, submit:dry-run, and per-browser variants (submit:chrome,
submit:firefox, and their :dry-run counterparts) so credentials and
individual store uploads can be exercised locally without going through
CI.

Made-with: Cursor

* chore: read manifest version from package.json

The manifest version was previously hardcoded in wxt.config.js, so
bumping a release meant updating two files in lockstep. Read it from
package.json via createRequire instead — package.json is now the single
source of truth for the extension version.

Made-with: Cursor

* ci: extract reusable build and test-matrix workflows

Move the body of build.yml and test.yml's test-matrix job into
workflow_call-only files (_build.yml, _test-matrix.yml) so the submit
workflow can reuse them as gating jobs without duplicating configuration.
build.yml and test.yml now just delegate to the reusable workflows;
behavior is unchanged for push and pull_request events.

The leading underscore in the filenames signals "not a top-level
workflow" — they only run via workflow_call.

Made-with: Cursor

* ci: overhaul submit-to-stores workflow

Rewrite submit.yml around wxt submit with safer defaults:

- Manual dispatch only (no auto-submission on release) so publishing to
  the stores is always a deliberate act.
- New target input (both / chrome / firefox) and dry_run input
  (default true) so credentials can be validated without uploading and
  partial failures can be retried for a single store.
- Gate submission on the reusable _test-matrix.yml and _build.yml
  workflows so store submissions can't ship a broken build.
- Split Chrome and Firefox into separate steps, each scoped to only
  its own secrets.
- Use npm ci (not npm install) and enable npm cache for deterministic,
  fast builds.
- Quote zip globs so wxt does the expansion, not the shell.
- Add a submit-stores concurrency group to queue overlapping dispatches.

Made-with: Cursor

* docs: document GitHub workflows in contributing.md

Add a Github Workflows section after Tests describing the four workflow
files, the submit.yml dispatch inputs (target, dry_run), required
secrets grouped per store, local npm script equivalents, and the
concurrency guard.

Made-with: Cursor

* style: auto-fix lint and formatting

Made-with: Cursor

* fix: restore valid firefox extension id

The gecko.id must match what AMO has on file for the existing
paper-memory listing (version 1.0.3). The previous change to
"paper-memory" is not a valid Mozilla extension id (must be either
email-style or a UUID in braces) and would have been rejected by
AMO, or orphaned existing users if it had gone through.

Made-with: Cursor

* test: update meta test for reusable workflow matrix

The test-matrix job's strategy.matrix.include moved from test.yml into
the reusable _test-matrix.yml. Parse both workflow files so the meta
test keeps catching test-*.js files that are not wired into CI.

Made-with: Cursor

* ci: drop redundant build gate from submit workflow

The submit job already runs `npm ci && npm run zip`, which covers
everything the separate _build.yml job did. Keeping it as a needs:
gate only duplicated work and created the false invariant that the
validated zip was the submitted one (it wasn't \u2014 submit rebuilt
from scratch). The real gate is test-matrix.

Made-with: Cursor

* build: quote glob patterns in submit npm scripts

Without quotes, the shell expands `dist/*-chrome.zip` before wxt sees
it. If dist/ contains multiple versioned zips (likely after repeated
local runs), a single --chrome-zip flag would receive multiple paths
and wxt would misparse or pick the wrong version. Matches the quoting
already used in submit.yml.

Made-with: Cursor

* ci: scope secrets passed to reusable workflows

Replace `secrets: inherit` with an explicit pass-through of only the
secret `_test-matrix.yml` declares as required. Also move the
github_pat interpolation out of the `command:` string into the
step's `env:` block, which avoids interpolating secrets directly
into shell strings.

Made-with: Cursor

* ci: sha-pin nick-fields/retry action

This action receives VICT0RSCH_GITHUB_PAT via env, so pinning to a
mutable major tag (v3) is a supply-chain risk. Pin to the immutable
commit SHA for v3.0.2 with a trailing tag comment for future
upgrades.

Made-with: Cursor

* ci: declare least-privilege permissions on workflows

Default GITHUB_TOKEN permissions depend on repo settings and can
include write access. Declare `contents: read` at the top of every
workflow file so the token is guaranteed read-only regardless of
repo defaults.

Made-with: Cursor

* chore: tighten .env gitignore and sources-zip excludes

- Add bare `.env` to .gitignore (`.env.*` glob requires at least
  one char after the dot, so a plain `.env` would NOT be matched).
- Add defensive excludes for `.env*`, `keys.json`, `credentials*`,
  `*.pem`, `*.key` to wxt's excludeSources so any stray secret file
  in the workspace is kept out of the Firefox sources zip uploaded
  to AMO.

Made-with: Cursor

* docs: sync workflow docs with refactor

- Correct workflow file count (four \u2192 five) in the workflow table.
- Clarify that submit.yml gates on test-matrix only (build gate was
  dropped as redundant with the submit job's own zip step).
- Fix wording about which ref the submit workflow uses.
- Update a matrix entry label now that the matrix definition lives
  in _test-matrix.yml, not test.yml.

Made-with: Cursor
* refactor(sources): introduce BasePaperSource registry, one module per source

Stateless `BasePaperSource` subclass per paper source under
`src/shared/js/utils/sources/` plus a central registry
(`sources/index.js`) exposing `getSource`, `allSources`, `matchUrl`,
`sourceFromIs`, `preprintSources`, and a `knownPaperPages` shim.

`paper.js`, `urls.js`, `parsers.js`, `data.js`, and `functions.js` now
dispatch through the registry instead of large per-source switch / if
chains. `parsers.js` keeps only shared HTTP/DOM/BibTeX helpers; the
`make<Source>Paper` family is gone. `preprintMatching.js` extracts
`tryPWCMatch`, `tryPreprintMatch`, `tryCrossRef`, `tryDBLP`,
`trySemanticScholar`, `tryGoogleScholar`, `tryUnpaywall`, and
`decodeHtml` from the old `parsers.js`. `initPaper` and `autoTagPaper`
move to `paper.js` as lifecycle (not parsing) helpers.

Bundled cleanups:

- `getDisplayId(paper)` (was `getDisplayId(id)`) — removes a
  `state.papers` global read; nature/acs/iop suffixes flow through
  `BasePaperSource.displayId`.
- `sourceExtras.springer.types`, `overrideORConfs`,
  `overridePMLRConfs` colocated on their respective source classes.
- `preprintSources` array replaced by `static isPreprint = true` on
  arxiv/biorxiv; registry exposes `preprintSources()`.
- Drop unreachable `default: xkcd/1969` branches in
  `paperToAbs`/`paperToPDF`.
- `config.js` keeps DBLP overrides only and re-exports
  `knownPaperPages` / `preprintSources` from the registry for
  back-compat consumers (e.g. `PMUtils.config.knownPaperPages`).
- `PMDebug` exposes `sources` and `preprintMatching` modules.

Made-with: Cursor

* fix(urls): tolerate undefined ids from matched sources in parseIdFromUrl

Legacy `parseIdFromUrl` returned `undefined` when a publisher source
matched (acm, ieee, springer, aip, …) but the paper was not in
storage yet; `addOrUpdatePaper` handled that by falling through to
`makePaper`. The registry rewrite started throwing in that case,
breaking every first-visit publisher URL with
"`parseIdFromUrl` failed, unknown paper url".

Track whether any source matched; only throw when no source matched
and the URL is not a localFile / parsedWebsite.

Made-with: Cursor

* docs: document the source registry workflow and deferred items

contributing.md:

- Rewrite "Adding a paper source" as a single-file step targeting
  `sources/<name>.js` + registry registration.
- Add a "Why stateless handlers, not class instances" rationale
  (POJO storage, MV3 service-worker lifecycle).
- Update PMDebug module list (`sources`, `preprintMatching`) and
  refresh "Key source files" to mention `sources/` and
  `preprintMatching.js`.
- Note that the Puppeteer suites (`test-duplicates`, `test-storage`,
  `test-sync`) are smoke checks and not expected to be fully green.

todo.md (new): out-of-scope follow-ups (wiley key overwrite, science
publisher, IOP m513, parsers split, isPaper redesign, per-source UI
hooks, missing website fixture, residual `state.papers` reads, and
the integration-test policy).

Made-with: Cursor

* chore: minor manual-open-urls fixups and docs whitespace

- test/manual-open-urls.js: correct APS sample URL and resolve the
  per-row id prefix from the row object so `idPrefix` overrides work.
- docs/features.md: re-indent keyboard-nav <img> wrapper.

Made-with: Cursor

* fix(debug): make getPapers a function

* logs(memory): improve log error in memory search failure

* feat(bibtex): also ignore `abstractNote`

* feat(bibtex): also consider `booktitle` for the venue
in fetchBibtexToPaper

* fix: manual test script

* test(manual): move to global execution not anonymous function
And standardize async/await

* docs: update contributions guidelines

* refactor(sources): standardize `fetchBibtexToPaper` usage

* refactor: tiny fixes from todo.md (deleted)

* style: auto-fix prettier formatting

Made-with: Cursor

* chore: format doc file

* fix: address adversarial review findings

- arxiv, biorxiv: drop venue() overrides that returned "" unconditionally so
  they inherit BasePaperSource.venue (extractVenueFromNote fallback), matching
  legacy makeVenue "case arxiv/biorxiv: break;" semantics for the m450 migration
- sources/index: revert preprintSources from a function to an eager array so
  external consumers of PMUtils.config.preprintSources keep working; update the
  one internal caller in paper.js:findFuzzyPaperMatch accordingly
- functions.getDisplayId: accept either a paper object or a bare id string for
  backward compatibility with pre-refactor callers (debug helpers, userscripts)
- sources/base: document the functions.js <-> sources/index.js circular import
  constraint so future source modules don't TDZ at load time
- test/manual-open-urls: clarify the testOnly filter usage comment
- contributing.md: fix ALL_SOURCES -> BY_NAME / SOURCE_DISPATCH_ORDER doc drift

Made-with: Cursor

* refactor(sources): less redundancy, more programmatic handling

* chore: format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant