|
| 1 | +# HTMLview.mcc Improvement Plan |
| 2 | + |
| 3 | +Multi-session plan. Each phase is independently mergeable. Work top-down |
| 4 | +unless a later phase is explicitly marked as dependency-free. |
| 5 | + |
| 6 | +**Status legend:** `[ ]` not started · `[~]` in progress · `[x]` done |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Phase 1 — Extract net hook into reusable static library `[x]` |
| 11 | + |
| 12 | +**Goal.** Every consumer currently has to copy the ~600-line hook out of |
| 13 | +`mcc/test_image_hook.h`. Move it into a proper library so bug fixes |
| 14 | +propagate and later phases (TLS verify, cache, gzip, cookies) land in one |
| 15 | +place. |
| 16 | + |
| 17 | +**Touches.** |
| 18 | +- New: `mcc/net_hook/htmlview_nethook.h` — public API. |
| 19 | +- New: `mcc/net_hook/htmlview_nethook.c` — implementation (code moved |
| 20 | + from `test_image_hook.h`, no behavioural change). |
| 21 | +- `mcc/Makefile` — add `libhtmlview_nethook.a` per-OS; link test programs |
| 22 | + against it. |
| 23 | +- `mcc/SimpleTest.c`, `mcc/LibLoad_Test.c` — drop `#include |
| 24 | + "test_image_hook.h"`, call the new API. |
| 25 | +- Delete `mcc/test_image_hook.h`. |
| 26 | + |
| 27 | +**Public API (initial).** |
| 28 | +```c |
| 29 | +ULONG HTMLviewNet_HookFunc(struct Hook *hook, APTR obj, |
| 30 | + struct HTMLview_LoadMsg *msg); |
| 31 | +void HTMLviewNet_InitHook(struct Hook *hook); /* sets h_Entry per-platform */ |
| 32 | +``` |
| 33 | +
|
| 34 | +**Acceptance.** |
| 35 | +- OS3, OS4, MorphOS build clean via the Docker images in `MEMORY.md`. |
| 36 | +- SimpleTest + LibLoad_Test still load local images, plain HTTP, HTTPS on |
| 37 | + OS3/OS4 (MorphOS: HTTP only, unchanged). |
| 38 | +- `libhtmlview_nethook.a` present in `bin_<os>/` after a build. |
| 39 | +
|
| 40 | +**Notes for the next session.** |
| 41 | +- The hook already uses task-local bsdsocket bases; the host task never |
| 42 | + needs to open bsdsocket. Drop those `OpenLibrary("bsdsocket.library")` |
| 43 | + calls from both test programs. |
| 44 | +- `HAVE_AMISSL` should be a per-object compile flag on the lib's `.o`, not |
| 45 | + on every caller. Keep `AMISSL_SDK_READY` depending on the lib. |
| 46 | +
|
| 47 | +--- |
| 48 | +
|
| 49 | +## Phase 2 — Certificate verification `[x]` |
| 50 | +
|
| 51 | +**Depends on:** Phase 1. |
| 52 | +
|
| 53 | +**Goal.** Current TLS uses `SSL_VERIFY_NONE`. Anyone on the network can |
| 54 | +MITM. Wire a CA bundle and flip to `VERIFY_PEER`. |
| 55 | +
|
| 56 | +**Delivered.** |
| 57 | +- `HTMLviewNet_SetCABundle(path)` — NULL restores auto-discovery. |
| 58 | +- `HTMLviewNet_SetVerifyMode(mode)` — `AUTO` (default), `NONE`, `PEER`. |
| 59 | +- AUTO discovers from: user override → `AmiSSL:Certs/curl-ca-bundle.crt` |
| 60 | + → `ENVARC:AmiSSL/Certs/...` → `ENV:AmiSSL/Certs/...` |
| 61 | + → `SYS:Storage/AmiSSL/Certs/...`. |
| 62 | +- When verifying: `SSL_CTX_load_verify_locations` + `SSL_VERIFY_PEER` |
| 63 | + and hostname check via `SSL_set1_host` (RFC 6125). |
| 64 | +- Failure diagnostics log `X509_verify_cert_error_string()` so the user |
| 65 | + can tell a cert issue from a handshake issue. |
| 66 | +- Setters compile in non-AmiSSL builds too (MorphOS) to keep |
| 67 | + cross-platform callers link-clean. |
| 68 | +
|
| 69 | +**Notes for next session.** |
| 70 | +- Cert store is re-loaded per-request (each transfer builds its own |
| 71 | + SSL_CTX). Cheap enough for now; revisit if latency matters. |
| 72 | +- `T:htmlview_hook.log` carries the verify decision ("CA bundle loaded |
| 73 | + from …" or "no CA bundle found, continuing without verification"). |
| 74 | +
|
| 75 | +--- |
| 76 | +
|
| 77 | +## Phase 3 — On-disk image/page cache `[x]` (v1) |
| 78 | +
|
| 79 | +**Depends on:** Phase 1. |
| 80 | +
|
| 81 | +**Goal.** The hook refetches every image on every reload. On Amiga |
| 82 | +networking, this is painful; a tiny keyed cache gives huge wins. |
| 83 | +
|
| 84 | +**Delivered (v1).** |
| 85 | +- `HTMLviewNet_SetCacheDir(path)` / `HTMLviewNet_SetCacheTTL(secs)`. |
| 86 | +- Disabled by default; `SetCacheDir(NULL)` clears. |
| 87 | +- Key = 16-char hex FNV-1a-64 of URL; entries stored as `<key>.body` + |
| 88 | + `<key>.meta` file pairs in the configured directory. |
| 89 | +- Meta format: `URL=<url>\nExpires=<seconds since 1978>\n`. |
| 90 | +- Read path: if body+meta exist and `now < Expires`, serve the body file |
| 91 | + directly (opened as `st->file`) with no network I/O. |
| 92 | +- Write path: on 200 with Content-Length <= 8 MB and not chunked, drain |
| 93 | + the whole body, persist body+meta with TTL = `HVN_CacheTTL`. |
| 94 | +- Log entries: `cache: HIT`, `cache: STORE`, size-skip reasons. |
| 95 | +
|
| 96 | +**Deferred to v2.** |
| 97 | +- `ETag` / `Last-Modified` capture + `If-None-Match`/`If-Modified-Since` |
| 98 | + revalidation (serve 304 cheaply). |
| 99 | +- `Cache-Control: max-age=` + `Expires` server hints. |
| 100 | +- Chunked-transfer caching. |
| 101 | +- Eviction policy (cache dir currently grows unbounded). |
| 102 | +
|
| 103 | +**Acceptance.** `SetCacheDir("T:htmlcache/")`, reload a page, look for |
| 104 | +`cache: HIT` in `T:htmlview_hook.log` on the second fetch. |
| 105 | +
|
| 106 | +--- |
| 107 | +
|
| 108 | +## Phase 4 — UTF-8 / charset handling `[ ]` |
| 109 | +
|
| 110 | +**Goal.** Pages with `<meta charset="utf-8">` or HTTP `Content-Type: |
| 111 | +text/html; charset=utf-8` currently render mojibake. Minimum viable: |
| 112 | +detect charset, transliterate UTF-8 → Latin-1 (replacing unrepresentable |
| 113 | +codepoints with `?` or the nearest entity). |
| 114 | +
|
| 115 | +**Touches.** |
| 116 | +- `mcc/Entities.cpp`, `mcc/ParseMessage.cpp` most likely. |
| 117 | +- Possibly new `mcc/Charset.cpp` for the transliteration table. |
| 118 | +- Hook contract gains: net hook passes the `Content-Type` charset hint |
| 119 | + into the parser (either via `HTMLview_LoadMsg` extension or side-channel |
| 120 | + tag — needs a small design call). |
| 121 | +
|
| 122 | +**Acceptance.** A UTF-8 page with `© é ñ` renders them as the Latin-1 |
| 123 | +byte equivalents, not as two-byte sequences. |
| 124 | +
|
| 125 | +--- |
| 126 | +
|
| 127 | +## Phase 5 — CSS subset `[ ]` |
| 128 | +
|
| 129 | +**Goal.** Stop dropping `style=` on the floor. Scope: inline + `<style>` |
| 130 | +block. |
| 131 | +
|
| 132 | +**Supported properties.** `color`, `background-color`, `font-family`, |
| 133 | +`font-size`, `font-weight`, `font-style`, `text-align`, `margin`, |
| 134 | +`padding`, `border` on block elements. |
| 135 | +
|
| 136 | +**Supported selectors.** type, `.class`, `#id`, descendant. No pseudo- |
| 137 | +classes, no cascade beyond "most specific wins by source order". |
| 138 | +
|
| 139 | +**Not supporting (document clearly).** Flexbox/grid, media queries, the |
| 140 | +full cascade, pseudo-elements, `!important`. |
| 141 | +
|
| 142 | +**Touches.** Core parser + layout. New `mcc/CSS.cpp`, `mcc/CSS.h`. |
| 143 | +
|
| 144 | +**Acceptance.** A curated test page exercising the subset renders with |
| 145 | +the expected colours, margins, alignment. |
| 146 | +
|
| 147 | +--- |
| 148 | +
|
| 149 | +## Phase 6 — `picture.datatype` fallback for unknown formats `[ ]` |
| 150 | +
|
| 151 | +**Goal.** Built-in decoders cover GIF/JPEG/PNG. Everything else (WebP, |
| 152 | +SVG, AVIF) currently fails. Fall through to `picture.datatype` when no |
| 153 | +internal decoder matches; fix the "only works with P96" wrinkle noted in |
| 154 | +the README. |
| 155 | +
|
| 156 | +**Touches.** `mcc/ImageManager.cpp`, `mcc/_ImageDecoder.c`, |
| 157 | +`mcc/IM_Output.cpp`. |
| 158 | +
|
| 159 | +**Acceptance.** A page referencing a `.webp` image renders correctly on a |
| 160 | +system with `webp.datatype` installed. |
| 161 | +
|
| 162 | +--- |
| 163 | +
|
| 164 | +## Phase 7 — Three long-standing known bugs `[ ]` |
| 165 | +
|
| 166 | +From `doc/MCC_HTMLview.readme`: |
| 167 | +
|
| 168 | +1. **Text outside `<body>` is not shown.** HTML 4 does not require the |
| 169 | + tag. Remove the gate in the parser. Touches `ParseMessage.cpp` / |
| 170 | + `classes/BodyClass.cpp`. |
| 171 | +2. **Entities without trailing `;` are ignored.** Implement HTML5's |
| 172 | + named-entity matching for the common set (`&`, ` `, `©`, |
| 173 | + `<`, `>`). Touches `Entities.cpp`. |
| 174 | +3. **Floyd–Steinberg tile seams.** The dither currently leaks error |
| 175 | + across tile edges making backgrounds look striped. Clamp error |
| 176 | + distribution at tile boundaries. Touches `IM_Dither.cpp`. |
| 177 | +
|
| 178 | +Each sub-task is independently small; can be three commits. |
| 179 | +
|
| 180 | +--- |
| 181 | +
|
| 182 | +## Phase 8 — Headless rendering regression tests `[ ]` |
| 183 | +
|
| 184 | +**Goal.** CI currently only proves the thing *builds*. Add golden-file |
| 185 | +tests that diff the parse tree / layout output for known HTML inputs. |
| 186 | +
|
| 187 | +**Mechanism.** Build a small Linux host harness (not Amiga) that compiles |
| 188 | +the parser + layout and dumps an intermediate representation. Snapshot |
| 189 | +against `testdata/golden/*.txt`. |
| 190 | +
|
| 191 | +**Touches.** New `tests/` directory, new CI job. |
| 192 | +
|
| 193 | +**Acceptance.** `make test` (on Linux host) exits 0 for unchanged HTML |
| 194 | +and non-zero when any golden file no longer matches. |
| 195 | +
|
| 196 | +--- |
| 197 | +
|
| 198 | +## Phase 9 — Memory-safety pass `[ ]` |
| 199 | +
|
| 200 | +**Goal.** 1990s C++ with manual buffer management. Get a Linux host build |
| 201 | +running under AddressSanitizer; feed it normal and malformed HTML. Fix |
| 202 | +whatever falls out. |
| 203 | +
|
| 204 | +**Scope.** `ParseMessage.cpp`, `ScanArgs.cpp`, `Entities.cpp`, and |
| 205 | +anything else the fuzzer reaches. The recent `snprintf` fix in the image |
| 206 | +hook suggests siblings exist. |
| 207 | +
|
| 208 | +**Acceptance.** ASan-clean for the test corpus; afl fuzzer runs for 1h |
| 209 | +with no new crashes. |
| 210 | +
|
| 211 | +--- |
| 212 | +
|
| 213 | +## Phase 10 — gzip `Content-Encoding` in the net hook `[ ]` |
| 214 | +
|
| 215 | +**Depends on:** Phase 1. |
| 216 | +
|
| 217 | +**Goal.** Modern web servers default to gzip. Without it, we get garbled |
| 218 | +bodies or have to request `identity` (which some CDNs ignore). Inflate |
| 219 | +via xadmaster.library if present, else bundled miniz (~40 KB). |
| 220 | +
|
| 221 | +**Touches.** Net hook only. |
| 222 | +
|
| 223 | +**Acceptance.** A gzip-encoded response decodes transparently; no |
| 224 | +regression on identity responses. |
| 225 | +
|
| 226 | +--- |
| 227 | +
|
| 228 | +## Phase 11 — Cookie jar hook (opt-in) `[ ]` |
| 229 | +
|
| 230 | +**Goal.** Enable logged-in content without the class owning storage. New |
| 231 | +`MUIA_HTMLview_CookieHook`: called with the URL + `Set-Cookie` lines on |
| 232 | +response, called to fetch applicable cookies before request. |
| 233 | +
|
| 234 | +**Touches.** `mcc/HTMLview_mcc.h` (new attr), net hook (Set-Cookie / |
| 235 | +Cookie wiring), classes that trigger requests. |
| 236 | +
|
| 237 | +**Acceptance.** A curated test server that requires a cookie loads |
| 238 | +successfully when the app persists it; not loading when the app doesn't. |
| 239 | +
|
| 240 | +--- |
| 241 | +
|
| 242 | +## Phase 12 — `target="_blank"` notification `[ ]` |
| 243 | +
|
| 244 | +**Goal.** Currently `MUIA_HTMLview_Target` is passive — hosts must parse |
| 245 | +target strings. Add a dedicated notify carrying URL + target so hosts can |
| 246 | +trivially open a new tab/window. |
| 247 | +
|
| 248 | +**Touches.** `mcc/HTMLview_mcc.h` (new attr or method), `Dispatcher.cpp`, |
| 249 | +anchor handling. |
| 250 | +
|
| 251 | +--- |
| 252 | +
|
| 253 | +## Phase 13 — Modernize docs `[x]` |
| 254 | +
|
| 255 | +**Delivered.** |
| 256 | +- `README` rewritten: GitHub URL, Docker build commands for all three |
| 257 | + platforms, description of the net-hook static library + its optional |
| 258 | + runtime knobs. |
| 259 | +- `TODO` replaced with a short bucket list that points at this file. |
| 260 | +- `doc/MCC_HTMLview.readme` updated: v13.6, GitHub URL, "prototype" |
| 261 | + language removed, each long-standing limitation is now cross- |
| 262 | + referenced to its IMPROVEMENTS phase. |
| 263 | +
|
| 264 | +**Deferred.** Adding a `mcc/examples/README.md` for SimpleTest + |
| 265 | +LibLoad_Test -- out of scope for this pass; the test programs are |
| 266 | +self-documenting via their HTML payloads. |
| 267 | +
|
| 268 | +--- |
| 269 | +
|
| 270 | +## Build / test cheat sheet |
| 271 | +
|
| 272 | +```bash |
| 273 | +# OS3 |
| 274 | +docker run --rm -v "$(pwd):/work" -w /work/mcc \ |
| 275 | + sacredbanana/amiga-compiler:m68k-amigaos sh -c "make -f makefile OS=os3" |
| 276 | +
|
| 277 | +# OS4 |
| 278 | +docker run --rm -v "$(pwd):/work" -w /work/mcc \ |
| 279 | + sacredbanana/amiga-compiler:ppc-amigaos sh -c "make -f makefile OS=os4" |
| 280 | +
|
| 281 | +# MorphOS |
| 282 | +docker run --rm -v "$(pwd):/work" -w /work/mcc \ |
| 283 | + sacredbanana/amiga-compiler:ppc-morphos sh -c "make -f makefile OS=mos" |
| 284 | +``` |
0 commit comments