diff --git a/domain-skills/x/search.md b/domain-skills/x/search.md new file mode 100644 index 00000000..3198fdd5 --- /dev/null +++ b/domain-skills/x/search.md @@ -0,0 +1,145 @@ +# X (Twitter) — web search against the logged-in session + +Drive the user's **logged-in** Chrome at `x.com/search` to read search results with +the **full advanced-search operator set**. This reaches windows the public v2 +"recent search" API can't (that API is ~7 days); the web index + `since:`/`until:` +reach much further back. No cookie-stealing — it's the user's real session. + +## URL pattern + +``` +https://x.com/search?q=&f=live +``` + +- `f=live` → Latest (chronological). `f=top` → Top (engagement-ranked). `f=user` → people. +- Always URL-encode `q`. Operators below go **inside** `q`. +- `&src=typed_query` is appended by the UI but not required. + +## Advanced search operators (this is the whole point) + +Put these in `q` — combine freely: + +| Operator | Effect | +|---|---| +| `"exact phrase"` | exact match | +| `from:user` / `to:user` / `@user` | author / replies-to / mentions | +| `since:YYYY-MM-DD` `until:YYYY-MM-DD` | **arbitrary date window** (not limited to 7d) | +| `min_faves:N` `min_retweets:N` `min_replies:N` | engagement floors — great for cutting noise | +| `filter:links` `filter:media` `filter:images` `filter:videos` | content type | +| `-is:retweet` `-is:reply` `is:verified` `filter:quote` | post-type filters | +| `lang:en` | language | +| `(a OR b) c` | boolean groups | +| `url:domain.com` | links to a domain | + +Example — high-signal posts on a topic over a specific month: +`"claude code" since:2026-05-01 until:2026-06-01 min_faves:25 -is:retweet lang:en` + +## Login wall — STOP, don't auth + +If the tab lands on `/login` or `/i/flow/login`, the profile isn't logged into X. +**Stop and tell the user** — never type credentials from a screenshot. Detect it: + +```python +info = page_info() +if "/login" in info.get("url", "") or "/i/flow/login" in info.get("url", ""): + raise SystemExit("LOGIN_WALL: not logged into X in this Chrome profile") +``` + +## Loading results — scroll, it's virtualized + +The timeline lazy-loads on scroll and **recycles** DOM nodes (off-screen tweets are +removed). So extract *as you scroll*, accumulating by tweet id — don't scroll to the +bottom and then read, or you'll only see the last window. ~6–10 `scroll(dy=2400)` +steps with a short wait between covers a few hundred posts. Keep query volume modest; +aggressive scrolling on a real session can trip rate/bot walls. + +## Selectors (stable as of 2026) + +- Tweet: `article[data-testid="tweet"]` +- Body text: `[data-testid="tweetText"]` (`.innerText`) +- Permalink + timestamp + handle + id: `a[href*="/status/"] > time` — the ``'s + `href` is `//status/`; the `