Commit 68378e0

Merge pull request #23 from jtrackingai/dev/zzc/v1.0.17
Improve preview click verification and subdomain crawling
2 parents 2b30036 + 9ac28e6 commit 68378e0

17 files changed

Lines changed: 363 additions & 27 deletions

SKILL.md

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ Once `site-analysis.json` indicates Shopify, keep discovery and grouping shared,
 - Use `./event-tracking runs <output-root>` when the artifact directory is unknown but the output root is known.
 - Prefer high-level entry commands for user-facing flows: `run-new-setup`, `run-tracking-update`, `run-upkeep`, `run-health-audit`.
 - Telemetry consent is a required user-choice checkpoint for richer diagnostics. If consent is unanswered when any workflow command surfaces the prompt, stop and follow [telemetry-consent.md](references/telemetry-consent.md) as the single-source interaction contract. Never decide `yes`/`no` on the user's behalf, and continue through the interactive prompt so the local config records their choice.
+- Explain the purpose, what `yes` does, what `no` does, and the remaining privacy tradeoff before asking the user to reply `yes` or `no`. Do not ask a bare `yes`/`no` question with no context.
 - Treat workflow mode metadata as an internal workflow-state layer, not a user-facing command surface.
 - `analyze`, `validate-schema --check-selectors`, `preview`, and `sync` each need outbound HTTP and a real Chromium; `sync` additionally needs a local loopback callback on `127.0.0.1` for Google's OAuth consent redirect. Run them in an environment that permits those capabilities so Playwright and the OAuth callback can complete.
 - Run prompt-driven GTM sync with an interactive TTY from the start unless exact `--account-id`, `--container-id`, and `--workspace-id` values are already confirmed.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.0.16
+1.0.17

package-lock.json

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered by default.

package.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "analytics-tracking-automation",
-  "version": "1.0.16",
+  "version": "1.0.17",
   "description": "Automated web event tracking skill for GA4 + GTM",
   "main": "dist/cli.js",
   "bin": {
@@ -18,7 +18,7 @@
     "install:skills": "node scripts/install-skills.mjs",
     "sync:skill-docs": "node scripts/sync-skill-docs.mjs",
     "test": "npm run build && npm run test:built",
-    "test:built": "node --test tests/workflow-state.test.mjs tests/live-gtm-parser.test.mjs tests/skill-family.test.mjs tests/skill-contract.test.mjs tests/auto-update.test.mjs tests/workflow-enhancements.test.mjs tests/telemetry.test.mjs tests/oauth-embedded-config.test.mjs",
+    "test:built": "node --test tests/workflow-state.test.mjs tests/live-gtm-parser.test.mjs tests/skill-family.test.mjs tests/skill-contract.test.mjs tests/auto-update.test.mjs tests/workflow-enhancements.test.mjs tests/telemetry.test.mjs tests/oauth-embedded-config.test.mjs tests/crawler-url-utils.test.mjs",
     "postinstall": "playwright install chromium"
   },
   "dependencies": {

references/crawl-guide.md

Lines changed: 4 additions & 1 deletion
@@ -11,10 +11,12 @@ Set these expectations with the user before crawling:
 - Only a limited number of child pages are sampled within the same section or path, so deeper links may not be captured
 - Large sites may hit the **40-page total cap**; partial coverage is normal
 - **E-commerce sites** trigger smarter URL ordering — category/listing pages before product pages
+- Same registered-domain subdomains are in scope when they carry business-critical journeys. Prioritize linked subdomains such as `app.example.com`, `account.example.com`, `auth.example.com`, checkout/billing/payment hosts, login, signup, trial, pricing, contact-sales, and password-reset pages before broad blog or resource sampling.
+- Do not blindly deep-crawl every subdomain. Support/docs/help-center subdomains, WAF-protected pages, and unrelated operational hosts should be listed as skipped or grouped separately only when they are reachable and relevant to the tracking goal.
 
 ## Partial Mode Rules
 
-- All URLs must belong to the same domain as the root `<URL>`
+- All URLs must belong to the same registered domain as the root `<URL>`; subdomains such as `app.example.com/login` are valid when they share the root registered domain.
 - Maximum 20 URLs per call — split into batches or use full-site mode for more
 
 ## Execution Environment
@@ -36,6 +38,7 @@ Set these expectations with the user before crawling:
 
 - How many pages were analyzed
 - Any skipped URLs and why (login pages, WAF blocks, errors)
+- Any same-domain business subdomains that were analyzed or intentionally skipped, especially app/login/signup/checkout flows
 - Any `crawlWarnings`
 - Detected platform (`generic` or `shopify`) and detection signals when available
 - **Detected dataLayer events** — if the site already pushes events via `dataLayer.push()` (e.g. e-commerce `purchase`, `add_to_cart`), list them. These can be leveraged as `custom` triggerType events in Step 2 instead of re-instrumenting with click triggers.
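
Reviewer sketch of the partial-mode rules changed above (same registered domain, at most 20 URLs per call). The helper names `naiveRegisteredDomain` and `validatePartialUrls` are hypothetical and are not taken from the project's `url-utils`. The "last two hostname labels" rule is a simplifying assumption that breaks on multi-part suffixes such as `co.uk`; a production check would likely consult the Public Suffix List.

```typescript
// Hypothetical helpers for illustration only; not the project's implementation.
// Assumption: the registered domain is approximated by the last two hostname
// labels. This is wrong for suffixes like "co.uk".
function naiveRegisteredDomain(url: string): string {
  const host = new URL(url).hostname.toLowerCase();
  return host.split('.').slice(-2).join('.');
}

// Per-call limit stated in the crawl guide's Partial Mode Rules.
const MAX_PARTIAL_URLS = 20;

// Returns the URLs that fall outside the root's registered domain.
// Throws when the batch exceeds the per-call limit.
function validatePartialUrls(rootUrl: string, urls: string[]): string[] {
  if (urls.length > MAX_PARTIAL_URLS) {
    throw new Error(`Too many URLs: ${urls.length} > ${MAX_PARTIAL_URLS}`);
  }
  const root = naiveRegisteredDomain(rootUrl);
  return urls.filter((u) => naiveRegisteredDomain(u) !== root);
}

const rejected = validatePartialUrls('https://www.example.com', [
  'https://app.example.com/login', // subdomain of the same registered domain
  'https://example.com/pricing',
  'https://example.org/', // different registered domain
]);
console.log(rejected);
```

Under these assumptions only the `example.org` URL is rejected, while `app.example.com/login` passes because it shares the root's registered domain.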

references/event-schema-guide.md

Lines changed: 2 additions & 0 deletions
@@ -80,6 +80,8 @@ Use this coverage checklist when deciding whether the plan is complete enough:
 - **Live baseline first**: when `existingTrackingBaseline` is present, treat it as the current production baseline. Reuse existing live event names when the intent already matches, and add new events only where the live setup has real gaps.
 - **Solve live problems, do not just add events**: each new event or parameter upgrade should address a specific live tracking issue such as missing coverage, inconsistent naming, sparse context, or fragmented reporting targets.
 - **Global elements first**: Process the `global_elements` group (contentType `global`) **before** other groups. Shared header, footer, and nav elements get `pageUrlPattern: ""` and are generated **exactly once**. Other groups **skip** elements with `parentSection` of `header`, `footer`, or `nav`.
+- **Locale-aware page scopes**: when crawl evidence shows localized URL prefixes such as `/en`, `/zh`, `/de`, or hreflang/alternate links, generate one language-agnostic event for the shared business intent and make `pageUrlPattern` accept the optional locale prefix. Do not create separate events per language, and do not scope the event only to the default-language path unless the interaction truly exists only there. Prefer explicit discovered locale alternation, for example `example\\.com/(?:(?:en|zh|de)/)?affiliate`, over a broad `.*` pattern.
+- **Subdomain-aware page scopes**: when login, signup, app onboarding, checkout, billing, or account flows live on a same registered-domain subdomain such as `app.example.com`, create one business-intent event that spans the relevant root and subdomain pages as needed. Use explicit host-aware `pageUrlPattern` values such as `^https://app\\.example\\.com/(?:login|signup)(?:[/?#]|$)` instead of defaulting to only the marketing host.
 - Do **not** generate default `page_view` or `scroll` events. The GTM configuration tag already sends `page_view`, and `scroll` is usually auto-collected by GA4 Enhanced Measurement.
 - Only add a custom scroll-depth event if there is a clear analysis need. If you do, use a distinct custom event name such as `scroll_depth`, not the reserved `scroll`.
 - Click events for meaningful buttons: login, signup, CTA, download, share, outbound links
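
Reviewer sketch exercising the two example `pageUrlPattern` values added above against sample URLs. It uses JavaScript's `RegExp` as a stand-in; which regex engine the tool or GTM applies to `pageUrlPattern` is an assumption here, and the sample URLs are made up for illustration.

```typescript
// The doubled backslashes mirror the string-escaped form shown in the guide;
// inside a JS string literal, '\\.' yields the regex token '\.'.
const localePattern = new RegExp('example\\.com/(?:(?:en|zh|de)/)?affiliate');
const hostPattern = new RegExp('^https://app\\.example\\.com/(?:login|signup)(?:[/?#]|$)');

// Each entry: pattern, sample URL (illustrative), expected match result.
const checks: Array<[RegExp, string, boolean]> = [
  [localePattern, 'https://example.com/affiliate', true], // default-language path
  [localePattern, 'https://example.com/zh/affiliate', true], // discovered locale prefix
  [localePattern, 'https://example.com/fr/affiliate', false], // locale not in the alternation
  [hostPattern, 'https://app.example.com/login', true],
  [hostPattern, 'https://app.example.com/signup?ref=nav', true], // query string allowed
  [hostPattern, 'https://app.example.com/login-help', false], // boundary check rejects this
  [hostPattern, 'https://www.example.com/login', false], // marketing host is out of scope
];

const results = checks.map(([re, url, expected]) => re.test(url) === expected);
console.log(results.every(Boolean) ? 'all checks pass' : 'mismatch');
```

Note that the locale example is unanchored at the end, so a path such as `/affiliate-terms` would also match it; the host-aware example avoids that with its trailing `(?:[/?#]|$)` boundary.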

references/page-grouping-guide.md

Lines changed: 3 additions & 1 deletion
@@ -47,7 +47,9 @@ Typical site-wide shared elements include login or sign-up buttons in the top na
 ## Coverage Expectations
 
 - **Language**: Groups reflect the site's **main language**. Other language versions may not fully appear because sitemaps usually favor pages in the site's primary language.
-- **Can't group what wasn't crawled**: Deep links, gated content, login-protected pages won't appear.
+- **Business subdomains**: Same registered-domain subdomains that host login, signup, app onboarding, checkout, billing, account, trial, or contact-sales journeys should be grouped when they were discovered or manually verified as reachable. Put them in explicit groups such as `auth_pages`, `checkout_pages`, or `app_pages` rather than hiding them inside generic marketing groups.
+- **Unreachable subdomains**: Do not create groups for support/docs/help-center subdomains that are WAF-blocked, timeout, or are not relevant to the tracking goal. Keep them as skipped evidence instead.
+- **Can't group what wasn't crawled or verified**: Deep links, gated content, and login-protected pages won't appear unless they were crawled or manually verified and added with enough representative HTML/elements for schema generation.
 - **Don't claim completeness**: Groups represent crawled and sampled pages, not every URL on the site.
 
 ## Compute urlPattern

scripts/export-skills.mjs

Lines changed: 0 additions & 4 deletions
@@ -167,10 +167,6 @@ function buildCliPackageJson() {
 }
 
 function copyBundledCliRuntime(outputPath, profile) {
-  if (profile === EXPORT_PROFILE_CLAWHUB) {
-    return;
-  }
-
   copyDirectory('runtime/cli-runtime', path.join(outputPath, 'runtime', 'cli-runtime'));
 
   const cliPackagePath = path.join(outputPath, 'runtime', 'cli-package');

scripts/skill-bundles.mjs

Lines changed: 0 additions & 4 deletions
@@ -163,10 +163,6 @@ function listBundledCliFiles(repoRoot) {
 }
 
 function listBundledCliFilesForProfile(repoRoot, profile) {
-  if (profile === EXPORT_PROFILE_CLAWHUB) {
-    return [];
-  }
-
   return listBundledCliFiles(repoRoot);
 }
 

src/crawler/page-analyzer.ts

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,7 @@ import { chromium, Browser, Page, Response } from 'playwright';
 import {
   extractDomain, isSameDomain, normalizeUrl, getSectionPrefix,
   sampleUrlsBySection, detectEcommerceSite, reorderForEcommerce,
+  prioritizeBusinessCriticalUrls,
   discoverNavLinks, discoverAllLinks,
 } from './url-utils';
 import { extractCleanedHtml } from './html-cleaner';
@@ -701,6 +702,7 @@ export async function analyzeSite(
   if (isEcommerce) {
     level1Candidates = reorderForEcommerce(level1Candidates);
   }
+  level1Candidates = prioritizeBusinessCriticalUrls(level1Candidates, rootDomain);
 
   const level1Urls = sampleUrlsBySection(
     level1Candidates,
@@ -727,6 +729,7 @@ export async function analyzeSite(
   if (isEcommerce) {
     level2Filtered = reorderForEcommerce(level2Filtered);
   }
+  level2Filtered = prioritizeBusinessCriticalUrls(level2Filtered, rootDomain);
 
   const level2Urls = sampleUrlsBySection(
     level2Filtered,