
Commit 996c053

Update README
1 parent 071c486 commit 996c053

1 file changed

Lines changed: 6 additions & 10 deletions

File tree

README.md

@@ -333,7 +333,7 @@ exp_heatmap compare chr15_results/ --start-1 47924019 --end-1 48924019 --start-2
 
 #### 9. Top-Region Extraction - `regions`
 
->Extract top-scoring windows into a TSV for downstream review or manuscript figure planning.
+>Extract top-scoring windows into a TSV.
 
 ```bash
 exp_heatmap regions [OPTIONS] <input_dir>
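The `regions` step in this hunk boils down to ranking scored windows and writing the best ones out as TSV. A minimal self-contained sketch of that idea (the tuple layout, column names, and `top_n` cutoff are illustrative assumptions, not the tool's actual interface):

```python
import csv
import io

def top_regions(windows, top_n=3):
    """Sort (start, end, score) windows by score, descending, and keep the best top_n."""
    return sorted(windows, key=lambda w: w[2], reverse=True)[:top_n]

def to_tsv(rows):
    """Serialize windows to TSV with a header row (column names are illustrative)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end", "score"])
    writer.writerows(rows)
    return buf.getvalue()

# hypothetical windows, purely for illustration
windows = [(47_924_019, 47_934_019, 0.91), (48_100_000, 48_110_000, 0.42),
           (48_500_000, 48_510_000, 0.77), (48_900_000, 48_910_000, 0.15)]
print(to_tsv(top_regions(windows)))
```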
@@ -479,8 +479,6 @@ docker build -t exp-heatmap-local .
 docker run --rm exp-heatmap-local --help
 ```
 
-See [docs/LOCAL_REPRODUCIBILITY.md](docs/LOCAL_REPRODUCIBILITY.md) for details.
-
 ---
 
 ## Input File Formats
@@ -779,7 +777,7 @@ data = create_plot_input("results/", start=47000000, end=49000000, rank_scores="
 
 ## Benchmarks
 
-All benchmarks were run locally on a macOS 14 / arm64 (Apple M4 Pro) workstation with 14 logical CPUs and 48 GB RAM. Python 3.12, zarr 2.x. Full benchmark scripts are provided under [`scripts/benchmarks/`](scripts/benchmarks). Raw logs, machine specs, and TSV summaries are kept in this repository's manuscript handoff bundle (see `temp_review/hand_off/benchmark_package/`).
+All benchmarks were run locally on a macOS 14 / arm64 (Apple M4 Pro) workstation with 14 logical CPUs and 48 GB RAM. Python 3.12, zarr 2.x. Full benchmark scripts are provided under [`scripts/benchmarks/`](scripts/benchmarks).
 
 ### Full pipeline on GGVP chr21
 
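Benchmark timings like those above are only meaningful next to the host specs they were measured on. A minimal sketch of capturing that metadata with the Python standard library (the dictionary keys are illustrative, not the repository's actual log format):

```python
import os
import platform
import sys

def host_specs():
    """Collect basic machine metadata worth logging alongside benchmark results."""
    return {
        "platform": platform.platform(),  # OS name and version string
        "machine": platform.machine(),    # e.g. arm64 or x86_64
        "logical_cpus": os.cpu_count(),
        "python": sys.version.split()[0],
    }

print(host_specs())
```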
@@ -824,8 +822,6 @@ python scripts/benchmarks/run_population_scaling.py \
   --out-dir local_data/benchmarks/population_scaling
 ```
 
-See [`docs/BENCHMARKS.md`](docs/BENCHMARKS.md) for the full reference.
-
 ## Validation on Public Data
 
 ExP Heatmap ships with two reproducible 1000 Genomes Project validation pipelines under [`scripts/validation/`](scripts/validation). Both scripts start from public Phase 3 release URLs and run end-to-end through `filter-vcf` → `prepare` → `compute` → `plot`.
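The validation pipelines' `compute` stage runs over every ordered population pair; with the 26 populations of 1000 Genomes Phase 3, that is 26 × 25 = 650 ordered pairs (self-comparisons excluded), which is quick to verify:

```python
from itertools import permutations

# placeholder population codes, not real 1000 Genomes labels
populations = [f"POP{i:02d}" for i in range(26)]

# ordered pairs without self-comparisons: P(26, 2) = 26 * 25
ordered_pairs = list(permutations(populations, 2))
print(len(ordered_pairs))  # → 650
```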
@@ -838,15 +834,15 @@ The `run_1kg_chr15_slc24a5.sh` script reproduces the pigmentation-locus showcase
 |-------|-----------:|
 | `filter-vcf` | 99.59 s (biallelic-SNP filter; retains 6,456,568 / 6,477,157 records) |
 | `prepare` | 252.98 s (VCF → Zarr) |
-| `compute` | 3034.42 s (XP-EHH, all 26 × 26 ordered pairs) |
+| `compute` | 3034.42 s (XP-EHH, all 650 ordered population pairs from 26 populations) |
 | `plot` (static) | 28.43 s |
 | `plot` (interactive) | 14.04 s |
 
 Output artifacts include a 21.95 GB filtered VCF, 499.69 MB Zarr store, 2.94 GB of pairwise TSV results, and a 442 KB PNG / 21.54 MB HTML heatmap of the SLC24A5 window (chr15:47,924,019-48,924,019).
 
 ### chr2 / LCT locus (region-scoped reconstruction)
 
-The `run_1kg_chr2_lct_reconstruction.sh` script reconstructs the canonical lactase-persistence locus as a region-scoped public-data run. Because the main manuscript LCT figure uses an archived author-prepared bundle, the script does not try to re-run whole-chromosome compute; instead it filters chromosome 2 to the plotted LCT window plus 1 Mb of flanking sequence on each side before preparing and computing.
+The `run_1kg_chr2_lct_reconstruction.sh` script reconstructs the canonical lactase-persistence locus as a region-scoped public-data run. The script does not try to re-run whole-chromosome compute; instead it filters chromosome 2 to the plotted LCT window plus 1 Mb of flanking sequence on each side before preparing and computing.
 
 | Stage | Wall-clock |
 |-------|-----------:|
@@ -856,7 +852,7 @@ The `run_1kg_chr2_lct_reconstruction.sh` script reconstructs the canonical lacta
 | `plot` (static) | 4.61 s |
 | `plot` (interactive) | 2.11 s |
 
-This scales the whole-locus reconstruction to ~7 minutes of wall time end-to-end while keeping the plotted window and interpretation identical. See [`docs/VALIDATION_1KG_PIPELINES.md`](docs/VALIDATION_1KG_PIPELINES.md) for details, and [`scripts/validation/summarize_validation_run.py`](scripts/validation/summarize_validation_run.py) to regenerate the JSON/Markdown summaries from log files.
+This scales the whole-locus reconstruction to ~7 minutes of wall time end-to-end while keeping the plotted window and interpretation identical. See [`scripts/validation/summarize_validation_run.py`](scripts/validation/summarize_validation_run.py) to regenerate the JSON/Markdown summaries from log files.
 
 ## 1000 Genomes Population Reference
 
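The region-scoped LCT run described above pads the plotted window by 1 Mb of flanking sequence on each side before filtering. The coordinate arithmetic is simple; this sketch (the function name and example coordinates are illustrative, not taken from the script) shows it with clamping at position 1:

```python
def flanked_window(start, end, flank=1_000_000):
    """Pad a 1-based genomic window by `flank` bp on each side, clamping at position 1."""
    return max(1, start - flank), end + flank

# hypothetical window, purely for illustration
print(flanked_window(136_000_000, 137_000_000))  # → (135000000, 138000000)
```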
@@ -1002,7 +998,7 @@ The suite currently contains 28 tests covering rank-score generation with ties a
 
 ### Building documentation-facing assets
 
-Reproducibility scripts, benchmark drivers, and validation pipelines live under [`scripts/`](scripts) and are covered in [`docs/`](docs).
+Reproducibility scripts, benchmark drivers, and validation pipelines live under [`scripts/`](scripts).
 
 ## License
 
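The hunk above mentions tests covering rank-score generation with ties. One common tie convention, average ranks, can be sketched as follows (an illustration of the concept, not the package's actual implementation):

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # extend j to cover the whole run of tied values starting at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([0.3, 0.1, 0.3, 0.9]))  # → [2.5, 1.0, 2.5, 4.0]
```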
