seismind/Optimo

OPTIMO

Optimo - Deterministic OCR Semantic Refinement Engine in Rust.

Optimo is a deterministic OCR and document analysis pipeline built in Rust. It focuses on replay invariance, semantic refinement, and auditable state transitions for degraded technical documents. Same input → same replay → same ROIPlan → same semantic diff.

Keywords: deterministic OCR, Rust OCR pipeline, semantic refinement, replay invariance, auditability.

Optimo was built to explore repeatable workflows in which noisy inputs (OCR today, structured documents tomorrow) are processed through a deterministic core.

Current Focus

  • reducer determinism
  • replayability
  • pipeline stability
  • adversarial tests

What's inside

  • Algebraic fold with proven properties (commutativity, idempotence, monotonicity)
  • Fuzzy clustering with Cyrillic homoglyph guardrail
  • Hard idempotency via source fingerprint deduplication
  • collision_rate_bps as a runtime convergence metric
  • Configurable similarity threshold per ingestion profile
  • Image preprocessing pipeline with Otsu thresholding
  • 114 tests across unit, property, adversarial, replay, and integration suites
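To make the fold properties above concrete, here is a minimal sketch of a reducer that is commutative, idempotent, and monotone, with fingerprint-based deduplication. It is not Optimo's actual code: `Observation`, `State`, `fold`, the "smallest text wins" tie-break, and the reading of `collision_rate_bps` as collisions per 10,000 observations are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical input record: one OCR reading tagged with a source fingerprint.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Observation {
    /// Stable identifier of the source (for example a content hash). Assumed name.
    pub fingerprint: u64,
    pub text: String,
}

/// Reducer state. A BTreeMap keeps iteration order independent of insertion order.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct State {
    by_fingerprint: BTreeMap<u64, String>,
}

impl State {
    /// Merge one observation. Returns true when the fingerprint was already
    /// present (a collision). Keys are never removed, so the state only grows.
    pub fn apply(&mut self, obs: &Observation) -> bool {
        match self.by_fingerprint.get_mut(&obs.fingerprint) {
            Some(existing) => {
                // Deterministic tie-break: keep the lexicographically smaller
                // text, so the result does not depend on arrival order.
                if obs.text < *existing {
                    *existing = obs.text.clone();
                }
                true
            }
            None => {
                self.by_fingerprint.insert(obs.fingerprint, obs.text.clone());
                false
            }
        }
    }

    pub fn len(&self) -> usize {
        self.by_fingerprint.len()
    }
}

/// Fold all observations and report collisions in basis points
/// (collisions per 10,000 observations; an assumed definition).
pub fn fold(observations: &[Observation]) -> (State, u32) {
    let mut state = State::default();
    let mut collisions: u64 = 0;
    for obs in observations {
        if state.apply(obs) {
            collisions += 1;
        }
    }
    let total = observations.len() as u64;
    let bps = if total == 0 { 0 } else { (collisions * 10_000 / total) as u32 };
    (state, bps)
}
```

Folding the same observations in any order, or folding a duplicate again, yields an equal `State`. The collision metric is kept outside the state on purpose, since a counter stored inside it would break idempotence.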

Stack

Rust • Tokio • Rayon • Tesseract • JSONL

Why

Many document-processing workflows depend on opaque transformations and hard-to-audit decisions.

Optimo explores a simpler model:

Input → Normalize → Reduce → Observe → Persist
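The five stages can be sketched as plain functions, each a pure transformation of its input. This is an illustrative sketch only: the function names, the whitespace-collapsing normalizer, the `Metrics` fields, and the `{"text": ...}` JSONL record shape are assumptions, not Optimo's real interfaces.

```rust
use std::collections::BTreeSet;

/// Normalize: trim and collapse runs of whitespace (an assumed, minimal rule).
pub fn normalize(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Reduce: drop empty lines and duplicates. A BTreeSet yields a sorted,
/// order-independent result.
pub fn reduce(lines: impl IntoIterator<Item = String>) -> BTreeSet<String> {
    lines.into_iter().filter(|l| !l.is_empty()).collect()
}

/// Observe: hypothetical counters describing one run.
#[derive(Debug, PartialEq, Eq)]
pub struct Metrics {
    pub input_lines: usize,
    pub unique_lines: usize,
}

pub fn observe(input_lines: usize, reduced: &BTreeSet<String>) -> Metrics {
    Metrics { input_lines, unique_lines: reduced.len() }
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Persist: render one JSON object per line (JSONL). Returned as a String
/// here; a real pipeline would write it to a file or log.
pub fn persist(reduced: &BTreeSet<String>) -> String {
    reduced
        .iter()
        .map(|l| format!("{{\"text\":\"{}\"}}\n", json_escape(l)))
        .collect()
}

/// Input → Normalize → Reduce → Observe → Persist.
pub fn run(raw_lines: &[&str]) -> (String, Metrics) {
    let normalized = raw_lines.iter().map(|l| normalize(l));
    let reduced = reduce(normalized);
    let metrics = observe(raw_lines.len(), &reduced);
    (persist(&reduced), metrics)
}
```

Because no stage reads a clock, a random source, or hash-map iteration order, calling `run` twice on the same input produces byte-identical output. In this sketch the sorted set also makes the output independent of input order, which goes beyond plain replay invariance.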

Status

Active prototype under test.

See docs/ for architecture and decisions. Full technical notes and module history: docs/LOGBOOK.md. Architecture and perimeter threat model diagrams: docs/ARCHITECTURE_AND_THREAT_MODEL.md.

About

Deterministic OCR semantic refinement engine in Rust. Event-driven pipeline with replay invariance, ROI planning, semantic diff telemetry, and auditable document analysis under degradation.
