Doco — Local Document Intelligence Platform

Doco is a privacy-first, fully local document intelligence platform inspired by LandingAI. It enables high-fidelity ingestion, structured data extraction, and semantic querying of PDFs and images—all without calling any external APIs.

Doco combines layout-aware VLM OCR pipelines with self-correcting agentic JSON extraction and hybrid RAG. Everything runs on local hardware, so sensitive documents never leave your machine.

Interface Preview

[Screenshot: the Doco workspace in action]

Key Features

High-Fidelity Document Processing: Ingests multi-page PDFs and images using SuryaOCR (running via llama.cpp) for layout analysis, bounding box coordinates, reading-order alignment, and high-accuracy text recognition.
Agentic JSON Extraction:
- Interactive Schema Builder: Manually edit, upload a custom .json schema, or query the local VLM to automatically suggest a schema based on the document's structure.
- Self-Correcting Critique Loop: Validates LLM extractions against the target JSON schema using jsonschema. If validation fails, it feeds the exact validation errors back to the model for correction (up to 3 attempts).
- Threshold-Based Routing: Automatically routes documents based on character count to optimize processing paths (Direct VLM Extraction vs Chunked fallbacks).
Local RAG Chat Interface:
- Hybrid Search: Combines vector search (FAISS) and keyword retrieval (BM25) in an ensemble retriever.
- Cross-Encoder Re-ranking: Uses ms-marco-MiniLM-L-6-v2 to re-rank chunks for high-relevance search context.
- SSE Streaming: Answers user questions using Server-Sent Events (SSE) for real-time token-by-token streaming in the UI.
Dynamic Memory Optimization: Automatically loads and unloads the heavy SuryaOCR models and Ollama services on demand, keeping peak memory usage low enough to run on standard consumer hardware.
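
The self-correcting critique loop listed under Agentic JSON Extraction can be sketched roughly as follows. This is a minimal illustration, not Doco's actual code: the function names, the prompt wording, and the `generate` callable (a stand-in for whatever calls the local model) are all assumptions. Only the use of jsonschema and the limit of 3 attempts come from the description above.

```python
import json
from typing import Callable

from jsonschema import Draft7Validator

MAX_ATTEMPTS = 3  # the "up to 3 attempts" limit from the feature list


def describe(err) -> str:
    """Format one jsonschema error as 'path: message' (hypothetical helper)."""
    location = "/".join(str(part) for part in err.absolute_path) or "<root>"
    return f"{location}: {err.message}"


def extract_with_critique(
    generate: Callable[[str], str],  # hypothetical wrapper around the local model
    prompt: str,
    schema: dict,
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Call `generate` until its output parses as JSON and satisfies `schema`."""
    validator = Draft7Validator(schema)
    current_prompt = prompt
    errors = []
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            candidate = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc}"]
        else:
            errors = [describe(err) for err in validator.iter_errors(candidate)]
            if not errors:
                return candidate
        # Feed the exact errors back so the model can correct its answer.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected:\n"
            + "\n".join(errors)
            + "\nReturn corrected JSON only."
        )
    raise ValueError(f"no valid extraction after {max_attempts} attempts: {errors}")
```

Passing the model call in as a plain callable keeps the loop testable without a running Ollama instance: a stub that returns canned strings is enough to exercise both the retry path and the failure path.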

Technology Stack

Backend: Python 3.11+, FastAPI, LangChain, Pydantic, jsonschema, PyPDFium2, FAISS, rank-bm25, SentenceTransformers.
Local Models:
- SuryaOCR (OCR, layout detection)
- Ollama (qwen2.5vl:7b, qwen3-embedding:0.6b, glm-ocr)
Frontend: Vanilla HTML5, CSS3, JavaScript (SSE streaming, JSON validator, responsive panes).
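
To illustrate how results from FAISS (vector search) and rank-bm25 (keyword retrieval) can be merged into a single ranking, here is a small sketch of weighted reciprocal rank fusion, one common way to combine ranked lists. It is an assumption that Doco's ensemble retriever behaves this way; the function name, the chunk IDs, and the constant `k=60` are illustrative only.

```python
from collections import defaultdict
from typing import Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,  # smoothing constant; 60 is a conventional choice, not a Doco setting
) -> list:
    """Merge several ranked lists of chunk IDs into one list, best first."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Chunks near the top of any list contribute the most.
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical chunk IDs returned by each retriever, best match first.
vector_hits = ["c2", "c1", "c3"]
keyword_hits = ["c1", "c4", "c2"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

In this example `c1` and `c2` rise to the top because both retrievers returned them. A cross-encoder re-ranker would then rescore the fused candidates against the query.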

Getting Started

1. Prerequisites

Ensure you have Ollama installed on your system. Pull the required models:

ollama pull qwen3-embedding:0.6b
ollama pull qwen2.5vl:7b

2. Install dependencies

Clone the repository and set up a Python virtual environment:

# Set up virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install packages
pip install -r requirements.txt

3. Configuration (Optional)

The platform uses sensible defaults, but you can override models and other settings using an environment file. Create a .env file in the root directory:

# Example .env overrides
LLM_MODEL=qwen2.5vl:7b
EMBEDDING_MODEL=qwen3-embedding:0.6b
RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
EXTRACTION_THRESHOLD=30000
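
As a rough illustration of how a setting like EXTRACTION_THRESHOLD could drive the threshold-based routing listed under Key Features, the sketch below reads the value from the environment and picks a path by character count. The function names, the fallback default of 30000 (copied from the example override above), and the choice to treat the threshold as inclusive are assumptions, not Doco's actual config.py.

```python
import os
from typing import Mapping

# Assumed fallback; the real default lives in the project's own configuration.
DEFAULT_EXTRACTION_THRESHOLD = 30000


def load_threshold(env: Mapping[str, str] = os.environ) -> int:
    """Read the character-count threshold, falling back to the assumed default."""
    return int(env.get("EXTRACTION_THRESHOLD", DEFAULT_EXTRACTION_THRESHOLD))


def choose_route(document_text: str, threshold: int) -> str:
    """Return 'direct' for documents at or under the threshold, else 'chunked'."""
    return "direct" if len(document_text) <= threshold else "chunked"


threshold = load_threshold({"EXTRACTION_THRESHOLD": "30000"})
route_for_short_doc = choose_route("Invoice #123, total due 12.50", threshold)
```

Values in a .env file only take effect once something loads them into the process environment, so a real setup would need a loader such as python-dotenv or Pydantic settings.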

4. Start the Platform

Run the FastAPI development server:

python main.py

Open your browser and navigate to http://localhost:8000/.

Roadmap

Map-Reduce Chunked Extraction: Fully implement map-reduce aggregation for extracting structured data from large documents that exceed the VLM's context window.
Multi-Document Indexes: Run cross-document comparisons and search queries across the entire processed document library.
