Doco is a privacy-first, fully local document intelligence platform inspired by LandingAI. It enables high-fidelity ingestion, structured data extraction, and semantic querying of PDFs and images—all without calling any external APIs.
By combining layout-aware VLM OCR pipelines with self-correcting agentic JSON extraction and hybrid RAG, Doco runs entirely on local hardware, so sensitive documents never leave your machine.
Here is the Doco workspace in action:
- High-Fidelity Document Processing: Ingests multi-page PDFs and images using SuryaOCR (running via llama.cpp) for layout analysis, bounding box coordinates, reading-order alignment, and high-accuracy text recognition.
- Agentic JSON Extraction:
  - Interactive Schema Builder: Manually edit, upload a custom .jsonschema, or query the local VLM to automatically suggest a schema based on the document's structure.
  - Self-Correcting Critique Loop: Validates LLM extractions against the target JSON schema using jsonschema. If validation fails, it feeds the exact parser errors back to the model for correction (up to 3 attempts).
  - Threshold-Based Routing: Automatically routes documents based on character count to optimize processing paths (Direct VLM Extraction vs Chunked fallbacks).
- Local RAG Chat Interface:
  - Hybrid Search: Leverages a combined vector search (FAISS) and keyword retrieval (BM25) ensemble retriever.
  - Cross-Encoder Re-ranking: Uses ms-marco-MiniLM-L-6-v2 to re-rank chunks for high-relevance search context.
  - SSE Streaming: Answers user questions using Server-Sent Events (SSE) for real-time token-by-token streaming in the UI.
- Dynamic Memory Optimization: Automatically loads and unloads the heavy SuryaOCR models and Ollama services on demand, keeping memory usage low enough to run on standard consumer hardware.
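The self-correcting critique loop from the feature list can be sketched roughly as follows. This is a minimal illustration, not Doco's actual code: `extract_with_critique`, `call_model`, and `validate` are hypothetical names, and the demo uses a tiny stand-in validator so the example runs with the standard library alone. With the jsonschema package, `validate` could instead collect the messages yielded by `Draft202012Validator(schema).iter_errors(instance)`.

```python
import json
from typing import Callable


def extract_with_critique(
    call_model: Callable[[str], str],
    validate: Callable[[object], list[str]],
    prompt: str,
    max_attempts: int = 3,
) -> object:
    """Ask the model for JSON; on failure, retry with the errors appended."""
    current_prompt = prompt
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            candidate = json.loads(raw)
            errors = validate(candidate)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            if not errors:
                return candidate
        # Feed the exact errors back so the model can correct its output.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was:\n{raw}\n"
            "It failed validation with these errors:\n- " + "\n- ".join(errors)
        )
    raise ValueError(f"no valid extraction after {max_attempts} attempts: {errors}")


def require_total(instance: object) -> list[str]:
    """Stand-in validator (assumption): requires a numeric 'total' field."""
    if not isinstance(instance, dict) or "total" not in instance:
        return ["'total' is a required property"]
    if not isinstance(instance["total"], (int, float)):
        return ["'total' must be a number"]
    return []


# Fake model: the first reply has a typo in the key, the second is valid.
replies = iter(['{"totl": 12.5}', '{"total": 12.5}'])
prompts_seen: list[str] = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


result = extract_with_critique(fake_model, require_total, "Extract the invoice total.")
print(result)
```

Passing the model and the validator in as callables keeps the retry logic independent of any particular backend, which also makes it easy to test without a running model.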
- Backend: Python 3.11+, FastAPI, LangChain, Pydantic, jsonschema, PyPDFium2, FAISS, rank-bm25, SentenceTransformers.
- Local Models:
  - SuryaOCR (OCR, layout detection)
  - Ollama (qwen2.5vl:7b, qwen3-embedding:0.6b, glm-ocr)
- Frontend: Vanilla HTML5, CSS3, JavaScript (SSE streaming, JSON validator, responsive panes).
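To illustrate how the FAISS and BM25 results in the hybrid search might be combined, here is a small sketch of weighted reciprocal rank fusion, a common way to merge ranked lists. Whether Doco's ensemble retriever uses exactly this formula is an assumption. The function name, the constant `c=60`, and the chunk IDs are illustrative only.

```python
from collections import defaultdict
from typing import Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
    c: int = 60,
) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute a larger share.
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["chunk-3", "chunk-1", "chunk-7"]   # e.g. from a vector index
keyword_hits = ["chunk-1", "chunk-9", "chunk-3"]  # e.g. from BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['chunk-1', 'chunk-3', 'chunk-9', 'chunk-7']
```

Chunks that appear in both lists rise to the top, and the fused candidates would then be passed to the cross-encoder for re-ranking.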
Ensure you have Ollama installed on your system. Pull the required models:
ollama pull qwen3-embedding:0.6b
ollama pull qwen2.5vl:7b
Clone the repository and set up a Python virtual environment:
# Set up virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install packages
pip install -r requirements.txt
The platform uses sensible defaults, but you can override models and other settings using an environment file.
Create a .env file in the root directory:
# Example .env overrides
LLM_MODEL=qwen2.5vl:7b
EMBEDDING_MODEL=qwen3-embedding:0.6b
RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
EXTRACTION_THRESHOLD=30000
Run the FastAPI development server:
python main.py
Open your browser and navigate to http://localhost:8000/.
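For readers curious how the .env overrides and the threshold-based routing could fit together, here is a hedged sketch. `Settings`, `load_settings`, and `choose_path` are hypothetical names, not Doco's actual API. The fallback values simply mirror the example .env, and treating the threshold as an inclusive character count is an assumption. The sketch reads variables that are already in the environment; loading the .env file itself is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_model: str
    embedding_model: str
    reranker_model: str
    extraction_threshold: int  # assumed to be a character count


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read overrides from the environment, falling back to example values."""
    env = os.environ if env is None else env
    return Settings(
        llm_model=env.get("LLM_MODEL", "qwen2.5vl:7b"),
        embedding_model=env.get("EMBEDDING_MODEL", "qwen3-embedding:0.6b"),
        reranker_model=env.get(
            "RERANKER_MODEL", "cross-encoder/ms-marco-MiniLM-L-6-v2"
        ),
        extraction_threshold=int(env.get("EXTRACTION_THRESHOLD", "30000")),
    )


def choose_path(text: str, settings: Settings) -> str:
    """Route short documents to direct extraction, long ones to chunking."""
    if len(text) <= settings.extraction_threshold:
        return "direct"
    return "chunked"


settings = load_settings({"EXTRACTION_THRESHOLD": "50000"})
print(choose_path("x" * 40_000, settings))  # direct
print(choose_path("x" * 60_000, settings))  # chunked
```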
- Map-Reduce Chunked Extraction: Fully implement map-reduce aggregation for extracting schemas from massive documents that exceed VLM context boundaries.
- Multi-Document Indexes: Run cross-document comparisons and search queries across the entire processed document library.
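
As a rough picture of what the planned map-reduce chunked extraction could look like, the sketch below splits a long text into fixed-size chunks, runs a per-chunk extractor (the map step), and merges the partial results (the reduce step). Every name here is hypothetical. The merge policy, where lists are concatenated and scalars keep the first non-null value, is just one plausible choice. Values that straddle a chunk boundary are not handled.

```python
from typing import Callable


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut the text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def merge_partials(partials: list[dict]) -> dict:
    """Reduce step: concatenate lists, keep the first non-null scalar."""
    merged: dict = {}
    for partial in partials:
        for key, value in partial.items():
            if isinstance(value, list):
                existing = merged.get(key)
                merged[key] = (existing if isinstance(existing, list) else []) + value
            elif merged.get(key) is None:
                merged[key] = value
    return merged


def map_reduce_extract(
    text: str,
    extract_chunk: Callable[[str], dict],
    chunk_size: int = 30_000,
) -> dict:
    """Map the extractor over each chunk, then merge the partial results."""
    partials = [extract_chunk(chunk) for chunk in split_into_chunks(text, chunk_size)]
    return merge_partials(partials)


def fake_extract(chunk: str) -> dict:
    """Stand-in for a per-chunk model call (assumption, for the demo only)."""
    return {
        "invoice_id": "42" if "invoice 42" in chunk else None,
        "items": [name for name in ("A", "B", "C") if f"item {name}" in chunk],
    }


text = "invoice 42 | item A | item B | item C"
result = map_reduce_extract(text, fake_extract, chunk_size=20)
print(result)  # {'invoice_id': '42', 'items': ['A', 'B', 'C']}
```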



