Document Copilot
An internal AI chatbot that lets analysts query a corpus of documents in plain English and get sourced, citable answers.
The client
Driftwood Capital — fictional independent investment research firm. Their analysts spend half their week reading 10-Ks and 10-Qs before they can produce any original analysis. Document Copilot eats that intake work so they can skip straight to insight.
Full brief: docs/client-brief.md
Stack
| Layer | Choice |
|---|---|
| Backend | Python + FastAPI |
| Frontend | Vite + React SPA + TypeScript |
| Database | Supabase Postgres (users, chats, documents, chunks) |
| Migrations | SQLAlchemy models + Alembic |
| Retrieval | Supabase pgvector + Postgres full-text search |
| Auth | Supabase Auth (email only) |
| Hosting | Railway |
| LLM + embeddings | OpenAI |
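The Retrieval row pairs pgvector similarity search with Postgres full-text search, but this README does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF). The sketch below is a minimal illustration under that assumption, not the project's confirmed approach; the function name, the chunk ids, and the constant `k = 60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one list.

    Each list contributes 1 / (k + rank) for every id it contains, so ids
    that rank well in more than one list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists: one from a pgvector query, one from full-text search.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine distances and full-text rank values are not on a comparable scale.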
Repo layout
    document-copilot/
    ├── AGENTS.md            # agent instructions (read first)
    ├── README.md            # this file
    ├── data/                # local corpus + download script (payloads gitignored)
    ├── docs/
    │   └── client-brief.md  # the client one-pager
    ├── backend/             # FastAPI service
    └── frontend/            # React SPA (Vite)
Prerequisites
Install these before setting up backend/ or frontend/:
| Tool | Version | Used for | Install |
|---|---|---|---|
| Python | 3.12+ | Backend runtime | OS package manager or python.org |
| uv | latest | Backend deps + data/download.py | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node.js | 20+ (LTS) | Frontend toolchain | nodejs.org or nvm install --lts |
| pnpm | latest | Frontend package manager | corepack enable && corepack prepare pnpm@latest --activate |
You also need accounts and API keys for external services. Start with docs/guides/supabase-setup.md (account + project); an OpenAI API key is only needed once the LLM layer is wired up.
Running locally
To be added during the build. Setup guides so far: docs/guides/supabase-setup.md
Sample SEC data
Use the standalone downloader to fetch a small local 10-K sample from SEC EDGAR.
Edit the params at the top of data/download.py, especially USER_AGENT, then run:
    uv run data/download.py
By default this downloads the five most recent 10-K filings for AAPL, MSFT, NVDA, AMZN, and GOOGL into per-year folders under data/downloads/ and writes a manifest.json.
Downloaded files are gitignored; the data/ folder itself stays in git for the script and notes.
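USER_AGENT is singled out above because SEC EDGAR expects automated clients to identify themselves with a descriptive User-Agent header that includes contact details. The sketch below shows how such a value might be attached to a request for a company's filing index. It is an illustration only and does not reproduce data/download.py: apart from `USER_AGENT`, every name here is hypothetical, and the contact string is a placeholder to replace with your own.

```python
import urllib.request

# Placeholder value: replace with your own name or organisation and contact email.
USER_AGENT = "Driftwood Research analyst@example.com"

# EDGAR's submissions endpoint takes a 10-digit, zero-padded CIK.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def build_submissions_request(
    cik: int, user_agent: str = USER_AGENT
) -> urllib.request.Request:
    """Build (but do not send) a request for one company's filing index."""
    url = SUBMISSIONS_URL.format(cik=cik)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# 320193 is Apple's CIK; no network call happens until the request is opened.
req = build_submissions_request(320193)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; keeping construction separate makes the header easy to check without touching the network.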