Document Copilot

An internal AI chatbot that lets analysts query a corpus of documents in plain English and get sourced, citable answers.

The client

Driftwood Capital is a fictional independent investment research firm. Its analysts spend half their week reading 10-Ks and 10-Qs before they can produce any original analysis. Document Copilot eats that intake work so they can skip straight to insight.

Full brief: docs/client-brief.md

Stack

| Layer | Choice |
| --- | --- |
| Backend | Python + FastAPI |
| Frontend | Vite + React SPA + TypeScript |
| Database | Supabase Postgres (users, chats, documents, chunks) |
| Migrations | SQLAlchemy models + Alembic |
| Retrieval | Supabase pgvector + Postgres full-text search |
| Auth | Supabase Auth (email only) |
| Hosting | Railway |
| LLM + embeddings | OpenAI |
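
The retrieval layer pairs vector search (pgvector) with Postgres full-text search, but this README does not say how the two result lists are combined. One common option is reciprocal rank fusion, sketched below as an assumption rather than the project's actual method. The function name, the chunk IDs, and the constant `k=60` are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one list.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks
    that rank well in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists, best match first.
vector_hits = ["c3", "c1", "c7"]   # e.g. from a pgvector similarity query
keyword_hits = ["c1", "c9", "c3"]  # e.g. from a full-text search query

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Fusing on ranks rather than raw scores avoids having to normalize cosine distances against full-text relevance scores, which live on different scales.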

Repo layout

    document-copilot/
    ├── AGENTS.md            # agent instructions (read first)
    ├── README.md            # this file
    ├── data/                # local corpus + download script (payloads gitignored)
    ├── docs/
    │   └── client-brief.md  # the client one-pager
    ├── backend/             # FastAPI service
    └── frontend/            # React SPA (Vite)

Prerequisites

Install these before setting up backend/ or frontend/:

| Tool | Version | Used for | Install |
| --- | --- | --- | --- |
| Python | 3.12+ | Backend runtime | OS package manager or python.org |
| uv | latest | Backend deps + data/download.py | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node.js | 20+ (LTS) | Frontend toolchain | nodejs.org or `nvm install --lts` |
| pnpm | latest | Frontend package manager | `corepack enable && corepack prepare pnpm@latest --activate` |
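
A quick way to confirm these tools are in place is a small check script. The sketch below is not part of the repo; the function names are hypothetical. Because uv and pnpm are listed as "latest", it only checks that they are on PATH.

```python
import re
import shutil
import subprocess
import sys
from typing import List, Optional

# Minimum versions taken from the prerequisites list above.
MIN_PYTHON = (3, 12)
MIN_NODE_MAJOR = 20


def parse_node_major(version_output: str) -> Optional[int]:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def check_prerequisites() -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.12+ required, found {found}")
    for tool in ("uv", "node", "pnpm"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    node = shutil.which("node")
    if node is not None:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        major = parse_node_major(result.stdout)
        if major is None or major < MIN_NODE_MAJOR:
            problems.append(f"Node.js 20+ required, found {result.stdout.strip()!r}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"[problem] {issue}")
    if not issues:
        print("All prerequisites found.")
```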

You also need accounts/keys for external services once the app is wired up. Start with docs/guides/supabase-setup.md (account + project), then create an OpenAI API key when the LLM layer is wired up.

Running locally

To be added during the build. Setup guides:

Sample SEC data

Use the standalone downloader to fetch a small local 10-K sample from SEC EDGAR. Edit the params at the top of data/download.py, especially USER_AGENT, then run:

uv run data/download.py

By default this downloads the latest 5 10-K filings for AAPL, MSFT, NVDA, AMZN, and GOOGL into year folders under data/downloads/ and writes a manifest.json. Downloaded files are gitignored; the data/ folder itself stays in git for the script and notes.
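
The downloader itself is not reproduced here. As a rough illustration of the layout described above, the following sketch maps filings to year folders and summarizes them in a manifest. The `Filing` record, its field names, the file naming scheme, the manifest shape, and the placeholder accession number are all hypothetical; data/download.py may organize things differently.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

DOWNLOAD_ROOT = Path("data/downloads")


@dataclass
class Filing:
    """Hypothetical record for one filing; not the script's actual schema."""
    ticker: str
    form: str
    filing_date: str  # ISO date, e.g. "2024-11-01"
    accession: str    # placeholder identifier for illustration


def target_path(filing: Filing, root: Path = DOWNLOAD_ROOT) -> Path:
    """Place each filing in a folder named after its filing year."""
    year = filing.filing_date[:4]
    return root / year / f"{filing.ticker}_{filing.form}_{filing.accession}.htm"


def build_manifest(filings, root: Path = DOWNLOAD_ROOT):
    """Return one manifest entry per filing, including its local path."""
    return [{**asdict(f), "path": target_path(f, root).as_posix()} for f in filings]


def write_manifest(filings, root: Path = DOWNLOAD_ROOT) -> Path:
    """Write manifest.json under the download root and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    manifest_file = root / "manifest.json"
    manifest_file.write_text(json.dumps(build_manifest(filings, root), indent=2))
    return manifest_file


# Made-up example entry; the accession number is a placeholder, not a real filing.
example = Filing("AAPL", "10-K", "2024-11-01", "0000000000-24-000001")
print(target_path(example).as_posix())
```

A manifest like this lets later ingestion steps find every file without walking the directory tree.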
