Power up your AI coding assistants with your own custom knowledge base and task management as an MCP server
Quick Start • What's Included • Architecture
🎯 What is Archon?
Archon is currently in beta! Expect things to not work 100%, and please feel free to share any feedback and contribute with fixes/new features! Thank you to everyone for all the excitement we have for Archon already, as well as the bug reports, PRs, and discussions. It's a lot for our small team to get through but we're committed to addressing everything and making Archon into the best tool it possibly can be!
Archon is the command center for AI coding assistants. For you, it's a sleek interface to manage knowledge, context, and tasks for your projects. For the AI coding assistant(s), it's a Model Context Protocol (MCP) server to collaborate on and leverage the same knowledge, context, and tasks. Connect Claude Code, Kiro, Cursor, Windsurf, etc. to give your AI agents access to:
- Your documentation (crawled websites, uploaded PDFs/docs)
- Smart search capabilities with advanced RAG strategies
- Task management integrated with your knowledge base
- Real-time updates as you add new content and collaborate with your coding assistant on tasks
- Much more coming soon to build Archon into an integrated environment for all context engineering
This new vision for Archon replaces the old one (the agenteer). Archon used to be the AI agent that builds other agents, and now you can use Archon to do that and more.
It doesn't matter what you're building or whether it's a new or existing codebase: Archon's knowledge and task management capabilities will improve the output of any AI-driven coding.
🔗 Important Links
- GitHub Discussions - Join the conversation and share ideas about Archon
- Contributing Guide - How to get involved and contribute to Archon
- Introduction Video - Getting started guide and vision for Archon
- Archon Kanban Board - Where maintainers are managing issues/features
- Dynamous AI Mastery - The birthplace of Archon - come join a vibrant community of other early AI adopters all helping each other transform their careers and businesses!
Quick Start
Prerequisites
- Docker Desktop
- Node.js 18+ (for hybrid development mode)
- Supabase account (free tier or local Supabase both work)
- OpenAI API key (Gemini and Ollama are supported too!)
- (OPTIONAL) Make (see Installing Make below)
Setup Instructions
- Clone Repository:
  git clone https://github.com/coleam00/archon.git
  cd archon
- Environment Configuration:
  cp .env.example .env
  # Edit .env and add your Supabase credentials:
  # SUPABASE_URL=https://your-project.supabase.co
  # SUPABASE_SERVICE_KEY=your-service-key-here
  NOTE: Supabase introduced a new type of service key, but use the legacy one (the longer one).
  OPTIONAL: If you want to enable the reranking RAG strategy, uncomment lines 20-22 in python/requirements.server.txt. This significantly increases the size of the Archon Server container, which is why it's off by default.
- Database Setup: In your Supabase project SQL Editor, copy, paste, and execute the contents of migration/complete_setup.sql
- Start Services (choose one):
  Full Docker Mode (Recommended for Normal Archon Usage)
  docker compose up --build -d
  # or, to match a previously used explicit profile:
  docker compose --profile full up --build -d
  # or, if you have make installed:
  make dev-docker
  This starts all core microservices in Docker:
- Server: Core API and business logic (Port: 8181)
- MCP Server: Protocol interface for AI clients (Port: 8051)
- Agents (coming soon!): AI operations and streaming (Port: 8052)
- UI: Web interface (Port: 3737)
Ports are configurable in your .env as well!
- Configure API Keys:
- Open http://localhost:3737
- Go to Settings → Select your LLM/embedding provider and set the API key (OpenAI is default)
- Test by uploading a document or crawling a website
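Before starting the services, it can help to confirm that .env actually defines the two Supabase variables from the Environment Configuration step. The following is a minimal sketch and not part of Archon: parse_env and missing_supabase_keys are hypothetical helper names, and the parser only handles plain KEY=VALUE lines (no inline comments or multi-line values).

```python
from pathlib import Path

# Variable names taken from the Environment Configuration step.
REQUIRED_KEYS = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_supabase_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty (hypothetical helper)."""
    env = parse_env(text)
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env found; run: cp .env.example .env")
    else:
        missing = missing_supabase_keys(env_file.read_text())
        if missing:
            print("Missing in .env: " + ", ".join(missing))
        else:
            print("Supabase settings found")
```

Run it from the repository root. It only checks that the values are present, not that the key is the legacy service key or that the credentials are valid.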
🚀 Quick Command Reference
| Command | Description |
|---|---|
| make dev | Start hybrid dev (backend in Docker, frontend local) ⭐ |
| make dev-docker | Everything in Docker |
| make stop | Stop all services |
| make test | Run all tests |
| make lint | Run linters |
| make install | Install dependencies |
| make check | Check environment setup |
| make clean | Remove containers and volumes (with confirmation) |
🔄 Database Reset (Start Fresh if Needed)
If you need to completely reset your database and start fresh:
⚠️ Reset Database - This will delete ALL data for Archon!
- Run Reset Script: In your Supabase SQL Editor, run the contents of migration/RESET_DB.sql
  ⚠️ WARNING: This will delete all Archon-specific tables and data! Nothing else in your database will be touched.
- Rebuild Database: After the reset, run migration/complete_setup.sql to create all the tables again.
- Restart Services:
  docker compose --profile full up -d
- Reconfigure:
  - Select your LLM/embedding provider and set the API key again
  - Re-upload any documents or re-crawl websites
The reset script safely removes all tables, functions, triggers, and policies with proper dependency handling.
🛠️ Installing Make (OPTIONAL)
Make is required for the local development workflow. Installation varies by platform:
Windows
# Option 1: Using Chocolatey
choco install make
# Option 2: Using Scoop
scoop install make
# Option 3: Using WSL2
wsl --install
# Then in WSL: sudo apt-get install make
macOS
# Make comes pre-installed on macOS
# If needed: brew install make
Linux
# Debian/Ubuntu
sudo apt-get install make
# RHEL/CentOS/Fedora
sudo yum install make
⚡ Quick Test
Once everything is running:
- Test Web Crawling: Go to http://localhost:3737 → Knowledge Base → "Crawl Website" → Enter a doc URL (such as https://ai.pydantic.dev/llms-full.txt)
- Test Document Upload: Knowledge Base → Upload a PDF
- Test Projects: Projects → Create a new project and add tasks
- Integrate with your AI coding assistant: MCP Dashboard → Copy connection config for your AI coding assistant
📚 Documentation
Core Services
| Service | Container Name | Default URL | Purpose |
|---|---|---|---|
| Web Interface | archon-ui | http://localhost:3737 | Main dashboard and controls |
| API Service | archon-server | http://localhost:8181 | Web crawling, document processing |
| MCP Server | archon-mcp | http://localhost:8051 | Model Context Protocol interface |
| Agents Service | archon-agents | http://localhost:8052 | AI/ML operations, reranking |
What's Included
🧠 Knowledge Management
- Smart Web Crawling: Automatically detects and crawls entire documentation sites, sitemaps, and individual pages
- Document Processing: Upload and process PDFs, Word docs, markdown files, and text documents with intelligent chunking
- Code Example Extraction: Automatically identifies and indexes code examples from documentation for enhanced search
- Vector Search: Advanced semantic search with contextual embeddings for precise knowledge retrieval
- Source Management: Organize knowledge by source, type, and tags for easy filtering
🤖 AI Integration
- Model Context Protocol (MCP): Connect any MCP-compatible client (Claude Code, Cursor, or even AI assistants that are not coding-focused, like Claude Desktop)
- 10 MCP Tools: Comprehensive yet simple set of tools for RAG queries, task management, and project operations
- Multi-LLM Support: Works with OpenAI, Ollama, and Google Gemini models
- RAG Strategies: Hybrid search, contextual embeddings, and result reranking for optimal AI responses
- Real-time Streaming: Live responses from AI agents with progress tracking
📋 Project & Task Management
- Hierarchical Projects: Organize work with projects, features, and tasks in a structured workflow
- AI-Assisted Creation: Generate project requirements and tasks using integrated AI agents
- Document Management: Version-controlled documents with collaborative editing capabilities
- Progress Tracking: Real-time updates and status management across all project activities
🔄 Real-time Collaboration
- WebSocket Updates: Live progress tracking for crawling, processing, and AI operations
- Multi-user Support: Collaborative knowledge building and project management
- Background Processing: Asynchronous operations that don't block the user interface
- Health Monitoring: Built-in service health checks and automatic reconnection
Architecture
Microservices Structure
Archon uses a true microservices architecture with clear separation of concerns:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend UI │ │ Server (API) │ │ MCP Server │ │ Agents Service │
│ │ │ │ │ │ │ │
│ React + Vite │◄──►│ FastAPI + │◄──►│ Lightweight │◄──►│ PydanticAI │
│ Port 3737 │ │ SocketIO │ │ HTTP Wrapper │ │ Port 8052 │
│ │ │ Port 8181 │ │ Port 8051 │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │ │
└────────────────────────┼────────────────────────┼────────────────────────┘
│ │
┌─────────────────┐ │
│ Database │ │
│ │ │
│ Supabase │◄──────────────┘
│ PostgreSQL │
│ PGVector │
└─────────────────┘
Service Responsibilities
| Service | Location | Purpose | Key Features |
|---|---|---|---|
| Frontend | archon-ui-main/ | Web interface and dashboard | React, TypeScript, TailwindCSS, Socket.IO client |
| Server | python/src/server/ | Core business logic and APIs | FastAPI, service layer, Socket.IO broadcasts, all ML/AI operations |
| MCP Server | python/src/mcp/ | MCP protocol interface | Lightweight HTTP wrapper, 10 MCP tools, session management |
| Agents | python/src/agents/ | PydanticAI agent hosting | Document and RAG agents, streaming responses |
Communication Patterns
- HTTP-based: All inter-service communication uses HTTP APIs
- Socket.IO: Real-time updates from Server to Frontend
- MCP Protocol: AI clients connect to MCP Server via SSE or stdio
- No Direct Imports: Services are truly independent with no shared code dependencies
Key Architectural Benefits
- Lightweight Containers: Each service contains only required dependencies
- Independent Scaling: Services can be scaled independently based on load
- Development Flexibility: Teams can work on different services without conflicts
- Technology Diversity: Each service uses the best tools for its specific purpose
🔧 Configuring Custom Ports & Hostname
By default, Archon services run on the following ports:
- archon-ui: 3737
- archon-server: 8181
- archon-mcp: 8051
- archon-agents: 8052
- archon-docs: 3838 (optional)
Changing Ports
To use custom ports, add these variables to your .env file:
# Service Ports Configuration
ARCHON_UI_PORT=3737
ARCHON_SERVER_PORT=8181
ARCHON_MCP_PORT=8051
ARCHON_AGENTS_PORT=8052
ARCHON_DOCS_PORT=3838
Example: Running on different ports:
ARCHON_SERVER_PORT=8282
ARCHON_MCP_PORT=8151
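When overriding ports, two services can accidentally end up on the same port. The sketch below shows one way to resolve the effective ports and flag collisions. It assumes only the variable names and defaults listed above; resolve_ports and find_collisions are hypothetical helpers, not part of Archon.

```python
# Defaults as listed under "Configuring Custom Ports & Hostname".
DEFAULT_PORTS = {
    "ARCHON_UI_PORT": 3737,
    "ARCHON_SERVER_PORT": 8181,
    "ARCHON_MCP_PORT": 8051,
    "ARCHON_AGENTS_PORT": 8052,
    "ARCHON_DOCS_PORT": 3838,
}


def resolve_ports(env: dict[str, str]) -> dict[str, int]:
    """Apply overrides from env on top of the defaults and validate the range."""
    ports: dict[str, int] = {}
    for name, default in DEFAULT_PORTS.items():
        port = int(env.get(name, default))
        if not 1 <= port <= 65535:
            raise ValueError(f"{name}={port} is outside 1-65535")
        ports[name] = port
    return ports


def find_collisions(ports: dict[str, int]) -> dict[int, list[str]]:
    """Return every port that more than one variable is set to."""
    by_port: dict[int, list[str]] = {}
    for name, port in ports.items():
        by_port.setdefault(port, []).append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


# Usage with the example overrides from this section.
overrides = {"ARCHON_SERVER_PORT": "8282", "ARCHON_MCP_PORT": "8151"}
ports = resolve_ports(overrides)
print(ports["ARCHON_SERVER_PORT"], find_collisions(ports))  # 8282 {}
```

Passing a plain dictionary keeps the helpers easy to test; in practice the values would come from the parsed .env file or from the process environment.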
Configuring Hostname
By default, Archon uses localhost as the hostname. You can configure a custom hostname or IP address by setting the HOST variable in your .env file:
# Hostname Configuration
HOST=localhost # Default
# Examples of custom hostnames:
HOST=192.168.1.100 # Use specific IP address
HOST=archon.local # Use custom domain
HOST=myserver.com # Use public domain
This is useful when:
- Running Archon on a different machine and accessing it remotely
- Using a custom domain name for your installation
- Deploying in a network environment where localhost isn't accessible
After changing hostname or ports:
- Restart Docker containers: docker compose down && docker compose --profile full up -d
- Access the UI at: http://${HOST}:${ARCHON_UI_PORT}
- Update your AI client configuration with the new hostname and MCP port
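The http://${HOST}:${ARCHON_UI_PORT} pattern above can be reproduced in a script, for example to print the address to open after a restart. This is a sketch under assumptions: service_url and ui_url are hypothetical helpers, plain HTTP is assumed because that is what the URLs in this README use, and the defaults mirror the documented ones.

```python
import os
from collections.abc import Mapping


def service_url(host: str = "localhost", port: int = 3737, path: str = "") -> str:
    """Build http://HOST:PORT/path in the form used throughout this README."""
    # Bracket bare IPv6 literals so the port separator stays unambiguous.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"http://{host}:{port}/{path.lstrip('/')}".rstrip("/")


def ui_url(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the UI address from HOST and ARCHON_UI_PORT, with defaults."""
    host = env.get("HOST", "localhost")
    port = int(env.get("ARCHON_UI_PORT", "3737"))
    return service_url(host, port)


# Usage with one of the hostname examples above.
print(ui_url({"HOST": "archon.local", "ARCHON_UI_PORT": "3737"}))
# prints http://archon.local:3737
```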
🔧 Development
Quick Start
# Install dependencies
make install
# Start development (recommended)
make dev # Backend in Docker, frontend local with hot reload
# Alternative: Everything in Docker
make dev-docker # All services in Docker
# Stop everything (local FE needs to be stopped manually)
make stop
Development Modes
Hybrid Mode (Recommended) - make dev
Best for active development with instant frontend updates:
- Backend services run in Docker (isolated, consistent)
- Frontend runs locally with hot module replacement
- Instant UI updates without Docker rebuilds
Full Docker Mode - make dev-docker
For all services in Docker environment:
- All services run in Docker containers
- Better for integration testing
- Slower frontend updates
Testing & Code Quality
# Run tests
make test # Run all tests
make test-fe # Run frontend tests
make test-be # Run backend tests
# Run linters
make lint # Lint all code
make lint-fe # Lint frontend code
make lint-be # Lint backend code
# Check environment
make check # Verify environment setup
# Clean up
make clean # Remove containers and volumes (asks for confirmation)
Viewing Logs
# View logs using Docker Compose directly
docker compose logs -f # All services
docker compose logs -f archon-server # API server
docker compose logs -f archon-mcp # MCP server
docker compose logs -f archon-ui # Frontend
Note: The backend services are configured with the --reload flag in their uvicorn commands and have their source code mounted as volumes, so they hot reload automatically when you make changes.
🔍 Troubleshooting
Common Issues and Solutions
Port Conflicts
If you see "Port already in use" errors:
# Check what's using a port (e.g., 3737)
lsof -i :3737
# Stop all containers and local services
make stop
# Change the port in .env
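On systems without lsof (for example, a plain Windows shell), a short Python check can tell whether something is already listening on a port. This is a minimal sketch using only the standard library; port_in_use is a hypothetical helper name, and the port list is the set of defaults from this README.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Default ports for the core services.
    defaults = {"archon-ui": 3737, "archon-server": 8181, "archon-mcp": 8051, "archon-agents": 8052}
    for name, port in defaults.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{name}: port {port} is {state}")
```

Unlike lsof, this does not say which process holds the port, only that a listener exists.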
Docker Permission Issues (Linux)
If you encounter permission errors with Docker:
# Add your user to the docker group
sudo usermod -aG docker $USER
# Log out and back in, or run
newgrp docker
Windows-Specific Issues
- Make not found: Install Make via Chocolatey, Scoop, or WSL2 (see Installing Make)
- Line ending issues: Configure Git to use LF endings:
git config --global core.autocrlf false
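To see whether files in an existing checkout already contain Windows line endings, a scan along these lines can help. It is only a sketch: has_crlf and files_with_crlf are hypothetical helpers, and the default suffix list is an assumption about which file types are most likely to break, not something taken from the Archon repository.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the bytes contain a Windows (CRLF) line ending."""
    return b"\r\n" in data


def files_with_crlf(root: str, suffixes: tuple[str, ...] = (".sh", ".py", ".sql")) -> list[Path]:
    """List files under root with one of the suffixes that contain CRLF."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and has_crlf(path.read_bytes()):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in files_with_crlf("."):
        print(path)
```

Note that the scan walks every directory under the root, so it can be slow in a checkout with large dependency folders.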
Frontend Can't Connect to Backend
- Check backend is running: curl http://localhost:8181/health
- Verify port configuration in .env
- For custom ports, ensure both ARCHON_SERVER_PORT and VITE_ARCHON_SERVER_PORT are set
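The curl check above can also be scripted, for instance to wait for the backend before opening the UI. The sketch below assumes only the /health path and default port shown in this README; backend_healthy is a hypothetical helper, and it inspects the HTTP status only because the response body format is not described here. The opener parameter exists purely so the function can be tested without a running server.

```python
import urllib.error
import urllib.request


def backend_healthy(
    base_url: str = "http://localhost:8181",
    timeout: float = 3.0,
    opener=urllib.request.urlopen,
) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("backend reachable" if backend_healthy() else "backend not reachable")
```

If you changed ARCHON_SERVER_PORT or HOST, pass the matching base URL, for example backend_healthy("http://localhost:8282").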
Docker Compose Hangs
If docker compose commands hang:
# Reset Docker Compose
docker compose down --remove-orphans
docker system prune -f
# Restart Docker Desktop (if applicable)
Hot Reload Not Working
- Frontend: Ensure you're running in hybrid mode (make dev) for best HMR experience
- Backend: Check that volumes are mounted correctly in docker-compose.yml
- File permissions: On some systems, mounted volumes may have permission issues
📄 License
Archon Community License (ACL) v1.2 - see LICENSE file for details.
TL;DR: Archon is free, open, and hackable. Run it, fork it, share it - just don't sell it as-a-service without permission.
