Archon/archon-ui-main
Wirasm 3e204b0be1
Fix race condition in concurrent crawling with unique source IDs (#472)
* Fix race condition in concurrent crawling with unique source IDs

- Add unique hash-based source_id generation to prevent conflicts
- Separate source identification from display with three fields:
  - source_id: 16-char SHA256 hash for unique identification
  - source_url: Original URL for tracking
  - source_display_name: Human-friendly name for UI
- Add comprehensive test suite validating the fix
- Migrate existing data with backward compatibility

* Fix title generation to use source_display_name for better AI context

- Pass source_display_name to title generation function
- Use display name in AI prompt instead of hash-based source_id
- Results in more specific, meaningful titles for each source

* Skip AI title generation when display name is available

- Use source_display_name directly as title to avoid unnecessary AI calls
- More efficient and predictable than AI-generated titles
- Keep AI generation only as fallback for backward compatibility

* Fix critical issues from code review

- Add missing os import to prevent NameError crash
- Remove unused imports (pytest, Mock, patch, hashlib, urlparse, etc.)
- Fix GitHub API capitalization consistency
- Reuse existing DocumentStorageService instance
- Update test expectations to match corrected capitalization

Addresses CodeRabbit review feedback on PR #472

* Add safety improvements from code review

- Truncate display names to 100 chars when used as titles
- Document hash collision probability (negligible for <1M sources)

Simple, pragmatic fixes per KISS principle

* Fix code extraction to use hash-based source_ids and improve display names

- Fixed critical bug where code extraction was using old domain-based source_ids
- Updated code extraction service to accept source_id as parameter instead of extracting from URL
- Added special handling for llms.txt and sitemap.xml files in display names
- Added comprehensive tests for source_id handling in code extraction
- Removed unused urlparse import from code_extraction_service.py

This fixes the foreign key constraint errors that were preventing code examples
from being stored after the source_id architecture refactor.

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix critical variable shadowing and source_type determination issues

- Fixed variable shadowing in document_storage_operations.py where source_url parameter
  was being overwritten by document URLs, causing incorrect source_url in database
- Fixed source_type determination to use actual URLs instead of hash-based source_id
- Added comprehensive tests for source URL preservation
- Ensure source_type is correctly set to "file" for file uploads, "url" for web crawls

The variable shadowing bug was causing sitemap sources to have the wrong source_url
(last crawled page instead of sitemap URL). The source_type bug would mark all
sources as "url" even for file uploads due to hash-based IDs not starting with "file_".

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix URL canonicalization and document metrics calculation

- Implement proper URL canonicalization to prevent duplicate sources
  - Remove trailing slashes (except root)
  - Remove URL fragments
  - Remove tracking parameters (utm_*, gclid, fbclid, etc.)
  - Sort query parameters for consistency
  - Remove default ports (80 for HTTP, 443 for HTTPS)
  - Normalize scheme and domain to lowercase

- Fix avg_chunks_per_doc calculation to avoid division by zero
  - Track processed_docs count separately from total crawl_results
  - Handle all-empty document sets gracefully
  - Show processed/total in logs for better visibility

- Add comprehensive tests for both fixes
  - 10 test cases for URL canonicalization edge cases
  - 4 test cases for document metrics calculation

This prevents database constraint violations when crawling the same
content with URL variations and provides accurate metrics in logs.

* Fix synchronous extract_source_summary blocking async event loop

- Run extract_source_summary in thread pool using asyncio.to_thread
- Prevents blocking the async event loop during AI summary generation
- Preserves exact error handling and fallback behavior
- Variables (source_id, combined_content) properly passed to thread

Added comprehensive tests verifying:
- Function runs in thread without blocking
- Error handling works correctly with fallback
- Multiple sources can be processed
- Thread safety with variable passing

* Fix synchronous update_source_info blocking async event loop

- Run update_source_info in thread pool using asyncio.to_thread
- Prevents blocking the async event loop during database operations
- Preserves exact error handling and fallback behavior
- All kwargs properly passed to thread execution

Added comprehensive tests verifying:
- Function runs in thread without blocking
- Error handling triggers fallback correctly
- All kwargs are preserved when passed to thread
- Existing extract_source_summary tests still pass

* Fix race condition in source creation using upsert

- Replace INSERT with UPSERT for new sources to prevent PRIMARY KEY violations
- Handles concurrent crawls attempting to create the same source
- Maintains existing UPDATE behavior for sources that already exist

Added comprehensive tests verifying:
- Concurrent source creation doesn't fail
- Upsert is used for new sources (not insert)
- Update is still used for existing sources
- Async concurrent operations work correctly
- Race conditions with delays are handled

This prevents database constraint errors when multiple crawls target
the same URL simultaneously.

* Add migration detection UI components

Add MigrationBanner component with clear user instructions for database schema updates. Add useMigrationStatus hook for periodic health check monitoring with graceful error handling.

* Integrate migration banner into main app

Add migration status monitoring and banner display to App.tsx. Shows migration banner when database schema updates are required.

* Enhance backend startup error instructions

Add detailed Docker restart instructions and migration script guidance. Improves user experience when encountering startup failures.

* Add database schema caching to health endpoint

Implement smart caching for schema validation to prevent repeated database queries. Cache successful validations permanently and throttle failures to 30-second intervals. Replace debug prints with proper logging.

* Clean up knowledge API imports and logging

Remove duplicate import statements and redundant logging. Improves code clarity and reduces log noise.

* Remove unused instructions prop from MigrationBanner

Clean up component API by removing instructions prop that was accepted but never rendered. Simplifies the interface and eliminates dead code while keeping the functional hardcoded migration steps.

* Add schema_valid flag to migration_required health response

Add schema_valid: false flag to health endpoint response when database schema migration is required. Improves API consistency without changing existing behavior.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-29 14:54:16 +03:00
..
__mocks__ The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
docs The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
public Updating the Logo for Archon 2025-08-18 13:59:49 -05:00
src Fix race condition in concurrent crawling with unique source IDs (#472) 2025-08-29 14:54:16 +03:00
test Add a test 2025-08-25 10:34:08 +03:00
.dockerignore The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
.eslintrc.cjs The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
.gitignore The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
Dockerfile Improve development environment with Docker Compose profiles (#435) 2025-08-22 17:18:10 +03:00
index.html Updating the Logo for Archon 2025-08-18 13:59:49 -05:00
package-lock.json The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
package.json The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
postcss.config.js The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
README.md The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
tailwind.config.js The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
tsconfig.json The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
tsconfig.node.json The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00
vite.config.ts fix: resolve container startup race condition in agents service (#451) (#503) 2025-08-29 14:42:04 +03:00
vitest.config.ts Add comprehensive test coverage for document CRUD operations 2025-08-18 13:27:20 +03:00

Archon UI - Knowledge Engine Web Interface

A modern React-based web interface for the Archon Knowledge Engine MCP Server. Built with TypeScript, Vite, and Tailwind CSS.

🎨 UI Overview

Archon UI provides a comprehensive dashboard for managing your AI's knowledge base:

UI Architecture

Key Features

  • 📊 MCP Dashboard: Monitor and control the MCP server
  • ⚙️ Settings Management: Configure credentials and RAG strategies
  • 🕷️ Web Crawling: Crawl documentation sites and build knowledge base
  • 📚 Knowledge Management: Browse, search, and organize knowledge items
  • 💬 Interactive Chat: Test RAG queries with real-time responses
  • 📈 Real-time Updates: WebSocket-based live updates across the UI

🏗️ Architecture

Technology Stack

  • React 18.3: Modern React with hooks and functional components
  • TypeScript: Full type safety and IntelliSense support
  • Vite: Fast build tool and dev server
  • Tailwind CSS: Utility-first styling
  • Framer Motion: Smooth animations and transitions
  • Lucide Icons: Beautiful and consistent iconography
  • React Router: Client-side routing

Project Structure

archon-ui-main/
├── src/
│   ├── components/          # Reusable UI components
│   │   ├── ui/             # Base UI components (Button, Card, etc.)
│   │   ├── layouts/        # Layout components (Sidebar, Header)
│   │   └── animations/     # Animation components
│   ├── pages/              # Page components
│   │   ├── MCPPage.tsx     # MCP Dashboard
│   │   ├── Settings.tsx    # Settings page
│   │   ├── Crawl.tsx       # Web crawling interface
│   │   ├── KnowledgeBase.tsx # Knowledge management
│   │   └── Chat.tsx        # RAG chat interface
│   ├── services/           # API and service layers
│   │   ├── api.ts          # Base API configuration
│   │   ├── mcpService.ts   # MCP server communication
│   │   └── chatService.ts  # Chat/RAG service
│   ├── contexts/           # React contexts
│   │   └── ToastContext.tsx # Toast notifications
│   ├── hooks/              # Custom React hooks
│   │   └── useStaggeredEntrance.ts # Animation hook
│   ├── types/              # TypeScript type definitions
│   └── lib/                # Utility functions
├── public/                 # Static assets
└── test/                   # Test files

📄 Pages Documentation

1. MCP Dashboard (/mcp)

The central control panel for the MCP server.

Components:

  • Server Control Panel: Start/stop server, view status, select transport mode
  • Server Logs Viewer: Real-time log streaming with auto-scroll
  • Available Tools Table: Dynamic tool discovery and documentation
  • MCP Test Panel: Interactive tool testing interface

Features:

  • Dual transport support (SSE/stdio)
  • Real-time status polling (5-second intervals)
  • WebSocket-based log streaming
  • Copy-to-clipboard configuration
  • Tool parameter validation

2. Settings (/settings)

Comprehensive configuration management.

Sections:

  • Credentials:
    • OpenAI API key (encrypted storage)
    • Supabase connection details
    • MCP server configuration
  • RAG Strategies:
    • Contextual Embeddings toggle
    • Hybrid Search toggle
    • Agentic RAG (code extraction) toggle
    • Reranking toggle

Features:

  • Secure credential storage with encryption
  • Real-time validation
  • Toast notifications for actions
  • Default value management

3. Web Crawling (/crawl)

Interface for crawling documentation sites.

Components:

  • URL Input: Smart URL validation
  • Crawl Options: Max depth, concurrent sessions
  • Progress Monitoring: Real-time crawl status
  • Results Summary: Pages crawled, chunks stored

Features:

  • Intelligent URL type detection
  • Sitemap support
  • Recursive crawling
  • Batch processing

4. Knowledge Base (/knowledge)

Browse and manage your knowledge items.

Components:

  • Knowledge Grid: Card-based knowledge display
  • Search/Filter: Search by title, type, tags
  • Knowledge Details: View full item details
  • Actions: Delete, refresh, organize

Features:

  • Pagination support
  • Real-time updates via WebSocket
  • Type-based filtering (technical/business)
  • Metadata display

5. RAG Chat (/chat)

Interactive chat interface for testing RAG queries.

Components:

  • Chat Messages: Threaded conversation view
  • Input Area: Query input with source selection
  • Results Display: Formatted RAG results
  • Source Selector: Filter by knowledge source

Features:

  • Real-time streaming responses
  • Source attribution
  • Markdown rendering
  • Copy functionality

🧩 Component Library

Base UI Components

Button

<Button 
  variant="primary|secondary|ghost" 
  size="sm|md|lg"
  accentColor="blue|green|purple|orange|pink"
  onClick={handleClick}
>
  Click me
</Button>

Card

<Card accentColor="blue" className="p-6">
  <h3>Card Title</h3>
  <p>Card content</p>
</Card>

LoadingSpinner

<LoadingSpinner size="sm|md|lg" />

Layout Components

Sidebar

  • Collapsible navigation
  • Active route highlighting
  • Icon + text navigation items
  • Responsive design

Header

  • Dark mode toggle
  • User menu
  • Breadcrumb navigation

Animation Components

PageTransition

Wraps pages with smooth fade/slide animations:

<PageTransition>
  <YourPageContent />
</PageTransition>

🔌 Services

mcpService

Handles all MCP server communication:

  • startServer(): Start the MCP server
  • stopServer(): Stop the MCP server
  • getStatus(): Get current server status
  • streamLogs(): WebSocket log streaming
  • getAvailableTools(): Fetch MCP tools

api

Base API configuration with:

  • Automatic error handling
  • Request/response interceptors
  • Base URL configuration
  • TypeScript generics

chatService

RAG query interface:

  • sendMessage(): Send RAG query
  • streamResponse(): Stream responses
  • getSources(): Get available sources

🎨 Styling

Tailwind Configuration

  • Custom color palette
  • Dark mode support
  • Custom animations
  • Responsive breakpoints

Theme Variables

--primary: Blue accent colors
--secondary: Gray/neutral colors
--success: Green indicators
--warning: Orange indicators
--error: Red indicators

🚀 Development

Setup

# Install dependencies
npm install

# Start dev server
npm run dev

# Build for production
npm run build

# Run tests
npm test

Environment Variables

VITE_API_URL=http://localhost:8080

Hot Module Replacement

Vite provides instant HMR for:

  • React components
  • CSS modules
  • TypeScript files

🧪 Testing

Unit Tests

  • Component testing with React Testing Library
  • Service mocking with MSW
  • Hook testing with @testing-library/react-hooks

Integration Tests

  • Page-level testing
  • API integration tests
  • WebSocket testing

📦 Build & Deployment

Docker Support

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 5173
CMD ["npm", "run", "preview"]

Production Optimization

  • Code splitting by route
  • Lazy loading for pages
  • Image optimization
  • Bundle size analysis

🔧 Configuration Files

vite.config.ts

  • Path aliases
  • Build optimization
  • Development server config

tsconfig.json

  • Strict type checking
  • Path mappings
  • Compiler options

tailwind.config.js

  • Custom theme
  • Plugin configuration
  • Purge settings

🤝 Contributing

Code Style

  • ESLint configuration
  • Prettier formatting
  • TypeScript strict mode
  • Component naming conventions

Git Workflow

  • Feature branches
  • Conventional commits
  • PR templates
  • Code review process