* Fix race condition in concurrent crawling with unique source IDs
- Add unique hash-based source_id generation to prevent conflicts
- Separate source identification from display with three fields:
- source_id: 16-char SHA256 hash for unique identification
- source_url: Original URL for tracking
- source_display_name: Human-friendly name for UI
- Add comprehensive test suite validating the fix
- Migrate existing data with backward compatibility
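A minimal sketch of the three-field scheme above, assuming the ID is the first 16 hex characters of the SHA-256 of the URL. The function names `generate_source_id` and `build_source_record` are hypothetical and not taken from the codebase.

```python
import hashlib


def generate_source_id(url: str) -> str:
    """Return a 16-character hex ID (hypothetical helper) derived from SHA-256."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return digest[:16]  # 16 hex chars = 64 bits of the hash


def build_source_record(url: str, display_name: str) -> dict[str, str]:
    """Keep identification (source_id) separate from tracking and display."""
    return {
        "source_id": generate_source_id(url),
        "source_url": url,
        "source_display_name": display_name,
    }


record = build_source_record("https://example.com/docs", "Example Docs")
```

The same URL always maps to the same ID, while two different URLs on the same domain get different IDs, which is what a domain-based ID cannot guarantee.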
* Fix title generation to use source_display_name for better AI context
- Pass source_display_name to title generation function
- Use display name in AI prompt instead of hash-based source_id
- Results in more specific, meaningful titles for each source
* Skip AI title generation when display name is available
- Use source_display_name directly as title to avoid unnecessary AI calls
- More efficient and predictable than AI-generated titles
- Keep AI generation only as fallback for backward compatibility
* Fix critical issues from code review
- Add missing os import to prevent NameError crash
- Remove unused imports (pytest, Mock, patch, hashlib, urlparse, etc.)
- Fix GitHub API capitalization consistency
- Reuse existing DocumentStorageService instance
- Update test expectations to match corrected capitalization
Addresses CodeRabbit review feedback on PR #472
* Add safety improvements from code review
- Truncate display names to 100 chars when used as titles
- Document hash collision probability (negligible for <1M sources)
Simple, pragmatic fixes per KISS principle
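The collision claim can be checked with the standard birthday-bound approximation. This sketch assumes the 16 hex characters behave like 64 uniformly random bits; `collision_probability` is a hypothetical helper written for illustration.

```python
import math


def collision_probability(num_sources: int, hash_bits: int = 64) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    exponent = num_sources * (num_sources - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x
    return -math.expm1(-exponent)


# One million sources with a 64-bit ID: roughly 2.7e-8
p = collision_probability(1_000_000)
```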
* Fix code extraction to use hash-based source_ids and improve display names
- Fixed critical bug where code extraction was using old domain-based source_ids
- Updated code extraction service to accept source_id as parameter instead of extracting from URL
- Added special handling for llms.txt and sitemap.xml files in display names
- Added comprehensive tests for source_id handling in code extraction
- Removed unused urlparse import from code_extraction_service.py
This fixes the foreign key constraint errors that were preventing code examples
from being stored after the source_id architecture refactor.
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix critical variable shadowing and source_type determination issues
- Fixed variable shadowing in document_storage_operations.py where source_url parameter
was being overwritten by document URLs, causing incorrect source_url in database
- Fixed source_type determination to use actual URLs instead of hash-based source_id
- Added comprehensive tests for source URL preservation
- Ensure source_type is correctly set to "file" for file uploads, "url" for web crawls
The variable shadowing bug was causing sitemap sources to have the wrong source_url
(last crawled page instead of sitemap URL). The source_type bug would mark all
sources as "url" even for file uploads due to hash-based IDs not starting with "file_".
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix URL canonicalization and document metrics calculation
- Implement proper URL canonicalization to prevent duplicate sources
- Remove trailing slashes (except root)
- Remove URL fragments
- Remove tracking parameters (utm_*, gclid, fbclid, etc.)
- Sort query parameters for consistency
- Remove default ports (80 for HTTP, 443 for HTTPS)
- Normalize scheme and domain to lowercase
- Fix avg_chunks_per_doc calculation to avoid division by zero
- Track processed_docs count separately from total crawl_results
- Handle all-empty document sets gracefully
- Show processed/total in logs for better visibility
- Add comprehensive tests for both fixes
- 10 test cases for URL canonicalization edge cases
- 4 test cases for document metrics calculation
This prevents database constraint violations when crawling the same
content with URL variations and provides accurate metrics in logs.
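The canonicalization rules listed above could look roughly like this. It is a sketch built on the standard library only; `canonicalize_url` and the exact tracking-parameter list are assumptions, and edge cases such as IPv6 hosts or userinfo in the URL are not handled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the real list may be longer
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname drops any userinfo

    # Keep the port only when it is not the default for the scheme
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"

    # Remove trailing slashes, but keep "/" for the root path
    path = parts.path or "/"
    if len(path) > 1:
        path = path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest for a stable order
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    query = urlencode(sorted(pairs))

    # An empty fragment argument removes the "#..." part
    return urlunsplit((scheme, netloc, path, query, ""))
```

Hashing the canonical form rather than the raw URL means variations of the same address resolve to one source_id.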
* Fix synchronous extract_source_summary blocking async event loop
- Run extract_source_summary in thread pool using asyncio.to_thread
- Prevents blocking the async event loop during AI summary generation
- Preserves exact error handling and fallback behavior
- Variables (source_id, combined_content) properly passed to thread
Added comprehensive tests verifying:
- Function runs in thread without blocking
- Error handling works correctly with fallback
- Multiple sources can be processed
- Thread safety with variable passing
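A minimal sketch of the pattern, assuming Python 3.9+ for `asyncio.to_thread`. The body of `extract_source_summary` here is a stand-in for the real synchronous call, and the fallback string is hypothetical.

```python
import asyncio
import time


def extract_source_summary(source_id: str, combined_content: str) -> str:
    """Stand-in for the blocking, synchronous summary call."""
    time.sleep(0.05)  # simulates a slow AI request
    return f"Summary for {source_id}: {combined_content[:40]}"


async def summarize(source_id: str, combined_content: str) -> str:
    try:
        # Runs in a worker thread, so the event loop stays responsive
        return await asyncio.to_thread(
            extract_source_summary, source_id, combined_content
        )
    except Exception:
        # Hypothetical fallback, mirroring "fallback behavior" above
        return f"Documentation from {source_id}"


async def summarize_many(items: list[tuple[str, str]]) -> list[str]:
    # Several sources can be summarized concurrently
    return await asyncio.gather(*(summarize(sid, text) for sid, text in items))
```

Arguments are passed positionally to `asyncio.to_thread`, so each thread receives its own `source_id` and `combined_content` rather than reading shared loop variables.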
* Fix synchronous update_source_info blocking async event loop
- Run update_source_info in thread pool using asyncio.to_thread
- Prevents blocking the async event loop during database operations
- Preserves exact error handling and fallback behavior
- All kwargs properly passed to thread execution
Added comprehensive tests verifying:
- Function runs in thread without blocking
- Error handling triggers fallback correctly
- All kwargs are preserved when passed to thread
- Existing extract_source_summary tests still pass
* Fix race condition in source creation using upsert
- Replace INSERT with UPSERT for new sources to prevent PRIMARY KEY violations
- Handles concurrent crawls attempting to create the same source
- Maintains existing UPDATE behavior for sources that already exist
Added comprehensive tests verifying:
- Concurrent source creation doesn't fail
- Upsert is used for new sources (not insert)
- Update is still used for existing sources
- Async concurrent operations work correctly
- Race conditions with delays are handled
This prevents database constraint errors when multiple crawls target
the same URL simultaneously.
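To illustrate why an upsert avoids the PRIMARY KEY violation, here is a sketch that uses SQLite in place of the project's actual database client. The table layout and `upsert_source` are assumptions made for the example.

```python
import sqlite3


def upsert_source(conn: sqlite3.Connection, source_id: str, source_url: str, title: str) -> None:
    """Insert the source, or update it if another crawl created it first."""
    conn.execute(
        """
        INSERT INTO sources (source_id, source_url, title)
        VALUES (?, ?, ?)
        ON CONFLICT(source_id) DO UPDATE SET
            source_url = excluded.source_url,
            title = excluded.title
        """,
        (source_id, source_url, title),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (source_id TEXT PRIMARY KEY, source_url TEXT, title TEXT)"
)

# Two crawls race to create the same source; the second call no longer fails
upsert_source(conn, "abc123", "https://example.com", "First crawl")
upsert_source(conn, "abc123", "https://example.com", "Second crawl")
rows = conn.execute("SELECT source_id, title FROM sources").fetchall()
```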
* Add migration detection UI components
Add MigrationBanner component with clear user instructions for database schema updates. Add useMigrationStatus hook for periodic health check monitoring with graceful error handling.
* Integrate migration banner into main app
Add migration status monitoring and banner display to App.tsx. Shows migration banner when database schema updates are required.
* Enhance backend startup error instructions
Add detailed Docker restart instructions and migration script guidance. Improves user experience when encountering startup failures.
* Add database schema caching to health endpoint
Implement smart caching for schema validation to prevent repeated database queries. Cache successful validations permanently and throttle failures to 30-second intervals. Replace debug prints with proper logging.
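The caching policy described above (successes cached permanently, failures retried at most every 30 seconds) might be sketched as follows. `SchemaCheckCache` and its parameters are hypothetical names, and the injected `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class SchemaCheckCache:
    """Caches a successful schema check forever; throttles failed checks."""

    def __init__(
        self,
        check: Callable[[], bool],
        retry_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._retry_interval = retry_interval
        self._clock = clock
        self._valid = False
        self._last_failure: Optional[float] = None

    def is_valid(self) -> bool:
        if self._valid:
            return True  # success is never re-checked
        now = self._clock()
        if (
            self._last_failure is not None
            and now - self._last_failure < self._retry_interval
        ):
            return False  # still inside the throttle window, skip the query
        if self._check():
            self._valid = True
            return True
        self._last_failure = now
        return False
```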
* Clean up knowledge API imports and logging
Remove duplicate import statements and redundant logging. Improves code clarity and reduces log noise.
* Remove unused instructions prop from MigrationBanner
Clean up component API by removing instructions prop that was accepted but never rendered. Simplifies the interface and eliminates dead code while keeping the functional hardcoded migration steps.
* Add schema_valid flag to migration_required health response
Add schema_valid: false flag to health endpoint response when database schema migration is required. Improves API consistency without changing existing behavior.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Add depends_on and env var
Update Vite configuration to enable allowed hosts
- Uncommented the allowedHosts configuration to allow for dynamic host settings based on environment variables.
- This change enhances flexibility for different deployment environments while maintaining the default localhost and specific domain access.
Needs testing to confirm proper functionality with various host configurations.
Remove my domain
* Enhance Vite configuration with dynamic allowed hosts support
- Added VITE_ALLOWED_HOSTS environment variable to .env.example and docker-compose.yml for flexible host configuration.
- Updated Vite config to dynamically set allowed hosts, incorporating defaults and custom values from the environment variable.
- This change improves deployment flexibility while maintaining security by defaulting to localhost and specific domains.
Needs testing to confirm proper functionality with various host configurations.
* refactor: remove unnecessary dependency on archon-agents in docker-compose.yml
- Removed the dependency condition for archon-agents from the archon-mcp service to streamline the startup process.
- This change simplifies the service configuration and reduces potential startup issues related to agent service health checks.
Needs testing to ensure that the application functions correctly without the archon-agents dependency.
---------
Co-authored-by: Julian Gegenhuber <office@salzkammercode.at>
- Fix misleading profile documentation at top of docker-compose.yml
- Add AGENTS_ENABLED flag for cleaner agent service handling
- Make AGENTS_SERVICE_URL configurable via environment variable
- Prevent noisy connection errors when agents service isn't running
This provides a cleaner way to disable the agents service and allows
the application to skip agent wiring when AGENTS_ENABLED=false.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Add 'agents' profile to archon-agents service
- Remove archon-agents as dependency from archon-mcp service
- Service now only starts with --profile agents flag
- Prevents startup issues while agents service is under development
- All core functionality continues to work without agents
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Add include_archived parameter to TaskService.list_tasks()
- Service now conditionally applies archived filter based on parameter
- Add 'archived' field to task DTO for client visibility
- Update API endpoints to pass include_archived down to service
- Remove redundant client-side filtering in API layer
- Fix type hints in integration tests (dict[str, Any] | None)
- Use pytest.skip() instead of return for proper test reporting
These fixes address the functional bug identified by CodeRabbit where
archived tasks couldn't be retrieved even when explicitly requested.
- Add include_content parameter to ProjectService.list_projects()
- Add exclude_large_fields parameter to TaskService.list_tasks()
- Add include_content parameter to DocumentService.list_documents()
- Update all MCP tools to use lightweight responses by default
- Fix critical N+1 query problem in ProjectService (was making a separate query per project)
- Add response size monitoring and logging for validation
- Add comprehensive unit and integration tests
Results:
- Projects endpoint: 99.3% token reduction (27,055 -> 194 tokens)
- Tasks endpoint: 98.2% token reduction (12,750 -> 226 tokens)
- Documents endpoint: Returns metadata with content_size instead of full content
- Maintains full backward compatibility with default parameters
- Single query optimization eliminates N+1 performance issue
- Update error modal to show default 'docker compose up --build -d' command
- Add better organized note structure with bullet points
- Include profile-specific fallback example for existing users
- Update README Quick Start to show default command first
- Maintain backward compatibility guidance for profile users
- Change from generic YOUR_PROFILE to specific 'full' profile
- Add note explaining users can replace 'full' if needed
- Maintains clarity while providing flexibility for different profiles
- Remove profile restrictions from all services so they start with 'docker compose up'
- All services now run by default without requiring --profile flags
- Profile functionality removed - users now use default behavior only
- This enables the requested 'docker compose up --build -d' workflow
- Add 'default' profile to all services so 'docker compose up --build -d' works without --profile flag
- Update BackendStartupError.tsx to include '--profile full' in Docker command examples
- Update docker-compose.yml comments to document the new default behavior
This allows users to run either:
- docker compose up --build -d (uses default profile, starts all services)
- docker compose --profile full up --build -d (explicit profile, same result)
- docker compose --profile backend up --build -d (backend services only)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add type, aria-label, and aria-hidden attributes to action and icon buttons across task and document components to improve accessibility and assistive technology support.
- Removed the original_archon/ directory containing the legacy Archon v1-v6 iterations
- This was the original AI agent builder system before the pivot to the current architecture
- The folder has been preserved in the 'preserve-original-archon' branch for historical reference
- Reduces repository size by ~5.2MB and removes confusion about which codebase is active
- Removed python/src/server/testing/ folder containing deprecated test utilities
- These PRP viewer testing tools were used during initial development
- No longer needed as functionality has been integrated into main codebase
- No dependencies or references found in production code
Applied the extra parameter pattern to all remaining logging statements (11 more) to ensure consistency and prevent runtime errors when any code path is executed. This completes the fix for the entire file.
Fixed TypeError when passing custom fields to Python logger by using the 'extra' parameter instead of direct keyword arguments. This resolves embedding creation failures during crawl operations.
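The failure mode and the fix can be reproduced with the standard `logging` module alone. The logger name, the field `batch_size`, and the capturing handler are illustrative assumptions.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so the result can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("embedding_demo")
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)


def log_with_kwargs() -> None:
    # Raises TypeError: custom fields are not valid keyword arguments.
    # The error only surfaces when the level is enabled and the line runs.
    logger.info("Created embeddings", batch_size=25)


def log_with_extra() -> None:
    # Custom fields go in `extra` and become attributes on the record
    logger.info("Created embeddings", extra={"batch_size": 25})
```

Because the TypeError appears only when a given logging call actually executes, applying the pattern to every statement in the file covers code paths that are rarely hit.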
* Add improved development environment with backend in Docker and frontend locally
- Created dev.bat script to run backend services in Docker and frontend locally
- Added docker-compose.backend.yml for backend-only Docker setup
- Updated package.json to run frontend on port 3737
- Fixed api.ts to use default port 8181 instead of throwing error
- Script automatically stops production containers to avoid port conflicts
- Provides instant HMR for frontend development
* Refactor development environment setup: replace dev.bat with Makefile for cross-platform support and enhanced commands
* Enhance development environment: add environment variable checks and update test commands for frontend and backend
* Improve development environment with Docker Compose profiles
This commit enhances the development workflow by replacing the separate
docker-compose.backend.yml file with Docker Compose profiles, fixing
critical service discovery issues, and adding comprehensive developer
tooling through an improved Makefile system.
Key improvements:
- Replace docker-compose.backend.yml with cleaner profile approach
- Fix service discovery by maintaining consistent container names
- Fix port mappings (3737:3737 instead of 3737:5173)
- Add make doctor for environment validation
- Fix port configuration and frontend HMR
- Improve error handling with .SHELLFLAGS in Makefile
- Add comprehensive port configuration via environment variables
- Simplify make dev-local to only run essential services
- Add logging directory creation for local development
- Document profile strategy in docker-compose.yml
These changes provide three flexible development modes:
- Hybrid mode (default): Backend in Docker, frontend local with HMR
- Docker mode: Everything in Docker for production-like testing
- Local mode: API server and UI run locally
Co-authored-by: Zak Stam <zaksnet@users.noreply.github.com>
* Fix make stop command to properly handle Docker Compose profiles
The stop command now explicitly specifies all profiles to ensure
all containers are stopped regardless of how they were started.
* Fix README to document correct make commands
- Changed 'make lint' to 'make lint-frontend' and 'make lint-backend'
- Removed non-existent 'make logs-server' command
- Added 'make watch-mcp' and 'make watch-agents' commands
- All documented make commands now match what's available in Makefile
* fix: Address critical issues from code review #435
- Create robust environment validation script (check-env.js) that properly parses .env files
- Fix Docker healthcheck port mismatch (5173 -> 3737)
- Remove hard-coded port flags from package.json to allow environment configuration
- Fix Docker detection logic using /.dockerenv instead of HOSTNAME
- Normalize container names to lowercase (archon-server, archon-mcp, etc.)
- Improve stop-local command with port-based fallback for process killing
- Fix API configuration fallback chain to include VITE_PORT
- Fix Makefile shell variable expansion using runtime evaluation
- Update .PHONY targets with comprehensive list
- Add --profile flags to Docker Compose commands in README
- Add VITE_ARCHON_SERVER_PORT to docker-compose.yml
- Add Node.js 18+ to prerequisites
- Use dynamic ports in Makefile help messages
- Add lint alias combining frontend and backend linting
- Update .env.example documentation
- Scope .gitignore logs entry to /logs/
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix container name resolution for MCP server
- Add dynamic container name resolution with three-tier strategy
- Support environment variables for custom container names
- Add service discovery labels to docker-compose services
- Update BackendStartupError with correct container name references
* Fix frontend test failures in API configuration tests
- Update environment variable names to use VITE_ prefix that matches production code
- Fix MCP client service tests to use singleton instance export
- Update default behavior tests to expect fallback to port 8181
- All 77 frontend tests now pass
* Fix make stop-local to avoid Docker daemon interference
Replace aggressive kill -9 with targeted process termination:
- Filter processes by command name (node/vite/python/uvicorn) before killing
- Use graceful SIGTERM instead of SIGKILL
- Add process verification to avoid killing Docker-related processes
- Improve logging with descriptive step messages
* refactor: Simplify development workflow based on comprehensive review
- Reduced Makefile from 344 lines (43 targets) to 83 lines (8 essential targets)
- Removed unnecessary environment variables (*_CONTAINER_NAME variables)
- Fixed Windows compatibility by removing Unix-specific commands
- Added security fixes to check-env.js (path validation)
- Simplified MCP container discovery to use fixed container names
- Fixed 'make stop' to properly handle Docker Compose profiles
- Updated documentation to reflect simplified workflow
- Restored original .env.example with comprehensive Supabase key documentation
This addresses all critical issues from code review:
- Cross-platform compatibility ✅
- Security vulnerabilities fixed ✅
- 81% reduction in Makefile targets (43 -> 8) ✅
- Maintains all essential functionality ✅
All tests pass: Frontend (77/77), Backend (267/267)
* feat: Add granular test and lint commands to Makefile
- Split test command into test-fe and test-be for targeted testing
- Split lint command into lint-fe and lint-be for targeted linting
- Keep original test and lint commands that run both
- Update help text with new commands for better developer experience
* feat: Improve Docker Compose detection and prefer modern syntax
- Prefer 'docker compose' (plugin) over 'docker-compose' (standalone)
- Add better error handling in Makefile with proper exit on failures
- Add Node.js check before running environment scripts
- Pass environment variables correctly to frontend in hybrid mode
- Update all documentation to use modern 'docker compose' syntax
- Auto-detect which Docker Compose version is available
* docs: Update CONTRIBUTING.md to reflect simplified development workflow
- Add Node.js 18+ as prerequisite for hybrid development
- Mark Make as optional throughout the documentation
- Update all docker-compose commands to modern 'docker compose' syntax
- Add Make command alternatives for testing (make test, test-fe, test-be)
- Document make dev for hybrid development mode
- Remove linting requirements until codebase errors are resolved
* fix: Rename frontend service to archon-frontend for consistency
Aligns frontend service naming with other services (archon-server, archon-mcp, archon-agents) for better consistency in Docker image naming patterns.
---------
Co-authored-by: Zak Stam <zakscomputers@hotmail.com>
Co-authored-by: Zak Stam <zaksnet@users.noreply.github.com>
- Transform URLs to raw content (e.g., GitHub blob -> raw) before sending to crawler
- Maintain mapping dictionary to preserve original URLs in results
- Align progress callback signatures between batch and recursive strategies
- Add safety guards for missing links attribute
- Remove unused loop counter in batch strategy
- Optimize binary file checks to avoid duplicate calls
This ensures GitHub files are crawled as raw content instead of HTML pages,
fixing the issue where content extraction was degraded due to HTML wrapping.
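A sketch of the blob-to-raw transformation with the mapping back to original URLs. `to_raw_url` and `transform_urls` are hypothetical names, and only the common `github.com/<owner>/<repo>/blob/<ref>/<path>` shape is handled.

```python
from urllib.parse import urlsplit


def to_raw_url(url: str) -> str:
    """Rewrite a GitHub blob URL to its raw.githubusercontent.com equivalent."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if parts.hostname == "github.com" and len(segments) >= 5 and segments[2] == "blob":
        owner, repo, _blob, *rest = segments
        return f"https://raw.githubusercontent.com/{owner}/{repo}/{'/'.join(rest)}"
    return url  # anything else is passed through unchanged


def transform_urls(urls: list[str]) -> tuple[list[str], dict[str, str]]:
    """Return the URLs to crawl plus a mapping back to the originals."""
    mapping: dict[str, str] = {}
    transformed: list[str] = []
    for original in urls:
        raw = to_raw_url(original)
        mapping[raw] = original
        transformed.append(raw)
    return transformed, mapping
```

After crawling, `mapping[result_url]` recovers the URL the user originally supplied, so stored results still point at the GitHub page rather than the raw file.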
Remove wait_for='body' selector from documentation site crawling config.
The body element exists immediately in HTML, causing unnecessary timeouts
for JavaScript-rendered content. Now relies on domcontentloaded event
and delay_before_return_html for proper JavaScript execution.
- Fixed test_update_task_status to use individual parameters
- Added test_update_task_no_fields for validation testing
- All MCP tests passing (44 tests)
Resolves #420 - Tasks being duplicated instead of updated
Changes:
1. Fixed update_task function signature to use individual optional parameters
- Changed from TypedDict to explicit parameters (title, status, etc.)
- Consistent with update_project and update_document patterns
- Builds update_fields dict internally from provided parameters
2. Updated MCP instructions with correct function names
- Replaced non-existent manage_task with actual functions
- Added complete function signatures for all tools
- Improved workflow documentation with concrete examples
This fixes the issue where AI agents were confused by:
- Wrong function names in instructions (manage_task vs update_task)
- Inconsistent parameter patterns across update functions
- TypedDict magic that wasn't clearly documented
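The explicit-parameter pattern could be sketched like this. `build_update_fields` is a hypothetical helper and the parameter set is an assumption based on the fields named above; the real update_task signature may differ.

```python
from typing import Any, Optional


def build_update_fields(
    title: Optional[str] = None,
    description: Optional[str] = None,
    status: Optional[str] = None,
    assignee: Optional[str] = None,
) -> dict[str, Any]:
    """Collect only the parameters the caller actually provided."""
    provided = {
        "title": title,
        "description": description,
        "status": status,
        "assignee": assignee,
    }
    update_fields = {key: value for key, value in provided.items() if value is not None}
    if not update_fields:
        # Mirrors the "no fields" validation case mentioned in the tests
        raise ValueError("No fields provided to update")
    return update_fields
```

With named optional parameters, the tool schema shows every updatable field directly, so a caller has no reason to guess at a nested dict shape.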
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed missing knowledge_type and tags parameters in DocumentStorageService.upload_document()
- Added source_type='file' to document chunk metadata for proper categorization
- Enhanced source metadata creation to include source_type based on source_id pattern
- Fixed metadata spread order in knowledge_item_service to prevent source_type override
- Business documents now correctly show pink color theme and appear in Business Documents section
Fixes issue where business documents were incorrectly stored as technical knowledge
and appeared with blue color theme instead of pink.
* fix: Allow HTTP for all private network ranges in Supabase URLs
- Extend HTTP support to all RFC 1918 private IP ranges
- Class A: 10.0.0.0 to 10.255.255.255 (10.0.0.0/8)
- Class B: 172.16.0.0 to 172.31.255.255 (172.16.0.0/12)
- Class C: 192.168.0.0 to 192.168.255.255 (192.168.0.0/16)
- Also includes link-local (169.254.0.0/16) addresses
- Uses Python's ipaddress module for robust IP validation
- Maintains HTTPS requirement for public/production URLs
- Backwards compatible with existing localhost exceptions
* security: Fix URL validation vulnerabilities
- Replace substring matching with exact hostname matching to prevent bypass attacks
- Exclude unspecified address (0.0.0.0) from allowed HTTP hosts
- Add support for .localhost domains per RFC 6761
- Improve error messages with hostname context for better debugging
Addresses security concerns raised in PR review regarding:
- Malicious domains like 'localhost.attacker.com' bypassing HTTPS requirements
- Unspecified address being incorrectly allowed as valid connection target
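Taken together, the two commits above amount to a check along these lines. This is a sketch using `ipaddress` and exact hostname matching; `is_http_allowed` is a hypothetical name, and treating loopback addresses as allowed is an assumption about the existing localhost exceptions.

```python
import ipaddress
from urllib.parse import urlsplit

# RFC 1918 private ranges plus link-local, as listed above
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def is_http_allowed(url: str) -> bool:
    """Return True if plain HTTP is acceptable for this URL's host."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    host = host.lower()

    # Exact match or a .localhost subdomain (RFC 6761); never a substring test
    if host == "localhost" or host.endswith(".localhost"):
        return True

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a public hostname must use HTTPS

    if ip.is_unspecified:  # 0.0.0.0 is not a valid connection target
        return False
    if ip.is_loopback:  # assumption: loopback counts as a localhost exception
        return True
    return any(ip in network for network in ALLOWED_NETWORKS)
```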
---------
Co-authored-by: tazmon95 <tazmon95@users.noreply.github.com>
Co-authored-by: root <root@supatest2.jtpa.net>
Applied automated linting and formatting:
- Fixed missing newlines at end of files
- Adjusted line wrapping for better readability
- Fixed multi-line string formatting in tests
- No functional changes, only style improvements
All 43 tests still passing after formatting changes.
Based on latest PR #306 review feedback:
Fixed Issues:
- Replaced last remaining basic error handling with MCPErrorFormatter
in version_tools.py get_version function
- Added proper error handling for invalid env vars in get_max_polling_attempts
- Improved type hints with TaskUpdateFields TypedDict for better validation
- All tools now consistently use get_default_timeout() (verified with grep)
Test Improvements:
- Added comprehensive tests for MCPErrorFormatter utility (10 tests)
- Added tests for timeout_config utility (13 tests)
- All 43 MCP tests passing with new utilities
- Tests verify structured error format and timeout configuration
Type Safety:
- Created TaskUpdateFields TypedDict to specify exact allowed fields
- Documents valid statuses and assignees in type comments
- Improves IDE support and catches type errors at development time
This completes all priority actions from the review:
✅ Fixed inconsistent timeout usage (was already done)
✅ Fixed error handling inconsistency
✅ Improved type hints for update_fields
✅ Added tests for utility modules
Comprehensive update to MCP server error handling:
Error Handling Improvements:
- Applied MCPErrorFormatter to all remaining MCP tool files
- Replaced all hardcoded timeout values with configurable timeout system
- Converted all simple string errors to structured error format
- Added proper httpx exception handling with detailed context
Tools Updated:
- document_tools.py: All 5 document management tools
- version_tools.py: All 4 version management tools
- feature_tools.py: Project features tool
- project_tools.py: Remaining 3 project tools (get, list, delete)
- task_tools.py: Remaining 4 task tools (get, list, update, delete)
Test Improvements:
- Removed backward compatibility checks from all tests
- Tests now enforce structured error format (dict not string)
- Any string error response is now considered a bug
- All 20 tests passing with new strict validation
This completes the error handling refactor for all MCP tools,
ensuring consistent client experience and better debugging.
Critical improvements to MCP server reliability and client experience:
Error Handling:
- Created MCPErrorFormatter for consistent error responses across all tools
- Provides structured errors with type, message, details, and actionable suggestions
- Helps clients (like Claude Code) understand and handle failures gracefully
- Categorizes errors (connection_timeout, validation_error, etc.) for better debugging
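A structured error of the kind described could be shaped as below. This is only an illustration: the function `format_error`, the JSON envelope, and the key names are assumptions, not the actual MCPErrorFormatter API.

```python
import json
from typing import Any, Optional


def format_error(
    error_type: str,
    message: str,
    details: Optional[dict[str, Any]] = None,
    suggestion: Optional[str] = None,
) -> str:
    """Build a JSON error payload with a machine-readable type (hypothetical shape)."""
    error: dict[str, Any] = {"type": error_type, "message": message}
    if details:
        error["details"] = details
    if suggestion:
        error["suggestion"] = suggestion
    return json.dumps({"success": False, "error": error})


payload = format_error(
    "connection_timeout",
    "Request to the API server timed out",
    details={"timeout_seconds": 30},
    suggestion="Check that the server is running and retry",
)
```

A client can branch on `error["type"]` instead of parsing free-form strings, which is what the later tests enforce by rejecting string errors.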
Timeout Configuration:
- Centralized timeout config with environment variable support
- Different timeouts for regular operations vs polling operations
- Configurable via MCP_REQUEST_TIMEOUT, MCP_CONNECT_TIMEOUT, etc.
- Prevents indefinite hangs when services are unavailable
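Reading timeouts from the environment with safe fallbacks might look like this. The variable names MCP_REQUEST_TIMEOUT and MCP_CONNECT_TIMEOUT come from the list above; the helper names and the default values (30 and 5 seconds) are assumptions.

```python
import os


def get_timeout(name: str, default: float) -> float:
    """Read a positive float from the environment, falling back on bad input."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # an invalid value should not crash the server
    return value if value > 0 else default


def load_timeouts() -> dict[str, float]:
    # Default values here are illustrative, not the project's actual defaults
    return {
        "request": get_timeout("MCP_REQUEST_TIMEOUT", 30.0),
        "connect": get_timeout("MCP_CONNECT_TIMEOUT", 5.0),
    }
```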
Module Registration:
- Distinguishes between ImportError (acceptable) and code errors (must fix)
- SyntaxError/NameError/AttributeError now halt execution immediately
- Prevents broken code from silently failing in production
Polling Safety:
- Fixed project creation polling with exponential backoff
- Handles API unavailability with proper error messages
- Maximum attempts configurable via MCP_MAX_POLLING_ATTEMPTS
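Polling with exponential backoff and a bounded number of attempts can be sketched as follows. `poll_until_ready`, its parameters, and the delay values are hypothetical; the `sleep` argument is injectable so the schedule can be verified without waiting.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    check: Callable[[], Optional[str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `check` until it returns a result or attempts run out."""
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Delay doubles each round (1s, 2s, 4s, ...) up to max_delay
            sleep(min(base_delay * 2 ** attempt, max_delay))
    return None  # caller reports a timeout instead of hanging forever
```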
Response Normalization:
- Fixed inconsistent response handling in list_tasks
- Validates and normalizes different API response formats
- Clear error messages when response format is unexpected
These changes address critical issues from PR review while maintaining
backward compatibility. All 20 existing tests pass.