Commit Graph

13 Commits

Author SHA1 Message Date
Rasmus Widing
3359085150 MCP server consolidation and simplification
- Consolidated multiple MCP modules into unified project_module
- Removed redundant project, task, document, and version modules
- Identified critical issue with async project creation losing context
- Updated CLAUDE.md with project instructions

This commit captures the current state before refactoring to split
consolidated tools into separate operations for better clarity and
to solve the async project creation context issue.
2025-08-18 14:48:52 +03:00
Wirasm
41c58e53dc Merge pull request #219 from coleam00/fix/respect-log-level-env-var
Fix LOG_LEVEL environment variable not being respected
2025-08-16 00:39:35 +03:00
Wirasm
8743c059bb Merge pull request #218 from coleam00/fix/filter-binary-files-from-crawl
Fix crawler attempting to navigate to binary files
2025-08-16 00:39:17 +03:00
Wirasm
f96a9a4c4a Merge pull request #213 from coleam00/fix/consolidate-concurrency-settings
Fix crawler concurrency configuration to prevent memory crashes
2025-08-16 00:38:45 +03:00
Cole Medin
4a4663bddb Disabling reranking by default so the server container isn't so big
2025-08-15 15:20:04 -05:00
Rasmus Widing
ade439791d Suppress noisy third-party library debug logs
- Set hpack and httpcore loggers to WARNING level
- These libraries produce excessive protocol-level debug output
- Improves signal-to-noise ratio in logs
2025-08-15 18:26:26 +03:00
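
The log suppression described in ade439791d can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `quiet_noisy_loggers` and the `NOISY_LOGGERS` tuple are assumptions, and only the `hpack` and `httpcore` logger names and the WARNING level come from the commit message.

```python
import logging

# Logger names taken from the commit message; the tuple itself is hypothetical.
NOISY_LOGGERS = ("hpack", "httpcore")


def quiet_noisy_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the threshold of chatty third-party loggers (hypothetical helper)."""
    for name in names:
        # Child loggers such as "httpcore.http11" inherit this level
        # unless they set one of their own.
        logging.getLogger(name).setLevel(level)


quiet_noisy_loggers()
```

Setting the level on the parent logger is usually enough, because the standard library resolves a child's effective level by walking up the dotted name hierarchy.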
Rasmus Widing
caefaccbe4 Fix trailing whitespace (ruff formatting)
2025-08-15 17:56:51 +03:00
Rasmus Widing
e9a19ffb41 Fix LOG_LEVEL environment variable not being respected
- Read LOG_LEVEL from environment with INFO as default
- Use getattr to safely convert string to logging level constant
- Supports DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Falls back to INFO if invalid level specified

This minimal change allows debug logs to appear when LOG_LEVEL=DEBUG
is set in the .env file, fixing the issue where debug messages were
being filtered out.
2025-08-15 17:36:58 +03:00
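
A rough sketch of the approach described in e9a19ffb41: read `LOG_LEVEL` from the environment, convert it with `getattr`, and fall back to INFO. The function name `resolve_log_level` and the extra `isinstance` guard are assumptions added for illustration; the actual change may look different.

```python
import logging
import os


def resolve_log_level(default: str = "INFO") -> int:
    """Map the LOG_LEVEL environment variable to a logging constant.

    Hypothetical helper; only the getattr-with-fallback idea is from the commit.
    """
    name = os.getenv("LOG_LEVEL", default).upper()
    # getattr returns the fallback when the name is not an attribute of logging.
    level = getattr(logging, name, logging.INFO)
    # Guard against uppercase attributes that are not numeric levels
    # (for example logging.BASIC_FORMAT, which is a string).
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger(__name__).debug("visible only when LOG_LEVEL=DEBUG")
```

With this shape, an unset or misspelled `LOG_LEVEL` quietly resolves to INFO instead of raising at startup.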
Rasmus Widing
8157670936 Fix crawler attempting to navigate to binary files
- Add is_binary_file() method to URLHandler to detect 40+ binary extensions
- Update RecursiveCrawlStrategy to filter binary URLs before crawl queue
- Add comprehensive unit tests for binary file detection
- Prevents net::ERR_ABORTED errors when crawler encounters ZIP, PDF, etc.

This fixes the issue where the crawler was treating binary file URLs
(like .zip downloads) as navigable web pages, causing errors in crawl4ai.
2025-08-15 17:24:46 +03:00
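
The binary-file filter from 8157670936 might look roughly like the sketch below. The commit adds `is_binary_file()` as a method on `URLHandler` covering 40+ extensions; here it is written as a standalone function with a short, illustrative extension set (only ZIP and PDF are named in the commit message), and `filter_crawlable` is a hypothetical stand-in for the filtering step in `RecursiveCrawlStrategy`.

```python
from urllib.parse import urlparse

# Illustrative subset only; the real list reportedly has 40+ entries.
BINARY_EXTENSIONS = frozenset(
    {".zip", ".pdf", ".tar", ".gz", ".exe", ".dmg", ".png", ".jpg", ".mp4"}
)


def is_binary_file(url: str) -> bool:
    """Return True when the URL path ends in a known binary extension."""
    # Inspect only the path so query strings and fragments are ignored.
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in BINARY_EXTENSIONS)


def filter_crawlable(urls):
    """Drop binary URLs before they reach the crawl queue (hypothetical helper)."""
    return [url for url in urls if not is_binary_file(url)]
```

Checking the parsed path rather than the raw string keeps a link such as `https://example.com/release.zip?token=abc` from slipping through because of its query string.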
Rasmus Widing
e98f52aa57 Address code review feedback: improve error handling and documentation
- Implement fail-fast error handling for configuration errors
- Distinguish between critical config errors (fail) and network issues (use defaults)
- Add detailed error logging with stack traces for debugging
- Document new crawler settings in .env.example
- Add inline comments explaining safe defaults

Critical configuration errors (ValueError, KeyError, TypeError) now fail fast
as per alpha principles, while transient errors still fall back to safe defaults
with prominent error logging.
2025-08-15 16:02:00 +03:00
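
The split between fail-fast and fall-back handling described in e98f52aa57 can be illustrated as below. This is a sketch under assumptions: the function `load_max_concurrent`, the `fetch_setting` callable, the choice of `OSError` as the stand-in for transient network failures, and the default value are all hypothetical; only the exception classes ValueError, KeyError, and TypeError and the overall policy come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed safe default, used only when the settings store is unreachable.
DEFAULT_MAX_CONCURRENT = 10


def load_max_concurrent(fetch_setting):
    """Load a crawler setting via fetch_setting (a hypothetical callable)."""
    try:
        return int(fetch_setting("CRAWL_MAX_CONCURRENT"))
    except (ValueError, KeyError, TypeError):
        # Configuration errors: log with a stack trace, then fail fast.
        logger.error("Invalid crawler configuration", exc_info=True)
        raise
    except OSError:
        # Transient I/O or network errors (ConnectionError and TimeoutError
        # are OSError subclasses): log prominently and use the default.
        logger.error("Settings unavailable, using safe default", exc_info=True)
        return DEFAULT_MAX_CONCURRENT
```

A malformed value therefore stops startup immediately, while a temporarily unreachable settings backend degrades to the default with an error in the log.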
Rasmus Widing
aab0721f0c Fix crawler concurrency configuration to prevent memory crashes
Consolidate concurrent crawling limits to use a single database setting
instead of a hardcoded special case for documentation sites.

Changes:
- Remove hardcoded 20 concurrent limit for documentation sites
- Let strategies use CRAWL_MAX_CONCURRENT from database (default: 10)
- Apply consistent concurrency across all site types
- Improve code formatting and consistency

This fixes Playwright browser crashes caused by excessive concurrent
pages on documentation sites and provides a single configuration point
for tuning crawler performance.
2025-08-15 15:45:04 +03:00
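
One common way to apply a single concurrency limit across all site types, as aab0721f0c describes, is a shared semaphore. The sketch below is an assumption about the general technique, not the project's crawl strategy code: `crawl_all` and the `fetch` coroutine are hypothetical names, and the default of 10 mirrors the `CRAWL_MAX_CONCURRENT` default mentioned in the commit.

```python
import asyncio


async def crawl_all(urls, fetch, max_concurrent=10):
    """Run fetch(url) for every URL with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        # Each task waits here until a slot is free, which caps the number
        # of simultaneously open pages regardless of site type.
        async with semaphore:
            return await fetch(url)

    # gather preserves the input order of the results.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Because every strategy would read the same limit, tuning memory use becomes a matter of changing one value rather than hunting for per-site special cases.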
Cole Medin
bb64af9e7a Archon onboarding, README updates, and MCP/global rule expansion for more coding assistants
2025-08-13 18:36:36 -05:00
Cole Medin
59084036f6 The New Archon (Beta) - The Operating System for AI Coding Assistants!
2025-08-13 07:58:24 -05:00