Commit Graph

56 Commits

Author SHA1 Message Date
Rasmus Widing
e9a19ffb41 Fix LOG_LEVEL environment variable not being respected
- Read LOG_LEVEL from environment with INFO as default
- Use getattr to safely convert string to logging level constant
- Supports DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Falls back to INFO if invalid level specified

This minimal change allows debug logs to appear when LOG_LEVEL=DEBUG
is set in the .env file, fixing the issue where debug messages were
being filtered out.
2025-08-15 17:36:58 +03:00
Rasmus Widing
8157670936 Fix crawler attempting to navigate to binary files
- Add is_binary_file() method to URLHandler to detect 40+ binary extensions
- Update RecursiveCrawlStrategy to filter binary URLs before crawl queue
- Add comprehensive unit tests for binary file detection
- Prevents net::ERR_ABORTED errors when crawler encounters ZIP, PDF, etc.

This fixes the issue where the crawler was treating binary file URLs
(like .zip downloads) as navigable web pages, causing errors in crawl4ai.
2025-08-15 17:24:46 +03:00
Rasmus Widing
e98f52aa57 Address code review feedback: improve error handling and documentation
- Implement fail-fast error handling for configuration errors
- Distinguish between critical config errors (fail) and network issues (use defaults)
- Add detailed error logging with stack traces for debugging
- Document new crawler settings in .env.example
- Add inline comments explaining safe defaults

Critical configuration errors (ValueError, KeyError, TypeError) now fail fast
as per alpha principles, while transient errors still fall back to safe defaults
with prominent error logging.
2025-08-15 16:02:00 +03:00
Rasmus Widing
aab0721f0c Fix crawler concurrency configuration to prevent memory crashes
Consolidate concurrent crawling limits to use single database setting
instead of hardcoded special case for documentation sites.

Changes:
- Remove hardcoded 20 concurrent limit for documentation sites
- Let strategies use CRAWL_MAX_CONCURRENT from database (default: 10)
- Apply consistent concurrency across all site types
- Improve code formatting and consistency

This fixes Playwright browser crashes caused by excessive concurrent
pages on documentation sites and provides single configuration point
for tuning crawler performance.
2025-08-15 15:45:04 +03:00
Cole Medin
bb64af9e7a Archon onboarding, README updates, and MCP/global rule expansion for more coding assistants 2025-08-13 18:36:36 -05:00
Cole Medin
59084036f6 The New Archon (Beta) - The Operating System for AI Coding Assistants! 2025-08-13 07:58:24 -05:00