- Read LOG_LEVEL from environment with INFO as default
- Use getattr to safely convert string to logging level constant
- Supports DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Falls back to INFO if an invalid level is specified
This minimal change allows debug logs to appear when LOG_LEVEL=DEBUG
is set in the .env file, fixing the issue where debug messages were
being filtered out.
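
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `resolve_log_level` is hypothetical, and the extra `isinstance` check is an assumption added so that non-level attributes of the `logging` module are rejected.

```python
import logging
import os


def resolve_log_level(name=None):
    """Map a level name such as "debug" to a logging constant (hypothetical helper)."""
    raw = name if name is not None else os.getenv("LOG_LEVEL", "INFO")
    # getattr returns None for unknown names instead of raising AttributeError.
    level = getattr(logging, raw.strip().upper(), None)
    # Guard against attributes that exist on the module but are not levels.
    return level if isinstance(level, int) else logging.INFO


# With LOG_LEVEL=DEBUG in the environment, debug messages are no longer filtered.
logging.basicConfig(level=resolve_log_level())
```

Passing an unknown name such as `"verbose"` yields `logging.INFO` rather than an exception.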
- Add is_binary_file() method to URLHandler to detect 40+ binary extensions
- Update RecursiveCrawlStrategy to filter binary URLs before they enter the crawl queue
- Add comprehensive unit tests for binary file detection
- Prevents net::ERR_ABORTED errors when the crawler encounters ZIP, PDF, and other binary files
This fixes the issue where the crawler was treating binary file URLs
(like .zip downloads) as navigable web pages, causing errors in crawl4ai.
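
A rough sketch of extension-based filtering is shown below. It is an assumption-laden illustration: the real `is_binary_file()` is a method on `URLHandler` and covers 40+ extensions, whereas this standalone function, the short `BINARY_EXTENSIONS` set, and the `filter_crawlable` helper are hypothetical stand-ins.

```python
from urllib.parse import urlparse

# Illustrative subset only; the actual list in the commit is longer.
BINARY_EXTENSIONS = (".zip", ".tar", ".gz", ".pdf", ".exe", ".png", ".jpg", ".mp4")


def is_binary_file(url: str) -> bool:
    """Return True when the URL path ends in a known binary extension."""
    # Inspect only the path so query strings and fragments do not interfere.
    path = urlparse(url).path.lower()
    return path.endswith(BINARY_EXTENSIONS)


def filter_crawlable(urls):
    """Drop binary URLs before they are queued for crawling (hypothetical helper)."""
    return [url for url in urls if not is_binary_file(url)]
```

Checking the parsed path rather than the raw string means a link like `archive.zip?download=1` is still recognised as binary.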
- Implement fail-fast error handling for configuration errors
- Distinguish between critical config errors (fail) and network issues (use defaults)
- Add detailed error logging with stack traces for debugging
- Document new crawler settings in .env.example
- Add inline comments explaining safe defaults
Critical configuration errors (ValueError, KeyError, TypeError) now fail fast
as per alpha principles, while transient errors still fall back to safe defaults
with prominent error logging.
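
The split between fail-fast and fallback behaviour might look something like this. It is a sketch under stated assumptions: `load_crawler_settings`, the injected `fetch_settings` callable, the `SAFE_DEFAULTS` values, and the setting key used here are all illustrative, not the project's actual names.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical safe defaults used only when settings cannot be fetched.
SAFE_DEFAULTS = {"max_concurrent": 10}


def load_crawler_settings(fetch_settings):
    """Load settings via `fetch_settings`, a caller-supplied callable (assumed)."""
    try:
        raw = fetch_settings()
        return {"max_concurrent": int(raw["CRAWL_MAX_CONCURRENT"])}
    except (ValueError, KeyError, TypeError):
        # Configuration is wrong: log with a stack trace and fail fast.
        logger.error("Invalid crawler configuration", exc_info=True)
        raise
    except Exception:
        # Transient problem (e.g. network): log prominently, use safe defaults.
        logger.error("Could not load crawler settings; using defaults", exc_info=True)
        return dict(SAFE_DEFAULTS)
```

Listing the critical exception types first matters here, because the broader `except Exception` clause would otherwise swallow them.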
Consolidate concurrent crawling limits into a single database setting
instead of a hardcoded special case for documentation sites.
Changes:
- Remove hardcoded 20 concurrent limit for documentation sites
- Let strategies use CRAWL_MAX_CONCURRENT from database (default: 10)
- Apply consistent concurrency across all site types
- Improve code formatting and consistency
This fixes Playwright browser crashes caused by excessive concurrent
pages on documentation sites and provides a single configuration point
for tuning crawler performance.
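
One common way to apply a single concurrency limit is an `asyncio.Semaphore`. The sketch below is an assumption about how such a limit could be enforced, not the strategy code from this commit: `crawl_all` and the `fetch` coroutine are hypothetical, and `max_concurrent` stands in for the value read from CRAWL_MAX_CONCURRENT.

```python
import asyncio


async def crawl_all(urls, fetch, max_concurrent=10):
    """Run `fetch(url)` for every URL with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        # Each task waits here until a slot is free, capping open pages.
        async with semaphore:
            return await fetch(url)

    # gather preserves the input order of the results.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Because the limit is a plain parameter, the same code path serves every site type without special cases.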
* Create dependabot.yml
Currently watches for updates to GitHub Actions and to the current iteration in the root folder. Added a commented-out section showing how to extend it to previous iterations.
* CI for local development
* CI for docker build
* Use matrix strategy on docker build
The Docker version uses Python 3.12, so it is worth ensuring the build works properly with that version
* Enable python 3.10 backporting
* In the MCP Streamlit page, added instructions for use in Claude Code.
* Fixing paths for Claude Code and adding one for Python
---------
Co-authored-by: Cole Medin <cole@dynamous.ai>
* docs: Update README with instructions for updating Archon via Docker and local Python installation
* fix: Improve container management in run_docker.py - Check for existing containers, stop if running, and force remove if necessary
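
The check, stop, and remove sequence could be sketched as below. This is not the code in run_docker.py: the function names and the injectable `run` parameter are assumptions made so the logic can be exercised without Docker, and only standard `docker ps`, `docker stop`, and `docker rm` commands are used.

```python
import subprocess


def container_names(include_stopped, run=subprocess.run):
    """Return the names reported by `docker ps` (with -a when include_stopped)."""
    cmd = ["docker", "ps"]
    if include_stopped:
        cmd.append("-a")
    cmd += ["--format", "{{.Names}}"]
    result = run(cmd, capture_output=True, text=True, check=False)
    return set(result.stdout.split())


def remove_existing_container(name, run=subprocess.run):
    """Stop and remove `name` if present; return True when it existed."""
    if name not in container_names(True, run):
        return False
    if name in container_names(False, run):
        run(["docker", "stop", name], capture_output=True, text=True, check=False)
    removed = run(["docker", "rm", name], capture_output=True, text=True, check=False)
    if removed.returncode != 0:
        # Plain removal failed, so force it.
        run(["docker", "rm", "-f", name], capture_output=True, text=True, check=False)
    return True
```

Trying a plain `docker rm` before `docker rm -f` keeps the forced path as a last resort, matching the "force remove if necessary" wording.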