- Add backend validation to detect and warn about anon vs service keys
- Prevent startup with incorrect Supabase key configuration
- Consolidate frontend state management following KISS principles
- Remove duplicate state tracking and sessionStorage polling
- Add clear error display when backend fails to start
- Improve .env.example documentation with detailed key selection guide
- Add comprehensive test coverage for validation logic
- Remove the unused test-results check to eliminate 404 errors
The implementation now warns users about key misconfiguration while
maintaining backward compatibility. Frontend state is simplified with
MainLayout as the single source of truth for backend status.
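A minimal sketch of the key check, assuming the service key arrives via an environment variable such as SUPABASE_SERVICE_KEY (the variable and function names here are illustrative, not the project's actual API): legacy Supabase keys are JWTs whose payload carries a "role" claim of "anon" or "service_role", so the backend can decode that claim and warn when an anon key is supplied where a service key is expected.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)

def supabase_key_role(key: str) -> str | None:
    """Best-effort decode of the "role" claim in a legacy Supabase JWT key.

    Returns "anon", "service_role", or None when the key cannot be parsed
    (for example, newer publishable/secret key formats).
    """
    try:
        payload_b64 = key.split(".")[1]
        # JWT segments are base64url without padding; restore it before decoding.
        payload_b64 += "=" * (-len(payload_b64) % 4)
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
        return payload.get("role")
    except Exception:
        return None

def warn_if_anon_key(key: str) -> None:
    # Warn rather than crash so existing setups keep working.
    if supabase_key_role(key) == "anon":
        logger.warning(
            "The configured Supabase key looks like an anon key; backend "
            "operations that need the service role will fail. See .env.example."
        )
```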
- Set hpack and httpcore loggers to WARNING level
- These libraries produce excessive protocol-level debug output
- Improves signal-to-noise ratio in logs
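The change boils down to raising the level of the two named loggers, roughly:

```python
import logging

# hpack (HTTP/2 header compression) and httpcore emit very chatty
# protocol-level DEBUG records; cap them at WARNING.
for noisy_logger in ("hpack", "httpcore"):
    logging.getLogger(noisy_logger).setLevel(logging.WARNING)
```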
- Read LOG_LEVEL from environment with INFO as default
- Use getattr to safely convert string to logging level constant
- Supports DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Falls back to INFO if invalid level specified
This minimal change allows debug logs to appear when LOG_LEVEL=DEBUG
is set in the .env file, fixing the issue where debug messages were
being filtered out.
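A sketch of the resulting setup code (the exact placement within the project's logging configuration may differ):

```python
import logging
import os

# LOG_LEVEL comes from the environment (loaded from .env); default to INFO.
level_name = os.getenv("LOG_LEVEL", "INFO").upper()

# getattr maps the string to a logging constant and falls back to INFO
# when an invalid level name is supplied.
log_level = getattr(logging, level_name, logging.INFO)
logging.basicConfig(level=log_level)
```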
- Add is_binary_file() method to URLHandler to detect 40+ binary extensions
- Update RecursiveCrawlStrategy to filter binary URLs before crawl queue
- Add comprehensive unit tests for binary file detection
- Prevent net::ERR_ABORTED errors when the crawler encounters ZIP, PDF, and similar files
This fixes the issue where the crawler was treating binary file URLs
(like .zip downloads) as navigable web pages, causing errors in crawl4ai.
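A sketch of the detection helper, showing only an illustrative subset of the 40+ extensions the real URLHandler method covers:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative subset of the binary extensions that get filtered.
BINARY_EXTENSIONS = {
    ".zip", ".tar", ".gz", ".7z", ".rar",
    ".pdf", ".doc", ".docx", ".xls", ".xlsx",
    ".png", ".jpg", ".gif", ".mp4", ".exe", ".dmg",
}

def is_binary_file(url: str) -> bool:
    """Return True when the URL path ends in a known binary file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in BINARY_EXTENSIONS
```

RecursiveCrawlStrategy can then run this check on each discovered link and skip binary URLs before they reach the crawl queue.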
- Implement fail-fast error handling for configuration errors
- Distinguish between critical config errors (fail) and network issues (use defaults)
- Add detailed error logging with stack traces for debugging
- Document new crawler settings in .env.example
- Add inline comments explaining safe defaults
Critical configuration errors (ValueError, KeyError, TypeError) now fail fast
as per alpha principles, while transient errors still fall back to safe defaults
with prominent error logging.
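A sketch of the error-handling split, using hypothetical names for the settings loader and its defaults:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical safe defaults; the real values live in the crawler config.
DEFAULT_CRAWLER_SETTINGS = {"max_concurrent": 10, "batch_size": 50}

def load_crawler_settings(fetch_settings):
    try:
        return fetch_settings()
    except (ValueError, KeyError, TypeError):
        # Critical configuration errors fail fast: surface the problem
        # immediately with a full stack trace instead of masking it.
        logger.exception("Invalid crawler configuration; failing fast")
        raise
    except Exception:
        # Transient problems (e.g. the settings store being unreachable)
        # fall back to safe defaults, logged prominently with a stack trace.
        logger.exception("Could not load crawler settings; using safe defaults")
        return dict(DEFAULT_CRAWLER_SETTINGS)
```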
Consolidate concurrent crawling limits to use a single database setting
instead of a hardcoded special case for documentation sites.
Changes:
- Remove hardcoded 20 concurrent limit for documentation sites
- Let strategies use CRAWL_MAX_CONCURRENT from database (default: 10)
- Apply consistent concurrency across all site types
- Improve code formatting and consistency
This fixes Playwright browser crashes caused by too many concurrent
pages on documentation sites and provides a single configuration point
for tuning crawler performance.
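A sketch of how the single setting can bound concurrency uniformly; get_setting and fetch_page stand in for the project's settings accessor and page crawler.

```python
import asyncio

async def crawl_all(urls, fetch_page, get_setting):
    # One configuration point: CRAWL_MAX_CONCURRENT from database settings,
    # defaulting to 10, applied to every site type including documentation.
    max_concurrent = int(get_setting("CRAWL_MAX_CONCURRENT", 10))
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(url):
        # The semaphore caps how many pages are open at once, which is what
        # keeps Playwright from being overwhelmed.
        async with semaphore:
            return await fetch_page(url)

    return await asyncio.gather(*(bounded_fetch(u) for u in urls))
```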