Commit Graph

64 Commits

Author SHA1 Message Date
Wirasm
41c58e53dc
Merge pull request #219 from coleam00/fix/respect-log-level-env-var
Fix LOG_LEVEL environment variable not being respected
2025-08-16 00:39:35 +03:00
Wirasm
8743c059bb
Merge pull request #218 from coleam00/fix/filter-binary-files-from-crawl
Fix crawler attempting to navigate to binary files
2025-08-16 00:39:17 +03:00
Wirasm
f96a9a4c4a
Merge pull request #213 from coleam00/fix/consolidate-concurrency-settings
Fix crawler concurrency configuration to prevent memory crashes
2025-08-16 00:38:45 +03:00
Rasmus Widing
4004090b45 Fix critical issues from code review
- Use python-jose (already in dependencies) instead of PyJWT for JWT decoding
- Make unknown Supabase key roles fail fast per alpha principles
- Skip all JWT validations (not just signature) when checking role
- Update tests to expect failure for unknown roles

Fixes:
- No need to add PyJWT dependency - python-jose provides JWT functionality
- Unknown key types now raise ConfigurationError instead of warning
- JWT decode properly skips all validations to only check role claim
2025-08-16 00:23:37 +03:00
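A minimal sketch of the role check this commit describes, using python-jose with all JWT validations skipped. The function name, the claim values checked, and the overall structure are assumptions; only the python-jose dependency, the role-only decode, and the fail-fast ConfigurationError come from the commit message.

```python
# Sketch only: read the `role` claim from a Supabase key without verifying the
# token. Names and structure are illustrative, not Archon's actual code.
from jose import jwt
from jose.exceptions import JWTError


class ConfigurationError(Exception):
    """Raised when the Supabase key configuration is invalid."""


def get_supabase_key_role(key: str) -> str:
    """Return the `role` claim of a Supabase key without verifying the token."""
    try:
        # get_unverified_claims() decodes the payload only -- no signature,
        # expiry, or audience checks -- which is all a role check needs.
        claims = jwt.get_unverified_claims(key)
    except JWTError as exc:
        raise ConfigurationError(f"Supabase key is not a valid JWT: {exc}") from exc

    role = claims.get("role")
    if role not in ("anon", "service_role"):
        # Unknown key types fail fast instead of being logged and ignored.
        raise ConfigurationError(f"Unrecognized Supabase key role: {role!r}")
    return role
```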
Rasmus Widing
3800280f2e Add Supabase key validation and simplify frontend state management
- Add backend validation to detect and warn about anon vs service keys
- Prevent startup with incorrect Supabase key configuration
- Consolidate frontend state management following KISS principles
- Remove duplicate state tracking and sessionStorage polling
- Add clear error display when backend fails to start
- Improve .env.example documentation with detailed key selection guide
- Add comprehensive test coverage for validation logic
- Remove unused test results checking to eliminate 404 errors

The implementation now warns users about key misconfiguration while
maintaining backward compatibility. Frontend state is simplified with
MainLayout as the single source of truth for backend status.
2025-08-16 00:10:23 +03:00
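A rough sketch of how the startup validation described here could use that role check. The environment variable name SUPABASE_SERVICE_KEY, the function name, and the warning text are assumptions based on the commit description, not Archon's actual code.

```python
# Hypothetical startup hook built on get_supabase_key_role() from the sketch above.
import logging
import os

logger = logging.getLogger(__name__)


def validate_supabase_key_on_startup() -> str:
    key = os.environ.get("SUPABASE_SERVICE_KEY", "")
    if not key:
        raise ConfigurationError("SUPABASE_SERVICE_KEY is not set")

    role = get_supabase_key_role(key)
    if role == "anon":
        # The commit describes warning (rather than failing) when an anon key
        # is supplied where a service key is expected.
        logger.warning(
            "SUPABASE_SERVICE_KEY appears to be an anon key; "
            "backend operations that require the service role may fail."
        )
    return role
```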
Cole Medin
4a4663bddb Disabling reranking by default so the server container isn't so big
2025-08-15 15:20:04 -05:00
Rasmus Widing
ade439791d Suppress noisy third-party library debug logs
- Set hpack and httpcore loggers to WARNING level
- These libraries produce excessive protocol-level debug output
- Improves signal-to-noise ratio in logs
2025-08-15 18:26:26 +03:00
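The change amounts to raising the log level of the two third-party loggers named in the commit; a minimal sketch:

```python
# Quiet protocol-level chatter from hpack and httpcore, as the commit describes.
import logging

for noisy_logger in ("hpack", "httpcore"):
    logging.getLogger(noisy_logger).setLevel(logging.WARNING)
```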
Rasmus Widing
caefaccbe4 Fix trailing whitespace (ruff formatting)
2025-08-15 17:56:51 +03:00
Rasmus Widing
e9a19ffb41 Fix LOG_LEVEL environment variable not being respected
- Read LOG_LEVEL from environment with INFO as default
- Use getattr to safely convert string to logging level constant
- Supports DEBUG, INFO, WARNING, ERROR, CRITICAL levels
- Falls back to INFO if invalid level specified

This minimal change allows debug logs to appear when LOG_LEVEL=DEBUG
is set in the .env file, fixing the issue where debug messages were
being filtered out.
2025-08-15 17:36:58 +03:00
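The described change maps roughly to the standard-library pattern below; how the resulting level is wired into Archon's logging setup is an assumption here (basicConfig is used for illustration).

```python
# Minimal sketch of the LOG_LEVEL handling the commit describes.
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)  # unknown names fall back to INFO
if not isinstance(level, int):
    # Guard against names that match non-level attributes of the logging module.
    level = logging.INFO

logging.basicConfig(level=level)
```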
Rasmus Widing
8157670936 Fix crawler attempting to navigate to binary files
- Add is_binary_file() method to URLHandler to detect 40+ binary extensions
- Update RecursiveCrawlStrategy to filter binary URLs before crawl queue
- Add comprehensive unit tests for binary file detection
- Prevents net::ERR_ABORTED errors when the crawler encounters ZIP, PDF, and other binary files

This fixes the issue where the crawler was treating binary file URLs
(like .zip downloads) as navigable web pages, causing errors in crawl4ai.
2025-08-15 17:24:46 +03:00
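A minimal sketch of the binary-URL filtering this commit describes. The exact extension list and the integration with URLHandler and RecursiveCrawlStrategy are assumptions; only a small subset of the "40+ binary extensions" is shown.

```python
# Sketch only: detect URLs that point at binary files so they can be skipped
# before reaching the browser-based crawler.
from urllib.parse import urlparse

# Illustrative subset of the binary extensions the commit mentions.
BINARY_EXTENSIONS = {
    ".zip", ".tar", ".gz", ".7z", ".rar",
    ".pdf", ".png", ".jpg", ".jpeg", ".gif", ".svg",
    ".mp3", ".mp4", ".exe", ".dmg", ".iso", ".whl",
}


def is_binary_file(url: str) -> bool:
    """Return True if the URL path ends in a known binary file extension."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in BINARY_EXTENSIONS)


# In the crawl loop, such URLs would be filtered out before being queued:
# urls_to_crawl = [u for u in discovered_urls if not is_binary_file(u)]
```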
Rasmus Widing
e98f52aa57 Address code review feedback: improve error handling and documentation
- Implement fail-fast error handling for configuration errors
- Distinguish between critical config errors (fail) and network issues (use defaults)
- Add detailed error logging with stack traces for debugging
- Document new crawler settings in .env.example
- Add inline comments explaining safe defaults

Critical configuration errors (ValueError, KeyError, TypeError) now fail fast
as per alpha principles, while transient errors still fall back to safe defaults
with prominent error logging.
2025-08-15 16:02:00 +03:00
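A minimal sketch of the fail-fast versus safe-default split this commit describes. The setting name comes from the related concurrency commit; the `fetch_setting` callable and the function name are illustrative assumptions.

```python
# Sketch only: critical configuration errors re-raise, transient errors fall
# back to a safe default with prominent logging.
import logging

logger = logging.getLogger(__name__)

DEFAULT_MAX_CONCURRENT = 10  # safe default used when settings cannot be loaded


def load_max_concurrent(fetch_setting) -> int:
    try:
        return int(fetch_setting("CRAWL_MAX_CONCURRENT"))
    except (ValueError, KeyError, TypeError):
        # Critical configuration errors fail fast rather than hiding misconfiguration.
        logger.exception("Invalid crawler concurrency configuration")
        raise
    except Exception:
        # Transient problems (e.g. the settings store being unreachable)
        # fall back to a safe default, but are logged with a stack trace.
        logger.exception(
            "Could not load CRAWL_MAX_CONCURRENT; using default %d",
            DEFAULT_MAX_CONCURRENT,
        )
        return DEFAULT_MAX_CONCURRENT
```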
Rasmus Widing
aab0721f0c Fix crawler concurrency configuration to prevent memory crashes
Consolidate concurrent crawling limits to use a single database setting
instead of a hardcoded special case for documentation sites.

Changes:
- Remove hardcoded 20 concurrent limit for documentation sites
- Let strategies use CRAWL_MAX_CONCURRENT from database (default: 10)
- Apply consistent concurrency across all site types
- Improve code formatting and consistency

This fixes Playwright browser crashes caused by excessive concurrent
pages on documentation sites and provides a single configuration point
for tuning crawler performance.
2025-08-15 15:45:04 +03:00
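A minimal sketch of applying one concurrency limit to every site type with an asyncio semaphore; the function names are illustrative, not Archon's actual crawl strategy API.

```python
# Sketch only: cap concurrent page crawls at a single configurable limit.
import asyncio


async def crawl_all(urls, crawl_page, max_concurrent: int = 10):
    """Crawl URLs with at most `max_concurrent` pages open at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def crawl_one(url):
        async with semaphore:
            return await crawl_page(url)

    return await asyncio.gather(*(crawl_one(u) for u in urls))
```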
Cole Medin
bb64af9e7a Archon onboarding, README updates, and MCP/global rule expansion for more coding assistants
2025-08-13 18:36:36 -05:00
Cole Medin
59084036f6 The New Archon (Beta) - The Operating System for AI Coding Assistants!
2025-08-13 07:58:24 -05:00