Archon

luis.erlacher/Archon


Commit Graph

Author	SHA1	Message	Date
Rasmus Widing	8157670936	Fix crawler attempting to navigate to binary files - Add is_binary_file() method to URLHandler to detect 40+ binary extensions - Update RecursiveCrawlStrategy to filter binary URLs before crawl queue - Add comprehensive unit tests for binary file detection - Prevents net::ERR_ABORTED errors when crawler encounters ZIP, PDF, etc. This fixes the issue where the crawler was treating binary file URLs (like .zip downloads) as navigable web pages, causing errors in crawl4ai.	2025-08-15 17:24:46 +03:00
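
The check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the commit names `is_binary_file()` on `URLHandler` and mentions 40+ extensions, but the extension list below (apart from ZIP and PDF), the standalone function form, and the `filter_crawlable()` helper are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical subset of extensions. The commit says 40+ are covered but only
# names ZIP and PDF, so every other entry here is an assumption.
BINARY_EXTENSIONS = frozenset({
    ".zip", ".pdf", ".tar", ".gz", ".exe", ".png", ".jpg", ".mp4",
})


def is_binary_file(url: str) -> bool:
    """Return True when the URL path ends in a known binary extension."""
    # Look only at the path so query strings and fragments are ignored.
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in BINARY_EXTENSIONS)


def filter_crawlable(urls):
    """Hypothetical helper: drop binary URLs before they reach the crawl queue."""
    return [url for url in urls if not is_binary_file(url)]
```

Filtering before URLs are queued means the browser never tries to navigate to a download link, which is the situation the commit says produced net::ERR_ABORTED.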


1 Commit