Fix sitemap URL detection to require .xml extension (#611)

Resolves issue where URLs containing 'sitemap' in path (like https://nx.dev/see-also/sitemap) were incorrectly treated as XML sitemaps, causing XML parsing errors.

- Changed detection to require both .xml extension AND 'sitemap' in path
- Fixes XML parsing error: "not well-formed (invalid token)"
- Maintains compatibility with existing test cases
- Now correctly identifies only actual XML sitemap files

Fixes #607

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
parent 3d5753f8a7
commit ce2f871ebb
@@ -29,7 +29,10 @@ class URLHandler:
             True if URL is a sitemap, False otherwise
         """
         try:
-            return url.endswith("sitemap.xml") or "sitemap" in urlparse(url).path
+            parsed = urlparse(url)
+            path = parsed.path.lower()
+            # Only match URLs that end with .xml and contain sitemap in the filename
+            return path.endswith(".xml") and "sitemap" in path
         except Exception as e:
             logger.warning(f"Error checking if URL is sitemap: {e}")
             return False
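
The behavior change can be sketched outside the class as two standalone functions. This is a minimal illustration, not the project's actual code: the names `is_sitemap_old` and `is_sitemap_new` are hypothetical, the surrounding `try`/`except` and logging are omitted, and every URL except the nx.dev one from the commit message is made up for the example.

```python
from urllib.parse import urlparse


def is_sitemap_old(url: str) -> bool:
    """Pre-change check: any path containing 'sitemap' counted as a sitemap."""
    return url.endswith("sitemap.xml") or "sitemap" in urlparse(url).path


def is_sitemap_new(url: str) -> bool:
    """Post-change check: the path must end in .xml AND contain 'sitemap'."""
    path = urlparse(url).path.lower()
    return path.endswith(".xml") and "sitemap" in path


# Hypothetical sample URLs (only the first appears in the commit message).
examples = [
    "https://nx.dev/see-also/sitemap",         # HTML page that triggered the bug
    "https://example.com/sitemap.xml",         # plain XML sitemap
    "https://example.com/Sitemap_Index.XML",   # mixed case, handled via .lower()
    "https://example.com/sitemap.xml?page=2",  # query string is not part of the path
    "https://example.com/sitemap.xml.gz",      # compressed sitemap
]

for url in examples:
    print(f"{url}: old={is_sitemap_old(url)} new={is_sitemap_new(url)}")
```

Two side effects follow directly from the new expression: matching becomes case-insensitive because the path is lowercased, and paths ending in `.xml.gz` no longer match, so gzip-compressed sitemaps would need separate handling if the project supports them.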