Fix Issue #362: Provider-agnostic error handling for all LLM providers (#650)

* feat: Provider-agnostic error handling for Issue #362

Implements generic error handling that works for OpenAI, Google AI,
Anthropic, and other LLM providers to prevent silent failures.

Essential files only:
1. Provider error adapters (new) - handles any LLM provider
2. Backend API key validation - detects invalid keys before operations
3. Frontend error handler - provider-aware error messages
4. Updated hooks - uses generic error handling

Core functionality:
 Validates API keys before expensive operations (crawl, upload, refresh)
 Shows clear provider-specific error messages
 Works with OpenAI: 'Please verify your OpenAI API key in Settings'
 Works with Google: 'Please verify your Google API key in Settings'
 Prevents 90-minute debugging sessions from Issue #362

No unnecessary changes - only essential error handling logic.

Fixes #362

* fix: Enhance API key validation with detailed logging and error handling

- Add comprehensive logging to trace validation flow
- Ensure validation actually blocks operations on authentication failures
- Improve error detection to catch wrapped OpenAI errors
- Fail fast on any validation errors to prevent wasted operations

This should ensure invalid API keys are caught before crawl starts,
not during embedding processing after documents are crawled.

* fix: Simplify API key validation to always fail on exceptions

- Remove complex provider adapter imports that cause module issues
- Simplified validation that fails fast on any embedding creation error
- Enhanced logging to trace exactly what's happening
- Always block operations when API key validation fails

This ensures invalid API keys are caught immediately before
crawl operations start, preventing silent failures.

* fix: Add API key validation to refresh and upload endpoints

The validation was only added to new crawl endpoint but missing from:
- Knowledge item refresh endpoint (/knowledge-items/{source_id}/refresh)
- Document upload endpoint (/documents/upload)

Now all three endpoints that create embeddings will validate API keys
before starting operations, preventing silent failures on refresh/upload.

* security: Implement core security fixes from CodeRabbit review

Enhanced sanitization and provider detection based on CodeRabbit feedback:

 Comprehensive regex patterns for all provider API keys
  - OpenAI: sk-[a-zA-Z0-9]{48} with case-insensitive matching
  - Google AI: AIza[a-zA-Z0-9_-]{35} with flexible matching
  - Anthropic: sk-ant-[a-zA-Z0-9_-]{10,} with variable length

 Enhanced provider detection with multiple patterns
  - Case-insensitive keyword matching (openai, google, anthropic)
  - Regex-based API key detection for reliable identification
  - Additional keywords (gpt, claude, vertex, googleapis)

 Improved sanitization patterns
  - Provider-specific URL sanitization (openai.com, googleapis.com, anthropic.com)
  - Organization and project ID redaction
  - OAuth token and bearer token sanitization
  - Sensitive keyword detection and generic fallback

 Sanitized error logging
  - All error messages sanitized before logging
  - Prevents sensitive data exposure in backend logs
  - Maintains debugging capability with redacted information

Core security improvements while maintaining simplicity for beta deployment.

* fix: Replace ad-hoc error sanitization with centralized ProviderErrorFactory

- Remove local _sanitize_provider_error implementation with inline regex patterns
- Add ProviderErrorFactory import from embeddings.provider_error_adapters
- Update _validate_provider_api_key calls to pass correct active embedding provider
- Replace sanitization call with ProviderErrorFactory.sanitize_provider_error()
- Eliminate duplicate logic and fixed-length key assumptions
- Ensure provider-specific, configurable sanitization patterns are used consistently

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: Remove accidentally committed PRP file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address code review feedback

- Add barrel export for providerErrorHandler in utils/index.ts
- Change TypeScript typing from 'any' to 'unknown' for strict type safety

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
This commit is contained in:
DIY Smart Code 2025-09-17 12:13:41 +02:00 committed by GitHub
parent b2ec7df666
commit 9f2d70ae0e
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
6 changed files with 330 additions and 1 deletions

View File

@ -11,6 +11,7 @@ import { useActiveOperations } from "../progress/hooks";
import { progressKeys } from "../progress/hooks/useProgressQueries";
import type { ActiveOperation, ActiveOperationsResponse } from "../progress/types";
import { knowledgeService } from "../services";
import { getProviderErrorMessage } from "../utils/providerErrorHandler";
import type {
CrawlRequest,
CrawlStartResponse,
@ -273,7 +274,7 @@ export function useCrawlUrl() {
queryClient.setQueryData(progressKeys.list(), context.previousOperations);
}
const errorMessage = error instanceof Error ? error.message : "Failed to start crawl";
const errorMessage = getProviderErrorMessage(error) || "Failed to start crawl";
showToast(errorMessage, "error");
},
});

View File

@ -1 +1,2 @@
export * from "./knowledge-utils";
export * from "./providerErrorHandler";

View File

@ -0,0 +1,71 @@
/**
* Provider-agnostic error handler for LLM operations
* Supports OpenAI, Google AI, Anthropic, and other providers
*/
export interface ProviderError extends Error {
statusCode?: number;
provider?: string;
errorType?: string;
isProviderError?: boolean;
}
/**
* Parse backend error responses into provider-aware error objects
*/
export function parseProviderError(error: unknown): ProviderError {
const providerError = error as ProviderError;
// Check if this is a structured provider error from backend
if (error && typeof error === 'object') {
if (error.statusCode || error.status) {
providerError.statusCode = error.statusCode || error.status;
}
// Parse backend error structure
if (error.message && error.message.includes('detail')) {
try {
const parsed = JSON.parse(error.message);
if (parsed.detail && parsed.detail.error_type) {
providerError.isProviderError = true;
providerError.provider = parsed.detail.provider || 'LLM';
providerError.errorType = parsed.detail.error_type;
providerError.message = parsed.detail.message || error.message;
}
} catch {
// If parsing fails, use message as-is
}
}
}
return providerError;
}
/**
* Get user-friendly error message for any LLM provider
*/
export function getProviderErrorMessage(error: unknown): string {
const parsed = parseProviderError(error);
if (parsed.isProviderError) {
const provider = parsed.provider || 'LLM';
switch (parsed.errorType) {
case 'authentication_failed':
return `Please verify your ${provider} API key in Settings.`;
case 'quota_exhausted':
return `${provider} quota exhausted. Please check your billing settings.`;
case 'rate_limit':
return `${provider} rate limit exceeded. Please wait and try again.`;
default:
return `${provider} API error. Please check your configuration.`;
}
}
// Handle status codes for non-structured errors
if (parsed.statusCode === 401) {
return "Please verify your API key in Settings.";
}
return parsed.message || "An error occurred.";
}

View File

@ -22,6 +22,8 @@ from pydantic import BaseModel
from ..config.logfire_config import get_logger, safe_logfire_error, safe_logfire_info
from ..services.crawler_manager import get_crawler
from ..services.crawling import CrawlingService
from ..services.credential_service import credential_service
from ..services.embeddings.provider_error_adapters import ProviderErrorFactory
from ..services.knowledge import DatabaseMetricsService, KnowledgeItemService, KnowledgeSummaryService
from ..services.search.rag_service import RAGService
from ..services.storage import DocumentStorageService
@ -53,6 +55,59 @@ crawl_semaphore = asyncio.Semaphore(CONCURRENT_CRAWL_LIMIT)
active_crawl_tasks: dict[str, asyncio.Task] = {}
async def _validate_provider_api_key(provider: str = None) -> None:
"""Validate LLM provider API key before starting operations."""
logger.info("🔑 Starting API key validation...")
try:
if not provider:
provider = "openai"
logger.info(f"🔑 Testing {provider.title()} API key with minimal embedding request...")
# Test API key with minimal embedding request - this will fail if key is invalid
from ..services.embeddings.embedding_service import create_embedding
test_result = await create_embedding(text="test")
if not test_result:
logger.error(f"{provider.title()} API key validation failed - no embedding returned")
raise HTTPException(
status_code=401,
detail={
"error": f"Invalid {provider.title()} API key",
"message": f"Please verify your {provider.title()} API key in Settings.",
"error_type": "authentication_failed",
"provider": provider
}
)
logger.info(f"{provider.title()} API key validation successful")
except HTTPException:
# Re-raise our intended HTTP exceptions
logger.error("🚨 Re-raising HTTPException from validation")
raise
except Exception as e:
# Sanitize error before logging to prevent sensitive data exposure
error_str = str(e)
sanitized_error = ProviderErrorFactory.sanitize_provider_error(error_str, provider or "openai")
logger.error(f"❌ Caught exception during API key validation: {sanitized_error}")
# Always fail for any exception during validation - better safe than sorry
logger.error("🚨 API key validation failed - blocking crawl operation")
raise HTTPException(
status_code=401,
detail={
"error": "Invalid API key",
"message": f"Please verify your {(provider or 'openai').title()} API key in Settings before starting a crawl.",
"error_type": "authentication_failed",
"provider": provider or "openai"
}
) from None
# Request Models
class KnowledgeItemRequest(BaseModel):
url: str
@ -479,6 +534,14 @@ async def get_knowledge_item_code_examples(
@router.post("/knowledge-items/{source_id}/refresh")
async def refresh_knowledge_item(source_id: str):
"""Refresh a knowledge item by re-crawling its URL with the same metadata."""
# Validate API key before starting expensive refresh operation
logger.info("🔍 About to validate API key for refresh...")
provider_config = await credential_service.get_active_provider("embedding")
provider = provider_config.get("provider", "openai")
await _validate_provider_api_key(provider)
logger.info("✅ API key validation completed successfully for refresh")
try:
safe_logfire_info(f"Starting knowledge item refresh | source_id={source_id}")
@ -597,6 +660,13 @@ async def crawl_knowledge_item(request: KnowledgeItemRequest):
if not request.url.startswith(("http://", "https://")):
raise HTTPException(status_code=422, detail="URL must start with http:// or https://")
# Validate API key before starting expensive operation
logger.info("🔍 About to validate API key...")
provider_config = await credential_service.get_active_provider("embedding")
provider = provider_config.get("provider", "openai")
await _validate_provider_api_key(provider)
logger.info("✅ API key validation completed successfully")
try:
safe_logfire_info(
f"Starting knowledge item crawl | url={str(request.url)} | knowledge_type={request.knowledge_type} | tags={request.tags}"
@ -750,6 +820,14 @@ async def upload_document(
knowledge_type: str = Form("technical"),
):
"""Upload and process a document with progress tracking."""
# Validate API key before starting expensive upload operation
logger.info("🔍 About to validate API key for upload...")
provider_config = await credential_service.get_active_provider("embedding")
provider = provider_config.get("provider", "openai")
await _validate_provider_api_key(provider)
logger.info("✅ API key validation completed successfully for upload")
try:
# DETAILED LOGGING: Track knowledge_type parameter flow
safe_logfire_info(

View File

@ -99,6 +99,22 @@ class EmbeddingAPIError(EmbeddingError):
self.metadata["original_error_message"] = str(original_error)
class EmbeddingAuthenticationError(EmbeddingError):
"""
Raised when API authentication fails (invalid or expired API key).
This is a CRITICAL error that should stop the entire process
as continuing would be pointless without valid API access.
"""
def __init__(self, message: str, api_key_prefix: str | None = None, **kwargs):
super().__init__(message, **kwargs)
# Store masked API key prefix for debugging
self.api_key_prefix = api_key_prefix[:3] + "" if api_key_prefix and len(api_key_prefix) >= 3 else None
if self.api_key_prefix:
self.metadata["api_key_prefix"] = self.api_key_prefix
class EmbeddingValidationError(EmbeddingError):
"""
Raised when embedding validation fails (e.g., zero vector detected).

View File

@ -0,0 +1,162 @@
"""
Provider-agnostic error handling for LLM embedding services.
Supports OpenAI, Google AI, Anthropic, Ollama, and future providers
with unified error handling and sanitization patterns.
"""
import re
from abc import ABC, abstractmethod
from .embedding_exceptions import (
EmbeddingAPIError,
EmbeddingAuthenticationError,
EmbeddingQuotaExhaustedError,
EmbeddingRateLimitError,
)
class ProviderErrorAdapter(ABC):
"""Abstract base class for provider-specific error handling."""
@abstractmethod
def get_provider_name(self) -> str:
pass
@abstractmethod
def sanitize_error_message(self, message: str) -> str:
pass
class OpenAIErrorAdapter(ProviderErrorAdapter):
def get_provider_name(self) -> str:
return "openai"
def sanitize_error_message(self, message: str) -> str:
if not isinstance(message, str) or not message.strip() or len(message) > 2000:
return "OpenAI API encountered an error. Please verify your API key and quota."
sanitized = message
# Comprehensive OpenAI patterns with case-insensitive matching
patterns = [
(r'sk-[a-zA-Z0-9]{48}', '[REDACTED_KEY]'), # OpenAI API keys
(r'https?://[^\s]*openai\.com[^\s]*', '[REDACTED_URL]'), # OpenAI URLs
(r'org-[a-zA-Z0-9]{20,}', '[REDACTED_ORG]'), # Organization IDs
(r'proj_[a-zA-Z0-9]{10,}', '[REDACTED_PROJECT]'), # Project IDs
(r'req_[a-zA-Z0-9]{10,}', '[REDACTED_REQUEST]'), # Request IDs
(r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'), # Bearer tokens
]
for pattern, replacement in patterns:
sanitized = re.sub(pattern, replacement, sanitized, flags=re.IGNORECASE)
# Check for sensitive words after sanitization
sensitive_words = ['internal', 'server', 'endpoint']
if any(word in sanitized.lower() for word in sensitive_words):
return "OpenAI API encountered an error. Please verify your API key and quota."
return sanitized
class GoogleAIErrorAdapter(ProviderErrorAdapter):
def get_provider_name(self) -> str:
return "google"
def sanitize_error_message(self, message: str) -> str:
if not isinstance(message, str) or not message.strip() or len(message) > 2000:
return "Google AI API encountered an error. Please verify your API key."
sanitized = message
# Comprehensive Google AI patterns
patterns = [
(r'AIza[a-zA-Z0-9_-]{35}', '[REDACTED_KEY]'), # Google AI API keys
(r'https?://[^\s]*googleapis\.com[^\s]*', '[REDACTED_URL]'), # Google API URLs
(r'https?://[^\s]*googleusercontent\.com[^\s]*', '[REDACTED_URL]'), # Google content URLs
(r'projects/[a-zA-Z0-9_-]+', 'projects/[REDACTED_PROJECT]'), # GCP project paths
(r'ya29\.[a-zA-Z0-9_-]+', '[REDACTED_TOKEN]'), # OAuth tokens
(r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'), # Bearer tokens
]
for pattern, replacement in patterns:
sanitized = re.sub(pattern, replacement, sanitized, flags=re.IGNORECASE)
# Check for sensitive words
sensitive_words = ['internal', 'server', 'endpoint', 'project']
if any(word in sanitized.lower() for word in sensitive_words):
return "Google AI API encountered an error. Please verify your API key."
return sanitized
class AnthropicErrorAdapter(ProviderErrorAdapter):
def get_provider_name(self) -> str:
return "anthropic"
def sanitize_error_message(self, message: str) -> str:
if not isinstance(message, str) or not message.strip() or len(message) > 2000:
return "Anthropic API encountered an error. Please verify your API key."
sanitized = message
# Comprehensive Anthropic patterns
patterns = [
(r'sk-ant-[a-zA-Z0-9_-]{10,}', '[REDACTED_KEY]'), # Anthropic API keys
(r'https?://[^\s]*anthropic\.com[^\s]*', '[REDACTED_URL]'), # Anthropic URLs
(r'Bearer\s+[a-zA-Z0-9._-]+', 'Bearer [REDACTED_TOKEN]'), # Bearer tokens
]
for pattern, replacement in patterns:
sanitized = re.sub(pattern, replacement, sanitized, flags=re.IGNORECASE)
# Check for sensitive words
sensitive_words = ['internal', 'server', 'endpoint']
if any(word in sanitized.lower() for word in sensitive_words):
return "Anthropic API encountered an error. Please verify your API key."
return sanitized
class ProviderErrorFactory:
"""Factory for provider-agnostic error handling."""
_adapters = {
"openai": OpenAIErrorAdapter(),
"google": GoogleAIErrorAdapter(),
"anthropic": AnthropicErrorAdapter(),
}
@classmethod
def get_adapter(cls, provider: str) -> ProviderErrorAdapter:
return cls._adapters.get(provider.lower(), cls._adapters["openai"])
@classmethod
def sanitize_provider_error(cls, message: str, provider: str) -> str:
adapter = cls.get_adapter(provider)
return adapter.sanitize_error_message(message)
@classmethod
def detect_provider_from_error(cls, error_str: str) -> str:
"""Detect provider from error message with comprehensive pattern matching."""
if not error_str:
return "openai"
error_lower = error_str.lower()
# Case-insensitive provider detection with multiple patterns
if ("anthropic" in error_lower or
re.search(r'sk-ant-[a-zA-Z0-9_-]+', error_str, re.IGNORECASE) or
"claude" in error_lower):
return "anthropic"
elif ("google" in error_lower or
re.search(r'AIza[a-zA-Z0-9_-]+', error_str, re.IGNORECASE) or
"googleapis" in error_lower or
"vertex" in error_lower):
return "google"
elif ("openai" in error_lower or
re.search(r'sk-[a-zA-Z0-9]{48}', error_str, re.IGNORECASE) or
"gpt" in error_lower):
return "openai"
else:
return "openai" # Safe default