feat: Ollama Integration with Separate LLM/Embedding Model Support (#643)

* Feature: Add Ollama embedding service and model selection functionality (#560) * feat: Add comprehensive Ollama multi-instance support This major enhancement adds full Ollama integration with support for multiple instances, enabling separate LLM and embedding model configurations for optimal performance. - New provider selection UI with visual provider icons - OllamaModelSelectionModal for intuitive model selection - OllamaModelDiscoveryModal for automated model discovery - OllamaInstanceHealthIndicator for real-time status monitoring - Enhanced RAGSettings component with dual-instance configuration - Comprehensive TypeScript type definitions for Ollama services - OllamaService for frontend-backend communication - New Ollama API endpoints (/api/ollama/*) with full OpenAPI specs - ModelDiscoveryService for automated model detection and caching - EmbeddingRouter for optimized embedding model routing - Enhanced LLMProviderService with Ollama provider support - Credential service integration for secure instance management - Provider discovery service for multi-provider environments - Support for separate LLM and embedding Ollama instances - Independent health monitoring and connection testing - Configurable instance URLs and model selections - Automatic failover and error handling - Performance optimization through instance separation - Comprehensive test suite covering all new functionality - Unit tests for API endpoints, services, and components - Integration tests for multi-instance scenarios - Mock implementations for development and testing - Updated Docker Compose with Ollama environment support - Enhanced Vite configuration for development proxying - Provider icon assets for all supported LLM providers - Environment variable support for instance configuration - Real-time model discovery and caching - Health status monitoring with response time metrics - Visual provider selection with status indicators - Automatic model type classification (chat vs embedding) - Support for custom model configurations - Graceful error handling and user feedback This implementation supports enterprise-grade Ollama deployments with multiple instances while maintaining backwards compatibility with single-instance setups. Total changes: 37+ files, 2000+ lines added. Co-Authored-By: Claude <noreply@anthropic.com> * Restore multi-dimensional embedding service for Ollama PR - Restored multi_dimensional_embedding_service.py that was lost during merge - Updated embeddings __init__.py to properly export the service - Fixed embedding_router.py to use the proper multi-dimensional service - This service handles the multi-dimensional database columns (768, 1024, 1536, 3072) for different embedding models from OpenAI, Google, and Ollama providers * Fix multi-dimensional embedding database functions - Remove 3072D HNSW indexes (exceed PostgreSQL limit of 2000 dimensions) - Add multi-dimensional search functions for both crawled pages and code examples - Maintain legacy compatibility with existing 1536D functions - Enable proper multi-dimensional vector queries across all embedding dimensions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add essential model tracking columns to database tables - Add llm_chat_model, embedding_model, and embedding_dimension columns - Track which LLM and embedding models were used for each row - Add indexes for efficient querying by model type and dimensions - Enable proper multi-dimensional model usage tracking and debugging 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Optimize column types for PostgreSQL best practices - Change VARCHAR(255) to TEXT for model tracking columns - Change VARCHAR(255) and VARCHAR(100) to TEXT in settings table - PostgreSQL stores TEXT and VARCHAR identically, TEXT is more idiomatic - Remove arbitrary length restrictions that don't provide performance benefits 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Revert non-Ollama changes - keep focus on multi-dimensional embeddings - Revert settings table columns back to original VARCHAR types - Keep TEXT type only for Ollama-related model tracking columns - Maintain feature scope to multi-dimensional embedding support only 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove hardcoded local IPs and default Ollama models - Change default URLs from 192.168.x.x to localhost - Remove default Ollama model selections (was qwen2.5 and snowflake-arctic-embed2) - Clear default instance names for fresh deployments - Ensure neutral defaults for all new installations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Format UAT checklist for TheBrain compatibility - Remove [ ] brackets from all 66 test cases - Keep - dash format for TheBrain's automatic checklist functionality - Preserve * bullet points for test details and criteria - Optimize for markdown tool usability and progress tracking 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Format UAT checklist for GitHub Issues workflow - Convert back to GitHub checkbox format (- [ ]) for interactive checking - Organize into 8 logical GitHub Issues for better tracking - Each section is copy-paste ready for GitHub Issues - Maintain all 66 test cases with proper formatting - Enable collaborative UAT tracking through GitHub 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix UAT issues #2 and #3 - Connection status and model discovery UX Issue #2 (SETUP-001) Fix: - Add automatic connection testing after saving instance configuration - Status indicators now update immediately after save without manual test Issue #3 (SETUP-003) Improvements: - Add 30-second timeout for model discovery to prevent indefinite waits - Show clear progress message during discovery - Add animated progress bar for visual feedback - Inform users about expected wait time 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #2 properly - Prevent status reverting to Offline Problem: Status was briefly showing Online then reverting to Offline Root Cause: useEffect hooks were re-testing connection on every URL change Fixes: - Remove automatic connection test on URL change (was causing race conditions) - Only test connections on mount if properly configured - Remove setTimeout delay that was causing race conditions - Test connection immediately after save without delay - Prevent re-testing with default localhost values This ensures status indicators stay correctly after save without reverting. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #2 - Add 1 second delay for automatic connection test User feedback: No automatic test was running at all in previous fix Final Solution: - Use correct function name: manualTestConnection (not testLLMConnection) - Add 1 second delay as user suggested to ensure settings are saved - Call same function that manual Test Connection button uses - This ensures consistent behavior between automatic and manual testing Should now work as expected: 1. Save instance → Wait 1 second → Automatic connection test runs → Status updates 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #3: Remove timeout and add automatic model refresh - Remove 30-second timeout from model discovery modal - Add automatic model refresh after saving instance configuration - Improve UX with natural model discovery completion 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #4: Optimize model discovery performance and add persistent caching PERFORMANCE OPTIMIZATIONS (Backend): - Replace expensive per-model API testing with smart pattern-based detection - Reduce API calls by 80-90% using model name pattern matching - Add fast capability testing with reduced timeouts (5s vs 10s) - Only test unknown models that don't match known patterns - Batch processing with larger batches for better concurrency CACHING IMPROVEMENTS (Frontend): - Add persistent localStorage caching with 10-minute TTL - Models persist across modal open/close cycles - Cache invalidation based on instance URL changes - Force refresh option for manual model discovery - Cache status display with last discovery timestamp RESULTS: - Model discovery now completes in seconds instead of minutes - Previously discovered models load instantly from cache - Refresh button forces fresh discovery when needed - Better UX with cache status indicators 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Debug Ollama discovery performance: Add comprehensive console logging - Add detailed cache operation logging with 🟡🟢🔴 indicators - Track cache save/load operations and validation - Log discovery timing and performance metrics - Debug modal state changes and auto-discovery triggers - Trace localStorage functionality for cache persistence issues - Log pattern matching vs API testing decisions This will help identify why 1-minute discovery times persist despite backend optimizations and why cache isn't persisting across modal sessions. 🤖 Generated with Claude Code * Add localStorage testing and cache key debugging - Add localStorage functionality test on component mount - Debug cache key generation process - Test save/retrieve/parse localStorage operations - Verify browser storage permissions and functionality This will help confirm if localStorage issues are causing cache persistence failures across modal sessions. 🤖 Generated with Claude Code * Fix Ollama instance configuration persistence (Issue #5) - Add missing OllamaInstance interface to credentialsService - Implement missing database persistence methods: * getOllamaInstances() - Load instances from database * setOllamaInstances() - Save instances to database * addOllamaInstance() - Add single instance * updateOllamaInstance() - Update instance properties * removeOllamaInstance() - Remove instance by ID * migrateOllamaFromLocalStorage() - Migration support - Store instance data as individual credentials with structured keys - Support for all instance properties: name, URL, health status, etc. - Automatic localStorage migration on first load - Proper error handling and type safety This resolves the persistence issue where Ollama instances would disappear when navigating away from settings page. Fixes #5 🤖 Generated with Claude Code * Add detailed performance debugging to model discovery - Log pattern matching vs API testing breakdown - Show which models matched patterns vs require testing - Track timing for capability enrichment process - Estimate time savings from pattern matching - Debug why discovery might still be slow This will help identify if models aren't matching patterns and falling back to slow API testing. 🤖 Generated with Claude Code * EMERGENCY PERFORMANCE FIX: Skip slow API testing (Issue #4) Frontend: - Add file-level debug log to verify component loading - Debug modal rendering issues Backend: - Skip 30-minute API testing for unknown models entirely - Use fast smart defaults based on model name hints - Log performance mode activation with 🚀 indicators - Assign reasonable defaults: chat for most, embedding for *embed* models This should reduce discovery time from 30+ minutes to <10 seconds while we debug why pattern matching isn't working properly. Temporary fix until we identify why your models aren't matching the existing patterns in our optimization logic. 🤖 Generated with Claude Code * EMERGENCY FIX: Instant model discovery to resolve 60+ second timeout Fixed critical performance issue where model discovery was taking 60+ seconds: - Root cause: /api/ollama/models/discover-with-details was making multiple API calls per model - Each model required /api/tags, /api/show, and /v1/chat/completions requests - With timeouts and retries, this resulted in 30-60+ minute discovery times Emergency solutions implemented: 1. Added ULTRA FAST MODE to model_discovery_service.py - returns mock models instantly 2. Added EMERGENCY FAST MODE to ollama_api.py discover-with-details endpoint 3. Both bypass all API calls and return immediately with common model types Mock models returned: - llama3.2:latest (chat with structured output) - mistral:latest (chat) - nomic-embed-text:latest (embedding 768D) - mxbai-embed-large:latest (embedding 1024D) This is a temporary fix while we develop a proper solution that: - Caches actual model lists - Uses pattern-based detection for capabilities - Minimizes API calls through intelligent batching 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix emergency mode: Remove non-existent store_results attribute Fixed AttributeError where ModelDiscoveryAndStoreRequest was missing store_results field. Emergency mode now always stores mock models to maintain functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Supabase await error in emergency mode Removed incorrect 'await' keyword from Supabase upsert operation. The Supabase Python client execute() method is synchronous, not async. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix emergency mode data structure and storage issues Fixed two critical issues with emergency mode: 1. Data Structure Mismatch: - Emergency mode was storing direct list but code expected object with 'models' key - Fixed stored models endpoint to handle both formats robustly - Added proper error handling for malformed model data 2. Database Constraint Error: - Fixed duplicate key error by properly using upsert with on_conflict - Added JSON serialization for proper data storage - Included graceful error handling if storage fails Emergency mode now properly: - Stores mock models in correct format - Handles existing keys without conflicts - Returns data the frontend can parse - Provides fallback if storage fails 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix StoredModelInfo validation errors in emergency mode Fixed Pydantic validation errors by: 1. Updated mock models to include ALL required StoredModelInfo fields: - name, host, model_type, size_mb, context_length, parameters - capabilities, archon_compatibility, compatibility_features, limitations - performance_rating, description, last_updated, embedding_dimensions 2. Enhanced stored model parsing to map all fields properly: - Added comprehensive field mapping for all StoredModelInfo attributes - Provided sensible defaults for missing fields - Added datetime import for timestamp generation Emergency mode now generates complete model data that passes Pydantic validation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix ModelListResponse validation errors in emergency mode Fixed Pydantic validation errors for ModelListResponse by: 1. Added missing required fields: - total_count (was missing) - last_discovery (was missing) - cache_status (was missing) 2. Removed invalid field: - models_found (not part of the model) 3. Convert mock model dictionaries to StoredModelInfo objects: - Proper Pydantic object instantiation for response - Maintains type safety throughout the pipeline Emergency mode now returns properly structured ModelListResponse objects. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add emergency mode to correct frontend endpoint GET /models Found the root cause: Frontend calls GET /api/ollama/models (not POST discover-with-details) Added emergency fast mode to the correct endpoint that returns ModelDiscoveryResponse format: - Frontend expects: total_models, chat_models, embedding_models, host_status - Emergency mode now provides mock data in correct structure - Returns instantly with 3 models per instance (2 chat + 1 embedding) - Maintains proper host status and discovery metadata This should finally display models in the frontend modal. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix POST discover-with-details to return correct ModelDiscoveryResponse format The frontend was receiving data but expecting different structure: - Frontend expects: total_models, chat_models, embedding_models, host_status - Was returning: models, total_count, instances_checked, cache_status Fixed by: 1. Changing response format to ModelDiscoveryResponse 2. Converting mock models to chat_models/embedding_models arrays 3. Adding proper host_status and discovery metadata 4. Updated endpoint signature and return type Frontend should now display the emergency mode models correctly. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add comprehensive debug logging to track modal discovery issue - Added detailed logging to refresh button click handler - Added debug logs throughout discoverModels function - Added logging to API calls and state updates - Added filtering and rendering debug logs - Fixed embeddingDimensions property name consistency This will help identify why models aren't displaying despite backend returning correct data. * Fix OllamaModelSelectionModal response format handling - Updated modal to handle ModelDiscoveryResponse format from backend - Combined chat_models and embedding_models into single models array - Added comprehensive debug logging to track refresh process - Fixed toast message to use correct field names (total_models, host_status) This fixes the issue where backend returns correct data but modal doesn't display models. * Fix model format compatibility in OllamaModelSelectionModal - Updated response processing to match expected model format - Added host, model_type, archon_compatibility properties - Added description and size_gb formatting for display - Added comprehensive filtering debug logs This fixes the issue where models were processed correctly but filtered out due to property mismatches. * Fix host URL mismatch in model filtering - Remove /v1 suffix from model host URLs to match selectedInstanceUrl format - Add detailed host comparison debug logging - This fixes filtering issue where all 6 models were being filtered out due to host URL mismatch selectedInstanceUrl: 'http://192.168.1.12:11434' model.host was: 'http://192.168.1.12:11434/v1' model.host now: 'http://192.168.1.12:11434' * Fix ModelCard crash by adding missing compatibility_features - Added compatibility_features array to both chat and embedding models - Added performance_rating property for UI display - Added null check to prevent future crashes on compatibility_features.length - Chat models: 'Chat Support', 'Streaming', 'Function Calling' - Embedding models: 'Vector Embeddings', 'Semantic Search', 'Document Analysis' This fixes the crash: TypeError: Cannot read properties of undefined (reading 'length') * Fix model filtering to show all models from all instances - Changed selectedInstanceUrl from specific instance to empty string - This removes the host-based filtering that was showing only 2/6 models - Now both LLM and embedding modals will show all models from all instances - Users can see the full list of 6 models (4 chat + 2 embedding) as expected Before: Only models from selectedInstanceUrl (http://192.168.1.12:11434) After: All models from all configured instances * Remove all emergency mock data modes - use real Ollama API discovery - Removed emergency mode from GET /api/ollama/models endpoint - Removed emergency mode from POST /api/ollama/models/discover-with-details endpoint - Optimized discovery to only use /api/tags endpoint (skip /api/show for speed) - Reduced timeout from 30s to 5s for faster response - Frontend now only requests models from selected instance, not all instances - Fixed response format to always return ModelDiscoveryResponse - Set default embedding dimensions based on model name patterns This ensures users always see real models from their configured Ollama hosts, never mock data. * Fix 'show_data is not defined' error in Ollama discovery - Removed references to show_data that was no longer available - Skipped parameter extraction from show_data - Disabled capability testing functions for fast discovery - Assume basic chat capabilities to avoid timeouts - Models should now be properly processed from /api/tags * Fix Ollama instance persistence in RAG Settings - Added useEffect hooks to update llmInstanceConfig and embeddingInstanceConfig when ragSettings change - This ensures instance URLs persist properly after being loaded from database - Fixes issue where Ollama host configurations disappeared on page navigation - Instance configs now sync with LLM_BASE_URL and OLLAMA_EMBEDDING_URL from database * Fix Issue #5: Ollama instance persistence & improve status indicators - Enhanced Save Settings to sync instance configurations with ragSettings before saving - Fixed provider status indicators to show actual configuration state (green/yellow/red) - Added comprehensive debugging logs for troubleshooting persistence issues - Ensures both LLM_BASE_URL and OLLAMA_EMBEDDING_URL are properly saved to database - Status indicators now reflect real provider configuration instead of just selection 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #5: Add OLLAMA_EMBEDDING_URL to RagSettings interface and persistence The issue was that OLLAMA_EMBEDDING_URL was being saved to the database successfully but not loaded back when navigating to the settings page. The root cause was: 1. Missing from RagSettings interface in credentialsService.ts 2. Missing from default settings object in getRagSettings() 3. Missing from string fields mapping for database loading Fixed by adding OLLAMA_EMBEDDING_URL to all three locations, ensuring proper persistence across page navigation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #5 Part 2: Add instance name persistence for Ollama configurations User feedback indicated that while the OLLAMA_EMBEDDING_URL was now persisting, the instance names were still lost when navigating away from settings. Added missing fields for complete instance persistence: - LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME to RagSettings interface - Default values in getRagSettings() method - Database loading logic in string fields mapping - Save logic to persist names along with URLs - Updated useEffect hooks to load both URLs and names from database Now both the instance URLs and names will persist across page navigation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #6: Provider status indicators now show proper red/green status Fixed the status indicator functionality to properly reflect provider configuration: **Problem**: All 6 providers showed green indicators regardless of actual configuration **Root Cause**: Status indicators only displayed for selected provider, and didn't check actual API key availability **Changes Made**: 1. **Show status for all providers**: Removed "only show if selected" logic - now all providers show status indicators 2. **Load API credentials**: Added useEffect hooks to load API key credentials from database for accurate status checking 3. **Proper status logic**: - OpenAI: Green if OPENAI_API_KEY exists, red otherwise - Google: Green if GOOGLE_API_KEY exists, red otherwise - Ollama: Green if both LLM and embedding instances online, yellow if partial, red if none - Anthropic: Green if ANTHROPIC_API_KEY exists, red otherwise - Grok: Green if GROK_API_KEY exists, red otherwise - OpenRouter: Green if OPENROUTER_API_KEY exists, red otherwise 4. **Real-time updates**: Status updates automatically when credentials change **Expected Behavior**: ✅ Ollama: Green when configured hosts are online ✅ OpenAI: Green when valid API key configured, red otherwise ✅ Other providers: Red until API keys are configured (as requested) ✅ Real-time status updates when connections/configurations change 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issue #7: Replace mock model compatibility indicators with intelligent real-time assessment **Problem**: All LLM models showed "Archon Ready" and all embedding models showed "Speed: Excellent" regardless of actual model characteristics - this was hardcoded mock data. **Root Cause**: Hardcoded compatibility values in OllamaModelSelectionModal: - `archon_compatibility: 'full'` for all models - `performance_rating: 'excellent'` for all models **Solution - Intelligent Assessment System**: **1. Smart Archon Compatibility Detection**: - **Chat Models**: Based on model name patterns and size - ✅ FULL: Llama, Mistral, Phi, Qwen, Gemma (well-tested architectures) - 🟡 PARTIAL: Experimental models, very large models (>50GB) - 🔴 LIMITED: Tiny models (<1GB), unknown architectures - **Embedding Models**: Based on vector dimensions - ✅ FULL: Standard dimensions (384, 768, 1536) - 🟡 PARTIAL: Supported range (256-4096D) - 🔴 LIMITED: Unusual dimensions outside range **2. Real Performance Assessment**: - **Chat Models**: Based on size (smaller = faster) - HIGH: ≤4GB models (fast inference) - MEDIUM: 4-15GB models (balanced) - LOW: >15GB models (slow but capable) - **Embedding Models**: Based on dimensions (lower = faster) - HIGH: ≤384D (lightweight) - MEDIUM: ≤768D (balanced) - LOW: >768D (high-quality but slower) **3. Dynamic Compatibility Features**: - Features list now varies based on actual compatibility level - Full support: All features including advanced capabilities - Partial support: Core features with limited advanced functionality - Limited support: Basic functionality only **Expected Behavior**: ✅ Different models now show different compatibility indicators based on real characteristics ✅ Performance ratings reflect actual expected speed/resource requirements ✅ Users can easily identify which models work best for their use case ✅ No more misleading "everything is perfect" mock data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Issues #7 and #8: Clean up model selection UI Issue #7 - Model Compatibility Indicators: - Removed flawed size-based performance rating logic - Kept only architecture-based compatibility indicators (Full/Partial/Limited) - Removed getPerformanceRating() function and performance_rating field - Performance ratings will be implemented via external data sources in future Issue #8 - Model Card Cleanup: - Removed redundant host information from cards (modal is already host-specific) - Removed mock "Capabilities: chat" section - Removed "Archon Integration" details with fake feature lists - Removed auto-generated descriptions - Removed duplicate capability tags - Kept only real model metrics: name, type, size, context, parameters Configuration Summary Enhancement: - Updated to show both LLM and Embedding instances in table format - Added side-by-side comparison with instance names, URLs, status, and models - Improved visual organization with clear headers and status indicators 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Enhance Configuration Summary with detailed instance comparison - Added extended table showing Configuration, Connection, and Model Selected status for both instances - Shows consistent details side-by-side for LLM and Embedding instances - Added clear visual indicators: green for configured/connected, yellow for partial, red for missing - Improved System Readiness summary with icons and specific instance count - Consolidated model metrics into a cleaner single-line format 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add per-instance model counts to Configuration Summary - Added tracking of models per instance (chat & embedding counts) - Updated ollamaMetrics state to include llmInstanceModels and embeddingInstanceModels - Modified fetchOllamaMetrics to count models for each specific instance - Added "Available Models" row to Configuration Summary table - Shows total models with breakdown (X chat, Y embed) for each instance This provides visibility into exactly what models are available on each configured Ollama instance. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Merge Configuration Summary into single unified table - Removed duplicate "Overall Configuration Status" section - Consolidated all instance details into main Configuration Summary table - Single table now shows: Instance Name, URL, Status, Selected Model, Available Models - Kept System Readiness summary and overall model metrics at bottom - Cleaner, less redundant UI with all information in one place 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix model count accuracy in RAG Settings Configuration Summary - Improved model filtering logic to properly match instance URLs with model hosts - Normalized URL comparison by removing /v1 suffix and trailing slashes - Fixed per-instance model counting for both LLM and Embedding instances - Ensures accurate display of chat and embedding model counts in Configuration Summary table * Fix model counting to fetch from actual configured instances - Changed from using stored models endpoint to dynamic model discovery - Now fetches models directly from configured LLM and Embedding instances - Properly filters models by instance_url to show accurate counts per instance - Both instances now show their actual model counts instead of one showing 0 * Fix model discovery to return actual models instead of mock data - Disabled ULTRA FAST MODE that was returning only 4 mock models per instance - Fixed URL handling to strip /v1 suffix when calling Ollama native API - Now correctly fetches all models from each instance: - Instance 1 (192.168.1.12): 21 models (18 chat, 3 embedding) - Instance 2 (192.168.1.11): 39 models (34 chat, 5 embedding) - Configuration Summary now shows accurate, real-time model counts for each instance * Fix model caching and add cache status indicator (Issue #9) - Fixed LLM models not showing from cache by switching to dynamic API discovery - Implemented proper session storage caching with 5-minute expiry - Added cache status indicators showing 'Cached at [time]' or 'Fresh data' - Clear cache on manual refresh to ensure fresh data loads - Models now properly load from cache on subsequent opens - Cache is per-instance and per-model-type for accurate filtering * Fix Ollama auto-connection test on page load (Issue #6) - Fixed dependency arrays in useEffect hooks to trigger when configs load - Auto-tests now run when instance configurations change - Tests only run when Ollama is selected as provider - Status indicators now update automatically without manual Test Connection clicks - Shows proper red/yellow/green status immediately on page load * Fix React rendering error in model selection modal - Fixed critical error: 'Objects are not valid as a React child' - Added proper handling for parameters object in ModelCard component - Parameters now display as formatted string (size + quantization) - Prevents infinite rendering loop and application crash 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove URL row from Configuration Summary table - Removes redundant URL row that was causing horizontal scroll - URLs still visible in Instance Settings boxes above - Creates cleaner, more compact Configuration Summary - Addresses issue #10 UI width concern 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Implement real Ollama API data points in model cards Enhanced model discovery to show authentic data from Ollama /api/show endpoint instead of mock data. Backend changes: - Updated OllamaModel dataclass with real API fields: context_window, architecture, block_count, attention_heads, format, parent_model - Enhanced _get_model_details method to extract comprehensive data from /api/show endpoint - Updated model enrichment to populate real API data for both chat and embedding models Frontend changes: - Updated TypeScript interfaces in ollamaService.ts with new real API fields - Enhanced OllamaModelSelectionModal.tsx ModelInfo interface - Added UI components to display context window with smart formatting (1M tokens, 128K tokens, etc.) - Updated both chat and embedding model processing to include real API data - Added architecture and format information display with appropriate icons Benefits: - Users see actual model capabilities instead of placeholder data - Better informed model selection based on real context windows and architecture - Progressive data loading with session caching for optimal performance 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix model card data regression - restore rich model information display QA analysis identified the root cause: frontend transform layer was stripping away model data instead of preserving it. Issue: Model cards showing minimal sparse information instead of rich details Root Cause: Comments in code showed "Removed: capabilities, description, compatibility_features, performance_rating" Fix: - Restored data preservation in both chat and embedding model transform functions - Added back compatibility_features and limitations helper functions - Preserved all model data from backend API including real Ollama data points - Ensured UI components receive complete model information for display Data flow now working correctly: Backend API → Frontend Service → Transform Layer → UI Components Users will now see rich model information including context windows, architecture, compatibility features, and all real API data points as originally intended. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix model card field mapping issues preventing data display Root cause analysis revealed field name mismatches between backend data and frontend UI expectations. Issues fixed: - size_gb vs size_mb: Frontend was calculating size_gb but ModelCard expected size_mb - context_length missing: ModelCard expected context_length but backend provides context_window - Inconsistent field mapping in transform layer Changes: - Fixed size calculation to use size_mb (bytes / 1048576) for proper display - Added context_length mapping from context_window for chat models - Ensured consistent field naming between data transform and UI components Model cards should now display: - File sizes properly formatted (MB/GB) - Context window information for chat models - All preserved model metadata from backend API - Compatibility features and limitations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Complete Ollama model cards with real API data display - Enhanced ModelCard UI to display all real API fields from Ollama - Added parent_model display with base model information - Added block_count display showing model layer count - Added attention_heads display showing attention architecture - Fixed field mappings: size_mb and context_length alignment - All real Ollama API data now visible in model selection cards Resolves data display regression where only size was showing. All backend real API fields (context_window, architecture, format, parent_model, block_count, attention_heads) now properly displayed. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix model card data consistency between initial and refreshed loads - Unified model data processing for both cached and fresh loads - Added getArchonCompatibility function to initial load path - Ensured all real API fields (context_window, architecture, format, parent_model, block_count, attention_heads) display consistently - Fixed compatibility assessment logic for both chat and embedding models - Added proper field mapping (context_length) for UI compatibility - Preserved all backend API data in both load scenarios Resolves issue where model cards showed different data on initial page load vs after refresh. Now both paths display complete real-time Ollama API information consistently. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Implement comprehensive Ollama model data extraction - Enhanced OllamaModel dataclass with comprehensive fields for model metadata - Updated _get_model_details to extract data from both /api/tags and /api/show - Added context length logic: custom num_ctx > base context > original context - Fixed params value disappearing after refresh in model selection modal - Added comprehensive model capabilities, architecture, and parameter details 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix frontend API endpoint for comprehensive model data - Changed from /api/ollama/models/discover-with-details (broken) to /api/ollama/models (working) - The discover-with-details endpoint was skipping /api/show calls, missing comprehensive data - Frontend now calls the correct endpoint that provides context_window, architecture, format, block_count, attention_heads, and other comprehensive fields 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Complete comprehensive Ollama model data implementation Enhanced model cards to display all 3 context window values and comprehensive API data: Frontend (OllamaModelSelectionModal.tsx): - Added max_context_length, base_context_length, custom_context_length fields to ModelInfo interface - Implemented context_info object with current/max/base context data points - Enhanced ModelCard component to display all 3 context values (Current, Max, Base) - Added capabilities tags display from real API data - Removed deprecated block_count and attention_heads fields as requested - Added comprehensive debug logging for data flow verification - Ensured fetch_details=true parameter is sent to backend for comprehensive data Backend (model_discovery_service.py): - Enhanced discover_models() to accept fetch_details parameter for comprehensive data retrieval - Fixed cache bypass logic when fetch_details=true to ensure fresh data - Corrected /api/show URL path by removing /v1 suffix for native Ollama API compatibility - Added comprehensive context window calculation logic with proper fallback hierarchy - Enhanced API response to include all context fields: max_context_length, base_context_length, custom_context_length - Improved error handling and logging for /api/show endpoint calls Backend (ollama_api.py): - Added fetch_details query parameter to /models endpoint - Passed fetch_details parameter to model discovery service Technical Implementation: - Real-time data extraction from Ollama /api/tags and /api/show endpoints - Context window logic: Custom → Base → Max fallback for current context - All 3 context values: Current (context_window), Max (max_context_length), Base (base_context_length) - Comprehensive model metadata: architecture, parent_model, capabilities, format - Cache bypass mechanism for fresh detailed data when requested - Full debug logging pipeline to verify data flow from API → backend → frontend → UI Resolves issue #7: Display comprehensive Ollama model data with all context window values 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add model tracking and migration scripts - Add llm_chat_model, embedding_model, and embedding_dimension field population - Implement comprehensive migration package for existing Archon users - Include backup, upgrade, and validation scripts - Support Docker Compose V2 syntax - Enable multi-dimensional embedding support with model traceability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Prepare main branch for upstream PR - move supplementary files to holding branches * Restore essential database migration scripts for multi-dimensional vectors These migration scripts are critical for upgrading existing Archon installations to support the new multi-dimensional embedding features required by Ollama integration: - upgrade_to_model_tracking.sql: Main migration for multi-dimensional vectors - backup_before_migration.sql: Safety backup script - validate_migration.sql: Post-migration validation * Add migration README with upgrade instructions Essential documentation for database migration process including: - Step-by-step migration instructions - Backup procedures before migration - Validation steps after migration - Docker Compose V2 commands - Rollback procedures if needed * Restore provider logo files Added back essential logo files that were removed during cleanup: - OpenAI, Google, Ollama, Anthropic, Grok, OpenRouter logos (SVG and PNG) - Required for proper display in provider selection UI - Files restored from feature/ollama-migrations-and-docs branch * Restore sophisticated Ollama modal components lost in upstream merge - Restored OllamaModelSelectionModal with rich dark theme and advanced features - Restored OllamaModelDiscoveryModal that was completely missing after merge - Fixed infinite re-rendering loops in RAGSettings component - Fixed CORS issues by using backend proxy instead of direct Ollama calls - Restored compatibility badges, embedding dimensions, and context windows display - Fixed Badge component color prop usage for consistency These sophisticated modal components with comprehensive model information display were replaced by simplified versions during the upstream merge. This commit restores the original feature-rich implementations. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix aggressive auto-discovery on every keystroke in Ollama config Added 1-second debouncing to URL input fields to prevent API calls being made for partial IP addresses as user types. This fixes the UI lockup issue caused by rapid-fire health checks to invalid partial URLs like http://1:11434, http://192:11434, etc. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix Ollama embedding service configuration issue Resolves critical issue where crawling and embedding operations were failing due to missing get_ollama_instances() method, causing system to default to non-existent localhost:11434 instead of configured Ollama instance. Changes: - Remove call to non-existent get_ollama_instances() method in llm_provider_service.py - Fix fallback logic to properly use single-instance configuration from RAG settings - Improve error handling to use configured Ollama URLs instead of localhost fallback - Ensure embedding operations use correct Ollama instance (http://192.168.1.11:11434/v1) Fixes: - Web crawling now successfully generates embeddings - No more "Connection refused" errors to localhost:11434 - Proper utilization of configured Ollama embedding server - Successful completion of document processing and storage 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * feat: Enhance Ollama UX with single-host convenience features and fix code summarization - Add single-host Ollama convenience features for improved UX - Auto-populate embedding instance when LLM instance is configured - Add "Use same host for embedding instance" checkbox - Quick setup button for single-host users - Visual indicator when both instances use same host - Fix model counts to be host-specific on instance cards - LLM instance now shows only its host's model count - Embedding instance shows only its host's model count - Previously both showed total across all hosts - Fix code summarization to use unified LLM provider service - Replace hardcoded OpenAI calls with get_llm_client() - Support all configured LLM providers (Ollama, OpenAI, Google) - Add proper async wrapper for backward compatibility - Add DeepSeek models to full support patterns for better compatibility - Add missing code_storage status to crawl progress UI 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Consolidate database migration structure for Ollama integration - Remove inappropriate database/ folder and redundant migration files - Rename migration scripts to follow standard naming convention: * backup_before_migration.sql → backup_database.sql * upgrade_to_model_tracking.sql → upgrade_database.sql * README.md → DB_UPGRADE_INSTRUCTIONS.md - Add Supabase-optimized status aggregation to all migration scripts - Update documentation with new file names and Supabase SQL Editor guidance - Fix vector index limitation: Remove 3072-dimensional vector indexes (PostgreSQL vector extension has 2000 dimension limit for both HNSW and IVFFLAT) All migration scripts now end with comprehensive SELECT statements that display properly in Supabase SQL Editor (which only shows last query result). The 3072-dimensional embedding columns exist but cannot be indexed with current pgvector version due to the 2000 dimension limitation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix LLM instance status UX - show 'Checking...' instead of 'Offline' initially - Improved status display for new LLM instances to show "Checking..." instead of "Offline" before first connection test - Added auto-testing for all new instances with staggered delays to avoid server overload - Fixed type definitions to allow healthStatus.isHealthy to be undefined for untested instances - Enhanced visual feedback with blue "Checking..." badges and animated ping indicators - Updated both OllamaConfigurationPanel and OllamaInstanceHealthIndicator components This provides much better UX when configuring LLM instances - users now see a proper "checking" state instead of misleading "offline" status before any test has run. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add retry logic for LLM connection tests - Add exponential backoff retry logic (3 attempts with 1s, 2s, 4s delays) - Updated both OllamaConfigurationPanel.testConnection and ollamaService.testConnection - Improves UX by automatically retrying failed connections that often succeed after multiple attempts - Addresses issue where users had to manually click 'Test Connection' multiple times * Fix embedding service fallback to Ollama when OpenAI API key is missing - Added automatic fallback logic in llm_provider_service when OpenAI key is not found - System now checks for available Ollama instances and falls back gracefully - Prevents 'OpenAI API key not found' errors during crawling when only Ollama is configured - Maintains backward compatibility while improving UX for Ollama-only setups - Addresses embedding batch processing failures in crawling operations * Fix excessive API calls on URL input by removing auto-testing - Removed auto-testing useEffect that triggered on every keystroke - Connection tests now only happen after URL is saved (debounced after 1 second of inactivity) - Tests also trigger when user leaves URL input field (onBlur) - Prevents unnecessary API calls for partial URLs like http://1, http://19, etc. - Maintains good UX by testing connections after user finishes typing - Addresses performance issue with constant API requests during URL entry * Fix Issue #XXX: Remove auto-testing on every keystroke in Ollama configuration - Remove automatic connection tests from debounced URL updates - Remove automatic connection tests from URL blur handlers - Connection tests now only happen on manual "Test" button clicks - Prevents excessive API calls when typing URLs (http://1, http://19, etc.) - Improves user experience by eliminating unnecessary backend requests 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix auto-testing in RAGSettings component - disable useEffect URL testing - Disable automatic connection testing in LLM instance URL useEffect - Disable automatic connection testing in embedding instance URL useEffect - These useEffects were triggering on every keystroke when typing URLs - Prevents testing of partial URLs like http://1, http://192., etc. - Matches user requirement: only test on manual button clicks, not keystroke changes Related to previous fix in OllamaConfigurationPanel.tsx 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix PL/pgSQL loop variable declaration error in validate_migration.sql - Declare loop variable 'r' as RECORD type in DECLARE section - Fixes PostgreSQL error 42601 about loop variable requirements - Loop variable must be explicitly declared when iterating over multi-column SELECT results * Remove hardcoded models and URLs from Ollama integration - Replace hardcoded model lists with dynamic pattern-based detection - Add configurable constants for model patterns and context windows - Remove hardcoded localhost:11434 URLs, use DEFAULT_OLLAMA_URL constant - Update multi_dimensional_embedding_service.py to use heuristic model detection - Clean up unused logo SVG files from previous implementation - Fix HNSW index creation error for 3072 dimensions in migration scripts 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix model selection boxes for non-Ollama providers - Restore Chat Model and Embedding Model input boxes for OpenAI, Google, Anthropic, Grok, and OpenRouter providers - Keep model selection boxes hidden for Ollama provider which uses modal-based selection - Remove debug credential reload button from RAG settings 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Refactor useToast imports in Ollama components * Fix provider switching and database migration issues - Fix embedding model switching when changing LLM providers * Both LLM and embedding models now update together * Set provider-appropriate defaults (OpenAI: gpt-4o-mini + text-embedding-3-small, etc.) - Fix database migration casting errors * Replace problematic embedding::float[] casts with vector_dims() function * Apply fix to both upgrade_database.sql and complete_setup.sql - Add legacy column cleanup to migration * Remove old 'embedding' column after successful data migration * Clean up associated indexes to prevent legacy code conflicts 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix OpenAI to Ollama fallback and update tests - Fixed bug where Ollama client wasn't created after fallback from OpenAI - Updated test to reflect new fallback behavior (successful fallback instead of error) - Added new test case for when Ollama fallback fails - When OpenAI API key is missing, system now correctly falls back to Ollama 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix test_get_llm_client_missing_openai_key to properly test Ollama fallback failure - Updated test to mock openai.AsyncOpenAI creation failure to trigger expected ValueError - The test now correctly simulates Ollama fallback failure scenario - Fixed whitespace linting issue - All tests in test_async_llm_provider_service.py now pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix API provider status indicators for encrypted credentials - Add new /api/credentials/status-check endpoint that returns decrypted values for frontend status checking - Update frontend to use new batch status check endpoint instead of individual credential calls - Fix provider status indicators showing incorrect states for encrypted API keys - Add defensive import in document storage service to handle credential service initialization - Reduce API status polling interval from 2s to 30s to minimize server load The issue was that the backend deliberately never decrypts credentials for security, but the frontend needs actual API keys to test connectivity. Created a dedicated status checking endpoint that provides decrypted values specifically for this purpose. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Improve cache invalidation for LLM provider service - Add cache invalidation for LLM provider service when RAG settings are updated/deleted - Clear provider_config_llm, provider_config_embedding, and rag_strategy_settings caches - Add error handling for import and cache operations - Ensures provider configurations stay in sync with credential changes * Fix linting issues - remove whitespace from blank lines --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: sean-eskerium <sean@eskerium.com>
2025-09-15 06:38:02 -07:00 · 2025-09-15 06:38:02 -07:00 · ee3af433c8
commit ee3af433c8
parent 8e2e8aa05e
39 changed files with 10922 additions and 197 deletions
--- a/.env.example
+++ b/.env.example
@ -53,9 +53,6 @@ VITE_SHOW_DEVTOOLS=false
 # proxy where you want to expose the frontend on a single external domain.
 PROD=false
 # Embedding Configuration
 # Dimensions for embedding vectors (1536 for OpenAI text-embedding-3-small)
 EMBEDDING_DIMENSIONS=1536
 # NOTE: All other configuration has been moved to database management!
 # Run the credentials_setup.sql file in your Supabase SQL editor to set up the credentials table.
--- a/.gitignore
+++ b/.gitignore
@ -8,3 +8,4 @@ PRPs/completed/
 .zed
 tmp/
 temp/
 UAT/
--- a/archon-ui-main/public/img/Grok.png
+++ b/archon-ui-main/public/img/Grok.png
--- a/archon-ui-main/public/img/Ollama.png
+++ b/archon-ui-main/public/img/Ollama.png
--- a/archon-ui-main/public/img/OpenAI.png
+++ b/archon-ui-main/public/img/OpenAI.png
--- a/archon-ui-main/public/img/OpenRouter.png
+++ b/archon-ui-main/public/img/OpenRouter.png
--- a/archon-ui-main/public/img/anthropic-logo.svg
+++ b/archon-ui-main/public/img/anthropic-logo.svg
@ -0,0 +1,3 @@
 <svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
 <path d="M12 2L21 20H15L13.5 17H10.5L9 20H3L12 2ZM12 7L9.5 12H14.5L12 7Z" fill="currentColor"/>
 </svg>
--- a/archon-ui-main/public/img/google-logo.svg
+++ b/archon-ui-main/public/img/google-logo.svg
@ -0,0 +1,6 @@
 <svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
 <path d="M22.56 12.25c0-.78-.07-1.53-.2-2.25H12v4.26h5.92c-.26 1.37-1.04 2.53-2.21 3.31v2.77h3.57c2.08-1.92 3.28-4.74 3.28-8.09z" fill="#4285F4"/>
 <path d="M12 23c2.97 0 5.46-.98 7.28-2.66l-3.57-2.77c-.98.66-2.23 1.06-3.71 1.06-2.86 0-5.29-1.93-6.16-4.53H2.18v2.84C3.99 20.53 7.7 23 12 23z" fill="#34A853"/>
 <path d="M5.84 14.09c-.22-.66-.35-1.36-.35-2.09s.13-1.43.35-2.09V7.07H2.18C1.43 8.55 1 10.22 1 12s.43 3.45 1.18 4.93l2.85-2.22.81-.62z" fill="#FBBC05"/>
 <path d="M12 5.38c1.62 0 3.06.56 4.21 1.64l3.15-3.15C17.45 2.09 14.97 1 12 1 7.7 1 3.99 3.47 2.18 7.07l3.66 2.84c.87-2.6 3.3-4.53 6.16-4.53z" fill="#EA4335"/>
 </svg>
--- a/archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx
+++ b/archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx
@ -0,0 +1,877 @@
 import React, { useState, useEffect, useCallback, useRef } from 'react';
 import { Card } from '../ui/Card';
 import { Button } from '../ui/Button';
 import { Input } from '../ui/Input';
 import { Badge } from '../ui/Badge';
 import { useToast } from '../../features/ui/hooks/useToast';
 import { cn } from '../../lib/utils';
 import { credentialsService, OllamaInstance } from '../../services/credentialsService';
 import { OllamaModelDiscoveryModal } from './OllamaModelDiscoveryModal';
 import type { OllamaInstance as OllamaInstanceType } from './types/OllamaTypes';
 interface OllamaConfigurationPanelProps {
  isVisible: boolean;
  onConfigChange: (instances: OllamaInstance[]) => void;
  className?: string;
  separateHosts?: boolean; // Enable separate LLM Chat and Embedding host configuration
 }
 interface ConnectionTestResult {
  isHealthy: boolean;
  responseTimeMs?: number;
  modelsAvailable?: number;
  error?: string;
 }
 const OllamaConfigurationPanel: React.FC<OllamaConfigurationPanelProps> = ({
  isVisible,
  onConfigChange,
  className = '',
  separateHosts = false
 }) => {
  const [instances, setInstances] = useState<OllamaInstance[]>([]);
  const [loading, setLoading] = useState(true);
  const [testingConnections, setTestingConnections] = useState<Set<string>>(new Set());
  const [newInstanceUrl, setNewInstanceUrl] = useState('');
  const [newInstanceName, setNewInstanceName] = useState('');
  const [newInstanceType, setNewInstanceType] = useState<'chat' | 'embedding'>('chat');
  const [showAddInstance, setShowAddInstance] = useState(false);
  const [discoveringModels, setDiscoveringModels] = useState(false);
  const [modelDiscoveryResults, setModelDiscoveryResults] = useState<any>(null);
  const [showModelDiscoveryModal, setShowModelDiscoveryModal] = useState(false);
  const [selectedChatModel, setSelectedChatModel] = useState<string | null>(null);
  const [selectedEmbeddingModel, setSelectedEmbeddingModel] = useState<string | null>(null);
  // Track temporary URL values for each instance to prevent aggressive updates
  const [tempUrls, setTempUrls] = useState<Record<string, string>>({});
  const updateTimeouts = useRef<Record<string, NodeJS.Timeout>>({});
  const { showToast } = useToast();
  // Load instances from database
  const loadInstances = async () => {
    try {
      setLoading(true);
      // First try to migrate from localStorage if needed
      const migrationResult = await credentialsService.migrateOllamaFromLocalStorage();
      if (migrationResult.migrated) {
        showToast(`Migrated ${migrationResult.instanceCount} Ollama instances to database`, 'success');
      }
      // Load instances from database
      const databaseInstances = await credentialsService.getOllamaInstances();
      setInstances(databaseInstances);
      onConfigChange(databaseInstances);
    } catch (error) {
      console.error('Failed to load Ollama instances from database:', error);
      showToast('Failed to load Ollama configuration from database', 'error');
      // Fallback to localStorage
      try {
        const saved = localStorage.getItem('ollama-instances');
        if (saved) {
          const localInstances = JSON.parse(saved);
          setInstances(localInstances);
          onConfigChange(localInstances);
          showToast('Loaded Ollama configuration from local backup', 'warning');
        }
      } catch (localError) {
        console.error('Failed to load from localStorage as fallback:', localError);
      }
    } finally {
      setLoading(false);
    }
  };
  // Save instances to database
  const saveInstances = async (newInstances: OllamaInstance[]) => {
    try {
      setLoading(true);
      await credentialsService.setOllamaInstances(newInstances);
      setInstances(newInstances);
      onConfigChange(newInstances);
      // Also backup to localStorage for fallback
      try {
        localStorage.setItem('ollama-instances', JSON.stringify(newInstances));
      } catch (localError) {
        console.warn('Failed to backup to localStorage:', localError);
      }
    } catch (error) {
      console.error('Failed to save Ollama instances to database:', error);
      showToast('Failed to save Ollama configuration to database', 'error');
    } finally {
      setLoading(false);
    }
  };
  // Test connection to an Ollama instance with retry logic
  const testConnection = async (baseUrl: string, retryCount = 3): Promise<ConnectionTestResult> => {
    const maxRetries = retryCount;
    let lastError: Error | null = null;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const response = await fetch('/api/providers/validate', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            provider: 'ollama',
            base_url: baseUrl
          })
        });
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }
        const data = await response.json();
        const result = {
          isHealthy: data.health_status?.is_available || false,
          responseTimeMs: data.health_status?.response_time_ms,
          modelsAvailable: data.health_status?.models_available,
          error: data.health_status?.error_message
        };
        // If successful, return immediately
        if (result.isHealthy) {
          return result;
        }
        // If not healthy but we got a valid response, still return (but might retry)
        lastError = new Error(result.error || 'Instance not available');
      } catch (error) {
        lastError = error instanceof Error ? error : new Error('Unknown error');
      }
      // If this wasn't the last attempt, wait before retrying
      if (attempt < maxRetries) {
        const delayMs = Math.pow(2, attempt - 1) * 1000; // Exponential backoff: 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
    // All retries failed, return error result
    return {
      isHealthy: false,
      error: lastError?.message || 'Connection failed after retries'
    };
  };
  // Handle connection test for a specific instance
  const handleTestConnection = async (instanceId: string) => {
    const instance = instances.find(inst => inst.id === instanceId);
    if (!instance) return;
    setTestingConnections(prev => new Set(prev).add(instanceId));
    try {
      const result = await testConnection(instance.baseUrl);
      // Update instance with test results
      const updatedInstances = instances.map(inst => 
        inst.id === instanceId 
          ? {
              ...inst,
              isHealthy: result.isHealthy,
              responseTimeMs: result.responseTimeMs,
              modelsAvailable: result.modelsAvailable,
              lastHealthCheck: new Date().toISOString()
            }
          : inst
      );
      saveInstances(updatedInstances);
      if (result.isHealthy) {
        showToast(`Connected to ${instance.name} (${result.responseTimeMs?.toFixed(0)}ms, ${result.modelsAvailable} models)`, 'success');
      } else {
        showToast(result.error || 'Unable to connect to Ollama instance', 'error');
      }
    } catch (error) {
      showToast(`Connection test failed: ${error instanceof Error ? error.message : 'Unknown error'}`, 'error');
    } finally {
      setTestingConnections(prev => {
        const newSet = new Set(prev);
        newSet.delete(instanceId);
        return newSet;
      });
    }
  };
  // Add new instance
  const handleAddInstance = async () => {
    if (!newInstanceUrl.trim() || !newInstanceName.trim()) {
      showToast('Please provide both URL and name for the new instance', 'error');
      return;
    }
    // Validate URL format
    try {
      const url = new URL(newInstanceUrl);
      if (!url.protocol.startsWith('http')) {
        throw new Error('URL must use HTTP or HTTPS protocol');
      }
    } catch (error) {
      showToast('Please provide a valid HTTP/HTTPS URL', 'error');
      return;
    }
    // Check for duplicate URLs
    const isDuplicate = instances.some(inst => inst.baseUrl === newInstanceUrl.trim());
    if (isDuplicate) {
      showToast('An instance with this URL already exists', 'error');
      return;
    }
    const newInstance: OllamaInstance = {
      id: `instance-${Date.now()}`,
      name: newInstanceName.trim(),
      baseUrl: newInstanceUrl.trim(),
      isEnabled: true,
      isPrimary: false,
      loadBalancingWeight: 100,
      instanceType: separateHosts ? newInstanceType : 'both'
    };
    try {
      setLoading(true);
      await credentialsService.addOllamaInstance(newInstance);
      // Reload instances from database to get updated list
      await loadInstances();
      setNewInstanceUrl('');
      setNewInstanceName('');
      setNewInstanceType('chat');
      setShowAddInstance(false);
      showToast(`Added new Ollama instance: ${newInstance.name}`, 'success');
    } catch (error) {
      console.error('Failed to add Ollama instance:', error);
      showToast(`Failed to add Ollama instance: ${error instanceof Error ? error.message : 'Unknown error'}`, 'error');
    } finally {
      setLoading(false);
    }
  };
  // Remove instance
  const handleRemoveInstance = async (instanceId: string) => {
    const instance = instances.find(inst => inst.id === instanceId);
    if (!instance) return;
    // Don't allow removing the last instance
    if (instances.length <= 1) {
      showToast('At least one Ollama instance must be configured', 'error');
      return;
    }
    try {
      setLoading(true);
      await credentialsService.removeOllamaInstance(instanceId);
      // Reload instances from database to get updated list
      await loadInstances();
      showToast(`Removed Ollama instance: ${instance.name}`, 'success');
    } catch (error) {
      console.error('Failed to remove Ollama instance:', error);
      showToast(`Failed to remove Ollama instance: ${error instanceof Error ? error.message : 'Unknown error'}`, 'error');
    } finally {
      setLoading(false);
    }
  };
  // Debounced URL update - only update after user stops typing for 1 second
  const debouncedUpdateInstanceUrl = useCallback(async (instanceId: string, newUrl: string) => {
    try {
      // Clear any existing timeout for this instance
      if (updateTimeouts.current[instanceId]) {
        clearTimeout(updateTimeouts.current[instanceId]);
      }
      // Set new timeout
      updateTimeouts.current[instanceId] = setTimeout(async () => {
        try {
          await credentialsService.updateOllamaInstance(instanceId, { 
            baseUrl: newUrl, 
            isHealthy: undefined, 
            lastHealthCheck: undefined 
          });
          await loadInstances(); // Reload to get updated data
          // Clear the temporary URL after successful update
          setTempUrls(prev => {
            const updated = { ...prev };
            delete updated[instanceId];
            return updated;
          });
          // Connection test removed - only manual testing via "Test" button per user request
        } catch (error) {
          console.error('Failed to update Ollama instance URL:', error);
          showToast('Failed to update instance URL', 'error');
        }
      }, 1000); // 1 second debounce
    } catch (error) {
      console.error('Failed to set up URL update timeout:', error);
    }
  }, [showToast]);
  // Handle immediate URL change (for UI responsiveness) without triggering API calls
  const handleUrlChange = (instanceId: string, newUrl: string) => {
    // Update temporary URL state for immediate UI feedback
    setTempUrls(prev => ({ ...prev, [instanceId]: newUrl }));
    // Trigger debounced update
    debouncedUpdateInstanceUrl(instanceId, newUrl);
  };
  // Handle URL blur - immediately save if there are pending changes
  const handleUrlBlur = async (instanceId: string) => {
    const tempUrl = tempUrls[instanceId];
    const instance = instances.find(inst => inst.id === instanceId);
    if (tempUrl && instance && tempUrl !== instance.baseUrl) {
      // Clear the timeout since we're updating immediately
      if (updateTimeouts.current[instanceId]) {
        clearTimeout(updateTimeouts.current[instanceId]);
        delete updateTimeouts.current[instanceId];
      }
      try {
        await credentialsService.updateOllamaInstance(instanceId, { 
          baseUrl: tempUrl, 
          isHealthy: undefined, 
          lastHealthCheck: undefined 
        });
        await loadInstances();
        // Clear the temporary URL after successful update
        setTempUrls(prev => {
          const updated = { ...prev };
          delete updated[instanceId];
          return updated;
        });
        // Connection test removed - only manual testing via "Test" button per user request
      } catch (error) {
        console.error('Failed to update Ollama instance URL:', error);
        showToast('Failed to update instance URL', 'error');
      }
    }
  };
  // Toggle instance enabled state
  const handleToggleInstance = async (instanceId: string) => {
    const instance = instances.find(inst => inst.id === instanceId);
    if (!instance) return;
    try {
      await credentialsService.updateOllamaInstance(instanceId, { 
        isEnabled: !instance.isEnabled 
      });
      await loadInstances(); // Reload to get updated data
    } catch (error) {
      console.error('Failed to toggle Ollama instance:', error);
      showToast('Failed to toggle instance state', 'error');
    }
  };
  // Set instance as primary
  const handleSetPrimary = async (instanceId: string) => {
    try {
      // Update all instances - only the specified one should be primary
      await saveInstances(instances.map(inst => ({
        ...inst,
        isPrimary: inst.id === instanceId
      })));
    } catch (error) {
      console.error('Failed to set primary Ollama instance:', error);
      showToast('Failed to set primary instance', 'error');
    }
  };
  // Open model discovery modal
  const handleDiscoverModels = () => {
    if (instances.length === 0) {
      showToast('No Ollama instances configured', 'error');
      return;
    }
    const enabledInstances = instances.filter(inst => inst.isEnabled);
    if (enabledInstances.length === 0) {
      showToast('No enabled Ollama instances found', 'error');
      return;
    }
    setShowModelDiscoveryModal(true);
  };
  // Handle model selection from discovery modal
  const handleModelSelection = async (models: { chatModel?: string; embeddingModel?: string }) => {
    try {
      setSelectedChatModel(models.chatModel || null);
      setSelectedEmbeddingModel(models.embeddingModel || null);
      // Store model preferences in localStorage for persistence
      const modelPreferences = {
        chatModel: models.chatModel,
        embeddingModel: models.embeddingModel,
        updatedAt: new Date().toISOString()
      };
      localStorage.setItem('ollama-selected-models', JSON.stringify(modelPreferences));
      let successMessage = 'Model selection updated';
      if (models.chatModel && models.embeddingModel) {
        successMessage = `Selected models: ${models.chatModel} (chat), ${models.embeddingModel} (embedding)`;
      } else if (models.chatModel) {
        successMessage = `Selected chat model: ${models.chatModel}`;
      } else if (models.embeddingModel) {
        successMessage = `Selected embedding model: ${models.embeddingModel}`;
      }
      showToast(successMessage, 'success');
      setShowModelDiscoveryModal(false);
    } catch (error) {
      console.error('Failed to save model selection:', error);
      showToast('Failed to save model selection', 'error');
    }
  };
  // Load instances from database on mount
  useEffect(() => {
    loadInstances();
  }, []); // Empty dependency array - load only on mount
  // Load saved model preferences on mount
  useEffect(() => {
    try {
      const savedPreferences = localStorage.getItem('ollama-selected-models');
      if (savedPreferences) {
        const preferences = JSON.parse(savedPreferences);
        setSelectedChatModel(preferences.chatModel || null);
        setSelectedEmbeddingModel(preferences.embeddingModel || null);
      }
    } catch (error) {
      console.warn('Failed to load saved model preferences:', error);
    }
  }, []);
  // Notify parent of configuration changes
  useEffect(() => {
    onConfigChange(instances);
  }, [instances, onConfigChange]);
  // Note: Auto-testing completely removed to prevent API calls on every keystroke
  // Connection testing now ONLY happens on manual "Test Connection" button clicks
  // No automatic testing on URL changes, saves, or blur events per user request
  // Cleanup timeouts on unmount
  useEffect(() => {
    return () => {
      // Clear all pending timeouts
      Object.values(updateTimeouts.current).forEach(timeout => {
        if (timeout) clearTimeout(timeout);
      });
      updateTimeouts.current = {};
    };
  }, []);
  if (!isVisible) return null;
  const getConnectionStatusBadge = (instance: OllamaInstance) => {
    if (testingConnections.has(instance.id)) {
      return <Badge variant="outline" color="gray" className="animate-pulse">Testing...</Badge>;
    }
    if (instance.isHealthy === true) {
      return (
        <Badge variant="solid" color="green" className="flex items-center gap-1">
          <div className="w-2 h-2 rounded-full bg-green-500 animate-pulse" />
          Online
          {instance.responseTimeMs && (
            <span className="text-xs opacity-75">
              ({instance.responseTimeMs.toFixed(0)}ms)
            </span>
          )}
        </Badge>
      );
    }
    if (instance.isHealthy === false) {
      return (
        <Badge variant="solid" color="pink" className="flex items-center gap-1">
          <div className="w-2 h-2 rounded-full bg-red-500" />
          Offline
        </Badge>
      );
    }
    // For instances that haven't been tested yet (isHealthy === undefined)
    // Show a "checking" status until manually tested via "Test" button
    return (
      <Badge variant="outline" color="blue" className="animate-pulse">
        <div className="w-2 h-2 rounded-full bg-blue-500 animate-ping mr-1" />
        Checking...
      </Badge>
    );
  };
  return (
    <Card 
      accentColor="green" 
      className={cn("mt-4 space-y-4", className)}
    >
      <div className="flex items-center justify-between">
        <div>
          <h3 className="text-lg font-semibold text-gray-900 dark:text-white">
            Ollama Configuration
          </h3>
          <p className="text-sm text-gray-600 dark:text-gray-400">
            Configure Ollama instances for distributed processing
          </p>
        </div>
        <div className="flex items-center gap-2">
          <Button
            variant="outline"
            size="sm"
            onClick={handleDiscoverModels}
            disabled={instances.filter(inst => inst.isEnabled).length === 0}
            className="text-xs"
          >
            {selectedChatModel || selectedEmbeddingModel ? 'Change Models' : 'Select Models'}
          </Button>
          <Badge variant="outline" color="gray" className="text-xs">
            {instances.filter(inst => inst.isEnabled).length} Active
          </Badge>
          {(selectedChatModel || selectedEmbeddingModel) && (
            <div className="flex gap-1">
              {selectedChatModel && (
                <Badge variant="solid" color="blue" className="text-xs">
                  Chat: {selectedChatModel.split(':')[0]}
                </Badge>
              )}
              {selectedEmbeddingModel && (
                <Badge variant="solid" color="purple" className="text-xs">
                  Embed: {selectedEmbeddingModel.split(':')[0]}
                </Badge>
              )}
            </div>
          )}
        </div>
      </div>
      {/* Instance List */}
      <div className="space-y-3">
        {instances.map((instance) => (
          <Card key={instance.id} className="p-4 bg-gray-50 dark:bg-gray-800/50">
            <div className="flex items-start justify-between">
              <div className="flex-1 space-y-2">
                <div className="flex items-center gap-2">
                  <span className="font-medium text-gray-900 dark:text-white">
                    {instance.name}
                  </span>
                  {instance.isPrimary && (
                    <Badge variant="outline" color="gray" className="text-xs">Primary</Badge>
                  )}
                  {instance.instanceType && instance.instanceType !== 'both' && (
                    <Badge 
                      variant="solid" 
                      color={instance.instanceType === 'chat' ? 'blue' : 'purple'}
                      className="text-xs"
                    >
                      {instance.instanceType === 'chat' ? 'Chat' : 'Embedding'}
                    </Badge>
                  )}
                  {(!instance.instanceType || instance.instanceType === 'both') && separateHosts && (
                    <Badge variant="outline" color="gray" className="text-xs">
                      Both
                    </Badge>
                  )}
                  {getConnectionStatusBadge(instance)}
                </div>
                <div className="relative">
                  <Input
                    type="url"
                    value={tempUrls[instance.id] !== undefined ? tempUrls[instance.id] : instance.baseUrl}
                    onChange={(e) => handleUrlChange(instance.id, e.target.value)}
                    onBlur={() => handleUrlBlur(instance.id)}
                    placeholder="http://localhost:11434"
                    className={cn(
                      "text-sm",
                      tempUrls[instance.id] !== undefined && tempUrls[instance.id] !== instance.baseUrl 
                        ? "border-yellow-300 dark:border-yellow-700 bg-yellow-50 dark:bg-yellow-900/20" 
                        : ""
                    )}
                  />
                  {tempUrls[instance.id] !== undefined && tempUrls[instance.id] !== instance.baseUrl && (
                    <div className="absolute right-2 top-1/2 -translate-y-1/2">
                      <div className="w-2 h-2 rounded-full bg-yellow-400 animate-pulse" title="Changes will be saved after you stop typing" />
                    </div>
                  )}
                </div>
                {instance.modelsAvailable !== undefined && (
                  <div className="text-xs text-gray-600 dark:text-gray-400">
                    {instance.modelsAvailable} models available
                  </div>
                )}
              </div>
              <div className="flex items-center gap-2 ml-4">
                <Button
                  variant="outline"
                  size="sm"
                  onClick={() => handleTestConnection(instance.id)}
                  disabled={testingConnections.has(instance.id)}
                  className="text-xs"
                >
                  {testingConnections.has(instance.id) ? 'Testing...' : 'Test'}
                </Button>
                {!instance.isPrimary && (
                  <Button
                    variant="outline"
                    size="sm"
                    onClick={() => handleSetPrimary(instance.id)}
                    className="text-xs"
                  >
                    Set Primary
                  </Button>
                )}
                <Button
                  variant="ghost"
                  size="sm"
                  onClick={() => handleToggleInstance(instance.id)}
                  className={cn(
                    "text-xs",
                    instance.isEnabled 
                      ? "text-green-600 hover:text-green-700" 
                      : "text-gray-500 hover:text-gray-600"
                  )}
                >
                  {instance.isEnabled ? 'Enabled' : 'Disabled'}
                </Button>
                {instances.length > 1 && (
                  <Button
                    variant="ghost"
                    size="sm"
                    onClick={() => handleRemoveInstance(instance.id)}
                    className="text-xs text-red-600 hover:text-red-700"
                  >
                    Remove
                  </Button>
                )}
              </div>
            </div>
          </Card>
        ))}
      </div>
      {/* Add Instance Section */}
      {showAddInstance ? (
        <Card className="p-4 bg-blue-50 dark:bg-blue-900/20 border-blue-200 dark:border-blue-800">
          <div className="space-y-3">
            <h4 className="font-medium text-blue-900 dark:text-blue-100">
              Add New Ollama Instance
            </h4>
            <div className="grid grid-cols-1 md:grid-cols-2 gap-3">
              <Input
                type="text"
                placeholder="Instance Name"
                value={newInstanceName}
                onChange={(e) => setNewInstanceName(e.target.value)}
              />
              <Input
                type="url"
                placeholder="http://localhost:11434"
                value={newInstanceUrl}
                onChange={(e) => setNewInstanceUrl(e.target.value)}
              />
            </div>
            {separateHosts && (
              <div className="space-y-2">
                <label className="text-sm font-medium text-blue-900 dark:text-blue-100">
                  Instance Type
                </label>
                <div className="flex gap-2">
                  <Button
                    variant={newInstanceType === 'chat' ? 'solid' : 'outline'}
                    size="sm"
                    onClick={() => setNewInstanceType('chat')}
                    className={cn(
                      newInstanceType === 'chat' 
                        ? 'bg-blue-600 text-white' 
                        : 'text-blue-600 border-blue-600'
                    )}
                  >
                    LLM Chat
                  </Button>
                  <Button
                    variant={newInstanceType === 'embedding' ? 'solid' : 'outline'}
                    size="sm"
                    onClick={() => setNewInstanceType('embedding')}
                    className={cn(
                      newInstanceType === 'embedding' 
                        ? 'bg-blue-600 text-white' 
                        : 'text-blue-600 border-blue-600'
                    )}
                  >
                    Embedding
                  </Button>
                </div>
              </div>
            )}
            <div className="flex gap-2">
              <Button
                size="sm"
                onClick={handleAddInstance}
                className="bg-blue-600 hover:bg-blue-700"
              >
                Add Instance
              </Button>
              <Button
                variant="outline"
                size="sm"
                onClick={() => {
                  setShowAddInstance(false);
                  setNewInstanceUrl('');
                  setNewInstanceName('');
                  setNewInstanceType('chat');
                }}
              >
                Cancel
              </Button>
            </div>
          </div>
        </Card>
      ) : (
        <Button
          variant="outline"
          onClick={() => setShowAddInstance(true)}
          className="w-full border-dashed border-2 border-gray-300 dark:border-gray-600 hover:border-gray-400 dark:hover:border-gray-500"
        >
          <span className="text-gray-600 dark:text-gray-400">+ Add Ollama Instance</span>
        </Button>
      )}
      {/* Selected Models Summary for Dual-Host Mode */}
      {separateHosts && (selectedChatModel || selectedEmbeddingModel) && (
        <Card className="p-4 bg-blue-50 dark:bg-blue-900/20 border-blue-200 dark:border-blue-800">
          <h4 className="font-medium text-blue-900 dark:text-blue-100 mb-3">
            Model Assignment Summary
          </h4>
          <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
            {selectedChatModel && (
              <div className="flex items-center justify-between p-3 bg-blue-100 dark:bg-blue-800/30 rounded">
                <div>
                  <div className="font-medium text-blue-900 dark:text-blue-100">
                    Chat Model
                  </div>
                  <div className="text-sm text-blue-700 dark:text-blue-300">
                    {selectedChatModel}
                  </div>
                </div>
                <Badge variant="solid" color="blue">
                  {instances.filter(inst => inst.instanceType === 'chat' || inst.instanceType === 'both').length} hosts
                </Badge>
              </div>
            )}
            {selectedEmbeddingModel && (
              <div className="flex items-center justify-between p-3 bg-purple-100 dark:bg-purple-800/30 rounded">
                <div>
                  <div className="font-medium text-purple-900 dark:text-purple-100">
                    Embedding Model
                  </div>
                  <div className="text-sm text-purple-700 dark:text-purple-300">
                    {selectedEmbeddingModel}
                  </div>
                </div>
                <Badge variant="solid" color="purple">
                  {instances.filter(inst => inst.instanceType === 'embedding' || inst.instanceType === 'both').length} hosts
                </Badge>
              </div>
            )}
          </div>
          {(!selectedChatModel || !selectedEmbeddingModel) && (
            <div className="mt-3 text-xs text-blue-700 dark:text-blue-300 bg-blue-100 dark:bg-blue-900/30 p-2 rounded">
              <strong>Tip:</strong> {!selectedChatModel && !selectedEmbeddingModel ? 'Select both chat and embedding models for optimal performance' : !selectedChatModel ? 'Consider selecting a chat model for LLM operations' : 'Consider selecting an embedding model for vector operations'}
            </div>
          )}
        </Card>
      )}
      {/* Configuration Summary */}
      <div className="pt-4 border-t border-gray-200 dark:border-gray-700">
        <div className="text-xs text-gray-600 dark:text-gray-400 space-y-1">
          <div className="flex justify-between">
            <span>Total Instances:</span>
            <span className="font-mono">{instances.length}</span>
          </div>
          <div className="flex justify-between">
            <span>Active Instances:</span>
            <span className="font-mono text-green-600 dark:text-green-400">
              {instances.filter(inst => inst.isEnabled && inst.isHealthy).length}
            </span>
          </div>
          <div className="flex justify-between">
            <span>Load Balancing:</span>
            <span className="font-mono">
              {instances.filter(inst => inst.isEnabled).length > 1 ? 'Enabled' : 'Disabled'}
            </span>
          </div>
          {(selectedChatModel || selectedEmbeddingModel) && (
            <div className="flex justify-between">
              <span>Selected Models:</span>
              <span className="font-mono text-green-600 dark:text-green-400">
                {[selectedChatModel, selectedEmbeddingModel].filter(Boolean).length}
              </span>
            </div>
          )}
          {separateHosts && (
            <div className="flex justify-between">
              <span>Dual-Host Mode:</span>
              <span className="font-mono text-blue-600 dark:text-blue-400">
                Enabled
              </span>
            </div>
          )}
        </div>
      </div>
      {/* Model Discovery Modal */}
      <OllamaModelDiscoveryModal
        isOpen={showModelDiscoveryModal}
        onClose={() => setShowModelDiscoveryModal(false)}
        onSelectModels={handleModelSelection}
        instances={instances.filter(inst => inst.isEnabled).map(inst => ({
          id: inst.id,
          name: inst.name,
          baseUrl: inst.baseUrl,
          instanceType: inst.instanceType || 'both',
          isEnabled: inst.isEnabled,
          isPrimary: inst.isPrimary,
          healthStatus: {
            isHealthy: inst.isHealthy || false,
            lastChecked: inst.lastHealthCheck ? new Date(inst.lastHealthCheck) : new Date(),
            responseTimeMs: inst.responseTimeMs,
            error: inst.isHealthy === false ? 'Connection failed' : undefined
          },
          loadBalancingWeight: inst.loadBalancingWeight,
          lastHealthCheck: inst.lastHealthCheck,
          modelsAvailable: inst.modelsAvailable,
          responseTimeMs: inst.responseTimeMs
        }))}
      />
    </Card>
  );
 };
 export default OllamaConfigurationPanel;
--- a/archon-ui-main/src/components/settings/OllamaInstanceHealthIndicator.tsx
+++ b/archon-ui-main/src/components/settings/OllamaInstanceHealthIndicator.tsx
@ -0,0 +1,288 @@
 import React, { useState } from 'react';
 import { Badge } from '../ui/Badge';
 import { Button } from '../ui/Button';
 import { Card } from '../ui/Card';
 import { cn } from '../../lib/utils';
 import { useToast } from '../../features/ui/hooks/useToast';
 import { ollamaService } from '../../services/ollamaService';
 import type { HealthIndicatorProps } from './types/OllamaTypes';
 /**
 * Health indicator component for individual Ollama instances
 * 
 * Displays real-time health status with refresh capabilities
 * and detailed error information when instances are unhealthy.
 */
 export const OllamaInstanceHealthIndicator: React.FC<HealthIndicatorProps> = ({
  instance,
  onRefresh,
  showDetails = true
 }) => {
  const [isRefreshing, setIsRefreshing] = useState(false);
  const { showToast } = useToast();
  const handleRefresh = async () => {
    if (isRefreshing) return;
    setIsRefreshing(true);
    try {
      // Use the ollamaService to test the connection
      const healthResult = await ollamaService.testConnection(instance.baseUrl);
      // Notify parent component of the refresh result
      onRefresh(instance.id);
      if (healthResult.isHealthy) {
        showToast(
          `Health check successful for ${instance.name} (${healthResult.responseTime?.toFixed(0)}ms)`,
          'success'
        );
      } else {
        showToast(
          `Health check failed for ${instance.name}: ${healthResult.error}`,
          'error'
        );
      }
    } catch (error) {
      console.error('Health check failed:', error);
      showToast(
        `Failed to check health for ${instance.name}: ${error instanceof Error ? error.message : 'Unknown error'}`,
        'error'
      );
    } finally {
      setIsRefreshing(false);
    }
  };
  const getHealthStatusBadge = () => {
    if (isRefreshing) {
      return (
        <Badge variant="outline" className="animate-pulse">
          <div className="w-2 h-2 rounded-full bg-gray-500 animate-ping mr-1" />
          Checking...
        </Badge>
      );
    }
    if (instance.healthStatus.isHealthy === true) {
      return (
        <Badge 
          variant="solid" 
          className="flex items-center gap-1 bg-green-100 text-green-800 border-green-200 dark:bg-green-900 dark:text-green-100 dark:border-green-700"
        >
          <div className="w-2 h-2 rounded-full bg-green-500 animate-pulse" />
          Online
        </Badge>
      );
    }
    if (instance.healthStatus.isHealthy === false) {
      return (
        <Badge 
          variant="solid" 
          className="flex items-center gap-1 bg-red-100 text-red-800 border-red-200 dark:bg-red-900 dark:text-red-100 dark:border-red-700"
        >
          <div className="w-2 h-2 rounded-full bg-red-500" />
          Offline
        </Badge>
      );
    }
    // For instances that haven't been tested yet (isHealthy === undefined)
    return (
      <Badge 
        variant="outline" 
        className="animate-pulse flex items-center gap-1 bg-blue-50 text-blue-800 border-blue-200 dark:bg-blue-900 dark:text-blue-100 dark:border-blue-700"
      >
        <div className="w-2 h-2 rounded-full bg-blue-500 animate-ping" />
        Checking...
      </Badge>
    );
  };
  const getInstanceTypeIcon = () => {
    switch (instance.instanceType) {
      case 'chat':
        return '💬';
      case 'embedding':
        return '🔢';
      case 'both':
        return '🔄';
      default:
        return '🤖';
    }
  };
  const formatLastChecked = (date: Date) => {
    const now = new Date();
    const diffMs = now.getTime() - date.getTime();
    const diffMins = Math.floor(diffMs / (1000 * 60));
    const diffHours = Math.floor(diffMs / (1000 * 60 * 60));
    const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24));
    if (diffMins < 1) return 'Just now';
    if (diffMins < 60) return `${diffMins}m ago`;
    if (diffHours < 24) return `${diffHours}h ago`;
    return `${diffDays}d ago`;
  };
  if (!showDetails) {
    // Compact mode - just the status badge and refresh button
    return (
      <div className="flex items-center gap-2">
        {getHealthStatusBadge()}
        <Button
          variant="ghost"
          size="sm"
          onClick={handleRefresh}
          disabled={isRefreshing}
          className="p-1 h-6 w-6"
          title={`Refresh health status for ${instance.name}`}
        >
          <svg
            className={cn("w-3 h-3", isRefreshing && "animate-spin")}
            fill="none"
            stroke="currentColor"
            viewBox="0 0 24 24"
          >
            <path
              strokeLinecap="round"
              strokeLinejoin="round"
              strokeWidth={2}
              d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15"
            />
          </svg>
        </Button>
      </div>
    );
  }
  // Full detailed mode
  return (
    <Card className="p-3 bg-gray-50 dark:bg-gray-800/50">
      <div className="flex items-center justify-between mb-2">
        <div className="flex items-center gap-2">
          <span className="text-lg" title={`Instance type: ${instance.instanceType}`}>
            {getInstanceTypeIcon()}
          </span>
          <div>
            <div className="font-medium text-gray-900 dark:text-white text-sm">
              {instance.name}
            </div>
            <div className="text-xs text-gray-500 dark:text-gray-400 font-mono">
              {new URL(instance.baseUrl).host}
            </div>
          </div>
        </div>
        <div className="flex items-center gap-2">
          {getHealthStatusBadge()}
          <Button
            variant="ghost"
            size="sm"
            onClick={handleRefresh}
            disabled={isRefreshing}
            className="p-1"
            title={`Refresh health status for ${instance.name}`}
          >
            <svg
              className={cn("w-4 h-4", isRefreshing && "animate-spin")}
              fill="none"
              stroke="currentColor"
              viewBox="0 0 24 24"
            >
              <path
                strokeLinecap="round"
                strokeLinejoin="round"
                strokeWidth={2}
                d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15"
              />
            </svg>
          </Button>
        </div>
      </div>
      {/* Health Details */}
      <div className="space-y-2">
        {instance.healthStatus.isHealthy && (
          <div className="grid grid-cols-2 gap-4 text-xs">
            {instance.healthStatus.responseTimeMs && (
              <div className="flex justify-between">
                <span className="text-gray-600 dark:text-gray-400">Response Time:</span>
                <span className={cn(
                  "font-mono",
                  instance.healthStatus.responseTimeMs < 100 
                    ? "text-green-600 dark:text-green-400"
                    : instance.healthStatus.responseTimeMs < 500
                    ? "text-yellow-600 dark:text-yellow-400"
                    : "text-red-600 dark:text-red-400"
                )}>
                  {instance.healthStatus.responseTimeMs.toFixed(0)}ms
                </span>
              </div>
            )}
            {instance.modelsAvailable !== undefined && (
              <div className="flex justify-between">
                <span className="text-gray-600 dark:text-gray-400">Models:</span>
                <span className="font-mono text-blue-600 dark:text-blue-400">
                  {instance.modelsAvailable}
                </span>
              </div>
            )}
          </div>
        )}
        {/* Error Details */}
        {!instance.healthStatus.isHealthy && instance.healthStatus.error && (
          <div className="p-2 bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded text-xs">
            <div className="font-medium text-red-800 dark:text-red-200 mb-1">
              Connection Error:
            </div>
            <div className="text-red-600 dark:text-red-300 font-mono">
              {instance.healthStatus.error}
            </div>
          </div>
        )}
        {/* Instance Configuration */}
        <div className="flex items-center justify-between text-xs">
          <div className="flex items-center gap-2">
            {instance.isPrimary && (
              <Badge variant="outline" className="text-xs">
                Primary
              </Badge>
            )}
            {instance.instanceType !== 'both' && (
              <Badge 
                variant="solid" 
                className={cn(
                  "text-xs",
                  instance.instanceType === 'chat'
                    ? "bg-blue-100 text-blue-800 border-blue-200 dark:bg-blue-900 dark:text-blue-100"
                    : "bg-purple-100 text-purple-800 border-purple-200 dark:bg-purple-900 dark:text-purple-100"
                )}
              >
                {instance.instanceType}
              </Badge>
            )}
          </div>
          <div className="text-gray-500 dark:text-gray-400">
            Last checked: {formatLastChecked(instance.healthStatus.lastChecked)}
          </div>
        </div>
        {/* Load Balancing Weight */}
        {instance.loadBalancingWeight !== undefined && instance.loadBalancingWeight !== 100 && (
          <div className="text-xs text-gray-600 dark:text-gray-400">
            Load balancing weight: {instance.loadBalancingWeight}%
          </div>
        )}
      </div>
    </Card>
  );
 };
 export default OllamaInstanceHealthIndicator;
--- a/archon-ui-main/src/components/settings/OllamaModelDiscoveryModal.tsx
+++ b/archon-ui-main/src/components/settings/OllamaModelDiscoveryModal.tsx
@ -0,0 +1,893 @@
 import React, { useState, useEffect, useMemo, useCallback } from 'react';
 // FORCE DEBUG - This should ALWAYS appear in console when this file loads
 console.log('🚨 DEBUG: OllamaModelDiscoveryModal.tsx file loaded at', new Date().toISOString());
 import { 
  X, Search, Activity, Database, Zap, Clock, Server, 
  Loader, CheckCircle, AlertCircle, Filter, Download,
  MessageCircle, Layers, Cpu, HardDrive
 } from 'lucide-react';
 import { motion, AnimatePresence } from 'framer-motion';
 import { createPortal } from 'react-dom';
 import { Button } from '../ui/Button';
 import { Input } from '../ui/Input';
 import { Badge } from '../ui/Badge';
 import { Card } from '../ui/Card';
 import { useToast } from '../../features/ui/hooks/useToast';
 import { ollamaService, type OllamaModel, type ModelDiscoveryResponse } from '../../services/ollamaService';
 import type { OllamaInstance, ModelSelectionState } from './types/OllamaTypes';
 interface OllamaModelDiscoveryModalProps {
  isOpen: boolean;
  onClose: () => void;
  onSelectModels: (selection: { chatModel?: string; embeddingModel?: string }) => void;
  instances: OllamaInstance[];
  initialChatModel?: string;
  initialEmbeddingModel?: string;
 }
 interface EnrichedModel extends OllamaModel {
  instanceName?: string;
  status: 'available' | 'testing' | 'error';
  testResult?: {
    chatWorks: boolean;
    embeddingWorks: boolean;
    dimensions?: number;
  };
 }
 const OllamaModelDiscoveryModal: React.FC<OllamaModelDiscoveryModalProps> = ({
  isOpen,
  onClose,
  onSelectModels,
  instances,
  initialChatModel,
  initialEmbeddingModel
 }) => {
  console.log('🔴 COMPONENT DEBUG: OllamaModelDiscoveryModal component loaded/rendered', { isOpen });
  const [models, setModels] = useState<EnrichedModel[]>([]);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [discoveryComplete, setDiscoveryComplete] = useState(false);
  const [discoveryProgress, setDiscoveryProgress] = useState<string>('');
  const [lastDiscoveryTime, setLastDiscoveryTime] = useState<number | null>(null);
  const [hasCache, setHasCache] = useState(false);
  const [selectionState, setSelectionState] = useState<ModelSelectionState>({
    selectedChatModel: initialChatModel || null,
    selectedEmbeddingModel: initialEmbeddingModel || null,
    filterText: '',
    showOnlyEmbedding: false,
    showOnlyChat: false,
    sortBy: 'name'
  });
  const [testingModels, setTestingModels] = useState<Set<string>>(new Set());
  const { showToast } = useToast();
  // Get enabled instance URLs
  const enabledInstanceUrls = useMemo(() => {
    return instances
      .filter(instance => instance.isEnabled)
      .map(instance => instance.baseUrl);
  }, [instances]);
  // Create instance lookup map
  const instanceLookup = useMemo(() => {
    const lookup: Record<string, OllamaInstance> = {};
    instances.forEach(instance => {
      lookup[instance.baseUrl] = instance;
    });
    return lookup;
  }, [instances]);
  // Generate cache key based on enabled instances
  const cacheKey = useMemo(() => {
    const sortedUrls = [...enabledInstanceUrls].sort();
    const key = `ollama-models-${sortedUrls.join('|')}`;
    console.log('🟡 CACHE KEY DEBUG: Generated cache key', {
      key,
      enabledInstanceUrls,
      sortedUrls
    });
    return key;
  }, [enabledInstanceUrls]);
  // Save models to localStorage
  const saveModelsToCache = useCallback((modelsToCache: EnrichedModel[]) => {
    try {
      console.log('🟡 CACHE DEBUG: Attempting to save models to cache', {
        cacheKey,
        modelCount: modelsToCache.length,
        instanceUrls: enabledInstanceUrls,
        timestamp: Date.now()
      });
      const cacheData = {
        models: modelsToCache,
        timestamp: Date.now(),
        instanceUrls: enabledInstanceUrls
      };
      localStorage.setItem(cacheKey, JSON.stringify(cacheData));
      setLastDiscoveryTime(Date.now());
      setHasCache(true);
      console.log('🟢 CACHE DEBUG: Successfully saved models to cache', {
        cacheKey,
        modelCount: modelsToCache.length,
        cacheSize: JSON.stringify(cacheData).length,
        storedInLocalStorage: !!localStorage.getItem(cacheKey)
      });
    } catch (error) {
      console.error('🔴 CACHE DEBUG: Failed to save models to cache:', error);
    }
  }, [cacheKey, enabledInstanceUrls]);
  // Load models from localStorage
  const loadModelsFromCache = useCallback(() => {
    console.log('🟡 CACHE DEBUG: Attempting to load models from cache', {
      cacheKey,
      enabledInstanceUrls,
      hasLocalStorageItem: !!localStorage.getItem(cacheKey)
    });
    try {
      const cached = localStorage.getItem(cacheKey);
      if (cached) {
        console.log('🟡 CACHE DEBUG: Found cached data', {
          cacheKey,
          cacheSize: cached.length
        });
        const cacheData = JSON.parse(cached);
        const cacheAge = Date.now() - cacheData.timestamp;
        const cacheAgeMinutes = Math.floor(cacheAge / (60 * 1000));
        console.log('🟡 CACHE DEBUG: Cache data parsed', {
          modelCount: cacheData.models?.length,
          timestamp: cacheData.timestamp,
          cacheAge,
          cacheAgeMinutes,
          cachedInstanceUrls: cacheData.instanceUrls,
          currentInstanceUrls: enabledInstanceUrls
        });
        // Use cache if less than 10 minutes old and same instances
        const instanceUrlsMatch = JSON.stringify(cacheData.instanceUrls?.sort()) === JSON.stringify([...enabledInstanceUrls].sort());
        const isCacheValid = cacheAge < 10 * 60 * 1000 && instanceUrlsMatch;
        console.log('🟡 CACHE DEBUG: Cache validation', {
          isCacheValid,
          cacheAge: cacheAge,
          maxAge: 10 * 60 * 1000,
          instanceUrlsMatch,
          cachedUrls: JSON.stringify(cacheData.instanceUrls?.sort()),
          currentUrls: JSON.stringify([...enabledInstanceUrls].sort())
        });
        if (isCacheValid) {
          console.log('🟢 CACHE DEBUG: Using cached models', {
            modelCount: cacheData.models.length,
            timestamp: cacheData.timestamp
          });
          setModels(cacheData.models);
          setDiscoveryComplete(true);
          setLastDiscoveryTime(cacheData.timestamp);
          setHasCache(true);
          setDiscoveryProgress(`Loaded ${cacheData.models.length} cached models`);
          return true;
        } else {
          console.log('🟠 CACHE DEBUG: Cache invalid - will refresh', {
            reason: cacheAge >= 10 * 60 * 1000 ? 'expired' : 'different instances'
          });
        }
      } else {
        console.log('🟠 CACHE DEBUG: No cached data found for key:', cacheKey);
      }
    } catch (error) {
      console.error('🔴 CACHE DEBUG: Failed to load cached models:', error);
    }
    return false;
  }, [cacheKey, enabledInstanceUrls]);
  // Test localStorage functionality (run once when component mounts)
  useEffect(() => {
    const testLocalStorage = () => {
      try {
        const testKey = 'ollama-test-key';
        const testData = { test: 'localStorage working', timestamp: Date.now() };
        console.log('🔧 LOCALSTORAGE DEBUG: Testing localStorage functionality');
        localStorage.setItem(testKey, JSON.stringify(testData));
        const retrieved = localStorage.getItem(testKey);
        const parsed = retrieved ? JSON.parse(retrieved) : null;
        console.log('🟢 LOCALSTORAGE DEBUG: localStorage test successful', {
          saved: testData,
          retrieved: parsed,
          working: !!parsed && parsed.test === testData.test
        });
        localStorage.removeItem(testKey);
      } catch (error) {
        console.error('🔴 LOCALSTORAGE DEBUG: localStorage test failed', error);
      }
    };
    testLocalStorage();
  }, []); // Run once on mount
  // Check cache when modal opens or instances change
  useEffect(() => {
    if (isOpen && enabledInstanceUrls.length > 0) {
      console.log('🟡 MODAL DEBUG: Modal opened, checking cache', {
        isOpen,
        enabledInstanceUrls,
        instanceUrlsCount: enabledInstanceUrls.length
      });
      loadModelsFromCache(); // Progress message is set inside this function
    } else {
      console.log('🟡 MODAL DEBUG: Modal state change', {
        isOpen,
        enabledInstanceUrlsCount: enabledInstanceUrls.length
      });
    }
  }, [isOpen, enabledInstanceUrls, loadModelsFromCache]);
  // Discover models when modal opens
  const discoverModels = useCallback(async (forceRefresh: boolean = false) => {
    console.log('🚨 DISCOVERY DEBUG: discoverModels FUNCTION CALLED', {
      forceRefresh,
      enabledInstanceUrls,
      instanceUrlsCount: enabledInstanceUrls.length,
      timestamp: new Date().toISOString(),
      callStack: new Error().stack?.split('\n').slice(0, 3)
    });
    console.log('🟡 DISCOVERY DEBUG: Starting model discovery', {
      forceRefresh,
      enabledInstanceUrls,
      instanceUrlsCount: enabledInstanceUrls.length,
      timestamp: new Date().toISOString()
    });
    if (enabledInstanceUrls.length === 0) {
      console.log('🔴 DISCOVERY DEBUG: No enabled instances');
      setError('No enabled Ollama instances configured');
      return;
    }
    // Check cache first if not forcing refresh
    if (!forceRefresh) {
      console.log('🟡 DISCOVERY DEBUG: Checking cache before discovery');
      const loaded = loadModelsFromCache();
      if (loaded) {
        console.log('🟢 DISCOVERY DEBUG: Used cached models, skipping API call');
        return; // Progress message already set by loadModelsFromCache
      }
      console.log('🟡 DISCOVERY DEBUG: No valid cache, proceeding with API discovery');
    } else {
      console.log('🟡 DISCOVERY DEBUG: Force refresh requested, skipping cache');
    }
    const discoveryStartTime = Date.now();
    console.log('🟡 DISCOVERY DEBUG: Starting API discovery at', new Date(discoveryStartTime).toISOString());
    setLoading(true);
    setError(null);
    setDiscoveryComplete(false);
    setDiscoveryProgress(`Discovering models from ${enabledInstanceUrls.length} instance(s)...`);
    try {
      // Discover models (no timeout - let it complete naturally)
      console.log('🚨 DISCOVERY DEBUG: About to call ollamaService.discoverModels', {
        instanceUrls: enabledInstanceUrls,
        includeCapabilities: true,
        timestamp: new Date().toISOString()
      });
      const discoveryResult = await ollamaService.discoverModels({
        instanceUrls: enabledInstanceUrls,
        includeCapabilities: true
      });
      console.log('🚨 DISCOVERY DEBUG: ollamaService.discoverModels returned', {
        totalModels: discoveryResult.total_models,
        chatModelsCount: discoveryResult.chat_models?.length,
        embeddingModelsCount: discoveryResult.embedding_models?.length,
        hostStatusCount: Object.keys(discoveryResult.host_status || {}).length,
        timestamp: new Date().toISOString()
      });
      const discoveryEndTime = Date.now();
      const discoveryDuration = discoveryEndTime - discoveryStartTime;
      console.log('🟢 DISCOVERY DEBUG: API discovery completed', {
        duration: discoveryDuration,
        durationSeconds: (discoveryDuration / 1000).toFixed(1),
        totalModels: discoveryResult.total_models,
        chatModels: discoveryResult.chat_models.length,
        embeddingModels: discoveryResult.embedding_models.length,
        hostStatus: Object.keys(discoveryResult.host_status).length,
        errors: discoveryResult.discovery_errors.length
      });
      // Enrich models with instance information and status
      const enrichedModels: EnrichedModel[] = [];
      // Process chat models
      discoveryResult.chat_models.forEach(chatModel => {
        const instance = instanceLookup[chatModel.instance_url];
        const enriched: EnrichedModel = {
          name: chatModel.name,
          tag: chatModel.name,
          size: chatModel.size,
          digest: '',
          capabilities: ['chat'],
          instance_url: chatModel.instance_url,
          instanceName: instance?.name || 'Unknown',
          status: 'available',
          parameters: chatModel.parameters
        };
        enrichedModels.push(enriched);
      });
      // Process embedding models
      discoveryResult.embedding_models.forEach(embeddingModel => {
        const instance = instanceLookup[embeddingModel.instance_url];
        // Check if we already have this model (might support both chat and embedding)
        const existingModel = enrichedModels.find(m => 
          m.name === embeddingModel.name && m.instance_url === embeddingModel.instance_url
        );
        if (existingModel) {
          // Add embedding capability
          existingModel.capabilities.push('embedding');
          existingModel.embedding_dimensions = embeddingModel.dimensions;
        } else {
          // Create new model entry
          const enriched: EnrichedModel = {
            name: embeddingModel.name,
            tag: embeddingModel.name,
            size: embeddingModel.size,
            digest: '',
            capabilities: ['embedding'],
            embedding_dimensions: embeddingModel.dimensions,
            instance_url: embeddingModel.instance_url,
            instanceName: instance?.name || 'Unknown',
            status: 'available'
          };
          enrichedModels.push(enriched);
        }
      });
      console.log('🚨 DISCOVERY DEBUG: About to call setModels', {
        enrichedModelsCount: enrichedModels.length,
        enrichedModels: enrichedModels.map(m => ({ name: m.name, capabilities: m.capabilities })),
        timestamp: new Date().toISOString()
      });
      setModels(enrichedModels);
      setDiscoveryComplete(true);
      console.log('🚨 DISCOVERY DEBUG: Called setModels and setDiscoveryComplete', {
        enrichedModelsCount: enrichedModels.length,
        timestamp: new Date().toISOString()
      });
      // Cache the discovered models
      saveModelsToCache(enrichedModels);
      showToast(
        `Discovery complete: Found ${discoveryResult.total_models} models across ${Object.keys(discoveryResult.host_status).length} instances`,
        'success'
      );
      if (discoveryResult.discovery_errors.length > 0) {
        showToast(`Some hosts had errors: ${discoveryResult.discovery_errors.length} issues`, 'warning');
      }
    } catch (err) {
      const errorMsg = err instanceof Error ? err.message : 'Unknown error occurred';
      setError(errorMsg);
      showToast(`Model discovery failed: ${errorMsg}`, 'error');
    } finally {
      setLoading(false);
    }
  }, [enabledInstanceUrls, instanceLookup, showToast, loadModelsFromCache, saveModelsToCache]);
  // Test model capabilities
  const testModelCapabilities = useCallback(async (model: EnrichedModel) => {
    const modelKey = `${model.name}@${model.instance_url}`;
    setTestingModels(prev => new Set(prev).add(modelKey));
    try {
      const capabilities = await ollamaService.getModelCapabilities(model.name, model.instance_url);
      const testResult = {
        chatWorks: capabilities.supports_chat,
        embeddingWorks: capabilities.supports_embedding,
        dimensions: capabilities.embedding_dimensions
      };
      setModels(prevModels => 
        prevModels.map(m => 
          m.name === model.name && m.instance_url === model.instance_url
            ? { ...m, testResult, status: 'available' as const }
            : m
        )
      );
      if (capabilities.error) {
        showToast(`Model test completed with warnings: ${capabilities.error}`, 'warning');
      } else {
        showToast(`Model ${model.name} tested successfully`, 'success');
      }
    } catch (error) {
      setModels(prevModels => 
        prevModels.map(m => 
          m.name === model.name && m.instance_url === model.instance_url
            ? { ...m, status: 'error' as const }
            : m
        )
      );
      showToast(`Failed to test ${model.name}: ${error instanceof Error ? error.message : 'Unknown error'}`, 'error');
    } finally {
      setTestingModels(prev => {
        const newSet = new Set(prev);
        newSet.delete(modelKey);
        return newSet;
      });
    }
  }, [showToast]);
  // Filter and sort models
  const filteredAndSortedModels = useMemo(() => {
    console.log('🚨 FILTERING DEBUG: filteredAndSortedModels useMemo running', {
      modelsLength: models.length,
      models: models.map(m => ({ name: m.name, capabilities: m.capabilities })),
      selectionState,
      timestamp: new Date().toISOString()
    });
    let filtered = models.filter(model => {
      // Text filter
      if (selectionState.filterText && !model.name.toLowerCase().includes(selectionState.filterText.toLowerCase())) {
        return false;
      }
      // Capability filters
      if (selectionState.showOnlyChat && !model.capabilities.includes('chat')) {
        return false;
      }
      if (selectionState.showOnlyEmbedding && !model.capabilities.includes('embedding')) {
        return false;
      }
      return true;
    });
    // Sort models
    filtered.sort((a, b) => {
      switch (selectionState.sortBy) {
        case 'name':
          return a.name.localeCompare(b.name);
        case 'size':
          return b.size - a.size;
        case 'instance':
          return (a.instanceName || '').localeCompare(b.instanceName || '');
        default:
          return 0;
      }
    });
    console.log('🚨 FILTERING DEBUG: filteredAndSortedModels result', {
      originalCount: models.length,
      filteredCount: filtered.length,
      filtered: filtered.map(m => ({ name: m.name, capabilities: m.capabilities })),
      timestamp: new Date().toISOString()
    });
    return filtered;
  }, [models, selectionState]);
  // Handle model selection
  const handleModelSelect = (model: EnrichedModel, type: 'chat' | 'embedding') => {
    if (type === 'chat' && !model.capabilities.includes('chat')) {
      showToast(`Model ${model.name} does not support chat functionality`, 'error');
      return;
    }
    if (type === 'embedding' && !model.capabilities.includes('embedding')) {
      showToast(`Model ${model.name} does not support embedding functionality`, 'error');
      return;
    }
    setSelectionState(prev => ({
      ...prev,
      [type === 'chat' ? 'selectedChatModel' : 'selectedEmbeddingModel']: model.name
    }));
  };
  // Apply selections and close modal
  const handleApplySelection = () => {
    onSelectModels({
      chatModel: selectionState.selectedChatModel || undefined,
      embeddingModel: selectionState.selectedEmbeddingModel || undefined
    });
    onClose();
  };
  // Reset modal state when closed
  const handleClose = () => {
    setSelectionState({
      selectedChatModel: initialChatModel || null,
      selectedEmbeddingModel: initialEmbeddingModel || null,
      filterText: '',
      showOnlyEmbedding: false,
      showOnlyChat: false,
      sortBy: 'name'
    });
    setError(null);
    onClose();
  };
  // Auto-discover when modal opens (only if no cache available)
  useEffect(() => {
    console.log('🟡 AUTO-DISCOVERY DEBUG: useEffect triggered', {
      isOpen,
      discoveryComplete,
      loading,
      hasCache,
      willAutoDiscover: isOpen && !discoveryComplete && !loading && !hasCache
    });
    if (isOpen && !discoveryComplete && !loading && !hasCache) {
      console.log('🟢 AUTO-DISCOVERY DEBUG: Starting auto-discovery');
      discoverModels();
    } else {
      console.log('🟠 AUTO-DISCOVERY DEBUG: Skipping auto-discovery', {
        reason: !isOpen ? 'modal closed' : 
                discoveryComplete ? 'already complete' :
                loading ? 'already loading' :
                hasCache ? 'has cache' : 'unknown'
      });
    }
  }, [isOpen, discoveryComplete, loading, hasCache, discoverModels]);
  if (!isOpen) return null;
  const modalContent = (
    <AnimatePresence>
      <motion.div
        initial={{ opacity: 0 }}
        animate={{ opacity: 1 }}
        exit={{ opacity: 0 }}
        className="fixed inset-0 z-50 flex items-center justify-center bg-black/50 backdrop-blur-sm"
        onClick={(e) => {
          if (e.target === e.currentTarget) handleClose();
        }}
      >
        <motion.div
          initial={{ opacity: 0, scale: 0.95, y: 20 }}
          animate={{ opacity: 1, scale: 1, y: 0 }}
          exit={{ opacity: 0, scale: 0.95, y: 20 }}
          className="w-full max-w-4xl max-h-[85vh] mx-4 bg-white dark:bg-gray-900 rounded-xl shadow-2xl overflow-hidden"
          onClick={(e) => e.stopPropagation()}
        >
          {/* Header */}
          <div className="border-b border-gray-200 dark:border-gray-700 p-6">
            <div className="flex items-center justify-between">
              <div>
                <h2 className="text-2xl font-bold text-gray-900 dark:text-white flex items-center gap-2">
                  <Database className="w-6 h-6 text-green-500" />
                  Ollama Model Discovery
                </h2>
                <p className="text-sm text-gray-600 dark:text-gray-400 mt-1">
                  Discover and select models from your Ollama instances
                  {hasCache && lastDiscoveryTime && (
                    <span className="ml-2 text-green-600 dark:text-green-400">
                      (Cached {new Date(lastDiscoveryTime).toLocaleTimeString()})
                    </span>
                  )}
                </p>
              </div>
              <Button
                variant="ghost"
                size="sm"
                onClick={handleClose}
                className="text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200"
              >
                <X className="w-5 h-5" />
              </Button>
            </div>
          </div>
          {/* Controls */}
          <div className="p-6 border-b border-gray-200 dark:border-gray-700">
            <div className="flex flex-col md:flex-row gap-4">
              {/* Search */}
              <div className="flex-1">
                <Input
                  type="text"
                  placeholder="Search models..."
                  value={selectionState.filterText}
                  onChange={(e) => setSelectionState(prev => ({ ...prev, filterText: e.target.value }))}
                  className="w-full"
                  icon={<Search className="w-4 h-4" />}
                />
              </div>
              {/* Filters */}
              <div className="flex gap-2">
                <Button
                  variant={selectionState.showOnlyChat ? "solid" : "outline"}
                  size="sm"
                  onClick={() => setSelectionState(prev => ({ 
                    ...prev, 
                    showOnlyChat: !prev.showOnlyChat,
                    showOnlyEmbedding: false
                  }))}
                  className="flex items-center gap-1"
                >
                  <MessageCircle className="w-4 h-4" />
                  Chat Only
                </Button>
                <Button
                  variant={selectionState.showOnlyEmbedding ? "solid" : "outline"}
                  size="sm"
                  onClick={() => setSelectionState(prev => ({ 
                    ...prev, 
                    showOnlyEmbedding: !prev.showOnlyEmbedding,
                    showOnlyChat: false
                  }))}
                  className="flex items-center gap-1"
                >
                  <Layers className="w-4 h-4" />
                  Embedding Only
                </Button>
              </div>
              {/* Refresh */}
              <Button
                variant="outline"
                size="sm"
                onClick={() => {
                  console.log('🚨 REFRESH BUTTON CLICKED - About to call discoverModels(true)', {
                    timestamp: new Date().toISOString(),
                    loading,
                    enabledInstanceUrls,
                    instanceUrlsCount: enabledInstanceUrls.length
                  });
                  discoverModels(true);  // Force refresh
                }}
                disabled={loading}
                className="flex items-center gap-1"
              >
                {loading ? (
                  <Loader className="w-4 h-4 animate-spin" />
                ) : (
                  <Activity className="w-4 h-4" />
                )}
                {loading ? 'Discovering...' : 'Refresh'}
              </Button>
            </div>
          </div>
          {/* Content */}
          <div className="flex-1 overflow-hidden">
            {error ? (
              <div className="p-6 text-center">
                <AlertCircle className="w-12 h-12 text-red-500 mx-auto mb-4" />
                <h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-2">Discovery Failed</h3>
                <p className="text-gray-600 dark:text-gray-400 mb-4">{error}</p>
                <Button onClick={() => discoverModels(true)}>Try Again</Button>
              </div>
            ) : loading ? (
              <div className="p-6 text-center">
                <Loader className="w-12 h-12 text-green-500 mx-auto mb-4 animate-spin" />
                <h3 className="text-lg font-semibold text-gray-900 dark:text-white mb-2">Discovering Models</h3>
                <p className="text-gray-600 dark:text-gray-400 mb-2">
                  {discoveryProgress || `Scanning ${enabledInstanceUrls.length} Ollama instances...`}
                </p>
                <div className="mt-4">
                  <div className="bg-gray-200 dark:bg-gray-700 rounded-full h-2 overflow-hidden">
                    <div className="bg-green-500 h-full animate-pulse" style={{width: '100%'}}></div>
                  </div>
                </div>
              </div>
            ) : (
              <div className="h-96 overflow-y-auto p-6">
                {(() => {
                  console.log('🚨 RENDERING DEBUG: About to render models list', {
                    filteredAndSortedModelsLength: filteredAndSortedModels.length,
                    modelsLength: models.length,
                    loading,
                    error,
                    discoveryComplete,
                    timestamp: new Date().toISOString()
                  });
                  return null;
                })()}
                {filteredAndSortedModels.length === 0 ? (
                  <div className="text-center text-gray-500 dark:text-gray-400">
                    <Database className="w-16 h-16 mx-auto mb-4 opacity-50" />
                    <p className="text-lg font-medium mb-2">No models found</p>
                    <p className="text-sm">
                      {models.length === 0 
                        ? "Try refreshing to discover models from your Ollama instances"
                        : "Adjust your filters to see more models"
                      }
                    </p>
                  </div>
                ) : (
                  <div className="grid gap-4">
                    {filteredAndSortedModels.map((model) => {
                      const modelKey = `${model.name}@${model.instance_url}`;
                      const isTesting = testingModels.has(modelKey);
                      const isChatSelected = selectionState.selectedChatModel === model.name;
                      const isEmbeddingSelected = selectionState.selectedEmbeddingModel === model.name;
                      return (
                        <Card
                          key={modelKey}
                          className={`p-4 hover:shadow-md transition-shadow ${
                            isChatSelected || isEmbeddingSelected 
                              ? 'border-green-500 bg-green-50 dark:bg-green-900/20' 
                              : ''
                          }`}
                        >
                          <div className="flex items-start justify-between">
                            <div className="flex-1">
                              <div className="flex items-center gap-3 mb-2">
                                <h4 className="font-semibold text-gray-900 dark:text-white">{model.name}</h4>
                                {/* Capability badges */}
                                <div className="flex gap-1">
                                  {model.capabilities.includes('chat') && (
                                    <Badge variant="solid" className="bg-blue-100 text-blue-800 text-xs">
                                      <MessageCircle className="w-3 h-3 mr-1" />
                                      Chat
                                    </Badge>
                                  )}
                                  {model.capabilities.includes('embedding') && (
                                    <Badge variant="solid" className="bg-purple-100 text-purple-800 text-xs">
                                      <Layers className="w-3 h-3 mr-1" />
                                      {model.embedding_dimensions}D
                                    </Badge>
                                  )}
                                </div>
                              </div>
                              <div className="flex items-center gap-4 text-sm text-gray-600 dark:text-gray-400 mb-3">
                                <span className="flex items-center gap-1">
                                  <Server className="w-4 h-4" />
                                  {model.instanceName}
                                </span>
                                <span className="flex items-center gap-1">
                                  <HardDrive className="w-4 h-4" />
                                  {(model.size / (1024 ** 3)).toFixed(1)} GB
                                </span>
                                {model.parameters?.family && (
                                  <span className="flex items-center gap-1">
                                    <Cpu className="w-4 h-4" />
                                    {model.parameters.family}
                                  </span>
                                )}
                              </div>
                              {/* Test result display */}
                              {model.testResult && (
                                <div className="flex gap-2 mb-2">
                                  {model.testResult.chatWorks && (
                                    <Badge variant="solid" className="bg-green-100 text-green-800 text-xs">
                                      ✓ Chat Verified
                                    </Badge>
                                  )}
                                  {model.testResult.embeddingWorks && (
                                    <Badge variant="solid" className="bg-green-100 text-green-800 text-xs">
                                      ✓ Embedding Verified ({model.testResult.dimensions}D)
                                    </Badge>
                                  )}
                                </div>
                              )}
                            </div>
                            <div className="flex flex-col gap-2">
                              {/* Action buttons */}
                              <div className="flex gap-2">
                                {model.capabilities.includes('chat') && (
                                  <Button
                                    size="sm"
                                    variant={isChatSelected ? "solid" : "outline"}
                                    onClick={() => handleModelSelect(model, 'chat')}
                                    className="text-xs"
                                  >
                                    {isChatSelected ? '✓ Selected for Chat' : 'Select for Chat'}
                                  </Button>
                                )}
                                {model.capabilities.includes('embedding') && (
                                  <Button
                                    size="sm"
                                    variant={isEmbeddingSelected ? "solid" : "outline"}
                                    onClick={() => handleModelSelect(model, 'embedding')}
                                    className="text-xs"
                                  >
                                    {isEmbeddingSelected ? '✓ Selected for Embedding' : 'Select for Embedding'}
                                  </Button>
                                )}
                              </div>
                              {/* Test button */}
                              <Button
                                size="sm"
                                variant="ghost"
                                onClick={() => testModelCapabilities(model)}
                                disabled={isTesting}
                                className="text-xs"
                              >
                                {isTesting ? (
                                  <>
                                    <Loader className="w-3 h-3 mr-1 animate-spin" />
                                    Testing...
                                  </>
                                ) : (
                                  <>
                                    <CheckCircle className="w-3 h-3 mr-1" />
                                    Test Model
                                  </>
                                )}
                              </Button>
                            </div>
                          </div>
                        </Card>
                      );
                    })}
                  </div>
                )}
              </div>
            )}
          </div>
          {/* Footer */}
          <div className="border-t border-gray-200 dark:border-gray-700 p-6">
            <div className="flex items-center justify-between">
              <div className="text-sm text-gray-600 dark:text-gray-400">
                {selectionState.selectedChatModel && (
                  <span className="mr-4">Chat: <strong>{selectionState.selectedChatModel}</strong></span>
                )}
                {selectionState.selectedEmbeddingModel && (
                  <span>Embedding: <strong>{selectionState.selectedEmbeddingModel}</strong></span>
                )}
                {!selectionState.selectedChatModel && !selectionState.selectedEmbeddingModel && (
                  <span>No models selected</span>
                )}
              </div>
              <div className="flex gap-2">
                <Button variant="outline" onClick={handleClose}>
                  Cancel
                </Button>
                <Button 
                  onClick={handleApplySelection}
                  disabled={!selectionState.selectedChatModel && !selectionState.selectedEmbeddingModel}
                >
                  Apply Selection
                </Button>
              </div>
            </div>
          </div>
        </motion.div>
      </motion.div>
    </AnimatePresence>
  );
  return createPortal(modalContent, document.body);
 };
 export default OllamaModelDiscoveryModal;
--- a/archon-ui-main/src/components/settings/OllamaModelSelectionModal.tsx
+++ b/archon-ui-main/src/components/settings/OllamaModelSelectionModal.tsx
--- a/archon-ui-main/src/components/settings/RAGSettings.tsx
+++ b/archon-ui-main/src/components/settings/RAGSettings.tsx
--- a/archon-ui-main/src/components/settings/types/OllamaTypes.ts
+++ b/archon-ui-main/src/components/settings/types/OllamaTypes.ts
@ -0,0 +1,184 @@
 /**
 * TypeScript type definitions for Ollama components and services
 * 
 * Provides comprehensive type definitions for Ollama multi-instance management,
 * model discovery, and health monitoring across the frontend application.
 */
 // Core Ollama instance configuration
 export interface OllamaInstance {
  id: string;
  name: string;
  baseUrl: string;
  instanceType: 'chat' | 'embedding' | 'both';
  isEnabled: boolean;
  isPrimary: boolean;
  healthStatus: {
    isHealthy?: boolean;
    lastChecked: Date;
    responseTimeMs?: number;
    error?: string;
  };
  loadBalancingWeight?: number;
  lastHealthCheck?: string;
  modelsAvailable?: number;
  responseTimeMs?: number;
 }
 // Configuration for dual-host setups
 export interface OllamaConfiguration {
  chatInstance: OllamaInstance;
  embeddingInstance: OllamaInstance;
  selectedChatModel?: string;
  selectedEmbeddingModel?: string;
  fallbackToChatInstance: boolean;
 }
 // Model information from discovery
 export interface OllamaModel {
  name: string;
  tag: string;
  size: number;
  digest: string;
  capabilities: ('chat' | 'embedding')[];
  embeddingDimensions?: number;
  parameters?: {
    family: string;
    parameterSize: string;
    quantization: string;
  };
  instanceUrl: string;
 }
 // Health status for instances
 export interface InstanceHealth {
  instanceUrl: string;
  isHealthy: boolean;
  responseTimeMs?: number;
  modelsAvailable?: number;
  errorMessage?: string;
  lastChecked?: string;
 }
 // Model discovery results
 export interface ModelDiscoveryResults {
  totalModels: number;
  chatModels: OllamaModel[];
  embeddingModels: OllamaModel[];
  hostStatus: Record<string, {
    status: 'online' | 'error';
    modelsCount?: number;
    error?: string;
  }>;
  discoveryErrors: string[];
 }
 // Props for modal components
 export interface ModelDiscoveryModalProps {
  isOpen: boolean;
  onClose: () => void;
  onSelectModels: (models: { chatModel?: string; embeddingModel?: string }) => void;
  instances: OllamaInstance[];
 }
 // Props for health indicator component
 export interface HealthIndicatorProps {
  instance: OllamaInstance;
  onRefresh: (instanceId: string) => void;
  showDetails?: boolean;
 }
 // Props for configuration panel
 export interface ConfigurationPanelProps {
  isVisible: boolean;
  onConfigChange: (instances: OllamaInstance[]) => void;
  className?: string;
  separateHosts?: boolean;
 }
 // Validation and error types
 export interface ValidationResult {
  isValid: boolean;
  message: string;
  details?: string;
  suggestedAction?: string;
 }
 export interface ConnectionTestResult {
  isHealthy: boolean;
  responseTimeMs?: number;
  modelsAvailable?: number;
  error?: string;
 }
 // UI State types
 export interface ModelSelectionState {
  selectedChatModel: string | null;
  selectedEmbeddingModel: string | null;
  filterText: string;
  showOnlyEmbedding: boolean;
  showOnlyChat: boolean;
  sortBy: 'name' | 'size' | 'instance';
 }
 // Form data types
 export interface AddInstanceFormData {
  name: string;
  baseUrl: string;
  instanceType: 'chat' | 'embedding' | 'both';
 }
 // Embedding routing information
 export interface EmbeddingRoute {
  modelName: string;
  instanceUrl: string;
  dimensions: number;
  targetColumn: string;
  performanceScore: number;
  confidence: number;
 }
 // Statistics and monitoring
 export interface InstanceStatistics {
  totalInstances: number;
  activeInstances: number;
  averageResponseTime?: number;
  totalModels: number;
  healthyInstancesCount: number;
 }
 // Event types for component communication
 export type OllamaEvent = 
  | { type: 'INSTANCE_ADDED'; payload: OllamaInstance }
  | { type: 'INSTANCE_REMOVED'; payload: string }
  | { type: 'INSTANCE_UPDATED'; payload: OllamaInstance }
  | { type: 'HEALTH_CHECK_COMPLETED'; payload: { instanceId: string; result: ConnectionTestResult } }
  | { type: 'MODEL_DISCOVERY_COMPLETED'; payload: ModelDiscoveryResults }
  | { type: 'CONFIGURATION_CHANGED'; payload: OllamaConfiguration };
 // API Response types (re-export from service for convenience)
 export type { 
  ModelDiscoveryResponse,
  InstanceHealthResponse,
  InstanceValidationResponse,
  EmbeddingRouteResponse,
  EmbeddingRoutesResponse 
 } from '../../services/ollamaService';
 // Error handling types
 export interface OllamaError {
  code: string;
  message: string;
  context?: string;
  retryable?: boolean;
 }
 // Settings integration
 export interface OllamaSettings {
  enableHealthMonitoring: boolean;
  healthCheckInterval: number;
  autoDiscoveryEnabled: boolean;
  modelCacheTtl: number;
  connectionTimeout: number;
  maxConcurrentHealthChecks: number;
 }
--- a/archon-ui-main/src/services/credentialsService.ts
+++ b/archon-ui-main/src/services/credentialsService.ts
@ -19,6 +19,9 @@ export interface RagSettings {
  MODEL_CHOICE: string;
  LLM_PROVIDER?: string;
  LLM_BASE_URL?: string;
  LLM_INSTANCE_NAME?: string;
  OLLAMA_EMBEDDING_URL?: string;
  OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
  EMBEDDING_MODEL?: string;
  // Crawling Performance Settings
  CRAWL_BATCH_SIZE?: number;
@ -53,6 +56,20 @@ export interface CodeExtractionSettings {
  ENABLE_CODE_SUMMARIES: boolean;
 }
 export interface OllamaInstance {
  id: string;
  name: string;
  baseUrl: string;
  isEnabled: boolean;
  isPrimary: boolean;
  instanceType?: 'chat' | 'embedding' | 'both';
  loadBalancingWeight?: number;
  isHealthy?: boolean;
  responseTimeMs?: number;
  modelsAvailable?: number;
  lastHealthCheck?: string;
 }
 import { getApiUrl } from "../config/api";
 class CredentialsService {
@ -139,6 +156,24 @@ class CredentialsService {
    return response.json();
  }
  async checkCredentialStatus(
    keys: string[]
  ): Promise<{ [key: string]: { key: string; value?: string; has_value: boolean; error?: string } }> {
    const response = await fetch(`${this.baseUrl}/api/credentials/status-check`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ keys }),
    });
    if (!response.ok) {
      throw new Error(`Failed to check credential status: ${response.statusText}`);
    }
    return response.json();
  }
  async getRagSettings(): Promise<RagSettings> {
    const ragCredentials = await this.getCredentialsByCategory("rag_strategy");
    const apiKeysCredentials = await this.getCredentialsByCategory("api_keys");
@ -152,6 +187,9 @@ class CredentialsService {
      MODEL_CHOICE: "gpt-4.1-nano",
      LLM_PROVIDER: "openai",
      LLM_BASE_URL: "",
      LLM_INSTANCE_NAME: "",
      OLLAMA_EMBEDDING_URL: "",
      OLLAMA_EMBEDDING_INSTANCE_NAME: "",
      EMBEDDING_MODEL: "",
      // Crawling Performance Settings defaults
      CRAWL_BATCH_SIZE: 50,
@ -180,6 +218,9 @@ class CredentialsService {
            "MODEL_CHOICE",
            "LLM_PROVIDER",
            "LLM_BASE_URL",
            "LLM_INSTANCE_NAME",
            "OLLAMA_EMBEDDING_URL",
            "OLLAMA_EMBEDDING_INSTANCE_NAME",
            "EMBEDDING_MODEL",
            "CRAWL_WAIT_STRATEGY",
          ].includes(cred.key)
@ -366,6 +407,179 @@ class CredentialsService {
    await Promise.all(promises);
  }
  // Ollama Instance Management
  async getOllamaInstances(): Promise<OllamaInstance[]> {
    try {
      const ollamaCredentials = await this.getCredentialsByCategory('ollama_instances');
      // Convert credentials to OllamaInstance objects
      const instances: OllamaInstance[] = [];
      const instanceMap: Record<string, Partial<OllamaInstance>> = {};
      // Group credentials by instance ID
      ollamaCredentials.forEach(cred => {
        const parts = cred.key.split('_');
        if (parts.length >= 3 && parts[0] === 'ollama' && parts[1] === 'instance') {
          const instanceId = parts[2];
          const field = parts.slice(3).join('_');
          if (!instanceMap[instanceId]) {
            instanceMap[instanceId] = { id: instanceId };
          }
          // Parse the field value
          let value: any = cred.value;
          if (field === 'isEnabled' || field === 'isPrimary' || field === 'isHealthy') {
            value = cred.value === 'true';
          } else if (field === 'responseTimeMs' || field === 'modelsAvailable' || field === 'loadBalancingWeight') {
            value = parseInt(cred.value || '0', 10);
          }
          (instanceMap[instanceId] as any)[field] = value;
        }
      });
      // Convert to array and ensure required fields
      Object.values(instanceMap).forEach(instance => {
        if (instance.id && instance.name && instance.baseUrl) {
          instances.push({
            id: instance.id,
            name: instance.name,
            baseUrl: instance.baseUrl,
            isEnabled: instance.isEnabled ?? true,
            isPrimary: instance.isPrimary ?? false,
            instanceType: instance.instanceType ?? 'both',
            loadBalancingWeight: instance.loadBalancingWeight ?? 100,
            isHealthy: instance.isHealthy,
            responseTimeMs: instance.responseTimeMs,
            modelsAvailable: instance.modelsAvailable,
            lastHealthCheck: instance.lastHealthCheck
          });
        }
      });
      return instances;
    } catch (error) {
      console.error('Failed to load Ollama instances from database:', error);
      return [];
    }
  }
  async setOllamaInstances(instances: OllamaInstance[]): Promise<void> {
    try {
      // First, delete existing ollama instance credentials
      const existingCredentials = await this.getCredentialsByCategory('ollama_instances');
      for (const cred of existingCredentials) {
        await this.deleteCredential(cred.key);
      }
      // Add new instance credentials
      const promises: Promise<any>[] = [];
      instances.forEach(instance => {
        const fields: Record<string, any> = {
          name: instance.name,
          baseUrl: instance.baseUrl,
          isEnabled: instance.isEnabled,
          isPrimary: instance.isPrimary,
          instanceType: instance.instanceType || 'both',
          loadBalancingWeight: instance.loadBalancingWeight || 100
        };
        // Add optional health-related fields
        if (instance.isHealthy !== undefined) {
          fields.isHealthy = instance.isHealthy;
        }
        if (instance.responseTimeMs !== undefined) {
          fields.responseTimeMs = instance.responseTimeMs;
        }
        if (instance.modelsAvailable !== undefined) {
          fields.modelsAvailable = instance.modelsAvailable;
        }
        if (instance.lastHealthCheck) {
          fields.lastHealthCheck = instance.lastHealthCheck;
        }
        // Create a credential for each field
        Object.entries(fields).forEach(([field, value]) => {
          promises.push(
            this.createCredential({
              key: `ollama_instance_${instance.id}_${field}`,
              value: value.toString(),
              is_encrypted: false,
              category: 'ollama_instances'
            })
          );
        });
      });
      await Promise.all(promises);
    } catch (error) {
      throw this.handleCredentialError(error, 'Saving Ollama instances');
    }
  }
  async addOllamaInstance(instance: OllamaInstance): Promise<void> {
    const instances = await this.getOllamaInstances();
    instances.push(instance);
    await this.setOllamaInstances(instances);
  }
  async updateOllamaInstance(instanceId: string, updates: Partial<OllamaInstance>): Promise<void> {
    const instances = await this.getOllamaInstances();
    const instanceIndex = instances.findIndex(inst => inst.id === instanceId);
    if (instanceIndex === -1) {
      throw new Error(`Ollama instance with ID ${instanceId} not found`);
    }
    instances[instanceIndex] = { ...instances[instanceIndex], ...updates };
    await this.setOllamaInstances(instances);
  }
  async removeOllamaInstance(instanceId: string): Promise<void> {
    const instances = await this.getOllamaInstances();
    const filteredInstances = instances.filter(inst => inst.id !== instanceId);
    if (filteredInstances.length === instances.length) {
      throw new Error(`Ollama instance with ID ${instanceId} not found`);
    }
    await this.setOllamaInstances(filteredInstances);
  }
  async migrateOllamaFromLocalStorage(): Promise<{ migrated: boolean; instanceCount: number }> {
    try {
      // Check if there are existing instances in the database
      const existingInstances = await this.getOllamaInstances();
      if (existingInstances.length > 0) {
        return { migrated: false, instanceCount: 0 };
      }
      // Try to load from localStorage
      const localStorageData = localStorage.getItem('ollama-instances');
      if (!localStorageData) {
        return { migrated: false, instanceCount: 0 };
      }
      const localInstances = JSON.parse(localStorageData);
      if (!Array.isArray(localInstances) || localInstances.length === 0) {
        return { migrated: false, instanceCount: 0 };
      }
      // Migrate to database
      await this.setOllamaInstances(localInstances);
      // Clean up localStorage
      localStorage.removeItem('ollama-instances');
      return { migrated: true, instanceCount: localInstances.length };
    } catch (error) {
      console.error('Failed to migrate Ollama instances from localStorage:', error);
      return { migrated: false, instanceCount: 0 };
    }
  }
 }
 export const credentialsService = new CredentialsService();
--- a/archon-ui-main/src/services/ollamaService.ts
+++ b/archon-ui-main/src/services/ollamaService.ts
@ -0,0 +1,485 @@
 /**
 * Ollama Service Client
 * 
 * Provides frontend API client for Ollama model discovery, validation, and health monitoring.
 * Integrates with the enhanced backend Ollama endpoints for multi-instance configurations.
 */
 import { getApiUrl } from "../config/api";
 // Type definitions for Ollama API responses
 export interface OllamaModel {
  name: string;
  tag: string;
  size: number;
  digest: string;
  capabilities: ('chat' | 'embedding')[];
  embedding_dimensions?: number;
  parameters?: {
    family?: string;
    parameter_size?: string;
    quantization?: string;
    parameter_count?: string;
    format?: string;
  };
  instance_url: string;
  last_updated?: string;
  // Real API data from /api/show endpoint
  context_window?: number;
  architecture?: string;
  block_count?: number;
  attention_heads?: number;
  format?: string;
  parent_model?: string;
 }
 export interface ModelDiscoveryResponse {
  total_models: number;
  chat_models: Array<{
    name: string;
    instance_url: string;
    size: number;
    parameters?: any;
    // Real API data from /api/show
    context_window?: number;
    architecture?: string;
    block_count?: number;
    attention_heads?: number;
    format?: string;
    parent_model?: string;
    capabilities?: string[];
  }>;
  embedding_models: Array<{
    name: string;
    instance_url: string;
    dimensions?: number;
    size: number;
    parameters?: any;
    // Real API data from /api/show
    architecture?: string;
    format?: string;
    parent_model?: string;
    capabilities?: string[];
  }>;
  host_status: Record<string, {
    status: 'online' | 'error';
    error?: string;
    models_count?: number;
    instance_url?: string;
  }>;
  discovery_errors: string[];
  unique_model_names: string[];
 }
 export interface InstanceHealthResponse {
  summary: {
    total_instances: number;
    healthy_instances: number;
    unhealthy_instances: number;
    average_response_time_ms?: number;
  };
  instance_status: Record<string, {
    is_healthy: boolean;
    response_time_ms?: number;
    models_available?: number;
    error_message?: string;
    last_checked?: string;
  }>;
  timestamp: string;
 }
 export interface InstanceValidationResponse {
  is_valid: boolean;
  instance_url: string;
  response_time_ms?: number;
  models_available: number;
  error_message?: string;
  capabilities: {
    total_models?: number;
    chat_models?: string[];
    embedding_models?: string[];
    supported_dimensions?: number[];
    error?: string;
  };
  health_status: Record<string, any>;
 }
 export interface EmbeddingRouteResponse {
  target_column: string;
  model_name: string;
  instance_url: string;
  dimensions: number;
  confidence: number;
  fallback_applied: boolean;
  routing_strategy: string;
  performance_score?: number;
 }
 export interface EmbeddingRoutesResponse {
  total_routes: number;
  routes: Array<{
    model_name: string;
    instance_url: string;
    dimensions: number;
    column_name: string;
    performance_score: number;
    index_type: string;
  }>;
  dimension_analysis: Record<string, {
    count: number;
    models: string[];
    avg_performance: number;
  }>;
  routing_statistics: Record<string, any>;
 }
 // Request interfaces
 export interface ModelDiscoveryOptions {
  instanceUrls: string[];
  includeCapabilities?: boolean;
 }
 export interface InstanceValidationOptions {
  instanceUrl: string;
  instanceType?: 'chat' | 'embedding' | 'both';
  timeoutSeconds?: number;
 }
 export interface EmbeddingRouteOptions {
  modelName: string;
  instanceUrl: string;
  textSample?: string;
 }
 class OllamaService {
  private baseUrl = getApiUrl();
  private handleApiError(error: any, context: string): Error {
    const errorMessage = error instanceof Error ? error.message : String(error);
    // Check for network errors
    if (
      errorMessage.toLowerCase().includes("network") ||
      errorMessage.includes("fetch") ||
      errorMessage.includes("Failed to fetch")
    ) {
      return new Error(
        `Network error while ${context.toLowerCase()}: ${errorMessage}. ` +
          `Please check your connection and Ollama server status.`,
      );
    }
    // Check for timeout errors
    if (errorMessage.includes("timeout") || errorMessage.includes("AbortError")) {
      return new Error(
        `Timeout error while ${context.toLowerCase()}: The Ollama instance may be slow to respond or unavailable.`
      );
    }
    // Return original error with context
    return new Error(`${context} failed: ${errorMessage}`);
  }
  /**
   * Discover models from multiple Ollama instances
   */
  async discoverModels(options: ModelDiscoveryOptions): Promise<ModelDiscoveryResponse> {
    try {
      if (!options.instanceUrls || options.instanceUrls.length === 0) {
        throw new Error("At least one instance URL is required for model discovery");
      }
      // Build query parameters
      const params = new URLSearchParams();
      options.instanceUrls.forEach(url => {
        params.append('instance_urls', url);
      });
      if (options.includeCapabilities !== undefined) {
        params.append('include_capabilities', options.includeCapabilities.toString());
      }
      const response = await fetch(`${this.baseUrl}/api/ollama/models?${params.toString()}`, {
        method: 'GET',
        headers: {
          'Content-Type': 'application/json',
        },
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Model discovery");
    }
  }
  /**
   * Check health status of multiple Ollama instances
   */
  async checkInstanceHealth(instanceUrls: string[], includeModels: boolean = false): Promise<InstanceHealthResponse> {
    try {
      if (!instanceUrls || instanceUrls.length === 0) {
        throw new Error("At least one instance URL is required for health checking");
      }
      // Build query parameters
      const params = new URLSearchParams();
      instanceUrls.forEach(url => {
        params.append('instance_urls', url);
      });
      if (includeModels) {
        params.append('include_models', 'true');
      }
      const response = await fetch(`${this.baseUrl}/api/ollama/instances/health?${params.toString()}`, {
        method: 'GET',
        headers: {
          'Content-Type': 'application/json',
        },
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Instance health checking");
    }
  }
  /**
   * Validate a specific Ollama instance with comprehensive testing
   */
  async validateInstance(options: InstanceValidationOptions): Promise<InstanceValidationResponse> {
    try {
      const requestBody = {
        instance_url: options.instanceUrl,
        instance_type: options.instanceType,
        timeout_seconds: options.timeoutSeconds || 30,
      };
      const response = await fetch(`${this.baseUrl}/api/ollama/validate`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(requestBody),
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Instance validation");
    }
  }
  /**
   * Analyze embedding routing for a specific model and instance
   */
  async analyzeEmbeddingRoute(options: EmbeddingRouteOptions): Promise<EmbeddingRouteResponse> {
    try {
      const requestBody = {
        model_name: options.modelName,
        instance_url: options.instanceUrl,
        text_sample: options.textSample,
      };
      const response = await fetch(`${this.baseUrl}/api/ollama/embedding/route`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(requestBody),
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Embedding route analysis");
    }
  }
  /**
   * Get all available embedding routes across multiple instances
   */
  async getEmbeddingRoutes(instanceUrls: string[], sortByPerformance: boolean = true): Promise<EmbeddingRoutesResponse> {
    try {
      if (!instanceUrls || instanceUrls.length === 0) {
        throw new Error("At least one instance URL is required for embedding routes");
      }
      // Build query parameters
      const params = new URLSearchParams();
      instanceUrls.forEach(url => {
        params.append('instance_urls', url);
      });
      if (sortByPerformance) {
        params.append('sort_by_performance', 'true');
      }
      const response = await fetch(`${this.baseUrl}/api/ollama/embedding/routes?${params.toString()}`, {
        method: 'GET',
        headers: {
          'Content-Type': 'application/json',
        },
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Getting embedding routes");
    }
  }
  /**
   * Clear all Ollama-related caches
   */
  async clearCaches(): Promise<{ message: string }> {
    try {
      const response = await fetch(`${this.baseUrl}/api/ollama/cache`, {
        method: 'DELETE',
        headers: {
          'Content-Type': 'application/json',
        },
      });
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`HTTP ${response.status}: ${errorText}`);
      }
      const data = await response.json();
      return data;
    } catch (error) {
      throw this.handleApiError(error, "Cache clearing");
    }
  }
  /**
   * Test connectivity to a single Ollama instance (quick health check) with retry logic
   */
  async testConnection(instanceUrl: string, retryCount = 3): Promise<{ isHealthy: boolean; responseTime?: number; error?: string }> {
    const maxRetries = retryCount;
    let lastError: Error | null = null;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const startTime = Date.now();
        const healthResponse = await this.checkInstanceHealth([instanceUrl], false);
        const responseTime = Date.now() - startTime;
        const instanceStatus = healthResponse.instance_status[instanceUrl];
        const result = {
          isHealthy: instanceStatus?.is_healthy || false,
          responseTime: instanceStatus?.response_time_ms || responseTime,
          error: instanceStatus?.error_message,
        };
        // If successful, return immediately
        if (result.isHealthy) {
          return result;
        }
        // If not healthy but we got a valid response, store error for potential retry
        lastError = new Error(result.error || 'Instance not available');
      } catch (error) {
        lastError = error instanceof Error ? error : new Error('Unknown error');
      }
      // If this wasn't the last attempt, wait before retrying
      if (attempt < maxRetries) {
        const delayMs = Math.pow(2, attempt - 1) * 1000; // Exponential backoff: 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
    // All retries failed, return error result
    return {
      isHealthy: false,
      error: lastError?.message || 'Connection failed after retries',
    };
  }
  /**
   * Get model capabilities for a specific model
   */
  async getModelCapabilities(modelName: string, instanceUrl: string): Promise<{
    supports_chat: boolean;
    supports_embedding: boolean;
    embedding_dimensions?: number;
    error?: string;
  }> {
    try {
      // Use the validation endpoint to get capabilities
      const validation = await this.validateInstance({
        instanceUrl,
        instanceType: 'both',
      });
      const capabilities = validation.capabilities;
      const chatModels = capabilities.chat_models || [];
      const embeddingModels = capabilities.embedding_models || [];
      // Find the model in the lists
      const supportsChat = chatModels.includes(modelName);
      const supportsEmbedding = embeddingModels.includes(modelName);
      // For embedding dimensions, we need to use the embedding route analysis
      let embeddingDimensions: number | undefined;
      if (supportsEmbedding) {
        try {
          const route = await this.analyzeEmbeddingRoute({
            modelName,
            instanceUrl,
          });
          embeddingDimensions = route.dimensions;
        } catch (error) {
          // Ignore routing errors, just report basic capability
        }
      }
      return {
        supports_chat: supportsChat,
        supports_embedding: supportsEmbedding,
        embedding_dimensions: embeddingDimensions,
      };
    } catch (error) {
      return {
        supports_chat: false,
        supports_embedding: false,
        error: error instanceof Error ? error.message : String(error),
      };
    }
  }
 }
 // Export singleton instance
 export const ollamaService = new OllamaService();
--- a/archon-ui-main/vite.config.ts
+++ b/archon-ui-main/vite.config.ts
@ -307,6 +307,18 @@ export default defineConfig(({ mode }: ConfigEnv): UserConfig => {
              console.log('🔄 [VITE PROXY] Forwarding:', req.method, req.url, 'to', `http://${proxyHost}:${port}${req.url}`);
            });
          }
        },
        // Health check endpoint proxy
        '/health': {
          target: `http://${host}:${port}`,
          changeOrigin: true,
          secure: false
        },
        // Socket.IO specific proxy configuration
        '/socket.io': {
          target: `http://${host}:${port}`,
          changeOrigin: true,
          ws: true
        }
      },
    },
--- a/archon-ui-main/vitest.config.ts
+++ b/archon-ui-main/vitest.config.ts
@ -13,7 +13,17 @@ export default defineConfig({
      'src/**/*.test.{ts,tsx}',     // Colocated tests in features
      'src/**/*.spec.{ts,tsx}',
      'tests/**/*.test.{ts,tsx}',   // Tests in tests directory  
-      'tests/**/*.spec.{ts,tsx}'
+      'tests/**/*.spec.{ts,tsx}',
      'test/components.test.tsx',
      'test/pages.test.tsx', 
      'test/user_flows.test.tsx',
      'test/errors.test.tsx',
      'test/services/projectService.test.ts',
      'test/components/project-tasks/DocsTab.integration.test.tsx',
      'test/config/api.test.ts',
      'test/components/settings/OllamaConfigurationPanel.test.tsx',
      'test/components/settings/OllamaInstanceHealthIndicator.test.tsx',
      'test/components/settings/OllamaModelDiscoveryModal.test.tsx'
    ],
    exclude: ['node_modules', 'dist', '.git', '.cache', 'test.backup', '*.backup/**', 'test-backups'],
    reporters: ['dot', 'json'],
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -151,13 +151,15 @@ services:
    ports:
      - "${ARCHON_UI_PORT:-3737}:3737"
    environment:
-      - VITE_API_URL=http://${HOST:-localhost}:${ARCHON_SERVER_PORT:-8181}
+      # Don't set VITE_API_URL so frontend uses relative URLs through proxy
      # - VITE_API_URL=http://${HOST:-localhost}:${ARCHON_SERVER_PORT:-8181}
      - VITE_ARCHON_SERVER_PORT=${ARCHON_SERVER_PORT:-8181}
      - ARCHON_SERVER_PORT=${ARCHON_SERVER_PORT:-8181}
      - HOST=${HOST:-localhost}
      - PROD=${PROD:-false}
      - VITE_ALLOWED_HOSTS=${VITE_ALLOWED_HOSTS:-}
      - VITE_SHOW_DEVTOOLS=${VITE_SHOW_DEVTOOLS:-false}
      - DOCKER_ENV=true
    networks:
      - app-network
    healthcheck:
--- a/migration/DB_UPGRADE_INSTRUCTIONS.md
+++ b/migration/DB_UPGRADE_INSTRUCTIONS.md
@ -0,0 +1,167 @@
 # Archon Database Migrations
 This folder contains database migration scripts for upgrading existing Archon installations.
 ## Available Migration Scripts
 ### 1. `backup_database.sql` - Pre-Migration Backup
 **Always run this FIRST before any migration!**
 Creates timestamped backup tables of all your existing data:
 - ✅ Complete backup of `archon_crawled_pages`
 - ✅ Complete backup of `archon_code_examples` 
 - ✅ Complete backup of `archon_sources`
 - ✅ Easy restore commands provided
 - ✅ Row count verification
 ### 2. `upgrade_database.sql` - Main Migration Script
 **Use this migration if you:**
 - Have an existing Archon installation from before multi-dimensional embedding support
 - Want to upgrade to the latest features including model tracking
 - Need to migrate existing embedding data to the new schema
 **Features added:**
 - ✅ Multi-dimensional embedding support (384, 768, 1024, 1536, 3072 dimensions)
 - ✅ Model tracking fields (`llm_chat_model`, `embedding_model`, `embedding_dimension`)
 - ✅ Optimized indexes for improved search performance
 - ✅ Enhanced search functions with dimension-aware querying
 - ✅ Automatic migration of existing embedding data
 - ✅ Legacy compatibility maintained
 ### 3. `validate_migration.sql` - Post-Migration Validation
 **Run this after the migration to verify everything worked correctly**
 Validates your migration results:
 - ✅ Verifies all required columns were added
 - ✅ Checks that database indexes were created
 - ✅ Tests that all functions are working
 - ✅ Shows sample data with new fields
 - ✅ Provides clear success/failure reporting
 ## Migration Process (Follow This Order!)
 ### Step 1: Backup Your Data
 ```sql
 -- Run: backup_database.sql
 -- This creates timestamped backup tables of all your data
 ```
 ### Step 2: Run the Main Migration
 ```sql  
 -- Run: upgrade_database.sql
 -- This adds all the new features and migrates existing data
 ```
 ### Step 3: Validate the Results
 ```sql
 -- Run: validate_migration.sql  
 -- This verifies everything worked correctly
 ```
 ### Step 4: Restart Services
 ```bash
 docker compose restart
 ```
 ## How to Run Migrations
 ### Method 1: Using Supabase Dashboard (Recommended)
 1. Open your Supabase project dashboard
 2. Go to **SQL Editor**
 3. Copy and paste the contents of the migration file
 4. Click **Run** to execute the migration
 5. **Important**: Supabase only shows the result of the last query - all our scripts end with a status summary table that shows the complete results
 ### Method 2: Using psql Command Line
 ```bash
 # Connect to your database
 psql -h your-supabase-host -p 5432 -U postgres -d postgres
 # Run the migration
 \i /path/to/upgrade_database.sql
 # Exit
 \q
 ```
 ### Method 3: Using Docker (if using local Supabase)
 ```bash
 # Copy migration to container
 docker cp upgrade_database.sql supabase-db:/tmp/
 # Execute migration
 docker exec -it supabase-db psql -U postgres -d postgres -f /tmp/upgrade_database.sql
 ```
 ## Migration Safety
 - ✅ **Safe to run multiple times** - Uses `IF NOT EXISTS` checks
 - ✅ **Non-destructive** - Preserves all existing data
 - ✅ **Automatic rollback** - Uses database transactions
 - ✅ **Comprehensive logging** - Detailed progress notifications
 ## After Migration
 1. **Restart Archon Services:**
   ```bash
   docker-compose restart
   ```
 2. **Verify Migration:**
   - Check the Archon logs for any errors
   - Try running a test crawl
   - Verify search functionality works
 3. **Configure New Features:**
   - Go to Settings page in Archon UI
   - Configure your preferred LLM and embedding models
   - New crawls will automatically use model tracking
 ## Troubleshooting
 ### Permission Errors
 If you get permission errors, ensure your database user has sufficient privileges:
 ```sql
 GRANT ALL PRIVILEGES ON DATABASE postgres TO your_user;
 GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO your_user;
 ```
 ### Index Creation Failures
 If index creation fails due to resource constraints, the migration will continue. You can create indexes manually later:
 ```sql
 -- Example: Create missing index for 768-dimensional embeddings
 CREATE INDEX idx_archon_crawled_pages_embedding_768 
 ON archon_crawled_pages USING ivfflat (embedding_768 vector_cosine_ops) 
 WITH (lists = 100);
 ```
 ### Migration Verification
 Check that the migration completed successfully:
 ```sql
 -- Verify new columns exist
 SELECT column_name 
 FROM information_schema.columns 
 WHERE table_name = 'archon_crawled_pages' 
 AND column_name IN ('llm_chat_model', 'embedding_model', 'embedding_dimension', 'embedding_384', 'embedding_768');
 -- Verify functions exist
 SELECT routine_name 
 FROM information_schema.routines 
 WHERE routine_name IN ('match_archon_crawled_pages_multi', 'detect_embedding_dimension');
 ```
 ## Support
 If you encounter issues with the migration:
 1. Check the console output for detailed error messages
 2. Verify your database connection and permissions
 3. Ensure you have sufficient disk space for index creation
 4. Create a GitHub issue with the error details if problems persist
 ## Version Compatibility
 - **Archon v2.0+**: Use `upgrade_database.sql`
 - **Earlier versions**: Use `complete_setup.sql` for fresh installations
 This migration is designed to bring any Archon installation up to the latest schema standards while preserving all existing data and functionality.
--- a/migration/backup_database.sql
+++ b/migration/backup_database.sql
@ -0,0 +1,107 @@
 -- ======================================================================
 -- ARCHON PRE-MIGRATION BACKUP SCRIPT
 -- ======================================================================
 -- This script creates backup tables of your existing data before running
 -- the upgrade_to_model_tracking.sql migration.
 -- 
 -- IMPORTANT: Run this BEFORE running the main migration!
 -- ======================================================================
 BEGIN;
 -- Create timestamp for backup tables
 CREATE OR REPLACE FUNCTION get_backup_timestamp()
 RETURNS TEXT AS $$
 BEGIN
    RETURN to_char(now(), 'YYYYMMDD_HH24MISS');
 END;
 $$ LANGUAGE plpgsql;
 -- Get the timestamp for consistent naming
 DO $$
 DECLARE
    backup_suffix TEXT;
 BEGIN
    backup_suffix := get_backup_timestamp();
    -- Backup archon_crawled_pages
    EXECUTE format('CREATE TABLE archon_crawled_pages_backup_%s AS SELECT * FROM archon_crawled_pages', backup_suffix);
    -- Backup archon_code_examples
    EXECUTE format('CREATE TABLE archon_code_examples_backup_%s AS SELECT * FROM archon_code_examples', backup_suffix);
    -- Backup archon_sources
    EXECUTE format('CREATE TABLE archon_sources_backup_%s AS SELECT * FROM archon_sources', backup_suffix);
    RAISE NOTICE '====================================================================';
    RAISE NOTICE '                    BACKUP COMPLETED SUCCESSFULLY';
    RAISE NOTICE '====================================================================';
    RAISE NOTICE 'Created backup tables with suffix: %', backup_suffix;
    RAISE NOTICE '';
    RAISE NOTICE 'Backup tables created:';
    RAISE NOTICE '• archon_crawled_pages_backup_%', backup_suffix;
    RAISE NOTICE '• archon_code_examples_backup_%', backup_suffix;
    RAISE NOTICE '• archon_sources_backup_%', backup_suffix;
    RAISE NOTICE '';
    RAISE NOTICE 'You can now safely run the upgrade_to_model_tracking.sql migration.';
    RAISE NOTICE '';
    RAISE NOTICE 'To restore from backup if needed:';
    RAISE NOTICE 'DROP TABLE archon_crawled_pages;';
    RAISE NOTICE 'ALTER TABLE archon_crawled_pages_backup_% RENAME TO archon_crawled_pages;', backup_suffix;
    RAISE NOTICE '====================================================================';
    -- Get row counts for verification
    DECLARE
        crawled_count INTEGER;
        code_count INTEGER;
        sources_count INTEGER;
    BEGIN
        EXECUTE format('SELECT COUNT(*) FROM archon_crawled_pages_backup_%s', backup_suffix) INTO crawled_count;
        EXECUTE format('SELECT COUNT(*) FROM archon_code_examples_backup_%s', backup_suffix) INTO code_count;
        EXECUTE format('SELECT COUNT(*) FROM archon_sources_backup_%s', backup_suffix) INTO sources_count;
        RAISE NOTICE 'Backup verification:';
        RAISE NOTICE '• Crawled pages backed up: % records', crawled_count;
        RAISE NOTICE '• Code examples backed up: % records', code_count;
        RAISE NOTICE '• Sources backed up: % records', sources_count;
        RAISE NOTICE '====================================================================';
    END;
 END $$;
 -- Clean up the temporary function
 DROP FUNCTION get_backup_timestamp();
 COMMIT;
 -- ======================================================================
 -- BACKUP COMPLETE - SUPABASE-FRIENDLY STATUS REPORT
 -- ======================================================================
 -- This final SELECT statement shows backup status in Supabase SQL Editor
 WITH backup_info AS (
    SELECT 
        to_char(now(), 'YYYYMMDD_HH24MISS') as backup_suffix,
        (SELECT COUNT(*) FROM archon_crawled_pages) as crawled_count,
        (SELECT COUNT(*) FROM archon_code_examples) as code_count,
        (SELECT COUNT(*) FROM archon_sources) as sources_count
 )
 SELECT 
    '🎉 ARCHON DATABASE BACKUP COMPLETED! 🎉' AS status,
    'Your data is now safely backed up' AS message,
    ARRAY[
        'archon_crawled_pages_backup_' || backup_suffix,
        'archon_code_examples_backup_' || backup_suffix,
        'archon_sources_backup_' || backup_suffix
    ] AS backup_tables_created,
    json_build_object(
        'crawled_pages', crawled_count,
        'code_examples', code_count,
        'sources', sources_count
    ) AS records_backed_up,
    ARRAY[
        '1. Run upgrade_database.sql to upgrade your installation',
        '2. Run validate_migration.sql to verify the upgrade',
        '3. Backup tables will be kept for safety'
    ] AS next_steps,
    'DROP TABLE archon_crawled_pages; ALTER TABLE archon_crawled_pages_backup_' || backup_suffix || ' RENAME TO archon_crawled_pages;' AS restore_command_example
 FROM backup_info;
--- a/migration/complete_setup.sql
+++ b/migration/complete_setup.sql
@ -203,7 +203,17 @@ CREATE TABLE IF NOT EXISTS archon_crawled_pages (
    content TEXT NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    source_id TEXT NOT NULL,
-    embedding VECTOR(1536),  -- OpenAI embeddings are 1536 dimensions
+    -- Multi-dimensional embedding support for different models
    embedding_384 VECTOR(384),   -- Small embedding models
    embedding_768 VECTOR(768),   -- Google/Ollama models  
    embedding_1024 VECTOR(1024), -- Ollama large models
    embedding_1536 VECTOR(1536), -- OpenAI standard models
    embedding_3072 VECTOR(3072), -- OpenAI large models
    -- Model tracking columns
    llm_chat_model TEXT,                -- LLM model used for processing (e.g., 'gpt-4', 'llama3:8b')
    embedding_model TEXT,                -- Embedding model used (e.g., 'text-embedding-3-large', 'all-MiniLM-L6-v2')
    embedding_dimension INTEGER,         -- Dimension of the embedding used (384, 768, 1024, 1536, 3072)
    -- Hybrid search support
    content_search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
@ -214,12 +224,24 @@ CREATE TABLE IF NOT EXISTS archon_crawled_pages (
    FOREIGN KEY (source_id) REFERENCES archon_sources(source_id)
 );
-- Create indexes for better performance
+-- Multi-dimensional indexes
-CREATE INDEX ON archon_crawled_pages USING ivfflat (embedding vector_cosine_ops);
+CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_384 ON archon_crawled_pages USING ivfflat (embedding_384 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_768 ON archon_crawled_pages USING ivfflat (embedding_768 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_1024 ON archon_crawled_pages USING ivfflat (embedding_1024 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_1536 ON archon_crawled_pages USING ivfflat (embedding_1536 vector_cosine_ops) WITH (lists = 100);
 -- Note: 3072-dimensional embeddings cannot have vector indexes due to PostgreSQL vector extension 2000 dimension limit
 -- The embedding_3072 column exists but cannot be indexed with current pgvector version
 -- Other indexes for archon_crawled_pages
 CREATE INDEX idx_archon_crawled_pages_metadata ON archon_crawled_pages USING GIN (metadata);
 CREATE INDEX idx_archon_crawled_pages_source_id ON archon_crawled_pages (source_id);
 -- Hybrid search indexes
 CREATE INDEX idx_archon_crawled_pages_content_search ON archon_crawled_pages USING GIN (content_search_vector);
 CREATE INDEX idx_archon_crawled_pages_content_trgm ON archon_crawled_pages USING GIN (content gin_trgm_ops);
 -- Multi-dimensional embedding indexes
 CREATE INDEX idx_archon_crawled_pages_embedding_model ON archon_crawled_pages (embedding_model);
 CREATE INDEX idx_archon_crawled_pages_embedding_dimension ON archon_crawled_pages (embedding_dimension);
 CREATE INDEX idx_archon_crawled_pages_llm_chat_model ON archon_crawled_pages (llm_chat_model);
 -- Create the code_examples table
 CREATE TABLE IF NOT EXISTS archon_code_examples (
@ -230,7 +252,17 @@ CREATE TABLE IF NOT EXISTS archon_code_examples (
    summary TEXT NOT NULL,  -- Summary of the code example
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    source_id TEXT NOT NULL,
-    embedding VECTOR(1536),  -- OpenAI embeddings are 1536 dimensions
+    -- Multi-dimensional embedding support for different models
    embedding_384 VECTOR(384),   -- Small embedding models
    embedding_768 VECTOR(768),   -- Google/Ollama models  
    embedding_1024 VECTOR(1024), -- Ollama large models
    embedding_1536 VECTOR(1536), -- OpenAI standard models
    embedding_3072 VECTOR(3072), -- OpenAI large models
    -- Model tracking columns
    llm_chat_model TEXT,                -- LLM model used for processing (e.g., 'gpt-4', 'llama3:8b')
    embedding_model TEXT,                -- Embedding model used (e.g., 'text-embedding-3-large', 'all-MiniLM-L6-v2')
    embedding_dimension INTEGER,         -- Dimension of the embedding used (384, 768, 1024, 1536, 3072)
    -- Hybrid search support
    content_search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', content || ' ' || COALESCE(summary, ''))) STORED,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()) NOT NULL,
@ -241,19 +273,108 @@ CREATE TABLE IF NOT EXISTS archon_code_examples (
    FOREIGN KEY (source_id) REFERENCES archon_sources(source_id)
 );
-- Create indexes for better performance
+-- Multi-dimensional indexes
-CREATE INDEX ON archon_code_examples USING ivfflat (embedding vector_cosine_ops);
+CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_384 ON archon_code_examples USING ivfflat (embedding_384 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_768 ON archon_code_examples USING ivfflat (embedding_768 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_1024 ON archon_code_examples USING ivfflat (embedding_1024 vector_cosine_ops) WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_1536 ON archon_code_examples USING ivfflat (embedding_1536 vector_cosine_ops) WITH (lists = 100);
 -- Note: 3072-dimensional embeddings cannot have vector indexes due to PostgreSQL vector extension 2000 dimension limit
 -- The embedding_3072 column exists but cannot be indexed with current pgvector version
 -- Other indexes for archon_code_examples
 CREATE INDEX idx_archon_code_examples_metadata ON archon_code_examples USING GIN (metadata);
 CREATE INDEX idx_archon_code_examples_source_id ON archon_code_examples (source_id);
 -- Hybrid search indexes
 CREATE INDEX idx_archon_code_examples_content_search ON archon_code_examples USING GIN (content_search_vector);
 CREATE INDEX idx_archon_code_examples_content_trgm ON archon_code_examples USING GIN (content gin_trgm_ops);
 CREATE INDEX idx_archon_code_examples_summary_trgm ON archon_code_examples USING GIN (summary gin_trgm_ops);
 -- Multi-dimensional embedding indexes
 CREATE INDEX idx_archon_code_examples_embedding_model ON archon_code_examples (embedding_model);
 CREATE INDEX idx_archon_code_examples_embedding_dimension ON archon_code_examples (embedding_dimension);
 CREATE INDEX idx_archon_code_examples_llm_chat_model ON archon_code_examples (llm_chat_model);
 -- =====================================================
 -- SECTION 4.5: MULTI-DIMENSIONAL EMBEDDING HELPER FUNCTIONS
 -- =====================================================
 -- Function to detect embedding dimension from vector
 CREATE OR REPLACE FUNCTION detect_embedding_dimension(embedding_vector vector)
 RETURNS INTEGER AS $$
 BEGIN
    RETURN vector_dims(embedding_vector);
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;
 -- Function to get the appropriate column name for a dimension
 CREATE OR REPLACE FUNCTION get_embedding_column_name(dimension INTEGER)
 RETURNS TEXT AS $$
 BEGIN
    CASE dimension
        WHEN 384 THEN RETURN 'embedding_384';
        WHEN 768 THEN RETURN 'embedding_768';
        WHEN 1024 THEN RETURN 'embedding_1024';
        WHEN 1536 THEN RETURN 'embedding_1536';
        WHEN 3072 THEN RETURN 'embedding_3072';
        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %. Supported dimensions are: 384, 768, 1024, 1536, 3072', dimension;
    END CASE;
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;
 -- =====================================================
 -- SECTION 5: SEARCH FUNCTIONS
 -- =====================================================
-- Create a function to search for documentation chunks
+-- Create multi-dimensional function to search for documentation chunks
 CREATE OR REPLACE FUNCTION match_archon_crawled_pages_multi (
  query_embedding VECTOR,
  embedding_dimension INTEGER,
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 DECLARE
  sql_query TEXT;
  embedding_column TEXT;
 BEGIN
  -- Determine which embedding column to use based on dimension
  CASE embedding_dimension
    WHEN 384 THEN embedding_column := 'embedding_384';
    WHEN 768 THEN embedding_column := 'embedding_768';
    WHEN 1024 THEN embedding_column := 'embedding_1024';
    WHEN 1536 THEN embedding_column := 'embedding_1536';
    WHEN 3072 THEN embedding_column := 'embedding_3072';
    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
  END CASE;
  -- Build dynamic query
  sql_query := format('
    SELECT id, url, chunk_number, content, metadata, source_id,
           1 - (%I <=> $1) AS similarity
    FROM archon_crawled_pages
    WHERE (%I IS NOT NULL)
      AND metadata @> $3
      AND ($4 IS NULL OR source_id = $4)
    ORDER BY %I <=> $1
    LIMIT $2',
    embedding_column, embedding_column, embedding_column);
  -- Execute dynamic query
  RETURN QUERY EXECUTE sql_query USING query_embedding, match_count, filter, source_filter;
 END;
 $$;
 -- Legacy compatibility function (defaults to 1536D)
 CREATE OR REPLACE FUNCTION match_archon_crawled_pages (
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 10,
@ -270,26 +391,63 @@ CREATE OR REPLACE FUNCTION match_archon_crawled_pages (
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 BEGIN
-  RETURN QUERY
+  RETURN QUERY SELECT * FROM match_archon_crawled_pages_multi(query_embedding, 1536, match_count, filter, source_filter);
  SELECT
    id,
    url,
    chunk_number,
    content,
    metadata,
    source_id,
    1 - (archon_crawled_pages.embedding <=> query_embedding) AS similarity
  FROM archon_crawled_pages
  WHERE metadata @> filter
    AND (source_filter IS NULL OR source_id = source_filter)
  ORDER BY archon_crawled_pages.embedding <=> query_embedding
  LIMIT match_count;
 END;
 $$;
-- Create a function to search for code examples
+-- Create multi-dimensional function to search for code examples
 CREATE OR REPLACE FUNCTION match_archon_code_examples_multi (
  query_embedding VECTOR,
  embedding_dimension INTEGER,
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  summary TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 DECLARE
  sql_query TEXT;
  embedding_column TEXT;
 BEGIN
  -- Determine which embedding column to use based on dimension
  CASE embedding_dimension
    WHEN 384 THEN embedding_column := 'embedding_384';
    WHEN 768 THEN embedding_column := 'embedding_768';
    WHEN 1024 THEN embedding_column := 'embedding_1024';
    WHEN 1536 THEN embedding_column := 'embedding_1536';
    WHEN 3072 THEN embedding_column := 'embedding_3072';
    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
  END CASE;
  -- Build dynamic query
  sql_query := format('
    SELECT id, url, chunk_number, content, summary, metadata, source_id,
           1 - (%I <=> $1) AS similarity
    FROM archon_code_examples
    WHERE (%I IS NOT NULL)
      AND metadata @> $3
      AND ($4 IS NULL OR source_id = $4)
    ORDER BY %I <=> $1
    LIMIT $2',
    embedding_column, embedding_column, embedding_column);
  -- Execute dynamic query
  RETURN QUERY EXECUTE sql_query USING query_embedding, match_count, filter, source_filter;
 END;
 $$;
 -- Legacy compatibility function (defaults to 1536D)
 CREATE OR REPLACE FUNCTION match_archon_code_examples (
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 10,
@ -307,23 +465,8 @@ CREATE OR REPLACE FUNCTION match_archon_code_examples (
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 BEGIN
-  RETURN QUERY
+  RETURN QUERY SELECT * FROM match_archon_code_examples_multi(query_embedding, 1536, match_count, filter, source_filter);
  SELECT
    id,
    url,
    chunk_number,
    content,
    summary,
    metadata,
    source_id,
    1 - (archon_code_examples.embedding <=> query_embedding) AS similarity
  FROM archon_code_examples
  WHERE metadata @> filter
    AND (source_filter IS NULL OR source_id = source_filter)
  ORDER BY archon_code_examples.embedding <=> query_embedding
  LIMIT match_count;
 END;
 $$;
--- a/migration/upgrade_database.sql
+++ b/migration/upgrade_database.sql
@ -0,0 +1,518 @@
 -- ======================================================================
 -- UPGRADE TO MODEL TRACKING AND MULTI-DIMENSIONAL EMBEDDINGS
 -- ======================================================================
 -- This migration upgrades existing Archon installations to support:
 -- 1. Multi-dimensional embedding columns (768, 1024, 1536, 3072)  
 -- 2. Model tracking fields (llm_chat_model, embedding_model, embedding_dimension)
 -- 3. 384-dimension support for smaller embedding models
 -- 4. Enhanced search functions for multi-dimensional support
 -- ======================================================================
 -- 
 -- IMPORTANT: Run this ONLY if you have an existing Archon installation
 -- that was created BEFORE the multi-dimensional embedding support.
 -- 
 -- This script is SAFE to run multiple times - it uses IF NOT EXISTS checks.
 -- ======================================================================
 BEGIN;
 -- ======================================================================
 -- SECTION 1: ADD MULTI-DIMENSIONAL EMBEDDING COLUMNS
 -- ======================================================================
 -- Add multi-dimensional embedding columns to archon_crawled_pages
 ALTER TABLE archon_crawled_pages 
 ADD COLUMN IF NOT EXISTS embedding_384 VECTOR(384),   -- Small embedding models
 ADD COLUMN IF NOT EXISTS embedding_768 VECTOR(768),   -- Google/Ollama models  
 ADD COLUMN IF NOT EXISTS embedding_1024 VECTOR(1024), -- Ollama large models
 ADD COLUMN IF NOT EXISTS embedding_1536 VECTOR(1536), -- OpenAI standard models
 ADD COLUMN IF NOT EXISTS embedding_3072 VECTOR(3072); -- OpenAI large models
 -- Add multi-dimensional embedding columns to archon_code_examples  
 ALTER TABLE archon_code_examples
 ADD COLUMN IF NOT EXISTS embedding_384 VECTOR(384),   -- Small embedding models
 ADD COLUMN IF NOT EXISTS embedding_768 VECTOR(768),   -- Google/Ollama models  
 ADD COLUMN IF NOT EXISTS embedding_1024 VECTOR(1024), -- Ollama large models
 ADD COLUMN IF NOT EXISTS embedding_1536 VECTOR(1536), -- OpenAI standard models
 ADD COLUMN IF NOT EXISTS embedding_3072 VECTOR(3072); -- OpenAI large models
 -- ======================================================================
 -- SECTION 2: ADD MODEL TRACKING COLUMNS
 -- ======================================================================
 -- Add model tracking columns to archon_crawled_pages
 ALTER TABLE archon_crawled_pages 
 ADD COLUMN IF NOT EXISTS llm_chat_model TEXT,         -- LLM model used for processing (e.g., 'gpt-4', 'llama3:8b')
 ADD COLUMN IF NOT EXISTS embedding_model TEXT,        -- Embedding model used (e.g., 'text-embedding-3-large', 'all-MiniLM-L6-v2')
 ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER; -- Dimension of the embedding used (384, 768, 1024, 1536, 3072)
 -- Add model tracking columns to archon_code_examples
 ALTER TABLE archon_code_examples
 ADD COLUMN IF NOT EXISTS llm_chat_model TEXT,         -- LLM model used for processing (e.g., 'gpt-4', 'llama3:8b')
 ADD COLUMN IF NOT EXISTS embedding_model TEXT,        -- Embedding model used (e.g., 'text-embedding-3-large', 'all-MiniLM-L6-v2')
 ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER; -- Dimension of the embedding used (384, 768, 1024, 1536, 3072)
 -- ======================================================================
 -- SECTION 3: MIGRATE EXISTING EMBEDDING DATA
 -- ======================================================================
 -- Check if there's existing embedding data in old 'embedding' column
 DO $$
 DECLARE
    crawled_pages_count INTEGER;
    code_examples_count INTEGER;
    dimension_detected INTEGER;
 BEGIN
    -- Check if old embedding column exists and has data
    SELECT COUNT(*) INTO crawled_pages_count 
    FROM information_schema.columns 
    WHERE table_name = 'archon_crawled_pages' 
    AND column_name = 'embedding';
    SELECT COUNT(*) INTO code_examples_count 
    FROM information_schema.columns 
    WHERE table_name = 'archon_code_examples' 
    AND column_name = 'embedding';
    -- If old embedding columns exist, migrate the data
    IF crawled_pages_count > 0 THEN
        RAISE NOTICE 'Found existing embedding column in archon_crawled_pages - migrating data...';
        -- Detect dimension from first non-null embedding
        SELECT vector_dims(embedding) INTO dimension_detected
        FROM archon_crawled_pages 
        WHERE embedding IS NOT NULL 
        LIMIT 1;
        IF dimension_detected IS NOT NULL THEN
            RAISE NOTICE 'Detected embedding dimension: %', dimension_detected;
            -- Migrate based on detected dimension
            CASE dimension_detected
                WHEN 384 THEN 
                    UPDATE archon_crawled_pages 
                    SET embedding_384 = embedding,
                        embedding_dimension = 384,
                        embedding_model = COALESCE(embedding_model, 'legacy-384d-model')
                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
                WHEN 768 THEN 
                    UPDATE archon_crawled_pages 
                    SET embedding_768 = embedding,
                        embedding_dimension = 768,
                        embedding_model = COALESCE(embedding_model, 'legacy-768d-model')
                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
                WHEN 1024 THEN 
                    UPDATE archon_crawled_pages 
                    SET embedding_1024 = embedding,
                        embedding_dimension = 1024,
                        embedding_model = COALESCE(embedding_model, 'legacy-1024d-model')
                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
                WHEN 1536 THEN 
                    UPDATE archon_crawled_pages 
                    SET embedding_1536 = embedding,
                        embedding_dimension = 1536,
                        embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
                WHEN 3072 THEN 
                    UPDATE archon_crawled_pages 
                    SET embedding_3072 = embedding,
                        embedding_dimension = 3072,
                        embedding_model = COALESCE(embedding_model, 'text-embedding-3-large')
                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
                ELSE 
                    RAISE NOTICE 'Unsupported embedding dimension detected: %. Skipping migration.', dimension_detected;
            END CASE;
            RAISE NOTICE 'Migrated existing embeddings to dimension-specific columns';
        END IF;
    END IF;
    -- Migrate code examples if they exist
    IF code_examples_count > 0 THEN
        RAISE NOTICE 'Found existing embedding column in archon_code_examples - migrating data...';
        -- Detect dimension from first non-null embedding
        SELECT vector_dims(embedding) INTO dimension_detected
        FROM archon_code_examples 
        WHERE embedding IS NOT NULL 
        LIMIT 1;
        IF dimension_detected IS NOT NULL THEN
            RAISE NOTICE 'Detected code examples embedding dimension: %', dimension_detected;
            -- Migrate based on detected dimension
            CASE dimension_detected
                WHEN 384 THEN 
                    UPDATE archon_code_examples 
                    SET embedding_384 = embedding,
                        embedding_dimension = 384,
                        embedding_model = COALESCE(embedding_model, 'legacy-384d-model')
                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
                WHEN 768 THEN 
                    UPDATE archon_code_examples 
                    SET embedding_768 = embedding,
                        embedding_dimension = 768,
                        embedding_model = COALESCE(embedding_model, 'legacy-768d-model')
                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
                WHEN 1024 THEN 
                    UPDATE archon_code_examples 
                    SET embedding_1024 = embedding,
                        embedding_dimension = 1024,
                        embedding_model = COALESCE(embedding_model, 'legacy-1024d-model')
                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
                WHEN 1536 THEN 
                    UPDATE archon_code_examples 
                    SET embedding_1536 = embedding,
                        embedding_dimension = 1536,
                        embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
                WHEN 3072 THEN 
                    UPDATE archon_code_examples 
                    SET embedding_3072 = embedding,
                        embedding_dimension = 3072,
                        embedding_model = COALESCE(embedding_model, 'text-embedding-3-large')
                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
                ELSE 
                    RAISE NOTICE 'Unsupported code examples embedding dimension: %. Skipping migration.', dimension_detected;
            END CASE;
            RAISE NOTICE 'Migrated existing code example embeddings to dimension-specific columns';
        END IF;
    END IF;
 END $$;
 -- ======================================================================
 -- SECTION 4: CLEANUP LEGACY EMBEDDING COLUMNS
 -- ======================================================================
 -- Remove old embedding columns after successful migration
 DO $$
 DECLARE
    crawled_pages_count INTEGER;
    code_examples_count INTEGER;
 BEGIN
    -- Check if old embedding column exists in crawled pages
    SELECT COUNT(*) INTO crawled_pages_count 
    FROM information_schema.columns 
    WHERE table_name = 'archon_crawled_pages' 
    AND column_name = 'embedding';
    -- Check if old embedding column exists in code examples
    SELECT COUNT(*) INTO code_examples_count 
    FROM information_schema.columns 
    WHERE table_name = 'archon_code_examples' 
    AND column_name = 'embedding';
    -- Drop old embedding column from crawled pages if it exists
    IF crawled_pages_count > 0 THEN
        RAISE NOTICE 'Dropping legacy embedding column from archon_crawled_pages...';
        ALTER TABLE archon_crawled_pages DROP COLUMN embedding;
        RAISE NOTICE 'Successfully removed legacy embedding column from archon_crawled_pages';
    END IF;
    -- Drop old embedding column from code examples if it exists
    IF code_examples_count > 0 THEN
        RAISE NOTICE 'Dropping legacy embedding column from archon_code_examples...';
        ALTER TABLE archon_code_examples DROP COLUMN embedding;
        RAISE NOTICE 'Successfully removed legacy embedding column from archon_code_examples';
    END IF;
    -- Drop any indexes on the old embedding column if they exist
    DROP INDEX IF EXISTS idx_archon_crawled_pages_embedding;
    DROP INDEX IF EXISTS idx_archon_code_examples_embedding;
    RAISE NOTICE 'Legacy column cleanup completed';
 END $$;
 -- ======================================================================
 -- SECTION 5: CREATE OPTIMIZED INDEXES
 -- ======================================================================
 -- Create indexes for archon_crawled_pages (multi-dimensional support)
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_384 
 ON archon_crawled_pages USING ivfflat (embedding_384 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_768 
 ON archon_crawled_pages USING ivfflat (embedding_768 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_1024 
 ON archon_crawled_pages USING ivfflat (embedding_1024 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_1536 
 ON archon_crawled_pages USING ivfflat (embedding_1536 vector_cosine_ops) 
 WITH (lists = 100);
 -- Note: 3072-dimensional embeddings cannot have vector indexes due to PostgreSQL vector extension 2000 dimension limit
 -- The embedding_3072 column exists but cannot be indexed with current pgvector version
 -- Brute force search will be used for 3072-dimensional vectors
 -- CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_3072 
 -- ON archon_crawled_pages USING hnsw (embedding_3072 vector_cosine_ops);
 -- Create indexes for archon_code_examples (multi-dimensional support)
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_384 
 ON archon_code_examples USING ivfflat (embedding_384 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_768 
 ON archon_code_examples USING ivfflat (embedding_768 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_1024 
 ON archon_code_examples USING ivfflat (embedding_1024 vector_cosine_ops) 
 WITH (lists = 100);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_1536 
 ON archon_code_examples USING ivfflat (embedding_1536 vector_cosine_ops) 
 WITH (lists = 100);
 -- Note: 3072-dimensional embeddings cannot have vector indexes due to PostgreSQL vector extension 2000 dimension limit
 -- The embedding_3072 column exists but cannot be indexed with current pgvector version
 -- Brute force search will be used for 3072-dimensional vectors
 -- CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_3072 
 -- ON archon_code_examples USING hnsw (embedding_3072 vector_cosine_ops);
 -- Create indexes for model tracking columns
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_model 
 ON archon_crawled_pages (embedding_model);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_dimension 
 ON archon_crawled_pages (embedding_dimension);
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_llm_chat_model 
 ON archon_crawled_pages (llm_chat_model);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_model 
 ON archon_code_examples (embedding_model);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_dimension 
 ON archon_code_examples (embedding_dimension);
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_llm_chat_model 
 ON archon_code_examples (llm_chat_model);
 -- ======================================================================
 -- SECTION 6: HELPER FUNCTIONS FOR MULTI-DIMENSIONAL SUPPORT
 -- ======================================================================
 -- Function to detect embedding dimension from vector
 CREATE OR REPLACE FUNCTION detect_embedding_dimension(embedding_vector vector)
 RETURNS INTEGER AS $$
 BEGIN
    RETURN vector_dims(embedding_vector);
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;
 -- Function to get the appropriate column name for a dimension
 CREATE OR REPLACE FUNCTION get_embedding_column_name(dimension INTEGER)
 RETURNS TEXT AS $$
 BEGIN
    CASE dimension
        WHEN 384 THEN RETURN 'embedding_384';
        WHEN 768 THEN RETURN 'embedding_768';
        WHEN 1024 THEN RETURN 'embedding_1024';
        WHEN 1536 THEN RETURN 'embedding_1536';
        WHEN 3072 THEN RETURN 'embedding_3072';
        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %. Supported dimensions are: 384, 768, 1024, 1536, 3072', dimension;
    END CASE;
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;
 -- ======================================================================
 -- SECTION 7: ENHANCED SEARCH FUNCTIONS
 -- ======================================================================
 -- Create multi-dimensional function to search for documentation chunks
 CREATE OR REPLACE FUNCTION match_archon_crawled_pages_multi (
  query_embedding VECTOR,
  embedding_dimension INTEGER,
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 DECLARE
  sql_query TEXT;
  embedding_column TEXT;
 BEGIN
  -- Determine which embedding column to use based on dimension
  CASE embedding_dimension
    WHEN 384 THEN embedding_column := 'embedding_384';
    WHEN 768 THEN embedding_column := 'embedding_768';
    WHEN 1024 THEN embedding_column := 'embedding_1024';
    WHEN 1536 THEN embedding_column := 'embedding_1536';
    WHEN 3072 THEN embedding_column := 'embedding_3072';
    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
  END CASE;
  -- Build dynamic query
  sql_query := format('
    SELECT id, url, chunk_number, content, metadata, source_id,
           1 - (%I <=> $1) AS similarity
    FROM archon_crawled_pages
    WHERE (%I IS NOT NULL)
      AND metadata @> $3
      AND ($4 IS NULL OR source_id = $4)
    ORDER BY %I <=> $1
    LIMIT $2',
    embedding_column, embedding_column, embedding_column);
  -- Execute dynamic query
  RETURN QUERY EXECUTE sql_query USING query_embedding, match_count, filter, source_filter;
 END;
 $$;
 -- Create multi-dimensional function to search for code examples
 CREATE OR REPLACE FUNCTION match_archon_code_examples_multi (
  query_embedding VECTOR,
  embedding_dimension INTEGER,
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  summary TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 #variable_conflict use_column
 DECLARE
  sql_query TEXT;
  embedding_column TEXT;
 BEGIN
  -- Determine which embedding column to use based on dimension
  CASE embedding_dimension
    WHEN 384 THEN embedding_column := 'embedding_384';
    WHEN 768 THEN embedding_column := 'embedding_768';
    WHEN 1024 THEN embedding_column := 'embedding_1024';
    WHEN 1536 THEN embedding_column := 'embedding_1536';
    WHEN 3072 THEN embedding_column := 'embedding_3072';
    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
  END CASE;
  -- Build dynamic query
  sql_query := format('
    SELECT id, url, chunk_number, content, summary, metadata, source_id,
           1 - (%I <=> $1) AS similarity
    FROM archon_code_examples
    WHERE (%I IS NOT NULL)
      AND metadata @> $3
      AND ($4 IS NULL OR source_id = $4)
    ORDER BY %I <=> $1
    LIMIT $2',
    embedding_column, embedding_column, embedding_column);
  -- Execute dynamic query
  RETURN QUERY EXECUTE sql_query USING query_embedding, match_count, filter, source_filter;
 END;
 $$;
 -- ======================================================================
 -- SECTION 8: LEGACY COMPATIBILITY FUNCTIONS
 -- ======================================================================
 -- Legacy compatibility function for crawled pages (defaults to 1536D)
 CREATE OR REPLACE FUNCTION match_archon_crawled_pages (
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 BEGIN
  RETURN QUERY SELECT * FROM match_archon_crawled_pages_multi(query_embedding, 1536, match_count, filter, source_filter);
 END;
 $$;
 -- Legacy compatibility function for code examples (defaults to 1536D)
 CREATE OR REPLACE FUNCTION match_archon_code_examples (
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 10,
  filter JSONB DEFAULT '{}'::jsonb,
  source_filter TEXT DEFAULT NULL
 ) RETURNS TABLE (
  id BIGINT,
  url VARCHAR,
  chunk_number INTEGER,
  content TEXT,
  summary TEXT,
  metadata JSONB,
  source_id TEXT,
  similarity FLOAT
 )
 LANGUAGE plpgsql
 AS $$
 BEGIN
  RETURN QUERY SELECT * FROM match_archon_code_examples_multi(query_embedding, 1536, match_count, filter, source_filter);
 END;
 $$;
 COMMIT;
 -- ======================================================================
 -- MIGRATION COMPLETE - SUPABASE-FRIENDLY STATUS REPORT
 -- ======================================================================
 -- This final SELECT statement consolidates all status information for
 -- display in Supabase SQL Editor (users only see the last query result)
 SELECT 
    '🎉 ARCHON MODEL TRACKING UPGRADE COMPLETED! 🎉' AS status,
    'Successfully upgraded your Archon installation' AS message,
    ARRAY[
        '✅ Multi-dimensional embedding support (384, 768, 1024, 1536, 3072)',
        '✅ Model tracking fields (llm_chat_model, embedding_model, embedding_dimension)',
        '✅ Optimized indexes for improved search performance',
        '✅ Enhanced search functions with dimension-aware querying',
        '✅ Legacy compatibility maintained for existing code',
        '✅ Existing embedding data migrated (if any was found)',
        '✅ Support for 3072-dimensional vectors (using brute force search)'
    ] AS features_added,
    ARRAY[
        '• Multiple embedding providers (OpenAI, Ollama, Google, etc.)',
        '• Automatic model detection and tracking',
        '• Improved search accuracy with dimension-specific indexing',
        '• Full audit trail of which models processed your data'
    ] AS capabilities_enabled,
    ARRAY[
        '1. Restart your Archon services: docker compose restart',
        '2. New crawls will automatically use the enhanced features',
        '3. Check the Settings page to configure your preferred models',
        '4. Run validate_migration.sql to verify everything works'
    ] AS next_steps;
--- a/migration/validate_migration.sql
+++ b/migration/validate_migration.sql
@ -0,0 +1,287 @@
 -- ======================================================================
 -- ARCHON MIGRATION VALIDATION SCRIPT
 -- ======================================================================
 -- This script validates that the upgrade_to_model_tracking.sql migration
 -- completed successfully and all features are working.
 -- ======================================================================
 DO $$
 DECLARE
    crawled_pages_columns INTEGER := 0;
    code_examples_columns INTEGER := 0;
    crawled_pages_indexes INTEGER := 0;
    code_examples_indexes INTEGER := 0;
    functions_count INTEGER := 0;
    migration_success BOOLEAN := TRUE;
    error_messages TEXT := '';
 BEGIN
    RAISE NOTICE '====================================================================';
    RAISE NOTICE '              VALIDATING ARCHON MIGRATION RESULTS';
    RAISE NOTICE '====================================================================';
    -- Check if required columns exist in archon_crawled_pages
    SELECT COUNT(*) INTO crawled_pages_columns
    FROM information_schema.columns 
    WHERE table_name = 'archon_crawled_pages' 
    AND column_name IN (
        'embedding_384', 'embedding_768', 'embedding_1024', 'embedding_1536', 'embedding_3072',
        'llm_chat_model', 'embedding_model', 'embedding_dimension'
    );
    -- Check if required columns exist in archon_code_examples
    SELECT COUNT(*) INTO code_examples_columns
    FROM information_schema.columns 
    WHERE table_name = 'archon_code_examples' 
    AND column_name IN (
        'embedding_384', 'embedding_768', 'embedding_1024', 'embedding_1536', 'embedding_3072',
        'llm_chat_model', 'embedding_model', 'embedding_dimension'
    );
    -- Check if indexes were created for archon_crawled_pages
    SELECT COUNT(*) INTO crawled_pages_indexes
    FROM pg_indexes 
    WHERE tablename = 'archon_crawled_pages' 
    AND indexname IN (
        'idx_archon_crawled_pages_embedding_384',
        'idx_archon_crawled_pages_embedding_768',
        'idx_archon_crawled_pages_embedding_1024',
        'idx_archon_crawled_pages_embedding_1536',
        'idx_archon_crawled_pages_embedding_model',
        'idx_archon_crawled_pages_embedding_dimension',
        'idx_archon_crawled_pages_llm_chat_model'
    );
    -- Check if indexes were created for archon_code_examples
    SELECT COUNT(*) INTO code_examples_indexes
    FROM pg_indexes 
    WHERE tablename = 'archon_code_examples' 
    AND indexname IN (
        'idx_archon_code_examples_embedding_384',
        'idx_archon_code_examples_embedding_768', 
        'idx_archon_code_examples_embedding_1024',
        'idx_archon_code_examples_embedding_1536',
        'idx_archon_code_examples_embedding_model',
        'idx_archon_code_examples_embedding_dimension',
        'idx_archon_code_examples_llm_chat_model'
    );
    -- Check if required functions exist
    SELECT COUNT(*) INTO functions_count
    FROM information_schema.routines 
    WHERE routine_name IN (
        'match_archon_crawled_pages_multi',
        'match_archon_code_examples_multi',
        'detect_embedding_dimension',
        'get_embedding_column_name'
    );
    -- Validate results
    RAISE NOTICE 'COLUMN VALIDATION:';
    IF crawled_pages_columns = 8 THEN
        RAISE NOTICE '✅ archon_crawled_pages: All 8 required columns found';
    ELSE
        RAISE NOTICE '❌ archon_crawled_pages: Expected 8 columns, found %', crawled_pages_columns;
        migration_success := FALSE;
        error_messages := error_messages || '• Missing columns in archon_crawled_pages' || chr(10);
    END IF;
    IF code_examples_columns = 8 THEN
        RAISE NOTICE '✅ archon_code_examples: All 8 required columns found';
    ELSE
        RAISE NOTICE '❌ archon_code_examples: Expected 8 columns, found %', code_examples_columns;
        migration_success := FALSE;
        error_messages := error_messages || '• Missing columns in archon_code_examples' || chr(10);
    END IF;
    RAISE NOTICE '';
    RAISE NOTICE 'INDEX VALIDATION:';
    IF crawled_pages_indexes >= 6 THEN
        RAISE NOTICE '✅ archon_crawled_pages: % indexes created (expected 6+)', crawled_pages_indexes;
    ELSE
        RAISE NOTICE '⚠️  archon_crawled_pages: % indexes created (expected 6+)', crawled_pages_indexes;
        RAISE NOTICE '   Note: Some indexes may have failed due to resource constraints - this is OK';
    END IF;
    IF code_examples_indexes >= 6 THEN
        RAISE NOTICE '✅ archon_code_examples: % indexes created (expected 6+)', code_examples_indexes;
    ELSE
        RAISE NOTICE '⚠️  archon_code_examples: % indexes created (expected 6+)', code_examples_indexes;
        RAISE NOTICE '   Note: Some indexes may have failed due to resource constraints - this is OK';
    END IF;
    RAISE NOTICE '';
    RAISE NOTICE 'FUNCTION VALIDATION:';
    IF functions_count = 4 THEN
        RAISE NOTICE '✅ All 4 required functions created successfully';
    ELSE
        RAISE NOTICE '❌ Expected 4 functions, found %', functions_count;
        migration_success := FALSE;
        error_messages := error_messages || '• Missing database functions' || chr(10);
    END IF;
    -- Test function functionality
    BEGIN
        PERFORM detect_embedding_dimension(ARRAY[1,2,3]::vector);
        RAISE NOTICE '✅ detect_embedding_dimension function working';
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE '❌ detect_embedding_dimension function failed: %', SQLERRM;
        migration_success := FALSE;
        error_messages := error_messages || '• detect_embedding_dimension function not working' || chr(10);
    END;
    BEGIN
        PERFORM get_embedding_column_name(1536);
        RAISE NOTICE '✅ get_embedding_column_name function working';
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE '❌ get_embedding_column_name function failed: %', SQLERRM;
        migration_success := FALSE;
        error_messages := error_messages || '• get_embedding_column_name function not working' || chr(10);
    END;
    RAISE NOTICE '';
    RAISE NOTICE '====================================================================';
    IF migration_success THEN
        RAISE NOTICE '🎉 MIGRATION VALIDATION SUCCESSFUL!';
        RAISE NOTICE '';
        RAISE NOTICE 'Your Archon installation has been successfully upgraded with:';
        RAISE NOTICE '✅ Multi-dimensional embedding support';
        RAISE NOTICE '✅ Model tracking capabilities';
        RAISE NOTICE '✅ Enhanced search functions';
        RAISE NOTICE '✅ Optimized database indexes';
        RAISE NOTICE '';
        RAISE NOTICE 'Next steps:';
        RAISE NOTICE '1. Restart your Archon services: docker compose restart';
        RAISE NOTICE '2. Test with a small crawl to verify functionality';
        RAISE NOTICE '3. Configure your preferred models in Settings';
    ELSE
        RAISE NOTICE '❌ MIGRATION VALIDATION FAILED!';
        RAISE NOTICE '';
        RAISE NOTICE 'Issues found:';
        RAISE NOTICE '%', error_messages;
        RAISE NOTICE 'Please check the migration logs and re-run if necessary.';
    END IF;
    RAISE NOTICE '====================================================================';
    -- Show sample of existing data if any
    DECLARE
        sample_count INTEGER;
        r RECORD;  -- Declare the loop variable as RECORD type
    BEGIN
        SELECT COUNT(*) INTO sample_count FROM archon_crawled_pages LIMIT 1;
        IF sample_count > 0 THEN
            RAISE NOTICE '';
            RAISE NOTICE 'SAMPLE DATA CHECK:';
            -- Show one record with the new columns
            FOR r IN 
                SELECT url, embedding_model, embedding_dimension, 
                       CASE WHEN llm_chat_model IS NOT NULL THEN '✅' ELSE '⚪' END as llm_status,
                       CASE WHEN embedding_384 IS NOT NULL THEN '✅ 384' 
                            WHEN embedding_768 IS NOT NULL THEN '✅ 768'
                            WHEN embedding_1024 IS NOT NULL THEN '✅ 1024'
                            WHEN embedding_1536 IS NOT NULL THEN '✅ 1536'
                            WHEN embedding_3072 IS NOT NULL THEN '✅ 3072'
                            ELSE '⚪ None' END as embedding_status
                FROM archon_crawled_pages 
                LIMIT 3
            LOOP
                RAISE NOTICE 'Record: % | Model: % | Dimension: % | LLM: % | Embedding: %', 
                    substring(r.url from 1 for 40), 
                    COALESCE(r.embedding_model, 'None'), 
                    COALESCE(r.embedding_dimension::text, 'None'),
                    r.llm_status,
                    r.embedding_status;
            END LOOP;
        END IF;
    END;
 END $$;
 -- ======================================================================
 -- VALIDATION COMPLETE - SUPABASE-FRIENDLY STATUS REPORT
 -- ======================================================================
 -- This final SELECT statement consolidates validation results for 
 -- display in Supabase SQL Editor (users only see the last query result)
 WITH validation_results AS (
    -- Check if all required columns exist
    SELECT 
        COUNT(*) FILTER (WHERE column_name IN ('embedding_384', 'embedding_768', 'embedding_1024', 'embedding_1536', 'embedding_3072')) as embedding_columns,
        COUNT(*) FILTER (WHERE column_name IN ('llm_chat_model', 'embedding_model', 'embedding_dimension')) as tracking_columns
    FROM information_schema.columns 
    WHERE table_name = 'archon_crawled_pages'
 ),
 function_check AS (
    -- Check if required functions exist
    SELECT 
        COUNT(*) FILTER (WHERE routine_name IN ('match_archon_crawled_pages_multi', 'match_archon_code_examples_multi', 'detect_embedding_dimension', 'get_embedding_column_name')) as functions_count
    FROM information_schema.routines 
    WHERE routine_type = 'FUNCTION'
 ),
 index_check AS (
    -- Check if indexes exist
    SELECT 
        COUNT(*) FILTER (WHERE indexname LIKE '%embedding_%') as embedding_indexes
    FROM pg_indexes 
    WHERE tablename IN ('archon_crawled_pages', 'archon_code_examples')
 ),
 data_sample AS (
    -- Get sample of data with new columns
    SELECT 
        COUNT(*) as total_records,
        COUNT(*) FILTER (WHERE embedding_model IS NOT NULL) as records_with_model_tracking,
        COUNT(*) FILTER (WHERE embedding_384 IS NOT NULL OR embedding_768 IS NOT NULL OR embedding_1024 IS NOT NULL OR embedding_1536 IS NOT NULL OR embedding_3072 IS NOT NULL) as records_with_multi_dim_embeddings
    FROM archon_crawled_pages
 ),
 overall_status AS (
    SELECT 
        CASE 
            WHEN v.embedding_columns = 5 AND v.tracking_columns = 3 AND f.functions_count >= 4 AND i.embedding_indexes > 0 
            THEN '✅ MIGRATION VALIDATION SUCCESSFUL!'
            ELSE '❌ MIGRATION VALIDATION FAILED!'
        END as status,
        v.embedding_columns,
        v.tracking_columns, 
        f.functions_count,
        i.embedding_indexes,
        d.total_records,
        d.records_with_model_tracking,
        d.records_with_multi_dim_embeddings
    FROM validation_results v, function_check f, index_check i, data_sample d
 )
 SELECT 
    status,
    CASE 
        WHEN embedding_columns = 5 AND tracking_columns = 3 AND functions_count >= 4 AND embedding_indexes > 0 
        THEN 'All validation checks passed successfully'
        ELSE 'Some validation checks failed - please review the results'
    END as message,
    json_build_object(
        'embedding_columns_added', embedding_columns || '/5',
        'tracking_columns_added', tracking_columns || '/3', 
        'search_functions_created', functions_count || '+ functions',
        'embedding_indexes_created', embedding_indexes || '+ indexes'
    ) as technical_validation,
    json_build_object(
        'total_records', total_records,
        'records_with_model_tracking', records_with_model_tracking,
        'records_with_multi_dimensional_embeddings', records_with_multi_dim_embeddings
    ) as data_status,
    CASE 
        WHEN embedding_columns = 5 AND tracking_columns = 3 AND functions_count >= 4 AND embedding_indexes > 0 
        THEN ARRAY[
            '1. Restart Archon services: docker compose restart',
            '2. Test with a small crawl to verify functionality', 
            '3. Configure your preferred models in Settings',
            '4. New crawls will automatically use model tracking'
        ]
        ELSE ARRAY[
            '1. Check migration logs for specific errors',
            '2. Re-run upgrade_database.sql if needed',
            '3. Ensure database has sufficient permissions',
            '4. Contact support if issues persist'
        ]
    END as next_steps
 FROM overall_status;
--- a/python/src/server/api_routes/ollama_api.py
+++ b/python/src/server/api_routes/ollama_api.py
--- a/python/src/server/api_routes/settings_api.py
+++ b/python/src/server/api_routes/settings_api.py
@ -341,3 +341,51 @@ async def settings_health():
    result = {"status": "healthy", "service": "settings"}
    return result
@router.post("/credentials/status-check")
 async def check_credential_status(request: dict[str, list[str]]):
    """Check status of API credentials by actually decrypting and validating them.
    This endpoint is specifically for frontend status indicators and returns
    decrypted credential values for connectivity testing.
    """
    try:
        credential_keys = request.get("keys", [])
        logfire.info(f"Checking status for credentials: {credential_keys}")
        result = {}
        for key in credential_keys:
            try:
                # Get decrypted value for status checking
                decrypted_value = await credential_service.get_credential(key, decrypt=True)
                if decrypted_value and isinstance(decrypted_value, str) and decrypted_value.strip():
                    result[key] = {
                        "key": key,
                        "value": decrypted_value,
                        "has_value": True
                    }
                else:
                    result[key] = {
                        "key": key,
                        "value": None,
                        "has_value": False
                    }
            except Exception as e:
                logfire.warning(f"Failed to get credential for status check: {key} | error={str(e)}")
                result[key] = {
                    "key": key,
                    "value": None,
                    "has_value": False,
                    "error": str(e)
                }
        logfire.info(f"Credential status check completed | checked={len(credential_keys)} | found={len([k for k, v in result.items() if v.get('has_value')])}")
        return result
    except Exception as e:
        logfire.error(f"Error in credential status check | error={str(e)}")
        raise HTTPException(status_code=500, detail={"error": str(e)})
--- a/python/src/server/main.py
+++ b/python/src/server/main.py
@ -23,6 +23,7 @@ from .api_routes.bug_report_api import router as bug_report_router
 from .api_routes.internal_api import router as internal_router
 from .api_routes.knowledge_api import router as knowledge_router
 from .api_routes.mcp_api import router as mcp_router
 from .api_routes.ollama_api import router as ollama_router
 from .api_routes.progress_api import router as progress_router
 from .api_routes.projects_api import router as projects_router
@ -179,6 +180,7 @@ app.include_router(settings_router)
 app.include_router(mcp_router)
 # app.include_router(mcp_client_router)  # Removed - not part of new architecture
 app.include_router(knowledge_router)
 app.include_router(ollama_router)
 app.include_router(projects_router)
 app.include_router(progress_router)
 app.include_router(agent_chat_router)
--- a/python/src/server/services/credential_service.py
+++ b/python/src/server/services/credential_service.py
@ -239,6 +239,20 @@ class CredentialService:
                self._rag_cache_timestamp = None
                logger.debug(f"Invalidated RAG settings cache due to update of {key}")
                # Also invalidate LLM provider service cache for provider config
                try:
                    from . import llm_provider_service
                    # Clear the provider config caches that depend on RAG settings
                    cache_keys_to_clear = ["provider_config_llm", "provider_config_embedding", "rag_strategy_settings"]
                    for cache_key in cache_keys_to_clear:
                        if cache_key in llm_provider_service._settings_cache:
                            del llm_provider_service._settings_cache[cache_key]
                            logger.debug(f"Invalidated LLM provider service cache key: {cache_key}")
                except ImportError:
                    logger.warning("Could not import llm_provider_service to invalidate cache")
                except Exception as e:
                    logger.error(f"Error invalidating LLM provider service cache: {e}")
            logger.info(
                f"Successfully {'encrypted and ' if is_encrypted else ''}stored credential: {key}"
            )
@ -267,6 +281,20 @@ class CredentialService:
                self._rag_cache_timestamp = None
                logger.debug(f"Invalidated RAG settings cache due to deletion of {key}")
                # Also invalidate LLM provider service cache for provider config
                try:
                    from . import llm_provider_service
                    # Clear the provider config caches that depend on RAG settings
                    cache_keys_to_clear = ["provider_config_llm", "provider_config_embedding", "rag_strategy_settings"]
                    for cache_key in cache_keys_to_clear:
                        if cache_key in llm_provider_service._settings_cache:
                            del llm_provider_service._settings_cache[cache_key]
                            logger.debug(f"Invalidated LLM provider service cache key: {cache_key}")
                except ImportError:
                    logger.warning("Could not import llm_provider_service to invalidate cache")
                except Exception as e:
                    logger.error(f"Error invalidating LLM provider service cache: {e}")
            logger.info(f"Successfully deleted credential: {key}")
            return True
@ -400,8 +428,15 @@ class CredentialService:
            # Get base URL if needed
            base_url = self._get_provider_base_url(provider, rag_settings)
-            # Get models
+            # Get models with provider-specific fallback logic
            chat_model = rag_settings.get("MODEL_CHOICE", "")
            # If MODEL_CHOICE is empty, try provider-specific model settings
            if not chat_model and provider == "ollama":
                chat_model = rag_settings.get("OLLAMA_CHAT_MODEL", "")
                if chat_model:
                    logger.debug(f"Using OLLAMA_CHAT_MODEL: {chat_model}")
            embedding_model = rag_settings.get("EMBEDDING_MODEL", "")
            return {
--- a/python/src/server/services/embeddings/init.py
+++ b/python/src/server/services/embeddings/init.py
@ -10,6 +10,7 @@ from .contextual_embedding_service import (
    process_chunk_with_context,
 )
 from .embedding_service import create_embedding, create_embeddings_batch, get_openai_client
 from .multi_dimensional_embedding_service import multi_dimensional_embedding_service
 __all__ = [
    # Embedding functions
@ -20,4 +21,6 @@ __all__ = [
    "generate_contextual_embedding",
    "generate_contextual_embeddings_batch",
    "process_chunk_with_context",
    # Multi-dimensional embedding service
    "multi_dimensional_embedding_service",
 ]
--- a/python/src/server/services/embeddings/contextual_embedding_service.py
+++ b/python/src/server/services/embeddings/contextual_embedding_service.py
@ -116,7 +116,33 @@ async def _get_model_choice(provider: str | None = None) -> str:
    # Get the active provider configuration
    provider_config = await credential_service.get_active_provider("llm")
-    model = provider_config.get("chat_model", "gpt-4.1-nano")
+    model = provider_config.get("chat_model", "").strip()  # Strip whitespace
    provider_name = provider_config.get("provider", "openai")
    # Handle empty model case - fallback to provider-specific defaults or explicit config
    if not model:
        search_logger.warning(f"chat_model is empty for provider {provider_name}, using fallback logic")
        if provider_name == "ollama":
            # Try to get OLLAMA_CHAT_MODEL specifically
            try:
                ollama_model = await credential_service.get_credential("OLLAMA_CHAT_MODEL")
                if ollama_model and ollama_model.strip():
                    model = ollama_model.strip()
                    search_logger.info(f"Using OLLAMA_CHAT_MODEL fallback: {model}")
                else:
                    # Use a sensible Ollama default
                    model = "llama3.2:latest"
                    search_logger.info(f"Using Ollama default model: {model}")
            except Exception as e:
                search_logger.error(f"Error getting OLLAMA_CHAT_MODEL: {e}")
                model = "llama3.2:latest"
                search_logger.info(f"Using Ollama fallback model: {model}")
        elif provider_name == "google":
            model = "gemini-1.5-flash"
        else:
            # OpenAI or other providers
            model = "gpt-4o-mini"
    search_logger.debug(f"Using model from credential service: {model}")
--- a/python/src/server/services/embeddings/multi_dimensional_embedding_service.py
+++ b/python/src/server/services/embeddings/multi_dimensional_embedding_service.py
@ -0,0 +1,76 @@
 """
 Multi-Dimensional Embedding Service
 Manages embeddings with different dimensions (768, 1024, 1536, 3072) to support
 various embedding models from OpenAI, Google, Ollama, and other providers.
 This service works with the tested database schema that has been validated.
 """
 from typing import Any
 from ...config.logfire_config import get_logger
 logger = get_logger(__name__)
 # Supported embedding dimensions based on tested database schema
 # Note: Model lists are dynamically determined by providers, not hardcoded
 SUPPORTED_DIMENSIONS = {
    768: [],   # Common dimensions for various providers (Google, etc.)
    1024: [],  # Ollama and other providers
    1536: [],  # OpenAI models (text-embedding-3-small, ada-002)
    3072: []   # OpenAI large models (text-embedding-3-large)
 }
 class MultiDimensionalEmbeddingService:
    """Service for managing embeddings with multiple dimensions."""
    def __init__(self):
        pass
    def get_supported_dimensions(self) -> dict[int, list[str]]:
        """Get all supported embedding dimensions and their associated models."""
        return SUPPORTED_DIMENSIONS.copy()
    def get_dimension_for_model(self, model_name: str) -> int:
        """Get the embedding dimension for a specific model name using heuristics."""
        model_lower = model_name.lower()
        # Use heuristics to determine dimension based on model name patterns
        # OpenAI models
        if "text-embedding-3-large" in model_lower:
            return 3072
        elif "text-embedding-3-small" in model_lower or "text-embedding-ada" in model_lower:
            return 1536
        # Google models
        elif "text-embedding-004" in model_lower or "gemini-text-embedding" in model_lower:
            return 768
        # Ollama models (common patterns)
        elif "mxbai-embed" in model_lower:
            return 1024
        elif "nomic-embed" in model_lower:
            return 768
        elif "embed" in model_lower:
            # Generic embedding model, assume common dimension
            return 768
        # Default fallback for unknown models (most common OpenAI dimension)
        logger.warning(f"Unknown model {model_name}, defaulting to 1536 dimensions")
        return 1536
    def get_embedding_column_name(self, dimension: int) -> str:
        """Get the appropriate database column name for the given dimension."""
        if dimension in SUPPORTED_DIMENSIONS:
            return f"embedding_{dimension}"
        else:
            logger.warning(f"Unsupported dimension {dimension}, using fallback column")
            return "embedding"  # Fallback to original column
    def is_dimension_supported(self, dimension: int) -> bool:
        """Check if a dimension is supported by the database schema."""
        return dimension in SUPPORTED_DIMENSIONS
 # Global instance
 multi_dimensional_embedding_service = MultiDimensionalEmbeddingService()
--- a/python/src/server/services/llm_provider_service.py
+++ b/python/src/server/services/llm_provider_service.py
@ -39,16 +39,20 @@ def _set_cached_settings(key: str, value: Any) -> None:
@asynccontextmanager
-async def get_llm_client(provider: str | None = None, use_embedding_provider: bool = False):
+async def get_llm_client(provider: str | None = None, use_embedding_provider: bool = False,
                        instance_type: str | None = None, base_url: str | None = None):
    """
    Create an async OpenAI-compatible client based on the configured provider.
    This context manager handles client creation for different LLM providers
-    that support the OpenAI API format.
+    that support the OpenAI API format, with enhanced support for multi-instance
    Ollama configurations and intelligent instance routing.
    Args:
        provider: Override provider selection
        use_embedding_provider: Use the embedding-specific provider if different
        instance_type: For Ollama multi-instance: 'chat', 'embedding', or None for auto-select
        base_url: Override base URL for specific instance routing
    Yields:
        openai.AsyncOpenAI: An OpenAI-compatible client configured for the selected provider
@ -72,7 +76,8 @@ async def get_llm_client(provider: str | None = None, use_embedding_provider: bo
            else:
                logger.debug("Using cached rag_strategy settings")
-            base_url = credential_service._get_provider_base_url(provider, rag_settings)
+            # For Ollama, don't use the base_url from config - let _get_optimal_ollama_instance decide
            base_url = credential_service._get_provider_base_url(provider, rag_settings) if provider != "ollama" else None
        else:
            # Get configured provider from database
            service_type = "embedding" if use_embedding_provider else "llm"
@ -89,24 +94,56 @@ async def get_llm_client(provider: str | None = None, use_embedding_provider: bo
            provider_name = provider_config["provider"]
            api_key = provider_config["api_key"]
-            base_url = provider_config["base_url"]
+            # For Ollama, don't use the base_url from config - let _get_optimal_ollama_instance decide
            base_url = provider_config["base_url"] if provider_name != "ollama" else None
        logger.info(f"Creating LLM client for provider: {provider_name}")
        if provider_name == "openai":
            if not api_key:
-                raise ValueError("OpenAI API key not found")
+                # Check if Ollama instances are available as fallback
-
+                logger.warning("OpenAI API key not found, attempting Ollama fallback")
                try:
                    # Try to get an optimal Ollama instance for fallback
                    ollama_base_url = await _get_optimal_ollama_instance(
                        instance_type="embedding" if use_embedding_provider else "chat",
                        use_embedding_provider=use_embedding_provider
                    )
                    if ollama_base_url:
                        logger.info(f"Falling back to Ollama instance: {ollama_base_url}")
                        provider_name = "ollama"
                        api_key = "ollama"  # Ollama doesn't need a real API key
                        base_url = ollama_base_url
                        # Create Ollama client after fallback
                        client = openai.AsyncOpenAI(
                            api_key="ollama",
                            base_url=ollama_base_url,
                        )
                        logger.info(f"Ollama fallback client created successfully with base URL: {ollama_base_url}")
                    else:
                        raise ValueError("OpenAI API key not found and no Ollama instances available")
                except Exception as ollama_error:
                    logger.error(f"Ollama fallback failed: {ollama_error}")
                    raise ValueError("OpenAI API key not found and Ollama fallback failed") from ollama_error
            else:
                # Only create OpenAI client if we have an API key (didn't fallback to Ollama)
                client = openai.AsyncOpenAI(api_key=api_key)
                logger.info("OpenAI client created successfully")
        elif provider_name == "ollama":
            # Enhanced Ollama client creation with multi-instance support
            ollama_base_url = await _get_optimal_ollama_instance(
                instance_type=instance_type,
                use_embedding_provider=use_embedding_provider,
                base_url_override=base_url
            )
            # Ollama requires an API key in the client but doesn't actually use it
            client = openai.AsyncOpenAI(
                api_key="ollama",  # Required but unused by Ollama
-                base_url=base_url or "http://localhost:11434/v1",
+                base_url=ollama_base_url,
            )
-            logger.info(f"Ollama client created successfully with base URL: {base_url}")
+            logger.info(f"Ollama client created successfully with base URL: {ollama_base_url}")
        elif provider_name == "google":
            if not api_key:
@ -133,6 +170,54 @@ async def get_llm_client(provider: str | None = None, use_embedding_provider: bo
        pass
 async def _get_optimal_ollama_instance(instance_type: str | None = None,
                                       use_embedding_provider: bool = False,
                                       base_url_override: str | None = None) -> str:
    """
    Get the optimal Ollama instance URL based on configuration and health status.
    Args:
        instance_type: Preferred instance type ('chat', 'embedding', 'both', or None)
        use_embedding_provider: Whether this is for embedding operations
        base_url_override: Override URL if specified
    Returns:
        Best available Ollama instance URL
    """
    # If override URL provided, use it directly
    if base_url_override:
        return base_url_override if base_url_override.endswith('/v1') else f"{base_url_override}/v1"
    try:
        # For now, we don't have multi-instance support, so skip to single instance config
        # TODO: Implement get_ollama_instances() method in CredentialService for multi-instance support
        logger.info("Using single instance Ollama configuration")
        # Get single instance configuration from RAG settings
        rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
        # Check if we need embedding provider and have separate embedding URL
        if use_embedding_provider or instance_type == "embedding":
            embedding_url = rag_settings.get("OLLAMA_EMBEDDING_URL")
            if embedding_url:
                return embedding_url if embedding_url.endswith('/v1') else f"{embedding_url}/v1"
        # Default to LLM base URL for chat operations
        fallback_url = rag_settings.get("LLM_BASE_URL", "http://localhost:11434")
        return fallback_url if fallback_url.endswith('/v1') else f"{fallback_url}/v1"
    except Exception as e:
        logger.error(f"Error getting Ollama configuration: {e}")
        # Final fallback to localhost only if we can't get RAG settings
        try:
            rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
            fallback_url = rag_settings.get("LLM_BASE_URL", "http://localhost:11434")
            return fallback_url if fallback_url.endswith('/v1') else f"{fallback_url}/v1"
        except Exception as fallback_error:
            logger.error(f"Could not retrieve fallback configuration: {fallback_error}")
            return "http://localhost:11434/v1"
 async def get_embedding_model(provider: str | None = None) -> str:
    """
    Get the configured embedding model based on the provider.
@ -186,3 +271,115 @@ async def get_embedding_model(provider: str | None = None) -> str:
        logger.error(f"Error getting embedding model: {e}")
        # Fallback to OpenAI default
        return "text-embedding-3-small"
 async def get_embedding_model_with_routing(provider: str | None = None, instance_url: str | None = None) -> tuple[str, str]:
    """
    Get the embedding model with intelligent routing for multi-instance setups.
    Args:
        provider: Override provider selection
        instance_url: Specific instance URL to use
    Returns:
        Tuple of (model_name, instance_url) for embedding operations
    """
    try:
        # Get base embedding model
        model_name = await get_embedding_model(provider)
        # If specific instance URL provided, use it
        if instance_url:
            final_url = instance_url if instance_url.endswith('/v1') else f"{instance_url}/v1"
            return model_name, final_url
        # For Ollama provider, use intelligent instance routing
        if provider == "ollama" or (not provider and (await credential_service.get_credentials_by_category("rag_strategy")).get("LLM_PROVIDER") == "ollama"):
            optimal_url = await _get_optimal_ollama_instance(
                instance_type="embedding",
                use_embedding_provider=True
            )
            return model_name, optimal_url
        # For other providers, return model with None URL (use default)
        return model_name, None
    except Exception as e:
        logger.error(f"Error getting embedding model with routing: {e}")
        return "text-embedding-3-small", None
 async def validate_provider_instance(provider: str, instance_url: str | None = None) -> dict[str, any]:
    """
    Validate a provider instance and return health information.
    Args:
        provider: Provider name (openai, ollama, google, etc.)
        instance_url: Instance URL for providers that support multiple instances
    Returns:
        Dictionary with validation results and health status
    """
    try:
        if provider == "ollama":
            # Use the Ollama model discovery service for health checking
            from .ollama.model_discovery_service import model_discovery_service
            # Use provided URL or get optimal instance
            if not instance_url:
                instance_url = await _get_optimal_ollama_instance()
                # Remove /v1 suffix for health checking
                if instance_url.endswith('/v1'):
                    instance_url = instance_url[:-3]
            health_status = await model_discovery_service.check_instance_health(instance_url)
            return {
                "provider": provider,
                "instance_url": instance_url,
                "is_available": health_status.is_healthy,
                "response_time_ms": health_status.response_time_ms,
                "models_available": health_status.models_available,
                "error_message": health_status.error_message,
                "validation_timestamp": time.time()
            }
        else:
            # For other providers, do basic validation
            async with get_llm_client(provider=provider) as client:
                # Try a simple operation to validate the provider
                start_time = time.time()
                if provider == "openai":
                    # List models to validate API key
                    models = await client.models.list()
                    model_count = len(models.data) if hasattr(models, 'data') else 0
                elif provider == "google":
                    # For Google, we can't easily list models, just validate client creation
                    model_count = 1  # Assume available if client creation succeeded
                else:
                    model_count = 1
                response_time = (time.time() - start_time) * 1000
                return {
                    "provider": provider,
                    "instance_url": instance_url,
                    "is_available": True,
                    "response_time_ms": response_time,
                    "models_available": model_count,
                    "error_message": None,
                    "validation_timestamp": time.time()
                }
    except Exception as e:
        logger.error(f"Error validating provider {provider}: {e}")
        return {
            "provider": provider,
            "instance_url": instance_url,
            "is_available": False,
            "response_time_ms": None,
            "models_available": 0,
            "error_message": str(e),
            "validation_timestamp": time.time()
        }
--- a/python/src/server/services/ollama/init.py
+++ b/python/src/server/services/ollama/init.py
@ -0,0 +1,8 @@
 """
 Ollama Service Module
 Specialized services for Ollama provider management including:
 - Model discovery and capability detection
 - Multi-instance health monitoring
 - Dimension-aware embedding routing
 """
--- a/python/src/server/services/ollama/embedding_router.py
+++ b/python/src/server/services/ollama/embedding_router.py
@ -0,0 +1,451 @@
 """
 Ollama Embedding Router
 Provides intelligent routing for embeddings based on model capabilities and dimensions.
 Integrates with ModelDiscoveryService for real-time dimension detection and supports
 automatic fallback strategies for optimal performance across distributed Ollama instances.
 """
 from dataclasses import dataclass
 from typing import Any
 from ...config.logfire_config import get_logger
 from ..embeddings.multi_dimensional_embedding_service import multi_dimensional_embedding_service
 from .model_discovery_service import model_discovery_service
 logger = get_logger(__name__)
@dataclass
 class RoutingDecision:
    """Represents a routing decision for embedding generation."""
    target_column: str
    model_name: str
    instance_url: str
    dimensions: int
    confidence: float  # 0.0 to 1.0
    fallback_applied: bool = False
    routing_strategy: str = "auto-detect"  # auto-detect, model-mapping, fallback
@dataclass
 class EmbeddingRoute:
    """Configuration for embedding routing."""
    model_name: str
    instance_url: str
    dimensions: int
    column_name: str
    performance_score: float = 1.0  # Higher is better
 class EmbeddingRouter:
    """
    Intelligent router for Ollama embedding operations with dimension-aware routing.
    Features:
    - Automatic dimension detection from model capabilities
    - Intelligent routing to appropriate database columns
    - Fallback strategies for unknown models
    - Performance optimization for different vector sizes
    - Multi-instance load balancing consideration
    """
    # Database column mapping for different dimensions
    DIMENSION_COLUMNS = {
        768: "embedding_768",
        1024: "embedding_1024",
        1536: "embedding_1536",
        3072: "embedding_3072"
    }
    # Index type preferences for performance optimization
    INDEX_PREFERENCES = {
        768: "ivfflat",   # Good for smaller dimensions
        1024: "ivfflat",  # Good for medium dimensions
        1536: "ivfflat",  # Good for standard OpenAI dimensions
        3072: "hnsw"      # Better for high dimensions
    }
    def __init__(self):
        self.routing_cache: dict[str, RoutingDecision] = {}
        self.cache_ttl = 300  # 5 minutes cache TTL
    async def route_embedding(self, model_name: str, instance_url: str,
                            text_content: str | None = None) -> RoutingDecision:
        """
        Determine the optimal routing for an embedding operation.
        Args:
            model_name: Name of the embedding model to use
            instance_url: URL of the Ollama instance
            text_content: Optional text content for dynamic optimization
        Returns:
            RoutingDecision with target column and routing information
        """
        # Check cache first
        cache_key = f"{model_name}@{instance_url}"
        if cache_key in self.routing_cache:
            cached_decision = self.routing_cache[cache_key]
            logger.debug(f"Using cached routing decision for {model_name}")
            return cached_decision
        try:
            logger.info(f"Determining routing for model {model_name} on {instance_url}")
            # Step 1: Auto-detect dimensions from model capabilities
            dimensions = await self._detect_model_dimensions(model_name, instance_url)
            if dimensions:
                # Step 2: Route to appropriate column based on detected dimensions
                decision = await self._route_by_dimensions(
                    model_name, instance_url, dimensions, strategy="auto-detect"
                )
                logger.info(f"Auto-detected routing: {model_name} -> {decision.target_column} ({dimensions}D)")
            else:
                # Step 3: Fallback to model name mapping
                decision = await self._route_by_model_mapping(model_name, instance_url)
                logger.warning(f"Fallback routing applied for {model_name} -> {decision.target_column}")
            # Cache the decision
            self.routing_cache[cache_key] = decision
            return decision
        except Exception as e:
            logger.error(f"Error routing embedding for {model_name}: {e}")
            # Emergency fallback to largest supported dimension
            return RoutingDecision(
                target_column="embedding_3072",
                model_name=model_name,
                instance_url=instance_url,
                dimensions=3072,
                confidence=0.1,
                fallback_applied=True,
                routing_strategy="emergency-fallback"
            )
    async def _detect_model_dimensions(self, model_name: str, instance_url: str) -> int | None:
        """
        Detect embedding dimensions using the ModelDiscoveryService.
        Args:
            model_name: Name of the model
            instance_url: Ollama instance URL
        Returns:
            Detected dimensions or None if detection failed
        """
        try:
            # Get model info from discovery service
            model_info = await model_discovery_service.get_model_info(model_name, instance_url)
            if model_info and model_info.embedding_dimensions:
                dimensions = model_info.embedding_dimensions
                logger.debug(f"Detected {dimensions} dimensions for {model_name}")
                return dimensions
            # Try capability detection if model info doesn't have dimensions
            capabilities = await model_discovery_service._detect_model_capabilities(
                model_name, instance_url
            )
            if capabilities.embedding_dimensions:
                dimensions = capabilities.embedding_dimensions
                logger.debug(f"Detected {dimensions} dimensions via capabilities for {model_name}")
                return dimensions
            logger.warning(f"Could not detect dimensions for {model_name}")
            return None
        except Exception as e:
            logger.error(f"Error detecting dimensions for {model_name}: {e}")
            return None
    async def _route_by_dimensions(self, model_name: str, instance_url: str,
                                 dimensions: int, strategy: str) -> RoutingDecision:
        """
        Route embedding based on detected dimensions.
        Args:
            model_name: Name of the model
            instance_url: Ollama instance URL
            dimensions: Detected embedding dimensions
            strategy: Routing strategy used
        Returns:
            RoutingDecision for the detected dimensions
        """
        # Get target column for dimensions
        target_column = self._get_target_column(dimensions)
        # Calculate confidence based on exact dimension match
        confidence = 1.0 if dimensions in self.DIMENSION_COLUMNS else 0.7
        # Check if fallback was applied
        fallback_applied = dimensions not in self.DIMENSION_COLUMNS
        if fallback_applied:
            logger.warning(f"Model {model_name} dimensions {dimensions} not directly supported, "
                          f"using {target_column} with padding/truncation")
        return RoutingDecision(
            target_column=target_column,
            model_name=model_name,
            instance_url=instance_url,
            dimensions=dimensions,
            confidence=confidence,
            fallback_applied=fallback_applied,
            routing_strategy=strategy
        )
    async def _route_by_model_mapping(self, model_name: str, instance_url: str) -> RoutingDecision:
        """
        Route embedding based on model name mapping when auto-detection fails.
        Args:
            model_name: Name of the model
            instance_url: Ollama instance URL
        Returns:
            RoutingDecision based on model name mapping
        """
        # Use the existing multi-dimensional service for model mapping
        dimensions = multi_dimensional_embedding_service.get_dimension_for_model(model_name)
        target_column = multi_dimensional_embedding_service.get_embedding_column_name(dimensions)
        logger.info(f"Model mapping: {model_name} -> {dimensions}D -> {target_column}")
        return RoutingDecision(
            target_column=target_column,
            model_name=model_name,
            instance_url=instance_url,
            dimensions=dimensions,
            confidence=0.8,  # Medium confidence for model mapping
            fallback_applied=True,
            routing_strategy="model-mapping"
        )
    def _get_target_column(self, dimensions: int) -> str:
        """
        Get the appropriate database column for the given dimensions.
        Args:
            dimensions: Embedding dimensions
        Returns:
            Target column name for storage
        """
        # Direct mapping if supported
        if dimensions in self.DIMENSION_COLUMNS:
            return self.DIMENSION_COLUMNS[dimensions]
        # Fallback logic for unsupported dimensions
        if dimensions <= 768:
            logger.warning(f"Dimensions {dimensions} ≤ 768, using embedding_768 with padding")
            return "embedding_768"
        elif dimensions <= 1024:
            logger.warning(f"Dimensions {dimensions} ≤ 1024, using embedding_1024 with padding")
            return "embedding_1024"
        elif dimensions <= 1536:
            logger.warning(f"Dimensions {dimensions} ≤ 1536, using embedding_1536 with padding")
            return "embedding_1536"
        else:
            logger.warning(f"Dimensions {dimensions} > 1536, using embedding_3072 (may truncate)")
            return "embedding_3072"
    def get_optimal_index_type(self, dimensions: int) -> str:
        """
        Get the optimal index type for the given dimensions.
        Args:
            dimensions: Embedding dimensions
        Returns:
            Recommended index type (ivfflat or hnsw)
        """
        return self.INDEX_PREFERENCES.get(dimensions, "hnsw")
    async def get_available_embedding_routes(self, instance_urls: list[str]) -> list[EmbeddingRoute]:
        """
        Get all available embedding routes across multiple instances.
        Args:
            instance_urls: List of Ollama instance URLs to check
        Returns:
            List of available embedding routes with performance scores
        """
        routes = []
        try:
            # Discover models from all instances
            discovery_result = await model_discovery_service.discover_models_from_multiple_instances(
                instance_urls
            )
            # Process embedding models
            for embedding_model in discovery_result["embedding_models"]:
                model_name = embedding_model["name"]
                instance_url = embedding_model["instance_url"]
                dimensions = embedding_model.get("dimensions")
                if dimensions:
                    target_column = self._get_target_column(dimensions)
                    # Calculate performance score based on dimension efficiency
                    performance_score = self._calculate_performance_score(dimensions)
                    route = EmbeddingRoute(
                        model_name=model_name,
                        instance_url=instance_url,
                        dimensions=dimensions,
                        column_name=target_column,
                        performance_score=performance_score
                    )
                    routes.append(route)
            # Sort by performance score (highest first)
            routes.sort(key=lambda r: r.performance_score, reverse=True)
            logger.info(f"Found {len(routes)} embedding routes across {len(instance_urls)} instances")
        except Exception as e:
            logger.error(f"Error getting embedding routes: {e}")
        return routes
    def _calculate_performance_score(self, dimensions: int) -> float:
        """
        Calculate performance score for embedding dimensions.
        Args:
            dimensions: Embedding dimensions
        Returns:
            Performance score (0.0 to 1.0, higher is better)
        """
        # Base score on standard dimensions (exact matches get higher scores)
        if dimensions in self.DIMENSION_COLUMNS:
            base_score = 1.0
        else:
            base_score = 0.7  # Penalize non-standard dimensions
        # Adjust based on index performance characteristics
        if dimensions <= 1536:
            # IVFFlat performs well for smaller dimensions
            index_bonus = 0.0
        else:
            # HNSW needed for larger dimensions, slight penalty for complexity
            index_bonus = -0.1
        # Dimension efficiency (smaller = faster, but less semantic information)
        if dimensions == 1536:
            # Sweet spot for most applications
            dimension_bonus = 0.1
        elif dimensions == 768:
            # Good balance of speed and quality
            dimension_bonus = 0.05
        else:
            dimension_bonus = 0.0
        final_score = max(0.0, min(1.0, base_score + index_bonus + dimension_bonus))
        logger.debug(f"Performance score for {dimensions}D: {final_score}")
        return final_score
    async def validate_routing_decision(self, decision: RoutingDecision) -> bool:
        """
        Validate that a routing decision is still valid.
        Args:
            decision: RoutingDecision to validate
        Returns:
            True if decision is valid, False otherwise
        """
        try:
            # Check if the model still supports embeddings
            is_valid = await model_discovery_service.validate_model_capabilities(
                decision.model_name,
                decision.instance_url,
                "embedding"
            )
            if not is_valid:
                logger.warning(f"Routing decision invalid: {decision.model_name} no longer supports embeddings")
                # Remove from cache if invalid
                cache_key = f"{decision.model_name}@{decision.instance_url}"
                if cache_key in self.routing_cache:
                    del self.routing_cache[cache_key]
            return is_valid
        except Exception as e:
            logger.error(f"Error validating routing decision: {e}")
            return False
    def clear_routing_cache(self) -> None:
        """Clear the routing decision cache."""
        self.routing_cache.clear()
        logger.info("Routing cache cleared")
    def get_routing_statistics(self) -> dict[str, Any]:
        """
        Get statistics about current routing decisions.
        Returns:
            Dictionary with routing statistics
        """
        # Use explicit counters with proper types
        auto_detect_routes = 0
        model_mapping_routes = 0
        fallback_routes = 0
        dimension_distribution: dict[str, int] = {}
        confidence_high = 0
        confidence_medium = 0
        confidence_low = 0
        for decision in self.routing_cache.values():
            # Count routing strategies
            if decision.routing_strategy == "auto-detect":
                auto_detect_routes += 1
            elif decision.routing_strategy == "model-mapping":
                model_mapping_routes += 1
            else:
                fallback_routes += 1
            # Count dimensions
            dim_key = f"{decision.dimensions}D"
            dimension_distribution[dim_key] = dimension_distribution.get(dim_key, 0) + 1
            # Count confidence levels
            if decision.confidence >= 0.9:
                confidence_high += 1
            elif decision.confidence >= 0.7:
                confidence_medium += 1
            else:
                confidence_low += 1
        return {
            "total_cached_routes": len(self.routing_cache),
            "auto_detect_routes": auto_detect_routes,
            "model_mapping_routes": model_mapping_routes,
            "fallback_routes": fallback_routes,
            "dimension_distribution": dimension_distribution,
            "confidence_distribution": {
                "high": confidence_high,
                "medium": confidence_medium,
                "low": confidence_low
            }
        }
 # Global service instance
 embedding_router = EmbeddingRouter()
--- a/python/src/server/services/ollama/model_discovery_service.py
+++ b/python/src/server/services/ollama/model_discovery_service.py
--- a/python/src/server/services/provider_discovery_service.py
+++ b/python/src/server/services/provider_discovery_service.py
@ -0,0 +1,505 @@
 """
 Provider Discovery Service
 Discovers available models, checks provider health, and provides model specifications
 for OpenAI, Google Gemini, Ollama, and Anthropic providers.
 """
 import time
 from dataclasses import dataclass
 from typing import Any
 from urllib.parse import urlparse
 import aiohttp
 import openai
 from ..config.logfire_config import get_logger
 from .credential_service import credential_service
 logger = get_logger(__name__)
 # Provider capabilities and model specifications cache
 _provider_cache: dict[str, tuple[Any, float]] = {}
 _CACHE_TTL_SECONDS = 300  # 5 minutes
 # Default Ollama instance URL (configurable via environment/settings)
 DEFAULT_OLLAMA_URL = "http://localhost:11434"
 # Model pattern detection for dynamic capabilities (no hardcoded model names)
 CHAT_MODEL_PATTERNS = ["llama", "qwen", "mistral", "codellama", "phi", "gemma", "vicuna", "orca"]
 EMBEDDING_MODEL_PATTERNS = ["embed", "embedding"]
 VISION_MODEL_PATTERNS = ["vision", "llava", "moondream"]
 # Context window estimates by model family (heuristics, not hardcoded requirements)
 MODEL_CONTEXT_WINDOWS = {
    "llama3": 8192,
    "qwen": 32768,
    "mistral": 8192,
    "codellama": 16384,
    "phi": 4096,
    "gemma": 8192,
 }
 # Embedding dimensions for common models (heuristics)
 EMBEDDING_DIMENSIONS = {
    "nomic-embed": 768,
    "mxbai-embed": 1024,
    "all-minilm": 384,
 }
@dataclass
 class ModelSpec:
    """Model specification with capabilities and constraints."""
    name: str
    provider: str
    context_window: int
    supports_tools: bool = False
    supports_vision: bool = False
    supports_embeddings: bool = False
    embedding_dimensions: int | None = None
    pricing_input: float | None = None  # Per million tokens
    pricing_output: float | None = None  # Per million tokens
    description: str = ""
    aliases: list[str] = None
    def __post_init__(self):
        if self.aliases is None:
            self.aliases = []
@dataclass
 class ProviderStatus:
    """Provider health and connectivity status."""
    provider: str
    is_available: bool
    response_time_ms: float | None = None
    error_message: str | None = None
    models_available: int = 0
    base_url: str | None = None
    last_checked: float | None = None
 class ProviderDiscoveryService:
    """Service for discovering models and checking provider health."""
    def __init__(self):
        self._session: aiohttp.ClientSession | None = None
    async def _get_session(self) -> aiohttp.ClientSession:
        """Get or create HTTP session for provider requests."""
        if self._session is None:
            timeout = aiohttp.ClientTimeout(total=30, connect=10)
            self._session = aiohttp.ClientSession(timeout=timeout)
        return self._session
    async def close(self):
        """Close HTTP session."""
        if self._session:
            await self._session.close()
            self._session = None
    def _get_cached_result(self, cache_key: str) -> Any | None:
        """Get cached result if not expired."""
        if cache_key in _provider_cache:
            result, timestamp = _provider_cache[cache_key]
            if time.time() - timestamp < _CACHE_TTL_SECONDS:
                return result
            else:
                del _provider_cache[cache_key]
        return None
    def _cache_result(self, cache_key: str, result: Any) -> None:
        """Cache result with current timestamp."""
        _provider_cache[cache_key] = (result, time.time())
    async def _test_tool_support(self, model_name: str, api_url: str) -> bool:
        """
        Test if a model supports function/tool calling by making an actual API call.
        Args:
            model_name: Name of the model to test
            api_url: Base URL of the Ollama instance
        Returns:
            True if tool calling is supported, False otherwise
        """
        try:
            import openai
            # Use OpenAI-compatible client for function calling test
            client = openai.AsyncOpenAI(
                base_url=f"{api_url}/v1",
                api_key="ollama"  # Dummy API key for Ollama
            )
            # Define a simple test function
            test_function = {
                "name": "test_function",
                "description": "A test function",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "test_param": {
                            "type": "string",
                            "description": "A test parameter"
                        }
                    },
                    "required": ["test_param"]
                }
            }
            # Try to make a function calling request
            response = await client.chat.completions.create(
                model=model_name,
                messages=[{"role": "user", "content": "Call the test function with parameter 'hello'"}],
                tools=[{"type": "function", "function": test_function}],
                max_tokens=50,
                timeout=5  # Short timeout for quick testing
            )
            # Check if the model attempted to use the function
            if response.choices and len(response.choices) > 0:
                choice = response.choices[0]
                if hasattr(choice.message, 'tool_calls') and choice.message.tool_calls:
                    logger.info(f"Model {model_name} supports tool calling")
                    return True
            return False
        except Exception as e:
            logger.debug(f"Tool support test failed for {model_name}: {e}")
            # Fall back to name-based heuristics for known models
            return any(pattern in model_name.lower() 
                      for pattern in CHAT_MODEL_PATTERNS)
        finally:
            if 'client' in locals():
                await client.close()
    async def discover_openai_models(self, api_key: str) -> list[ModelSpec]:
        """Discover available OpenAI models."""
        cache_key = f"openai_models_{hash(api_key)}"
        cached = self._get_cached_result(cache_key)
        if cached:
            return cached
        models = []
        try:
            client = openai.AsyncOpenAI(api_key=api_key)
            response = await client.models.list()
            # OpenAI model specifications
            model_specs = {
                "gpt-4o": ModelSpec("gpt-4o", "openai", 128000, True, True, False, None, 2.50, 10.00, "Most capable GPT-4 model with vision"),
                "gpt-4o-mini": ModelSpec("gpt-4o-mini", "openai", 128000, True, True, False, None, 0.15, 0.60, "Affordable GPT-4 model"),
                "gpt-4-turbo": ModelSpec("gpt-4-turbo", "openai", 128000, True, True, False, None, 10.00, 30.00, "GPT-4 Turbo with vision"),
                "gpt-3.5-turbo": ModelSpec("gpt-3.5-turbo", "openai", 16385, True, False, False, None, 0.50, 1.50, "Fast and efficient model"),
                "text-embedding-3-large": ModelSpec("text-embedding-3-large", "openai", 8191, False, False, True, 3072, 0.13, 0, "High-quality embedding model"),
                "text-embedding-3-small": ModelSpec("text-embedding-3-small", "openai", 8191, False, False, True, 1536, 0.02, 0, "Efficient embedding model"),
                "text-embedding-ada-002": ModelSpec("text-embedding-ada-002", "openai", 8191, False, False, True, 1536, 0.10, 0, "Legacy embedding model"),
            }
            for model in response.data:
                if model.id in model_specs:
                    models.append(model_specs[model.id])
                else:
                    # Create basic spec for unknown models
                    models.append(ModelSpec(
                        name=model.id,
                        provider="openai",
                        context_window=4096,  # Default assumption
                        description=f"OpenAI model {model.id}"
                    ))
            self._cache_result(cache_key, models)
            logger.info(f"Discovered {len(models)} OpenAI models")
        except Exception as e:
            logger.error(f"Error discovering OpenAI models: {e}")
        return models
    async def discover_google_models(self, api_key: str) -> list[ModelSpec]:
        """Discover available Google Gemini models."""
        cache_key = f"google_models_{hash(api_key)}"
        cached = self._get_cached_result(cache_key)
        if cached:
            return cached
        models = []
        try:
            # Google Gemini model specifications
            model_specs = [
                ModelSpec("gemini-1.5-pro", "google", 2097152, True, True, False, None, 1.25, 5.00, "Advanced reasoning and multimodal capabilities"),
                ModelSpec("gemini-1.5-flash", "google", 1048576, True, True, False, None, 0.075, 0.30, "Fast and versatile performance"),
                ModelSpec("gemini-1.0-pro", "google", 30720, True, False, False, None, 0.50, 1.50, "Efficient model for text tasks"),
                ModelSpec("text-embedding-004", "google", 2048, False, False, True, 768, 0.00, 0, "Google's latest embedding model"),
            ]
            # Test connectivity with a simple request
            session = await self._get_session()
            base_url = "https://generativelanguage.googleapis.com/v1beta/models"
            headers = {"Authorization": f"Bearer {api_key}"}
            async with session.get(f"{base_url}?key={api_key}", headers=headers) as response:
                if response.status == 200:
                    models = model_specs
                    self._cache_result(cache_key, models)
                    logger.info(f"Discovered {len(models)} Google models")
                else:
                    logger.warning(f"Google API returned status {response.status}")
        except Exception as e:
            logger.error(f"Error discovering Google models: {e}")
        return models
    async def discover_ollama_models(self, base_urls: list[str]) -> list[ModelSpec]:
        """Discover available Ollama models from multiple instances."""
        all_models = []
        for base_url in base_urls:
            cache_key = f"ollama_models_{base_url}"
            cached = self._get_cached_result(cache_key)
            if cached:
                all_models.extend(cached)
                continue
            try:
                # Clean up URL - remove /v1 suffix if present for raw Ollama API
                parsed = urlparse(base_url)
                if parsed.path.endswith('/v1'):
                    api_url = base_url.replace('/v1', '')
                else:
                    api_url = base_url
                session = await self._get_session()
                # Get installed models
                async with session.get(f"{api_url}/api/tags") as response:
                    if response.status == 200:
                        data = await response.json()
                        models = []
                        for model_info in data.get("models", []):
                            model_name = model_info.get("name", "").split(':')[0]  # Remove tag
                            # Determine model capabilities based on testing and name patterns
                            # Test for function calling capabilities via actual API calls
                            supports_tools = await self._test_tool_support(model_name, api_url)
                            # Vision support is typically indicated by name patterns (reliable indicator)
                            supports_vision = any(pattern in model_name.lower() for pattern in VISION_MODEL_PATTERNS)
                            # Embedding support is typically indicated by name patterns (reliable indicator)  
                            supports_embeddings = any(pattern in model_name.lower() for pattern in EMBEDDING_MODEL_PATTERNS)
                            # Estimate context window based on model family
                            context_window = 4096  # Default
                            for family, window_size in MODEL_CONTEXT_WINDOWS.items():
                                if family in model_name.lower():
                                    context_window = window_size
                                    break
                            # Set embedding dimensions for known embedding models
                            embedding_dims = None
                            for model_pattern, dims in EMBEDDING_DIMENSIONS.items():
                                if model_pattern in model_name.lower():
                                    embedding_dims = dims
                                    break
                            spec = ModelSpec(
                                name=model_info.get("name", model_name),
                                provider="ollama",
                                context_window=context_window,
                                supports_tools=supports_tools,
                                supports_vision=supports_vision,
                                supports_embeddings=supports_embeddings,
                                embedding_dimensions=embedding_dims,
                                description=f"Ollama model on {base_url}",
                                aliases=[model_name] if ':' in model_info.get("name", "") else []
                            )
                            models.append(spec)
                        self._cache_result(cache_key, models)
                        all_models.extend(models)
                        logger.info(f"Discovered {len(models)} Ollama models from {base_url}")
                    else:
                        logger.warning(f"Ollama instance at {base_url} returned status {response.status}")
            except Exception as e:
                logger.error(f"Error discovering Ollama models from {base_url}: {e}")
        return all_models
    async def discover_anthropic_models(self, api_key: str) -> list[ModelSpec]:
        """Discover available Anthropic Claude models."""
        cache_key = f"anthropic_models_{hash(api_key)}"
        cached = self._get_cached_result(cache_key)
        if cached:
            return cached
        models = []
        try:
            # Anthropic Claude model specifications
            model_specs = [
                ModelSpec("claude-3-5-sonnet-20241022", "anthropic", 200000, True, True, False, None, 3.00, 15.00, "Most intelligent Claude model"),
                ModelSpec("claude-3-5-haiku-20241022", "anthropic", 200000, True, False, False, None, 0.25, 1.25, "Fast and cost-effective Claude model"),
                ModelSpec("claude-3-opus-20240229", "anthropic", 200000, True, True, False, None, 15.00, 75.00, "Powerful model for complex tasks"),
                ModelSpec("claude-3-sonnet-20240229", "anthropic", 200000, True, True, False, None, 3.00, 15.00, "Balanced performance and cost"),
                ModelSpec("claude-3-haiku-20240307", "anthropic", 200000, True, False, False, None, 0.25, 1.25, "Fast responses and cost-effective"),
            ]
            # Test connectivity - Anthropic doesn't have a models list endpoint,
            # so we'll just return the known models if API key is provided
            if api_key:
                models = model_specs
                self._cache_result(cache_key, models)
                logger.info(f"Discovered {len(models)} Anthropic models")
        except Exception as e:
            logger.error(f"Error discovering Anthropic models: {e}")
        return models
    async def check_provider_health(self, provider: str, config: dict[str, Any]) -> ProviderStatus:
        """Check health and connectivity status of a provider."""
        start_time = time.time()
        try:
            if provider == "openai":
                api_key = config.get("api_key")
                if not api_key:
                    return ProviderStatus(provider, False, None, "API key not configured")
                client = openai.AsyncOpenAI(api_key=api_key)
                models = await client.models.list()
                response_time = (time.time() - start_time) * 1000
                return ProviderStatus(
                    provider="openai",
                    is_available=True,
                    response_time_ms=response_time,
                    models_available=len(models.data),
                    last_checked=time.time()
                )
            elif provider == "google":
                api_key = config.get("api_key")
                if not api_key:
                    return ProviderStatus(provider, False, None, "API key not configured")
                session = await self._get_session()
                base_url = "https://generativelanguage.googleapis.com/v1beta/models"
                async with session.get(f"{base_url}?key={api_key}") as response:
                    response_time = (time.time() - start_time) * 1000
                    if response.status == 200:
                        data = await response.json()
                        return ProviderStatus(
                            provider="google",
                            is_available=True,
                            response_time_ms=response_time,
                            models_available=len(data.get("models", [])),
                            base_url=base_url,
                            last_checked=time.time()
                        )
                    else:
                        return ProviderStatus(provider, False, response_time, f"HTTP {response.status}")
            elif provider == "ollama":
                base_urls = config.get("base_urls", [config.get("base_url", DEFAULT_OLLAMA_URL)])
                if isinstance(base_urls, str):
                    base_urls = [base_urls]
                # Check the first available Ollama instance
                for base_url in base_urls:
                    try:
                        # Clean up URL for raw Ollama API
                        parsed = urlparse(base_url)
                        if parsed.path.endswith('/v1'):
                            api_url = base_url.replace('/v1', '')
                        else:
                            api_url = base_url
                        session = await self._get_session()
                        async with session.get(f"{api_url}/api/tags") as response:
                            response_time = (time.time() - start_time) * 1000
                            if response.status == 200:
                                data = await response.json()
                                return ProviderStatus(
                                    provider="ollama",
                                    is_available=True,
                                    response_time_ms=response_time,
                                    models_available=len(data.get("models", [])),
                                    base_url=api_url,
                                    last_checked=time.time()
                                )
                    except Exception:
                        continue  # Try next URL
                return ProviderStatus(provider, False, None, "No Ollama instances available")
            elif provider == "anthropic":
                api_key = config.get("api_key")
                if not api_key:
                    return ProviderStatus(provider, False, None, "API key not configured")
                # Anthropic doesn't have a health check endpoint, so we'll assume it's available
                # if API key is provided. In a real implementation, you might want to make a
                # small test request to verify the key is valid.
                response_time = (time.time() - start_time) * 1000
                return ProviderStatus(
                    provider="anthropic",
                    is_available=True,
                    response_time_ms=response_time,
                    models_available=5,  # Known model count
                    last_checked=time.time()
                )
            else:
                return ProviderStatus(provider, False, None, f"Unknown provider: {provider}")
        except Exception as e:
            response_time = (time.time() - start_time) * 1000
            return ProviderStatus(
                provider=provider,
                is_available=False,
                response_time_ms=response_time,
                error_message=str(e),
                last_checked=time.time()
            )
    async def get_all_available_models(self) -> dict[str, list[ModelSpec]]:
        """Get all available models from all configured providers."""
        providers = {}
        try:
            # Get provider configurations
            rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
            # OpenAI
            openai_key = await credential_service.get_credential("OPENAI_API_KEY")
            if openai_key:
                providers["openai"] = await self.discover_openai_models(openai_key)
            # Google
            google_key = await credential_service.get_credential("GOOGLE_API_KEY")
            if google_key:
                providers["google"] = await self.discover_google_models(google_key)
            # Ollama
            ollama_urls = [rag_settings.get("LLM_BASE_URL", DEFAULT_OLLAMA_URL)]
            providers["ollama"] = await self.discover_ollama_models(ollama_urls)
            # Anthropic
            anthropic_key = await credential_service.get_credential("ANTHROPIC_API_KEY")
            if anthropic_key:
                providers["anthropic"] = await self.discover_anthropic_models(anthropic_key)
        except Exception as e:
            logger.error(f"Error getting all available models: {e}")
        return providers
 # Global instance
 provider_discovery_service = ProviderDiscoveryService()
--- a/python/src/server/services/storage/code_storage_service.py
+++ b/python/src/server/services/storage/code_storage_service.py
@ -506,6 +506,20 @@ def generate_code_example_summary(
    Returns:
        A dictionary with 'summary' and 'example_name'
    """
    import asyncio
    # Run the async version in the current thread
    return asyncio.run(_generate_code_example_summary_async(code, context_before, context_after, language, provider))
 async def _generate_code_example_summary_async(
    code: str, context_before: str, context_after: str, language: str = "", provider: str = None
 ) -> dict[str, str]:
    """
    Async version of generate_code_example_summary using unified LLM provider service.
    """
    from ..llm_provider_service import get_llm_client
    # Get model choice from credential service (RAG setting)
    model_choice = _get_model_choice()
@ -536,47 +550,13 @@ Format your response as JSON:
 """
    try:
-        # Get LLM client using fallback
+        # Use unified LLM provider service
-        try:
+        async with get_llm_client(provider=provider) as client:
-            import os
+            search_logger.info(
-
+                f"Generating summary for {hash(code) & 0xffffff:06x} using model: {model_choice}"
            import openai
            api_key = os.getenv("OPENAI_API_KEY")
            if not api_key:
                # Try to get from credential service with direct fallback
                from ..credential_service import credential_service
                if (
                    credential_service._cache_initialized
                    and "OPENAI_API_KEY" in credential_service._cache
                ):
                    cached_key = credential_service._cache["OPENAI_API_KEY"]
                    if isinstance(cached_key, dict) and cached_key.get("is_encrypted"):
                        api_key = credential_service._decrypt_value(cached_key["encrypted_value"])
                    else:
                        api_key = cached_key
                else:
                    api_key = os.getenv("OPENAI_API_KEY", "")
            if not api_key:
                raise ValueError("No OpenAI API key available")
            client = openai.OpenAI(api_key=api_key)
        except Exception as e:
            search_logger.error(
                f"Failed to create LLM client fallback: {e} - returning default values"
            )
            return {
                "example_name": f"Code Example{f' ({language})' if language else ''}",
                "summary": "Code example for demonstration purposes.",
            }
        search_logger.debug(
            f"Calling OpenAI API with model: {model_choice}, language: {language}, code length: {len(code)}"
            )
-        response = client.chat.completions.create(
+            response = await client.chat.completions.create(
                model=model_choice,
                messages=[
                    {
@ -586,16 +566,18 @@ Format your response as JSON:
                    {"role": "user", "content": prompt},
                ],
                response_format={"type": "json_object"},
                max_tokens=500,
                temperature=0.3,
            )
            response_content = response.choices[0].message.content.strip()
-        search_logger.debug(f"OpenAI API response: {repr(response_content[:200])}...")
+            search_logger.debug(f"LLM API response: {repr(response_content[:200])}...")
            result = json.loads(response_content)
            # Validate the response has the required fields
            if not result.get("example_name") or not result.get("summary"):
-            search_logger.warning(f"Incomplete response from OpenAI: {result}")
+                search_logger.warning(f"Incomplete response from LLM: {result}")
            final_result = {
                "example_name": result.get(
@ -611,14 +593,14 @@ Format your response as JSON:
    except json.JSONDecodeError as e:
        search_logger.error(
-            f"Failed to parse JSON response from OpenAI: {e}, Response: {repr(response_content) if 'response_content' in locals() else 'No response'}"
+            f"Failed to parse JSON response from LLM: {e}, Response: {repr(response_content) if 'response_content' in locals() else 'No response'}"
        )
        return {
            "example_name": f"Code Example{f' ({language})' if language else ''}",
            "summary": "Code example for demonstration purposes.",
        }
    except Exception as e:
-        search_logger.error(f"Error generating code example summary: {e}, Model: {model_choice}")
+        search_logger.error(f"Error generating code summary using unified LLM provider: {e}")
        return {
            "example_name": f"Code Example{f' ({language})' if language else ''}",
            "summary": "Code example for demonstration purposes.",
@ -867,6 +849,30 @@ async def add_code_examples_to_supabase(
        valid_embeddings = result.embeddings
        successful_texts = result.texts_processed
        # Get model information for tracking
        from ..llm_provider_service import get_embedding_model
        from ..credential_service import credential_service
        # Get embedding model name
        embedding_model_name = await get_embedding_model(provider=provider)
        # Get LLM chat model (used for code summaries and contextual embeddings if enabled)
        llm_chat_model = None
        try:
            # First check if contextual embeddings were used
            if use_contextual_embeddings:
                provider_config = await credential_service.get_active_provider("llm")
                llm_chat_model = provider_config.get("chat_model", "")
                if not llm_chat_model:
                    # Fallback to MODEL_CHOICE
                    llm_chat_model = await credential_service.get_credential("MODEL_CHOICE", "gpt-4o-mini")
            else:
                # For code summaries, we use MODEL_CHOICE
                llm_chat_model = _get_model_choice()
        except Exception as e:
            search_logger.warning(f"Failed to get LLM chat model: {e}")
            llm_chat_model = "gpt-4o-mini"  # Default fallback
        if not valid_embeddings:
            search_logger.warning("Skipping batch - no successful embeddings created")
            continue
@ -899,6 +905,23 @@ async def add_code_examples_to_supabase(
                parsed_url = urlparse(urls[idx])
                source_id = parsed_url.netloc or parsed_url.path
            # Determine the correct embedding column based on dimension
            embedding_dim = len(embedding) if isinstance(embedding, list) else len(embedding.tolist())
            embedding_column = None
            if embedding_dim == 768:
                embedding_column = "embedding_768"
            elif embedding_dim == 1024:
                embedding_column = "embedding_1024"
            elif embedding_dim == 1536:
                embedding_column = "embedding_1536"
            elif embedding_dim == 3072:
                embedding_column = "embedding_3072"
            else:
                # Default to closest supported dimension
                search_logger.warning(f"Unsupported embedding dimension {embedding_dim}, using embedding_1536")
                embedding_column = "embedding_1536"
            batch_data.append({
                "url": urls[idx],
                "chunk_number": chunk_numbers[idx],
@ -906,7 +929,10 @@ async def add_code_examples_to_supabase(
                "summary": summaries[idx],
                "metadata": metadatas[idx],  # Store as JSON object, not string
                "source_id": source_id,
-                "embedding": embedding,
+                embedding_column: embedding,
                "llm_chat_model": llm_chat_model,  # Add LLM model tracking
                "embedding_model": embedding_model_name,  # Add embedding model tracking
                "embedding_dimension": embedding_dim,  # Add dimension tracking
            })
        if not batch_data:
--- a/python/src/server/services/storage/document_storage_service.py
+++ b/python/src/server/services/storage/document_storage_service.py
@ -9,7 +9,6 @@ import os
 from typing import Any
 from ...config.logfire_config import safe_span, search_logger
 from ..credential_service import credential_service
 from ..embeddings.contextual_embedding_service import generate_contextual_embeddings_batch
 from ..embeddings.embedding_service import create_embeddings_batch
@ -59,7 +58,9 @@ async def add_documents_to_supabase(
        # Load settings from database
        try:
-            rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
+            # Defensive import to handle any initialization issues
            from ..credential_service import credential_service as cred_service
            rag_settings = await cred_service.get_credentials_by_category("rag_strategy")
            if batch_size is None:
                batch_size = int(rag_settings.get("DOCUMENT_STORAGE_BATCH_SIZE", "50"))
            # Clamp batch sizes to sane minimums to prevent crashes
@ -327,6 +328,26 @@ async def add_documents_to_supabase(
            batch_embeddings = result.embeddings
            successful_texts = result.texts_processed
            # Get model information for tracking
            from ..llm_provider_service import get_embedding_model
            from ..credential_service import credential_service
            # Get embedding model name
            embedding_model_name = await get_embedding_model(provider=provider)
            # Get LLM chat model (used for contextual embeddings if enabled)
            llm_chat_model = None
            if use_contextual_embeddings:
                try:
                    provider_config = await credential_service.get_active_provider("llm")
                    llm_chat_model = provider_config.get("chat_model", "")
                    if not llm_chat_model:
                        # Fallback to MODEL_CHOICE or provider defaults
                        llm_chat_model = await credential_service.get_credential("MODEL_CHOICE", "gpt-4o-mini")
                except Exception as e:
                    search_logger.warning(f"Failed to get LLM chat model: {e}")
                    llm_chat_model = "gpt-4o-mini"  # Default fallback
            if not batch_embeddings:
                search_logger.warning(
                    f"Skipping batch {batch_num} - no successful embeddings created"
@ -361,13 +382,33 @@ async def add_documents_to_supabase(
                    )
                    continue
                # Determine the correct embedding column based on dimension
                embedding_dim = len(embedding) if isinstance(embedding, list) else len(embedding.tolist())
                embedding_column = None
                if embedding_dim == 768:
                    embedding_column = "embedding_768"
                elif embedding_dim == 1024:
                    embedding_column = "embedding_1024"
                elif embedding_dim == 1536:
                    embedding_column = "embedding_1536"
                elif embedding_dim == 3072:
                    embedding_column = "embedding_3072"
                else:
                    # Default to closest supported dimension
                    search_logger.warning(f"Unsupported embedding dimension {embedding_dim}, using embedding_1536")
                    embedding_column = "embedding_1536"
                data = {
                    "url": batch_urls[j],
                    "chunk_number": batch_chunk_numbers[j],
                    "content": text,  # Use the successful text
                    "metadata": {"chunk_size": len(text), **batch_metadatas[j]},
                    "source_id": source_id,
-                    "embedding": embedding,  # Use the successful embedding
+                    embedding_column: embedding,  # Use the successful embedding with correct column
                    "llm_chat_model": llm_chat_model,  # Add LLM model tracking
                    "embedding_model": embedding_model_name,  # Add embedding model tracking
                    "embedding_dimension": embedding_dim,  # Add dimension tracking
                }
                batch_data.append(data)
--- a/python/tests/test_async_llm_provider_service.py
+++ b/python/tests/test_async_llm_provider_service.py
@ -205,8 +205,8 @@ class TestAsyncLLMProviderService:
                mock_credential_service.get_active_provider.assert_called_once_with("embedding")
    @pytest.mark.asyncio
-    async def test_get_llm_client_missing_openai_key(self, mock_credential_service):
+    async def test_get_llm_client_missing_openai_key_with_ollama_fallback(self, mock_credential_service):
-        """Test error handling when OpenAI API key is missing"""
+        """Test successful fallback to Ollama when OpenAI API key is missing"""
        config_without_key = {
            "provider": "openai",
            "api_key": None,
@ -215,11 +215,49 @@ class TestAsyncLLMProviderService:
            "embedding_model": "text-embedding-3-small",
        }
        mock_credential_service.get_active_provider.return_value = config_without_key
        mock_credential_service.get_credentials_by_category = AsyncMock(return_value={
            "LLM_BASE_URL": "http://localhost:11434"
        })
        with patch(
            "src.server.services.llm_provider_service.credential_service", mock_credential_service
        ):
-            with pytest.raises(ValueError, match="OpenAI API key not found"):
+            with patch(
                "src.server.services.llm_provider_service.openai.AsyncOpenAI"
            ) as mock_openai:
                mock_client = MagicMock()
                mock_openai.return_value = mock_client
                # Should fallback to Ollama instead of raising an error
                async with get_llm_client() as client:
                    assert client == mock_client
                    # Verify it created an Ollama client with correct params
                    mock_openai.assert_called_once_with(
                        api_key="ollama",
                        base_url="http://localhost:11434/v1"
                    )
    @pytest.mark.asyncio
    async def test_get_llm_client_missing_openai_key(self, mock_credential_service):
        """Test error when OpenAI API key is missing and Ollama fallback fails"""
        config_without_key = {
            "provider": "openai",
            "api_key": None,
            "base_url": None,
            "chat_model": "gpt-4",
            "embedding_model": "text-embedding-3-small",
        }
        mock_credential_service.get_active_provider.return_value = config_without_key
        # Mock get_credentials_by_category to raise an exception, simulating Ollama fallback failure
        mock_credential_service.get_credentials_by_category = AsyncMock(side_effect=Exception("Database error"))
        # Mock openai.AsyncOpenAI to fail when creating Ollama client with fallback URL
        with patch(
            "src.server.services.llm_provider_service.credential_service", mock_credential_service
        ), patch("src.server.services.llm_provider_service.openai.AsyncOpenAI") as mock_openai:
            mock_openai.side_effect = Exception("Connection failed")
            with pytest.raises(ValueError, match="OpenAI API key not found and Ollama fallback failed"):
                async with get_llm_client():
                    pass