OllamaChat interface showing a conversation with RAG citations, memory badge, and dark-themed sidebar

OllamaChat

Self Hosted AI Chat Platform

A self-hosted ChatGPT alternative that runs entirely on your own machine using Ollama. It can search your documents to answer questions, remember things across conversations, automatically switch to the right model for coding or vision tasks, use tools to search the web, and optionally speak and listen, all without sending anything to the cloud.

MKV
Role
Full-Stack Engineer
Duration
Ongoing
Team
Solo Project
Year
2026

Project Details

I built this project to truly understand how AI tools work under the hood, not just use them as a black box. What started as a simple Ollama playground grew into a fully-featured local AI platform. It searches your own documents to give grounded answers with confidence scoring (RAG), remembers useful things you've told it across sessions, detects when you're asking a coding question or sending an image and routes to the right model, runs an agentic tool-use loop for web search and URL fetching, supports extended reasoning with think blocks, and offers optional voice input and output via a locally-hosted speech service. Everything runs on your own hardware: no subscriptions, no data leaving your machine.

Results & Impact

Document search directly

Into SQLite using native vector embeddings with no external vector database needed

A multi-stage message pipeline (system prompt → memory → document context → history)

With live streaming

A memory system

That automatically captures, ranks, and recalls preferences and facts across conversations

Smart model routing

That detects coding intent, image attachments, and vision capability to switch models transparently

An agentic tool-use loop

With web search and URL fetching across up to five rounds per turn

Grounding confidence scoring (high/medium/low) on RAG answers

With source citations

Extended reasoning

With streaming think-block detection and buffering

Optional voice I/O

Using locally-hosted Whisper (STT) and Kokoro (TTS) with intelligent sentence splitting

Document ingestion

For Markdown, PDFs, code files, and live web URLs with language-aware chunking

Per-conversation toggles

For RAG, memory, agent mode, and custom system prompts

Challenge to Solution

What had to be solved

Build a self-hosted AI chat app that matches cloud tools in capability while running completely locally.

The hard parts: implementing document search without a third-party vector database, building a memory system that captures useful context without flooding every message with noise, detecting coding and vision intent to route to the right model, adding an agentic tool-use loop, supporting extended model reasoning, and adding voice I/O, all in a clean, fast UI.

How it came together

Document search is powered by SQLite with native vector support (libSQL), so there's no need for an external service like Pinecone.

Every message goes through a multi-stage pipeline: system instructions, then relevant memories ranked by relevance and recency, then matching document excerpts with grounding confidence scores, then conversation history, all streamed live. An agentic loop lets the model call tools like web search and URL fetching across up to five rounds before composing a final answer. Model routing auto-detects coding patterns, image attachments, and vision capability to transparently switch models mid-conversation. The memory system auto-extracts facts per turn, scores them by relevance, recency, and frequency, and supports superseding outdated memories. Voice runs through a Docker sidecar using Whisper for speech-to-text and Kokoro for text-to-speech, with intelligent sentence splitting for natural speech pacing.

Product in Use

Chat interface showing a live conversation with streaming response and sidebar
Real-time streaming chat with conversation history, model selector, and RAG toggle
Memory page showing auto-captured facts and preferences across conversations
Memory manager with auto-captured items, search, filter, and usage tracking
Settings page showing model configuration, voice options, pipeline parameters, and watched folders
Configurable model settings, voice I/O, pipeline parameters, and file watcher paths
Clean overview of the OllamaChat interface without an active conversation
Full application overview with sidebar, model selector, and empty chat state
Knowledge base page showing uploaded documents with indexing status
Document management with drag-and-drop upload, URL ingestion, and chunk browsing

Key Features

Chat with any locally installed Ollama model

With real-time streaming responses

Automatically switches

To a dedicated coding model when it detects coding questions

Auto-routes image attachments

To vision-capable models with drag-and-drop and clipboard paste support

Upload documents

PDFs, code files, or web URLs to a searchable knowledge base

Searches your documents

And injects relevant excerpts into every answer, with confidence scoring and source citations

Remembers preferences

And facts across conversations with automatic extraction, relevance ranking, and memory superseding

Agentic tool-use loop

That can search the web and fetch URLs across multiple rounds before answering

Supports extended reasoning

With think-block streaming for compatible models

Push-to-talk voice input

And spoken responses via locally-hosted speech models with natural sentence pacing

Per-conversation toggles

For RAG, memory, agent mode, and custom system prompts

Persistent conversation history

With automatically generated titles

Watch a folder

And automatically index new files as they are added