
Building a Local AI Playground with Ollama and Next.js

February 9, 2026 · 10 min read
Next.js · React · Ollama · AI Integration

Running large language models locally is no longer a niche hobby. With Ollama, you can pull and run open-source models like Llama 3, Mistral, and Gemma on your own machine with a single command. No API keys, no cloud costs, no data leaving your network.

In this post, we will build a local AI playground using Ollama and Next.js, complete with streaming responses and a clean chat interface.

What is Ollama?

Ollama is a runtime for open-source language models. It handles model management (downloading, versioning, configuration) and exposes a local REST API at http://localhost:11434. Think of it as Docker for LLMs.

Key things to know:

  • Models run entirely on your machine (CPU or GPU)
  • No API keys required for local usage
  • It also exposes an OpenAI-compatible chat endpoint (/v1/chat/completions) alongside its native API
  • It ships with a CLI for model management (ollama pull, ollama run, ollama list)

Prerequisites

Install Ollama from ollama.com and pull a model:

# Install Ollama (macOS)
brew install ollama
 
# Start the Ollama server
ollama serve
 
# In another terminal, pull a model
ollama pull llama3.1

Verify the server is running:

curl http://localhost:11434/api/tags

You should see a JSON response listing your downloaded models.
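The payload contains a models array with a name field per entry. If you want to consume it programmatically, a small helper can pull out the names (modelNames is illustrative, not part of the app we build below):

```typescript
// Shape of the relevant part of the /api/tags response
interface TagsResponse {
  models: { name: string }[];
}

// Extract just the model names from the tags payload
function modelNames(tags: TagsResponse): string[] {
  return tags.models.map((m) => m.name);
}

// Sample shaped like a real /api/tags payload
const sample: TagsResponse = { models: [{ name: 'llama3.1:latest' }] };
console.log(modelNames(sample)); // ['llama3.1:latest']
```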

Project Setup

Scaffold a Next.js project and install the official Ollama client:

npx create-next-app@latest ai-playground --typescript --app
cd ai-playground
npm install ollama

The Server Side: Route Handler

Ollama runs locally, but the client that talks to it should live on the server. This keeps the architecture clean and avoids CORS issues from the browser. We will use a Next.js Route Handler that streams tokens back to the client.

// app/api/chat/route.ts
import { Ollama } from 'ollama';
import { NextRequest } from 'next/server';
 
const ollama = new Ollama({ host: 'http://127.0.0.1:11434' });
 
export async function POST(request: NextRequest) {
  const { messages, model } = await request.json();
 
  const response = await ollama.chat({
    model: model ?? 'llama3.1',
    messages,
    stream: true,
  });
 
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const data = JSON.stringify({
          content: chunk.message.content,
          done: chunk.done,
        });
        controller.enqueue(encoder.encode(`data: ${data}\n\n`));
      }
      controller.close();
    },
  });
 
  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}

A few things to note:

  • The Ollama client is instantiated on the server, pointing at the local Ollama daemon
  • We use stream: true to get an AsyncGenerator of token chunks
  • The response is formatted as Server-Sent Events (SSE) so the client can consume tokens incrementally
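On the wire, each token arrives as a data: {...} line followed by a blank line. A minimal parser for one complete chunk looks like this (a sketch; real client code also has to handle lines split across network reads, which the hook below does with a buffer):

```typescript
// One streamed token as emitted by the route handler above
interface TokenEvent {
  content: string;
  done: boolean;
}

// Parse a complete SSE chunk ("data: {...}\n\n" lines) into token events
function parseSseChunk(chunk: string): TokenEvent[] {
  return chunk
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => JSON.parse(line.slice(6)) as TokenEvent);
}

const events = parseSseChunk(
  'data: {"content":"Hel","done":false}\n\ndata: {"content":"lo","done":true}\n\n'
);
console.log(events.map((e) => e.content).join('')); // 'Hello'
```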

The Client Side: Chat Interface

Now let's build a hook that consumes the SSE stream and a component that renders the chat.

Stream Hook

// hooks/useChat.ts
'use client';
 
import { useState, useCallback, useRef } from 'react';
 
interface Message {
  role: 'user' | 'assistant';
  content: string;
}
 
export function useChat(model: string = 'llama3.1') {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortRef = useRef<AbortController | null>(null);
 
  const send = useCallback(
    async (input: string) => {
      const userMessage: Message = { role: 'user', content: input };
      const updatedMessages = [...messages, userMessage];
 
      setMessages([...updatedMessages, { role: 'assistant', content: '' }]);
      setIsStreaming(true);
 
      abortRef.current = new AbortController();
 
      try {
        const res = await fetch('/api/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ messages: updatedMessages, model }),
          signal: abortRef.current.signal,
        });
 
        if (!res.ok || !res.body) {
          throw new Error(`Request failed: ${res.status}`);
        }
 
        const reader = res.body.getReader();
        const decoder = new TextDecoder();
        let assistantContent = '';
        let buffer = '';

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          // Buffer partial chunks: an SSE line can be split across reads,
          // so only parse lines once we've seen their trailing newline
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split('\n');
          buffer = lines.pop() ?? '';

          for (const line of lines.filter((l) => l.startsWith('data: '))) {
            const json = JSON.parse(line.slice(6));
            assistantContent += json.content;

            setMessages([
              ...updatedMessages,
              { role: 'assistant', content: assistantContent },
            ]);
          }
        }
      } catch (error) {
        if (error instanceof DOMException && error.name === 'AbortError') return;
        console.error('Chat error:', error);
      } finally {
        setIsStreaming(false);
        abortRef.current = null;
      }
    },
    [messages, model]
  );
 
  const stop = useCallback(() => {
    abortRef.current?.abort();
  }, []);
 
  return { messages, isStreaming, send, stop };
}

This hook:

  • Manages the full conversation history
  • Streams tokens into the latest assistant message as they arrive
  • Exposes an AbortController via stop() so the user can cancel generation
  • Handles errors without crashing the UI
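The token-accumulation step can be viewed as a pure update: append each incoming token to the trailing assistant message. Here is a standalone sketch of that logic (appendToken is illustrative, not code from the hook):

```typescript
// Minimal message shape matching the hook's Message interface
interface Msg {
  role: 'user' | 'assistant';
  content: string;
}

// Append one streamed token to the trailing assistant message,
// starting a new assistant message if there isn't one yet
function appendToken(messages: Msg[], token: string): Msg[] {
  const last = messages[messages.length - 1];
  if (!last || last.role !== 'assistant') {
    return [...messages, { role: 'assistant', content: token }];
  }
  return [
    ...messages.slice(0, -1),
    { role: 'assistant', content: last.content + token },
  ];
}

let history: Msg[] = [{ role: 'user', content: 'Hi' }];
history = appendToken(history, 'He');
history = appendToken(history, 'llo');
console.log(history[1].content); // 'Hello'
```

Keeping the update immutable is what lets React re-render on every token.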

Chat Component

// app/page.tsx
'use client';
 
import { useState, type FormEvent } from 'react';
import { useChat } from '../hooks/useChat';
 
export default function Playground() {
  const [input, setInput] = useState('');
  const { messages, isStreaming, send, stop } = useChat();
 
  const handleSubmit = (e: FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isStreaming) return;
    send(input);
    setInput('');
  };
 
  return (
    <main>
      <h1>Local AI Playground</h1>
      <div>
        {messages.map((msg, i) => (
          <div key={i}>
            <strong>{msg.role}:</strong> {msg.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something..."
          disabled={isStreaming}
        />
        {isStreaming ? (
          <button type="button" onClick={stop}>Stop</button>
        ) : (
          <button type="submit">Send</button>
        )}
      </form>
    </main>
  );
}

Running It

Start both Ollama and the dev server:

# Terminal 1
ollama serve
 
# Terminal 2
npm run dev

Open http://localhost:3000, type a prompt, and you should see tokens streaming in from your local model.

Where to Go From Here

This is a minimal playground. Some natural next steps:

  • Model switching - call ollama.list() in a server action and let users pick a model from a dropdown
  • System prompts - prepend a { role: 'system', content: '...' } message to steer model behavior
  • Conversation persistence - save chat history to localStorage or a database
  • Embeddings and RAG - use ollama.embed() to build retrieval-augmented generation over your own documents
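System prompts in particular need nothing more than prepending a message before the POST. A sketch (withSystemPrompt is a hypothetical helper, not part of the code above):

```typescript
// Roles accepted by Ollama's chat endpoint
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Prepend a system prompt without mutating the existing history
function withSystemPrompt(
  history: ChatMessage[],
  prompt: string
): ChatMessage[] {
  return [{ role: 'system', content: prompt }, ...history];
}

const steered = withSystemPrompt(
  [{ role: 'user', content: 'Explain closures' }],
  'You are a concise TypeScript tutor.'
);
console.log(steered[0].role); // 'system'
```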

I built OllamaChat as a full implementation of these ideas. It is a self-hosted ChatGPT-style interface powered by Ollama, built with Next.js, React 19, Tailwind CSS v4, and Prisma with SQLite for conversation persistence. It also includes smart model routing that automatically picks the best model for each query, sending coding questions to code-specialized models and simple tasks to faster ones. If you want to skip the boilerplate and jump straight to a working local AI chat app, check it out.

The point of running models locally is control. No rate limits, no usage costs, no data leaving your machine. For prototyping, internal tools, and experimentation, it is hard to beat.

For more on how I incorporate cutting-edge technologies into my projects, feel free to explore my projects. If you're interested in leveraging these capabilities for your own applications, check out what I offer.
