Local RAG with Ollama: How to Build a Private AI Chatbot for Your Files

Chat with Your Files: Building a 100% Local RAG System with Ollama

Large Language Models (LLMs) are incredibly powerful, but they suffer from two major limitations: they have no knowledge of anything that happened after their training cutoff, and they know absolutely nothing about your private files (such as your company's wiki, codebases, or personal PDFs). Sending this data to hosted APIs such as OpenAI's can be a security and privacy nightmare.

The solution? **Retrieval-Augmented Generation (RAG)**. And the best part is that you can build one that runs **100% locally** on your own computer, ensuring your data never leaves your hard drive. In this guide, we'll build a private RAG pipeline using **Ollama** and a few lines of Python.

🧠 What is RAG?

RAG splits the work of answering a question about your documents into three distinct phases:

  1. Indexing: Your documents are split into small pieces of text (chunks), and each chunk is converted into a numerical vector (an embedding) that captures its semantic meaning. These vectors are stored in a vector database.
  2. Retrieval: When you ask a question, the system searches the database to find the chunks that are most semantically similar to your question.
  3. Generation: The system feeds your question *and* those retrieved text chunks into the local LLM, instructing it to answer your question using *only* that context.
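To make the three phases concrete, here is a dependency-free toy sketch. It is an illustration under loud assumptions, not how Ollama or LangChain work internally: the hypothetical `embed` function uses plain word counts in place of a real embedding model such as `nomic-embed-text`, and the prompt wording in the hypothetical `build_prompt` helper is made up for this example.

```python
import math
import re
from collections import Counter


# Hypothetical toy "embedding": a bag-of-words count vector.
# A real RAG system would call a neural embedding model instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Phase 1, indexing: store each chunk next to its vector.
def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    return [(chunk, embed(chunk)) for chunk in chunks]


# Phase 2, retrieval: rank chunks by similarity to the question, keep the top k.
def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Phase 3, generation: assemble the prompt that would be sent to the local LLM.
def build_prompt(question: str, context: list[str]) -> str:
    context_block = "\n\n".join(context)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    notes = [
        "Milestone one: beta release in March.",
        "The coffee machine was replaced last week.",
        "Milestone two: public launch in June.",
    ]
    index = build_index(notes)
    question = "When is each milestone?"
    print(build_prompt(question, retrieve(index, question, k=2)))
```

The two milestone notes share the word "milestone" with the question, so they outrank the unrelated coffee note and are the only chunks that reach the prompt. A real embedding model does the same ranking on meaning rather than exact word overlap.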

🛠️ Setting Up Your Local Environment

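First, make sure you have **Ollama** installed and running on your system (the desktop app starts the server automatically; otherwise run `ollama serve`). Then run the following commands in your terminal to download a compact general-purpose LLM and a dedicated embedding model: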


```bash
# Download the Llama 3 model for text generation
ollama pull llama3

# Download the nomic-embed-text model for creating vectors
ollama pull nomic-embed-text
```
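Before running any RAG code, it can help to confirm that the Ollama server is reachable and that both models were pulled. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and queries its `GET /api/tags` endpoint, which lists locally available models; the helper names (`installed_models`, `missing_models`, `fetch_tags`) are hypothetical. Running `ollama list` in a terminal shows the same information by hand.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address
REQUIRED = ["llama3", "nomic-embed-text"]


def installed_models(tags_response: dict) -> set[str]:
    """Collect model names from an /api/tags response, with and without the tag suffix."""
    names = set()
    for model in tags_response.get("models", []):
        full = model.get("name", "")      # e.g. "llama3:latest"
        if not full:
            continue
        names.add(full)
        names.add(full.split(":", 1)[0])  # e.g. "llama3"
    return names


def missing_models(required: list[str], tags_response: dict) -> list[str]:
    """Return the required model names that are not installed locally."""
    available = installed_models(tags_response)
    return [name for name in required if name not in available]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of local models from the running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        missing = missing_models(REQUIRED, fetch_tags())
    except (OSError, ValueError) as exc:  # server not running, timeout, or bad response
        print(f"Could not query Ollama at {OLLAMA_URL}: {exc}")
    else:
        if missing:
            print("Missing models, run: ollama pull " + " ".join(missing))
        else:
            print("All required models are available.")
```

Matching on the name without its `:tag` suffix means a model pulled as `llama3` (stored as `llama3:latest`) counts as present.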

Next, install LangChain, its community document loaders, and the Ollama and Chroma integration packages (`langchain-chroma` installs `chromadb` as a dependency). The `langchain<1.0` pin keeps the classic `RetrievalQA` chain importable from `langchain.chains`; LangChain 1.0 moved it to the separate `langchain-classic` package.

```bash
pip install "langchain<1.0" langchain-community langchain-ollama langchain-chroma
```

🐍 The Python RAG Script

Here is a complete, lightweight Python script that reads a local text file, indexes it, and lets you query it using Ollama:

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain.chains import RetrievalQA  # requires langchain<1.0 (moved to langchain-classic in 1.0)

# 1. Load your private document (TextLoader stores the file path in metadata["source"])
loader = TextLoader("my_private_notes.txt", encoding="utf-8")
docs = loader.load()

# 2. Split the document into chunks of up to 500 characters, overlapping by 50 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

# 3. Embed each chunk locally and store the vectors in Chroma.
#    Without persist_directory the index lives in memory and is rebuilt on every run;
#    pass persist_directory="./chroma_db" to keep it on disk.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = Chroma.from_documents(chunks, embeddings)

# 4. Set up the local LLM (Llama 3), served by the running Ollama instance
llm = OllamaLLM(model="llama3")

# 5. Create the QA chain: retrieve the 4 most similar chunks and "stuff" them into the prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)

# 6. Query your local files privately!
query = "What were the main project milestones mentioned in my notes?"
response = qa_chain.invoke({"query": query})

print("Answer:", response["result"])
print("\nSources used:")
for doc in response["source_documents"]:
    print(f"- {doc.metadata['source']} (Content preview: {doc.page_content[:60]}...)")
```
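The `chunk_size=500, chunk_overlap=50` settings in step 2 are measured in characters by default. The simplified splitter below (a hypothetical `split_fixed` helper, not LangChain's implementation) shows what those two numbers mean: each window holds at most 500 characters and starts 450 characters after the previous one, so neighbouring chunks share 50 characters and text cut at a boundary has a better chance of appearing intact in one of them. `RecursiveCharacterTextSplitter` is smarter than this sketch: it first tries to break on paragraph breaks, then line breaks, then spaces, and only falls back to raw characters when it must.

```python
def split_fixed(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    fixed-size splitter for illustration; it ignores word and paragraph boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    sample = "word " * 300  # 1,500 characters of filler text
    pieces = split_fixed(sample)
    print(len(pieces), [len(p) for p in pieces])
```

Larger chunks give the LLM more context per retrieved hit but make each embedding less specific; the overlap trades a little duplicated storage for fewer facts split across two chunks.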

🔒 Why Going Local Matters

Building a local RAG system gives you full ownership over your AI workflow. There are no API keys to manage, no monthly subscription fees, and no risk of sensitive financial data, medical records, or proprietary source code being sent to a third-party provider or ending up in its training sets. Your documents and questions stay on your machine; the trade-off is that indexing and generation consume your own GPU/CPU cycles, so speed depends on your hardware.

Are you ready to migrate your AI assistants to local infrastructure? What private documents are you planning to index first? Let’s start a conversation in the comments below!
