Building an Offline Searchable Vault

Theory is useful. Tools are nice. But at some point you need to build something.

This chapter is a step-by-step guide to constructing a fully offline, searchable personal knowledge base. By the end, you will have a system that stores your notes as plain markdown files, indexes them for instant full-text search using SQLite FTS5, provides a command-line search interface, and optionally serves a simple web UI for browsing your vault — all running entirely on your own hardware, with no cloud dependency, no subscription, and no telemetry phoning home to report your reading habits.

We will use Obsidian as the note-taking frontend, but the search infrastructure we build is tool-agnostic. It works with any collection of markdown files, regardless of which editor created them.

Setting Up Your Obsidian Vault

If you have not already created an Obsidian vault, now is the time. A vault in Obsidian terms is simply a folder on your filesystem that contains markdown files. That is all it is. Obsidian adds a .obsidian configuration folder for its own settings, but your notes are just .md files in directories.

Folder Structure

Folder structure is a religious topic in PKM circles. Some people advocate a flat structure with no folders at all, relying entirely on links and tags for organization. Others build deep hierarchies that would make a librarian weep with joy. The right answer, as usual, is somewhere in the middle.

Here is a structure that balances organization with simplicity:

vault/
├── 00-inbox/          # New captures, unprocessed notes
├── 01-projects/       # Active project notes
├── 02-areas/          # Ongoing areas of responsibility
├── 03-resources/      # Reference material by topic
├── 04-archive/        # Completed/inactive material
├── 05-templates/      # Note templates
├── assets/            # Images, PDFs, attachments
└── daily/             # Daily notes (if you use them)

The numbered prefixes keep folders in a consistent order in file managers. The inbox is critical — it is where everything lands before you decide where it belongs. The PARA-inspired breakdown (Projects, Areas, Resources, Archive) provides actionability-based organization without excessive depth.
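
If you would rather script the layout than create it by hand, the tree above can be built in a few lines. This is a minimal sketch: the function name create_vault_skeleton is hypothetical, and the folder list simply mirrors the structure shown above.

```python
from pathlib import Path

# Folder names taken from the layout above
FOLDERS = [
    "00-inbox", "01-projects", "02-areas", "03-resources",
    "04-archive", "05-templates", "assets", "daily",
]

def create_vault_skeleton(root: str) -> list:
    """Create any missing vault folders and return the names that were created."""
    base = Path(root).expanduser()
    created = []
    for name in FOLDERS:
        folder = base / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(name)
    return created
```

Calling create_vault_skeleton("~/vault") is safe to repeat: folders that already exist are left alone, so a second run returns an empty list.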

Note Naming Conventions

Consistency in naming saves you grief later. A few conventions that work well:

  • Use lowercase with hyphens: building-an-offline-vault.md rather than Building An Offline Vault.md. Hyphens survive URL encoding and are easy to type, and all-lowercase names avoid case-sensitivity surprises when a vault moves between operating systems.
  • Prefix date-specific notes with ISO dates: 2026-03-21-meeting-notes.md. This sorts chronologically in any file manager.
  • Keep names descriptive but concise. You should be able to guess a note's content from its filename without opening it.
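
These conventions are easy to automate. The helper below is a sketch, not something Obsidian provides: note_filename is a hypothetical name, and dropping accents and punctuation is one reasonable interpretation of "lowercase with hyphens".

```python
import re
import unicodedata
from datetime import date
from typing import Optional

def note_filename(title: str, on: Optional[date] = None) -> str:
    """Turn a human title into a lowercase, hyphenated .md filename."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Optional ISO date prefix for date-specific notes
    prefix = f"{on.isoformat()}-" if on else ""
    return f"{prefix}{slug}.md"
```

For example, note_filename("Meeting Notes", date(2026, 3, 21)) produces 2026-03-21-meeting-notes.md, matching the date-prefixed convention above.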

Essential Obsidian Settings

A few settings to configure in a fresh vault:

  • Default location for new notes: Set to your inbox folder. Every new note lands there until you explicitly move it.
  • Attachment folder: Point to assets/ so images and files stay organized.
  • Use wikilinks: Enable [[wikilinks]] for internal linking. They are more readable and Obsidian resolves them regardless of folder location.
  • Strict line breaks: Leave this disabled unless you have a specific reason to turn it on. With it off, a single newline in a note renders as a line break, which matches how most people type.

Obsidian has built-in search, and it is decent. But we want something that works independently of Obsidian — a search system we control completely, that we can extend with custom ranking, integrate into scripts, and query from the command line or a web interface.

SQLite's FTS5 (Full-Text Search, version 5) extension is well suited to this. It is a compile-time option that most SQLite builds enable, including the ones Python's sqlite3 module typically links against. It requires no separate server, stores everything in a single database file, and ranks results with BM25 right out of the box.
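
Because FTS5 is a build option, it is worth confirming that your Python can use it before going further. The check below is a small sketch; fts5_available is a hypothetical helper name, and it works by simply trying to create an FTS5 table in a throwaway in-memory database.

```python
import sqlite3

def fts5_available() -> bool:
    """Return True if this Python's SQLite build can create FTS5 tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # Raised with "no such module: fts5" when the extension is missing
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    print("SQLite version:", sqlite3.sqlite_version)
    print("FTS5 available:", fts5_available())
```

If this reports that FTS5 is unavailable, installing a newer Python or a SQLite build with the extension enabled is usually the simplest fix.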

Creating the Search Index

Here is a Python script that walks your vault, reads every markdown file, and indexes it in an FTS5 table:

#!/usr/bin/env python3
"""index_vault.py — Index a markdown vault into SQLite FTS5."""

import sqlite3
import os
import time
from pathlib import Path

VAULT_PATH = os.environ.get("VAULT_PATH", os.path.expanduser("~/vault"))
DB_PATH = os.environ.get("VAULT_DB", os.path.expanduser("~/vault/.search.db"))

def create_database(db_path: str) -> sqlite3.Connection:
    """Create or open the search database with FTS5 table."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")

    conn.executescript("""
        CREATE TABLE IF NOT EXISTS notes (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            path TEXT UNIQUE NOT NULL,
            title TEXT,
            content TEXT,
            modified REAL,
            indexed_at REAL
        );

        CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
            title,
            content,
            path UNINDEXED,
            content='notes',
            content_rowid='id',
            tokenize='porter unicode61 remove_diacritics 2'
        );

        -- Triggers to keep FTS index in sync with notes table
        CREATE TRIGGER IF NOT EXISTS notes_ai AFTER INSERT ON notes BEGIN
            INSERT INTO notes_fts(rowid, title, content, path)
            VALUES (new.id, new.title, new.content, new.path);
        END;

        CREATE TRIGGER IF NOT EXISTS notes_ad AFTER DELETE ON notes BEGIN
            INSERT INTO notes_fts(notes_fts, rowid, title, content, path)
            VALUES('delete', old.id, old.title, old.content, old.path);
        END;

        CREATE TRIGGER IF NOT EXISTS notes_au AFTER UPDATE ON notes BEGIN
            INSERT INTO notes_fts(notes_fts, rowid, title, content, path)
            VALUES('delete', old.id, old.title, old.content, old.path);
            INSERT INTO notes_fts(rowid, title, content, path)
            VALUES (new.id, new.title, new.content, new.path);
        END;
    """)

    return conn

def extract_title(content: str, filepath: Path) -> str:
    """Extract title from first H1 heading, or fall back to filename."""
    for line in content.split('\n'):
        line = line.strip()
        if line.startswith('# '):
            return line[2:].strip()
    return filepath.stem.replace('-', ' ').title()

def index_vault(vault_path: str, conn: sqlite3.Connection) -> dict:
    """Walk the vault and index all markdown files."""
    vault = Path(vault_path)
    stats = {"added": 0, "updated": 0, "skipped": 0, "deleted": 0}

    # Gather all current markdown files
    current_files = set()
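    for md_file in vault.rglob("*.md"):
        rel = md_file.relative_to(vault)
        # Skip hidden directories (like .obsidian). Only the path inside the
        # vault is checked, so a hidden parent of the vault itself is fine.
        if any(part.startswith('.') for part in rel.parts):
            continue
        # Store forward-slash paths so they are stable across platforms
        rel_path = rel.as_posix()
        current_files.add(rel_path)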

        modified = md_file.stat().st_mtime

        # Check if file needs reindexing
        existing = conn.execute(
            "SELECT modified FROM notes WHERE path = ?", (rel_path,)
        ).fetchone()

        if existing and existing[0] >= modified:
            stats["skipped"] += 1
            continue

        content = md_file.read_text(encoding="utf-8", errors="replace")
        title = extract_title(content, md_file)
        now = time.time()

        if existing:
            conn.execute(
                """UPDATE notes
                   SET title=?, content=?, modified=?, indexed_at=?
                   WHERE path=?""",
                (title, content, modified, now, rel_path)
            )
            stats["updated"] += 1
        else:
            conn.execute(
                """INSERT INTO notes (path, title, content, modified, indexed_at)
                   VALUES (?, ?, ?, ?, ?)""",
                (rel_path, title, content, modified, now)
            )
            stats["added"] += 1

    # Remove notes for deleted files
    db_paths = conn.execute("SELECT path FROM notes").fetchall()
    for (db_path,) in db_paths:
        if db_path not in current_files:
            conn.execute("DELETE FROM notes WHERE path = ?", (db_path,))
            stats["deleted"] += 1

    conn.commit()
    return stats

if __name__ == "__main__":
    print(f"Indexing vault: {VAULT_PATH}")
    conn = create_database(DB_PATH)
    stats = index_vault(VAULT_PATH, conn)
    total = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    print(f"Done. Added: {stats['added']}, Updated: {stats['updated']}, "
          f"Skipped: {stats['skipped']}, Deleted: {stats['deleted']}, "
          f"Total: {total}")
    conn.close()

The FTS5 table uses the porter tokenizer for stemming (so searching for "running" also matches "run" and "runs") and unicode61 for proper Unicode handling. The remove_diacritics option ensures that searching for "cafe" matches "café."
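
The effect of the tokenizer settings is easy to see in isolation. The following sketch uses an in-memory table with the same tokenize string as the indexer; the table name demo, the sample sentences, and the matches helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE demo USING fts5("
    "body, tokenize='porter unicode61 remove_diacritics 2')"
)
conn.executemany(
    "INSERT INTO demo(body) VALUES (?)",
    [
        ("She runs every morning",),
        ("A quiet café downtown",),
        ("Notes on bread baking",),
    ],
)

def matches(term: str) -> list:
    """Return the bodies of all rows matching an FTS5 query."""
    rows = conn.execute(
        "SELECT body FROM demo WHERE demo MATCH ? ORDER BY rowid", (term,)
    ).fetchall()
    return [row[0] for row in rows]

# Stemming: "running" and "runs" reduce to the same stem
print(matches("running"))
# Diacritics: the unaccented query still finds "café"
print(matches("cafe"))
```

The query text passes through the same tokenizer as the indexed text, which is why both lookups succeed even though neither term appears verbatim in the notes.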

Incremental Indexing

Notice that the script checks file modification times and skips unchanged files. This makes re-indexing fast: for an unchanged file the only work is a stat call and one indexed lookup, so on a vault with 5,000 notes a re-index after editing a few files typically finishes in well under a second. You can run this indexer on a cron job (every few minutes) or trigger it with a filesystem watcher like fswatch or watchman.

# Add to crontab: reindex every 5 minutes
*/5 * * * * cd /path/to/scripts && python3 index_vault.py >> /tmp/vault-index.log 2>&1

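For real-time indexing, use fswatch. Exclude everything first and then include markdown files: an --include on its own filters nothing, and without a filter every write to .search.db inside the vault would retrigger the indexer.

fswatch -o --exclude='.*' --include='\.md$' ~/vault | \
  xargs -I{} python3 index_vault.py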
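
If fswatch is not installed, a polling loop gets you most of the way there. The sketch below is a fallback under stated assumptions rather than part of the indexer: snapshot and changed_paths are hypothetical helper names, and the polling interval is yours to choose.

```python
from pathlib import Path

def snapshot(vault: Path) -> dict:
    """Map each visible markdown file (relative path) to its mtime."""
    snap = {}
    for md_file in vault.rglob("*.md"):
        rel = md_file.relative_to(vault)
        # Ignore hidden directories such as .obsidian
        if any(part.startswith(".") for part in rel.parts):
            continue
        snap[rel.as_posix()] = md_file.stat().st_mtime
    return snap

def changed_paths(old: dict, new: dict) -> set:
    """Return paths that were added, modified, or removed between snapshots."""
    touched = {path for path, mtime in new.items() if old.get(path) != mtime}
    removed = set(old) - set(new)
    return touched | removed
```

A watcher built on this would take a snapshot every few seconds, compare it with the previous one, and run the indexer whenever changed_paths returns a non-empty set. Polling costs one stat call per note, which is negligible for vaults of a few thousand files.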

Building a CLI Search Tool

With the index in place, searching is straightforward:

#!/usr/bin/env python3
"""search_vault.py — Search your vault from the command line."""

import sqlite3
import sys
import os
import textwrap

DB_PATH = os.environ.get("VAULT_DB", os.path.expanduser("~/vault/.search.db"))

def search(query: str, limit: int = 20) -> list:
    """Search the vault using FTS5 and return ranked results."""
    conn = sqlite3.connect(DB_PATH)

    results = conn.execute("""
        SELECT
            notes.path,
            notes.title,
            snippet(notes_fts, 1, '>>>', '<<<', '...', 64) as snippet,
            rank
        FROM notes_fts
        JOIN notes ON notes.id = notes_fts.rowid
        WHERE notes_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """, (query, limit)).fetchall()

    conn.close()
    return results

def highlight(text: str) -> str:
    """Replace >>> <<< markers with ANSI bold."""
    return text.replace('>>>', '\033[1;33m').replace('<<<', '\033[0m')

def main():
    if len(sys.argv) < 2:
        print("Usage: search_vault.py <query>")
        print("Examples:")
        print('  search_vault.py "knowledge management"')
        print('  search_vault.py "sqlite AND fts5"')
        print('  search_vault.py "embed*"')
        sys.exit(1)

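    query = " ".join(sys.argv[1:])
    try:
        results = search(query)
    except sqlite3.OperationalError as exc:
        # Raised for malformed FTS5 queries (such as an unbalanced quote)
        # or when the index has not been built yet
        print(f"Search failed: {exc}")
        sys.exit(2)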

    if not results:
        print(f"No results for: {query}")
        sys.exit(0)

    print(f"\n{'='*60}")
    print(f" {len(results)} results for: {query}")
    print(f"{'='*60}\n")

    for i, (path, title, snippet, rank) in enumerate(results, 1):
        print(f"  {i}. \033[1m{title}\033[0m")
        print(f"     {path}")
        snippet_clean = highlight(snippet.replace('\n', ' '))
        wrapped = textwrap.fill(snippet_clean, width=72,
                                initial_indent="     ",
                                subsequent_indent="     ")
        print(wrapped)
        print()

if __name__ == "__main__":
    main()
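
The script sorts by rank, which by default is FTS5's BM25 score with every column weighted equally. If you want a hit in the title to count for more than a hit in the body, the built-in bm25() function accepts one weight per column. The sketch below shows the idea on a throwaway table; the sample rows, the weight of 10, and the ranked_titles helper are illustrative assumptions, not tuned values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(title, content)")
conn.executemany(
    "INSERT INTO notes_fts(title, content) VALUES (?, ?)",
    [
        ("SQLite tips",
         "assorted advice on indexes vacuum pragmas and backups for small projects"),
        ("Scratch", "sqlite scratchpad"),
        ("Gardening", "tomatoes need sun"),
        ("Reading list", "books to read"),
        ("Travel", "pack light"),
    ],
)

def ranked_titles(query: str, title_weight: float = 1.0,
                  content_weight: float = 1.0) -> list:
    """Return matching titles, best first, using per-column BM25 weights."""
    rows = conn.execute(
        """SELECT title FROM notes_fts
           WHERE notes_fts MATCH ?
           ORDER BY bm25(notes_fts, ?, ?)""",
        (query, title_weight, content_weight),
    ).fetchall()
    return [row[0] for row in rows]

# Equal weights favor the short note that mentions the term in its body
print(ranked_titles("sqlite"))
# Boosting the title column moves the note titled "SQLite tips" to the top
print(ranked_titles("sqlite", 10.0, 1.0))
```

bm25() returns lower (more negative) values for better matches, which is why a plain ascending ORDER BY puts the best result first. In search_vault.py the same change would mean replacing ORDER BY rank with ORDER BY bm25(notes_fts, 10.0, 1.0).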

FTS5 supports a rich query syntax out of the box:

  • Simple terms: knowledge management — matches notes containing both words.
  • Phrases: "knowledge management" — matches the exact phrase.
  • Boolean operators: sqlite AND fts5, obsidian OR logseq, python NOT javascript.
  • Prefix matching: embed* — matches "embed," "embedding," "embeddings," etc.
  • Column filters: title:zettelkasten — searches only the title field.
  • NEAR queries: NEAR(sqlite fts5, 10) — matches when the terms appear within 10 tokens of each other.

This gives you a search capability that rivals most commercial tools, running entirely on your machine, in a single-file database.
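
To get a feel for the syntax without touching your real index, you can try each query form against a tiny in-memory table. This is a sketch with invented sample notes; titles and quote_phrase are hypothetical helper names. The quote_phrase helper wraps raw input in double quotes so that punctuation such as a hyphen is treated as text rather than as query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(title, content)")
conn.executemany(
    "INSERT INTO notes_fts(title, content) VALUES (?, ?)",
    [
        ("Zettelkasten basics",
         "Personal knowledge management with linked notes"),
        ("Search index",
         "SQLite ships FTS5 for full-text search"),
        ("Embeddings",
         "Embedding models map text to vectors for management of knowledge"),
    ],
)

def titles(query: str) -> list:
    """Run an FTS5 query and return matching titles in insertion order."""
    rows = conn.execute(
        "SELECT title FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rowid",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]

def quote_phrase(raw: str) -> str:
    """Wrap raw user input as a single FTS5 phrase, escaping inner quotes."""
    return '"' + raw.replace('"', '""') + '"'

print(titles("knowledge management"))      # both words, any position
print(titles('"knowledge management"'))    # exact phrase only
print(titles("sqlite AND fts5"))           # boolean operator
print(titles("embed*"))                    # prefix match
print(titles("title:zettelkasten"))        # column filter
print(titles("NEAR(sqlite fts5, 10)"))     # proximity
print(titles(quote_phrase("full-text")))   # punctuation made safe
```

Quoting is a useful default for a search box that takes free-form input: users who want operators can still type them, while everyone else avoids syntax errors.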

Power Searching with ripgrep and fzf

SQLite FTS5 handles structured full-text search beautifully, but sometimes you want raw speed and flexibility. This is where ripgrep and fzf come in — two command-line tools that together provide an interactive search experience that is almost unreasonably fast.

ripgrep (rg)

ripgrep is a line-oriented search tool that recursively searches directories for a regex pattern. It is written in Rust and is fast enough to search tens of thousands of files in milliseconds.

# Basic search
rg "knowledge management" ~/vault

# Case-insensitive search
rg -i "zettelkasten" ~/vault

# Search only markdown files
rg --type md "embedding" ~/vault

# Show context around matches (2 lines before and after)
rg -C 2 "SQLite FTS" ~/vault

# Search for a pattern, excluding certain directories
rg "TODO" ~/vault --glob '!.obsidian/*' --glob '!assets/*'

# Count matches per file
rg -c "import" ~/vault --type md

ripgrep skips hidden files and directories by default, so .obsidian is never searched, and it also respects .gitignore files, so anything you have ignored there (node_modules and other cruft) is skipped as well. For a vault managed with git (as yours should be), this keeps searches fast and focused on your actual content.

fzf (Fuzzy Finder)

fzf is an interactive fuzzy finder that reads lines from stdin and lets you filter them interactively with a fuzzy matching algorithm. Combined with ripgrep, it creates a search experience that is genuinely addictive:

# Interactive file finder in your vault
find ~/vault -name '*.md' | fzf --preview 'head -50 {}'

# Interactive content search with preview
rg --line-number --no-heading --color=always "" ~/vault --glob '*.md' | \
  fzf --ansi --delimiter=: \
      --preview 'bat --color=always --highlight-line {2} {1}' \
      --preview-window '+{2}-10'

The power move is combining these into a shell function:

# Add to your .zshrc or .bashrc
vs() {
    # Vault Search: interactive ripgrep + fzf with preview.
    # --disabled turns off fzf's own fuzzy filter so that rg, re-run on
    # every keystroke, is the only matcher.
    local vault="${VAULT_PATH:-$HOME/vault}"
    local query="${1:-}"

    rg --column --line-number --no-heading --color=always \
       --smart-case "${query}" "$vault" --glob '*.md' |
    fzf --ansi --disabled --query "$query" \
        --delimiter=: \
        --bind 'change:reload:rg --column --line-number --no-heading \
                --color=always --smart-case {q} '"$vault"' --glob "*.md" || true' \
        --preview 'bat --color=always --highlight-line {2} {1} 2>/dev/null || head -50 {1}' \
        --preview-window 'right:60%:+{2}-10' \
        --bind 'enter:become(${EDITOR:-vim} {1} +{2})'
}

Now typing vs in your terminal gives you interactive, real-time search across your entire vault with preview panes and direct-to-editor jumping. It is the kind of thing that makes you wonder why you ever clicked through folder hierarchies.

A Simple Web UI for Browsing Your Vault

Command-line tools are excellent for focused searching, but sometimes you want to browse. Here is a minimal web interface using FastAPI that lets you search and read your notes through a browser:

#!/usr/bin/env python3
"""vault_web.py — A minimal web UI for browsing and searching your vault."""

import sqlite3
import os
from html import escape
from pathlib import Path
from urllib.parse import quote
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
import markdown
import uvicorn

VAULT_PATH = os.environ.get("VAULT_PATH", os.path.expanduser("~/vault"))
DB_PATH = os.environ.get("VAULT_DB", os.path.expanduser("~/vault/.search.db"))

# Control characters used as snippet markers. They are swapped for <mark>
# tags only after the snippet text has been HTML-escaped.
MARK_START, MARK_END = "\x02", "\x03"

app = FastAPI(title="Vault Browser")

STYLES = """
<style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body {
        font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
        max-width: 800px; margin: 0 auto; padding: 20px;
        background: #1a1a2e; color: #e0e0e0; line-height: 1.6;
    }
    h1 { color: #e94560; margin-bottom: 20px; }
    h2 { color: #e94560; margin: 20px 0 10px; }
    a { color: #0f3460; }
    .search-box {
        width: 100%; padding: 12px; font-size: 16px;
        border: 2px solid #333; border-radius: 8px;
        background: #16213e; color: #e0e0e0;
        margin-bottom: 20px;
    }
    .result { padding: 15px; margin: 10px 0; background: #16213e;
              border-radius: 8px; border-left: 3px solid #e94560; }
    .result h3 { margin-bottom: 5px; }
    .result h3 a { color: #e94560; text-decoration: none; }
    .result .path { color: #888; font-size: 0.85em; }
    .result .snippet { margin-top: 8px; color: #ccc; }
    .result .snippet mark { background: #e94560; color: white;
                             padding: 1px 3px; border-radius: 2px; }
    .note-content { background: #16213e; padding: 20px; border-radius: 8px; }
    .note-content h1, .note-content h2, .note-content h3 { color: #e94560; }
    .note-content code { background: #0f3460; padding: 2px 6px;
                          border-radius: 3px; }
    .note-content pre { background: #0f3460; padding: 15px;
                         border-radius: 8px; overflow-x: auto; }
    .note-content a { color: #4db8ff; }
    .back-link { display: inline-block; margin-bottom: 15px; color: #4db8ff;
                  text-decoration: none; }
    .nav { margin-bottom: 20px; }
    .nav a { color: #4db8ff; text-decoration: none; margin-right: 15px; }
</style>
"""

def get_db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

def result_card(path: str, title: str, snippet_html: str = "") -> str:
    """Render one result, escaping everything that came from a note."""
    snippet_div = (f'<div class="snippet">{snippet_html}</div>'
                   if snippet_html else "")
    return f"""<div class="result">
                <h3><a href="/note/{quote(path)}">{escape(title)}</a></h3>
                <div class="path">{escape(path)}</div>
                {snippet_div}
            </div>"""

@app.get("/", response_class=HTMLResponse)
async def home(q: str = ""):
    # escape() keeps a query containing quotes or tags from breaking the page
    html = f"""<!DOCTYPE html><html><head><title>Vault</title>{STYLES}</head><body>
    <h1>Vault Search</h1>
    <form method="get" action="/">
        <input class="search-box" type="text" name="q"
               value="{escape(q)}" placeholder="Search your vault..."
               autofocus>
    </form>"""

    conn = get_db()
    try:
        if q:
            try:
                results = conn.execute("""
                    SELECT notes.path, notes.title,
                           snippet(notes_fts, 1, ?, ?, '...', 48) as snip
                    FROM notes_fts
                    JOIN notes ON notes.id = notes_fts.rowid
                    WHERE notes_fts MATCH ?
                    ORDER BY rank LIMIT 30
                """, (MARK_START, MARK_END, q)).fetchall()
            except sqlite3.OperationalError:
                # FTS5 rejects malformed queries (e.g. an unbalanced quote)
                results = []
                html += "<p>That query could not be parsed.</p>"

            html += f"<p>{len(results)} results</p>"
            for r in results:
                # Escape the note text first, then turn markers into <mark>
                snip = (escape(r['snip'])
                        .replace(MARK_START, "<mark>")
                        .replace(MARK_END, "</mark>"))
                html += result_card(r['path'], r['title'], snip)
        else:
            # Show recent notes
            recent = conn.execute(
                "SELECT path, title FROM notes ORDER BY modified DESC LIMIT 20"
            ).fetchall()

            html += "<h2>Recent Notes</h2>"
            for r in recent:
                html += result_card(r['path'], r['title'])
    finally:
        conn.close()

    html += "</body></html>"
    return html

@app.get("/note/{path:path}", response_class=HTMLResponse)
async def view_note(path: str):
    vault = Path(VAULT_PATH).resolve()
    filepath = (vault / path).resolve()
    # Refuse anything that resolves outside the vault (e.g. "../" segments)
    if (vault not in filepath.parents or filepath.suffix != '.md'
            or not filepath.is_file()):
        return HTMLResponse("<h1>Not found</h1>", status_code=404)

    content = filepath.read_text(encoding="utf-8", errors="replace")

    # Convert markdown to HTML
    md = markdown.Markdown(extensions=['fenced_code', 'tables', 'toc'])
    html_content = md.convert(content)

    html = f"""<!DOCTYPE html><html><head><title>{escape(path)}</title>{STYLES}</head><body>
    <a class="back-link" href="/">&#8592; Back to search</a>
    <div class="note-content">{html_content}</div>
    </body></html>"""
    return html

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8888)

Install the dependencies and run:

pip install fastapi uvicorn markdown
python3 vault_web.py

Open http://127.0.0.1:8888 and you have a searchable, browsable view of your vault. The search is backed by the same FTS5 index, so it is fast and supports the same query syntax.

This is intentionally minimal. A production-grade version might add:

  • Wikilink resolution (converting [[note name]] to clickable links).
  • Tag-based filtering.
  • Backlink display.
  • A graph visualization of note connections.
  • WebSocket-based live search (results appear as you type).
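
As an illustration of the first item, wikilink resolution can be done as a preprocessing step before the markdown is rendered. The sketch below is one possible approach, not how Obsidian itself resolves links: resolve_wikilinks and the paths_by_name lookup (lowercased note name to vault-relative path) are hypothetical, and the /note/ URL prefix assumes the route used by vault_web.py.

```python
import re
from urllib.parse import quote

# [[target]], [[target#heading]], or [[target|label]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]")

def resolve_wikilinks(text: str, paths_by_name: dict) -> str:
    """Rewrite wikilinks as markdown links using a name-to-path lookup."""
    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        path = paths_by_name.get(target.lower())
        if path is None:
            # Unknown note: keep the visible text, drop the brackets
            return label
        return f"[{label}](/note/{quote(path)})"
    return WIKILINK.sub(replace, text)
```

The lookup table can be built from the notes table by mapping each file's stem to its path. Heading anchors are matched but discarded here, so a fuller version would carry them through to the generated URL.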

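But even the minimal version is useful. As written, the server binds to 127.0.0.1, so it is reachable only from the machine it runs on. To read your notes from a phone, a tablet, or another computer on your local network, change the host to 0.0.0.0, keeping in mind that the app has no authentication, so do this only on a network you trust.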

Indexing Strategies

As your vault grows, a few indexing strategies keep things performant and useful.

What to Index

Index all markdown files. Skip binary files (images, PDFs), configuration files (.obsidian/), and any generated content. The exclusion of .obsidian/ is particularly important — it contains JSON configuration files that would pollute your search results with irrelevant matches.

Metadata Extraction

Beyond raw content, consider extracting and indexing structured metadata:

import re
import yaml

def extract_metadata(content: str) -> dict:
    """Extract YAML frontmatter and inline metadata from a note."""
    metadata = {}

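    # YAML frontmatter (requires PyYAML: pip install pyyaml)
    if content.startswith('---'):
        parts = content.split('---', 2)
        if len(parts) >= 3:
            try:
                loaded = yaml.safe_load(parts[1])
            except yaml.YAMLError:
                loaded = None
            # Frontmatter that is not a mapping (e.g. a bare string) is ignored
            if isinstance(loaded, dict):
                metadata = loaded

    # Extract tags (both #tag and tags: in frontmatter)
    raw_tags = metadata.get('tags') or []
    if isinstance(raw_tags, str):
        # "tags: a, b" arrives as one string; split it rather than
        # iterating over its characters
        raw_tags = [t for t in re.split(r'[,\s]+', raw_tags) if t]
    tags = {str(t).lstrip('#') for t in raw_tags}
    tags.update(re.findall(r'(?:^|\s)#([a-zA-Z][\w/-]*)', content))
    metadata['tags'] = sorted(tags)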

    # Extract wikilinks
    metadata['links'] = re.findall(r'\[\[([^\]|]+)(?:\|[^\]]+)?\]\]', content)

    # Word count
    metadata['word_count'] = len(content.split())

    return metadata

Storing tags and links in separate database tables allows for powerful queries: "Find all notes tagged #machine-learning that link to my note on embeddings."
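
A minimal sketch of what those tables and that query might look like follows. The table names note_tags and note_links, the helper functions, and the sample paths are all assumptions for illustration; they are not part of the indexer built earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE note_tags (
        path TEXT NOT NULL,
        tag  TEXT NOT NULL,
        PRIMARY KEY (path, tag)
    );
    CREATE TABLE note_links (
        source TEXT NOT NULL,   -- path of the note containing the link
        target TEXT NOT NULL,   -- wikilink target as written in the note
        PRIMARY KEY (source, target)
    );
""")

def store_metadata(path: str, tags: list, links: list) -> None:
    """Replace the stored tags and links for one note."""
    conn.execute("DELETE FROM note_tags WHERE path = ?", (path,))
    conn.execute("DELETE FROM note_links WHERE source = ?", (path,))
    conn.executemany("INSERT OR IGNORE INTO note_tags VALUES (?, ?)",
                     [(path, tag) for tag in tags])
    conn.executemany("INSERT OR IGNORE INTO note_links VALUES (?, ?)",
                     [(path, link) for link in links])

def tagged_and_linking(tag: str, target: str) -> list:
    """Paths of notes that carry a tag and link to a given note."""
    rows = conn.execute("""
        SELECT t.path
        FROM note_tags t
        JOIN note_links l ON l.source = t.path
        WHERE t.tag = ? AND l.target = ?
        ORDER BY t.path
    """, (tag, target)).fetchall()
    return [row[0] for row in rows]

store_metadata("03-resources/vector-search.md",
               ["machine-learning", "search"], ["embeddings"])
store_metadata("03-resources/decision-trees.md",
               ["machine-learning"], ["entropy"])
store_metadata("01-projects/vault-index.md", ["search"], ["embeddings"])

print(tagged_and_linking("machine-learning", "embeddings"))
```

In the real indexer, store_metadata would be called with the tags and links returned by extract_metadata each time a note is added or updated.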

Rebuild vs. Incremental

The indexer we built uses incremental updates based on file modification times. This is correct for routine use. But occasionally — after reorganizing your vault, renaming files, or upgrading the indexer itself — you want a full rebuild:

# Full rebuild: delete the database (and any WAL sidecar files) and reindex
rm -f ~/vault/.search.db ~/vault/.search.db-wal ~/vault/.search.db-shm
python3 index_vault.py

A full rebuild of a 10,000-note vault typically takes 5-10 seconds. Fast enough that you can afford to do it whenever something feels off.
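
When only the full-text index is suspect and the notes table itself is fine, FTS5 offers a lighter option: the special 'rebuild' command regenerates the index of an external-content table from its content table. The sketch below demonstrates it on an in-memory database; the trimmed-down schema and the hits helper are illustrative assumptions, and the triggers are omitted on purpose so the index starts out stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        title TEXT,
        content TEXT
    );
    CREATE VIRTUAL TABLE notes_fts USING fts5(
        title, content, content='notes', content_rowid='id'
    );
""")

# No sync triggers here, so this row never reaches the FTS index
conn.execute(
    "INSERT INTO notes(title, content) VALUES (?, ?)",
    ("Rebuild demo", "incremental indexing notes"),
)

def hits(term: str) -> list:
    """Return rowids of notes whose indexed text matches the term."""
    return conn.execute(
        "SELECT rowid FROM notes_fts WHERE notes_fts MATCH ?", (term,)
    ).fetchall()

before = hits("incremental")   # index is empty at this point

# Regenerate the whole FTS index from the notes table
conn.execute("INSERT INTO notes_fts(notes_fts) VALUES ('rebuild')")

after = hits("incremental")    # the note is now searchable
print(before, after)
```

This repairs the index without re-reading any files, but it cannot fix stale rows in the notes table itself. After renames or reorganizations, deleting the database and re-running the indexer remains the safer choice.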

Putting It All Together

Here is the complete workflow:

  1. Write notes in Obsidian (or any markdown editor).
  2. Index automatically via cron or filesystem watcher.
  3. Search from the command line with search_vault.py or the vs shell function for interactive fuzzy search.
  4. Browse from the web with vault_web.py for reading and exploration.
  5. Back up with git. Your entire vault can be version-controlled; the search database is left out because it can always be regenerated.

# Initialize git in your vault
cd ~/vault
git init
echo ".obsidian/workspace.json" >> .gitignore
echo ".search.db*" >> .gitignore

# Regular backups
git add -A && git commit -m "Vault snapshot $(date +%Y-%m-%d)"

Note that we exclude the search database from git — it is a derived artifact that can be regenerated from the markdown files at any time. We also exclude workspace.json (which changes constantly as you navigate in Obsidian) to keep the commit history clean.

This system is entirely self-contained. It runs on your hardware, depends on no external services, and is built from standard, well-maintained components (Python, SQLite, ripgrep, fzf). If any single component breaks or becomes unavailable, your data remains accessible as plain text files. That is the resilience you get from building on open standards and simple tools.

In the next chapter, we will add AI to this foundation — local embedding models and LLMs that can understand your notes semantically, not just match keywords. But even without AI, what you have here is a personal knowledge base that outperforms most commercial offerings in the areas that matter most: speed, reliability, privacy, and durability.