
Offline Knowledge Base for AI

A reusable system for AI assistants to maintain and query local caches of online documentation and source code.

Overview

This system provides two main components:

  • Part A (Scraping): Agent skills that download git repositories and scrape websites into structured local knowledge bases
  • Part B (Querying): A distributable agent skill that enables efficient searching and retrieval from knowledge bases

Features

  • 📥 Git Repository Cloning: Clone and update source code repositories
  • 🌐 Website Scraping: Crawl documentation sites with JavaScript support
  • 🔍 Efficient Search: ripgrep-powered search with context options
  • ⏱️ Update Frequency Control: Configurable update intervals
  • 🔒 Version Tracking: Lock files track commit hashes, scrape dates, and source IDs
  • 🤝 Team Sharing: Knowledge bases are git repositories

Quick Start

Create Knowledge Base

1. Install Dependencies

# Install Python dependencies for scraping skills
uv sync

# System dependencies (macOS)
brew install ripgrep git

# System dependencies (Ubuntu/Debian)
sudo apt-get install ripgrep git

2. Create Global Configuration

  • Create ~/.ai-docs.yaml.
  • This file lists every knowledge base that you scrape or download documentation into.
knowledge_bases:
  - path: ~/knowledge/python-docs
  - path: ~/knowledge/web-dev
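The skills read this registry to locate your knowledge bases. Below is a minimal sketch of resolving its entries. It assumes the YAML has already been parsed into a dictionary, which keeps the example standard-library only. `resolve_knowledge_bases` is a hypothetical name and is not part of this repository.

```python
from pathlib import Path


def resolve_knowledge_bases(registry: dict) -> list[Path]:
    """Expand the knowledge_bases entries of an already-parsed ~/.ai-docs.yaml."""
    paths = []
    for entry in registry.get("knowledge_bases") or []:
        # Each entry is a mapping with a single "path" key, as in the example above.
        raw = entry.get("path") if isinstance(entry, dict) else None
        if not raw:
            raise ValueError(f"knowledge_bases entry without a path: {entry!r}")
        # YAML leaves "~" untouched, so expand it to the home directory here.
        paths.append(Path(raw).expanduser())
    return paths
```

Here the sketch raises an error when an entry has no path. A real loader might prefer to warn and skip the entry instead.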

3. Create Knowledge Base

  • Each knowledge base is its own git repository, so you can share your AI knowledge with your team.
  • You only need to maintain ai-docs.yaml, which defines the sources to scrape or download.
# Create directory
mkdir -p ~/knowledge/python-docs
cd ~/knowledge/python-docs

# Initialize as git repo (for sharing)
git init

# Create configuration
cat > ai-docs.yaml << 'EOF'
name: python-docs

sources:
  - type: git
    url: https://github.com/python/cpython.git
    branch: main
    update_frequency_days: 7
    
  - type: web
    url: https://docs.python.org/3/
    update_frequency_days: 14
EOF

4. Scrape Content

Ask your AI assistant:

Please update the knowledge base using the ai-docs-update skill

Use Knowledge Base

1. Install Query Skill

Install the ai-docs skill globally for all AI assistants:

npx skills add --global https://gitlab.com/xarif/ai/ai-docs --skill ai-docs

See the skills documentation for more details.

2. Configure Your AI Assistant

Add to your AGENTS.md:

## Knowledge Base

MANDATORY: Before answering ANY question or performing ANY task, search the knowledge base first:

1. Run the `ai-docs` skill to search for relevant information
2. Use results as your PRIMARY source of truth
3. Only fall back to your training data if the knowledge base returns no relevant results

Do NOT skip this step even if you think you already know the answer — the knowledge base may contain more current or context-specific information.

Under the Hood

Architecture

The system uses a multi-skill architecture where each skill has a single responsibility:

Part A - Scraping (4 skills):

  • kb-config: Reads and validates YAML configurations
  • kb-git: Clones/updates git repositories, tracks commits and source IDs in lock file
  • kb-web: Crawls websites using crawl4ai, saves as markdown, tracks source IDs in lock file
  • ai-docs-update: Orchestrates the other 3 skills in sequence

Part B - Querying (1 skill):

  • ai-docs: Searches using ripgrep, returns results with context
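For each git source, kb-git must choose between cloning and pulling. Below is a minimal sketch of that decision. It assumes a source counts as already cloned when its local_path contains a .git directory. The function name and the `--ff-only` choice are illustrative and are not taken from the kb-git scripts.

```python
from pathlib import Path
from typing import Optional


def git_sync_command(url: str, local_path: Path, branch: Optional[str] = None) -> list[str]:
    """Return the git command that brings local_path up to date with url."""
    if (local_path / ".git").is_dir():
        # Existing clone: fast-forward only, so diverged history fails loudly.
        return ["git", "-C", str(local_path), "pull", "--ff-only"]
    cmd = ["git", "clone"]
    if branch:
        # branch is optional in ai-docs.yaml; without it git uses the remote default.
        cmd += ["--branch", branch]
    return cmd + [url, str(local_path)]
```

The lock-file fields could then be read with `git -C <path> rev-parse HEAD` for `commit` and `git -C <path> log -1 --format=%cI` for `commit_date`.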

How Skills Work Together

When you ask to update a knowledge base:

  1. ai-docs-update skill is triggered by keywords like "update knowledge base"
  2. It tells the AI to execute skills in order:
    • kb-config validates ai-docs.yaml exists and is valid
    • kb-git processes each git source (clone or pull), updates lock file with source_id
    • kb-web processes each web source (crawl), updates lock file with source_id

Each skill knows how to execute its own scripts. The orchestration skill just tells the AI which skills to use and in what order.
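In this repository, the AI assistant does the sequencing itself by following the ai-docs-update skill. The hypothetical Python equivalent below makes the control flow concrete by modelling each skill as a plain function. The step names mirror the skills, but the functions are stand-ins and not the real scripts.

```python
from typing import Callable


def run_update(kb_path: str, steps: list[tuple[str, Callable[[str], None]]]) -> list[str]:
    """Run each (name, step) in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    done = []
    for name, step in steps:
        try:
            step(kb_path)
        except Exception as exc:
            # Abort: a failed kb-config should not let kb-git or kb-web run.
            raise RuntimeError(f"{name} failed after {done}: {exc}") from exc
        done.append(name)
    return done
```

The order would be kb-config, then kb-git, then kb-web. Stopping early in this sketch means the scrapers never run against an invalid ai-docs.yaml.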

File Locations

.agents/skills/          # Source skills (Part A)
├── kb-config/
├── kb-git/
├── kb-web/
└── ai-docs-update/

.kiro/skills/            # Symlinks for kiro-cli
├── kb-config -> ../../.agents/skills/kb-config
├── kb-git -> ../../.agents/skills/kb-git
├── kb-web -> ../../.agents/skills/kb-web
└── ai-docs-update -> ../../.agents/skills/ai-docs-update

.claude/skills/          # Symlinks for claude code
└── (same structure)

ai-docs/                 # Distributable query skill

Skills are stored once in .agents/skills/ and symlinked for different AI assistants. Symlinks are tracked in git, so cloning this repo gives you all skills immediately.
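Supporting another assistant means recreating the same relative links. Below is a minimal sketch. It assumes the assistant reads skills from `<dir>/skills/` at the repository root. The `link_skills` helper is hypothetical, and the `.opencode` directory used below is an assumption, not something this repository ships.

```python
import os
from pathlib import Path

SKILLS = ["kb-config", "kb-git", "kb-web", "ai-docs-update"]


def link_skills(repo_root: Path, assistant_dir: str) -> list[Path]:
    """Create <assistant_dir>/skills/<name> -> ../../.agents/skills/<name>."""
    target_dir = repo_root / assistant_dir / "skills"
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in SKILLS:
        link = target_dir / name
        if link.is_symlink() or link.exists():
            continue  # leave existing links or directories untouched
        # Relative target, so the link stays valid wherever the repo is cloned.
        os.symlink(Path("..", "..", ".agents", "skills", name), link, target_is_directory=True)
        created.append(link)
    return created
```

For example, `link_skills(Path("."), ".opencode")` would create the links from the repository root. You would then commit them so they are tracked like the existing ones.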

Data Flow

Scraping:

User request
    ↓
AI recognizes "update knowledge base"
    ↓
ai-docs-update skill activated
    ↓
AI executes: kb-config → kb-git → kb-web
    ↓
Knowledge base updated with:
- sources/ (downloaded content)
- ai-docs.lock (version tracking with source IDs)

Querying:

User request
    ↓
AI recognizes search keywords
    ↓
ai-docs skill activated
    ↓
list: shows sources from lock file
search: ripgrep over source content
tree: explore directory structure
view: read file with line range
    ↓
Returns results with file paths and context
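The search step can be pictured as building a ripgrep command, with grep as the fallback described under Troubleshooting. This is a sketch, not the ai-docs implementation. The function name and the default of 2 context lines are assumptions.

```python
import shutil
from typing import Optional


def build_search_command(pattern: str, root: str, context: int = 2,
                         use_ripgrep: Optional[bool] = None) -> list[str]:
    """Build a ripgrep command, or an equivalent grep command if rg is missing."""
    if use_ripgrep is None:
        use_ripgrep = shutil.which("rg") is not None
    if use_ripgrep:
        # -n: line numbers, -C: context lines, -e: pattern (safe if it starts with "-").
        # rg searches directories recursively by default.
        return ["rg", "-n", "-C", str(context), "-e", pattern, root]
    # grep needs -r to recurse into the sources directory.
    return ["grep", "-rn", "-C", str(context), "-e", pattern, root]
```

The result can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Both tools exit with status 1 when nothing matches, so treat that as "no results" rather than as an error.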

Why This Design

Single Responsibility: Each skill does one thing well. Easy to test, debug, and maintain.

Composable: Skills can be used individually or orchestrated together.

Reusable: Same skills work across different AI assistants (kiro, claude, opencode).

Git-Friendly: Symlinks are tracked, so skills are versioned and shareable.

Progressive Disclosure: AI loads skill documentation only when needed, keeping context window efficient.

Configuration Files

~/.ai-docs.yaml (Global Registry)

knowledge_bases:
  - path: ~/knowledge/python-docs

ai-docs.yaml (Per-Knowledge Base Config)

name: python-docs

sources:
  - type: git
    url: https://github.com/python/cpython.git
    branch: main  # optional
    update_frequency_days: 7  # optional, default 30
    
  - type: web
    url: https://docs.python.org/3/
    update_frequency_days: 14  # optional, default 30
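kb-config validates this file before any scraping starts. Below is a minimal sketch of the checks implied by the example: `type` must be git or web, `url` is required, and `update_frequency_days` defaults to 30. It assumes the YAML is already parsed. `normalize_config` is a hypothetical name, and the real skill may check more.

```python
from typing import Any

DEFAULT_FREQUENCY_DAYS = 30
VALID_TYPES = {"git", "web"}


def normalize_config(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Validate the sources of a parsed ai-docs.yaml and fill in defaults."""
    sources = []
    for i, src in enumerate(config.get("sources") or []):
        if src.get("type") not in VALID_TYPES:
            raise ValueError(f"sources[{i}]: type must be one of {sorted(VALID_TYPES)}")
        if not src.get("url"):
            raise ValueError(f"sources[{i}]: url is required")
        freq = src.get("update_frequency_days", DEFAULT_FREQUENCY_DAYS)
        if not isinstance(freq, int) or freq < 0:
            raise ValueError(f"sources[{i}]: update_frequency_days must be a non-negative integer")
        # Keep optional keys such as branch as given; only the default is added.
        sources.append({**src, "update_frequency_days": freq})
    return sources
```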

ai-docs.lock (Auto-Generated)

Tracks versions, scrape dates, and source IDs:

last_updated: "2026-02-27T12:00:00Z"
sources:
  - url: https://github.com/python/cpython.git
    source_id: a1b2c3d4
    type: git
    commit: a1b2c3d4...
    commit_date: "2026-02-25T10:30:00Z"
    last_scraped: "2026-02-27T12:00:00Z"
    local_path: sources/github.com/python/cpython
  - url: https://docs.python.org/3/
    source_id: e5f6g7h8
    type: web
    last_scraped: "2026-02-27T12:00:00Z"
    local_path: sources/docs.python.org/3
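The local_path values in this example follow a visible pattern. Each path is the URL's host followed by its path, with any trailing slash and `.git` suffix removed. The sketch below reproduces the two example entries. The rule is inferred from the example only, and `local_path_for` is a hypothetical name. The real scripts may handle ports, query strings, or SSH-style git URLs (which have no host part for `urlparse`) differently.

```python
from urllib.parse import urlparse


def local_path_for(url: str) -> str:
    """Map a source URL to its directory under sources/, as in the lock-file example."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]  # git URLs: drop the .git suffix
    # Skip empty parts, e.g. a bare host URL has no path component.
    return "/".join(part for part in ["sources", parsed.netloc, path] if part)
```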

Directory Structure

Knowledge Base Structure

~/knowledge/python-docs/
├── ai-docs.yaml          # Source configuration
├── ai-docs.lock          # Version tracking with source IDs
├── sources/              # Downloaded content
│   ├── github.com/
│   │   └── python/
│   │       └── cpython/  # Git repo clone
│   └── docs.python.org/
│       └── 3/            # Scraped website
│           ├── index.md
│           └── ...
└── .git/                 # Knowledge base is a git repo

Regular Updates

Ask your AI assistant:

Please update all sources in my knowledge base at ~/knowledge/python-docs using the ai-docs-update skill

To force an update that ignores the configured frequency:

Please force update my knowledge base at ~/knowledge/python-docs using the ai-docs-update skill with --force flag
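The frequency check compares each source's `last_scraped` timestamp in ai-docs.lock with its `update_frequency_days` in ai-docs.yaml. Below is a minimal sketch of that check, with `--force` bypassing it. It assumes a source becomes due once the full interval has elapsed. The boundary comparison and the name `is_due` are assumptions, not taken from the scripts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_scraped: Optional[str], frequency_days: int = 30,
           force: bool = False, now: Optional[datetime] = None) -> bool:
    """Return True if a source should be refreshed now."""
    if force or last_scraped is None:
        return True  # --force, or the source has never been scraped
    # Lock timestamps look like "2026-02-27T12:00:00Z"; fromisoformat accepts a
    # trailing "Z" only from Python 3.11, so convert it to an explicit UTC offset.
    scraped = datetime.fromisoformat(last_scraped.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - scraped >= timedelta(days=frequency_days)
```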

Querying

Ask your AI assistant:

Search my knowledge bases for "async programming"

For more specific searches:

Do a deep search in my knowledge bases for "asyncio.run"

Team Sharing

Knowledge bases are git repositories and can be shared:

cd ~/knowledge/python-docs
git add .
git commit -m "Update Python docs"
git push origin main

# Team members can clone
git clone <repo-url> ~/knowledge/python-docs

Troubleshooting

crawl4ai not found

uv add crawl4ai
uv run crawl4ai-setup

ripgrep not found

# macOS
brew install ripgrep

# Ubuntu/Debian
sudo apt-get install ripgrep

# Without ripgrep, search falls back to grep automatically

No knowledge bases found

Check ~/.ai-docs.yaml exists and contains valid paths.

Source not updating

Use the --force flag to ignore the update frequency (run from the root of this repository):

uv run python .kiro/skills/kb-git/scripts/scrape_git.py ~/knowledge/python-docs --force

Architecture

See architecture.md for detailed system design.

Requirements

See requirements.md for complete requirements.

Design

See design.md for implementation details.

License

MIT