Offline Knowledge Base for AI
A reusable system for AI assistants to maintain and query local caches of online documentation and source code.
Overview
This system provides two main components:
- Part A (Scraping): Agent skills that download git repositories and scrape websites into structured local knowledge bases
- Part B (Querying): A distributable agent skill that enables efficient searching and retrieval from knowledge bases
Features
- 📥 Git Repository Cloning: Clone and update source code repositories
- 🌐 Website Scraping: Crawl documentation sites with JavaScript support
- 🔍 Efficient Search: ripgrep-powered search with context options
- ⏱️ Update Frequency Control: Configurable update intervals
- 🔒 Version Tracking: Lock files track commit hashes, scrape dates, and source IDs
- 🤝 Team Sharing: Knowledge bases are git repositories
Quick Start
Create Knowledge Base
1. Install Dependencies
# Install Python dependencies for scraping skills
uv sync
# System dependencies (macOS)
brew install ripgrep git
# System dependencies (Ubuntu/Debian)
apt-get install ripgrep git
2. Create Global Configuration
- Create ~/.ai-docs.yaml.
- This file lists every knowledge base that you scrape or download documentation into.
knowledge_bases:
- path: ~/knowledge/python-docs
- path: ~/knowledge/web-dev
3. Create Knowledge Base
- Each knowledge base is its own git repository, so you can share your AI knowledge with your team.
- You only need to maintain the ai-docs.yaml file, where you define the sources to scrape or download.
# Create directory
mkdir -p ~/knowledge/python-docs
cd ~/knowledge/python-docs
# Initialize as git repo (for sharing)
git init
# Create configuration
cat > ai-docs.yaml << 'EOF'
name: python-docs
sources:
- type: git
  url: https://github.com/python/cpython.git
  branch: main
  update_frequency_days: 7
- type: web
  url: https://docs.python.org/3/
  update_frequency_days: 14
EOF
4. Scrape Content
Ask your AI assistant:
Please update the knowledge base using the ai-docs-update skill
Use Knowledge Base
1. Install Query Skill
Install the ai-docs skill globally for all AI assistants:
npx skills add --global https://gitlab.com/xarif/ai/ai-docs --skill ai-docs
See skills documentation for more details.
2. Configure Your AI Assistant
Add to your AGENTS.md:
## Knowledge Base
MANDATORY: Before answering ANY question or performing ANY task, search the knowledge base first:
1. Run the `ai-docs` skill to search for relevant information
2. Use results as your PRIMARY source of truth
3. Only fall back to your training data if the knowledge base returns no relevant results
Do NOT skip this step even if you think you already know the answer — the knowledge base may contain more current or context-specific information.
Under the Hood
Architecture
The system uses a multi-skill architecture where each skill has a single responsibility:
Part A - Scraping (4 skills):
- kb-config: Reads and validates YAML configurations
- kb-git: Clones/updates git repositories, tracks commits and source IDs in lock file
- kb-web: Crawls websites using crawl4ai, saves as markdown, tracks source IDs in lock file
- ai-docs-update: Orchestrates the other 3 skills in sequence
Part B - Querying (1 skill):
- ai-docs: Searches using ripgrep, returns results with context
How Skills Work Together
When you ask to update a knowledge base:
- ai-docs-update skill is triggered by keywords like "update knowledge base"
- It tells the AI to execute skills in order:
  - kb-config validates ai-docs.yaml exists and is valid
  - kb-git processes each git source (clone or pull), updates lock file with source_id
  - kb-web processes each web source (crawl), updates lock file with source_id
Each skill knows how to execute its own scripts. The orchestration skill just tells the AI which skills to use and in what order.
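The ordering described above can be sketched as a small planning function. This is only an illustration of the sequence, not the orchestration skill itself (which, per the text, is an instruction to the AI rather than a program). The function name `plan_update` and the tuple layout are hypothetical, and the input is assumed to be the already-parsed content of ai-docs.yaml.

```python
def plan_update(config: dict) -> list[tuple[str, str]]:
    """Return (skill, target) steps in the order the skills are meant to run.

    `config` is assumed to be the parsed ai-docs.yaml mapping (hypothetical helper).
    """
    sources = config.get("sources") or []
    # kb-config always runs first and checks the configuration file.
    steps = [("kb-config", "ai-docs.yaml")]
    # Then every git source, then every web source.
    steps += [("kb-git", s["url"]) for s in sources if s.get("type") == "git"]
    steps += [("kb-web", s["url"]) for s in sources if s.get("type") == "web"]
    return steps


example_config = {
    "name": "python-docs",
    "sources": [
        {"type": "web", "url": "https://docs.python.org/3/"},
        {"type": "git", "url": "https://github.com/python/cpython.git"},
    ],
}
plan = plan_update(example_config)
```

Note that the git source is scheduled before the web source even though it appears second in the example input, which mirrors the fixed skill order rather than the order of entries in the file.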
File Locations
.agents/skills/ # Source skills (Part A)
├── kb-config/
├── kb-git/
├── kb-web/
└── ai-docs-update/
.kiro/skills/ # Symlinks for kiro-cli
├── kb-config -> ../../.agents/skills/kb-config
├── kb-git -> ../../.agents/skills/kb-git
├── kb-web -> ../../.agents/skills/kb-web
└── ai-docs-update -> ../../.agents/skills/ai-docs-update
.claude/skills/ # Symlinks for claude code
└── (same structure)
ai-docs/ # Distributable query skill
Skills are stored once in .agents/skills/ and symlinked for different AI assistants. Symlinks are tracked in git, so cloning this repo gives you all skills immediately.
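Because the symlinks are already tracked in git, you normally never need to create them yourself. Purely to illustrate the layout, here is a sketch of how such relative links could be produced for another assistant directory. The helper name `link_skills` is hypothetical, and the demo runs in a temporary directory rather than in a real checkout.

```python
import os
import tempfile
from pathlib import Path


def link_skills(repo_root: Path, assistant_dir: str) -> list[Path]:
    """Symlink every skill in .agents/skills/ into <assistant_dir>/skills/."""
    source_dir = repo_root / ".agents" / "skills"
    target_dir = repo_root / assistant_dir / "skills"
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for skill in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        link = target_dir / skill.name
        if link.is_symlink() or link.exists():
            continue  # leave existing entries untouched
        # Relative target, matching the "../../.agents/skills/<name>" layout above.
        os.symlink(f"../../.agents/skills/{skill.name}", link, target_is_directory=True)
        created.append(link)
    return created


# Demo in a throwaway directory (not a real checkout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".agents" / "skills" / "kb-git").mkdir(parents=True)
    links = link_skills(root, ".kiro")
    link_target = os.readlink(links[0])
    resolves = links[0].is_dir()
```

Relative targets keep the links valid wherever the repository is cloned, which is what makes them safe to commit.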
Data Flow
Scraping:
User request
↓
AI recognizes "update knowledge base"
↓
ai-docs-update skill activated
↓
AI executes: kb-config → kb-git → kb-web
↓
Knowledge base updated with:
- sources/ (downloaded content)
- ai-docs.lock (version tracking with source IDs)
Querying:
User request
↓
AI recognizes search keywords
↓
ai-docs skill activated
↓
list: shows sources from lock file
search: ripgrep over source content
tree: explore directory structure
view: read file with line range
↓
Returns results with file paths and context
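To make the search step concrete, here is a rough sketch of a ripgrep call with context lines and a grep fallback. It is not the ai-docs skill's actual script: the function names and the default of two context lines are assumptions. The `rg` and `grep` flags used are standard ones (`--line-number`, `--context`, `-rn`, `-C`).

```python
import shutil
import subprocess


def build_search_command(query, root, context_lines=2, use_ripgrep=True):
    """Build the command line for a recursive search with context lines."""
    if use_ripgrep:
        return ["rg", "--line-number", "--context", str(context_lines), "--", query, root]
    # Fallback when ripgrep is not installed.
    return ["grep", "-rn", "-C", str(context_lines), "--", query, root]


def search(query, root, context_lines=2):
    """Run the search and return the raw output (paths, line numbers, context)."""
    have_rg = shutil.which("rg") is not None
    command = build_search_command(query, root, context_lines, have_rg)
    result = subprocess.run(command, capture_output=True, text=True)
    # Both tools exit with status 1 when nothing matches, so only >1 is an error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout


rg_command = build_search_command("asyncio.run", "sources", context_lines=3)
grep_command = build_search_command("asyncio.run", "sources", use_ripgrep=False)
```

In this sketch the query is passed as a regular expression, so `asyncio.run` would also match `asyncio_run`; adding `--fixed-strings` (ripgrep) or `-F` (grep) would make the match literal.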
Why This Design
Single Responsibility: Each skill does one thing well. Easy to test, debug, and maintain.
Composable: Skills can be used individually or orchestrated together.
Reusable: Same skills work across different AI assistants (kiro, claude, opencode).
Git-Friendly: Symlinks are tracked, so skills are versioned and shareable.
Progressive Disclosure: The AI loads skill documentation only when needed, which keeps context window usage low.
Configuration Files
~/.ai-docs.yaml (Global Registry)
knowledge_bases:
- path: ~/knowledge/python-docs
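As a small illustration of how the registry might be consumed, the sketch below expands the `~` in each path and notes which directories are missing, which is the usual cause of the "No knowledge bases found" message. It assumes the file has already been parsed into a dictionary by a YAML library; `knowledge_base_paths` is a hypothetical helper, not part of the skills.

```python
from pathlib import Path


def knowledge_base_paths(registry: dict) -> list[Path]:
    """Return the expanded path of every knowledge base in the registry.

    `registry` is the mapping a YAML parser would produce for ~/.ai-docs.yaml.
    """
    entries = registry.get("knowledge_bases") or []
    return [Path(entry["path"]).expanduser() for entry in entries]


# Parsed form of the example registry shown above.
registry = {"knowledge_bases": [{"path": "~/knowledge/python-docs"}]}
paths = knowledge_base_paths(registry)
# Directories that are listed but do not exist on this machine.
missing = [p for p in paths if not p.is_dir()]
```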
ai-docs.yaml (Per-Knowledge Base Config)
name: python-docs
sources:
- type: git
  url: https://github.com/python/cpython.git
  branch: main # optional
  update_frequency_days: 7 # optional, default 30
- type: web
  url: https://docs.python.org/3/
  update_frequency_days: 14 # optional, default 30
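The optional fields and the default of 30 days could be handled along the following lines. This is a sketch rather than kb-config's real validation logic: the function name, the error messages, and the assumption that `name` and `url` are required are illustrative only. The input is the parsed YAML mapping.

```python
DEFAULT_UPDATE_FREQUENCY_DAYS = 30  # default stated in the config comments
KNOWN_TYPES = {"git", "web"}


def normalize_config(config: dict) -> tuple[list[dict], list[str]]:
    """Return (sources with defaults applied, list of problems found)."""
    problems = []
    if not config.get("name"):
        problems.append("missing 'name'")  # assumption: name is required
    normalized = []
    for index, source in enumerate(config.get("sources") or []):
        kind = source.get("type")
        if kind not in KNOWN_TYPES:
            problems.append(f"sources[{index}]: unknown type {kind!r}")
            continue
        if not source.get("url"):
            problems.append(f"sources[{index}]: missing 'url'")
            continue
        entry = dict(source)  # copy so the caller's data is not modified
        entry.setdefault("update_frequency_days", DEFAULT_UPDATE_FREQUENCY_DAYS)
        normalized.append(entry)
    return normalized, problems


config = {
    "name": "python-docs",
    "sources": [
        {"type": "git", "url": "https://github.com/python/cpython.git", "branch": "main"},
        {"type": "web", "url": "https://docs.python.org/3/", "update_frequency_days": 14},
        {"type": "ftp", "url": "ftp://example.invalid/"},  # deliberately invalid
    ],
}
sources, problems = normalize_config(config)
```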
ai-docs.lock (Auto-Generated)
Tracks versions, scrape dates, and source IDs:
last_updated: "2026-02-27T12:00:00Z"
sources:
- url: https://github.com/python/cpython.git
  source_id: a1b2c3d4
  type: git
  commit: a1b2c3d4...
  commit_date: "2026-02-25T10:30:00Z"
  last_scraped: "2026-02-27T12:00:00Z"
  local_path: sources/github.com/python/cpython
- url: https://docs.python.org/3/
  source_id: e5f6g7h8
  type: web
  last_scraped: "2026-02-27T12:00:00Z"
  local_path: sources/docs.python.org/3
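The `last_scraped` timestamp is what makes update frequency control possible. A minimal sketch of that check is below, assuming a source is due once `update_frequency_days` have elapsed since `last_scraped`, and that a forced update or a source with no lock entry is always due. The function names are hypothetical, and the real skills may decide differently.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-02-27T12:00:00Z"."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_due(last_scraped, frequency_days, now, force=False):
    """Decide whether a source should be scraped again."""
    if force or last_scraped is None:
        return True  # forced update, or never scraped before
    return now - parse_timestamp(last_scraped) >= timedelta(days=frequency_days)


last = "2026-02-27T12:00:00Z"
six_days_later = datetime(2026, 3, 5, 12, 0, tzinfo=timezone.utc)
seven_days_later = datetime(2026, 3, 6, 12, 0, tzinfo=timezone.utc)

too_early = is_due(last, 7, six_days_later)
due_now = is_due(last, 7, seven_days_later)
forced = is_due(last, 7, six_days_later, force=True)
```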
Directory Structure
Knowledge Base Structure
~/knowledge/python-docs/
├── ai-docs.yaml # Source configuration
├── ai-docs.lock # Version tracking with source IDs
├── sources/ # Downloaded content
│   ├── github.com/
│   │   └── python/
│   │       └── cpython/ # Git repo clone
│   └── docs.python.org/
│       └── 3/ # Scraped website
│           ├── index.md
│           └── ...
└── .git/ # Knowledge base is a git repo
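The two example sources suggest that the directory under sources/ is derived from the URL's host and path. The sketch below reproduces that mapping for the two URLs in this README. The rule itself (drop the scheme, a trailing slash, and a `.git` suffix) is inferred from those examples, not taken from the skills' code, and `local_path_for` is a hypothetical name.

```python
from urllib.parse import urlsplit


def local_path_for(url: str) -> str:
    """Map a source URL to a path under sources/ (inferred rule, see above)."""
    parts = urlsplit(url)
    path = parts.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Skip empty segments so a bare host does not produce a trailing slash.
    return "/".join(segment for segment in ("sources", parts.netloc, path) if segment)


git_path = local_path_for("https://github.com/python/cpython.git")
web_path = local_path_for("https://docs.python.org/3/")
```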
Regular Updates
Ask your AI assistant:
Please update all sources in my knowledge base at ~/knowledge/python-docs using the ai-docs-update skill
Or, to force an update regardless of the configured update frequency:
Please force update my knowledge base at ~/knowledge/python-docs using the ai-docs-update skill with the --force flag
Querying
Ask your AI assistant:
Search my knowledge bases for "async programming"
For more specific searches:
Do a deep search in my knowledge bases for "asyncio.run"
Team Sharing
Knowledge bases are git repositories and can be shared:
cd ~/knowledge/python-docs
git add .
git commit -m "Update Python docs"
git push origin main
# Team members can clone
git clone <repo-url> ~/knowledge/python-docs
Troubleshooting
crawl4ai not found
uv add crawl4ai
crawl4ai-setup
ripgrep not found
# macOS
brew install ripgrep
# Ubuntu/Debian
apt-get install ripgrep
# Or use grep (automatic fallback)
No knowledge bases found
Check ~/.ai-docs.yaml exists and contains valid paths.
Source not updating
Use the --force flag to ignore the update frequency:
python .kiro/skills/kb-git/scripts/scrape_git.py ~/knowledge/python-docs --force
Architecture
See architecture.md for detailed system design.
Requirements
See requirements.md for complete requirements.
Design
See design.md for implementation details.
License
MIT