index_project

Index your codebase for AI-powered semantic search

The index_project tool creates the indexes that power Stellarion's advanced features. It generates vector embeddings for semantic search and builds a dependency graph for structure analysis. Running this tool enables fast, intelligent code search and accurate dependency mapping.

Overview

Stellarion's indexing engine builds a structured representation of your codebase by extracting code elements (files, procedures, functions, classes, variables) and the relationships between them. This index is the foundation for AI-assisted development and natural language code exploration.

What Indexing Captures

When you index your codebase, Stellarion performs deep semantic analysis to create a rich, searchable representation of your code's structure and content. The indexing process captures:

  • File-level metadata: File paths, imports, dependencies, and module relationships
  • Code structures: Functions, methods, classes, interfaces, and their hierarchies
  • Variables and constants: Declarations, types, scopes, and usage patterns
  • Documentation: Inline comments, docstrings, and annotations
  • Code relationships: Call graphs, inheritance chains, and data flow patterns

Key Benefits

1. Optimized AI Interactions

When you request AI assistance for code generation, enhancement, or debugging, Stellarion uses the index to send the AI model only the most relevant context. Because it does not need to pass entire files or large code blocks, Stellarion can:

  • Reduce token consumption: Sending only pertinent code elements rather than entire files lowers API costs and helps you stay within context window limits
  • Improve AI response quality: The AI receives the context it needs (function signatures, type definitions, related dependencies), which leads to more accurate, context-aware suggestions
  • Speed up response times: Smaller, focused context means faster processing and quicker results

2. Natural Language Code Search

The index powers Stellarion's semantic search capabilities, enabling you to explore your codebase using natural language queries. You can:

  • Ask questions like "Where is user authentication handled?" or "Show me all database connection functions"
  • Discover code patterns and implementations without prior knowledge of the codebase
  • Navigate large, unfamiliar codebases quickly and intuitively
  • Find relevant code based on intent and functionality, not just keyword matching

This is particularly valuable when:

  • Onboarding to new projects
  • Working with legacy code
  • Collaborating across teams with different codebases
  • Conducting code reviews or audits

3. Contextual Code Understanding

The semantic understanding provided by indexing allows Stellarion to:

  • Identify dependencies and potential breaking changes
  • Suggest refactoring opportunities
  • Detect code patterns and anti-patterns
  • Provide intelligent autocomplete and code suggestions

How It Works

Stellarion uses local embedding models to generate vector embeddings of your code. Code is split into 50-line chunks, each converted to a high-dimensional vector that captures semantic meaning. These vectors are stored in a RocksDB database for fast similarity search. Simultaneously, import/export statements are analyzed to build a dependency graph in KuzuDB.
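
The chunk-and-search pipeline described above can be pictured with a small sketch. This is an illustration under stated assumptions, not Stellarion's implementation: the function names are hypothetical, the chunks here do not overlap (the source does not say whether Stellarion's do), and the brute-force scan stands in for whatever lookup the RocksDB-backed store performs. Producing the vectors themselves is left out because it depends on the embedding model.

```python
import math

CHUNK_SIZE = 50  # lines per chunk, as stated above


def chunk_lines(source: str, size: int = CHUNK_SIZE) -> list[tuple[int, str]]:
    """Split source text into consecutive chunks of at most `size` lines.

    Returns (start_line, chunk_text) pairs; start_line is 1-based.
    """
    lines = source.splitlines()
    return [
        (start + 1, "\n".join(lines[start:start + size]))
        for start in range(0, len(lines), size)
    ]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query vector."""
    ranked = sorted(
        vectors,
        key=lambda chunk_id: cosine_similarity(query, vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch a 120-line file yields three chunks starting at lines 1, 51, and 101, and a query is answered by embedding the question and ranking the stored chunk vectors against it.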

When to Use

  • Setting up a new project: First-time indexing to enable semantic search
  • After major codebase changes: Re-index to capture new code
  • When search results seem outdated: Refresh the index
  • Setting up a new development environment: Initialize Stellarion for your project

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | No | . | Project root to index |
| forceReindex | boolean | No | false | Re-index even if already indexed |
| maxFiles | number | No | 1000 | Maximum files to index |
| fileTypes | array | No | All code files | Specific extensions to index |
| buildGraph | boolean | No | true | Build dependency graph |
| generateEmbeddings | boolean | No | true | Generate semantic embeddings |
| includeStatistics | boolean | No | false | Include detailed indexing statistics |
| useCache | boolean | No | false | Use cache to skip unchanged files |

MCP Command Syntax

mcp__stellarion__index_project path:. forceReindex:true includeStatistics:true
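
If you generate these calls from a script, the key:value syntax shown above is simple to render. The helper below is a hypothetical convenience, not part of Stellarion. It assumes plain strings are written bare and every other value is written as compact JSON, which matches the examples on this page.

```python
import json

TOOL_NAME = "mcp__stellarion__index_project"


def format_call(**params) -> str:
    """Render keyword arguments in the key:value syntax used on this page."""
    parts = [TOOL_NAME]
    for key, value in params.items():
        # Plain strings are written bare (path:.); everything else is
        # JSON-encoded, which yields true/false, numbers, and ["a","b"] arrays.
        if isinstance(value, str):
            rendered = value
        else:
            rendered = json.dumps(value, separators=(",", ":"))
        parts.append(f"{key}:{rendered}")
    return " ".join(parts)


print(format_call(path=".", forceReindex=True, includeStatistics=True))
print(format_call(fileTypes=["ts", "tsx", "js", "jsx"]))
```

The helper performs no validation; parameter names and types still need to match the Parameters table.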

Examples

Basic Indexing

Natural Language:

Index this project for semantic search

Direct MCP Call:

mcp__stellarion__index_project

Returns: Indexing summary with file count and storage location


Force Re-indexing

Natural Language:

Re-index this project from scratch

Direct MCP Call:

mcp__stellarion__index_project forceReindex:true

Returns: Fresh index, ignoring any existing data


Index Specific File Types

Natural Language:

Index only TypeScript and JavaScript files

Direct MCP Call:

mcp__stellarion__index_project fileTypes:["ts","tsx","js","jsx"]

Returns: Index containing only the specified file types


Get Detailed Statistics

Natural Language:

Index this project and show me detailed statistics

Direct MCP Call:

mcp__stellarion__index_project includeStatistics:true

Returns: Comprehensive stats including files processed, chunks created, time taken


Index Only the Dependency Graph

Natural Language:

Just rebuild the dependency graph, don't regenerate embeddings

Direct MCP Call:

mcp__stellarion__index_project buildGraph:true generateEmbeddings:false

Returns: Updated dependency graph for structure analysis


Index Only Embeddings

Natural Language:

Regenerate semantic search embeddings only

Direct MCP Call:

mcp__stellarion__index_project generateEmbeddings:true buildGraph:false

Returns: Updated vector embeddings for semantic search


Limit Large Codebases

Natural Language:

Index this large project but limit to 500 files

Direct MCP Call:

mcp__stellarion__index_project maxFiles:500

Returns: Index limited to at most 500 files

Output Format

Results include:

| Field | Description |
|---|---|
| Files indexed | Number of files processed |
| Chunks created | Number of code chunks generated |
| Embeddings generated | Number of vector embeddings created |
| Graph nodes | Number of files in dependency graph |
| Graph edges | Number of dependency relationships |
| Time taken | Duration of indexing process |
| Storage location | Path to .stellarion directory |

What Gets Indexed

Supported Languages

| Language | Extensions |
|---|---|
| TypeScript | .ts, .tsx |
| JavaScript | .js, .jsx, .mjs, .cjs |
| Python | .py |
| Rust | .rs |
| Go | .go |
| Java | .java |
| C/C++ | .c, .cpp, .h, .hpp |
| Ruby | .rb |

Files Automatically Excluded

  • node_modules/ and similar dependency directories
  • dist/, build/, target/ (build artifacts)
  • .git/ and other VCS directories
  • .stellarion/ (Stellarion's own data)
  • Files matching .gitignore patterns
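
To see how the supported extensions, the exclusion list, and the fileTypes and maxFiles parameters might combine, here is a minimal sketch of file selection. It is an assumption about the general approach, not Stellarion's code: select_files is a hypothetical name, the directory set covers only the names listed above (not the "similar" directories), and .gitignore matching is deliberately left out.

```python
import os

# Extensions from the Supported Languages table.
SUPPORTED_EXTENSIONS = {
    ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".py", ".rs",
    ".go", ".java", ".c", ".cpp", ".h", ".hpp", ".rb",
}
# Directory names from the exclusion list above.
EXCLUDED_DIRS = {"node_modules", "dist", "build", "target", ".git", ".stellarion"}


def select_files(root=".", file_types=None, max_files=1000):
    """Collect indexable files under root, skipping excluded directories."""
    if file_types is None:
        wanted = SUPPORTED_EXTENSIONS
    else:
        # Accept both "ts" and ".ts" spellings.
        wanted = {"." + ext.lstrip(".") for ext in file_types}
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if len(selected) >= max_files:
                return selected
            if os.path.splitext(name)[1] in wanted:
                selected.append(os.path.join(dirpath, name))
    return selected
```

In this sketch the cap simply stops the walk once max_files paths have been collected; how Stellarion chooses which files to keep when the limit is reached is not specified here.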

What's Captured

  • Functions and methods with their implementations
  • Classes and interfaces
  • Type definitions
  • Documentation comments (JSDoc, docstrings)
  • Import/export relationships
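
The import/export relationships are what become edges in the dependency graph. As a rough illustration of the idea, the sketch below pulls module specifiers out of JavaScript/TypeScript source with a regular expression. This is a simplified heuristic with hypothetical names: a real indexer would more likely use a parser, other languages need their own patterns, and the specifiers are not resolved to file paths.

```python
import re

# Captures the module specifier in `import ... from "x"`, `import "x"`,
# `export ... from "x"`, and `require("x")`.
IMPORT_RE = re.compile(
    r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)["']([^"']+)["']"""
)


def import_edges(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (importing_file, module_specifier) edges for each file."""
    edges = []
    for path, source in files.items():
        for match in IMPORT_RE.finditer(source):
            edges.append((path, match.group(1)))
    return edges


example = {
    "app.ts": 'import { login } from "./auth";\nconst fs = require("fs");\n',
    "auth.ts": "export { hash } from './crypto';\n",
}
for importer, target in import_edges(example):
    print(f"{importer} -> {target}")
```

Each distinct file would be a graph node and each pair an edge, which corresponds to the "Graph nodes" and "Graph edges" counts in the output.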

Storage Location

Index data is stored in your project:

.stellarion/
├── vector_store/     # Embeddings (RocksDB)
└── graph_kuzu/       # Dependency graph (KuzuDB)

This directory should be added to .gitignore as it's machine-specific and can be regenerated.
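
If you want to script that step, a small helper like the following appends the entry only when it is missing. The function name is hypothetical and the check is a simple line comparison, so an equivalent pattern written differently (for example **/.stellarion) would not be detected.

```python
from pathlib import Path


def ensure_ignored(project_root: str = ".", entry: str = ".stellarion/") -> bool:
    """Append entry to .gitignore if missing. Returns True if the file changed."""
    gitignore = Path(project_root) / ".gitignore"
    existing = gitignore.read_text() if gitignore.exists() else ""
    # Compare without trailing slashes so ".stellarion" and ".stellarion/" match.
    patterns = {line.strip().rstrip("/") for line in existing.splitlines()}
    if entry.rstrip("/") in patterns:
        return False
    # Make sure the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + prefix + entry + "\n")
    return True
```

Calling ensure_ignored() from the project root is safe to repeat: the second call finds the entry and leaves the file untouched.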

When to Re-index

Re-index your project when:

| Situation | Command |
|---|---|
| Adding significant new code | forceReindex:true |
| After major refactoring | forceReindex:true |
| Search results seem stale | forceReindex:true |
| Switching branches with different code | forceReindex:true (if major differences) |
| Fixing corrupted index | forceReindex:true |

Indexing Strategies

Full Codebase Indexing

Recommended for: Most projects, especially when working with AI assistance frequently

Indexing your entire codebase provides:

  • Complete context for AI-generated code
  • Comprehensive semantic search across all modules
  • Better understanding of cross-module dependencies
  • Maximum benefit from Stellarion's capabilities

Partial Codebase Indexing

Recommended for: Very large monorepos, multi-language projects, or when you want to focus on specific modules

You can selectively index:

  • Specific directories or modules you're actively working on
  • Core libraries while excluding test fixtures or build artifacts
  • Your application code while excluding third-party dependencies
  • Performance-critical sections that require frequent AI interaction

Best Practices

  1. Index broadly: Start with a full codebase index to maximize benefits
  2. Update regularly: Re-index after significant code changes, merges, or refactoring sessions
  3. Set up automatic updates: Configure Stellarion to automatically refresh indexes on file changes or at scheduled intervals
  4. Exclude generated code: Skip auto-generated files, build outputs, and vendor directories to keep indexes lean and relevant
  5. Version control integration: Index after pulling updates from your repository to stay synchronized with your team

Performance Tips

  1. Initial indexing takes time: Large projects (1000+ files) may take several minutes
  2. Use maxFiles for very large codebases: Cap the number of files indexed
  3. Use useCache:true for incremental updates: Skip files that haven't changed
  4. Index only what you need: Use fileTypes to exclude irrelevant languages
  5. Re-index after major changes: Keep the index fresh for accurate results
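
To make the idea behind skipping unchanged files concrete, here is one common way such a cache can work: store a content hash per file and reprocess only files whose hash differs. This is a sketch of the general technique with hypothetical names. How Stellarion's useCache option actually detects changes (hashes, modification times, or something else) is not documented here.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def changed_files(current: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content hash differs from the cache."""
    return sorted(
        path
        for path, content in current.items()
        if cache.get(path) != file_digest(content)
    )


cache = {"a.ts": file_digest(b"one"), "b.ts": file_digest(b"two")}
current = {"a.ts": b"one", "b.ts": b"changed", "c.ts": b"new"}
print(changed_files(current, cache))
```

Only the files reported by changed_files would need new chunks and embeddings; everything else can keep its existing index entries.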

Troubleshooting

Index seems out of date

mcp__stellarion__index_project forceReindex:true

Semantic search not working

mcp__stellarion__index_project generateEmbeddings:true

Structure analysis not accurate

mcp__stellarion__index_project buildGraph:true forceReindex:true

Index taking too long

mcp__stellarion__index_project maxFiles:500 fileTypes:["ts","js"]