Tools

index_project

Index your codebase for AI-powered semantic search

The index_project tool builds the two indexes that Stellarion's search and analysis features rely on: vector embeddings for semantic search and a dependency graph for structure analysis. Run it before using either feature.

How It Works

Stellarion uses RoBERTa (Robustly Optimized BERT Pretraining Approach) to generate vector embeddings of your code. Code is split into 50-line chunks, and each chunk is converted to a high-dimensional vector that captures its semantic meaning. These vectors are stored in a RocksDB database for fast similarity search. At the same time, import/export statements are analyzed to build a dependency graph in KuzuDB.
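
The chunk-and-compare idea described above can be sketched in a few lines. This is an illustration only: it assumes fixed, non-overlapping 50-line chunks and plain cosine similarity, and it uses hand-written toy vectors in place of real RoBERTa embeddings. The function names (chunk_lines, cosine_similarity, rank_chunks) are hypothetical and are not part of Stellarion.

```python
from __future__ import annotations

import math


def chunk_lines(source: str, chunk_size: int = 50) -> list[str]:
    """Split source text into consecutive chunks of at most chunk_size lines."""
    lines = source.splitlines()
    return [
        "\n".join(lines[start:start + chunk_size])
        for start in range(0, len(lines), chunk_size)
    ]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], chunk_vecs: list[list[float]]) -> list[int]:
    """Return chunk indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    source = "\n".join(f"line {n}" for n in range(1, 121))  # a 120-line file
    print(len(chunk_lines(source)))  # 3 (chunks of 50, 50, and 20 lines)

    # Toy 2-dimensional "embeddings"; real embeddings have far more dimensions.
    print(rank_chunks([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))  # [1, 2, 0]
```

A production vector store would typically use an approximate nearest-neighbor index rather than scoring every chunk, so treat the linear scan here as a conceptual stand-in.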

When to Use

  • Setting up a new project: First-time indexing to enable semantic search
  • After major codebase changes: Re-index to capture new code
  • When search results seem outdated: Refresh the index
  • Setting up a new development environment: Initialize Stellarion for your project
  • After deleting the .stellarion/ directory: Rebuild the index, since it can be regenerated from source

Parameters

Parameter           Type     Required  Default         Description
path                string   No        .               Project root to index
forceReindex        boolean  No        false           Re-index even if already indexed
maxFiles            number   No        1000            Maximum files to index
fileTypes           array    No        All code files  Specific extensions to index
buildGraph          boolean  No        true            Build dependency graph
generateEmbeddings  boolean  No        true            Generate semantic embeddings
includeStatistics   boolean  No        false           Include detailed indexing statistics
useCache            boolean  No        false           Use cache to skip unchanged files
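
For readers calling the tool from their own client code, the sketch below shows one way the parameter table could be mirrored in Python: a defaults dictionary, a check for misspelled parameter names, and a JSON-RPC 2.0 tools/call request of the kind MCP clients send. The helper names are hypothetical, and the bare tool name index_project is an assumption about how the server registers the tool; check your client's tool listing before relying on it.

```python
from __future__ import annotations

import json

# Defaults copied from the parameter table above.
DEFAULTS = {
    "path": ".",
    "forceReindex": False,
    "maxFiles": 1000,
    "fileTypes": None,  # None stands in for "all code files"
    "buildGraph": True,
    "generateEmbeddings": True,
    "includeStatistics": False,
    "useCache": False,
}


def resolve_arguments(overrides: dict) -> dict:
    """Merge caller overrides onto the defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def build_tool_call(overrides: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call request that carries only the overrides."""
    resolve_arguments(overrides)  # validates parameter names
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Assumption: the server exposes the tool under this bare name.
        "params": {"name": "index_project", "arguments": overrides},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(resolve_arguments({"maxFiles": 500})["maxFiles"])  # 500
    print(build_tool_call({"forceReindex": True}))
```

Sending only the overrides keeps requests short and lets the server apply its own defaults, which matters if those defaults change between versions.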

MCP Command Syntax

mcp__stellarion__index_project path:. forceReindex:true includeStatistics:true
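
The shorthand above follows a simple pattern: the tool name, then space-separated name:value pairs, with booleans in lowercase and arrays written as compact JSON. The sketch below generates such strings from Python values. The pattern is inferred from the examples on this page rather than from a formal grammar, and render_command and format_value are hypothetical helpers.

```python
import json


def format_value(value) -> str:
    """Format one parameter value the way the examples on this page write it."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Compact JSON, e.g. ["ts","js"], with no spaces after the commas.
        return json.dumps(list(value), separators=(",", ":"))
    return str(value)


def render_command(**params) -> str:
    """Render a shorthand index_project call from keyword arguments."""
    parts = ["mcp__stellarion__index_project"]
    parts += [f"{name}:{format_value(value)}" for name, value in params.items()]
    return " ".join(parts)


if __name__ == "__main__":
    print(render_command(path=".", forceReindex=True, includeStatistics=True))
    # mcp__stellarion__index_project path:. forceReindex:true includeStatistics:true
    print(render_command(fileTypes=["ts", "tsx", "js", "jsx"]))
    # mcp__stellarion__index_project fileTypes:["ts","tsx","js","jsx"]
```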

Examples

Basic Indexing

Natural Language:

Index this project for semantic search

Direct MCP Call:

mcp__stellarion__index_project

Returns: Indexing summary with file count and storage location


Force Re-indexing

Natural Language:

Re-index this project from scratch

Direct MCP Call:

mcp__stellarion__index_project forceReindex:true

Returns: Fresh index, ignoring any existing data


Index Specific File Types

Natural Language:

Index only TypeScript and JavaScript files

Direct MCP Call:

mcp__stellarion__index_project fileTypes:["ts","tsx","js","jsx"]

Returns: Index containing only the specified file types


Get Detailed Statistics

Natural Language:

Index this project and show me detailed statistics

Direct MCP Call:

mcp__stellarion__index_project includeStatistics:true

Returns: Comprehensive stats including files processed, chunks created, time taken


Index Only the Dependency Graph

Natural Language:

Just rebuild the dependency graph, don't regenerate embeddings

Direct MCP Call:

mcp__stellarion__index_project buildGraph:true generateEmbeddings:false

Returns: Updated dependency graph for structure analysis


Index Only Embeddings

Natural Language:

Regenerate semantic search embeddings only

Direct MCP Call:

mcp__stellarion__index_project generateEmbeddings:true buildGraph:false

Returns: Updated vector embeddings for semantic search


Limit Large Codebases

Natural Language:

Index this large project but limit to 500 files

Direct MCP Call:

mcp__stellarion__index_project maxFiles:500

Returns: Index of the most important 500 files

Output Format

Results include:

Field                 Description
Files indexed         Number of files processed
Chunks created        Number of code chunks generated
Embeddings generated  Number of vector embeddings created
Graph nodes           Number of files in dependency graph
Graph edges           Number of dependency relationships
Time taken            Duration of indexing process
Storage location      Path to .stellarion directory

What Gets Indexed

Supported Languages

Language    Extensions
TypeScript  .ts, .tsx
JavaScript  .js, .jsx, .mjs, .cjs
Python      .py
Rust        .rs
Go          .go
Java        .java
C/C++       .c, .cpp, .h, .hpp
Ruby        .rb
Ruby.rb

Files Automatically Excluded

  • node_modules/ and similar dependency directories
  • dist/, build/, target/ (build artifacts)
  • .git/ and other VCS directories
  • .stellarion/ (Stellarion's own data)
  • Files matching .gitignore patterns
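
To preview which files are likely to be picked up, the supported extensions and the exclusion list above can be combined into a small filter. This is a rough approximation under stated assumptions: it does not read .gitignore, it only knows the directory names listed on this page (not the "similar" ones), and its max_files cut simply keeps the first matches, which may differ from how Stellarion chooses files. All names in the code are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Extensions from the Supported Languages table.
SUPPORTED_EXTENSIONS = {
    ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs",
    ".py", ".rs", ".go", ".java", ".c", ".cpp", ".h", ".hpp", ".rb",
}

# Directory names from the exclusion list (.gitignore rules are not modeled).
EXCLUDED_DIRS = {"node_modules", "dist", "build", "target", ".git", ".stellarion"}


def is_indexable(relative_path: str) -> bool:
    """True if the path has a supported extension and no excluded parent directory."""
    path = PurePosixPath(relative_path)
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix in SUPPORTED_EXTENSIONS


def select_files(relative_paths: list[str], max_files: int = 1000) -> list[str]:
    """Keep indexable paths in input order, truncated to max_files entries."""
    selected = [p for p in relative_paths if is_indexable(p)]
    return selected[:max_files]


if __name__ == "__main__":
    paths = ["src/app.ts", "node_modules/react/index.js", "README.md", "lib/util.py"]
    print(select_files(paths))  # ['src/app.ts', 'lib/util.py']
```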

What's Captured

  • Functions and methods with their implementations
  • Classes and interfaces
  • Type definitions
  • Documentation comments (JSDoc, docstrings)
  • Import/export relationships
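
As an illustration of the last item, import/export relationships can be turned into graph edges by scanning each file for module specifiers. The sketch below uses regular expressions on single-line ES module statements as a stand-in; Stellarion's actual parser and its KuzuDB schema are not described here, so every name in the code is hypothetical, and specifiers are left unresolved (for example "./utils" rather than a file path).

```python
from __future__ import annotations

import re

# Matches: import x from "y", import { a } from 'y', export * from "y"
# Single-line statements only; multi-line import lists are not handled.
FROM_RE = re.compile(
    r"""^\s*(?:import|export)\s[^;'"\n]*?from\s*['"]([^'"]+)['"]""",
    re.MULTILINE,
)

# Matches side-effect imports: import "y"
BARE_RE = re.compile(r"""^\s*import\s*['"]([^'"]+)['"]""", re.MULTILINE)


def extract_imports(source: str) -> list[str]:
    """Return the sorted, de-duplicated module specifiers referenced by a file."""
    found = FROM_RE.findall(source) + BARE_RE.findall(source)
    return sorted(set(found))


def build_graph(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the specifiers it imports or re-exports."""
    return {path: extract_imports(source) for path, source in files.items()}


if __name__ == "__main__":
    files = {
        "src/app.ts": 'import { helper } from "./utils";\nimport "./polyfill";\nexport const x = 1;\n',
        "src/utils.ts": "export * from './types';\n",
    }
    print(build_graph(files))
    # {'src/app.ts': ['./polyfill', './utils'], 'src/utils.ts': ['./types']}
```

Each dictionary key corresponds to a graph node and each listed specifier to an outgoing edge, which is the same shape the Graph nodes and Graph edges statistics count.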

Storage Location

Index data is stored in your project:

.stellarion/
├── vector_store/     # RoBERTa embeddings (RocksDB)
└── graph_kuzu/       # Dependency graph (KuzuDB)

Add this directory to .gitignore: it is machine-specific and can be regenerated at any time.
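
If you script your project setup, the .gitignore entry can be added automatically. The helper below is a minimal sketch, not part of Stellarion: ensure_ignored is a hypothetical name, and it only checks for an exact ".stellarion/" or ".stellarion" line, so broader patterns that already cover the directory are not detected.

```python
from pathlib import Path


def ensure_ignored(project_root: str, entry: str = ".stellarion/") -> bool:
    """Append entry to .gitignore if absent. Returns True if the file was changed."""
    gitignore = Path(project_root) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    lines = [line.strip() for line in existing.splitlines()]
    if entry in lines or entry.rstrip("/") in lines:
        return False  # already listed, with or without the trailing slash
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with gitignore.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}{entry}\n")
    return True
```

Calling ensure_ignored(".") from the project root is safe to repeat: the second call finds the entry and leaves the file untouched.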

When to Re-index

Re-index your project when:

Situation                               Command
Adding significant new code             forceReindex:true
After major refactoring                 forceReindex:true
Search results seem stale               forceReindex:true
Switching branches with different code  forceReindex:true (if major differences)
Fixing corrupted index                  forceReindex:true

Performance Tips

  1. Initial indexing takes time: Large projects (1000+ files) may take several minutes
  2. Use maxFiles for very large codebases: Focus on the most important directories
  3. Use useCache:true for incremental updates: Skip files that haven't changed since the last run
  4. Index only what you need: Use fileTypes to exclude irrelevant languages
  5. Re-index after major changes: Keep the index fresh for accurate results
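
The idea behind skipping unchanged files (tip 3) can be sketched with content hashes. How Stellarion actually decides that a file is unchanged is not stated on this page; comparing SHA-256 digests is one common approach (modification times are another), and the function names below are hypothetical.

```python
from __future__ import annotations

import hashlib


def fingerprint(content: bytes) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def files_to_reindex(current: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content hash differs from the cache."""
    return sorted(
        path for path, content in current.items()
        if cache.get(path) != fingerprint(content)
    )


def update_cache(current: dict[str, bytes]) -> dict[str, str]:
    """Build a fresh path-to-hash cache from the current file contents."""
    return {path: fingerprint(content) for path, content in current.items()}


if __name__ == "__main__":
    files = {"a.py": b"print(1)\n", "b.py": b"print(2)\n"}
    cache = update_cache(files)
    files["b.py"] = b"print(3)\n"  # edit one file
    print(files_to_reindex(files, cache))  # ['b.py']
```

Only the changed file would need new chunks and embeddings, which is why an incremental run can be much faster than a full forceReindex:true pass.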

Troubleshooting

Index seems out of date

mcp__stellarion__index_project forceReindex:true

Semantic search not working

mcp__stellarion__index_project generateEmbeddings:true

Structure analysis not accurate

mcp__stellarion__index_project buildGraph:true forceReindex:true

Index taking too long

mcp__stellarion__index_project maxFiles:500 fileTypes:["ts","js"]