
stellarion_find_duplicates [Pro]

Detect duplicate and near-duplicate functions using semantic embeddings
This is a Pro tool and requires a Stellarion Pro license. A 180-day free trial starts automatically.

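Detects duplicate and near-duplicate functions across your codebase using semantic embeddings, so matches are based on what functions do rather than on identical text. This makes it work across languages: it can find a Python function that does the same thing as a JavaScript function. The threshold parameter controls how strict the matching is: higher values return only closer matches.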
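
Stellarion's embedding model and similarity metric are not documented on this page. As a rough mental model, embedding-based duplicate detection usually turns each function into a vector and compares every pair against a threshold. The sketch below assumes cosine similarity (clamped to the 0.0 to 1.0 range this tool reports) over made-up toy vectors; the function names are illustrative and not part of Stellarion.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors, clamped to [0.0, 1.0] (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    # Raw cosine ranges from -1 to 1; negative values are clamped to 0
    return max(0.0, dot / (norm_a * norm_b))


def find_duplicate_pairs(
    embeddings: dict[str, list[float]], threshold: float = 0.7
) -> list[tuple[str, str, float]]:
    """Return every pair whose similarity meets the threshold, highest score first."""
    pairs = []
    # Compare each unordered pair of functions exactly once (O(n^2) comparisons)
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Made-up 3-dimensional "embeddings"; real models typically use far more dimensions
embeddings = {
    "utils.js:formatDate": [0.9, 0.1, 0.0],
    "dates.py:format_date": [0.88, 0.12, 0.01],
    "auth.py:check_token": [0.0, 0.2, 0.95],
}
for name_a, name_b, score in find_duplicate_pairs(embeddings):
    print(f"{name_a} <-> {name_b}: {score:.3f}")
```

Only the two date-formatting functions clear the 0.7 default here, even though one is JavaScript and one is Python, because their vectors point in nearly the same direction.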

When to Use

  • During cleanup to consolidate duplicated logic into shared utilities
  • After a merge to find accidentally duplicated code from different branches
  • When onboarding to a large codebase to understand where similar logic lives
  • To enforce DRY principles across a multi-language monorepo

Parameters

Parameter   Type     Required   Default     Description
threshold   number   No         0.7         Similarity threshold (0.0 to 1.0). Lower values find looser matches.
limit       number   No         20          Maximum number of duplicate pairs to return
uri         string   No         workspace   Scope the search to a specific file or directory
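
If you drive the tool from a script, it can help to mirror the parameter table in a small helper that fills in defaults and rejects out-of-range values before a call is made. In this sketch the parameter names and defaults come from the table; the FindDuplicatesArgs class, the rule that limit must be at least 1, and representing the workspace default as an omitted uri are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FindDuplicatesArgs:
    """Hypothetical client-side mirror of the stellarion_find_duplicates parameters."""

    threshold: float = 0.7  # documented default
    limit: int = 20  # documented default
    uri: Optional[str] = None  # None = search the whole workspace (assumption)

    def __post_init__(self) -> None:
        # The table documents threshold as ranging from 0.0 to 1.0
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be between 0.0 and 1.0, got {self.threshold}")
        # Assumption: a limit below 1 could never return any pairs
        if self.limit < 1:
            raise ValueError(f"limit must be at least 1, got {self.limit}")

    def to_arguments(self) -> dict:
        """Build the argument object, omitting uri when the whole workspace is meant."""
        args: dict = {"threshold": self.threshold, "limit": self.limit}
        if self.uri is not None:
            args["uri"] = self.uri
        return args


print(FindDuplicatesArgs().to_arguments())
print(FindDuplicatesArgs(threshold=0.9, uri="src/handlers/").to_arguments())
```

The second call combines the near-exact threshold with the directory scope used in the examples.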

Examples

Find all duplicates in the project

Are there any duplicate functions in this codebase?

Scans the workspace with the default 0.7 threshold and returns up to 20 duplicate pairs.

Find near-exact copies

Find near-exact duplicate functions — threshold 0.9 or higher.

Uses threshold: 0.9 to find functions that are almost identical (e.g., copy-pasted with minor renaming).

Cross-language duplicates

Find functions in the Python backend that duplicate logic from the TypeScript frontend.

The semantic embedding approach naturally works across languages — no special configuration needed.

Scope to a directory

Check src/handlers/ for duplicate functions.

Sets uri to scope the search to the handlers directory.

Output Format

Returns a list of duplicate pairs, each containing:

  • Function A — name, file path, line number, language
  • Function B — name, file path, line number, language
  • Similarity score — 0.0 to 1.0
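
This page does not specify a wire format for results, so the record types below are one hypothetical way to hold each pair in a script: the fields follow the list above, but the class and field names are assumptions, and the sample pairs are invented for illustration. Because each side carries its language, pulling out cross-language matches (as in the backend/frontend example) is a one-line filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRef:
    """One side of a duplicate pair (field names are assumptions)."""

    name: str
    path: str
    line: int
    language: str


@dataclass(frozen=True)
class DuplicatePair:
    a: FunctionRef
    b: FunctionRef
    similarity: float  # 0.0 to 1.0

    @property
    def cross_language(self) -> bool:
        return self.a.language != self.b.language


def cross_language_pairs(pairs: list[DuplicatePair]) -> list[DuplicatePair]:
    """Keep only pairs whose two functions are written in different languages."""
    return [pair for pair in pairs if pair.cross_language]


# Invented sample data for illustration only
pairs = [
    DuplicatePair(
        FunctionRef("formatDate", "web/src/utils/date.ts", 12, "typescript"),
        FunctionRef("format_date", "api/utils/dates.py", 40, "python"),
        0.91,
    ),
    DuplicatePair(
        FunctionRef("parse_config", "api/config.py", 8, "python"),
        FunctionRef("load_config", "api/settings.py", 15, "python"),
        0.87,
    ),
]
for pair in cross_language_pairs(pairs):
    print(f"{pair.a.path}:{pair.a.line} <-> {pair.b.path}:{pair.b.line} ({pair.similarity:.2f})")
```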

Interpreting similarity scores:

Score       Meaning
0.95–1.0    Near-exact duplicate: likely copy-pasted
0.85–0.95   Very similar: same logic with minor differences
0.7–0.85    Renamed clone: same structure, different names
0.5–0.7     Similar purpose: may or may not be true duplicates (returned only when threshold is set below the 0.7 default)
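
When triaging results in a script, the table above can be encoded as an ordered list of lower bounds. The table's ranges share their endpoints (0.95 appears in two rows, for example), so this sketch assumes a score exactly on a boundary belongs to the higher band; interpret_score is a hypothetical helper, not part of Stellarion.

```python
# Lower bound of each band from the table, checked from highest to lowest
SCORE_BANDS = [
    (0.95, "Near-exact duplicate"),
    (0.85, "Very similar"),
    (0.7, "Renamed clone"),
    (0.5, "Similar purpose"),
]


def interpret_score(score: float) -> str:
    """Map a similarity score to its band label; boundary scores go to the higher band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    return "Below documented bands"


for score in (0.97, 0.9, 0.72, 0.6):
    print(score, interpret_score(score))
```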

Tips

  • Start with the default threshold: 0.7 for a broad scan, then increase to 0.85 if there are too many results
  • Cross-language detection is a key strength: a formatDate() in JavaScript can match a format_date() in Python if they do the same thing
  • Not all duplicates should be consolidated — sometimes duplication is intentional (e.g., different error handling per context)
  • Use stellarion_compare_symbols to do a deep side-by-side comparison of a specific duplicate pair
  • Large codebases may have many results at low thresholds — use limit and uri to keep output manageable
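
The first tip (start broad, then tighten) can be scripted as an escalation loop. In this sketch, run_scan is a hypothetical stand-in for however you invoke the tool at a given threshold, and the escalation steps and max_results target are assumptions to adjust for your codebase. Because the tool caps output at limit, set max_results below limit so that a full page of results counts as "too many".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tighten_until_manageable(
    run_scan: Callable[[float], Sequence[T]],
    max_results: int,
    thresholds: Sequence[float] = (0.7, 0.85, 0.95),
) -> tuple[float, Sequence[T]]:
    """Scan at increasing thresholds and stop at the first with few enough results.

    If even the strictest threshold returns too many, its results are returned anyway.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    for threshold in thresholds:
        results = run_scan(threshold)
        if len(results) <= max_results:
            break
    return threshold, results


# Fake scanner over made-up scores, standing in for real tool calls
scores = [0.99, 0.9, 0.88, 0.8, 0.75, 0.72, 0.71]


def fake_scan(threshold: float) -> list[float]:
    return [s for s in scores if s >= threshold]


chosen, results = tighten_until_manageable(fake_scan, max_results=3)
print(chosen, results)
```

With these made-up scores, 0.7 returns seven pairs, so the loop moves on to 0.85, which returns three and stops there.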