cortex/docs/milestones/04-codebase-indexing.md
omigamedev d484f61b29 Add development plan with 13 milestone specifications
- docs/plan.md: Master roadmap with phases and priorities
- docs/milestones/01-13: Detailed specs for each feature
- Updated CLAUDE.md with plan references and build commands

Milestones cover:
- Phase 1: Temporal versioning, auto-capture, context injection, codebase indexing
- Phase 2: Daily journal, content ingestion, graph visualization, import/export
- Phase 3: Multi-graph, smart retrieval, TUI dashboard, browser extension, shell completions
2026-02-03 09:36:08 +01:00


Milestone 4: Codebase Indexing

Overview

Automatically scan and index the project structure, creating component nodes for modules, services, and architectural patterns, so that Claude starts with a working map of the codebase instead of needing it explained.

Motivation

  • New projects require extensive explanation to Claude
  • Architecture decisions are scattered across files
  • Component relationships aren't captured anywhere
  • Supermemory's /index command is highly valued

Features

4.1 Project Scanner

# Index current project
cortex index .

# Index specific directory
cortex index ./src

# Re-index (update existing)
cortex index . --update

# Index with specific depth
cortex index . --depth 3

4.2 Auto-Detection

Detect project type and extract relevant info:

| Project Type | Detection | Extracts |
|---|---|---|
| Node.js | package.json | Dependencies, scripts, name |
| Python | pyproject.toml, setup.py | Dependencies, entry points |
| Rust | Cargo.toml | Crates, features |
| Go | go.mod | Modules, dependencies |
| Generic | README.md | Description, setup |
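The table above could be driven by a simple manifest lookup. The following is a minimal sketch under stated assumptions, not the planned implementation: `ProjectType`, `MANIFESTS`, and `detectFromFileNames` are hypothetical names, and the function takes the file names already listed from the project root so that it stays free of filesystem calls.

```typescript
// Hypothetical sketch: map manifest files found in the project root to a project type.
type ProjectType = 'node' | 'python' | 'rust' | 'go' | 'generic' | 'unknown';

// Ordered by priority: the first manifest found wins, README.md is the fallback.
const MANIFESTS: ReadonlyArray<readonly [string, ProjectType]> = [
  ['package.json', 'node'],
  ['pyproject.toml', 'python'],
  ['setup.py', 'python'],
  ['Cargo.toml', 'rust'],
  ['go.mod', 'go'],
  ['README.md', 'generic'],
];

function detectFromFileNames(rootFiles: readonly string[]): ProjectType {
  const present = new Set(rootFiles);
  for (const [fileName, type] of MANIFESTS) {
    if (present.has(fileName)) return type;
  }
  return 'unknown';
}
```

A first-match lookup like this returns a single type, so a monorepo with several manifests would probably need to run detection per package directory rather than once at the root.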

4.3 Component Extraction

Create nodes for discovered components:

interface IndexedComponent {
  kind: 'component';
  title: string;           // e.g., "UserService"
  content: string;         // Description + key exports
  tags: string[];          // ['backend', 'service', 'auth']
  metadata: {
    filePath: string;
    language: string;
    exports: string[];
    imports: string[];
    loc: number;
  };
}
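As an illustration of how a parsed file might be turned into such a node, here is a hedged sketch. `toComponent` and `LANGUAGE_BY_EXTENSION` are hypothetical, and the choices it makes (title from the file name, tags from directory names, `loc` as the count of non-blank lines) are assumptions rather than anything the spec prescribes.

```typescript
interface IndexedComponent {
  kind: 'component';
  title: string;
  content: string;
  tags: string[];
  metadata: {
    filePath: string;
    language: string;
    exports: string[];
    imports: string[];
    loc: number;
  };
}

// Hypothetical extension-to-language table.
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ts: 'typescript',
  js: 'javascript',
  py: 'python',
  rs: 'rust',
  go: 'go',
};

// Hypothetical helper: build a component node from a file path, its source text,
// and the export/import lists produced by a language parser.
function toComponent(
  filePath: string,
  source: string,
  exports: string[],
  imports: string[],
): IndexedComponent {
  const segments = filePath.split('/');
  const fileName = segments[segments.length - 1] ?? filePath;
  const dot = fileName.lastIndexOf('.');
  const title = dot > 0 ? fileName.slice(0, dot) : fileName;
  const extension = dot > 0 ? fileName.slice(dot + 1) : '';
  return {
    kind: 'component',
    title,
    content: exports.length > 0 ? `Exports: ${exports.join(', ')}` : 'No exports detected',
    tags: segments.slice(0, -1), // assumption: directory names double as tags
    metadata: {
      filePath,
      language: LANGUAGE_BY_EXTENSION[extension] ?? 'unknown',
      exports,
      imports,
      loc: source.split('\n').filter((line) => line.trim() !== '').length, // non-blank lines
    },
  };
}
```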

4.4 Relationship Mapping

Auto-create edges based on imports/dependencies:

// File A imports from File B
addEdge(componentA.id, componentB.id, 'depends_on');

// Directory contains files
addEdge(directoryNode.id, fileNode.id, 'contains');

// Module implements interface
// (`interface` is a reserved word in TypeScript modules, so the variable is named `iface`)
addEdge(impl.id, iface.id, 'implements');
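The import-based case could be sketched as follows. This is a simplified illustration: `ComponentRef`, `Edge`, and `buildDependencyEdges` are hypothetical names, and the sketch assumes import specifiers have already been resolved to project-relative file paths, which a real indexer would have to do first.

```typescript
// Hypothetical shapes; the spec only names addEdge(fromId, toId, type).
interface ComponentRef {
  id: string;
  filePath: string;
  imports: string[]; // assumed to be resolved to project-relative paths already
}

interface Edge {
  from: string;
  to: string;
  type: 'depends_on';
}

function buildDependencyEdges(components: readonly ComponentRef[]): Edge[] {
  const idByPath = new Map<string, string>();
  for (const c of components) idByPath.set(c.filePath, c.id);

  const seen = new Set<string>();
  const edges: Edge[] = [];
  for (const c of components) {
    for (const imported of c.imports) {
      const targetId = idByPath.get(imported);
      // Skip external packages (not indexed) and self-imports.
      if (targetId === undefined || targetId === c.id) continue;
      const key = `${c.id}->${targetId}`;
      if (seen.has(key)) continue; // one edge per pair, even with repeated imports
      seen.add(key);
      edges.push({ from: c.id, to: targetId, type: 'depends_on' });
    }
  }
  return edges;
}
```

Returning a plain edge list keeps the mapping step testable on its own; the caller can then pass each entry to `addEdge`.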

4.5 Architecture Summary

Generate high-level architecture node:

const architectureNode = {
  kind: 'component',
  title: `${projectName} Architecture`,
  content: `
## Overview
${projectDescription}

## Tech Stack
- Runtime: ${runtime}
- Framework: ${framework}
- Database: ${database}

## Key Components
${components.map(c => `- **${c.title}**: ${c.summary}`).join('\n')}

## Directory Structure
${directoryTree}
`,
  tags: ['architecture', 'index', projectName],
};

4.6 Incremental Updates

Track indexed files and only re-process changes:

interface IndexState {
  projectPath: string;
  lastIndexed: number;
  fileHashes: Record<string, string>;  // path -> content hash
  nodeIds: Record<string, string>;     // path -> node ID
}
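One way to use this state is to compare stored hashes against freshly computed ones. The sketch below is an assumption about how that comparison might look: `hashContent` and `diffAgainstState` are hypothetical helpers, and SHA-256 is chosen only for illustration since the spec does not name a hash function.

```typescript
import { createHash } from 'node:crypto';

// Mirrors the IndexState interface above.
interface IndexState {
  projectPath: string;
  lastIndexed: number;
  fileHashes: Record<string, string>;
  nodeIds: Record<string, string>;
}

// Hypothetical helper: hex-encoded SHA-256 of the file content.
function hashContent(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Hypothetical helper: classify files as changed (new or modified) or removed
// relative to the previous index state.
function diffAgainstState(
  state: IndexState,
  current: Record<string, string>, // path -> current file content
): { changed: string[]; removed: string[] } {
  const changed = Object.entries(current)
    .filter(([path, content]) => state.fileHashes[path] !== hashContent(content))
    .map(([path]) => path);
  const removed = Object.keys(state.fileHashes).filter((path) => !(path in current));
  return { changed, removed };
}
```

Files in `removed` are the ones whose nodes (looked up through `nodeIds`) would presumably need to be deleted or archived during an update.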

Implementation

Scanner Architecture

// src/core/indexer/index.ts
export async function indexProject(root: string, options: IndexOptions): Promise<IndexResult> {
  // Detect project type
  const projectType = await detectProjectType(root);

  // Load existing index state
  const state = await loadIndexState(root);

  // Scan files
  const files = await scanFiles(root, {
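    // Fall back to an empty list so the spread does not throw when no extra patterns are given
    ignore: [...DEFAULT_IGNORE, ...(options.ignore ?? [])],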
    maxDepth: options.depth,
  });

  // Process each file
  const components: IndexedComponent[] = [];
  for (const file of files) {
    if (shouldSkip(file, state)) continue;

    const component = await extractComponent(file, projectType);
    if (component) {
      components.push(component);
    }
  }

  // Create/update nodes
  const nodes = await upsertComponents(components, state);

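  // Map relationships (assumes mapRelationships returns the edges it created)
  const edges = await mapRelationships(nodes, files);

  // Generate architecture summary
  await generateArchitectureSummary(root, projectType, nodes);

  // Save state (assumes upsertComponents has updated state.fileHashes and state.nodeIds)
  state.lastIndexed = Date.now();
  await saveIndexState(root, state);

  return { indexed: nodes.length, relationships: edges.length };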
}

Language Parsers

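// src/core/indexer/parsers/typescript.ts
import { readFile } from 'node:fs/promises';
import * as ts from 'typescript';

export async function parseTypeScript(file: string): Promise<ParsedFile> {
  // Read the source text; createSourceFile parses a string and does not read from disk
  const content = await readFile(file, 'utf8');

  // Use TypeScript compiler API or tree-sitter
  const ast = ts.createSourceFile(file, content, ts.ScriptTarget.Latest);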

  return {
    exports: extractExports(ast),
    imports: extractImports(ast),
    classes: extractClasses(ast),
    functions: extractFunctions(ast),
    interfaces: extractInterfaces(ast),
  };
}

// Parsers for: JavaScript, Python, Rust, Go, etc.

Ignore Patterns

const DEFAULT_IGNORE = [
  'node_modules',
  '.git',
  'dist',
  'build',
  '__pycache__',
  '.env*',
  '*.min.js',
  '*.map',
  'coverage',
  '.next',
  'target',  // Rust
  'vendor',  // Go
];
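A possible way to apply these patterns during scanning is sketched below. It is deliberately minimal and rests on assumptions: `patternToRegExp` and `isIgnored` are hypothetical helpers, only `*` is treated as a wildcard, paths are assumed to use forward slashes, and a pattern applies when it matches any single path segment. Full `.gitignore` semantics (negation, anchoring, `**`) are not covered.

```typescript
// Hypothetical helper: turn a pattern such as '*.min.js' into a regular expression
// that must match a whole path segment.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('[^/]*'); // '*' matches any run of characters within one segment
  return new RegExp(`^${escaped}$`);
}

// Hypothetical helper: a path is ignored if any of its segments matches any pattern.
function isIgnored(relativePath: string, patterns: readonly string[]): boolean {
  const regexps = patterns.map(patternToRegExp);
  return relativePath
    .split('/')
    .some((segment) => regexps.some((re) => re.test(segment)));
}
```

Matching per segment means `node_modules` is skipped at any depth, while names that merely start with a pattern (for example `builder.ts` against `build`) are still indexed.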

CLI Commands

| Command | Description |
|---|---|
| cortex index [path] | Index project at path |
| cortex index --update | Update existing index |
| cortex index --dry-run | Preview what would be indexed |
| cortex index --depth <n> | Limit directory depth |
| cortex index --lang <lang> | Only index specific language |

MCP Tools

memory_index      // Index current project
memory_reindex    // Force re-index
memory_components // List indexed components
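For illustration, the three tools could be described with plain descriptors in the shape MCP uses for tool listings (`name`, `description`, `inputSchema`). This is a sketch, not a registration against any SDK: `ToolDescriptor` and `indexingTools` are hypothetical names, and the input parameters are assumptions modeled on the CLI flags, since the spec does not define them.

```typescript
// Hypothetical descriptor type; inputSchema is a JSON Schema object.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: 'object';
    properties: Record<string, { type: string; description: string }>;
    required?: string[];
  };
}

const indexingTools: ToolDescriptor[] = [
  {
    name: 'memory_index',
    description: 'Index current project',
    inputSchema: {
      type: 'object',
      properties: {
        // Both parameters are assumptions mirroring `cortex index [path] --depth <n>`.
        path: { type: 'string', description: 'Project root to index' },
        depth: { type: 'number', description: 'Maximum directory depth' },
      },
    },
  },
  {
    name: 'memory_reindex',
    description: 'Force re-index',
    inputSchema: {
      type: 'object',
      properties: {
        path: { type: 'string', description: 'Project root to re-index (assumed parameter)' },
      },
    },
  },
  {
    name: 'memory_components',
    description: 'List indexed components',
    inputSchema: { type: 'object', properties: {} },
  },
];
```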

Testing

  • Detects Node.js, Python, Rust, Go projects
  • Creates component nodes for modules
  • Maps import relationships correctly
  • Respects .gitignore patterns
  • Incremental update only processes changes
  • Architecture summary is accurate
  • Performance: under 30 s to index a 10,000-file project

Acceptance Criteria

  • cortex index . creates meaningful component nodes
  • Relationships reflect actual code dependencies
  • Architecture summary provides useful overview
  • Incremental updates are fast
  • Works with monorepos
  • MCP tool enables Claude to trigger indexing

Estimated Effort

  • Project detection: 2 hours
  • File scanner: 3 hours
  • TypeScript parser: 4 hours
  • Python parser: 3 hours
  • Relationship mapping: 4 hours
  • Architecture summary: 3 hours
  • Incremental updates: 3 hours
  • Testing: 3 hours
  • Total: ~25 hours

Dependencies

  • None. This milestone enhances Milestone 3 but does not depend on it.

References