Files

omigamedev d484f61b29 Add development plan with 13 milestone specifications

- docs/plan.md: Master roadmap with phases and priorities
- docs/milestones/01-13: Detailed specs for each feature
- Updated CLAUDE.md with plan references and build commands

Milestones cover:
- Phase 1: Temporal versioning, auto-capture, context injection, codebase indexing
- Phase 2: Daily journal, content ingestion, graph visualization, import/export
- Phase 3: Multi-graph, smart retrieval, TUI dashboard, browser extension, shell completions

2026-02-03 09:36:08 +01:00

6.0 KiB

Raw Blame History

Milestone 4: Codebase Indexing

Overview

Automatically scan and index project structure, creating component nodes for modules, services, and architectural patterns. Claude understands your codebase from day one.

Motivation

New projects require extensive explanation to Claude
Architecture decisions are scattered across files
Component relationships aren't captured anywhere
Supermemory's /index command is highly valued

Features

4.1 Project Scanner

# Index current project
cortex index .

# Index specific directory
cortex index ./src

# Re-index (update existing)
cortex index . --update

# Index with specific depth
cortex index . --depth 3

4.2 Auto-Detection

Detect project type and extract relevant info:

Project Type	Detection	Extracts
Node.js	`package.json`	Dependencies, scripts, name
Python	`pyproject.toml`, `setup.py`	Dependencies, entry points
Rust	`Cargo.toml`	Crates, features
Go	`go.mod`	Modules, dependencies
Generic	`README.md`	Description, setup

4.3 Component Extraction

Create nodes for discovered components:

interface IndexedComponent {
  kind: 'component';
  title: string;           // e.g., "UserService"
  content: string;         // Description + key exports
  tags: string[];          // ['backend', 'service', 'auth']
  metadata: {
    filePath: string;
    language: string;
    exports: string[];
    imports: string[];
    loc: number;
  };
}

4.4 Relationship Mapping

Auto-create edges based on imports/dependencies:

// File A imports from File B
addEdge(componentA.id, componentB.id, 'depends_on');

// Directory contains files
addEdge(directoryNode.id, fileNode.id, 'contains');

// Module implements interface
addEdge(impl.id, interface.id, 'implements');

4.5 Architecture Summary

Generate high-level architecture node:

const architectureNode = {
  kind: 'component',
  title: `${projectName} Architecture`,
  content: `
## Overview
${projectDescription}

## Tech Stack
- Runtime: ${runtime}
- Framework: ${framework}
- Database: ${database}

## Key Components
${components.map(c => `- **${c.title}**: ${c.summary}`).join('\n')}

## Directory Structure
${directoryTree}
`,
  tags: ['architecture', 'index', projectName],
};

4.6 Incremental Updates

Track indexed files and only re-process changes:

interface IndexState {
  projectPath: string;
  lastIndexed: number;
  fileHashes: Record<string, string>;  // path -> content hash
  nodeIds: Record<string, string>;     // path -> node ID
}

Implementation

Scanner Architecture

// src/core/indexer/index.ts
export async function indexProject(root: string, options: IndexOptions): Promise<IndexResult> {
  // Detect project type
  const projectType = await detectProjectType(root);

  // Load existing index state
  const state = await loadIndexState(root);

  // Scan files
  const files = await scanFiles(root, {
    ignore: [...DEFAULT_IGNORE, ...options.ignore],
    maxDepth: options.depth,
  });

  // Process each file
  const components: IndexedComponent[] = [];
  for (const file of files) {
    if (shouldSkip(file, state)) continue;

    const component = await extractComponent(file, projectType);
    if (component) {
      components.push(component);
    }
  }

  // Create/update nodes
  const nodes = await upsertComponents(components, state);

  // Map relationships
  await mapRelationships(nodes, files);

  // Generate architecture summary
  await generateArchitectureSummary(root, projectType, nodes);

  // Save state
  await saveIndexState(root, state);

  return { indexed: nodes.length, relationships: edges.length };
}

Language Parsers

// src/core/indexer/parsers/typescript.ts
export async function parseTypeScript(file: string): Promise<ParsedFile> {
  // Use TypeScript compiler API or tree-sitter
  const ast = ts.createSourceFile(file, content, ts.ScriptTarget.Latest);

  return {
    exports: extractExports(ast),
    imports: extractImports(ast),
    classes: extractClasses(ast),
    functions: extractFunctions(ast),
    interfaces: extractInterfaces(ast),
  };
}

// Parsers for: JavaScript, Python, Rust, Go, etc.

Ignore Patterns

const DEFAULT_IGNORE = [
  'node_modules',
  '.git',
  'dist',
  'build',
  '__pycache__',
  '.env*',
  '*.min.js',
  '*.map',
  'coverage',
  '.next',
  'target',  // Rust
  'vendor',  // Go
];

CLI Commands

Command	Description
`cortex index [path]`	Index project at path
`cortex index --update`	Update existing index
`cortex index --dry-run`	Preview what would be indexed
`cortex index --depth <n>`	Limit directory depth
`cortex index --lang <lang>`	Only index specific language

MCP Tools

memory_index      // Index current project
memory_reindex    // Force re-index
memory_components // List indexed components

Testing

Detects Node.js, Python, Rust, Go projects
Creates component nodes for modules
Maps import relationships correctly
Respects .gitignore patterns
Incremental update only processes changes
Architecture summary is accurate
Performance: <30s for 10k file project

Acceptance Criteria

cortex index . creates meaningful component nodes
Relationships reflect actual code dependencies
Architecture summary provides useful overview
Incremental updates are fast
Works with monorepos
MCP tool enables Claude to trigger indexing

Estimated Effort

Project detection: 2 hours
File scanner: 3 hours
TypeScript parser: 4 hours
Python parser: 3 hours
Relationship mapping: 4 hours
Architecture summary: 3 hours
Incremental updates: 3 hours
Testing: 3 hours
Total: ~25 hours

Dependencies

None (enhances Milestone 3 but independent)

References

tree-sitter for parsing
Sourcebot architecture

6.0 KiB Raw Blame History