moxa/PHASE_2_1_PLAN.md
mleku e34c490753
complete phase 2, mutable strings
concatenation operator change deferred for now, or maybe ever
2025-11-22 12:44:15 +00:00

Phase 2.1 Implementation Plan: Mutable Strings as *[]byte

Objective

Transform strings from Go's immutable string type to mutable *[]byte while maintaining string literal syntax sugar.

Current String Implementation Analysis

1. Type System (interp/type.go)

References Found: 10

| Location | Purpose | Change Required |
|---|---|---|
| Line 53 | stringT type category definition | Keep category, change semantics |
| Line 101 | cats array: stringT: "stringT" | Keep for debugging |
| Line 161 | untypedString() returns stringT | Keep, but change type construction |
| Line 1323 | zeroValues[stringT] = reflect.ValueOf("") | Change to *[]byte |
| Line 2383 | Type conversion case | Update conversion logic |
| Line 2615 | isString() helper | May need update |

2. Type Resolution (interp/cfg.go)

References Found: 4

| Location | Purpose | Change Required |
|---|---|---|
| Line 200 | Range over string | Update to range over *[]byte |
| Line 1023-1024 | String indexing returns byte | Already correct! |
| Line 1026 | reflect.String indexing | Update to handle *[]byte |
| Line 1093 | Slice/String switch case | Update |

3. Runtime Operations (interp/run.go)

References Found: 2

| Location | Purpose | Change Required |
|---|---|---|
| Line 2878-2889 | String type operations | Update to use *[]byte |
| Line 3801 | String literal creation | KEY: Convert to *[]byte |

4. Type Checking (interp/typecheck.go)

References Found: 6

| Location | Purpose | Change Required |
|---|---|---|
| Line 528 | reflect.String case | Update to recognize *[]byte |
| Line 783 | String to []byte for append | May simplify |
| Line 808 | len() on strings | Already works for slices |
| Line 873 | String operations | Update |
| Line 1183-1184 | String literal constant | Convert to *[]byte |

5. Universe Scope (interp/interp.go)

References Found: 1

| Location | Purpose | Change Required |
|---|---|---|
| Line 443 | "string" type definition | Change to *[]byte representation |

6. Operations (interp/op.go)

References Found: 3

| Location | Purpose | Change Required |
|---|---|---|
| Line 21 | String operations | Update |
| Line 1503 | String comparison | Update |
| Line 1561 | String operations | Update |

Dependency Analysis

Layer 1: Foundation (No dependencies)

  1. Type Definition - Change stringT to represent *[]uint8
  2. Zero Values - Update string zero value to &[]byte{} (a pointer to an empty byte slice)

Layer 2: Type System (Depends on Layer 1)

  1. Type Construction - untypedString() creates *[]uint8 type
  2. Universe Scope - Update "string" type mapping

Layer 3: Literal Creation (Depends on Layer 2)

  1. String Literals - Convert "hello" to &[]byte{'h','e','l','l','o'}
  2. Runtime Constants - Update constant string handling

Layer 4: Operations (Depends on Layer 3)

  1. Indexing - s[i] returns/sets byte
  2. Slicing - s[1:3] returns *[]byte
  3. Range - for i, v := range s works on *[]byte
  4. Comparison - s1 == s2 compares byte slices
  5. Built-ins - len(s) already works via auto-deref
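The Layer 4 operations can be tried out in plain Go by dereferencing a *[]byte explicitly. This is only a sketch of the intended semantics, not interpreter code: lit and slice are hypothetical helpers, and the choice that a slice shares its backing array with the original is an assumption the plan does not state.

```go
package main

import "fmt"

// lit mimics a string literal under the plan: each call allocates a fresh,
// mutable buffer and returns a pointer to it. Hypothetical helper.
func lit(s string) *[]byte {
	b := []byte(s)
	return &b
}

// slice mimics s[lo:hi]. It returns a new *[]byte that shares the backing
// array with s; whether Moxie shares or copies here is an assumption.
func slice(s *[]byte, lo, hi int) *[]byte {
	sub := (*s)[lo:hi]
	return &sub
}

func main() {
	s := lit("hello")

	(*s)[0] = 'H'                        // indexing: set a byte in place
	fmt.Println((*s)[1])                 // indexing: read a byte, prints 101 ('e')
	fmt.Println(len(*s))                 // prints 5
	fmt.Println(string(*slice(s, 1, 3))) // prints el

	// Range yields (index, byte) pairs rather than (index, rune).
	for i, b := range *s {
		fmt.Printf("%d:%c ", i, b)
	}
	fmt.Println()
}
```

In Moxie the explicit (*s) is expected to disappear wherever auto-deref applies, as the len(s) item above suggests.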

Layer 5: Conversions (Depends on Layer 4)

  1. String ↔ []byte - Becomes a no-op (same type)
  2. Type Assertions - Update reflection handling

Implementation Plan (Dependency-Ordered)

Step 1: Update Type System Foundation

Files: interp/type.go

// Change stringT to internally be *[]uint8
const (
    ...
    stringT  // Now represents *[]uint8, not Go string
    ...
)

// Update zero value
zeroValues[stringT] = reflect.ValueOf(&[]byte{})

// Update untypedString to create *[]uint8 type
func untypedString(n *node) *itype {
    // Create a *[]uint8 type with stringT category for special handling
    return &itype{
        cat: stringT,
        val: &itype{cat: uint8T},  // Element type
        untyped: true,
        str: "untyped string",
        node: n,
    }
}
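As a quick sanity check of the proposed zero value, the following standalone sketch inspects what reflect reports for a pointer to an empty byte slice. zeroString is a hypothetical name used only here; the real change is the zeroValues[stringT] assignment above.

```go
package main

import (
	"fmt"
	"reflect"
)

// zeroString builds the zero value proposed in Step 1:
// a non-nil pointer to an empty byte slice. Hypothetical helper.
func zeroString() reflect.Value {
	return reflect.ValueOf(&[]byte{})
}

func main() {
	z := zeroString()
	fmt.Println(z.Kind())                      // prints ptr
	fmt.Println(z.IsNil())                     // prints false
	fmt.Println(z.Elem().Kind())               // prints slice
	fmt.Println(z.Elem().Len())                // prints 0
	fmt.Println(z.Type().Elem().Elem().Kind()) // prints uint8
}
```

One thing that may be worth checking in the interpreter: if a single reflect.Value from zeroValues were handed out for every zero string, all of them would point at the same slice. This sketch does not model how the interpreter copies zero values.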

Step 2: Update Universe Scope

Files: interp/interp.go

// Map "string" to *[]uint8 type
"string": {
    kind: typeSym,
    typ: &itype{
        cat: stringT,  // Special category for string literals
        val: &itype{cat: uint8T},
        name: "string",
        str: "string",
    },
},

Step 3: Update String Literal Creation

Files: interp/run.go, interp/typecheck.go

// When creating string literals from constants:
// OLD: reflect.ValueOf(constant.StringVal(c))
// NEW:
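// stringLiteralToBytes copies the literal into a fresh byte slice and
// returns a reflect.Value holding a pointer to it (*[]byte).
func stringLiteralToBytes(s string) reflect.Value {
    b := []byte(s) // named b to avoid shadowing the bytes package
    return reflect.ValueOf(&b)
}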

// Update all constant.StringVal() calls
v = stringLiteralToBytes(constant.StringVal(c))
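A self-contained version of the helper, runnable outside the interpreter, shows that each call yields an independent, mutable buffer. The mutation through reflect below is only an illustration of what the runtime would need to support, not the interpreter's actual code path.

```go
package main

import (
	"fmt"
	"reflect"
)

// stringLiteralToBytes copies s into a new byte slice and wraps a pointer
// to that slice in a reflect.Value, as proposed in Step 3.
func stringLiteralToBytes(s string) reflect.Value {
	b := []byte(s)
	return reflect.ValueOf(&b)
}

func main() {
	const lit = "hello"
	v1 := stringLiteralToBytes(lit)
	v2 := stringLiteralToBytes(lit)

	// Elements of a slice are settable through reflect, so in-place
	// mutation works on the value behind the pointer.
	v1.Elem().Index(0).SetUint('H')

	fmt.Println(string(v1.Elem().Bytes())) // prints Hello
	fmt.Println(string(v2.Elem().Bytes())) // prints hello
}
```

The second line of output matters for the design: if the interpreter converted a literal once and reused the resulting pointer, a mutation would be visible at every later evaluation of that literal. Whether literals should be converted per evaluation is a question this plan leaves open.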

Step 4: Update Type Checking

Files: interp/typecheck.go

// Update reflect.String checks to handle stringT
case reflect.String:
    // OLD: Direct string handling
    // NEW: Treat as *[]byte (pointer to byte slice)

// Simplify string ↔ []byte conversions
// They're now the same type, so no conversion needed
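Where the type checker currently switches on reflect.String, it would need some way to recognize the new shape. A minimal sketch of such a check follows; isByteSlicePtr and isStringLike are hypothetical names, not functions that exist in the interpreter.

```go
package main

import (
	"fmt"
	"reflect"
)

// isByteSlicePtr reports whether t has the shape *[]uint8.
// Hypothetical helper for illustration only.
func isByteSlicePtr(t reflect.Type) bool {
	return t != nil &&
		t.Kind() == reflect.Ptr &&
		t.Elem().Kind() == reflect.Slice &&
		t.Elem().Elem().Kind() == reflect.Uint8
}

// isStringLike is a convenience wrapper that accepts any Go value.
func isStringLike(v interface{}) bool {
	return isByteSlicePtr(reflect.TypeOf(v))
}

func main() {
	b := []byte("hi")
	n := []int{1}
	fmt.Println(isStringLike(&b))   // prints true
	fmt.Println(isStringLike(b))    // prints false: slice, not pointer
	fmt.Println(isStringLike("hi")) // prints false: Go string
	fmt.Println(isStringLike(&n))   // prints false: wrong element type
}
```

A shape check like this cannot tell a Moxie string apart from any other *[]byte, which is presumably why the plan keeps stringT as a separate category rather than relying on the reflected type alone.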

Step 5: Update Runtime Operations

Files: interp/cfg.go

// Update string indexing (already returns byte - minimal change)
case stringT:
    n.typ = sc.getType("byte")  // Already correct!
    // But need to ensure mutation works: s[i] = 'x'

// Update range over string
case stringT:
    // Now ranges over *[]byte
    sc.add(sc.getType("int64"))  // Index storage
    ktyp = sc.getType("int64")
    vtyp = sc.getType("byte")  // Changed from rune to byte

Step 6: Update String Operations

Files: interp/op.go

// String comparison - compare underlying byte slices
// String concatenation - disabled (we're skipping | operator for now)
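Comparing the underlying byte slices could look roughly like the sketch below, built on the standard bytes package. The helper names are hypothetical, and treating a nil pointer as an empty string is an assumption; the plan lists nil string pointers as an edge case without defining their behavior.

```go
package main

import (
	"bytes"
	"fmt"
)

// deref returns the bytes behind p, treating nil as empty (an assumption).
func deref(p *[]byte) []byte {
	if p == nil {
		return nil
	}
	return *p
}

// equalStrings compares contents rather than pointer identity.
func equalStrings(a, b *[]byte) bool {
	return bytes.Equal(deref(a), deref(b))
}

// lessStrings orders two strings lexicographically by byte value.
func lessStrings(a, b *[]byte) bool {
	return bytes.Compare(deref(a), deref(b)) < 0
}

func main() {
	x, y, z := []byte("abc"), []byte("abc"), []byte("abd")
	fmt.Println(&x == &y)                     // prints false: distinct pointers
	fmt.Println(equalStrings(&x, &y))         // prints true: same contents
	fmt.Println(lessStrings(&x, &z))          // prints true
	fmt.Println(equalStrings(nil, &[]byte{})) // prints true under the nil-as-empty assumption
}
```

The first line of output is the case to watch: a plain pointer comparison would make two equal strings compare unequal, so == on stringT has to be routed to a content comparison.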

Step 7: Handle Reflection Cases

Files: Multiple

// Update all reflect.String cases to recognize stringT as *[]byte
// Ensure reflection operations work correctly

Breaking Changes

For Users

  1. Strings are now mutable:

    // OLD: Error - strings are immutable
    s := "hello"
    s[0] = 'H'  // ❌ Error in Go
    
    // NEW: Works in Moxie
    s := "hello"  // Actually *[]byte
    (*s)[0] = 'H'  // ✅ Works! s is now "Hello"
    
  2. String iteration returns bytes, not runes:

    // OLD: Iteration yields runes
    for i, r := range "hello" {
        // r is rune (int32)
    }
    
    // NEW: Iteration yields bytes
    for i, b := range "hello" {
        // b is byte (uint8)
    }
    
  3. String ↔ []byte conversion is a no-op:

    // OLD: Conversion creates a copy
    s := "hello"
    b := []byte(s)  // Copy
    
    // NEW: They're the same type
    s := "hello"  // *[]byte
    b := s        // Same pointer, no copy!
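The third change has the most potential to surprise existing code, so here is a small Go program contrasting the two behaviors. The Moxie side is emulated with an explicit *[]byte; both function names are hypothetical.

```go
package main

import "fmt"

// copyThenMutate shows Go's current behavior: []byte(s) copies,
// so writing to b leaves s untouched.
func copyThenMutate() (string, string) {
	s := "hello"
	b := []byte(s)
	b[0] = 'H'
	return s, string(b)
}

// aliasThenMutate emulates the planned behavior: s and b are the
// same pointer, so a write through b is visible through s.
func aliasThenMutate() (string, string) {
	buf := []byte("hello")
	s := &buf // stands in for a Moxie string
	b := s    // same pointer, no copy
	(*b)[0] = 'H'
	return string(*s), string(*b)
}

func main() {
	fmt.Println(copyThenMutate())  // prints hello Hello
	fmt.Println(aliasThenMutate()) // prints Hello Hello
}
```

Code that used a []byte conversion as a defensive copy would need an explicit copy under the new model.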
    

Testing Strategy

Phase 1: Type System

  • String literals create *[]byte
  • Type checks recognize stringT as *[]byte
  • Zero value is empty *[]byte

Phase 2: Basic Operations

  • String indexing: s[0] returns byte
  • String mutation: (*s)[0] = 'x' works
  • String slicing: s[1:3] returns *[]byte

Phase 3: Advanced Features

  • Range over string yields bytes
  • len(s) works via auto-deref
  • String comparison works
  • String literals in composite types

Phase 4: Edge Cases

  • Empty strings
  • Unicode handling (bytes vs runes)
  • Nil string pointers
  • String constants

Risks and Mitigation

Risk 1: Reflection System Confusion

Problem: Go's reflect package expects the string type, but we are giving it *[]byte.

Mitigation: Keep stringT as a special category that wraps *[]uint8 but is recognized as "string-like"

Risk 2: Unicode/Rune Handling

Problem: Strings now iterate as bytes, not runes, so each multi-byte UTF-8 character is yielded as several separate bytes instead of one rune.

Mitigation: Document clearly. Users need to use explicit rune conversion for Unicode.
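For the documentation, an example of explicit rune decoding over a byte-based string might look like this. It uses the standard unicode/utf8 package; the runes helper is hypothetical, and how Moxie would expose rune conversion is not decided in this plan.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runes decodes the UTF-8 content behind s into a slice of runes.
// Invalid bytes decode to utf8.RuneError, one byte at a time.
func runes(s *[]byte) []rune {
	var out []rune
	for b := *s; len(b) > 0; {
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	buf := []byte("h\u00e9llo") // "héllo": the é takes two bytes in UTF-8
	s := &buf

	fmt.Println(len(*s))             // prints 6 (bytes)
	fmt.Println(len(runes(s)))       // prints 5 (runes)
	fmt.Println(string(runes(s)[1])) // prints é
}
```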

Risk 3: Performance

Problem: String literals now allocate heap memory (pointer + slice)

Mitigation: Acceptable trade-off for mutability. Can optimize later with string interning.

Risk 4: Existing Code Breaks

Problem: Code expecting immutable strings will break

Mitigation: This is a breaking change, so document it clearly in the migration guide.

Success Criteria

  • All Phase 1 and 1.2 tests still pass
  • String literals create mutable *[]byte
  • String indexing/slicing works
  • String mutation works: (*s)[i] = byte
  • Range over strings works
  • No regression in existing functionality

Estimated Complexity

  • Type System Changes: Medium (5-10 locations)
  • Runtime Changes: Medium (3-5 locations)
  • Operation Updates: Low (string ops mostly disabled for now)
  • Testing: Medium (need comprehensive tests)

Total Estimated Lines Changed: ~100-150 lines across 6 files

Time Estimate: 2-3 hours of focused work

Next Steps After Phase 2.1

Once mutable strings work:

  1. Phase 2.2: Add | concatenation operator (if desired)
  2. Phase 3: Built-in function modifications
  3. Comprehensive Unicode/rune handling documentation