# Phase 2.1 Implementation Plan: Mutable Strings as `*[]byte`

## Objective

Transform strings from Go's immutable `string` type into mutable `*[]byte` values, while keeping string literal syntax as sugar.

## Current String Implementation Analysis

### 1. Type System (interp/type.go)

**References Found: 10**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Line 53 | `stringT` type category definition | Keep category, change semantics |
| Line 101 | `cats` array: `stringT: "stringT"` | Keep for debugging |
| Line 161 | `untypedString()` returns `stringT` | Keep, but change type construction |
| Line 1323 | `zeroValues[stringT] = reflect.ValueOf("")` | Change to `*[]byte` |
| Line 2383 | Type conversion case | Update conversion logic |
| Line 2615 | `isString()` helper | May need update |

### 2. Type Resolution (interp/cfg.go)

**References Found: 4**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Line 200 | Range over string | Update to range over `*[]byte` |
| Lines 1023-1024 | String indexing returns `byte` | Already correct! |
| Line 1026 | `reflect.String` indexing | Update to handle `*[]byte` |
| Line 1093 | Slice/String switch case | Update |

### 3. Runtime Operations (interp/run.go)

**References Found: 2**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Lines 2878-2889 | String type operations | Update to use `*[]byte` |
| Line 3801 | String literal creation | **KEY**: Convert to `*[]byte` |

### 4. Type Checking (interp/typecheck.go)

**References Found: 6**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Line 528 | `reflect.String` case | Update to recognize `*[]byte` |
| Line 783 | String to `[]byte` for `append` | May simplify |
| Line 808 | `len()` on strings | Already works for slices |
| Line 873 | String operations | Update |
| Lines 1183-1184 | String literal constant | Convert to `*[]byte` |

### 5. Universe Scope (interp/interp.go)

**Reference: 1**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Line 443 | `"string"` type definition | Change to `*[]byte` representation |

### 6. Operations (interp/op.go)

**References Found: 3**

| Location | Purpose | Change Required |
|----------|---------|-----------------|
| Line 21 | String operations | Update |
| Line 1503 | String comparison | Update |
| Line 1561 | String operations | Update |

## Dependency Analysis

### Layer 1: Foundation (No dependencies)

1. **Type Definition** - Change `stringT` to represent `*[]uint8`
2. **Zero Values** - Update the string zero value to a pointer to an empty slice, `&[]byte{}`

### Layer 2: Type System (Depends on Layer 1)

3. **Type Construction** - `untypedString()` creates a `*[]uint8` type
4. **Universe Scope** - Update the `"string"` type mapping

### Layer 3: Literal Creation (Depends on Layer 2)

5. **String Literals** - Convert `"hello"` to `&[]byte{'h', 'e', 'l', 'l', 'o'}`
6. **Runtime Constants** - Update constant string handling

### Layer 4: Operations (Depends on Layer 3)

7. **Indexing** - `s[i]` returns/sets a `byte`
8. **Slicing** - `s[1:3]` returns `*[]byte`
9. **Range** - `for i, v := range s` works on `*[]byte`
10. **Comparison** - `s1 == s2` compares byte contents
11. **Built-ins** - `len(s)` already works via auto-deref

### Layer 5: Conversions (Depends on Layer 4)

12. **String ↔ []byte** - Becomes a no-op (same type)
13. **Type Assertions** - Update reflection handling

## Implementation Plan (Dependency-Ordered)

### Step 1: Update Type System Foundation

**Files: interp/type.go**

```go
// Change stringT to internally be *[]uint8
const (
	...
	stringT // Now represents *[]uint8, not a Go string
	...
)

// Update zero value: a pointer to an empty byte slice
zeroValues[stringT] = reflect.ValueOf(&[]byte{})

// Update untypedString to create a *[]uint8 type
func untypedString(n *node) *itype {
	// Create a *[]uint8 type that keeps the stringT category for special handling
	return &itype{
		cat:     stringT,
		val:     &itype{cat: uint8T}, // Element type
		untyped: true,
		str:     "untyped string",
		node:    n,
	}
}
```

### Step 2: Update Universe Scope

**Files: interp/interp.go**

```go
// Map "string" to a *[]uint8 type
"string": {
	kind: typeSym,
	typ: &itype{
		cat:  stringT, // Keep the string category so string-specific handling still applies
		val:  &itype{cat: uint8T},
		name: "string",
		str:  "string",
	},
},
```

### Step 3: Update String Literal Creation

**Files: interp/run.go, interp/typecheck.go**

```go
// When creating string literals from constants:
// OLD: reflect.ValueOf(constant.StringVal(c))
// NEW: copy the literal's bytes into a fresh slice and return a pointer to it
func stringLiteralToBytes(s string) reflect.Value {
	b := []byte(s) // []byte(s) always copies, so the result is independently mutable
	return reflect.ValueOf(&b)
}

// Update all constant.StringVal() calls.
// Caveat: if this runs once at compile time and the value is reused, every
// evaluation of the literal shares one backing array, so a mutation made in
// one evaluation would be visible in the next.
v = stringLiteralToBytes(constant.StringVal(c))
```

### Step 4: Update Type Checking

**Files: interp/typecheck.go**

```go
// Update reflect.String checks to handle stringT
case reflect.String:
	// OLD: Direct string handling
	// NEW: Treat as *[]byte (pointer to byte slice)

// Simplify string ↔ []byte conversions:
// they're now the same type, so no conversion is needed
```

### Step 5: Update Runtime Operations

**Files: interp/cfg.go**

```go
// Update string indexing (already returns byte, so minimal change)
case stringT:
	n.typ = sc.getType("byte") // Already correct!
	// But we need to ensure mutation works: s[i] = 'x'

// Update range over string
case stringT: // Now ranges over *[]byte
	sc.add(sc.getType("int")) // Index storage
	ktyp = sc.getType("int")  // Go's range index type is int
	vtyp = sc.getType("byte") // Changed from rune to byte
```

### Step 6: Update String Operations

**Files: interp/op.go**

```go
// String comparison: compare the underlying byte contents (e.g. bytes.Equal(*a, *b));
// Go's == on two *[]byte values would only compare the pointers.
// String concatenation: disabled (the | operator is deferred for now)
```

### Step 7: Handle Reflection Cases

**Files: Multiple**

```go
// Update all reflect.String cases to recognize stringT as *[]byte
// Ensure reflection operations still work correctly
```

## Breaking Changes

### For Users

1. **Strings are now mutable:**

   ```go
   // OLD: Error - strings are immutable
   s := "hello"
   s[0] = 'H' // ❌ Error in Go

   // NEW: Works in Moxie
   s := "hello"  // Actually *[]byte
   (*s)[0] = 'H' // ✅ Works! s is now "Hello"
   ```

2. **String iteration returns bytes, not runes:**

   ```go
   // OLD: Iteration yields runes
   for _, r := range "hello" {
   	// r is rune (int32)
   }

   // NEW: Iteration yields bytes
   for _, b := range "hello" {
   	// b is byte (uint8)
   }
   ```

3. **String ↔ []byte conversion is a no-op:**

   ```go
   // OLD: Conversion creates a copy
   s := "hello"
   b := []byte(s) // Copy

   // NEW: They're the same type
   s := "hello"  // *[]byte
   b := s        // Same pointer, no copy!
   (*b)[0] = 'J' // ...so this write is also visible through s ("Jello")
   ```

## Testing Strategy

### Phase 1: Type System

- [ ] String literals create `*[]byte`
- [ ] Type checks recognize `stringT` as `*[]byte`
- [ ] Zero value is an empty `*[]byte`

### Phase 2: Basic Operations

- [ ] String indexing: `s[0]` returns `byte`
- [ ] String mutation: `(*s)[0] = 'x'` works
- [ ] String slicing: `s[1:3]` returns `*[]byte`

### Phase 3: Advanced Features

- [ ] Range over string yields bytes
- [ ] `len(s)` works via auto-deref
- [ ] String comparison works
- [ ] String literals in composite types

### Phase 4: Edge Cases

- [ ] Empty strings
- [ ] Unicode handling (bytes vs runes)
- [ ] Nil string pointers
- [ ] String constants

## Risks and Mitigation

### Risk 1: Reflection System Confusion

**Problem:** Go's `reflect` package expects a `string` type, but we're giving it a `*[]byte`.

**Mitigation:** Keep `stringT` as a special category that wraps `*[]uint8` but is still recognized as "string-like".

### Risk 2: Unicode/Rune Handling

**Problem:** Strings now iterate as bytes, not runes, so any multi-byte UTF-8 character (anything outside ASCII) is split across several iterations.

**Mitigation:** Document this clearly. Users need explicit rune conversion for Unicode text.

### Risk 3: Performance

**Problem:** String literals now allocate heap memory (a pointer plus a slice).

**Mitigation:** An acceptable trade-off for mutability. This can be optimized later with string interning, though interning mutable strings would require copy-on-write so that mutating one alias does not change every use of the literal.

### Risk 4: Existing Code Breaks

**Problem:** Code that assumes strings are immutable will break.

**Mitigation:** This is a breaking change; document it clearly in the migration guide.
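Risk 2's mitigation (explicit rune conversion) can be illustrated in plain Go. This is a minimal sketch that models a Moxie string as a Go `*[]byte` and decodes runes with the standard `unicode/utf8` package. The helper name `runesOf` is hypothetical and not part of the plan.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesOf is a hypothetical helper (not part of the plan): it decodes the
// UTF-8 bytes behind a Moxie-style string, modeled here as *[]byte, into runes.
func runesOf(s *[]byte) []rune {
	var out []rune
	for b := *s; len(b) > 0; {
		r, size := utf8.DecodeRune(b) // invalid bytes yield utf8.RuneError with size 1
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	buf := []byte("héllo") // 'é' (U+00E9) takes two bytes in UTF-8
	s := &buf

	// Byte-wise iteration, as Phase 2.1 proposes for range over strings.
	count := 0
	for range *s {
		count++
	}
	fmt.Println(count, len(runesOf(s))) // 6 5
}
```

The two counts differ exactly by the extra bytes in multi-byte characters. That gap is what the migration guide would need to explain.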
## Success Criteria

- ✅ All Phase 1 and 1.2 tests still pass
- ✅ String literals create mutable `*[]byte` values
- ✅ String indexing and slicing work
- ✅ String mutation works: `(*s)[i] = b` for any byte `b`
- ✅ Range over strings works
- ✅ No regression in existing functionality

## Estimated Complexity

- **Type System Changes:** Medium (5-10 locations)
- **Runtime Changes:** Medium (3-5 locations)
- **Operation Updates:** Low (string ops mostly disabled for now)
- **Testing:** Medium (needs comprehensive tests)

**Total Estimated Lines Changed:** ~100-150 lines across 6 files

**Time Estimate:** 2-3 hours of focused work

## Next Steps After Phase 2.1

Once mutable strings work:

1. Phase 2.2: Add `|` concatenation operator (if desired)
2. Phase 3: Built-in function modifications
3. Comprehensive Unicode/rune handling documentation
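The semantics of the Phase 2.2 `|` operator are still open. One plausible choice is for concatenation to always allocate a fresh backing array. The plain-Go sketch below shows that choice, assuming Moxie strings are `*[]byte`. The function name `concat` is hypothetical.

```go
package main

import "fmt"

// concat is a hypothetical model of a Phase 2.2 `a | b` operator on
// Moxie strings (modeled as *[]byte). It copies both operands into a fresh
// backing array, so mutating the result never affects a or b.
func concat(a, b *[]byte) *[]byte {
	out := make([]byte, 0, len(*a)+len(*b))
	out = append(out, *a...)
	out = append(out, *b...)
	return &out
}

func main() {
	x, y := []byte("foo"), []byte("bar")
	z := concat(&x, &y)
	(*z)[0] = 'F'                      // mutate only the result
	fmt.Println(string(*z), string(x)) // Foobar foo
}
```

Always copying avoids the aliasing surprise described in Breaking Change 3, where `b := s` shares storage. The cost is one allocation per concatenation. An in-place variant (`*a = append(*a, *b...)`) would be cheaper but would silently mutate its left operand.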