Phase 2.1 Implementation Plan: Mutable Strings as *[]byte
Objective
Transform strings from Go's immutable string type to mutable *[]byte while maintaining string literal syntax sugar.
Current String Implementation Analysis
1. Type System (interp/type.go)
References Found: 10
| Location | Purpose | Change Required |
|---|---|---|
| Line 53 | stringT type category definition | Keep category, change semantics |
| Line 101 | cats array: stringT: "stringT" | Keep for debugging |
| Line 161 | untypedString() returns stringT | Keep, but change type construction |
| Line 1323 | zeroValues[stringT] = reflect.ValueOf("") | Change to *[]byte |
| Line 2383 | Type conversion case | Update conversion logic |
| Line 2615 | isString() helper | May need update |
2. Type Resolution (interp/cfg.go)
References Found: 4
| Location | Purpose | Change Required |
|---|---|---|
| Line 200 | Range over string | Update to range over *[]byte |
| Line 1023-1024 | String indexing returns byte | Already correct! |
| Line 1026 | reflect.String indexing | Update to handle *[]byte |
| Line 1093 | Slice/String switch case | Update |
3. Runtime Operations (interp/run.go)
References Found: 2
| Location | Purpose | Change Required |
|---|---|---|
| Line 2878-2889 | String type operations | Update to use *[]byte |
| Line 3801 | String literal creation | KEY: Convert to *[]byte |
4. Type Checking (interp/typecheck.go)
References Found: 6
| Location | Purpose | Change Required |
|---|---|---|
| Line 528 | reflect.String case | Update to recognize *[]byte |
| Line 783 | String to []byte for append | May simplify |
| Line 808 | len() on strings | Already works for slices |
| Line 873 | String operations | Update |
| Line 1183-1184 | String literal constant | Convert to *[]byte |
5. Universe Scope (interp/interp.go)
Reference: 1
| Location | Purpose | Change Required |
|---|---|---|
| Line 443 | "string" type definition | Change to *[]byte representation |
6. Operations (interp/op.go)
References Found: 3
| Location | Purpose | Change Required |
|---|---|---|
| Line 21 | String operations | Update |
| Line 1503 | String comparison | Update |
| Line 1561 | String operations | Update |
Dependency Analysis
Layer 1: Foundation (No dependencies)
- Type Definition - Change stringT to represent *[]uint8
- Zero Values - Update string zero value to &[]byte{}
Layer 2: Type System (Depends on Layer 1)
- Type Construction - untypedString() creates *[]uint8 type
- Universe Scope - Update "string" type mapping
Layer 3: Literal Creation (Depends on Layer 2)
- String Literals - Convert "hello" to &[]byte{'h','e','l','l','o'}
- Runtime Constants - Update constant string handling
Layer 4: Operations (Depends on Layer 3)
- Indexing - s[i] returns/sets byte
- Slicing - s[1:3] returns *[]byte
- Range - for i, v := range s works on *[]byte
- Comparison - s1 == s2 compares byte slices
- Built-ins - len(s) already works via auto-deref
Layer 5: Conversions (Depends on Layer 4)
- String ↔ []byte - Becomes a no-op (same type)
- Type Assertions - Update reflection handling
Implementation Plan (Dependency-Ordered)
Step 1: Update Type System Foundation
Files: interp/type.go
```go
// Change stringT to internally be *[]uint8
const (
	...
	stringT // Now represents *[]uint8, not Go string
	...
)

// Update zero value
zeroValues[stringT] = reflect.ValueOf(&[]byte{})

// Update untypedString to create *[]uint8 type
func untypedString(n *node) *itype {
	// Create a *[]uint8 type with stringT category for special handling
	return &itype{
		cat:     stringT,
		val:     &itype{cat: uint8T}, // Element type
		untyped: true,
		str:     "untyped string",
		node:    n,
	}
}
```
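One thing to watch with a pointer-typed zero value: if the single reflect.ValueOf(&[]byte{}) stored in zeroValues were handed to every zero-initialized string without copying, those strings would share one slice header. This plan does not say how zeroValues is consumed, so the sketch below only illustrates the hazard in plain Go. newZeroString is a hypothetical helper, not an interpreter function.

```go
package main

import "fmt"

// newZeroString is a hypothetical helper: it returns a fresh, empty
// *[]byte on every call so that no two zero-valued strings alias.
func newZeroString() *[]byte {
	b := []byte{}
	return &b
}

func main() {
	// Hazard: one shared pointer used as the zero value for two variables.
	shared := &[]byte{}
	a, b := shared, shared
	*a = append(*a, 'x')
	fmt.Println(len(*b)) // 1: b observes the append made through a

	// A fresh allocation per variable keeps them independent.
	c, d := newZeroString(), newZeroString()
	*c = append(*c, 'x')
	fmt.Println(len(*d)) // 0
}
```

If the interpreter already copies zero values on use, no change is needed here.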
Step 2: Update Universe Scope
Files: interp/interp.go
```go
// Map "string" to *[]uint8 type
"string": {
	kind: typeSym,
	typ: &itype{
		cat:  stringT, // Special category for string literals
		val:  &itype{cat: uint8T},
		name: "string",
		str:  "string",
	},
},
```
Step 3: Update String Literal Creation
Files: interp/run.go, interp/typecheck.go
```go
// When creating string literals from constants:
// OLD: reflect.ValueOf(constant.StringVal(c))
// NEW:
func stringLiteralToBytes(s string) reflect.Value {
	bytes := []byte(s)
	ptr := &bytes
	return reflect.ValueOf(ptr)
}

// Update all constant.StringVal() calls
v = stringLiteralToBytes(constant.StringVal(c))
```
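As a standalone check of the helper above, the following sketch runs it outside the interpreter and mutates the result through reflection. It uses only the standard reflect package. How the interpreter would actually perform the write at runtime is an assumption here.

```go
package main

import (
	"fmt"
	"reflect"
)

// stringLiteralToBytes mirrors the helper from Step 3: it copies the
// literal into a fresh byte slice and wraps a pointer to that slice.
func stringLiteralToBytes(s string) reflect.Value {
	b := []byte(s) // copy, so the immutable Go string is never aliased
	return reflect.ValueOf(&b)
}

func main() {
	v := stringLiteralToBytes("hello")
	fmt.Println(v.Kind(), v.Elem().Kind()) // ptr slice

	// Slice elements are always settable, so the write goes through.
	v.Elem().Index(0).SetUint('H')
	fmt.Println(string(v.Elem().Bytes())) // Hello
}
```

Because each call copies the literal, evaluating the same literal twice yields two independent buffers.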
Step 4: Update Type Checking
Files: interp/typecheck.go
```go
// Update reflect.String checks to handle stringT
case reflect.String:
	// OLD: Direct string handling
	// NEW: Treat as *[]byte (pointer to byte slice)

// Simplify string ↔ []byte conversions
// They're now the same type, so no conversion needed
```
Step 5: Update Runtime Operations
Files: interp/cfg.go
```go
// Update string indexing (already returns byte - minimal change)
case stringT:
	n.typ = sc.getType("byte") // Already correct!
	// But need to ensure mutation works: s[i] = 'x'

// Update range over string
case stringT:
	// Now ranges over *[]byte
	sc.add(sc.getType("int64")) // Index storage
	ktyp = sc.getType("int64")
	vtyp = sc.getType("byte") // Changed from rune to byte
```
Step 6: Update String Operations
Files: interp/op.go
```go
// String comparison - compare underlying byte slices
// String concatenation - disabled (we're skipping | operator for now)
```
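Comparing the underlying byte slices matters because == on two *[]byte values in Go compares pointers, not contents. A minimal sketch of content comparison follows. equalStrings is a hypothetical helper, and treating a nil pointer as equal to an empty string is an assumption this plan does not settle.

```go
package main

import (
	"bytes"
	"fmt"
)

// equalStrings is a hypothetical helper that compares two *[]byte
// "strings" by content. Assumption: a nil pointer counts as empty.
func equalStrings(a, b *[]byte) bool {
	var x, y []byte
	if a != nil {
		x = *a
	}
	if b != nil {
		y = *b
	}
	return bytes.Equal(x, y)
}

func main() {
	s1 := []byte("hello")
	s2 := []byte("hello")
	fmt.Println(&s1 == &s2)             // false: distinct pointers
	fmt.Println(equalStrings(&s1, &s2)) // true: same contents
}
```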
Step 7: Handle Reflection Cases
Files: Multiple
```go
// Update all reflect.String cases to recognize stringT as *[]byte
// Ensure reflection operations work correctly
```
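Existing case reflect.String branches will not match a *[]byte value, since its kind is reflect.Ptr. One way such a check could look is sketched below. isByteSlicePtr is a hypothetical helper, not an existing interpreter function.

```go
package main

import (
	"fmt"
	"reflect"
)

// isByteSlicePtr is a hypothetical helper that reports whether v has
// the shape *[]uint8, the representation this plan gives to strings.
func isByteSlicePtr(v interface{}) bool {
	t := reflect.TypeOf(v)
	if t == nil { // untyped nil interface
		return false
	}
	return t.Kind() == reflect.Ptr &&
		t.Elem().Kind() == reflect.Slice &&
		t.Elem().Elem().Kind() == reflect.Uint8
}

func main() {
	b := []byte("hi")
	fmt.Println(isByteSlicePtr(&b))   // true
	fmt.Println(isByteSlicePtr("hi")) // false: a Go string
	fmt.Println(isByteSlicePtr(b))    // false: a slice, not a pointer
}
```

A purely structural check cannot tell a Moxie string from a user-declared *[]byte, which is one reason to keep the stringT category on the itype rather than rely on reflect alone.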
Breaking Changes
For Users
- Strings are now mutable:

  ```go
  // OLD: Error - strings are immutable
  s := "hello"
  s[0] = 'H' // ❌ Error in Go

  // NEW: Works in Moxie
  s := "hello"  // Actually *[]byte
  (*s)[0] = 'H' // ✅ Works! s is now "Hello"
  ```

- String iteration returns bytes, not runes:

  ```go
  // OLD: Iteration yields runes
  for i, r := range "hello" {
  	// r is rune (int32)
  }

  // NEW: Iteration yields bytes
  for i, b := range "hello" {
  	// b is byte (uint8)
  }
  ```

- String ↔ []byte conversion is a no-op:

  ```go
  // OLD: Conversion creates a copy
  s := "hello"
  b := []byte(s) // Copy

  // NEW: They're the same type
  s := "hello" // *[]byte
  b := s       // Same pointer, no copy!
  ```
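A consequence of the no-copy conversion is that writes through one name are visible through the other. The plain-Go sketch below models a Moxie string as a *[]byte to show this, along with an explicit copy for code that relied on the old copying behavior. cloneString is a hypothetical helper.

```go
package main

import "fmt"

// cloneString is a hypothetical helper that returns an independent
// copy of a *[]byte "string".
func cloneString(s *[]byte) *[]byte {
	c := append([]byte(nil), *s...)
	return &c
}

func main() {
	lit := []byte("hello")
	s := &lit // stands in for a Moxie string

	b := s // the "conversion": same pointer, no copy
	(*b)[0] = 'H'
	fmt.Println(string(*s)) // Hello

	c := cloneString(s) // explicit copy restores isolation
	(*c)[0] = 'J'
	fmt.Println(string(*s), string(*c)) // Hello Jello
}
```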
Testing Strategy
Phase 1: Type System
- String literals create *[]byte
- Type checks recognize stringT as *[]byte
- Zero value is empty *[]byte
Phase 2: Basic Operations
- String indexing: s[0] returns byte
- String mutation: (*s)[0] = 'x' works
- String slicing: s[1:3] returns *[]byte
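The slicing case deserves a dedicated test because a Go slice expression shares its backing array with the original. With mutable strings, a write through the substring therefore changes the parent. The sketch below shows that behavior in plain Go. sliceString is a hypothetical stand-in for what s[1:3] would do, assuming the interpreter does not copy on slice.

```go
package main

import "fmt"

// sliceString is a hypothetical stand-in for s[lo:hi] on a *[]byte
// string. Assumption: slicing does not copy the bytes.
func sliceString(s *[]byte, lo, hi int) *[]byte {
	sub := (*s)[lo:hi]
	return &sub
}

func main() {
	lit := []byte("hello")
	s := &lit

	sub := sliceString(s, 1, 3)
	fmt.Println(string(*sub)) // el

	(*sub)[0] = 'E'         // write through the substring
	fmt.Println(string(*s)) // hEllo: the parent changed too
}
```

If substrings should be independent instead, the slice operation would need to copy, which is a design decision this plan leaves open.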
Phase 3: Advanced Features
- Range over string yields bytes
- len(s) works via auto-deref
- String comparison works
- String literals in composite types
Phase 4: Edge Cases
- Empty strings
- Unicode handling (bytes vs runes)
- Nil string pointers
- String constants
Risks and Mitigation
Risk 1: Reflection System Confusion
Problem: Go's reflect package expects a string type, but we are giving it *[]byte.
Mitigation: Keep stringT as a special category that wraps *[]uint8 but is recognized as "string-like"
Risk 2: Unicode/Rune Handling
Problem: Strings now iterate as bytes, not runes, so a multi-byte UTF-8 character is split across several iterations.
Mitigation: Document clearly. Users need to use explicit rune conversion for Unicode.
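For the documentation, an example of explicit rune decoding may help. The sketch below uses utf8.DecodeRune from the Go standard library on a byte-slice string. runes is a hypothetical helper, and whether Moxie exposes unicode/utf8 unchanged is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runes is a hypothetical helper that decodes the UTF-8 bytes of a
// *[]byte string into a slice of runes.
func runes(s *[]byte) []rune {
	var out []rune
	for b := *s; len(b) > 0; {
		r, size := utf8.DecodeRune(b) // invalid bytes yield RuneError, size 1
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	lit := []byte("héllo")
	s := &lit
	fmt.Println(len(*s))       // 6: 'é' occupies two bytes
	fmt.Println(len(runes(s))) // 5
}
```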
Risk 3: Performance
Problem: String literals now allocate heap memory (pointer + slice)
Mitigation: Acceptable trade-off for mutability. Can optimize later with string interning.
Risk 4: Existing Code Breaks
Problem: Code expecting immutable strings will break
Mitigation: This is a breaking change - document clearly in migration guide.
Success Criteria
- ✅ All Phase 1 and 1.2 tests still pass
- ✅ String literals create mutable *[]byte
- ✅ String indexing/slicing works
- ✅ String mutation works: (*s)[i] = byte
- ✅ Range over strings works
- ✅ No regression in existing functionality
Estimated Complexity
- Type System Changes: Medium (5-10 locations)
- Runtime Changes: Medium (3-5 locations)
- Operation Updates: Low (string ops mostly disabled for now)
- Testing: Medium (need comprehensive tests)
Total Estimated Lines Changed: ~100-150 lines across 6 files
Time Estimate: 2-3 hours of focused work
Next Steps After Phase 2.1
Once mutable strings work:
- Phase 2.2: Add | concatenation operator (if desired)
- Phase 3: Built-in function modifications
- Comprehensive Unicode/rune handling documentation