Added beginning of based32 implementation

Lots more side notes... these will be turned into > prefixed sections in order to reduce the sense of excessive text in the tutorial, make it easier to focus on the important parts, and move the side notes into a more easily ignorable layout.
This commit is contained in:
David Vennik
2022-04-27 08:06:18 +03:00
parent bf74d46854
commit e6f53a06f7
2 changed files with 355 additions and 0 deletions

README.md

@@ -32,6 +32,7 @@
- [Always write code to be extensible](#always-write-code-to-be-extensible)
- [Helper functions](#helper-functions)
- [Log at the site](#log-at-the-site)
- [Create an Initialiser](#create-an-initialiser)
## Teaching Golang via building a Human Readable Binary Transcription Encoding Framework
@@ -775,3 +776,123 @@ It may be that you are never writing algorithms that need any real debugging, ma
For that reason, now that we are implementing an algorithm, we are going to deliberately introduce bugs, force the student to encounter the process of debugging, and show the way to fix them, rather than making this an exercise in copy and paste, which would have no benefit. Bugs are how you learn to write good code; without that difficulty it is not programming, and you will forget the next day how you did it, which would make this whole exercise a waste of time you could have saved yourself by just reading it instead.
#### Create an Initialiser
The purpose of the transparency of the `Codec` type in `types.go` was so that we could potentially create a custom codec in a separate package from the one where the type was defined.
While we could have avoided this openness and created a custom function to load the non-exported struct members (and in a more complex library we probably would), for a simple library like this we are going to assume that users of the library either will not tamper with it while it is being used concurrently, or that they have created a custom implementation for their own encoder design, or that they won't be using it concurrently.
So, the first thing we are going to do is sketch out the creation of an initialiser, in which we will use closures to load the structure with functionality.
```go
// Package based32 provides a simplified variant of the standard
// Bech32 human readable binary codec
package based32

import (
	codec "github.com/quanterall/kitchensink"
)

// charset is the set of characters used in the data section of bech32 strings.
const charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

// Codec provides the encoder/decoder implementation created by makeCodec.
var Codec = makeCodec(
	"Base32Check",
	charset,
	"QNTRL",
)

// makeCodec generates our custom codec as above, into the exported Codec
// variable
func makeCodec(
	name string,
	cs string,
	hrp string,
) (cdc *codec.Codec) {
	return cdc
}
```
You will notice that we took care to make sure that everything you will paste into your editor will pass syntax checks immediately. All functions that have return values must contain a `return` statement. The return type here comes from `types.go` at the root of the repository, which the compiler identifies as `github.com/quanterall/kitchensink` because we ran `go mod init` in [Initialize your repository](#initialize-your-repository).
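That module path comes from the `go.mod` file that `go mod init` created. As a rough sketch of what it contains (the Go version directive shown here is only illustrative):

```
module github.com/quanterall/kitchensink

go 1.18
```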
When you first start writing code, you will probably get quite irritated at having to put in those empty returns and the imports, but this is just how things are with Go. The compiler is extremely strict: all identifiers must be known, and a function with return values but no return statement is an error. A decent Go IDE will save you time by adding and removing the `import` lines automatically when it knows them, but you are responsible for putting the returns in. Note that you can put a `panic()` statement in place of a `return`; the tooling in Goland, for example, inserts `panic` calls when it generates an implementation of an interface for a type, to remind you to fill in your implementation.
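For illustration, here is the kind of stub such a tool generates for an interface method (the interface and type here are made up for the example, not part of the codec):

```go
// Stringer is a stand-in interface for this example.
type Stringer interface {
	String() string
}

// myType is a hypothetical type for which an IDE has generated the method
// stub below.
type myType struct{}

// String is the generated stub; the panic reminds you to fill in the body
// before the method is ever called.
func (m myType) String() string {
	panic("implement me")
}
```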
I will plug Goland a little further: the reason I recommend it is that it has the best hyperlinking system available in any IDE on the market, and as I mentioned a little way back, tracing errors back to their source is one of the most time consuming parts of a programmer's work, so every little bit helps. JetBrains clearly listen to their users on this; even interfaces are easy to trace back to their multiple implementations, which again saves a lot of time when you are working on large codebases.
Before we start showing you how to put things into the `Codec`, here is a compact refresher of the structure it defines:
```go
type Codec struct {
	Name      string
	HRP       string
	Charset   string
	Encoder   func(input []byte) (output string, err error)
	Decoder   func(input string) (output []byte, err error)
	MakeCheck func(input []byte, checkLen int) (output []byte)
	Check     func(input []byte) (err error)
}
```
HRP and Charset are configuration values, and Encoder, Decoder, MakeCheck and Check are functions.
The configuration part is simple to define, so add this to `makeCodec`:
```go
// Create the codec.Codec struct and put its pointer in the return variable.
cdc = &codec.Codec{
	Name:    name,
	Charset: cs,
	HRP:     hrp,
}
```
This section:
```go
// charset is the set of characters used in the data section of bech32 strings.
const charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

// Codec provides the encoder/decoder implementation created by makeCodec.
var Codec = makeCodec(
	"Base32Check",
	charset,
	"QNTRL",
)
```
as you can now see, fills in these predefined configuration values for our codec.
In fact, the Name field is not used anywhere yet, but in an application that can work with multiple codec.Codec implementations it could become quite useful for differentiating between them.
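For instance (purely illustrative, not something we add to the tutorial code), a package that works with several codecs could keep them in a hypothetical registry keyed by this field:

```go
// codecs is a hypothetical registry of codec implementations, keyed by their
// Name so that an application can select between them at runtime.
var codecs = map[string]*codec.Codec{
	Codec.Name: Codec,
	// further implementations would be registered here
}
```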
Stub in the closures for the codec:
```go
cdc.MakeCheck = func(input []byte, checkLen int) (output []byte) {
	return
}

cdc.Encoder = func(input []byte) (output string, err error) {
	return
}

cdc.Check = func(input []byte) (err error) {
	return
}

cdc.Decoder = func(input string) (output []byte, err error) {
	return
}
```
Again, we like to teach good, time-saving, and error-saving programming practices. Making stubs for things that you know you will eventually have to fill in is a good practice for this purpose.
Note that the returns can be left 'naked' like this because the variables are declared in the type signature of the closure. If you leave out the names and only list the types of the return tuple, you have to declare the variables yourself or return explicit empty values: for slice types the empty value is `nil`, for `error` it is also `nil`, and for `string` it is `""`.
There is something of an unofficial convention amongst Go programmers not to name return variables. It is the opinion of the author that this is bad for readability, as the variable names can give information about what the values actually represent. In this case they are named simply, as they are quite unambiguous given the functions' names, but sometimes naming them saves the reader the time of scanning through the function to work out what a return value relates to.
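As an illustration of the difference, here are two versions of a hypothetical helper (not part of the codec), one with named and one with unnamed return values:

```go
// splitHRP uses named return values: the signature documents what comes back,
// and a bare return is allowed anywhere in the body.
func splitHRP(input string) (hrp string, data string, err error) {
	// on failure, just set err and return
	return
}

// splitHRPUnnamed uses unnamed return values: every return site must spell
// out explicit values, and the reader must infer what each string means.
func splitHRPUnnamed(input string) (string, string, error) {
	return "", "", nil
}
```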


@@ -0,0 +1,234 @@
// Package based32 provides a simplified variant of the standard
// Bech32 human readable binary codec
package based32
import (
"encoding/base32"
"log"
codec "github.com/quanterall/kitchensink"
"github.com/quanterall/kitchensink/pkg/proto"
"lukechampine.com/blake3"
"strings"
)
// charset is the set of characters used in the data section of bech32 strings.
const charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
// Codec provides the encoder/decoder implementation created by makeCodec.
var Codec = makeCodec(
"Base32Check",
charset,
"QNTRL",
)
func getCheckLen(length int) (checkLen int) {
// The following formula ensures that there is at least one check byte, and
// that the total of length prefix, payload and check is a multiple of 5
// bytes, so the base 32 encoding (2^5 = 32) comes out as whole 8 character
// groups with no padding.
//
// We add two to the length before taking the modulus, as there must be 1
// byte for the check length and at least 1 byte of check.
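// As a worked example: a 4 byte input gives (2+4)%5 = 1, so checkLen is
// 5-1+1 = 5, and the total of 1+4+5 = 10 bytes encodes to exactly 16 base32
// characters.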
lengthMod := (2 + length) % 5
// The modulus is subtracted from 5 to produce the complement required to
// make the correct number of bytes of total data, plus 1 to account for the
// minimum length of 1.
checkLen = 5 - lengthMod + 1
return checkLen
}
// getCutPoint is made into a function because it is needed more than once.
func getCutPoint(length, checkLen int) int {
return length - checkLen - 1
}
// makeCodec generates our custom codec as above, into the exported Codec
// variable
func makeCodec(
name string,
cs string,
hrp string,
) (cdc *codec.Codec) {
// Create the codec.Codec struct and put its pointer in the return variable.
cdc = &codec.Codec{
Name: name,
Charset: cs,
HRP: hrp,
}
// We need to create the check creation functions first
cdc.MakeCheck = func(input []byte, checkLen int) (output []byte) {
// We use the Blake3 256 bit hash because it is nearly as fast as CRC32
// but less complicated to use due to the 32 bit integer conversions to
// bytes required to use the CRC32 algorithm.
checkArray := blake3.Sum256(input)
// This truncates the blake3 hash to the prescribed check length
return checkArray[:checkLen]
}
// Create a base32.Encoding from the provided charset.
enc := base32.NewEncoding(cdc.Charset)
cdc.Encoder = func(input []byte) (output string, err error) {
if len(input) < 1 {
err = proto.Error_ZERO_LENGTH
return
}
// The check length depends on the modulus of the length of the data, in
// order to avoid padding.
checkLen := getCheckLen(len(input))
// The output is longer than the input, so we create a new buffer.
outputBytes := make([]byte, len(input)+checkLen+1)
// Add the check length byte to the front
outputBytes[0] = byte(checkLen)
// Then copy the input bytes for beginning segment.
copy(outputBytes[1:len(input)+1], input)
// Then copy the check to the end of the input.
copy(outputBytes[len(input)+1:], cdc.MakeCheck(input, checkLen))
// Create the encoding for the output.
outputString := enc.EncodeToString(outputBytes)
// We can omit the first character of the encoding because the length
// prefix never uses the first 5 bits of the first byte, and add it back
// for the decoder later.
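// (The check length byte is always less than 8, so its top five bits are
// zero and the first character is always 'q'.)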
trimmedString := outputString[1:]
// Prefix the output with the Human Readable Part and append the
// encoded string version of the provided bytes.
output = cdc.HRP + trimmedString
return
}
cdc.Check = func(input []byte) (err error) {
// We must do this check or the next statement will cause a bounds check
// panic. Note that zero length and nil slices are different, but have
// the same effect in this case, so both must be checked.
switch {
case len(input) < 1:
err = proto.Error_ZERO_LENGTH
return
case input == nil:
err = proto.Error_NIL_SLICE
return
}
// The check length is encoded into the first byte in order to ensure
// the data is cut correctly to perform the integrity check.
checkLen := int(input[0])
// Ensure there are enough bytes in the input to run a check on
if len(input) < checkLen+1 {
err = proto.Error_CHECK_TOO_SHORT
return
}
// Find the index to cut the input to find the checksum value. We need
// this same value twice so it must be made into a variable.
cutPoint := getCutPoint(len(input), checkLen)
// Here is an example of a multiple assignment and more use of the
// slicing operator.
payload, checksum := input[1:cutPoint], string(input[cutPoint:])
computedChecksum := string(cdc.MakeCheck(payload, checkLen))
// Here we assign the result of the comparison to a variable. By doing
// this instead of an if with returns, the name of the variable makes the
// meaning of the comparison clearer.
valid := checksum != computedChecksum
if !valid {
err = proto.Error_CHECK_FAILED
}
return
}
cdc.Decoder = func(input string) (output []byte, err error) {
// Other than for human identification, the HRP is also a validity
// check, so if the string prefix is wrong, the entire value is wrong
// and won't decode as it is expected.
if !strings.HasPrefix(input, cdc.HRP) {
log.Printf("Provided string has incorrect human readable part:"+
"found '%s' expected '%s'", input[:len(cdc.HRP)], cdc.HRP,
)
err = proto.Error_INCORRECT_HUMAN_READABLE_PART
return
}
// Cut the HRP off the beginning to get the content, and add back the
// initial zeroed 5 bits as a 'q' character.
input = "q" + input[len(cdc.HRP):]
data := make([]byte, len(input)*5/8)
// Be aware the input string will be copied to create the []byte
// version. Also, because the input bytes are always zero for the first
// 5 most significant bits, we must re-add the zero at the front (q)
// before feeding it to the decoder.
var writtenBytes int
writtenBytes, err = enc.Decode(data, []byte(input))
if err != nil {
log.Println(err)
return
}
// The first byte signifies the length of the check at the end
checkLen := int(data[0])
if writtenBytes < checkLen+1 {
err = proto.Error_CHECK_TOO_SHORT
return
}
// Assign the result of the check here; if it passes, the decoded bytes
// still need to be trimmed of the check value (keeping things cleanly
// separated between the check and decode functions).
err = cdc.Check(data)
// There is no point in doing any more if the check fails, as per the
// contract specified in the interface definition codecer.Codecer
if err != nil {
return
}
// Slice off the check length prefix, and the check bytes to return the
// valid input bytes.
output = data[1:getCutPoint(len(data)+1, checkLen)]
// If we got to here, the decode was successful.
return
}
// We return the value explicitly to be nice to readers as the function is
// not a short and simple one.
return cdc
}
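With that file in place, a minimal round trip looks like the sketch below. The import path `github.com/quanterall/kitchensink/pkg/based32` is an assumption about where this package lives in the repository, based on the `pkg/proto` import above:

```go
package main

import (
	"fmt"

	"github.com/quanterall/kitchensink/pkg/based32"
)

func main() {
	// Encode some arbitrary bytes into the human readable form.
	encoded, err := based32.Codec.Encoder([]byte{0xde, 0xad, 0xbe, 0xef})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println("encoded:", encoded)

	// Decode the string back and recover the original bytes.
	decoded, err := based32.Codec.Decoder(encoded)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("decoded: %x\n", decoded)
}
```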