From the rough introduction in the last post it is clear that we want to define multiple representations for use in different phases of the compiler. By itself this desire does not cause any problems: Haskell lets us declare as many datatypes as we want. So in a naive manner we could jump right in:
data Tree1 = Lit Integer | Var String | Add Tree1 Tree1... data Tree2 = Lit2 Integer | Var2 String Visibility | Add Visibility Tree2 Tree2 ... data Tree3 = Lit3 Integer | Var3 String Type | Cast Type Tree3 ...
Sure enough, Haskell will let us type a lot of code quickly, but it is not necessarily a good way to proceed. Even in this simple example we can see several issues start to appear.
- Constructors need unique names, so we are generating arbitrary instances of syntactic families (Lit,Lit2 etc). This means that we are reading something into the relationship between these constructors (and their types) that does not exist in the program.
- Both Lit and Var are simple constructor cases - they will represent leaves in the program tree. The Add and Cast constructors are more complex because they represent trees, so their fields include recursive definitions.
- Both Tree2 and Tree3 extend Tree1 in some way: Tree2 adds data-flow properties to nodes in the tree (e.g. for analysing information leakage) while Tree3 adds type-information to the tree. Both the Tree2 and Tree3 datatypes include a common core that is similar to Tree1, but this relationship is only expressed syntactically.
- Tree2 extends Tree1 in a monotonic way: each of the constructors in Tree1 has an equivalent in Tree2. Tree3 extends some part of Tree1, but it does not include an equivalent to the Add constructor. The set of nodes in Tree3 overlaps the set of nodes in Tree1, but it is not a subset.
parse :: String -> Maybe Tree1
build2 :: Tree1 -> Tree2 build3 :: Tree2 -> Tree3
pretty :: Tree1 -> String pretty :: Tree2 -> String pretty :: Tree3 -> String