In­dexed Data­types

In his ICFP pa­per from 2014, Conor McBride gives an ex­per­i­ence re­port on writ­ing a cor­rect-by-­con­struc­tion sor­ted data struc­utre in the de­pend­ently typed pro­gram­ming lan­guage Ag­da. His first at­tempt in­volves so called meas­ures which are also an im­port­ant as­pect of the Li­quid Haskell sys­tem. The fol­low­ing data de­clar­a­tion rep­res­ents the list data­type en­riched with a length meas­ure. In­form­ally, a re­fine­ment type { x | p(x) } is in­hab­it­ted by an ele­ment e just if p(e) holds.

data List a where
  Nil :: { xs :: List a | len xs = 0}
  Cons :: a -> (xs' :: List a) -> { xs :: List a | len xs = len xs' + 1 }

Be­ing able to spe­cify the length of a list at the type-­level gives us a stronger static guar­an­tee. The head func­tion, for ex­ample, can be re­fined to re­quire lists that are prov­ably non-empty. Meas­ures provide a quick­-fire way of re­fin­ing al­geb­raic data­types that feels very nat­ural - the length func­tion is one of the first en­countered by func­tional pro­gram­mers. Meas­ures can be trans­lated into a more al­geb­raic spe­cific­a­tion with an in­dex (an ad­di­tional type para­met­er, like n :: Nat) that spe­cifies the value of a meas­ure, e.g.:

type List :: Nat -> Type -> Type
data List n a where
  Nil :: List 0 a
  Cons :: a -> List n a -> List (n + 1) a

The prac­tic­al­ity of these ad­vanced type sys­tems de­pends on the ca­pa­city for type check­ing and in­fer­ence. Oth­er­wise, there is no ad­vant­age over manual proofs of cor­rect­ness. Type check­ing ul­ti­mately boils down to equal­ity on types: we need to be sure that the types of two terms align, and thus it is safe for data to flow from one point in our pro­gram to an­oth­er. In a de­pend­ently typed lan­guage this in­volves equal­ity between ar­bit­rary pro­gram terms. For ex­ample, the types List (x + 1) a and List (1 + x) a would ideally be con­sidered equi­val­ent, but in gen­er­al, this is an un­de­cid­able prob­lem.

While a dir­ect trans­la­tion into in­dexed types works well for simple meas­ures. McBride no­ticed that some styles of in­dex­ing are bet­ter at pla­cat­ing type check­ers. And, for com­plex meas­ures, this be­comes very im­port­ant. In the fol­low­ing quote “green slime” refers to meas­ure ex­pres­sions such as x + 1.

We got the wrong an­swer be­cause we asked the wrong ques­tion: What should the type of a sub­tree tell us? some­what pre­sup­poses that in­form­a­tion bubbles out­ward from sub­trees to the nodes which con­tain them. In Mil­ner’s tra­di­tion, we are used to syn­thes­iz­ing the type of a thing. Moreover, the very syn­tax of data de­clar­a­tions treats the in­dex de­livered from each con­structor as an out­put. It seems nat­ural to treat data­type in­dices as meas­ures of the data. That is all very well for the length of a vec­tor, but when the meas­ure­ment is in­tric­ate […] pro­gram­ming be­comes vexed by the need for the­or­ems about the meas­ur­ing func­tions. The pres­ence of ‘green slime’ - defined func­tions in the re­turn types of con­struct­ors is a danger sign.

We can take an al­tern­at­ive view of types, not as syn­thes­ized meas­ure­ments of data, bubbled out­ward, but as checked re­quire­ments of data, pushed in­ward. To en­force the in­vari­ant, let us rather ask “What should we tell the type of a sub­tree?”.

In sum­mary, the tar­get type of a given ex­pres­sion should dic­tate what type its sub­-­ex­pres­sions should re­ceive. This fol­lows simply from the fact that type an­nota­tions ap­pear at the top-­level in our pro­gram and rarely on the leaves of syn­tax trees. In this post, we will con­struct an in­dexed type for dic­tat­ing al­geb­raic ef­fect pro­to­cols with ex­actly this prop­erty.

Al­geb­raic Ef­fects

Al­geb­raic ef­fects are an ap­proach to man­aging side-ef­fects in a pure lan­guage of rising pop­ular­ity. They provide sev­eral ad­vant­ages over the tra­di­tional monad trans­former for­mu­lism, but most no­tice­able is their in­dif­fer­ence to the or­der (and scope) of ef­fects; the de­cision only needs to made by the hand­ler, not the ef­fect­ful pro­ced­ure. This flex­ib­il­ity al­lows the design of more com­pos­able sys­tems.

At the heart of al­geb­raic ef­fect sys­tems is the free mon­ad, defined as a gen­er­al­isa­tion of trees with leaves of the re­turn type and branches shaped by some sig­na­ture. While the sig­na­ture is of­ten as­sumed to be a func­tor, for which this tree in­ter­pret­a­tion is most lu­cid, we will in­stead fo­cus on the fol­low­ing defin­i­tion:

data Free f a where
  Pure :: a -> Free f a
  Bind :: f a -> (a -> Free f b) -> Free f b

This data­type al­lows us to lift prim­it­ive op­er­a­tions given by some sig­na­ture f :: Type -> Type into ar­bit­rary mon­adic ex­pres­sions. By us­ing the fol­low­ing State s a sig­na­ture, for ex­ample, we can build func­tions that ma­nip­u­late a state of type s. The second para­meter of this data­type is the re­turn type of its prim­it­ive op­er­a­tions.

data State s a where
  Get :: State s s
  Put :: s -> State s ()

There is also a grow­ing in­terest in in­dex­ing mon­ads by a grad­ing that provides some ab­strac­tion of the op­er­a­tions they per­form. This is no dif­fer­ent from con­sid­er­ing the length of the list as an ab­strac­tion over its value, and has the same be­ne­fits. However, we fur­ther re­quire that a mon­ad’s grad­ing is a mon­oid, so it has a sens­ible no­tion of com­pos­i­tion and iden­tity that mir­rors the mon­ad’s struc­ture.

As we are con­cerned with the free monad a good place to start is with the free mon­oid, i.e. lists. Con­sid­er­ing sig­na­tures that are them­selves in­dexed by some type i, we can in­dex mon­adic ex­pres­sions as either the empty list for the Pure con­struct­or, in­dic­at­ing that it has no ef­fect, or by pre­pend­ing the in­dex of a prim­it­ive op­er­a­tion to those of the con­tinu­ation for the Bind con­struct­or.

type Free :: (i -> Type) -> [i] -> Type -> Type
data Free f ix a where
  Pure :: a -> f [] a
  Bind :: f i a -> (a -> f ix b) -> f (i : ix) b

We can now en­rich our ef­fect sig­na­ture with an in­dex that dis­tin­guishes between dif­fer­ent sorts of prim­it­ive op­er­a­tion. In this case, to in­dic­ate whether it is a Get or Put. And thus spe­cify that shape of an ef­fect­ful op­er­a­tion at com­pile time.

data Op­Sort
  = G | P

type State :: Type -> Op­Sort -> Type -> Type
data State s sort a where
  Get :: State s G s
  Put :: s -> State s P ()

modify :: (s -> s) -> Free (State s) [G, P] ()
modify f = 
  Bind Get $ \s -> Put (f s)

Reg­u­lar Ef­fect Pro­to­cols

This tech­nique for stat­ic­ally pre­scrib­ing a se­quence of ef­fects can be thought of as a pro­tocol that defines the ac­cept­able be­ha­viour of an ef­fect­ful pro­gram. It is not, however, par­tic­u­larly use­ful be­cause we have to know the ex­act se­quence. What would be a lot nicer is if we were able to de­scribe a set of valid be­ha­viours. A con­veni­ent way of spe­cify­ing a set of lists (i.e. a lan­guage) is as a reg­u­lar ex­pres­sion. For ex­ample, if we re­quire that all Get op­er­a­tions are fol­lowed by a Put op­er­a­tion, we could write Star (Unit G :. Unit P) us­ing the fol­low­ing DSL:

data Regex i
  = Empty -- Empty lan­guage
  | Ep­si­lon -- Empty string
  | Unit i -- Single char­ac­ter
  | Regex i :. Regex i -- Se­quen­tial com­pos­i­tion
  | Regex i :|| Regex i -- Union of lan­guages
  | Regex i :&& Regex i -- In­ter­sec­tion of lan­guges
  | Star (Regex i) -- Kleene Star, i.e. un­boun­ded it­er­a­tion

A very sat­is­fy­ing prop­erty of reg­u­lar ex­pres­sions is that the lan­guage which re­mains after con­sum­ing a single char­ac­ter is an­other reg­u­lar ex­pres­sion. Let’s un­pack this briefly with an ex­ample. If I start with the lan­guage Star (Unit G :. Unit P) and ob­serve a G, then the fol­low­ing se­quence of char­ac­ters must sat­isfy Unit P :. Star (Unit G :. Unit P) What hap­pens if I ob­serve a P in­stead? Well the re­main­ing lan­guage is Empty as no string fol­low­ing a P will sat­isfy our pro­tocol (not even the empty string Ep­si­lon). This con­struc­tion is called the Brzo­zowski de­riv­at­ive and will be key to en­sur­ing the type of the over­all tree, that is our pro­to­col, can ef­fect­ively dic­tate the type of the sub­trees, which will be the de­riv­at­ive with re­spect to the op­er­a­tions en­countered so far.

The Brzo­zowski de­riv­at­ive on type-­level in­dices is defined by three type fam­il­ies (i.e. type-­level func­tion­s). The first Nu (read nul­lable) checks whether the lan­guage of a reg­u­lar ex­pres­sion con­tains the empty string. It re­turns Ep­si­lon, if this is the case, and Empty oth­er­wise. As we wish to min­im­ise the amount of equa­tional reas­on­ing the com­pile has to do, and in fact, we won’t spe­cify any of the equa­tional laws of reg­u­lar ex­pres­sions, this func­tion also ex­pli­citly sim­pli­fies the res­ult via the Force type fam­ily, op­pose to the usual present­a­tion where the out­put is a reg­u­lar ex­pres­sion that is merely equi­val­ent to Epis­lon or Empty.

type Nu :: Regex i -> Regex i
type fam­ily Nu r where
  Nu Empty = Empty
  Nu Ep­si­lon = Ep­si­lon
  Nu (Unit i) = Ep­si­lon
  Nu (r :. s) = Force (Nu r :&& Nu s)
  Nu (r :&& s) = Force (Nu r :&& Nu s)
  Nu (r :|| s) = Force (Nu r :|| Nu s)
  Nu (Star r) = Ep­si­lon

type Force :: Regex i -> Regex i
type fam­ily Force r where
  Force Empty = Empty
  Force Ep­si­lon = Ep­si­lon
  Force (Empty :&& s) = Empty
  Force (Ep­si­lon :&& s) = Force s
  Force (Empty :|| s) = Force s
  Force (Ep­si­lon :&& s) = Ep­si­lon

The third type fam­ily, which defines the de­riv­at­ive op­er­a­tion, is Delta. It takes two para­met­ers: an ob­served op­er­a­tion i, and the reg­u­lar ex­pres­sion r. The out­put is a reg­u­lar ex­pres­sion whose lan­guage con­tains strings that can be pre­pen­ded with i to make an ele­ment of the lan­guage defined by r.

type Delta :: i -> Regex i -> Regex i
type fam­ily Delta i r where
  Delta i Empty = Empty
  Delta i Ep­si­lon = Empty
  Delta i (Unit i) = Ep­si­lon
  Delta i (Unit j) = Empty -- Here we know that i and j are dis­tinct
  Delta i (r :. s) =
    (Delta i r :. s) :|| -- Either i is con­sumed by r,
      (Nu r :. Delta i s) -- or r can be the empty string and i is con­sumed by s
  Delta i (r :&& s) =
    Delta i r :&& Delta i s
  Delta i (r :|| s) =
    Delta i r :|| Delta i s
  Delta i (Star r) =
    Delta i (r :. Star r) -- Un­fold Kleene star

Put­ting it all to­gether

Now we have an ap­pro­pri­ate way of passing type in­form­a­tion to sub­trees, we can define a free monad in­dexed not by a list of op­er­a­tions but by a reg­u­lar lan­guage of op­er­a­tions. In the case of Pure, we need to know that the empty string is ac­cep­ted by the pro­tocol. Luck­ily, the Nu type fam­ily checks ex­actly that! For the Bind con­struct­or, we just need to take the de­riv­at­ive of the tar­get pro­tocol with re­spect to the ob­served op­er­a­tion to find a suit­able pro­tocol for the con­tinu­ation.

type Free :: (i -> Type) -> Regex i -> Type -> Type
data Free f r a where
  Pure :: Nu r ~ Ep­si­lon => a -> Free f r a
  Bind :: f i a -> (a -> Free f (Delta i r) b) -> Free f r b

type Pro­tocol :: Regex Get­Put
type Pro­tocol =
  Star (Unit G :. Unit P)

-- This func­tion will type check.
safe :: Free (State Int) Pro­tocol ()
safe = 
  modify (s + 1)

-- This func­tion will not type check!
un­safe :: Free (State Int) Pro­tocol ()
un­safe =
  Bind Get $ \s -> Pure ()

This in­dexed monad is quite neat be­cause we can now stat­ic­ally verify that only one of these two func­tions ac­tu­ally meets our pro­tocol. However, it is lim­ited to ex­pli­cit se­quences of op­er­a­tions be­cause the de­riv­at­ive is com­puted with re­spect to a given prim­it­ive op­er­a­tions, not an­other pro­tocol. It is worth not­ing that reg­u­lar lan­guages are also closed un­der quo­tient­ing, a gen­er­al­isa­tion of the de­riv­at­ive op­er­ator that would al­low for the com­pos­i­tion of ar­bit­rary ef­fect­ful func­tions. Nev­er­the­less, this toy im­ple­ment­a­tion does demon­strate that or­gan­ising data struc­tures in the man­ner pro­posed by McBride can in­deed lessen the bur­den on the com­piler.