This is the third part of my slow examination of Kiselyov and Shan’s Lightweight Static Capabilities paper. Start at the the beginning and then read part two to put this part into context.
Section 3 of the paper uses the same technique from Section 2 and adapts it to tackle a larger and more interesting problem: array bound checking. To motivate the problem they follow in the footsteps of the dependent type literature by implementing binary search with statically-checked “safe” array lookups.
To explain the approach they use we’ll start with an unsafe implementation that uses runtime bounds checking on each index.
bsearch :: (Ord a) => (a -> a -> Ordering) -> a -> Array a -> Maybe (Int,a)
bsearch cmp key arr = go 0 (length arr - 1)
where go lo hi = if hi >= lo
then let midpoint = lo + ((hi - lo)`div`2)
curvalue = arr ! midpoint
in case cmp key curvalue of
LT -> look lo (m - 1)
EQ -> Just (m, curvalue)
GT -> look (m + 1) hi
else Nothing
The index operation (!) checks at runtime to see if the supplied index is within the bounds defined by the array (we still have to catch an exception, which I haven’t shown here, but at least it won’t blow everything up). This check is not necessary if we can prove that the index will always be in range.
The general approach outlined in Section 2 was to brand “safe” operations and types which go together. Once a list had been branded non-empty (ie had type FullList) it could be freely passed around and later safely destructured into head and tail without worry.
In the case of lists the values were branded “safe” and the head/tail operations could only work on “safe” values. The head operation is analogous to indexing the 0th element in an array. If we were to extend the list approach to arrays we would need a deref0 for the 0th element, deref1 for the 1st element and so on. This is both unwieldy and needlessly restricts the type of indexing possible.
The solution proposed by the paper is a set of “safe” indexing operations which only work when both the array and the index are branded as being compatible. An index is compatible with an array if it is within that array’s bounds.
get :: BrandedArray b a -> BrandedInt b -> a
The type variable b here is a phantom type which brands arrays and indices. The get operation can only be used if the same brand is applied to both array and the index.
Initially the only “safe” array indices are the offsets of the upper and lower element. The brands for an array are created alongside the upper and lower index in one operation, providing the assurance to the type system that whatever type b represents it is identical for these three values:
brand :: Array -> (BrandedArray b a, BrandedInt b, BrandedInt b)
(In fact this is not the proper type signature. The proper signature is reproduced below, with existential quantification over the branding type b and continuations for passing/failing conditions. This signature is just a gentle introduction to get the idea across)
Further operations can be created with transformations on the provided indices: increments, decrements and midpoints.
Whatever b is here it’s the same type, but next time brand is called b may well be a different type. The type system can’t unify them since it doesn’t know the type — it’s never specified. So only indices created alongside the branded array can be used with it, since only in that situation can the type checker be sure that the type of b matches.
This seems a powerful use of phantom types and it’s one I’d never seen before. The paper uses the phrase “type eigenvariable” at this point, which I’m not familiar with but it’s such a nice phrase that I’ll have to keep an eye out for it.
As before the type constructors to enforce this safety are hidden and only accessible under specific and well-guarded conditions.
-- Not exported!
newtype BrandedArray b a = BrandArray a
newtype BrandedInt b i = BrandInt i
brand :: Array -> (forall b. (BrandedArray b, BrandedInt b, BrandedInt b) -> w)
-> w -> w
brand arr cont stop = if lo <= hi
then cont (BrandArray, BrandInt lo, BrandInt hi)
else stop
The user provides functions to deal with the two cases that the bounds are valid and invalid. This same pattern follows through to other parts of the implementation: condition checking is done internally and the user provides functions to deal with various cases. This keeps some of the constraints (like the existential quantification over b, the brand type) wrapped up. If the user tries to subvert this the type checker objects:
Inferred type is less polymorphic than expected
Quantified type variable `b' escapes
In the second argument of `brand', namely `Just'
In the expression: brand testarray Just Nothing
In the definition of `it': it = brand testarray Just Nothing
In order to implement the rest of binary search over arrays some arithmetic functions on indices are introduced. These functions exist inside the trusted kernel so some effort is made to ensure they are correct. They also require passing/failing functions for operations that may fail (such as finding the successor of a valid index: the successor may be out of bounds):
-- If i1 and i2 are within bounds then their midpoint
-- is within bounds.
bmiddle :: BrandedInt b -> BrandedInt b -> BrandedInt b
bmiddle i1 i2 = BrandInt (i1 + (i2 - i1) `div` 2)
bsucc :: BrandedInt b -> BrandedInt b
-> (BrandedInt b -> w)
-> w -> w
-- If i is less than the upper bound we use continue calculation
-- with the inside else we fail with outside.
bsucc (BrandInt upper) (BrandInt i) inside outside =
let i' = i + 1
in if i' <= upper then inside (BrandInt i') else outside
There is also a predecessor function which works analogously. First we need to rework the binary search for this style of programming:
bsearch :: (Ord a) => (a -> a -> Ordering) -> a -> Array a -> Maybe (Int,a)
bsearch cmp key arr = brand arr find Nothing
where find (barr,lower,upper) = go lower upper
where go lo hi = let m = bmiddle lo hi
x = unsafeGet barr m
in case cmp key x of
LT -> bpred lo m (\i -> go lo i) Nothing
EQ -> Just (unbrand m, x)
GT -> bsucc hi m (\i -> go i hi) Nothing
On each iteration of the algorithm finding the value at the midpoint is always a safe operation provided the two arguments are safe. Incrementing or decrementing the upper/lower bound might fail if the index goes out of range, but this check can be considered part of the end condition, ie the search finishes unsuccessfully anyway if the index goes off the end because it’s an indication that the value we’re looking for is not stored.
The call to unsafeGet is, in fact, safe, since the index m is guaranteed to be in range for this array. No runtime checks are required over those needed to indicate the end condition. As before this problem is always dealt with formally using a similar mechanism for the empty-list problem, using Strict and Lax languages and mappings from one to the other. I’ll look at that next time.