TITLE: Equality Check Patterns for Recursive Structures
AUTHOR: Eugene Wallingford
DATE: October 07, 2012 2:50 PM
DESC:
-----
BODY:
I first encountered this trio of programming patterns when writing
Smalltalk programs to manipulate graphs back in the late 1980s.
These days I see them most often when comparing recursive data
types in language processors. Andrew Black wrote about these
patterns in the context of Smalltalk on the Squeak mailing list in
the mid-2000s.
~~~~
Recursive Equality Check
Problem
You are working with a recursive data structure. In the simplest
form, you might have a list or a record that can contain lists or
records as parts. In a language application, you might have a
structured type that can have the same type as one of its parts.
For instance, a function type might allow a function as an argument
or a function as a return value.
You have two instances of the structure and need to compare them,
say, for equality or for dominance. In the language app, for
example, you will need to verify that the type of an argument
matches the type of the formal parameter on a procedure.
Solution
Standard
structural recursion
works. Walk the two structures in parallel, checking to see that
they have the same values in the same positions. When one of the
positions holds values the same structure, make a recursive call
to compare them.
But what if ...
~~~~
Recursive Equality Check with Identity Check
Problem
An instance of the recursive structure can contain a reference
to itself as a value, either directly or through mutual recursion.
In the simplest form, this might be a dictionary that contains
itself as a value in a key/vaue pair, as in Smalltalk, where the
global variable Smalltalk is the dictionary of all
global variables, including Smalltalk.
Comparing two instances now raises concerns. Two instances may be
identical, or contain identical components. In such cases, the
standard recursive comparison will never terminate.
Solution
Check for identity first. Recurse only if the two values are
distinct.
But what if...
~~~~
Recursive Equality Check with Cache
Problem
You have two structures that do not share any elements, but they
are structurally isomorphic. For example, this can occur in the
simplest of structures, two one-element maps:
a = { :self => a }
b = { :self => b }
Now, even with an identity test up front, the recursive comparison
will never terminate.
Solution
Maintain a cache of compared pairs. Before you begin to compare
two objects, check to see if the pair is in the cache. If yes,
return true. Otherwise, add the pair to the cache and proceed.
This approach works even though the function has not finished
comparing the two objects yet. If there turns out to be a
difference between the two, the check currently in progress will
find it elsewhere and answer false. There is no need to enter a
recursive check.
~~~~
A variation of this caching technique can also be used in other
situations, such as computing a hash value for a recursive
structure. If in the course of computing the hash you encounter
the same structure again, assume that the value is a suitable
constant, such as 0 or 1. Hashes are only approximations anyway,
so making only one pass over the structure is usually enough. If
you really need a fixpoint, then you can't take this shortcut.
Ruby hashes handle all three of these problems correctly:
a = { :first => :int,
:second => { :first => :int, :second => :int } }
b = { :second => { :second => :int, :first => :int },
:first => :int, }
a == b # --> true
c = { :first => c }
a = { :first => :int,
:second => { :first => c, :second => :int } }
b = { :first => :int,
:second => { :first => c, :second => :int } }
a == b # --> true
a = { :self => a }
b = { :self => b }
a == b # --> true
I don't know if either MRI Ruby or JRuby uses these patterns to
implement their solutions, or if they use some other technique.
-----