David Michael Barr

A relocatable, immutable, randomised, ternary search tree (part 2)

19 Jun, 2010 – Canberra

Following on from my last post, I have determined the requirements for the data structure. The structure must provide a map from (revision, path) pairs to file-system nodes, i.e. files and directories. Four basic operations are required: insert, remove, replace, graft, diff. What is graft? It takes all entries in the map descendent from a common prefix and copies them to new keys with the common prefix replaced by a new prefix. Diff compares the keys below two given prefixes and computes a sequence of add, remove and replace operations to produce one from the other. Add, remove and replace operations can be represented by a single operation, delta(a, b): delta(nil, b) = add(b), delta(a, nil) = remove(a), delta(a, b) where a and b not nil = replace(a, b). I believe that storing the keys in a trie, of which there are many implementations, will require memory bounded by a constant multiple of the entropy of the key set. From Bentley's paper it appears that a ternary search tree is a fast and efficient trie representation. I propose that when coupled with an immutable tree operations that the memory required for each change set will also be bounded by a constant multiple of the entropy of its key set. h2. Related reading "Purely functional data structures":http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4. "Ternary Search Trees":http://www.drdobbs.com/windows/184410528 By Jon Bentley and Bob Sedgewick, April 01, 1998 "Fast algorithms for sorting and searching strings":http://www.cs.princeton.edu/~rs/strings/paper.pdf pre. @inproceedings{314321, author = "Bentley, Jon L. and Sedgewick, Robert", title = "Fast algorithms for sorting and searching strings", year = "1997", address = "Philadelphia, PA, USA", booktitle = "SODA '97: Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms", isbn = "0-89871-390-0", location = "New Orleans, Louisiana, United States", pages = "360--369", publisher = "Society for Industrial and Applied Mathematics", } Raimund G. Seidel and Cecilia R. Aragon, Randomized Search Trees, Algorithmica, 16(4/5):464-497 (1996). Also in 30th Annual Symposium on Foundations of Computer Science, pages 540-545, Research Triangle Park, North Carolina, 30 October-1 November 1989. IEEE. Conrado Martinez and Salvador Roura, Randomized Binary Search Trees, Journal of the ACM, 45(2):288-323, March 1998.