Module CCLevenshtein

module CCLevenshtein: sig .. end

Levenshtein distance

We take inspiration from http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata for the main algorithm and ideas. However some parts are adapted


type 'a sequence = ('a -> unit) -> unit 
type 'a gen = unit -> 'a option 

Abstraction over Strings

Due to the existence of several encodings and string representations we abstract over the type of strings. A string is a finite array of characters (8-bits char, unicode runes, etc.) which provides a length operation and a function to access the n-th character.
module type STRING = sig .. end

Continuation list

This data structure is used to represent a list of result that is evaluated only as far as the user wants. If the user only wants a few elements, she doesn't pay for the remaining ones.

In particular, when matching a string against a (big) set of indexed strings, we return a continuation list so that, even if there are many results, only those actually asked for are evaluated.

type 'a klist = unit -> [ `Cons of 'a * 'a klist | `Nil ] 
val klist_to_list : 'a klist -> 'a list
Helper for short lists.

Signature

The signature for a given string representation provides 3 main things:

A possible use of the index could be:
let words = CCIO.with_in "/usr/share/dict/words"
  (fun i -> CCIO.read_all i |> CCString.Split.list_cpy ~by:"\n");;

let words = List.map (fun s->s,s) words;;
let idx = CCLevenshtein.Index.of_list words;;

CCLevenshtein.Index.retrieve ~limit:1 idx "hell" |> CCLevenshtein.klist_to_list;;

module type S = sig .. end

Functor


module Make (Str : STRING) : S 
  with type string_ = Str.t
  and type char_ = Str.char_

Default instance: string


include CCLevenshtein.S
val debug_print : Pervasives.out_channel -> automaton -> unit