Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / algorithm

Incremental construction of DFA in F#

5.00/5 (2 votes)
3 Aug 2016MIT6 min read 14.2K   80  
F#-implementation of algorithm for incremental construction of minimal deterministic finite automata through adding and removing strings.

Image 1

Introduction

This is implementation of the algorithm presented in the paper [Carrasco and Forcada 2002]1, that, in its turn, is founded on the work [Daciuk et al. 2000]2. They described the method for modifying of minimal deterministic finite automaton through adding (or removing) a string into the language accepting by this automaton. The algorithm is given in pseudo-code, and wouldn't be difficult to implement it in any imperative language. But it seemed more interesting to write working realization in functional language, such as F#, that is positioned as a powerful tool for language oriented programming and writing parsers3, so the code may find some use in this areas.

Background

A deterministic finite automaton (DFA) M is a tuple (Q, Σ, δ, q0, F), where:4

  • Q is a finite set of states
  • Σ is a finite set of input symbols
  • δ: Q × Σ → Q is a transition function
  • q∈ Q is an initial or start state
  • ⊆ Q is a set of final or accepting states

Automaton M accepts string ω = a1...an ∈ Σ* if M starts in initial state q0, for each ai moves to next state according to δ, and finally stays in one of the accepting states from F. Each automaton M recognizes some regular language L(M), that means all ω ⊆ L(M) are accepted by M. For example, the automaton

  1. Q = {0, 1, 2, 3, 4, 5},
  2. Σ = {a, b, r},
  3. δ is 
     
    0
    1
    2
    3
    4
    5
    a
    -
    2
    -
    -
    5
    -
    b
    1
    -
    4
    -
    -
    4
    r
    -
    -
    3
    -
    -
    -
  4. q= 0,
  5. = {2, 3, 5}

accepts strings bar, ba, baba, bababa and so forth, which are its language. Function δ, as often, is defined by the transition table. Not all cells are full, that means the δ is partially-defined on Q × Σ. In theory it can be easily handled by adding special absorbing state, but in practice it's no need. In programming language we can consider that result of δ in such cases is null, or is None for F#, what have been the choice.

Here is diagram view of the automaton and illustration of its steps during process string ω = baba:

Image 2

If in some step there is no suitable transition for next symbol of input string ω, or all symbols are passed and current state isn't final, then ω is not accepted (i.e., is rejected) by automaton. For example, the string bra is rejected by the automaton above and not belongs to its language.

Why do we need it

The incremental construction of DFA means adding strings one by one to the language recognized by this automaton. Why do we want doing it? Lets see the automaton below:

abc

It simply recognizes the word son. Now we add to it word win:

Image 4

We'll discuss later how it can be done. After minimization the resulting automaton contains two different paths for so and wi and one transition for n. In fact, some branches for the common parts were be merged. We could think of it as a some sort of compression. What will happen if now we add to this DFA the word wind? It will increase much more, because there is no word sond, so common branches cannot be stay here anymore:

Image 5

It's not so interesting. But if we add to previous DFA the word wing instead of wind, and then the word song, automaton will decrease:

Image 6

Image 7

That's why we keep automaton minimized. It provides less computational and memory costs in case where DFA recognizes strings that have some common structure rather than set of "independent" or random strings (and that's the purpose of using automata in practice). Now the DFA above accepts the words {son, song, win, wing} and contains only six states and six transitions. Consider that recognized language may be finite (like here) or infinite (like in background section).

Minimization is a quite resource intensive operation. According to straight approach for adding another string we should represent it as a simple automaton, perform union of two automata and then minimize resulting DFA. Instead of this chain of operations, the method proposed by Daciuk and modified by Carrasco and Forcada is to construct a minimal automaton in a single phase by adding new strings one by one and minimizing the resulting automaton on-the fly.2

Implementation

Key points of the algorithm will be illustrated by examples. We'll add string bra to the automaton from the background section. The project itself doesn't contain visualisation, but the samples have been created separately for better clarity.

DFA is defined as a Record, whose values are named according to mathematical definition:

F#
type DFA =
    { Σ : Set<char>;
      Q : Set<int>;
      q0 : int;
      F : Set<int>;
      δ : Map<int * char, int>
    }

States are an integer numbers, symbols are represented by the char type. ΣQ and F are defined as a Setsδ is a Map where key is a Tuple of state and symbol and value is resulting state. It also has method Accept to provide processing input string:

F#
///<summary> Trying to accept the input string </summary>
///<param name="ω"> Input string </param>
///<returns> True if ω is accepted by automaton or False in other case </returns>
member M.Accept ω =
    ///function returns extended mapping δ*(q, ω)
    let rec f ω q =
        match ω with
        | [] -> (Some q)
        ///return None or recursivly applies f to result of mapping
        | a :: x -> M.δ.TryFind (q, a) |> Option.bind (f x)
    ///finds δ*(q0, ω) and checks that result is a final state
    f ω M.q0 |> Option.fold (fun _ -> M.F.Contains) false

There are an additional functions to specify some automaton modifications helpful for the algorithm: addStateremoveState and addTransition:

F#
///adds given state q to automaton M. If "isFinal" is set to true,
///also adds it to the final states
let addState q isFinal M =
    { M with
        Q = M.Q.Add q;
        F = if isFinal then M.F.Add q else M.F }

///removes given state q from automaton M. If q is initial state,
///generates an InvalidOperationException
let removeState q M =
    if q <> M.q0
    then { M with
            Q = M.Q.Remove q;
            F = M.F.Remove q;
            δ = Set.foldBack (fun a -> Map.remove (q, a)) M.Σ M.δ }
    else invalidOp "The initial state cannot be removed"

///adds given transition ((q1, a), q2) to automaton M. If states q1, q2
///or symbol a are not in the automaton, generates an ArgumentException
let addTransition ((q1, a), q2) M =
    if M.Q.Contains q1 && M.Q.Contains q2 && M.Σ.Contains a
    then { M with δ = Map.add (q1, a) q2 M.δ }
    else invalidArg "((q1, a), q2)" "The transition contains wrong elements"

They return copy of the DFA record with modified respective values.

Functions addString and removeString simply call addOrRemove function with true or false as a parameter. We'll see later that adding and removing string differ in just one detail.

The addOrRemove function do follow:

  • Creates new state that will be initial (as states are defined by numbers, that it is simply next number following by the maximum state's identifier):
F#
let q0' = M.Q.MaximumElement + 1
  • Creates and adds to the automaton new states starting from q0' (and collects them in a list) using function clone:
F#
let (M', added) = clone flag ω [] q0' (Some M.q0) M
F#
///takes string ω, list of already created states "added", new state q',
///state q, automaton M and recursivly adds q' to M for all δ*(q, ω)
let rec clone flag ω added q' q M =
    match ω with
    //add the last of created states to automaton (and to final states
    //if flag = true, e.g. the string ω is added to automaton)
    | [] -> Option.foldBack (cloneTrs q') q (addState q' flag M), q'::added
    | a :: x ->
        match q with
        //if q is an absorption state (e.g. None),
        //add q' to automaton as a queue state
        | None -> None,
                  M |> addState q' false
        //if q is a nonabsorption state, add q' to automaton
        //as a cloned state of q with all transitions created
        | Some v -> M.δ.TryFind (v, a),
                    M |> addState q' (M.F.Contains v) |> cloneTrs q' v
        //create new state, add q' to list of created states and call
        ///the function again for new nonabsorption state (or None),
        ///augmented automaton and next symbol of ω
        ||> clone flag x (q'::added) (q' + 1)

Image 8

We successively find next state of the automaton step-by-step for each symbol in input string and add new state to the DFA depends on result. If at the another step the current state exists, then its transitions should be cloned to the new state by function cloneTrs:

F#
///takes state q, cloned state q', automaton M and returns
///new automaton with the union of δ and all transitions
///coming from q where q is replaced by q'
let cloneTrs q' q M =
    { M with δ = Set.foldBack (fun a ->
                    Option.foldBack (Map.add (q', a))
                                    (M.δ.TryFind (q, a)))
                              M.Σ M.δ }

Those new states are called cloned. If result of applying transition function to the current state doesn't exists, then state added at this step will be called queue in the terminology of [Carrasco and Forcada 2002]1.

The flag determines whether we add or remove string. In second case the last added state is not belong to final states, and this is all the difference.

  • Adds transitions between new states with symbols from ω and change initial state for q0':
F#
{ Seq.foldBack2 (fun a (q2, q1) ->
    addTransition ((q1, a), q2)) (List.rev ω) (List.pairwise added) M'
  with q0 = q0'}

Image 9

You can see that in code transitions are added in reverse order, but actually it doesn't matter here.

  • Then, starting from the "old" initial state and moving by transitions according to ω, removes those states that became unreacheable using function eliminate:
F#
|> eliminate (Some M.q0) ω M.Q​
F#
///takes state q, input string ω, register R, automaton M, and
///recursivly removes all unreacheable states from R and M for all δ*(q, ω)
let rec eliminate q ω R M =
    match q with
    | Some v when unreachable v M.δ ->
        match ω with
        | [] -> Set.remove v R, removeState v M
        | a :: x -> eliminate (M.δ.TryFind (v, a))
                              x
                              (R.Remove v)
                              (removeState v M)
    | _ -> R, M
F#
///takes state q, transitions δ and checks
///whether δ has transitions incoming into q or has not
let unreachable q = not << Map.exists (fun _ -> (=) q)

Image 10

  • Finally, merges equivalent states and return resulting automaton using function repreg:
F#
||> repreg added
F#
///takes reverse list of created cloned and queue states, register R,
///automaton M and recursivly merges equivalent states from list and R
let rec repreg added (R : Set<int>) M =
    match added with
    | [] -> M
    | q :: tail ->
        match Seq.tryFind (equiv M q) R with
        //if q from added states and p from R are equivalent, merge them
        //and call the function again for rest states and new automaton
        | Some p -> repreg tail R (merge p q M)//{ M with δ = merge p q M.δ }
        //if not, q is added to the register R
        | None -> repreg tail (R.Add q) M
F#
///takes automaton M, state p from register, state q from Q
///and checks for the equivalence of p and q
let equiv M p q =
    //both states are accepting or not
    (M.F.Contains p = M.F.Contains q)
    //all outgoing transitions lead to the same states
    && not <| Set.exists (fun a ->
        M.δ.TryFind (q, a) <> M.δ.TryFind (p, a)) M.Σ

///takes state p from R, state q from Q, automaton M and return new automaton
///where q is removed and transitions coming into q are redirected into p
let merge p q M =
    { M with
        Q = M.Q.Remove q;
        F = M.F.Remove q;
        δ = Map.map (fun _ -> function
                    | q_next when q_next = q -> p
                    | q_next -> q_next)
                M.δ }

Image 11

Using the code and remarks

The attachment for this article is a Visual Studio 2015 solution. The solution consists of one project - console application for .NET Framework 4.5, F# 4.0. The code is presented in Program.fs source file and contains example of defining automaton, adding some string to it and removing other string. The results are printed to console (automata displayed as a lists of states and transitions). All you need to get quick look on the project's work is to open it in Visual Studio and run it by F5 key. The code can also be processed in some interpretator, such as http://www.tryfsharp.org/Create, by extraction the launching snippet from main function into REPL queries.

This realization is one of many possible (basically due to the fact what F# allows to do one thing in different ways). I've tried to write code clear enough to understand the concept, not caring about algorithm complexity and optimization. Moreover, in some cases of F# development best performance demands the combining of functional and imperative approach. You should take it into account if you want to use this method in practice.

Although the code takes less than 200 lines, it probably can be improved (and may contain bugs). Feel free to comment your remarks and to make commits on project's GitHub page IncDFA_algorithm.

Automata pictures are created with FSM simulator.

History

2016-08-03: publication.

References

  1. Rafael C. Carrasco, Mikel L. Forcada. 2002. Incremental Construction and Maintenance of Minimal Finite-State Automata. Computational Linguistics.
  2. Daciuk Jan, Stoyan Mihov, Bruce W. Watson, Richard E. Watson. 2000. Incremental Construction of Minimal Acyclic Finite-State Automata. Computational Linguistics.
  3. Kris Smith. 2009. F# for Architects: Hitting the sweet spot.
  4. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman. 2001. Introduction to Automata Theory, Languages, and Computation (2 ed.). Addison Wesley.

License

This article, along with any associated source code and files, is licensed under The MIT License