Each FST runs across the tape once and output to another tape once. As you can see, each state is only activated if there is a link to it whose input condition was correct, passes the remaining input to the next state, and writes to a separate output tape. This would get activated, set the output tape to just b, and pass the remaining bc to state 2. A link to state 2 matches a and changes that to b. A link is "activated" when its input condition is correct and then gives then next state the adjusted tape.įor example let's say an FST starts with the tape abc at state 1. The entire FST is represented by a set of states and links between them. A tape is essentially a set of inputs like characters in a string. In as simple terms as possible, I understand that an FST is essentially a "thing" that moves from one state to the next based on an input tape and writes to a different output tape. FSTs can be built by a variety of tools that accept an extended regular expression syntax. All the FSTs were combined into a single one using an FST compiler, which produced a single FST that was much smaller than the sum of its parts and ran very fast. I also had an FST for the verb "to be", which would turn "is" into "be+PRESENT+3rd" (3rd person), and similarly for other irregular verbs. My main FST for verbs would turn a regular verb, say "walked", into "walk+PAST". FSTs can do parsing of regular languages into strings in linear time.Īs an example, I once implemented morphological parsing as a bunch of FSTs. The FST starts out in a designated start state and jumps to different states depending on the input, while producing output according to its transition table.įSTs are useful in NLP and speech recognition because they have nice algebraic properties, most notably that they can be freely combined (form an algebra) under composition, which implements relational composition on regular relations (think of this as non-deterministic function composition) while staying very compact. pattern matching).Īn FST consists of a finite number of states which are linked by transitions labeled with an input/output pair. Please note that the details regarding the functionality of foma in that article are somewhat obsolete by now.A finite state transducer (FST) is a finite state automaton (FSA, FA) which produces output as well as reading input, which means it is useful for parsing (while a "bare" FSA can only be used for recognizing, i.e. If you use foma, you can use the following citation for attribution. Contains functions for constraining reduplication (_eq()).
![fst finite state automata fst finite state automata](https://www.shenyanchao.cn/images/lucene-fst/fst/fst-cauchy.png)
![fst finite state automata fst finite state automata](https://images.slideplayer.com/15/4847461/slides/slide_2.jpg)
![fst finite state automata fst finite state automata](https://slideplayer.com/slide/7914674/25/images/3/Dotted+Pair+Notation+1)+FSA+recogniser+for+fox+f+o+x.jpg)
GraphViz (for transducer visualization with the view command).
Fst finite state automata install#
Most of these, if missing, can be installed on Linux systems with apt install Precompiled binaries for Win32, OSX, Linux.apt install foma (for Debian/Ubuntu packaging).Also, more advanced construction methods are available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, boolean operations. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes. It has specific support for many natural language processing applications such as producing morphological analyzers. Foma - A Finite State Compiler and Library Foma - a finite-state compiler and C libraryįoma is a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses.