A State Machine Simulator

This is it: the last day of NaCreSoMo 2014 and the finale of the CS mini-tutorial. We learned how to build a lexer. We learned what a state machine is, and how they are used to match or classify text. And we built a parser, using our lexer, that understands a language that describes state machines.

Today we’re going to go all the way: actually simulating a state machine. You can also just skip to the fun part: playing with the completed simulator.

  1. What is a Lexer?
  2. What are State Machines?
  3. What are Regular Expressions?
  4. What is a Parser?

Where we left off last time, we had a list of states, and each state had a label (name or number) and a list of transitions. What does it take to build an actual state machine out of that?

class StateMachine

First, we need some way to build one.

def initialize(states)

We have a list of all the states, but that makes it hard to jump from one state to another. Let’s build a map instead: a structure where you can look up a state by its name.

@states = {}

states.each do |state|

We should first check that we don’t have any duplicate names.

raise "duplicate state #{state.name}" unless @states[state.name].nil?
@states[state.name] = state

And we should also be looking for the start state. Otherwise, how will we know where to start?

if state.start?
    raise "multiple start states: #{@start_state.name} and #{state.name}" unless @start_state.nil?
    @start_state = state
end

On the flip side, if none of the states are start states, we also have a problem.

end

# We should have exactly one start state.
raise "missing start state" if @start_state.nil?

Now that we’ve built up a map with every state in it, we can check all the transitions to make sure they’re valid.

states.each do |state|
    state.transitions.each do |transition|
        destination = transition.destination_name
        next if destination.nil? # A nil name represents an error transition.
        raise "destination state #{destination} doesn't exist" if @states[destination].nil?
    end
end

Okay, we’ve built a valid state machine! Note that we didn’t check for any final states: it’s valid to have a state machine with no final states (like the stop light machine).

end

Now that we have a state machine, we need to know how to run it. That’s a bit easier: we just go until we run out of input.

def match(input)
    @current_state = @start_state
    input = transition(input) until input.empty?
    return final?
end

This is one of the great things about programming: you can write parts of the program assuming the rest of the program already works. We don’t actually know how to transition yet, but we can still write this match part. (As another example, the state machine uses the output of the parser, but it doesn’t have to know anything about the lexer.)

Actually writing the transition part isn’t that much harder:

def transition(input)

We start off by finding the first transition that matches the input. We do this in order because we want to be able to match some transitions (like "@") before others (like any symbol).

remaining_input = nil
transition_to_take = @current_state.transitions.find do |transition|
    remaining_input = transition.match(input)
    !remaining_input.nil? # If the match attempt didn't return "nil", it succeeded!
end

If we didn’t match anything, we should go to the error state, which we’ll represent with nil.

if transition_to_take.nil?
    @current_state = nil

But otherwise we should look up the transition’s destination in the map of all states.

else
    @current_state = @states[transition_to_take.destination_name] 
end

If we end up in an error state, we should stop processing the input, so let’s pretend we’ve reached the end:

return "" if @current_state.nil?

Otherwise, the transition will have told us what input is left. We can send that back to continue being matched.

return remaining_input

And we’re done!

end

Here’s the entire StateMachine class, along with the new helpers we need to match plain-transitions and any-transitions.

class StateMachine
  def initialize(states, whitespace: true)
    @whitespace_sensitive = whitespace

    # Use a mapping from state names to State objects,
    # rather than just a list.
    @states = {}

    # Add all the states to the map.
    states.each do |state|
      raise "duplicate state #{state.name}" unless @states[state.name].nil?
      @states[state.name] = state

      if state.start?
        raise "multiple start states: #{@start_state.name} and #{state.name}" unless @start_state.nil?
        @start_state = state
      end
    end

    # We should have exactly one start state.
    raise "missing start state" if @start_state.nil?

    # Check all the transitions.
    states.each do |state|
      state.transitions.each do |transition|
        destination = transition.destination_name
        next if destination.nil? # A nil name represents an error transition.
        raise "destination state #{destination} doesn't exist" if @states[destination].nil?
      end
    end
  end

  def match(input, verbose: false)
    @current_state = @start_state
    until input.empty?
      input = input.lstrip unless @whitespace_sensitive
      puts "\tstate #{self.current_state.name}, input '#{input}'" if verbose
      input = transition(input)
    end
    return final?
  end

  def transition(input)
    # Find the first transition that matches the input.
    remaining_input = nil
    transition_to_take = @current_state.transitions.find do |transition|
      remaining_input = transition.match(input)
      !remaining_input.nil? # If the match attempt didn't return "nil", it succeeded!
    end

    if transition_to_take.nil?
      @current_state = nil
    else
      @current_state = @states[transition_to_take.destination_name] 
    end

    # Stop processing input if we're now in an error state.
    return "" if @current_state.nil?

    return remaining_input
  end

  def current_state
    @current_state
  end

  def error?
    @current_state.nil?
  end

  def final?
    !error? && @current_state.final?
  end
end

class PlainTransition
  # A plain transition matches input if the input starts with the text.
  def match(input)
    # Index 0 is the first letter, which we know is a quote.
    # Index -1 is the last letter, which should also be a quote.
    # So we test letters 1 through -2...
    if input.start_with?(@quoted_text[1..-2])
      # ...and chop off the first <length> letters from the input if there's a match.
      return input[(@quoted_text.length-2)..-1]
    end
    # Otherwise, return the special "nil" value for "no match".
    return nil
  end
end

class AnyTransition
  def match(input)
    # Chop off the first letter if we get a match.
    case @kind
    when "letter"
      return input[1..-1] if letter?(input[0])
    when "number"
      return input[1..-1] if digit?(input[0])
    when "space"
      return input[1..-1] if space?(input[0])
    when "symbol"
      return input[1..-1] unless letter?(input[0]) || digit?(input[0]) || space?(input[0])
    end
    return nil
  end
end

Whew! You can view the complete working source on Github.


That is it. We’ve built a language for state machines, and we can now turn a description of a state machine into a simulation.

Want to try it?

Remember:

  • You can have more than one final state.
  • State labels can be quoted text too ("name"), not just numbers.

This version is built in CoffeeScript rather than Ruby. I condensed it a bit and used some more advanced features. You can find it on Github as well.


So that’s it! I’m…pretty sure I lost all of my readers in this tutorial. I’m definitely sure the best sections are the diagram-crazy, non-coding ones on state machines and regular expressions. But it was still fun getting here.

Extra credit: add a “description” production to the parser and turn the state machine into an adventure game.