A State Machine Simulator

This is it: the last day of NaCreSoMo 2014 and the finale of the CS mini-tutorial. We learned how to build a lexer. We learned what a state machine is, and how they are used to match or classify text. And we built a parser, using our lexer, that understands a language that describes state machines.

Today we’re going to go all the way: actually simulating a state machine. You can also just skip to the fun part: playing with the completed simulator.

  1. What is a Lexer?
  2. What are State Machines?
  3. What are Regular Expressions?
  4. What is a Parser?

Where we left off last time, we had a list of states, and each state had a label (name or number) and a list of transitions. What does it take to build an actual state machine out of that?

<div>
class StateMachine

</div>

First, we need some way to build one.

<div>
def initialize(states)

</div>

We have a list of all the states, but that makes it hard to jump from one state to another. Let’s build a map instead: a structure where you can look up a state by its name.

<div>
@states = {}
	
	states.each do |state|

</div>

We should first check that we don’t have any duplicate names.

<div>
raise "duplicate state #{state.name}" unless @states[state.name].nil?
	@states[state.name] = state

</div>

And we should also be looking for the start state. Otherwise, how will we know where to start?

<div>
if state.start?
		raise "multiple start states: #{@start_state.name} and #{state.name}" unless @start_state.nil?
		@start_state = state
	end

</div>

On the flip side, if none of the states are start states, we also have a problem.

<div>
end
	
	# We should have exactly one start state.
	raise "missing start state" if @start_state.nil?

</div>

Now that we’ve built up a map with every state in it, we can check all the transitions to make sure they’re valid.

<div>
states.each do |state|
		state.transitions.each do |transition|
			destination = transition.destination_name
			next if destination.nil? # A nil name represents an error transition.
			raise "destination state #{destination} doesn't exist" if @states[destination].nil?
		end
	end

</div>

Okay, we’ve built a valid state machine! Note that we didn’t check for any final states: it’s valid to have a state machine with no final states (like the stop light machine).

<div>
end

</div>

Now that we have a state machine, we need to know how to run it. That’s a bit easier: we just go until we run out of input.

<div>
def match(input)
		@current_state = @start_state
		input = transition(input) until input.empty?
		return final?
	end

</div>

This is one of the great things about programming: you can write parts of the program assuming the rest of the program already works. We don’t actually know how to transition yet, but we can still write this match part. (As another example, the state machine uses the output of the parser, but it doesn’t have to know anything about the lexer.)

Actually writing the transition part isn’t that much harder:

<div>
def transition(input)

</div>

We start off by finding the first transition that matches the input. We do this in order because we want to be able to match some transitions (like "@") before others (like any symbol).

<div>
remaining_input = nil
	transition_to_take = @current_state.transitions.find do |transition|
		remaining_input = transition.match(input)
		!remaining_input.nil? # If the match attempt didn't return "nil", it succeeded!
	end

</div>

If we didn’t match anything, we should go to the error state, which we’ll represent with nil.

<div>
if transition_to_take.nil?
		@current_state = nil

</div>

But otherwise we should look up the transition’s destination in the map of all states.

<div>
else
		@current_state = @states[transition_to_take.destination_name] 
	end

</div>

If we end up in an error state, we should stop processing the input, so let’s pretend we’ve reached the end:

<div>
return "" if @current_state.nil?

</div>

Otherwise, the transition will have told us what input is left. We can send that back to continue being matched.

<div>
return remaining_input

</div>

And we’re done!

<div>
end

</div>

Here’s the entire StateMachine class, along with the new helpers we need to match plain-transitions and any-transitions.

<div>
class StateMachine
	  def initialize(states, whitespace: true)
	    @whitespace_sensitive = whitespace
	
	    # Use a mapping from state names to State objects,
	    # rather than just a list.
	    @states = {}
	
	    # Add all the states to the map.
	    states.each do |state|
	      raise "duplicate state #{state.name}" unless @states[state.name].nil?
	      @states[state.name] = state
	
	      if state.start?
	        raise "multiple start states: #{@start_state.name} and #{state.name}" unless @start_state.nil?
	        @start_state = state
	      end
	    end
	
	    # We should have exactly one start state.
	    raise "missing start state" if @start_state.nil?
	
	    # Check all the transitions.
	    states.each do |state|
	      state.transitions.each do |transition|
	        destination = transition.destination_name
	        next if destination.nil? # A nil name represents an error transition.
	        raise "destination state #{destination} doesn't exist" if @states[destination].nil?
	      end
	    end
	  end
	
	  def match(input, verbose: false)
	    @current_state = @start_state
	    until input.empty?
	      input = input.lstrip unless @whitespace_sensitive
	      puts "\tstate #{self.current_state.name}, input '#{input}'" if verbose
	      input = transition(input)
	    end
	    return final?
	  end
	
	  def transition(input)
	    # Find the first transition that matches the input.
	    remaining_input = nil
	    transition_to_take = @current_state.transitions.find do |transition|
	      remaining_input = transition.match(input)
	      !remaining_input.nil? # If the match attempt didn't return "nil", it succeeded!
	    end
	
	    if transition_to_take.nil?
	      @current_state = nil
	    else
	      @current_state = @states[transition_to_take.destination_name] 
	    end
	
	    # Stop processing input if we're now in an error state.
	    return "" if @current_state.nil?
	
	    return remaining_input
	  end
	
	  def current_state
	    @current_state
	  end
	
	  def error?
	    @current_state.nil?
	  end
	
	  def final?
	    !error? && @current_state.final?
	  end
	end
	
	class PlainTransition
	  # A plain transition matches input if the input starts with the text.
	  def match(input)
	    # Index 0 is the first letter, which we know is a quote.
	    # Index -1 is the last letter, which should also be a quote.
	    # So we test letters 1 through -2...
	    if input.start_with?(@quoted_text[1..-2])
	      # ...and chop off the first <length> letters from the input if there's a match.
	      return input[(@quoted_text.length-2)..-1]
	    end
	    # Otherwise, return the special "nil" value for "no match".
	    return nil
	  end
	end
	
	class AnyTransition
	  def match(input)
	    # Chop off the first letter if we get a match.
	    case @kind
	    when "letter"
	      return input[1..-1] if letter?(input[0])
	    when "number"
	      return input[1..-1] if digit?(input[0])
	    when "space"
	      return input[1..-1] if space?(input[0])
	    when "symbol"
	      return input[1..-1] unless letter?(input[0]) || digit?(input[0]) || space?(input[0])
	    end
	    return nil
	  end
	end

</div>

Whew! You can view the complete working source on Github.


That is it. We’ve built a language for state machines, and we can now turn a description of a state machine into a simulation.

Want to try it?

Remember:

  • You can have more than one final state.
  • State labels can be quoted text too ("name"), not just numbers.

This version is built in CoffeeScript rather than Ruby. I condensed it a bit and used some more advanced features. You can find it on Github as well.


So that’s it! I’m…pretty sure I lost all of my readers in this tutorial. I’m definitely sure the best sections are the diagram-crazy, non-coding ones on state machines and regular expressions. But it was still fun getting here.

Extra credit: add a “description” production to the parser and turn the state machine into an adventure game.

blog comments powered by Disqus