Counting distinct n-state DFAs

On the occasion of the confirmation of BB(5) (a problem involving the enumeration of Turing machines), Thomas Baruchel asks the following simpler question about finite state machines:

Given an alphabet of size \(L\), how many distinct languages over that alphabet are recognizable by a complete DFA with \(n\) states?

For \(L=1\), this is the partial sums of OEIS A059412: A059412(\(n\)) counts the number of distinct unary languages recognizable by a minimal complete DFA with \(n\) states, where a DFA is defined to be “minimal” if and only if no smaller DFA recognizes the same language.

For \(L=2\), this is the partial sums of A129622, which counts the number of distinct binary languages recognizable by a minimal complete DFA with \(n\) states. (However, as of this writing, A129622(1)=2 is omitted from OEIS, which means you’ll have to remember to add it to those partial sums yourself.)

For \(L=3\), the count of distinct languages over the alphabet {a, b, c} is not yet in the OEIS, but its analogous sequence would begin \(2, 112, 41928,\ldots\).

It seems more natural to me to count the total number of languages recognizable by \(n\)-state DFAs; you can simply subtract adjacent elements to obtain the OEIS sequence. The count will always be an even number, because for any DFA that accepts a language including the empty string, you can turn it into a DFA that accepts the exactly complementary language (not including the empty string) by simply flipping all the states’ “accept” bits.

In the following table, column L=1 is the partial sums of A059412. Column L=2 is the partial sums of A129622. Row n=2 seems to be \(2(2^{2L} - 2^L + 1)\) (i.e., 2×A020515, or 1+A092440).

Count of distinct languages over an alphabet of size L
which are recognizable by a DFA with at most n states.

      L=1   L=2      L=3     L=4     L=5      L=6   L=7    L=8
n=1   2     2        2       2       2        2     2      2
n=2   6     26       114     482     1986     8066  32514  130562
n=3   18    1054     42042   1353214 39676458
n=4   48    57068            2048646
n=5   126   3762374
n=6   306   290480170
n=7   738   25784367022
n=8   1716

The most impressive of these numbers are just copied from OEIS. Simple Python and C++ code for the smaller numbers is here.

The six distinct languages over alphabet {a} recognizable by 2-state DFAs are these three and their complements:

.*
.*.
(..)*

The twenty-six distinct languages over alphabet {a, b} recognizable by 2-state DFAs are these 13 and their complements:

.*    
..*    
(..)*    
a* b*
.*a .*b
(a|b.)* (b|a.)*
(.a*b)* (.b*a)*
(a|ba*b)* (b|ab*a)*

Equivalence “up to recoloring of the alphabet”

Notice that the above table has only 8 languages and their complements “up to recoloring of the alphabet”; for example, to recognize the language b* it suffices to map all bs in your input to as and vice versa, and then feed the resulting string to the DFA for a*.

To recognize .* it suffices to map both letters to a and then use the DFA for a*; but the reverse isn’t true — you can’t use the DFA for .* to recognize the language a* — and so we don’t consider those two languages equivalent.

As the alphabet size \(L\) increases, the count of languages “equivalent up to recoloring of the alphabet” also increases. Maybe we should count each equivalence class only once. This makes the count of distinct \(n\)-state DFAs converge to a well-defined sequence \(a(k) = f(n=k, L\geq k^2) = 2, 26,\ldots\).

Count of distinct languages over an alphabet of size L
which are recognizable by a DFA with at most n states,
up to recoloring of the alphabet.

      L=1   L=2      L=3     L=4     L=5
n=1   2     ...
n=2   6     16       24      26      ...
n=3   18    540      7038
n=4   48    28640
n=5   126   1882192
n=6   306
n=7   738

The twenty-six distinct-up-to-recoloring languages recognizable by any 2-state DFA are these 13 and their complements. The first eight rows of this table match the eight rows of the previous table; the next five rows are new. This set of twenty-six DFAs is closed — if you add to any one of these diagrams the state transitions for a new letter e, you’ll find that you’ve merely duplicated one of the other diagrams in the set.

a*
aa*
(aa)*
a*
(a|b)*a
(a|b(a|b))*
((a|b)a*b)*
(a|ba*b)*
(b|c|(a(a|c)*b))*
(b|c|(ac*(a|b)))*
(c|((a|b)(b|c)*a))*
(c|((a|b)b*(a|c)))*
(c|d|((a|b)(b|d)*(a|c)))*
Posted 2024-07-27