01.21.2019 Pac-Man Specification and Synthesis Milestone Report

This milestone was reached: in the workbench demo UI, a user runs a program that draws all 16 of the Namco Pac-Man colors. The work primarily consisted of:

- Path compiler. I created a new synthesis tool that reads a text file enumerating known types and functions (with their type signatures), along with hints for the desired input and output types. The tool builds a type graph in memory (and produces a GraphViz visualization), solves for the shortest path between the input and output types, and constructs a flip program: each function-edge corresponds to a known flip subprogram that is loaded from disk and laid into place in the constructed program, with suitable bit copying to stitch inputs to outputs.
- Loop inference. When faced with a vector output type and a source type whose range can fill the vector, the path compiler assumes that repetition is desired and synthesizes an unrolled loop to fill the output vector. This mechanism is currently somewhat peculiar to the immediate demo target and will likely need some modification to generalize to future tasks.
- Mathematical model. Along the way, I was able to refine the mathematical model of the flip machine, which led to startling connections to linear algebra and the classical linear groups. More on this below.
- Workbench UI. The Workbench UI has been improved toward becoming fully document- or project-oriented.
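
The shortest-path step of the path compiler can be sketched as a breadth-first search over the type graph. The signatures below are illustrative stand-ins for entries in the tool's text file, not the actual demo's types:

```python
from collections import deque

# Hypothetical signature entries: (function name, input type, output type).
# The real tool loads these from a text file of known types and functions.
SIGNATURES = [
    ("pixel_to_index", "Pixel", "PaletteIndex"),
    ("index_to_rgb", "PaletteIndex", "RGB"),
    ("rgb_to_output", "RGB", "OutputWord"),
]

def shortest_path(src, dst):
    """BFS over the type graph: nodes are types, edges are known functions."""
    frontier = deque([(src, [])])
    seen = {src}
    while frontier:
        t, path = frontier.popleft()
        if t == dst:
            return path  # function names in application order
        for name, t_in, t_out in SIGNATURES:
            if t_in == t and t_out not in seen:
                seen.add(t_out)
                frontier.append((t_out, path + [name]))
    return None

print(shortest_path("Pixel", "OutputWord"))
# -> ['pixel_to_index', 'index_to_rgb', 'rgb_to_output']
```

In the real tool, each edge on the returned path would then be replaced by its flip subprogram, stitched together bit-to-bit.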

Observations

Linear Computation. The Flip Machine was conceived as a model of computation based on a minimal finite group action on a set of bits. While articulating the action of this group, I eventually realized that a more natural and complete perspective is to view the bit state space as a binary vector space. In this view, {0, 1} with modular addition and multiplication is the finite field with two elements, and with N of these bits we have an N-dimensional vector space. The original "flip group" plays the role of the abelian additive group of the vector space.
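
This reading can be made concrete in a few lines: with the N-bit state packed into an integer, addition in F_2^N is bitwise XOR, and the original flip operation is addition of a standard basis vector (a sketch, not the actual machine implementation):

```python
# N-bit state as an int; addition in F_2^N is XOR, and a "flip" of bit i
# is addition of the i-th standard basis vector e_i.
N = 8

def add(x, y):
    """Vector addition in F_2^N (bitwise XOR)."""
    return x ^ y

def flip(x, i):
    """The flip-group generator: add basis vector e_i to the state."""
    return x ^ (1 << i)

x = 0b00001111
assert add(x, x) == 0             # every element is its own additive inverse
assert flip(flip(x, 3), 3) == x   # each flip is an involution
```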

Things get really interesting now. When viewed as a flip group, we could ask the question -- what if we only looked at functions that are automorphisms of the bit space? When viewed as a vector space, the corresponding question becomes -- what if we only looked at functions that are invertible linear transformations of the bit space? The set of all such functions is known classically as the general linear group of the vector space; in the case of our binary vector space, the only scalar is 1, so this group coincides with the projective special linear group PSL(n, 2), which is a simple group for n >= 3. (Simple groups are sort of akin to prime numbers.) The projective special linear groups are one class of groups in the Classification of Finite Simple Groups.

Also note that this view of computation actually has nothing to do with Flip per se; it applies to all computation on two-valued bits. Additionally, the Hamming distance is a metric that bestows the binary vector space with a decent topology.
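
For instance, the Hamming distance between two bit states is just the weight of their difference vector:

```python
def hamming(x, y):
    """Hamming distance: the number of set bits in the difference x + y (XOR)."""
    return bin(x ^ y).count("1")

assert hamming(0b1010, 0b0110) == 2
# triangle inequality spot check
assert hamming(0b1010, 0b0001) <= hamming(0b1010, 0b0110) + hamming(0b0110, 0b0001)
```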

Connecting this to program synthesis and Pac-Man, the question is -- is there any advantage to adopting the constraint of linearity and invertibility in the function-programs we choose? Currently, the Flip synthesis tools do not adopt this constraint -- they readily produce programs which compute functions on the bit space that are not linear. There is a fairly obvious way to modify the tools to produce only programs that compute invertible linear transformations: the task can be posed as solving for a binary matrix that satisfies an equation expressing the desired inputs and outputs, which translates directly to a boolean satisfiability problem (SAT). While SAT is NP-complete in general, SAT solver libraries can quickly find solutions for problems with thousands of variables. The solvability of this task rests on a small lemma which I will attempt a proof for: every binary vector space operator is the restriction of an invertible linear operator. I believe you can just add some number of bits to "unpack" the nonlinearities. A related interesting thought: the number of additional bits that need to be added to compute an equivalent linear function could perhaps serve as an interesting measure of nonlinearity.
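
Before reaching for a SAT solver, the easy case is worth seeing concretely: when the specification happens to give the image of each standard basis vector e_i, the matrix can be read off directly (column i is the image of e_i), and invertibility reduces to a rank check over F_2. A minimal sketch with a made-up cyclic-shift spec (not the report's actual SAT formulation):

```python
# Hypothetical spec: the image of each basis vector e_i (here a cyclic bit
# shift, which happens to be linear and invertible).
N = 4
spec = {1 << i: 1 << ((i + 1) % N) for i in range(N)}

cols = [spec[1 << i] for i in range(N)]  # column i of M = image of e_i

def apply(cols, x):
    """Compute M @ x over F_2 by XORing the columns selected by x's set bits."""
    out = 0
    for i, c in enumerate(cols):
        if (x >> i) & 1:
            out ^= c
    return out

def rank_gf2(vectors):
    """Rank over F_2 via the standard XOR-basis reduction."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

assert all(apply(cols, e) == img for e, img in spec.items())
assert rank_gf2(cols) == N  # full rank: M is invertible, i.e. M is in GL(4, 2)
```

The SAT encoding would be needed in the general case, where the specified inputs are not a basis and the matrix entries must be solved for as boolean variables.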

Here are some potential advantages to adopting the linear computation constraint:

- Massive reduction in size of the space of possible programs. If one ends up searching the space of all possible programs for a suitable program (I haven't had to resort to this yet), a smaller space is better. It's encouraging to me that this reduction in size aligns with my vague intuition months ago that utilizing underlying algebraic structure could benefit program synthesis. To be specific about the reduction in size, consider these sizes when dealing with just 8 bits:

| Description | Set | Order (size) | Approximation |
| --- | --- | --- | --- |
| Functions from 8 bits to 8 bits | {F_2^8}^{F_2^8} | 256^256 | ~ 10^616 |
| Invertible functions from 8 bits to 8 bits | S_{F_2^8} | 256! | ~ 10^507 |
| Invertible linear functions from 8 bits to 8 bits | PSL(8, 2) | prod_{i=0}^{7} (2^8 - 2^i) | ~ 10^19 |
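
These counts can be sanity-checked directly; the order of GL(8, 2) (which equals PSL(8, 2) here) comes from the standard product formula, counting the choices for each successive linearly independent column:

```python
import math

# Reproduce the table's approximations for N = 8 bits; log10 keeps the
# astronomically large counts readable.
N = 8
n = 2 ** N  # 256 possible states of 8 bits

log10_all_fns    = n * math.log10(n)                            # 256^256
log10_bijections = sum(math.log10(k) for k in range(1, n + 1))  # 256!
gl_order = math.prod(2 ** N - 2 ** i for i in range(N))         # |GL(8, 2)|

print(f"10^{log10_all_fns:.1f}")     # 10^616.5
print(f"10^{log10_bijections:.1f}")  # 10^506.9
print(f"{gl_order:.3e}")             # 5.348e+18
```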

- Reduction in size of input-output specifications (specification compression). The current tool that generates a flip program from a set of input and output bit patterns requires the specification to completely enumerate the patterns on N bits, and so requires 2^N rules. A linear program only needs a specification on a basis for the vector space, and so only needs N rules.
- Reduction in size of programs (program compression). The path compiler may produce a long program. If each of the subprograms is linear and invertible, the aggregate program is also linear and invertible, and thus is equivalent to a binary matrix on the max number of bits used by any subprogram. This fact can be used to derive a much smaller equivalent program.
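
To make the program-compression point concrete: under the linearity constraint, collapsing a pipeline is just matrix multiplication over F_2. A sketch with two illustrative stages (the stage matrices are made up for the example; each is stored as a list of column ints):

```python
def apply(cols, x):
    """Compute M @ x over F_2: XOR the columns selected by x's set bits."""
    out = 0
    for i, c in enumerate(cols):
        if (x >> i) & 1:
            out ^= c
    return out

def compose(f, g):
    """Matrix for x -> f(g(x)): column i is f applied to g's column i."""
    return [apply(f, c) for c in g]

N = 4
shift = [1 << ((i + 1) % N) for i in range(N)]  # cyclic bit shift (linear)
swap  = [2, 1, 8, 4]                            # swap bits 0<->1 and 2<->3

# One matrix now replaces the two-stage program.
pipeline = compose(swap, shift)
x = 0b0011
assert apply(pipeline, x) == apply(swap, apply(shift, x))
```

However long the original chain of subprograms, the collapsed form is always a single N x N binary matrix.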

(Thanks to Andrew for the term "linear computation").

Binary Vector Machines. Our boy Alan Turing showed that intuitively computable numbers are computable by arithmetic machines. And today, virtually all computers descend from this lineage -- they are arithmetic machines. But today, the more interesting question is not what human ideas are computable, but rather how we humans can express ideas in computation. And in this endeavor, I think it's plausible that the programming surfaces presented by arithmetic machines incur huge and unnecessary obstacles. So, universal -- yes, but elegantly supporting increasingly automated production of software -- perhaps not. Today's armies of software engineers take it as a given fact that the only way to embody human ideas in software is via an arithmetic-machine processor plus higher-level language compilers that target its interfaces. I believe an entirely different lineage is possible, one in which ideas are not funnelled through arithmetic. For example, say the linear computation approach above works out nicely. One can imagine running such programs on a native binary vector machine, which would provide a programming interface solely concerned with primitive operations on binary vectors (e.g. addition x = x + y (aka boolean XOR), scalar multiplication x = b*x, and the dot product b = x . y).
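
A tiny sketch of what such a programming surface might look like, with just the three primitives named above (the function names are hypothetical, and vectors are packed into ints):

```python
# Hypothetical binary-vector-machine primitives over F_2^N.

def vadd(x, y):
    """x + y : vector addition, i.e. bitwise XOR."""
    return x ^ y

def smul(b, x):
    """b * x : scalar multiplication, with b in {0, 1}."""
    return x if b else 0

def dot(x, y):
    """x . y : dot product in F_2, the parity of the bitwise AND."""
    return bin(x & y).count("1") % 2

assert vadd(0b1100, 0b1010) == 0b0110
assert smul(1, 0b1011) == 0b1011 and smul(0, 0b1011) == 0
assert dot(0b1100, 0b1010) == 1
```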

Category Theory. The type graph utilized by the path compiler is roughly equivalent to a category defined by a pure functional language, and in this sense the path compiler is like a compiler for a functional language that has no textual representation. Functions are specified either by their input-output bit patterns or by their input-output types, and their implementations are solved for. This simple scheme works for the simple demos created so far.

Future directions:

- Prove the linearization lemma and modify the synthesis tools to produce linear programs.
- Recursively use the output of the path compiler. Since the output of the path compiler defines a new function with known input and output types, a record for this could be created and then this new function can be reused.
- Investigate polynomials over F_2.
- Summarize the relation between machine-level reversibility and function-level invertibility.