Newt: A Tiny Embeddable Python Subset

I've been helping teach robotics programming to students in grades 5 and 6 for a number of years. The class uses Lego models for the mechanical bits, and a variety of development environments, including Robolab and Lego Logo on both Apple ][ and older Macintosh systems. Those environments are quite good, but when the Apple ][ equipment died, I decided to try exposing the students to an Arduino environment so that they could get another view of programming languages.

The Arduino environment has produced mixed results. The general nature of a full C++ compiler and the standard Arduino libraries means that building even simple robots requires a considerable typing, including a lot of punctuation and upper case letters. Further, the edit/compile/test process is quite long making fixing errors slow. On the positive side, many of the students have gone on to use Arduinos in science research projects for middle and upper school (grades 7-12).

In other environments, I've seen Python used as an effective teaching language; the direct interactive nature invites exploration and provides rapid feedback for the students. It seems like a pretty good language to consider for early education -- "real" enough to be useful in other projects, but simpler than C++/Arduino has been. However, I haven't found a version of Python that seems suitable for the smaller microcontrollers I'm comfortable building hardware with.

How Much Python Do We Need?

Python is a pretty large language in embedded terms, but there's actually very little I want to try and present to the students in our short class (about 6 hours of language introduction and another 30 hours or so of project work). In particular, all we're using on the Arduino are:

  • Numeric values
  • Loops and function calls
  • Digital and analog I/O

Remembering my childhood Z-80 machine with its BASIC interpreter, I decided to think along those lines in terms of capabilities. I think I can afford more than 8kB of memory for the implementation, and I really do want to have "real" functions, including lexical scoping and recursion.

I'd love to make this work on our existing Arduino Duemilanove compatible boards. Those have only 32kB of flash and 2kB of RAM, so that might be a stretch...

What to Include

Exploring Python, I think there's a reasonable subset that can be built here. Included in that are:

  • Lists, numbers and string types
  • Global functions
  • For/While/If control structures.

What to Exclude

It's hard to describe all that hasn't been included, but here's some major items:

  • Objects, Dictionaries, Sets
  • Comprehensions
  • Generators (with the exception of range)
  • All numeric types aside from single-precision float

Implementation

Newt is implemented in C, using flex and bison. It includes the incremental mark/sweep compacting GC system I developed for my small scheme interpreter last year. That provides a relatively simple to use and efficient memory system.

The Newt “Compiler”

Instead of directly executing a token stream as my old BASIC interpreter did, Newt is compiling to a byte coded virtual machine. Of course, we have no memory, so we don't generate a parse tree and perform optimizations on that. Instead, code is generated directly in the grammar productions.

The Newt “Virtual Machine”

With the source compiled to byte codes, execution is pretty simple -- read a byte code, execute some actions related to it. To keep things simple, the virtual machine has a single accumulator register and a stack of other values.

Global and local variables are stored in 'frames', with each frame implemented as a linked list of atom/value pairs. This isn't terribly efficient in space or time, but was quick to implement the required Python semantics for things like 'global'.

Lists and tuples are simple arrays in memory, just like C Python. I use the same sizing heuristic for lists that Python does; no sense inventing something new for that. Strings are C strings.

When calling a non-builtin function, a new frame is constructed that includes all of the formal names. Those get assigned values from the provided actuals and then the instructions in the function are executed. As new locals are discovered, the frame is extended to include them.

Testing

Any new language implementation really wants to have a test suite to ensure that the desired semantics are implemented correctly. One huge advantage for Newt is that we can cross-check the test suite by running it with Python.

Current Status

I think Newt is largely functionally complete at this point; I just finished adding the limited for statement capabilities this evening. I'm sure there are a lot of bugs to work out, and I expect to discover additional missing functionality as we go along.

I'm doing all of my development and testing on my regular x86 laptop, so I don't know how big the system will end up on the target yet.

I've written 4836 lines of code for the implementation and another 65 lines of Python for simple test cases. When compiled -Os for x86_64, the system is about 36kB of text and another few bytes of initialized data.

Links

The source code is available from my server at https://keithp.com/cgit/newt.git/, and also at github https://github.com/keith-packard/newt. It is licensed under the GPLv2 (or later version).