Teaching an AI to Play ZX Spectrum Games

Because why should Atari get all the love?

So this all started by doing a lot of yak shaving:

  • For my Oakley compiler I realised I needed a way to build Z80 assembly code programmatically in C#, a project I called OakAsm.
  • I decided that it would be good to share the code for an abstract syntax tree of Z80 assembly with the main Oakley compiler, so I spun off the MrKWatkins.Ast project.
  • At this point I made the mistake of thinking "well now I have the core of an assembler I could easily add a parser on top and make it compile .asm files", which I then did.
  • Of course if I build assembly for my compiler I'm going to need to test that assembly, and for that I'll need a Z80 emulator in .NET. I had a look around and couldn't find one that really worked for my needs so I decided to write one. Plus I'd always wanted to, which made the decision much easier...
  • And then I decided it would be nice to have a disassembler to help with the emulator. I have the core of that in OakAsm, so that would be easy to do too, right? And maybe a nice way to format the assembly whilst I'm at it?
  • Turns out that the Z80 has a lot of quirks and undefined behaviour. And I couldn't not emulate all that, could I? So I spent a lot of time learning about MEMPTR and other esoteric behaviour.

At this point I was still technically working on the Oakley compiler, even if I hadn't actually touched the main project for months; it was all still work linked to Oakley. But then I decided I needed to start learning AI in detail. AI has become massive and I'm a computer programmer, so I don't want to get left behind.

Reinforcement learning is the branch of AI that I find most interesting. LLMs bore me; working with ChatGPT is just typing a question into a search engine, albeit a pretty clever one. But with reinforcement learning people have taught computers to play chess and Go better than any human. And they have even been playing Atari games with AI since DeepMind's seminal paper.

Why should Atari get all the love though? The ZX Spectrum is obviously a far better computer all round. So I decided I'd teach an AI to play Speccy games. Which then led to another batch of tasks:

  • I needed an emulator. Given I already had the Z80 emulator I thought why not write a full ZX Spectrum emulator? Another project I'd always wanted to try...
  • That was a lot of fun, which meant I inevitably got dragged into adding stuff to it that I didn't really need for the AI but just enjoyed coding. Getting the emulator to finally load a tape of one of my favourite games, Spy Hunter, was a particular highlight: it uses the Speedlock protection system, which made getting the tape audio exactly right a massive pain.
  • Learning Python, as pretty much all AI work is done in Python.
  • Learning about Gymnasium, a well-known library for providing environments for AI training.
  • Learning about Ray, which provides many AI algorithms on top of Gymnasium.
  • Creating Python wrappers over OakEmu, a Gymnasium environment and various bits of Ray helper code.
  • Hacking apart example Ray code for training Atari games.

Then I was ready to train! What game to start with? Well, there is nothing more iconic than Manic Miner. So I set up an environment for Manic Miner. An environment basically needs to be able to do the following (there's a rough code sketch after the list):

  • Start the game.
  • Play the game for a period of time, usually one or more frames.
  • Take an action, such as 'move right' or 'jump', and apply it whilst playing a frame.
  • Pass the current state of the screen to the AI training algorithm. The AI can then learn from the image to pick the best action possible. We usually give it more than one frame so it has some sense of things moving around the screen.
  • Calculate a reward for the frame. The AI starts by taking random actions for the screen input, but over time it learns which actions for a given input gave the biggest rewards. The simplest reward we can calculate is the score increase per frame.
  • Give the game state for the frame, i.e. have we won the game, have we lost, or should we just continue?
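
Here's a minimal sketch of what such an environment looks like using Gymnasium's Env interface. The ZXSpectrum class and its methods are made-up stand-ins for the Python wrapper over OakEmu, so treat the names as illustrative rather than real API:

```python
import gymnasium as gym
import numpy as np


class ManicMinerEnv(gym.Env):
    """Sketch of a Gymnasium environment for Manic Miner."""

    ACTIONS = ["none", "left", "right", "jump"]
    FRAMES_PER_STEP = 4  # play a few frames per action

    def __init__(self):
        self.spectrum = ZXSpectrum()  # hypothetical Python wrapper over OakEmu
        self.action_space = gym.spaces.Discrete(len(self.ACTIONS))
        # The Spectrum screen is 256 x 192 pixels.
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(192, 256, 3), dtype=np.uint8)
        self.score = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Start the game, jumping straight to the level using the hacks
        # gleaned from the disassembly.
        self.spectrum.load_snapshot("central_cavern.z80")
        self.score = self.spectrum.read_score()
        return self.spectrum.screenshot(), {}

    def step(self, action):
        # Apply the action whilst playing a few frames.
        self.spectrum.set_keys(self.ACTIONS[action])
        for _ in range(self.FRAMES_PER_STEP):
            self.spectrum.run_frame()
        # The simplest reward: the score increase over the step.
        new_score = self.spectrum.read_score()
        reward = new_score - self.score
        self.score = new_score
        # Terminated means the episode is over, i.e. we won or we lost.
        terminated = self.spectrum.read_lives() == 0
        return self.spectrum.screenshot(), reward, terminated, False, {}
```

Stacking several frames together, so the model gets a sense of movement, is usually layered on top with a standard wrapper rather than built into the environment itself.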

To do all this required some hacking of Manic Miner, so that we could jump straight to the start of a level, for example, or extract the current score. Luckily there are several games out there that have been disassembled with the excellent tool SkoolKit, and Manic Miner is one of them. This gave me everything I needed to make a Manic Miner environment.

I used an algorithm called Proximal Policy Optimization (PPO) to train the model. It's a more modern algorithm than the one used in the original Atari paper, and much faster at training AIs.
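
Wiring that up with Ray's RLlib looks roughly like this. The environment name is just my own registration, and the hyperparameters are starting points rather than tuned values:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

# Register the environment sketched above under a name RLlib can look up.
register_env("manic_miner", lambda env_config: ManicMinerEnv())

config = (
    PPOConfig()
    .environment("manic_miner")
    .training(gamma=0.99, lr=3e-4)  # starting points, not tuned values
)

algo = config.build()
for i in range(100):
    result = algo.train()  # one training iteration
    # The exact metric keys vary between Ray versions.
    print(i, result.get("episode_reward_mean"))
```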

...But it didn't really work. The problem is that Manic Miner has sparse rewards, i.e. most of the actions you take don't give you any points; only collecting an object and reaching the exit at the end earn rewards. That means it can take quite a lot of random actions before you actually pick up an object, so training takes quite a while. Combine that with the fact that my computer is 8 years old, has no graphics card (training without a GPU is very slow) and doesn't really have enough memory (I had to keep stopping and restarting the training to avoid out-of-memory crashes), and Manic Miner just did not work for me.

I needed to choose a game without sparse rewards, and ideally an iconic Spectrum game. I chose the fantastic Deathchase, which luckily also has a disassembly. It's a motorbike game where you chase bad guys through a forest; you score just by moving forward, with extra points for shooting the bad guys.

And that worked pretty well! Especially given I'd taken the model settings directly from the Atari example, which involves squishing the input image down to 64 x 64, losing a lot of detail. It learned impressively quickly that you need to move forward on the bike and avoid the trees. It also learned that shooting was good, although so far it just shoots constantly rather than picking its shots. But given that's how I play the game, I can't really complain, can I?
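
For reference, that squishing is roughly the following as a Gymnasium wrapper. This is my reconstruction of what the Atari preprocessing does, using OpenCV, rather than code lifted from the example:

```python
import cv2
import gymnasium as gym
import numpy as np


class Downscale(gym.ObservationWrapper):
    """Greyscale the screen and squash it down to size x size pixels."""

    def __init__(self, env, size=64):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(size, size, 1), dtype=np.uint8)

    def observation(self, obs):
        grey = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
        small = cv2.resize(grey, (self.size, self.size),
                           interpolation=cv2.INTER_AREA)
        return small[..., None]  # keep a channel axis for the conv net


# Usage: wrap the environment before handing it to the training code.
env = Downscale(ManicMinerEnv())
```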

[Video: AI playing Deathchase after a few hours of training]

So what is next? Lots to do:

  • Firstly I'm treating myself to a new computer with lots of CPUs, memory and a GPU. This means I can train models much faster, making it much easier to try new things and see the results quickly. I can also use a higher input resolution and larger models that my current PC just cannot handle.
  • More training! Get a better Deathchase model and go back to Manic Miner.
  • Move a lot of the environment code from Python into OakEmu. Currently things like frame skipping and the number of frames to pass to the model are all handled in Python, but I'll need that logic in OakEmu to run models directly from the emulator.
  • Create a configuration file format that can be used to specify an environment for a game. The games so far are all set up via C# code, which is a pain if people want to set up environments for other games.
  • Add a nice UI to the emulator so people can use it as an emulator, test new configuration files, run AI models, export videos, etc.
  • Open source all the above code. It needs a bit of tidying up first; I do have a professional reputation to maintain after all... Although I'll probably stop being so precious and just get the code out there with a big "please don't judge me" in the README. Most likely I won't accept pull requests though, as this is my pet project.
  • Write lots of blog posts about it all.

Unfortunately real life does tend to get in the way somewhat so no promises as to when any of this might get done...