TL;DR

If you want to see it in action, click here to play the WASM version. It’s currently not as fast as the desktop app, but it works!

Why?

Why not?

But in all seriousness, I eventually want to make a GBA emulator. However, I’ve always put it off, since reading the hardware documentation on GBATEK makes me go cross-eyed.

The CHIP8 is a good baby step toward that end goal due to its simplicity in comparison to the GBA:

  • Fewer machine instructions to interpret (37 instructions compared to an entire ARM7 CPU)
  • Simpler audio system (One note compared to 4 channel audio)
  • Simpler memory layout (No banking needed)
  • Simpler Display (No sprites)

Great, How Do You Get Started?

Well, what follows is a broad overview of how each component of the emulator works.

Environment Setup

See the github repo for detailed instructions on how to build everything.

If, for some reason, you want to do this yourself from scratch… see below

Invaluable Resources

I really didn’t want a tutorial that just gave you the entire source code of the interpreter, since the temptation to just straight up copy code without understanding it is WAY too high. These are the best resources that I could find that meet this criterion:

  • COWGOD!!!!!: Old website from 2002 that very clearly lays out the various hardware components and assembly instructions that comprise the CHIP8 system.
  • Tobias Langhoff: A much more modern guide to building a CHIP8 emulator. It covers a lot of gotchas that you might encounter while coding everything up.
  • Test Rom Repository: This repo consists of a wide variety of test ROMs that are useful for debugging, especially for keyboard handling.

“Hardware”

CPU

Data Buses

First, a brief tangent on data buses.

Other physical systems, like the NES and GBA, use actual CPUs like the 6502 or some ARM processor. When emulating these physical systems, you can easily decouple the CPU/opcode parsing from the rest of the system by introducing a Bus class that all of your devices read from and write to. You can think of the Bus as the shared memory of the system. Other devices can then be attached to the Bus by assigning each one an address range that it can read from and write to. This procedure is used to replicate other hardware functionality as needed (like the PPU or the audio processing unit in the NES).
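
To make the idea concrete, here is a minimal sketch of such a bus. None of these names come from a real emulator: Device, Ram, Bus, and attach are hypothetical and only illustrate dispatching reads and writes by address range.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical interface: anything that can be attached to the bus.
struct Device {
    virtual ~Device() = default;
    virtual std::uint8_t read(std::uint16_t offset) const = 0;
    virtual void write(std::uint16_t offset, std::uint8_t value) = 0;
};

// A plain block of RAM, the simplest possible device.
class Ram : public Device {
public:
    explicit Ram(std::size_t size) : bytes_(size, 0) {}
    std::uint8_t read(std::uint16_t offset) const override { return bytes_.at(offset); }
    void write(std::uint16_t offset, std::uint8_t value) override { bytes_.at(offset) = value; }

private:
    std::vector<std::uint8_t> bytes_;
};

class Bus {
public:
    // Map the inclusive address range [first, last] to a device (not owned by the bus).
    void attach(std::uint16_t first, std::uint16_t last, Device& device) {
        mappings_.push_back({first, last, &device});
    }

    std::uint8_t read(std::uint16_t address) const {
        const Mapping& m = find(address);
        return m.device->read(static_cast<std::uint16_t>(address - m.first));
    }

    void write(std::uint16_t address, std::uint8_t value) {
        const Mapping& m = find(address);
        m.device->write(static_cast<std::uint16_t>(address - m.first), value);
    }

private:
    struct Mapping {
        std::uint16_t first;
        std::uint16_t last;
        Device* device;
    };

    // Linear search is enough for a handful of devices.
    const Mapping& find(std::uint16_t address) const {
        for (const Mapping& m : mappings_) {
            if (address >= m.first && address <= m.last) return m;
        }
        throw std::out_of_range("no device mapped at this address");
    }

    std::vector<Mapping> mappings_;
};
```

The CPU would then only ever talk to the Bus, and each device would see offsets relative to the start of its own range.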

The reason that this is relevant for my CHIP8 emulator is that I did NOT end up using a data bus. The reasoning behind this is that some CHIP8 assembly instructions are very tightly coupled with the other hardware of the system (like DRW and SKP, for instance). While in theory I could define memory address ranges that each device could read from and write to, it was just simpler to have the CPU module focus solely on controlling its internal registers, and then handle the opcode parsing and instruction execution in the aggregate CHIP8 interpreter class, after all the other hardware was defined.

Interface

All of that being said, the “CPU” unit is fairly simple.

The CPU has 16 general-purpose registers which can be accessed by the user. They are labelled with hex values 0x0 through 0xF (V0 through VF). VF is special in that it is designated to hold the flags that result from other instructions (e.g. when an addition overflows, VF gets set to 1).

After that, there are more specialized registers:

  • The program counter (pc) keeps track of the instruction that needs to be executed next
  • The original CHIP8 spec calls for a stack pointer and an array of 16 16-bit values which is used as a stack. I made a wrapper around std::deque and capped its size at 16 entries
  • The sound and delay registers are auto decremented at a rate of 60 Hz if they are non-zero. The auto-decrement functionality is taken care of in the final integration class (see here) in the same breath as the rest of the timing functions across the other hardware
    • I originally tried to do this decrementing in a separate thread. This turned out to be more trouble than it was worth, since it made it difficult to consistently test things. In the end, it turned out to be overkill for the CHIP8.
  • There is also a 16 bit register called I whose lower 3 nibbles can be set by various instructions.

I had the CPU provide access control to all the aforementioned registers. This entails making all registers private members, and then implementing getters and setters with error checking and any additional necessary computation. For instance, an invariant on I is that the uppermost nibble should always be 0. This is achieved by masking any value that is passed into the setter for I with 0xFFF.
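
A minimal sketch of what that access control could look like is below. The class and member names (Cpu, get_v, set_i, push, and so on) are assumptions made for illustration, not the names used in the repo, and the timers and program counter are left out for brevity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>

class Cpu {
public:
    static constexpr std::size_t kMaxStackDepth = 16;

    // std::array::at throws std::out_of_range for an index outside 0x0..0xF.
    std::uint8_t get_v(std::size_t index) const { return v_.at(index); }
    void set_v(std::size_t index, std::uint8_t value) { v_.at(index) = value; }

    std::uint16_t get_i() const { return i_; }
    // Invariant: the uppermost nibble of I is always 0.
    void set_i(std::uint16_t value) { i_ = static_cast<std::uint16_t>(value & 0x0FFF); }

    // Bounded stack built on std::deque.
    void push(std::uint16_t address) {
        if (stack_.size() >= kMaxStackDepth) {
            throw std::overflow_error("CHIP8 stack overflow");
        }
        stack_.push_back(address);
    }

    std::uint16_t pop() {
        if (stack_.empty()) {
            throw std::underflow_error("CHIP8 stack underflow");
        }
        const std::uint16_t address = stack_.back();
        stack_.pop_back();
        return address;
    }

private:
    std::array<std::uint8_t, 16> v_{};  // V0..VF, zero-initialized
    std::uint16_t i_ = 0;
    std::deque<std::uint16_t> stack_;
};
```

With this sketch, calling set_i(0xF123) leaves 0x123 in I, and a seventeenth push throws instead of silently growing the stack.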

Display

This is effectively just a wrapper around SDL_Window, SDL_Renderer, and SDL_Texture. The CHIP8 has a 64x32 pixel display. Reading from the screen is as simple as indexing the pixel array (although I used the one-dimensional indexing trick to avoid an array of arrays). You write to the screen by XORing 1 with the pixel at a particular coordinate. If the value at that coordinate is already set, then it gets turned off. Otherwise, it gets turned on.
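
The pixel storage and the XOR write can be sketched without any SDL at all. FrameBuffer, get, and toggle are hypothetical names. Wrapping out-of-range coordinates and returning whether the pixel was already lit are assumptions added for illustration (the latter is handy because DRW reports collisions through VF).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

class FrameBuffer {
public:
    static constexpr std::size_t kWidth = 64;
    static constexpr std::size_t kHeight = 32;

    bool get(std::size_t x, std::size_t y) const { return pixels_[index(x, y)] != 0; }

    // XOR 1 into the pixel. Returns true if the pixel was lit before the
    // write, which means this write just turned it off.
    bool toggle(std::size_t x, std::size_t y) {
        std::uint8_t& pixel = pixels_[index(x, y)];
        const bool was_set = pixel != 0;
        pixel ^= 1;
        return was_set;
    }

private:
    // One-dimensional indexing: row-major, so each row is kWidth pixels long.
    static std::size_t index(std::size_t x, std::size_t y) {
        return (y % kHeight) * kWidth + (x % kWidth);
    }

    std::array<std::uint8_t, kWidth * kHeight> pixels_{};  // all pixels start off
};
```

Toggling the same coordinate twice returns the screen to its original state, which is how CHIP8 programs erase sprites.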

Memory

The memory space of the CHIP8 is small enough that I can just zero-initialize a uint8_t array of size 4096 as the address space. I can then fill the start of memory with the default character set (16 glyphs of 5 bytes each, so 80 bytes in total; see here to see what I’m talking about). Reads and writes work as expected, with some additional bounds checking to make sure that you are accessing a valid address.
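
As a rough sketch of that module, assuming the font is stored starting at address 0: Memory, read, and write are illustrative names, and only the first two glyphs of the standard font are included to keep the example short.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

class Memory {
public:
    static constexpr std::size_t kSize = 4096;

    Memory() {
        // Glyphs "0" and "1" of the standard 4x5 hexadecimal font.
        // The remaining 14 glyphs are omitted from this sketch.
        static const std::array<std::uint8_t, 10> kFont = {
            0xF0, 0x90, 0x90, 0x90, 0xF0,  // 0
            0x20, 0x60, 0x20, 0x20, 0x70,  // 1
        };
        for (std::size_t n = 0; n < kFont.size(); ++n) {
            bytes_[n] = kFont[n];
        }
    }

    std::uint8_t read(std::size_t address) const {
        check(address);
        return bytes_[address];
    }

    void write(std::size_t address, std::uint8_t value) {
        check(address);
        bytes_[address] = value;
    }

private:
    // Reject anything outside the 4 KiB address space.
    static void check(std::size_t address) {
        if (address >= kSize) {
            throw std::out_of_range("address outside the CHIP8 address space");
        }
    }

    std::array<std::uint8_t, kSize> bytes_{};  // zero-initialized
};
```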

Sound

Sound was tricky. While there were plenty of tutorials on how to play files with SDL (including LazyFoo’s lovely series), I found it difficult to find a tutorial on how to write your own samples directly to the audio device. I eventually stumbled on Fredrb’s Blog, which covered the exact use case that I needed: playing a single note.

To briefly summarize what he did: you need to define a function with the signature

void oscillator_callback(void *userdata, Uint8 *stream, int len)

where userdata is some external data you want to pass along to the function, stream is the audio buffer that you need to fill with samples, and len is the size of stream in bytes.

This function gets assigned to the callback field of an SDL_AudioSpec, which in turn gets passed to SDL_OpenAudioDevice, after which you call SDL_PauseAudioDevice with a pause argument of 0 to start playing the desired tone.

The only problem was that his example code was in C. And while I guess I could have just slapped the C code in there, I decided to turn his code into more idiomatic C++ code. It wasn’t too bad. The only sticking point was passing the this pointer to userdata in SDL_AudioSpec, and then static casting userdata in oscillator_callback to the associated C++ class.
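
The casting pattern looks roughly like the sketch below. To keep it self-contained it does not include SDL: std::uint8_t stands in for SDL’s Uint8, Beeper and fill are hypothetical names, and the square wave assumes unsigned 8-bit samples with an arbitrary pitch.

```cpp
#include <cstdint>

class Beeper {
public:
    // Same shape as the callback signature shown above. It has to be static
    // so that it can be handed over as a plain C function pointer.
    static void oscillator_callback(void* userdata, std::uint8_t* stream, int len) {
        // Recover the object whose address was stored as userdata.
        auto* self = static_cast<Beeper*>(userdata);
        self->fill(stream, len);
    }

private:
    // Square wave: alternate between a high and a low level every
    // kHalfPeriod samples, carrying the phase over between callbacks.
    void fill(std::uint8_t* stream, int len) {
        for (int n = 0; n < len; ++n) {
            stream[n] = (phase_ < kHalfPeriod) ? 0xC0 : 0x40;
            phase_ = (phase_ + 1) % (2 * kHalfPeriod);
        }
    }

    static constexpr int kHalfPeriod = 50;  // in samples; arbitrary choice
    int phase_ = 0;
};
```

When wiring this into SDL, the spec’s callback field would point at Beeper::oscillator_callback and its userdata field would hold this.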

Keyboard

The last piece of hardware to emulate is the keyboard/input, which turned out to be the most annoying part. This is because I couldn’t think of a nice way to unit test this functionality. I needed to fall back on a Test Rom Repository that I found in order to see if I was properly emulating the keyboard.

The actual implementation of the input pad was simple. We represent the 16 keys of the CHIP8 with the bits in a uint16_t integer. Periodically, we check what keys have been pressed since the last time, and set the appropriate bitfields of the keyboard integer.
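
A minimal sketch of the bit manipulation involved, with SDL event handling left out. Keypad, press, release, and is_pressed are names invented for this example.

```cpp
#include <cstdint>

class Keypad {
public:
    // Set the bit that belongs to the key (0x0 through 0xF).
    void press(unsigned key) { state_ |= mask(key); }

    // Clear the bit that belongs to the key.
    void release(unsigned key) { state_ &= static_cast<std::uint16_t>(~mask(key)); }

    bool is_pressed(unsigned key) const { return (state_ & mask(key)) != 0; }

private:
    // Only the low nibble of the key index is used, so the shift stays in range.
    static std::uint16_t mask(unsigned key) {
        return static_cast<std::uint16_t>(1u << (key & 0xF));
    }

    std::uint16_t state_ = 0;  // bit n is 1 while key n is held down
};
```

Instructions like SKP then reduce to a single is_pressed check on the key index stored in a register.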

Tying It All Together

From the user’s perspective, they only need to deal with the CHIP8 interpreter class. This class bundles all of the other emulated hardware together, enables communication between the pieces, handles various time-sensitive tasks, and uses SDL to output video and audio. An end user just needs to call the run_eternal function.

This is where I ended up implementing all of the opcodes for the system (the most tedious part), as well as the fetch-decode-execute loop. While doing this, I ended up making a whole lot of unit tests to ensure that everything was functioning properly. The Test Rom Repository was also extremely helpful for this part!

Fetch-Decode-Execute

The fetch stage is the simplest part: you just read the instruction from the current program counter.
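
Every CHIP8 instruction is two bytes long and stored most significant byte first, so a fetch boils down to combining two reads. The free function below is a hypothetical sketch that takes the raw memory array directly; wrapping the address with a modulo is an assumption made to keep the example safe.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Read the 16-bit big-endian instruction located at pc.
std::uint16_t fetch(const std::array<std::uint8_t, 4096>& memory, std::uint16_t pc) {
    const std::uint16_t high = memory[pc % 4096];
    const std::uint16_t low = memory[(pc + 1) % 4096];
    return static_cast<std::uint16_t>((high << 8) | low);
}
```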

The decode stage is more involved: I implemented a simple nested switch statement decoder that takes in the current machine code and inspects certain bitfields until the instruction can be identified. The return value of the decode stage is a simple function pointer to the appropriate opcode handler.
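
Here is a cut-down sketch of that shape covering only five opcodes. Machine, Handler, and the op_ functions are placeholders invented for this example; the real handlers would act on the full interpreter state.

```cpp
#include <cstdint>

// Hypothetical stand-in for the interpreter state that handlers operate on.
struct Machine {
    const char* last_op = "";
};

using Handler = void (*)(Machine&, std::uint16_t opcode);

void op_cls(Machine& m, std::uint16_t) { m.last_op = "CLS"; }
void op_ret(Machine& m, std::uint16_t) { m.last_op = "RET"; }
void op_jp(Machine& m, std::uint16_t) { m.last_op = "JP addr"; }
void op_ld_vx_vy(Machine& m, std::uint16_t) { m.last_op = "LD Vx, Vy"; }
void op_add_vx_vy(Machine& m, std::uint16_t) { m.last_op = "ADD Vx, Vy"; }
void op_unknown(Machine& m, std::uint16_t) { m.last_op = "unknown"; }

Handler decode(std::uint16_t opcode) {
    switch (opcode & 0xF000) {  // the top nibble selects the instruction family
    case 0x0000:
        switch (opcode & 0x00FF) {  // the low byte tells 00E0 and 00EE apart
        case 0x00E0: return op_cls;
        case 0x00EE: return op_ret;
        default:     return op_unknown;
        }
    case 0x1000:
        return op_jp;  // 1nnn
    case 0x8000:
        switch (opcode & 0x000F) {  // the low nibble selects the register operation
        case 0x0: return op_ld_vx_vy;   // 8xy0
        case 0x4: return op_add_vx_vy;  // 8xy4
        default:  return op_unknown;
        }
    default:
        return op_unknown;
    }
}
```

The caller can then run the instruction with decode(opcode)(machine, opcode), which keeps decoding and execution cleanly separated and makes the decoder easy to unit test on its own.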

The execute stage is also relatively simple: first increment the program counter, run the decoded function, then do all the timing related events after the instruction gets executed. This includes:

  • Decrementing the sound and delay registers at a rate of 60 Hz. This means that for every $\frac{1}{60}$ seconds that have elapsed in real time, decrement the register by 1 if it’s not 0.
    • If the sound register reaches 0, then pause the audio
  • If $\frac{1}{60}$ seconds have elapsed since the last time the keyboard was polled, then update the keyboard state
  • After all of that, if not enough time has passed for a single frame to elapse, use SDL_Delay to wait until the start of the next frame.
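
The 60 Hz decrement can be sketched with std::chrono alone. Timers and its members are hypothetical names; passing the current time in as a parameter is a design choice made here so that the logic can be unit tested without waiting on a real clock.

```cpp
#include <chrono>
#include <cstdint>

class Timers {
public:
    using Clock = std::chrono::steady_clock;

    explicit Timers(Clock::time_point start) : last_tick_(start) {}

    void set_delay(std::uint8_t value) { delay_ = value; }
    void set_sound(std::uint8_t value) { sound_ = value; }
    std::uint8_t delay() const { return delay_; }
    bool beeping() const { return sound_ > 0; }

    // Call after each instruction. Applies one decrement for every full
    // 1/60 s that has elapsed since the last applied tick.
    void update(Clock::time_point now) {
        while (now - last_tick_ >= tick()) {
            if (delay_ > 0) --delay_;
            if (sound_ > 0) --sound_;
            last_tick_ += tick();
        }
    }

private:
    // One tick is 1/60 s, rounded down to whole microseconds.
    static constexpr std::chrono::microseconds tick() {
        return std::chrono::microseconds(1000000 / 60);
    }

    Clock::time_point last_tick_;
    std::uint8_t delay_ = 0;
    std::uint8_t sound_ = 0;
};
```

In the main loop, beeping() turning false would be the cue to pause the audio device.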

Windows Update

Prior to every instruction, I check if a DRW command has occurred. If it has, I update the screen with the latest pixel data. In addition, I also check whether the window has been requested to close via the SDL_QUIT event, and exit if it has.

So… Now What?

GBA emulator. Enough said.