Building a 60fps Game Boy Emulator for iPod Classic

Why Build a New Emulator?

In my previous post, I fixed 7 audio bugs in Rockbox’s built-in Game Boy emulator (rockboy). The audio was perfect after the fix, but there was a bigger problem: performance. Rockboy is based on gnuboy, a codebase from 2000, and it runs heavy Game Boy titles at roughly 5 fps on iPod Classic. That is not playable.

Rather than trying to optimize 10K+ lines of legacy C, I ported Peanut-GB, a modern single-header Game Boy emulator, to the Rockbox plugin API. Then I spent days profiling on real hardware and squeezing every cycle out of a 216 MHz ARM chip.

The result: 50-67 fps at native resolution. Playable Game Boy on an iPod.

The Hardware Constraints

The iPod Classic 7G is not a powerful device by modern standards:

  • CPU: ARM926EJ-S @ 216 MHz (boosted)
  • RAM: 64 MB SDRAM
  • IRAM: 128 KB total, 80 KB available to plugins
  • Cache: 16 KB I-cache, 16 KB D-cache
  • LCD: 320x240 RGB565, updated via DMA

Every optimization decision comes back to these numbers. There is no GPU. There is no NEON. The entire Game Boy frame has to be emulated, rendered, and pushed to the LCD in under 16.7 ms using a single ARM9 core.
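To put that budget in cycle terms, here is a rough back-of-the-envelope calculation. The Game Boy figures (a 4,194,304 Hz clock and 70,224 clocks per frame) are the standard DMG numbers rather than anything measured on the iPod, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Host CPU cycles available to emulate and display one Game Boy frame. */
uint32_t host_cycles_per_frame(uint32_t host_hz, uint32_t gb_hz,
                               uint32_t gb_clocks_per_frame)
{
    assert(gb_hz != 0);
    /* 64-bit intermediate: 216e6 * 70224 does not fit in 32 bits. */
    return (uint32_t)(((uint64_t)host_hz * gb_clocks_per_frame) / gb_hz);
}

/* Host CPU cycles available per emulated Game Boy clock, rounded down. */
uint32_t host_cycles_per_gb_clock(uint32_t host_hz, uint32_t gb_hz)
{
    assert(gb_hz != 0);
    return host_hz / gb_hz;
}
```

With a 216 MHz host this works out to about 3.6 million cycles per frame, or roughly 51 host cycles per Game Boy clock, and that has to cover the CPU, PPU, APU, and the LCD transfer.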

The Optimizations That Mattered

I tried many things. Here is what actually moved the needle, roughly in order of impact.

1. Direct ROM Access

Peanut-GB uses function pointer callbacks for ROM reads. On a Game Boy, the CPU reads ROM on nearly every instruction. Each indirect call through a function pointer means: load pointer from memory, branch to it, execute, return. Thousands of times per frame.

I replaced the callbacks with direct array access:

#define PGB_DIRECT_ACCESS 1
static uint8_t *pgb_rom_ptr;

// In the read handler (before):
return gb->gb_rom_read(gb, addr);

// After:
return pgb_rom_ptr[addr];

This was the single biggest performance gain. The compiler can now inline these reads, so each ROM access becomes a plain load instead of an indirect call and return.
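As a rough sketch of the difference, here are the two read paths side by side. The names (`gb_sketch`, `read_via_callback`, `read_direct`) are simplified stand-ins for illustration, not Peanut-GB's actual definitions, and the sketch assumes the whole ROM image is already loaded into RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down emulator context. */
struct gb_sketch {
    /* Callback style: every ROM read goes through this pointer. */
    uint8_t (*rom_read)(struct gb_sketch *gb, uint32_t addr);
    const void *priv;              /* frontend data, here the ROM image */
};

/* Direct style: one file-scope pointer to the ROM image. */
static const uint8_t *rom_ptr;

static uint8_t rom_read_from_priv(struct gb_sketch *gb, uint32_t addr)
{
    const uint8_t *rom = gb->priv;
    return rom[addr];
}

void gb_sketch_init(struct gb_sketch *gb, const uint8_t *rom)
{
    assert(gb != NULL && rom != NULL);
    gb->rom_read = rom_read_from_priv;
    gb->priv = rom;
    rom_ptr = rom;
}

/* Before: load the pointer, branch through it, run the callback, return. */
uint8_t read_via_callback(struct gb_sketch *gb, uint32_t addr)
{
    return gb->rom_read(gb, addr);
}

/* After: a single indexed load that the compiler is free to inline. */
static inline uint8_t read_direct(uint32_t addr)
{
    assert(rom_ptr != NULL);
    return rom_ptr[addr];
}
```

Both paths return the same byte. The direct version simply gives the compiler something it can see through.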

2. IRAM Placement

The iPod has 128 KB of fast on-chip IRAM that bypasses the cache hierarchy entirely. Rockbox reserves 80 KB for plugins. I put the hottest code and data there:

  • Hot functions (19 KB): CPU step, memory read/write, draw line, APU update
  • gb struct (17 KB): The entire emulator state, accessed on every instruction
  • APU functions (1.5 KB): Audio channel updates

Total: ~38 KB of the 80 KB budget. Every access to these is guaranteed single-cycle, with no cache misses possible.
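A minimal sketch of what this placement can look like in source. `ICODE_ATTR` and `IBSS_ATTR` are Rockbox's section macros. The empty fallbacks, the cut-down `gb_state_sketch` struct, and the helper names are assumptions added so the example compiles on a desktop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On Rockbox these macros come from the build's headers. The empty
 * fallbacks exist only so this sketch compiles elsewhere. */
#ifndef ICODE_ATTR
#define ICODE_ATTR
#endif
#ifndef IBSS_ATTR
#define IBSS_ATTR
#endif

#define IRAM_PLUGIN_BUDGET (80u * 1024u)

/* Hypothetical, much smaller stand-in for the real emulator state. */
struct gb_state_sketch {
    uint8_t  wram[0x2000];
    uint8_t  vram[0x2000];
    uint8_t  oam[0xA0];
    uint8_t  hram[0x7F];
    uint16_t pc, sp;
};

/* Fail the build early if the state alone would blow the budget. */
_Static_assert(sizeof(struct gb_state_sketch) <= IRAM_PLUGIN_BUDGET,
               "emulator state does not fit in plugin IRAM");

/* Zero-initialized state, placed in IRAM on target. */
static struct gb_state_sketch gb_state IBSS_ATTR;

/* Hot function, placed in IRAM on target. The attribute goes on a
 * separate declaration because GCC rejects it after a definition's
 * parameter list. */
uint8_t wram_read(uint16_t addr) ICODE_ATTR;
uint8_t wram_read(uint16_t addr)
{
    assert(addr >= 0xC000 && addr <= 0xDFFF);
    return gb_state.wram[addr - 0xC000];
}

/* Returns 1 if the planned code and data fit in the plugin budget. */
int iram_plan_fits(size_t code_bytes, size_t data_bytes)
{
    return code_bytes + data_bytes <= IRAM_PLUGIN_BUDGET;
}
```

Keeping a budget check like `iram_plan_fits` next to the numbers makes it obvious how much headroom is left before adding another hot function.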

3. -O2 on IRAM Functions Only

GCC's -O2 produces larger code than -Os, which would normally thrash the 16 KB I-cache. Code in IRAM never goes through the cache, though, so I applied -O2 only to IRAM-resident functions using GCC attributes:

#define ICODE_ATTR __attribute__((section(".icode"),optimize("O2")))

The bloated-but-fast code runs from IRAM with no cache penalty. Everything else stays at -Os to fit in cache.
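For illustration, this is roughly how such an attribute can be attached to a single hot function while the rest of the file keeps the global -Os setting. The `HOT_CODE` macro, the `ROCKBOX` build check, and the `add8_flags` helper are assumptions for this sketch. On a desktop build the section name is dropped so the example links anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ROCKBOX is defined by the target build. */
#if defined(ROCKBOX)
#define HOT_CODE __attribute__((section(".icode"), optimize("O2")))
#elif defined(__GNUC__) && !defined(__clang__)
#define HOT_CODE __attribute__((optimize("O2")))
#else
#define HOT_CODE
#endif

#define FLAG_Z 0x80u  /* result was zero */
#define FLAG_H 0x20u  /* carry out of bit 3 */
#define FLAG_C 0x10u  /* carry out of bit 7 */

/* Hypothetical hot-path helper: 8-bit add with Game Boy flag results. */
uint8_t add8_flags(uint8_t a, uint8_t b, uint8_t *flags) HOT_CODE;
uint8_t add8_flags(uint8_t a, uint8_t b, uint8_t *flags)
{
    assert(flags != NULL);
    unsigned sum = (unsigned)a + b;
    uint8_t f = 0;
    if ((sum & 0xFFu) == 0)
        f |= FLAG_Z;
    if (((a & 0x0Fu) + (b & 0x0Fu)) > 0x0Fu)
        f |= FLAG_H;
    if (sum > 0xFFu)
        f |= FLAG_C;
    *flags = f;
    return (uint8_t)sum;
}
```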

4. Non-Blocking Audio

The original audio design busy-waited when the ring buffer was full. On a device this slow, stalling the emulation thread for even one yield() can cost an entire frame. The fix: if the buffer is full, drop the audio frame and keep emulating.

int next_wr = (ring_wr + 1) % RING_SLOTS;
if (next_wr == ring_rd) return;  // drop, don't block

The ear forgives a dropped audio frame far more easily than the eye forgives a dropped video frame.
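A fuller sketch of the idea, assuming a single producer (the emulation thread) and a single consumer (the audio callback) on one core. The slot size and the function names are hypothetical, and only the two-line full check comes from the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS   4
#define SLOT_SAMPLES 512               /* hypothetical slot size */

static int16_t ring[RING_SLOTS][SLOT_SAMPLES];
static volatile int ring_wr;           /* next slot the emulator fills */
static volatile int ring_rd;           /* next slot the audio side drains */

/* Producer: returns false if the audio frame was dropped. Never blocks. */
bool ring_push(const int16_t *samples)
{
    assert(samples != NULL);
    int next_wr = (ring_wr + 1) % RING_SLOTS;
    if (next_wr == ring_rd)
        return false;                  /* full: drop, don't block */
    memcpy(ring[ring_wr], samples, sizeof ring[0]);
    ring_wr = next_wr;
    return true;
}

/* Consumer: returns NULL when empty. One slot is always kept free, so
 * the returned slot is not overwritten before the next ring_pop(). */
const int16_t *ring_pop(void)
{
    if (ring_rd == ring_wr)
        return NULL;
    const int16_t *slot = ring[ring_rd];
    ring_rd = (ring_rd + 1) % RING_SLOTS;
    return slot;
}
```

Because one slot is sacrificed to tell full from empty, a 4-slot ring holds at most 3 queued frames.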

5. Direct LCD Framebuffer Writes

In 1:1 mode (160x144 Game Boy screen centered on 320x240 LCD), I write pixels directly to the Rockbox LCD framebuffer instead of going through an intermediate buffer:

// Two pixels per 32-bit store
uint32_t c0 = palette[pixels[x] & 3];
uint32_t c1 = palette[pixels[x + 1] & 3];
dest32[x >> 1] = c0 | (c1 << 16);

This halves the number of store instructions and eliminates a full-screen buffer copy.
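Wrapped into a whole-scanline loop, the technique might look like the following. The function name and signature are made up for this sketch, and it assumes a little-endian target, a 4-byte-aligned destination, and an even line width.

```c
#include <assert.h>
#include <stdint.h>

#define GB_WIDTH 160

/* Convert one scanline of 2-bit shade indices to RGB565, writing two
 * pixels per 32-bit store. On a little-endian target the low half-word
 * lands at the lower address, so c0 is the left pixel. */
void blit_line_rgb565(uint32_t *dest32, const uint8_t *pixels,
                      const uint16_t palette[4])
{
    assert(((uintptr_t)dest32 & 3u) == 0);
    for (int x = 0; x < GB_WIDTH; x += 2) {
        uint32_t c0 = palette[pixels[x] & 3];
        uint32_t c1 = palette[pixels[x + 1] & 3];
        dest32[x >> 1] = c0 | (c1 << 16);
    }
}
```

Centering a 160x144 image on a 320x240 screen gives an offset of 80 pixels horizontally and 48 lines vertically. The 80-pixel offset is 160 bytes in RGB565, so each destination row stays 4-byte aligned as long as the framebuffer itself is.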

The LCD DMA Wall

Scaled modes (Full and Fit) hit a hardware bottleneck that no amount of CPU optimization can fix. Pushing a full 320x240 RGB565 frame to the LCD requires transferring 153,600 bytes via DMA. At the bus speeds available, this takes roughly 12 ms, leaving only 4.7 ms for the entire Game Boy frame emulation.

I mitigated this with split-screen DMA: push the top half one frame, the bottom half the next. This halves per-frame DMA cost, getting scaled modes from ~20 fps to 31-40 fps. I am still working on pushing this higher.
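The alternation itself is simple. Here is a sketch of choosing which half to push on a given frame, with hypothetical names throughout (`update_rect`, `split_update_rect`, `rect_bytes`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LCD_W 320
#define SKETCH_LCD_H 240
#define BYTES_PER_PX 2                 /* RGB565 */

struct update_rect { int x, y, w, h; };

/* Even frames push the top half of the screen, odd frames the bottom. */
struct update_rect split_update_rect(unsigned frame_no)
{
    struct update_rect r = { 0, 0, SKETCH_LCD_W, SKETCH_LCD_H / 2 };
    if (frame_no & 1u)
        r.y = SKETCH_LCD_H / 2;
    return r;
}

/* Number of bytes the transfer for this rectangle has to move. */
size_t rect_bytes(struct update_rect r)
{
    assert(r.w > 0 && r.h > 0);
    return (size_t)r.w * (size_t)r.h * BYTES_PER_PX;
}
```

Each half is 76,800 bytes instead of 153,600. If transfer time scales linearly with size, that is roughly 6 ms per frame instead of 12 ms. In a plugin, the rectangle would then go to a partial-update call such as Rockbox's `lcd_update_rect(x, y, width, height)`. Whether pgb uses that exact call is an assumption here.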

Features

Beyond raw performance, pgb has the features you would expect from a proper emulator:

  • 5-slot save states with timestamps, so you can keep multiple saves per game
  • Autosave modes: Off, On Exit, or Frequent (~60 second intervals)
  • 3 display modes: 1:1 (native), Fit (aspect-correct scaling), Full (stretched)
  • Volume control with efficient bit-shift implementation
  • FPS counter for performance monitoring
  • Clean menu accessible via the center button

What is Next: Game Boy Color

The current emulator only handles original Game Boy (DMG) games. The next phase is adding Game Boy Color support: dual VRAM banks, CGB color palettes, WRAM banking, double-speed CPU, and HDMA. The goal is one emulator that handles both .gb and .gbc files, auto-detecting the cartridge type from the ROM header.
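Auto-detection can be sketched from the cartridge header alone. Byte 0x0143 is the CGB flag: 0x80 marks a game that supports Color but still runs on the original Game Boy, and 0xC0 marks a Color-only game. The enum and function names below are hypothetical, and treating every other value as DMG is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_CGB_FLAG 0x0143u

enum cart_kind { CART_DMG, CART_CGB_COMPATIBLE, CART_CGB_ONLY };

/* Classify a ROM image by its CGB flag. Images too short to contain
 * the flag byte are treated as plain DMG. */
enum cart_kind detect_cart_kind(const uint8_t *rom, size_t rom_size)
{
    assert(rom != NULL);
    if (rom_size <= HEADER_CGB_FLAG)
        return CART_DMG;
    switch (rom[HEADER_CGB_FLAG]) {
    case 0x80: return CART_CGB_COMPATIBLE;
    case 0xC0: return CART_CGB_ONLY;
    default:   return CART_DMG;   /* older carts: last byte of the title */
    }
}
```

Checking the header rather than the file extension means a mislabeled .gb or .gbc file still starts in the right mode.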

This is a significant undertaking, roughly doubling the complexity of the emulator core, but the optimization techniques already proven on DMG should carry over.

Try It

The compiled plugin and full source are on GitHub:

Tyal13/rockbox-rockboy-audio-fix

Copy pgb-ipod6g.rock to .rockbox/rocks/viewers/pgb.rock on your iPod, open any .gb file, and play.


If this project is useful to you, consider sponsoring it on GitHub.

This post is licensed under CC BY 4.0 by the author.