nesemu/docs/superpowers/specs/2026-03-13-audio-output-design.md
2026-03-13 16:21:30 +03:00

Audio Output Design — Full 5-Channel Mixer + cpal Backend

Overview

Add real audio output to the desktop NES emulator client. This involves two independent pieces of work:

  1. Full APU mixer — replace the current DMC-only mixer with proper 5-channel mixing (Pulse 1, Pulse 2, Triangle, Noise, DMC) using NES hardware-accurate formulas.
  2. cpal audio backend — replace the stub AudioSink in the desktop client with a real audio output using cpal, connected via a lock-free ring buffer. Add a volume slider to the GTK4 header bar.

1. Full APU Mixer

Current State

AudioMixer::push_cycles() in src/runtime/audio.rs reads only apu_regs[0x11] (DMC output level) and generates a single-channel signal. All other channels are ignored.

Design

1.1 Channel Outputs Struct

Add to src/native_core/apu/:

#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,    // 0-15
    pub pulse2: u8,    // 0-15
    pub triangle: u8,  // 0-15
    pub noise: u8,     // 0-15
    pub dmc: u8,       // 0-127
}

1.2 New APU Internal State

The current Apu struct lacks timer counters and sequencer state needed to compute channel outputs. The following fields must be added:

Pulse channels (×2):

  • pulse_timer_counter: [u16; 2], a countdown timer clocked every other CPU cycle
  • pulse_duty_step: [u8; 2], the position in the 8-step duty cycle sequence (0-7)

Triangle channel:

  • triangle_timer_counter: u16, a countdown timer clocked every CPU cycle
  • triangle_step: u8, the position in the 32-step triangle sequence (0-31)

Noise channel:

  • noise_timer_counter: u16, a countdown timer clocked every other CPU cycle
  • noise_lfsr: u16, a 15-bit linear feedback shift register, initialized to 1

These must be clocked in Apu::clock_cpu_cycle():

  • Pulse and noise timers decrement every 2 CPU cycles (APU rate, tracked via existing cpu_cycle_parity)
  • Triangle timer decrements every 1 CPU cycle
  • When a timer reaches 0, it reloads from the period register and advances the corresponding sequencer
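
The clocking rules above can be sketched for the two pulse channels as follows. This is a minimal illustration, not the project's code: the `PulseTimers` struct and its `timer_period` field are hypothetical stand-ins for the relevant slice of `Apu`, and both the choice of which parity counts as the APU cycle and the direction in which the duty step advances are assumptions.

```rust
/// Hypothetical stand-in for the pulse-related part of `Apu`.
#[derive(Debug, Default)]
pub struct PulseTimers {
    pub timer_period: [u16; 2],        // period from the channel registers (assumed field)
    pub pulse_timer_counter: [u16; 2], // countdown timer
    pub pulse_duty_step: [u8; 2],      // position in the duty sequence, 0..=7
    pub cpu_cycle_parity: bool,        // toggles on every CPU cycle
}

impl PulseTimers {
    /// Advance by one CPU cycle. Pulse timers tick on every second CPU cycle.
    pub fn clock_cpu_cycle(&mut self) {
        self.cpu_cycle_parity = !self.cpu_cycle_parity;
        if !self.cpu_cycle_parity {
            return; // assumption: only cycles where parity becomes true are APU cycles
        }
        for ch in 0..2 {
            if self.pulse_timer_counter[ch] == 0 {
                // Timer expired: reload from the period and advance the sequencer.
                self.pulse_timer_counter[ch] = self.timer_period[ch];
                self.pulse_duty_step[ch] = (self.pulse_duty_step[ch] + 1) & 7;
            } else {
                self.pulse_timer_counter[ch] -= 1;
            }
        }
    }
}
```

With a period of 3, the duty step in this sketch advances once every 4 APU cycles, that is, once every 8 CPU cycles. The triangle and noise timers would follow the same reload-and-advance pattern at their own rates.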

1.3 APU Method

Add Apu::channel_outputs(&self) -> ChannelOutputs that computes the current output level of each channel:

  • Pulse 1/2: Output is 0 if length counter is 0, or sweep mutes the channel, or duty cycle sequencer output is 0. Otherwise output is the envelope volume (0-15).
  • Triangle: Output is the value from the 32-step triangle waveform lookup at triangle_step. Muted (output 0) if length counter or linear counter is 0.
  • Noise: Output is 0 if length counter is 0 or LFSR bit 0 is 1. Otherwise output is the envelope volume (0-15).
  • DMC: Output is dmc_output_level (0-127), already tracked.
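
The per-channel rules above might look like the following free functions. This is a sketch under assumptions: the function names and parameter lists are hypothetical (in the real code these values would come from `Apu` fields), and the duty table follows the waveform order as commonly listed on the NESdev wiki.

```rust
/// Output waveforms for the four duty settings (12.5%, 25%, 50%, 25% negated).
pub const DUTY_TABLE: [[u8; 8]; 4] = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 1],
];

/// Pulse output: 0 when silenced, otherwise the envelope volume (0..=15).
pub fn pulse_level(length_counter: u8, sweep_muted: bool, duty: u8, step: u8, envelope: u8) -> u8 {
    let gate = DUTY_TABLE[(duty & 3) as usize][(step & 7) as usize];
    if length_counter == 0 || sweep_muted || gate == 0 {
        0
    } else {
        envelope
    }
}

/// Triangle waveform: 15 down to 0, then 0 up to 15 over 32 steps.
pub fn triangle_waveform(step: u8) -> u8 {
    let s = step & 31;
    if s < 16 { 15 - s } else { s - 16 }
}

/// Triangle output as specified above: muted when either counter is 0.
pub fn triangle_level(length_counter: u8, linear_counter: u8, step: u8) -> u8 {
    if length_counter == 0 || linear_counter == 0 {
        0
    } else {
        triangle_waveform(step)
    }
}

/// Noise output: 0 when the length counter is 0 or LFSR bit 0 is set.
pub fn noise_level(length_counter: u8, lfsr: u16, envelope: u8) -> u8 {
    if length_counter == 0 || lfsr & 1 == 1 {
        0
    } else {
        envelope
    }
}
```

For example, `pulse_level(1, false, 2, 1, 9)` yields 9 because step 1 of the 50% duty waveform is high, while `noise_level(1, 1, 7)` yields 0 because the freshly initialized LFSR has bit 0 set.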

1.4 Save-State Compatibility

Adding new fields to Apu changes the save-state binary format. The save_state_tail() and load_state_tail() methods must be updated to serialize/deserialize the new fields. This is a breaking change to the save-state format — old save states will not be compatible. Since the project is pre-1.0, this is acceptable without a migration strategy.

1.5 Bus Exposure

Add NativeBus::apu_channel_outputs(&self) -> ChannelOutputs to expose channel outputs alongside the existing apu_registers().

1.6 Mixer Update

Change AudioMixer::push_cycles() signature:

// Before:
pub fn push_cycles(&mut self, cpu_cycles: u8, apu_regs: &[u8; 0x20], out: &mut Vec<f32>)

// After:
pub fn push_cycles(&mut self, cpu_cycles: u8, channels: ChannelOutputs, out: &mut Vec<f32>)

Mixing formula (nesdev wiki linear approximation):

pulse_out = 0.00752 * (pulse1 + pulse2)
tnd_out   = 0.00851 * triangle + 0.00494 * noise + 0.00335 * dmc
output    = pulse_out + tnd_out

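With every channel at its maximum (15 for each pulse, triangle and noise, 127 for DMC) the output peaks at about 0.85, so the output range is roughly [0.0, 0.85]. Map it into the [-1.0, 1.0] sample range with: sample = output * 2.0 - 1.0 (about [-1.0, 0.71] in practice).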
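
As a worked check of the formula, the sketch below applies the linear approximation to a `ChannelOutputs` value. The struct is repeated only to keep the example self-contained; the function names `mix_linear` and `to_signed_sample` are hypothetical and not part of the existing API.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,   // 0..=15
    pub pulse2: u8,   // 0..=15
    pub triangle: u8, // 0..=15
    pub noise: u8,    // 0..=15
    pub dmc: u8,      // 0..=127
}

/// Linear approximation of the NES mixer, as given in the formula above.
pub fn mix_linear(ch: ChannelOutputs) -> f32 {
    let pulse_out = 0.00752 * (f32::from(ch.pulse1) + f32::from(ch.pulse2));
    let tnd_out = 0.00851 * f32::from(ch.triangle)
        + 0.00494 * f32::from(ch.noise)
        + 0.00335 * f32::from(ch.dmc);
    pulse_out + tnd_out
}

/// Map the unsigned mixer output onto the signed sample range.
pub fn to_signed_sample(output: f32) -> f32 {
    output * 2.0 - 1.0
}
```

With all channels at maximum the terms are 0.2256 + 0.12765 + 0.0741 + 0.42545 = 0.8528, which maps to a sample of about 0.706. With all channels at 0 the sample is -1.0.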

Known simplifications:

  • This uses the linear approximation, not the more accurate nonlinear lookup tables from real NES hardware. Nonlinear mixing can be added later as an enhancement.
  • The current repeat_n resampling approach (nearest-neighbor) produces aliasing. A low-pass filter or bandlimited interpolation can be added later.
  • Real NES hardware applies two first-order high-pass filters (~90Hz and ~440Hz). Without these, channel enable/disable will cause audible pops. Deferred for a future iteration.
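
For reference when the deferred filtering is picked up, a first-order high-pass stage could be sketched as below. This is the generic RC discretization, not something taken from the project; the `HighPass` type and its methods are hypothetical names.

```rust
use std::f32::consts::PI;

/// Generic first-order high-pass filter (RC form).
pub struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    pub fn new(cutoff_hz: f32, sample_rate_hz: f32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz);
        let dt = 1.0 / sample_rate_hz;
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.alpha * (self.prev_out + input - self.prev_in);
        self.prev_in = input;
        self.prev_out = out;
        out
    }
}
```

Fed a constant input, the output of such a stage decays toward zero, which is what removes the DC step that would otherwise be heard as a pop. Two stages (around 90 Hz and 440 Hz) could be chained to approximate the hardware behavior described above.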

1.7 Runtime Integration

Update NesRuntime::run_until_frame_complete_with_audio() in src/runtime/core.rs to pass ChannelOutputs (from self.bus.apu_channel_outputs()) instead of the register slice to the mixer.

2. Lock-Free Ring Buffer

Location

New file: src/runtime/ring_buffer.rs.

Design

SPSC (single-producer, single-consumer) ring buffer using AtomicUsize for head/tail indices:

  • Capacity: 4096 f32 samples (~85ms at 48kHz) — enough to absorb frame timing jitter
  • Producer: emulation thread writes samples after each frame via push()
  • Consumer: cpal audio callback reads samples via pop()
  • Underrun (buffer empty): consumer outputs silence (0.0)
  • Overrun (buffer full): producer drops new samples (standard SPSC behavior, since only the consumer moves the tail pointer)

pub struct RingBuffer {
    buffer: Box<[f32]>,
    capacity: usize,
    head: AtomicUsize, // write position (producer only)
    tail: AtomicUsize, // read position (consumer only)
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self;
    pub fn push(&self, samples: &[f32]) -> usize;  // returns samples actually written
    pub fn pop(&self, out: &mut [f32]) -> usize;    // returns samples actually read
    pub fn clear(&self);                             // reset both pointers (call when no concurrent access)
}

Thread safety: RingBuffer is Send + Sync. Shared via Arc<RingBuffer>.
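
One way the API above could be implemented is sketched below. It departs from the struct shown earlier in two labeled ways, both assumptions of this sketch: the storage uses `UnsafeCell<f32>` so that writing through `&self` is sound, and `head`/`tail` are kept as running counters masked by a power-of-two capacity rather than as wrapped indices.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct RingBuffer {
    buffer: Box<[UnsafeCell<f32>]>,
    capacity: usize,
    head: AtomicUsize, // total samples written (producer only)
    tail: AtomicUsize, // total samples read (consumer only)
}

// SAFETY: sound only under the SPSC contract, i.e. at most one thread calls
// `push` and at most one thread calls `pop` at any given time.
unsafe impl Sync for RingBuffer {}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        let buffer = (0..capacity).map(|_| UnsafeCell::new(0.0)).collect();
        Self { buffer, capacity, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Writes as many samples as fit; returns the number actually written.
    pub fn push(&self, samples: &[f32]) -> usize {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = self.capacity - head.wrapping_sub(tail);
        let n = samples.len().min(free);
        for (i, &s) in samples[..n].iter().enumerate() {
            let idx = head.wrapping_add(i) & (self.capacity - 1);
            // SAFETY: free slots are not readable by the consumer until `head` is published.
            unsafe { *self.buffer[idx].get() = s };
        }
        self.head.store(head.wrapping_add(n), Ordering::Release);
        n
    }

    /// Reads up to `out.len()` samples; returns the number actually read.
    pub fn pop(&self, out: &mut [f32]) -> usize {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let n = out.len().min(head.wrapping_sub(tail));
        for (i, slot) in out[..n].iter_mut().enumerate() {
            let idx = tail.wrapping_add(i) & (self.capacity - 1);
            // SAFETY: filled slots are not overwritten until `tail` is published.
            unsafe { *slot = *self.buffer[idx].get() };
        }
        self.tail.store(tail.wrapping_add(n), Ordering::Release);
        n
    }

    /// Resets both counters. Call only when no other thread is using the buffer.
    pub fn clear(&self) {
        self.head.store(0, Ordering::Relaxed);
        self.tail.store(0, Ordering::Relaxed);
    }
}
```

The Release store on `head` paired with the Acquire load in `pop` is what makes the written samples visible to the consumer before it sees the new count, and the same pairing on `tail` protects slots from being overwritten before they are read. The specified capacity of 4096 satisfies the power-of-two requirement.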

3. Desktop cpal Audio Backend

Dependencies

Add to crates/nesemu-desktop/Cargo.toml:

cpal = "0.15"

CpalAudioSink

pub struct CpalAudioSink {
    _stream: cpal::Stream,        // keeps the audio stream alive
    ring: Arc<RingBuffer>,
    volume: Arc<AtomicU32>,       // f32 bits stored atomically
}

  • Implements nesemu::AudioOutput: push_samples() writes to the ring buffer
  • Created when a ROM is loaded; the ring buffer is cleared on ROM change to prevent stale samples
  • cpal callback: reads from ring buffer, multiplies each sample by volume, writes to output buffer
  • On pause: emulation stops producing samples → callback outputs silence (underrun behavior)
  • On ROM change: old stream is dropped, ring buffer cleared, new stream created
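
The callback behavior described above (read, scale by volume, pad with silence on underrun) can be sketched independently of cpal. In this sketch `fill_output` is a hypothetical helper and the `pop` closure stands in for `RingBuffer::pop`; in the real sink it would be invoked from the cpal data callback with its `&mut [f32]` output slice.

```rust
/// Fill one output buffer: scale what was read, zero the rest.
/// `pop` must return how many samples it wrote into the start of the slice.
pub fn fill_output<F>(out: &mut [f32], volume: f32, mut pop: F)
where
    F: FnMut(&mut [f32]) -> usize,
{
    let read = pop(out).min(out.len());
    for sample in &mut out[..read] {
        *sample *= volume;
    }
    // Underrun: anything the ring buffer could not supply becomes silence.
    for sample in &mut out[read..] {
        *sample = 0.0;
    }
}
```

Keeping this logic free of allocation and locking matters because the audio callback runs on a real-time thread. When emulation is paused, `pop` returns 0 and the whole buffer is zeroed, which matches the pause behavior listed above.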

Error Handling

If no audio device is available, or the requested format is unsupported, or the stream fails to build:

  • Log the error to stderr
  • Fall back to NullAudio behavior (discard samples silently)
  • The emulator continues to work without sound

The cpal error callback also logs errors to stderr without crashing.

Stream Configuration

  • Sample rate: 48,000 Hz
  • Channels: 1 (mono — NES is mono)
  • Sample format: f32
  • Buffer size: let cpal choose (typically 256-1024 frames)

Volume

  • Arc<AtomicU32> shared between UI and cpal callback
  • Stored as f32::to_bits() / f32::from_bits()
  • Default: 0.75 (75%)
  • Applied in cpal callback: sample * volume
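
Storing an f32 in an `AtomicU32` through its bit pattern could look like the sketch below. The helper names are hypothetical, and the clamp to [0.0, 1.0] and the `Relaxed` ordering are assumptions of this sketch (a single independent value needs no ordering with other memory).

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Create the shared volume cell, e.g. `new_volume(0.75)` for the default.
pub fn new_volume(initial: f32) -> Arc<AtomicU32> {
    Arc::new(AtomicU32::new(initial.to_bits()))
}

/// Called from the UI thread when the slider changes.
pub fn set_volume(cell: &AtomicU32, value: f32) {
    cell.store(value.clamp(0.0, 1.0).to_bits(), Ordering::Relaxed);
}

/// Called from the audio callback before scaling samples.
pub fn get_volume(cell: &AtomicU32) -> f32 {
    f32::from_bits(cell.load(Ordering::Relaxed))
}
```

The UI side would keep one clone of the `Arc` and the audio callback another, so a slider change becomes visible to the callback on its next invocation without any locking.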

4. UI — Volume Slider

Widget

gtk::Scale (horizontal) added to the header bar:

  • Range: 0.0 to 1.0 (displayed as 0-100%)
  • Default: 0.75
  • connect_value_changed → atomically update volume

Placement

In the header bar, after the existing control buttons (open, pause, reset), with a small speaker icon label.

5. Threading Model

  • GTK main thread: runs emulation via glib::timeout_add_local (~16ms tick), UI events, volume slider updates
  • cpal OS thread: audio callback reads from ring buffer — this is the only cross-thread boundary
  • The ring buffer (Arc<RingBuffer>) and volume (Arc<AtomicU32>) are the only shared state between threads

6. Data Flow

CPU instruction step (GTK main thread)
    → APU.clock_cpu_cycle()  [updates internal channel state]
    → AudioMixer.push_cycles(cycles, apu.channel_outputs())
        → mix 5 channels → f32 sample
        → append to frame audio buffer (Vec<f32>)

Per frame (GTK main thread):
    → FrameExecutor collects audio_buffer
    → CpalAudioSink.push_samples(audio_buffer)
        → write to Arc<RingBuffer>

cpal callback (separate OS thread):
    → read from Arc<RingBuffer>
    → multiply by volume (Arc<AtomicU32>)
    → write to hardware audio buffer

7. Files Changed

| File | Change |
|------|--------|
| src/native_core/apu/types.rs | Add ChannelOutputs struct, new timer/sequencer fields to Apu and ApuStateTail |
| src/native_core/apu/api.rs | Add channel_outputs() method, update save_state_tail/load_state_tail |
| src/native_core/apu/timing.rs | Clock new timer/sequencer fields in clock_cpu_cycle() |
| src/native_core/bus.rs | Add apu_channel_outputs() |
| src/runtime/audio.rs | Rewrite mixer with 5-channel formula |
| src/runtime/ring_buffer.rs (new) | Lock-free SPSC ring buffer |
| src/runtime/core.rs | Pass channel_outputs() to mixer in run_until_frame_complete_with_audio() |
| src/runtime/mod.rs | Export ring_buffer, ChannelOutputs |
| crates/nesemu-desktop/Cargo.toml | Add cpal dependency |
| crates/nesemu-desktop/src/main.rs | Replace stub AudioSink with CpalAudioSink, add volume slider |

8. Testing

  • Existing tests in tests/public_api.rs must continue to pass (they use NullAudio). Note: the regression hash test (public_api_regression_hashes_for_reference_rom) will produce a different audio hash due to the mixer change — the expected hash must be updated.
  • Unit test for ring buffer: push/pop, underrun, overrun, clear
  • Unit test for mixer: known channel outputs → expected sample values
  • Manual test: load a ROM, verify audible sound through speakers