se.cherkasov 9068e78a62 docs: add audio output design spec and implementation plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 16:21:30 +03:00

Audio Output Design — Full 5-Channel Mixer + cpal Backend

Overview

Add real audio output to the desktop NES emulator client. This involves two independent pieces of work:

Full APU mixer — replace the current DMC-only mixer with proper 5-channel mixing (Pulse 1, Pulse 2, Triangle, Noise, DMC) using NES hardware-accurate formulas.
cpal audio backend — replace the stub AudioSink in the desktop client with a real audio output using cpal, connected via a lock-free ring buffer. Add a volume slider to the GTK4 header bar.

1. Full APU Mixer

Current State

AudioMixer::push_cycles() in src/runtime/audio.rs reads only apu_regs[0x11] (DMC output level) and generates a single-channel signal. All other channels are ignored.

Design

1.1 Channel Outputs Struct

Add to src/native_core/apu/:

#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,    // 0–15
    pub pulse2: u8,    // 0–15
    pub triangle: u8,  // 0–15
    pub noise: u8,     // 0–15
    pub dmc: u8,       // 0–127
}

1.2 New APU Internal State

The current Apu struct lacks timer counters and sequencer state needed to compute channel outputs. The following fields must be added:

Pulse channels (×2):

pulse_timer_counter: [u16; 2] — countdown timer, clocked every other CPU cycle
pulse_duty_step: [u8; 2] — position in 8-step duty cycle sequence (0–7)

Triangle channel:

triangle_timer_counter: u16 — countdown timer, clocked every CPU cycle
triangle_step: u8 — position in 32-step triangle sequence (0–31)

Noise channel:

noise_timer_counter: u16 — countdown timer, clocked every other CPU cycle
noise_lfsr: u16 — 15-bit linear feedback shift register, initialized to 1

These must be clocked in Apu::clock_cpu_cycle():

Pulse and noise timers decrement every 2 CPU cycles (APU rate, tracked via existing cpu_cycle_parity)
Triangle timer decrements every 1 CPU cycle
When a timer reaches 0, it reloads from the period register and advances the corresponding sequencer
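The clocking rules above can be sketched as follows. This is a minimal illustration, not the project's `Apu::clock_cpu_cycle()`: the `TimerState` and `Periods` structs, the `tick` helper, and the `apu_tick` flag (standing in for `cpu_cycle_parity`) are hypothetical names, and the sketch omits the length/linear counter gating of the triangle sequencer.

```rust
/// Timer and sequencer fields from the list above, grouped in a hypothetical struct.
#[derive(Debug, Clone, Copy)]
pub struct TimerState {
    pub pulse_timer_counter: [u16; 2],
    pub pulse_duty_step: [u8; 2],
    pub triangle_timer_counter: u16,
    pub triangle_step: u8,
    pub noise_timer_counter: u16,
    pub noise_lfsr: u16,
}

impl TimerState {
    pub fn new() -> Self {
        Self {
            pulse_timer_counter: [0; 2],
            pulse_duty_step: [0; 2],
            triangle_timer_counter: 0,
            triangle_step: 0,
            noise_timer_counter: 0,
            noise_lfsr: 1, // an all-zero LFSR would stay stuck at 0
        }
    }
}

/// Hypothetical snapshot of the period registers (units: timer ticks).
#[derive(Debug, Clone, Copy)]
pub struct Periods {
    pub pulse: [u16; 2],
    pub triangle: u16,
    pub noise: u16,
    pub noise_short_mode: bool,
}

/// Counts down; at 0 it reloads from `period` and reports that the sequencer should advance.
fn tick(counter: &mut u16, period: u16) -> bool {
    if *counter == 0 {
        *counter = period;
        true
    } else {
        *counter -= 1;
        false
    }
}

/// One LFSR shift: feedback = bit 0 XOR bit 1 (bit 6 in short mode), fed into bit 14.
pub fn clock_noise_lfsr(lfsr: u16, short_mode: bool) -> u16 {
    let tap = if short_mode { 6 } else { 1 };
    let feedback = (lfsr ^ (lfsr >> tap)) & 1;
    (lfsr >> 1) | (feedback << 14)
}

/// Called once per CPU cycle; `apu_tick` is true on every other cycle.
pub fn clock_cpu_cycle(s: &mut TimerState, p: &Periods, apu_tick: bool) {
    // Triangle timer runs at the full CPU rate.
    if tick(&mut s.triangle_timer_counter, p.triangle) {
        s.triangle_step = (s.triangle_step + 1) & 31;
    }
    // Pulse and noise timers run at half the CPU rate.
    if apu_tick {
        for i in 0..2 {
            if tick(&mut s.pulse_timer_counter[i], p.pulse[i]) {
                s.pulse_duty_step[i] = (s.pulse_duty_step[i] + 1) & 7;
            }
        }
        if tick(&mut s.noise_timer_counter, p.noise) {
            s.noise_lfsr = clock_noise_lfsr(s.noise_lfsr, p.noise_short_mode);
        }
    }
}
```

With this reload rule a timer with period `t` advances its sequencer once every `t + 1` ticks.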

1.3 APU Method

Add Apu::channel_outputs(&self) -> ChannelOutputs that computes the current output level of each channel:

Pulse 1/2: Output is 0 if length counter is 0, or sweep mutes the channel, or duty cycle sequencer output is 0. Otherwise output is the envelope volume (0–15).
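Triangle: Output is the value from the 32-step triangle waveform lookup at triangle_step. Muted (output 0) if length counter or linear counter is 0. Note that real hardware does not force the output to 0 in this case: the sequencer stops advancing and the channel holds its last value, so forcing 0 may introduce a click that hardware does not produce.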
Noise: Output is 0 if length counter is 0 or LFSR bit 0 is 1. Otherwise output is the envelope volume (0–15).
DMC: Output is dmc_output_level (0–127), already tracked.
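The per-channel rules above could look roughly like this. The free functions and their parameters are hypothetical (the real `Apu::channel_outputs()` would read these values from its own fields), and the duty table lists the commonly documented output waveforms for the four duty settings.

```rust
/// Output waveform per duty setting: 12.5%, 25%, 50%, 25% negated.
const DUTY_TABLE: [[u8; 8]; 4] = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 1],
];

/// Pulse: silent when the length counter is 0, the sweep mutes, or the duty step is low.
pub fn pulse_output(
    length_counter: u8,
    sweep_muted: bool,
    duty: u8,
    duty_step: u8,
    envelope_volume: u8,
) -> u8 {
    let high = DUTY_TABLE[usize::from(duty & 3)][usize::from(duty_step & 7)] == 1;
    if length_counter == 0 || sweep_muted || !high {
        0
    } else {
        envelope_volume & 0x0F
    }
}

/// Triangle: 15 down to 0, then 0 up to 15, over 32 steps.
/// Follows the muting rule as written in this spec.
pub fn triangle_output(length_counter: u8, linear_counter: u8, step: u8) -> u8 {
    if length_counter == 0 || linear_counter == 0 {
        return 0;
    }
    let step = step & 31;
    if step < 16 { 15 - step } else { step - 16 }
}

/// Noise: silent when the length counter is 0 or LFSR bit 0 is set.
pub fn noise_output(length_counter: u8, lfsr: u16, envelope_volume: u8) -> u8 {
    if length_counter == 0 || (lfsr & 1) == 1 {
        0
    } else {
        envelope_volume & 0x0F
    }
}
```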

1.4 Save-State Compatibility

Adding new fields to Apu changes the save-state binary format. The save_state_tail() and load_state_tail() methods must be updated to serialize/deserialize the new fields. This is a breaking change to the save-state format — old save states will not be compatible. Since the project is pre-1.0, this is acceptable without a migration strategy.
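As an illustration of what serializing the new fields involves, here is a round-trip sketch. The `TimerFields` struct, the 13-byte little-endian layout, and the `write_to`/`read_from` names are assumptions made for this example; the actual layout used by `save_state_tail()` and `load_state_tail()` is not specified here.

```rust
/// The new timer/sequencer fields, grouped in a hypothetical struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimerFields {
    pub pulse_timer_counter: [u16; 2],
    pub pulse_duty_step: [u8; 2],
    pub triangle_timer_counter: u16,
    pub triangle_step: u8,
    pub noise_timer_counter: u16,
    pub noise_lfsr: u16,
}

/// 2*2 + 2 + 2 + 1 + 2 + 2 bytes in the assumed layout.
pub const TIMER_FIELDS_LEN: usize = 13;

impl TimerFields {
    /// Appends the fields in declaration order, multi-byte values little-endian.
    pub fn write_to(&self, out: &mut Vec<u8>) {
        for counter in self.pulse_timer_counter {
            out.extend_from_slice(&counter.to_le_bytes());
        }
        out.extend_from_slice(&self.pulse_duty_step);
        out.extend_from_slice(&self.triangle_timer_counter.to_le_bytes());
        out.push(self.triangle_step);
        out.extend_from_slice(&self.noise_timer_counter.to_le_bytes());
        out.extend_from_slice(&self.noise_lfsr.to_le_bytes());
    }

    /// Reads the fields back; returns None if the input is too short.
    pub fn read_from(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < TIMER_FIELDS_LEN {
            return None;
        }
        let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
        Some(Self {
            pulse_timer_counter: [u16_at(0), u16_at(2)],
            pulse_duty_step: [bytes[4], bytes[5]],
            triangle_timer_counter: u16_at(6),
            triangle_step: bytes[8],
            noise_timer_counter: u16_at(9),
            noise_lfsr: u16_at(11),
        })
    }
}
```

A too-short input (for example an old save state that lacks these bytes) yields `None` rather than a partially filled struct.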

1.5 Bus Exposure

Add NativeBus::apu_channel_outputs(&self) -> ChannelOutputs to expose channel outputs alongside the existing apu_registers().

1.6 Mixer Update

Change AudioMixer::push_cycles() signature:

// Before:
pub fn push_cycles(&mut self, cpu_cycles: u8, apu_regs: &[u8; 0x20], out: &mut Vec<f32>)

// After:
pub fn push_cycles(&mut self, cpu_cycles: u8, channels: ChannelOutputs, out: &mut Vec<f32>)

Mixing formula (nesdev wiki linear approximation):

pulse_out = 0.00752 * (pulse1 + pulse2)
tnd_out   = 0.00851 * triangle + 0.00494 * noise + 0.00335 * dmc
output    = pulse_out + tnd_out

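With this linear approximation the output ranges from 0.0 (all channels silent) to about 0.853 (all channels at maximum), which stays within the nominal [0.0, 1.0]. Normalize to [-1.0, 1.0] by: sample = output * 2.0 - 1.0. Silence therefore maps to -1.0, a constant DC offset, until high-pass filtering is added.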
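A minimal sketch of the formula and the normalization step. The function names `mix_linear` and `to_sample` are hypothetical, and `ChannelOutputs` is repeated only so the snippet stands alone.

```rust
/// Per-channel output levels, mirroring the struct from section 1.1.
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,
    pub pulse2: u8,
    pub triangle: u8,
    pub noise: u8,
    pub dmc: u8,
}

/// Linear-approximation mix; returns the unipolar level (0.0 = all channels silent).
pub fn mix_linear(ch: ChannelOutputs) -> f32 {
    let pulse_out = 0.00752 * (f32::from(ch.pulse1) + f32::from(ch.pulse2));
    let tnd_out = 0.00851 * f32::from(ch.triangle)
        + 0.00494 * f32::from(ch.noise)
        + 0.00335 * f32::from(ch.dmc);
    pulse_out + tnd_out
}

/// Maps the unipolar mix onto the bipolar range used for f32 audio samples.
pub fn to_sample(output: f32) -> f32 {
    output * 2.0 - 1.0
}
```

With every channel at its maximum (15, 15, 15, 15, 127) the mix comes to about 0.853, so the normalized sample tops out near 0.706.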

Known simplifications:

This uses the linear approximation, not the more accurate nonlinear lookup tables from real NES hardware. Nonlinear mixing can be added later as an enhancement.
The current repeat_n resampling approach (nearest-neighbor) produces aliasing. A low-pass filter or bandlimited interpolation can be added later.
Real NES hardware applies two first-order high-pass filters (~90 Hz and ~440 Hz) and a first-order low-pass filter (~14 kHz). Without the high-pass stages, channel enable/disable will cause audible pops. Deferred for a future iteration.
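For reference, a deferred high-pass stage could be as small as the following. This is a generic first-order RC high-pass, not tuned against hardware; the `HighPass` type and its methods are hypothetical names.

```rust
/// First-order high-pass filter in RC form: y[n] = a * (y[n-1] + x[n] - x[n-1]).
pub struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    pub fn new(cutoff_hz: f32, sample_rate_hz: f32) -> Self {
        let rc = 1.0 / (2.0 * std::f32::consts::PI * cutoff_hz);
        let dt = 1.0 / sample_rate_hz;
        Self {
            alpha: rc / (rc + dt),
            prev_in: 0.0,
            prev_out: 0.0,
        }
    }

    /// Filters one sample; a constant (DC) input decays toward 0.
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.alpha * (self.prev_out + input - self.prev_in);
        self.prev_in = input;
        self.prev_out = out;
        out
    }
}
```

Chaining one instance at 90 Hz and one at 440 Hz after the mixer would approximate the two high-pass stages.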

1.7 Runtime Integration

Update NesRuntime::run_until_frame_complete_with_audio() in src/runtime/core.rs to pass ChannelOutputs (from self.bus.apu_channel_outputs()) instead of the register slice to the mixer.

2. Lock-Free Ring Buffer

Location

New file: src/runtime/ring_buffer.rs.

Design

SPSC (single-producer, single-consumer) ring buffer using AtomicUsize for head/tail indices:

Capacity: 4096 f32 samples (~85 ms at 48 kHz, roughly five frames at about 800 samples per 60 Hz frame), enough to absorb frame timing jitter
Producer: emulation thread writes samples after each frame via push()
Consumer: cpal audio callback reads samples via pop()
Underrun (buffer empty): consumer outputs silence (0.0)
Overrun (buffer full): producer drops the samples that do not fit (standard SPSC behavior: only the consumer moves the tail pointer)

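use std::cell::UnsafeCell;
use std::sync::atomic::AtomicUsize;

pub struct RingBuffer {
    // UnsafeCell: push() writes through &self, so the slots need interior mutability.
    buffer: Box<[UnsafeCell<f32>]>,
    capacity: usize,
    head: AtomicUsize, // write position (producer only)
    tail: AtomicUsize, // read position (consumer only)
}

// Sound only under the SPSC contract: one thread calls push(), one calls pop().
unsafe impl Sync for RingBuffer {}

// API sketch (signatures only, bodies omitted):
impl RingBuffer {
    pub fn new(capacity: usize) -> Self;
    pub fn push(&self, samples: &[f32]) -> usize;  // returns samples actually written
    pub fn pop(&self, out: &mut [f32]) -> usize;    // returns samples actually read
    pub fn clear(&self);                             // reset both pointers (call when no concurrent access)
}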

Thread safety: RingBuffer is Send + Sync. Shared via Arc<RingBuffer>.
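One possible implementation of this API is sketched below. Two choices are assumptions of the example and not part of the design: it stores each sample as an `AtomicU32` bit pattern, a substitution made here so the sketch needs no `unsafe`, and it requires a power-of-two capacity so that indices can be masked.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

pub struct RingBuffer {
    buffer: Box<[AtomicU32]>, // f32 samples stored as raw bits
    mask: usize,              // capacity - 1 (capacity is a power of two)
    head: AtomicUsize,        // total samples written (producer only)
    tail: AtomicUsize,        // total samples read (consumer only)
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        let buffer = (0..capacity)
            .map(|_| AtomicU32::new(0))
            .collect::<Vec<_>>()
            .into_boxed_slice();
        Self {
            buffer,
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: writes as many samples as fit and drops the rest.
    pub fn push(&self, samples: &[f32]) -> usize {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = self.buffer.len() - head.wrapping_sub(tail);
        let n = samples.len().min(free);
        for (i, sample) in samples[..n].iter().enumerate() {
            let slot = head.wrapping_add(i) & self.mask;
            self.buffer[slot].store(sample.to_bits(), Ordering::Relaxed);
        }
        // Release publishes the slot writes to the consumer.
        self.head.store(head.wrapping_add(n), Ordering::Release);
        n
    }

    /// Consumer side: reads up to out.len() samples; the caller zero-fills any remainder.
    pub fn pop(&self, out: &mut [f32]) -> usize {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let n = out.len().min(head.wrapping_sub(tail));
        for (i, dst) in out[..n].iter_mut().enumerate() {
            let slot = tail.wrapping_add(i) & self.mask;
            *dst = f32::from_bits(self.buffer[slot].load(Ordering::Relaxed));
        }
        // Release hands the freed slots back to the producer.
        self.tail.store(tail.wrapping_add(n), Ordering::Release);
        n
    }

    /// Resets both positions; call only when neither side is running.
    pub fn clear(&self) {
        self.head.store(0, Ordering::Release);
        self.tail.store(0, Ordering::Release);
    }
}
```

Because every field is an atomic, this variant is `Send + Sync` automatically and can be shared as `Arc<RingBuffer>` without an explicit `unsafe impl`.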

3. Desktop cpal Audio Backend

Dependencies

Add to crates/nesemu-desktop/Cargo.toml:

cpal = "0.15"

CpalAudioSink

pub struct CpalAudioSink {
    _stream: cpal::Stream,        // keeps the audio stream alive
    ring: Arc<RingBuffer>,
    volume: Arc<AtomicU32>,       // f32 bits stored atomically
}

Implements nesemu::AudioOutput — push_samples() writes to ring buffer
Created when a ROM is loaded; the ring buffer is cleared on ROM change to prevent stale samples
cpal callback: reads from ring buffer, multiplies each sample by volume, writes to output buffer
On pause: emulation stops producing samples → callback outputs silence (underrun behavior)
On ROM change: the old stream is dropped first, then the ring buffer is cleared (so clear() runs with no concurrent reader), then a new stream is created
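The body of the cpal data callback reduces to a small pure function, sketched here without any cpal types so it can be tested in isolation. `fill_output` is a hypothetical name, and the `pop` closure stands in for `RingBuffer::pop`.

```rust
/// Fills `out` from the ring buffer, applies volume, and pads with silence on underrun.
pub fn fill_output<F>(mut pop: F, volume: f32, out: &mut [f32])
where
    F: FnMut(&mut [f32]) -> usize,
{
    // `pop` returns how many samples it actually wrote into `out`.
    let n = pop(out).min(out.len());
    for sample in &mut out[..n] {
        *sample *= volume;
    }
    // Underrun: whatever the ring buffer could not supply becomes silence.
    for sample in &mut out[n..] {
        *sample = 0.0;
    }
}
```

In the real sink the closure passed to the stream builder would call this with the shared ring buffer and the current volume.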

Error Handling

If no audio device is available, or the requested format is unsupported, or the stream fails to build:

Log the error to stderr
Fall back to NullAudio behavior (discard samples silently)
The emulator continues to work without sound

The cpal error callback also logs errors to stderr without crashing.
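The fallback path can be expressed as a single function. In this sketch the `AudioOutput` trait is a stand-in for `nesemu::AudioOutput` (the real trait's signature may differ), and `sink_or_null` is a hypothetical helper that takes the result of whatever code builds the cpal stream.

```rust
use std::fmt::Display;

/// Stand-in for `nesemu::AudioOutput`; the real trait may look different.
pub trait AudioOutput {
    fn push_samples(&mut self, samples: &[f32]);
}

/// Discards every sample, mirroring the NullAudio behavior described above.
pub struct NullAudio;

impl AudioOutput for NullAudio {
    fn push_samples(&mut self, _samples: &[f32]) {}
}

/// Returns the real sink on success; on failure logs to stderr and falls back to NullAudio.
pub fn sink_or_null<E: Display>(
    built: Result<Box<dyn AudioOutput>, E>,
) -> Box<dyn AudioOutput> {
    match built {
        Ok(sink) => sink,
        Err(err) => {
            eprintln!("audio disabled: {}", err);
            Box::new(NullAudio)
        }
    }
}
```

Returning a boxed trait object keeps the rest of the client unaware of whether sound is available.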

Stream Configuration

Sample rate: 48,000 Hz
Channels: 1 (mono — NES is mono)
Sample format: f32
Buffer size: let cpal choose (typically 256–1024 frames)

Volume

Arc<AtomicU32> shared between UI and cpal callback
Stored as f32::to_bits() / f32::from_bits()
Default: 0.75 (75%)
Applied in cpal callback: sample * volume
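A small wrapper makes the bit-cast storage harder to misuse. The `Volume` type and its methods are hypothetical names, and clamping to [0.0, 1.0] is an assumption based on the slider range.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Shared volume level, stored as the bit pattern of an f32.
#[derive(Clone)]
pub struct Volume(Arc<AtomicU32>);

impl Volume {
    pub fn new(initial: f32) -> Self {
        Self(Arc::new(AtomicU32::new(initial.clamp(0.0, 1.0).to_bits())))
    }

    /// Called from the UI thread when the slider moves.
    pub fn set(&self, value: f32) {
        self.0.store(value.clamp(0.0, 1.0).to_bits(), Ordering::Relaxed);
    }

    /// Called from the audio callback for each buffer.
    pub fn get(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
}
```

`Relaxed` ordering is enough here because the volume is a standalone value that does not guard any other data.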

4. UI — Volume Slider

Widget

gtk::Scale (horizontal) added to the header bar:

Range: 0.0 to 1.0 (displayed as 0–100%)
Default: 0.75
connect_value_changed → atomically update volume

Placement

In the header bar, after the existing control buttons (open, pause, reset), with a small speaker icon label.

5. Threading Model

GTK main thread: runs emulation via glib::timeout_add_local (~16ms tick), UI events, volume slider updates
cpal OS thread: audio callback reads from ring buffer — this is the only cross-thread boundary
The ring buffer (Arc<RingBuffer>) and volume (Arc<AtomicU32>) are the only shared state between threads

6. Data Flow

CPU instruction step (GTK main thread)
    → APU.clock_cpu_cycle()  [updates internal channel state]
    → AudioMixer.push_cycles(cycles, apu.channel_outputs())
        → mix 5 channels → f32 sample
        → append to frame audio buffer (Vec<f32>)

Per frame (GTK main thread):
    → FrameExecutor collects audio_buffer
    → CpalAudioSink.push_samples(audio_buffer)
        → write to Arc<RingBuffer>

cpal callback (separate OS thread):
    → read from Arc<RingBuffer>
    → multiply by volume (Arc<AtomicU32>)
    → write to hardware audio buffer

7. Files Changed

| File | Change |
| --- | --- |
| `src/native_core/apu/types.rs` | Add `ChannelOutputs` struct, new timer/sequencer fields to `Apu` and `ApuStateTail` |
| `src/native_core/apu/api.rs` | Add `channel_outputs()` method, update `save_state_tail`/`load_state_tail` |
| `src/native_core/apu/timing.rs` | Clock new timer/sequencer fields in `clock_cpu_cycle()` |
| `src/native_core/bus.rs` | Add `apu_channel_outputs()` |
| `src/runtime/audio.rs` | Rewrite mixer with 5-channel formula |
| `src/runtime/ring_buffer.rs` (new) | Lock-free SPSC ring buffer |
| `src/runtime/core.rs` | Pass `channel_outputs()` to mixer in `run_until_frame_complete_with_audio()` |
| `src/runtime/mod.rs` | Export `ring_buffer`, `ChannelOutputs` |
| `crates/nesemu-desktop/Cargo.toml` | Add `cpal` dependency |
| `crates/nesemu-desktop/src/main.rs` | Replace stub `AudioSink` with `CpalAudioSink`, add volume slider |

8. Testing

Existing tests in tests/public_api.rs must continue to pass (they use NullAudio). Note: the regression hash test (public_api_regression_hashes_for_reference_rom) will produce a different audio hash due to the mixer change — the expected hash must be updated.
Unit test for ring buffer: push/pop, underrun, overrun, clear
Unit test for mixer: known channel outputs → expected sample values
Manual test: load a ROM, verify audible sound through speakers
