# Audio Output Design — Full 5-Channel Mixer + cpal Backend
## Overview
Add real audio output to the desktop NES emulator client. This involves two independent pieces of work:
1. **Full APU mixer** — replace the current DMC-only mixer with proper 5-channel mixing (Pulse 1, Pulse 2, Triangle, Noise, DMC) using NES hardware-accurate formulas.
2. **cpal audio backend** — replace the stub `AudioSink` in the desktop client with a real audio output using `cpal`, connected via a lock-free ring buffer. Add a volume slider to the GTK4 header bar.
## 1. Full APU Mixer
### Current State
`AudioMixer::push_cycles()` in `src/runtime/audio.rs` reads only `apu_regs[0x11]` (DMC output level) and generates a single-channel signal. All other channels are ignored.
### Design
#### 1.1 Channel Outputs Struct
Add to `src/native_core/apu/`:
```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,   // 0–15
    pub pulse2: u8,   // 0–15
    pub triangle: u8, // 0–15
    pub noise: u8,    // 0–15
    pub dmc: u8,      // 0–127
}
```
#### 1.2 New APU Internal State
The current `Apu` struct lacks timer counters and sequencer state needed to compute channel outputs. The following fields must be added:
**Pulse channels (×2):**
- `pulse_timer_counter: [u16; 2]`: countdown timer, clocked every other CPU cycle
- `pulse_duty_step: [u8; 2]`: position in the 8-step duty cycle sequence (0–7)
**Triangle channel:**
- `triangle_timer_counter: u16`: countdown timer, clocked every CPU cycle
- `triangle_step: u8`: position in the 32-step triangle sequence (0–31)
**Noise channel:**
- `noise_timer_counter: u16`: countdown timer, clocked every other CPU cycle
- `noise_lfsr: u16`: 15-bit linear feedback shift register, initialized to 1
These must be clocked in `Apu::clock_cpu_cycle()`:
- Pulse and noise timers decrement every **2** CPU cycles (APU rate, tracked via existing `cpu_cycle_parity`)
- Triangle timer decrements every **1** CPU cycle
- When a timer reaches 0, it reloads from the period register and advances the corresponding sequencer
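
The clocking rules above can be sketched as follows. This is a minimal illustration, not the project's code: `Timer` and `ApuTimers` are hypothetical stand-ins for the relevant part of `Apu`, the noise channel is omitted (it would follow the pulse pattern), and the direction in which the duty step advances is an assumption.

```rust
/// Hypothetical down-counter shared by the channel timers.
#[derive(Debug, Clone, Copy, Default)]
pub struct Timer {
    pub period: u16,  // reload value taken from the channel's period register
    pub counter: u16, // current countdown value
}

impl Timer {
    /// Clocks the timer once. Returns true when the counter was 0, in which
    /// case it reloads from `period` and the sequencer should advance.
    pub fn clock(&mut self) -> bool {
        if self.counter == 0 {
            self.counter = self.period;
            true
        } else {
            self.counter -= 1;
            false
        }
    }
}

/// Hypothetical stand-in for the timing-related part of the APU state.
#[derive(Debug, Default)]
pub struct ApuTimers {
    pub pulse: [Timer; 2],
    pub pulse_duty_step: [u8; 2], // 0..=7
    pub triangle: Timer,
    pub triangle_step: u8,      // 0..=31
    pub cpu_cycle_parity: bool, // toggles every CPU cycle
}

impl ApuTimers {
    /// Advances all timers by one CPU cycle.
    pub fn clock_cpu_cycle(&mut self) {
        // The triangle timer runs at the CPU rate.
        if self.triangle.clock() {
            self.triangle_step = (self.triangle_step + 1) % 32;
        }
        // The pulse timers run at half the CPU rate (every second cycle).
        if self.cpu_cycle_parity {
            for i in 0..2 {
                if self.pulse[i].clock() {
                    self.pulse_duty_step[i] = (self.pulse_duty_step[i] + 1) % 8;
                }
            }
        }
        self.cpu_cycle_parity = !self.cpu_cycle_parity;
    }
}
```

With a period of 1 on both timers, four CPU cycles advance the triangle sequencer twice and the pulse sequencer once, which reflects the 2:1 rate difference.
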
#### 1.3 APU Method
Add `Apu::channel_outputs(&self) -> ChannelOutputs` that computes the current output level of each channel:
- **Pulse 1/2:** Output is 0 if length counter is 0, or sweep mutes the channel, or duty cycle sequencer output is 0. Otherwise output is the envelope volume (0–15).
- **Triangle:** Output is the value from the 32-step triangle waveform lookup at `triangle_step`. Muted (output 0) if length counter or linear counter is 0.
- **Noise:** Output is 0 if length counter is 0 or LFSR bit 0 is 1. Otherwise output is the envelope volume (0–15).
- **DMC:** Output is `dmc_output_level` (0–127), already tracked.
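
The per-channel rules above can be expressed as small pure functions. The sketch below is illustrative only: the function names and parameter lists are hypothetical (the real `channel_outputs()` would read these values from `Apu` fields), and the lookup tables are the standard NES duty and triangle sequences.

```rust
/// Pulse duty waveforms: 12.5%, 25%, 50%, and 25% negated.
const DUTY_TABLE: [[u8; 8]; 4] = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 1],
];

/// 32-step triangle waveform: ramps 15 down to 0, then 0 up to 15.
const TRIANGLE_TABLE: [u8; 32] = [
    15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
];

/// Pulse output: envelope volume, gated by length counter, sweep, and duty.
pub fn pulse_output(
    length_counter: u8,
    sweep_muted: bool,
    duty: u8,
    duty_step: u8,
    envelope_volume: u8,
) -> u8 {
    let gate = DUTY_TABLE[(duty & 3) as usize][(duty_step & 7) as usize];
    if length_counter == 0 || sweep_muted || gate == 0 {
        0
    } else {
        envelope_volume & 0x0F
    }
}

/// Triangle output: table lookup, muted when either counter is 0
/// (the muting rule follows this design document).
pub fn triangle_output(length_counter: u8, linear_counter: u8, step: u8) -> u8 {
    if length_counter == 0 || linear_counter == 0 {
        0
    } else {
        TRIANGLE_TABLE[(step & 31) as usize]
    }
}

/// Noise output: envelope volume unless the length counter is 0 or LFSR bit 0 is set.
pub fn noise_output(length_counter: u8, lfsr: u16, envelope_volume: u8) -> u8 {
    if length_counter == 0 || (lfsr & 1) == 1 {
        0
    } else {
        envelope_volume & 0x0F
    }
}
```

Keeping these as free functions over plain integers makes them easy to unit-test independently of the rest of the APU state.
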
#### 1.4 Save-State Compatibility
Adding new fields to `Apu` changes the save-state binary format. The `save_state_tail()` and `load_state_tail()` methods must be updated to serialize/deserialize the new fields. This is a **breaking change** to the save-state format — old save states will not be compatible. Since the project is pre-1.0, this is acceptable without a migration strategy.
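
As an illustration of what serializing the new fields could involve, the sketch below round-trips them through a fixed little-endian layout. Everything here is an assumption made for the example: `ChannelTimingState`, `write_to`, `read_from`, and the 13-byte layout are hypothetical and do not describe the existing `save_state_tail()` format.

```rust
/// Hypothetical container for the fields added in section 1.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ChannelTimingState {
    pub pulse_timer_counter: [u16; 2],
    pub pulse_duty_step: [u8; 2],
    pub triangle_timer_counter: u16,
    pub triangle_step: u8,
    pub noise_timer_counter: u16,
    pub noise_lfsr: u16,
}

/// Encoded size in bytes: 4 + 2 + 2 + 1 + 2 + 2 (assumed layout).
pub const ENCODED_LEN: usize = 13;

impl ChannelTimingState {
    /// Appends the fields to `out` in declaration order, little-endian.
    pub fn write_to(&self, out: &mut Vec<u8>) {
        for counter in self.pulse_timer_counter {
            out.extend_from_slice(&counter.to_le_bytes());
        }
        out.extend_from_slice(&self.pulse_duty_step);
        out.extend_from_slice(&self.triangle_timer_counter.to_le_bytes());
        out.push(self.triangle_step);
        out.extend_from_slice(&self.noise_timer_counter.to_le_bytes());
        out.extend_from_slice(&self.noise_lfsr.to_le_bytes());
    }

    /// Reads the fields back; returns None if the input is too short.
    pub fn read_from(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < ENCODED_LEN {
            return None;
        }
        let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
        Some(Self {
            pulse_timer_counter: [u16_at(0), u16_at(2)],
            pulse_duty_step: [bytes[4], bytes[5]],
            triangle_timer_counter: u16_at(6),
            triangle_step: bytes[8],
            noise_timer_counter: u16_at(9),
            noise_lfsr: u16_at(11),
        })
    }
}
```

Rejecting short input in `read_from` gives old save states a clean failure path instead of a panic, which fits the decision not to provide a migration.
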
#### 1.5 Bus Exposure
Add `NativeBus::apu_channel_outputs(&self) -> ChannelOutputs` to expose channel outputs alongside the existing `apu_registers()`.
#### 1.6 Mixer Update
Change `AudioMixer::push_cycles()` signature:
```rust
// Before:
pub fn push_cycles(&mut self, cpu_cycles: u8, apu_regs: &[u8; 0x20], out: &mut Vec<f32>)
// After:
pub fn push_cycles(&mut self, cpu_cycles: u8, channels: ChannelOutputs, out: &mut Vec<f32>)
```
Mixing formula (nesdev wiki linear approximation):
```
pulse_out = 0.00752 * (pulse1 + pulse2)
tnd_out = 0.00851 * triangle + 0.00494 * noise + 0.00335 * dmc
output = pulse_out + tnd_out
```
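With every channel at its maximum level the sum peaks at about 0.85, so `output` stays within [0.0, 1.0]. Map it to the signed range [-1.0, 1.0] with `sample = output * 2.0 - 1.0`. Note that an all-zero input (silence) maps to -1.0 under this mapping.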
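
A minimal sketch of the formula and normalization above, written as a free function. The name `mix` is hypothetical, and `ChannelOutputs` is repeated here only so the example is self-contained.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,   // 0..=15
    pub pulse2: u8,   // 0..=15
    pub triangle: u8, // 0..=15
    pub noise: u8,    // 0..=15
    pub dmc: u8,      // 0..=127
}

/// Mixes the five channel levels with the linear approximation and maps
/// the result from [0.0, 1.0] to [-1.0, 1.0].
pub fn mix(ch: ChannelOutputs) -> f32 {
    let pulse_out = 0.00752 * (f32::from(ch.pulse1) + f32::from(ch.pulse2));
    let tnd_out = 0.00851 * f32::from(ch.triangle)
        + 0.00494 * f32::from(ch.noise)
        + 0.00335 * f32::from(ch.dmc);
    (pulse_out + tnd_out) * 2.0 - 1.0
}
```

With all inputs at their maximum, `pulse_out` is 0.2256 and `tnd_out` is 0.6272, so the normalized sample is about 0.7056 and never exceeds 1.0.
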
**Known simplifications:**
- This uses the linear approximation, not the more accurate nonlinear lookup tables from real NES hardware. Nonlinear mixing can be added later as an enhancement.
- The current `repeat_n` resampling approach (nearest-neighbor) produces aliasing. A low-pass filter or bandlimited interpolation can be added later.
- Real NES hardware applies two first-order high-pass filters (~90Hz and ~440Hz). Without these, channel enable/disable will cause audible pops. Deferred for a future iteration.
#### 1.7 Runtime Integration
Update `NesRuntime::run_until_frame_complete_with_audio()` in `src/runtime/core.rs` to pass `ChannelOutputs` (from `self.bus.apu_channel_outputs()`) instead of the register slice to the mixer.
## 2. Lock-Free Ring Buffer
### Location
New file: `src/runtime/ring_buffer.rs`.
### Design
SPSC (single-producer, single-consumer) ring buffer using `AtomicUsize` for head/tail indices:
- **Capacity:** 4096 f32 samples (~85 ms at 48 kHz), enough to absorb frame timing jitter (one 60 Hz frame is about 800 samples)
- **Producer:** emulation thread writes samples after each frame via `push()` (called from `CpalAudioSink::push_samples()`)
- **Consumer:** cpal audio callback reads samples via `pop()`
- **Underrun (buffer empty):** consumer outputs silence (0.0)
- **Overrun (buffer full):** producer **drops new samples** (standard SPSC behavior: only the consumer moves the tail pointer)
```rust
pub struct RingBuffer {
    buffer: Box<[f32]>,
    capacity: usize,
    head: AtomicUsize, // write position (producer only)
    tail: AtomicUsize, // read position (consumer only)
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self;
    pub fn push(&self, samples: &[f32]) -> usize; // returns samples actually written
    pub fn pop(&self, out: &mut [f32]) -> usize;  // returns samples actually read
    pub fn clear(&self); // reset both pointers (call when no concurrent access)
}
```
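Thread safety: `RingBuffer` is `Send + Sync` and is shared via `Arc<RingBuffer>`. Because `push()` and `pop()` take `&self`, the sample storage needs interior mutability (for example `UnsafeCell<f32>` slots with an `unsafe impl Sync`, or atomic slots); a plain `Box<[f32]>` cannot be written through a shared reference.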
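
One way to realize this interface without `unsafe` is sketched below. It departs from the struct above in a few places, all of which are assumptions of this example: samples are stored as `AtomicU32` bit patterns instead of `Box<[f32]>`, the capacity must be a power of two (4096 qualifies), `head` and `tail` are free-running counters masked on access, and `pop()` zero-fills the unread remainder of `out` to implement the silence-on-underrun rule.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

pub struct RingBuffer {
    slots: Box<[AtomicU32]>, // f32 bit patterns, writable through &self
    capacity: usize,         // power of two, so indices can be masked
    head: AtomicUsize,       // total samples written (producer only)
    tail: AtomicUsize,       // total samples read (consumer only)
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        let slots: Vec<AtomicU32> = (0..capacity).map(|_| AtomicU32::new(0)).collect();
        Self {
            slots: slots.into_boxed_slice(),
            capacity,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: writes as many samples as fit, drops the rest.
    pub fn push(&self, samples: &[f32]) -> usize {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = self.capacity - head.wrapping_sub(tail);
        let n = samples.len().min(free);
        for (i, sample) in samples[..n].iter().enumerate() {
            let idx = head.wrapping_add(i) & (self.capacity - 1);
            self.slots[idx].store(sample.to_bits(), Ordering::Relaxed);
        }
        // Release publishes the slot writes to the consumer.
        self.head.store(head.wrapping_add(n), Ordering::Release);
        n
    }

    /// Consumer side: reads what is available and fills the rest with silence.
    pub fn pop(&self, out: &mut [f32]) -> usize {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let n = out.len().min(head.wrapping_sub(tail));
        for (i, slot) in out[..n].iter_mut().enumerate() {
            let idx = tail.wrapping_add(i) & (self.capacity - 1);
            *slot = f32::from_bits(self.slots[idx].load(Ordering::Relaxed));
        }
        for slot in out[n..].iter_mut() {
            *slot = 0.0; // underrun: output silence
        }
        // Release tells the producer these slots may be overwritten.
        self.tail.store(tail.wrapping_add(n), Ordering::Release);
        n
    }

    /// Resets both positions. Call only when no other thread is using the buffer.
    pub fn clear(&self) {
        self.head.store(0, Ordering::Relaxed);
        self.tail.store(0, Ordering::Relaxed);
    }
}
```

Because every field is itself `Sync`, this version is `Send + Sync` automatically and can be placed in an `Arc` without an `unsafe impl`. The cost is one atomic store or load per sample, which is small next to the work of emulating a frame.
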
## 3. Desktop cpal Audio Backend
### Dependencies
Add to `crates/nesemu-desktop/Cargo.toml`:
```toml
cpal = "0.15"
```
### CpalAudioSink
```rust
pub struct CpalAudioSink {
_stream: cpal::Stream, // keeps the audio stream alive
ring: Arc<RingBuffer>,
volume: Arc<AtomicU32>, // f32 bits stored atomically
}
```
- Implements `nesemu::AudioOutput`; `push_samples()` writes to the ring buffer
- Created when a ROM is loaded; the ring buffer is cleared on ROM change to prevent stale samples
- cpal callback: reads from ring buffer, multiplies each sample by volume, writes to output buffer
- On pause: emulation stops producing samples → callback outputs silence (underrun behavior)
- On ROM change: old stream is dropped, ring buffer cleared, new stream created
### Error Handling
If no audio device is available, or the requested format is unsupported, or the stream fails to build:
- Log the error to stderr
- Fall back to `NullAudio` behavior (discard samples silently)
- The emulator continues to work without sound
The cpal error callback also logs errors to stderr without crashing.
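
The fallback behavior can be sketched independently of cpal. In the example below, the `AudioOutput` trait signature is an assumption (the document names the trait and `push_samples()` but not its exact shape), and `sink_or_null` is a hypothetical helper; the closure passed to it stands in for whatever constructs the real `CpalAudioSink`.

```rust
use std::fmt::Display;

/// Assumed shape of the audio output trait; the real signature may differ.
pub trait AudioOutput {
    fn push_samples(&mut self, samples: &[f32]);
}

/// Sink that discards all samples.
pub struct NullAudio;

impl AudioOutput for NullAudio {
    fn push_samples(&mut self, _samples: &[f32]) {}
}

/// Runs `build`; on failure logs the error to stderr and falls back to `NullAudio`
/// so the emulator keeps running without sound.
pub fn sink_or_null<S, E, F>(build: F) -> Box<dyn AudioOutput>
where
    S: AudioOutput + 'static,
    E: Display,
    F: FnOnce() -> Result<S, E>,
{
    match build() {
        Ok(sink) => Box::new(sink),
        Err(err) => {
            eprintln!("audio output unavailable, continuing without sound: {err}");
            Box::new(NullAudio)
        }
    }
}
```

Returning a boxed trait object keeps the rest of the desktop client unaware of whether real audio is active.
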
### Stream Configuration
- Sample rate: 48,000 Hz
- Channels: 1 (mono — NES is mono)
- Sample format: f32
- Buffer size: let cpal choose (typically 256–1024 frames)
### Volume
- `Arc<AtomicU32>` shared between UI and cpal callback
- Stored as `f32::to_bits()` / `f32::from_bits()`
- Default: 0.75 (75%)
- Applied in cpal callback: `sample * volume`
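
The bit-cast storage scheme above uses only the standard library. The helper names below are hypothetical, and clamping in `set_volume` and the `Relaxed` ordering are choices made for this sketch (a lone gain value does not need to synchronize other memory).

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Creates the shared volume cell, storing the f32 as its bit pattern.
pub fn new_volume(initial: f32) -> Arc<AtomicU32> {
    Arc::new(AtomicU32::new(initial.to_bits()))
}

/// UI side: stores a new volume, clamped to [0.0, 1.0].
pub fn set_volume(volume: &AtomicU32, value: f32) {
    volume.store(value.clamp(0.0, 1.0).to_bits(), Ordering::Relaxed);
}

/// Reads the current volume back as an f32.
pub fn get_volume(volume: &AtomicU32) -> f32 {
    f32::from_bits(volume.load(Ordering::Relaxed))
}

/// Audio callback side: scales every sample in the output buffer.
pub fn apply_volume(volume: &AtomicU32, out: &mut [f32]) {
    let gain = get_volume(volume); // load once per callback, not per sample
    for sample in out.iter_mut() {
        *sample *= gain;
    }
}
```

Loading the gain once per callback keeps the volume constant within a buffer and avoids a repeated atomic load in the inner loop.
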
## 4. UI — Volume Slider
### Widget
`gtk::Scale` (horizontal) added to the header bar:
- Range: 0.0 to 1.0 (displayed as 0–100%)
- Default: 0.75
- `connect_value_changed` → atomically update volume
### Placement
In the header bar, after the existing control buttons (open, pause, reset), with a small speaker icon label.
## 5. Threading Model
- **GTK main thread:** runs emulation via `glib::timeout_add_local` (~16ms tick), UI events, volume slider updates
- **cpal OS thread:** audio callback reads from ring buffer — this is the only cross-thread boundary
- The ring buffer (`Arc<RingBuffer>`) and volume (`Arc<AtomicU32>`) are the only shared state between threads
## 6. Data Flow
```
CPU instruction step (GTK main thread)
  → APU.clock_cpu_cycle() [updates internal channel state]
  → AudioMixer.push_cycles(cycles, apu.channel_outputs())
      → mix 5 channels → f32 sample
      → append to frame audio buffer (Vec<f32>)

Per frame (GTK main thread):
  → FrameExecutor collects audio_buffer
  → CpalAudioSink.push_samples(audio_buffer)
      → write to Arc<RingBuffer>

cpal callback (separate OS thread):
  → read from Arc<RingBuffer>
  → multiply by volume (Arc<AtomicU32>)
  → write to hardware audio buffer
```
## 7. Files Changed
| File | Change |
|------|--------|
| `src/native_core/apu/types.rs` | Add `ChannelOutputs` struct, new timer/sequencer fields to `Apu` and `ApuStateTail` |
| `src/native_core/apu/api.rs` | Add `channel_outputs()` method, update `save_state_tail`/`load_state_tail` |
| `src/native_core/apu/timing.rs` | Clock new timer/sequencer fields in `clock_cpu_cycle()` |
| `src/native_core/bus.rs` | Add `apu_channel_outputs()` |
| `src/runtime/audio.rs` | Rewrite mixer with 5-channel formula |
| `src/runtime/ring_buffer.rs` (new) | Lock-free SPSC ring buffer |
| `src/runtime/core.rs` | Pass `channel_outputs()` to mixer in `run_until_frame_complete_with_audio()` |
| `src/runtime/mod.rs` | Export `ring_buffer`, `ChannelOutputs` |
| `crates/nesemu-desktop/Cargo.toml` | Add `cpal` dependency |
| `crates/nesemu-desktop/src/main.rs` | Replace stub AudioSink with CpalAudioSink, add volume slider |
## 8. Testing
- Existing tests in `tests/public_api.rs` must continue to pass (they use NullAudio). **Note:** the regression hash test (`public_api_regression_hashes_for_reference_rom`) will produce a different audio hash due to the mixer change — the expected hash must be updated.
- Unit test for ring buffer: push/pop, underrun, overrun, clear
- Unit test for mixer: known channel outputs → expected sample values
- Manual test: load a ROM, verify audible sound through speakers