docs: add audio output design spec and implementation plan
1032
docs/superpowers/plans/2026-03-13-audio-output.md
Normal file
245
docs/superpowers/specs/2026-03-13-audio-output-design.md
Normal file
@@ -0,0 +1,245 @@
# Audio Output Design: Full 5-Channel Mixer + cpal Backend

## Overview

Add real audio output to the desktop NES emulator client. The work splits into two independent pieces:

1. **Full APU mixer**: replace the current DMC-only mixer with proper 5-channel mixing (Pulse 1, Pulse 2, Triangle, Noise, DMC) using the nesdev wiki linear approximation of the NES mixer.
2. **cpal audio backend**: replace the stub `AudioSink` in the desktop client with real audio output through `cpal`, fed by a lock-free ring buffer. Add a volume slider to the GTK4 header bar.

## 1. Full APU Mixer

### Current State

`AudioMixer::push_cycles()` in `src/runtime/audio.rs` reads only `apu_regs[0x11]` (DMC output level) and generates its signal from that one channel. All other channels are ignored.

### Design

#### 1.1 Channel Outputs Struct

Add to `src/native_core/apu/`:

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,   // 0–15
    pub pulse2: u8,   // 0–15
    pub triangle: u8, // 0–15
    pub noise: u8,    // 0–15
    pub dmc: u8,      // 0–127
}
```

#### 1.2 New APU Internal State

The current `Apu` struct lacks the timer counters and sequencer state needed to compute channel outputs. The following fields must be added:

**Pulse channels (×2):**

- `pulse_timer_counter: [u16; 2]`: countdown timer, clocked every other CPU cycle
- `pulse_duty_step: [u8; 2]`: position in the 8-step duty cycle sequence (0–7)

**Triangle channel:**

- `triangle_timer_counter: u16`: countdown timer, clocked every CPU cycle
- `triangle_step: u8`: position in the 32-step triangle sequence (0–31)

**Noise channel:**

- `noise_timer_counter: u16`: countdown timer, clocked every other CPU cycle
- `noise_lfsr: u16`: 15-bit linear feedback shift register, initialized to 1

These must be clocked in `Apu::clock_cpu_cycle()`:

- Pulse and noise timers decrement every **2** CPU cycles (APU rate, tracked via the existing `cpu_cycle_parity`)
- Triangle timer decrements every **1** CPU cycle
- When a timer reaches 0, it reloads from the period register and advances the corresponding sequencer (for the noise channel, that means shifting the LFSR)
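
The clocking rules above can be sketched for one pulse channel as follows. This is a minimal illustration, not the project's `Apu` code: `PulseTimer`, `clock_cpu_cycles`, and the `parity` flag (standing in for `cpu_cycle_parity`) are hypothetical names, and the direction in which the duty step advances is an assumption.

```rust
/// Hypothetical stand-in for the per-channel pulse timer state.
#[derive(Debug, Default)]
pub struct PulseTimer {
    pub period: u16,        // timer period taken from the channel registers
    pub timer_counter: u16, // countdown timer
    pub duty_step: u8,      // position in the 8-step duty sequence (0-7)
}

impl PulseTimer {
    /// Clock once per APU cycle (every second CPU cycle).
    pub fn clock(&mut self) {
        if self.timer_counter == 0 {
            // Timer expired: reload from the period and advance the sequencer.
            self.timer_counter = self.period;
            self.duty_step = (self.duty_step + 1) & 7;
        } else {
            self.timer_counter -= 1;
        }
    }
}

/// Drive the timer from CPU cycles; `parity` plays the role of `cpu_cycle_parity`.
pub fn clock_cpu_cycles(timer: &mut PulseTimer, parity: &mut bool, cpu_cycles: u32) {
    for _ in 0..cpu_cycles {
        if *parity {
            timer.clock();
        }
        *parity = !*parity;
    }
}
```

With a period of 3, the sequencer advances once every 4 APU cycles, that is, once every 8 CPU cycles. The triangle timer would follow the same pattern without the parity gate.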

#### 1.3 APU Method

Add `Apu::channel_outputs(&self) -> ChannelOutputs`, which computes the current output level of each channel:

- **Pulse 1/2:** Output is 0 if the length counter is 0, the sweep unit mutes the channel, or the duty cycle sequencer output is 0. Otherwise output is the envelope volume (0–15).
- **Triangle:** Output is the value from the 32-step triangle waveform lookup at `triangle_step`. Muted (output 0) if the length counter or linear counter is 0.
- **Noise:** Output is 0 if the length counter is 0 or LFSR bit 0 is 1. Otherwise output is the envelope volume (0–15).
- **DMC:** Output is `dmc_output_level` (0–127), already tracked.
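
The per-channel rules above could look roughly like the free functions below. This is a sketch under assumptions: the function names and parameter lists are hypothetical (the real method reads these values from `Apu` fields), and the duty table is the commonly documented NES pulse waveform set.

```rust
/// Pulse waveforms indexed by [duty setting][duty step].
const DUTY_TABLE: [[u8; 8]; 4] = [
    [0, 1, 0, 0, 0, 0, 0, 0], // 12.5%
    [0, 1, 1, 0, 0, 0, 0, 0], // 25%
    [0, 1, 1, 1, 1, 0, 0, 0], // 50%
    [1, 0, 0, 1, 1, 1, 1, 1], // 25% negated
];

/// Pulse: silent when the length counter is 0, sweep mutes, or the duty output is low.
pub fn pulse_output(length: u8, sweep_muted: bool, duty: u8, duty_step: u8, envelope: u8) -> u8 {
    let high = DUTY_TABLE[(duty & 3) as usize][(duty_step & 7) as usize] == 1;
    if length == 0 || sweep_muted || !high {
        0
    } else {
        envelope & 0x0F
    }
}

/// Triangle: 15 down to 0, then 0 up to 15; muted when either counter is 0.
pub fn triangle_output(length: u8, linear: u8, step: u8) -> u8 {
    if length == 0 || linear == 0 {
        return 0;
    }
    let step = step & 31;
    if step < 16 { 15 - step } else { step - 16 }
}

/// Noise: silent when the length counter is 0 or LFSR bit 0 is set.
pub fn noise_output(length: u8, lfsr: u16, envelope: u8) -> u8 {
    if length == 0 || lfsr & 1 == 1 {
        0
    } else {
        envelope & 0x0F
    }
}
```

Computing the triangle level arithmetically is equivalent to a 32-entry lookup table; either form satisfies the description above.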

#### 1.4 Save-State Compatibility

Adding new fields to `Apu` changes the save-state binary format. The `save_state_tail()` and `load_state_tail()` methods must be updated to serialize and deserialize the new fields. This is a **breaking change** to the save-state format: old save states will not be compatible. Since the project is pre-1.0, this is acceptable without a migration strategy.

#### 1.5 Bus Exposure

Add `NativeBus::apu_channel_outputs(&self) -> ChannelOutputs` to expose channel outputs alongside the existing `apu_registers()`.

#### 1.6 Mixer Update

Change the `AudioMixer::push_cycles()` signature:

```rust
// Before:
pub fn push_cycles(&mut self, cpu_cycles: u8, apu_regs: &[u8; 0x20], out: &mut Vec<f32>)

// After:
pub fn push_cycles(&mut self, cpu_cycles: u8, channels: ChannelOutputs, out: &mut Vec<f32>)
```

Mixing formula (nesdev wiki linear approximation):

```
pulse_out = 0.00752 * (pulse1 + pulse2)
tnd_out   = 0.00851 * triangle + 0.00494 * noise + 0.00335 * dmc
output    = pulse_out + tnd_out
```

With every channel at its maximum, `output` is about 0.85, so the output range is treated as approximately [0.0, 1.0]. Normalize to [-1.0, 1.0] with `sample = output * 2.0 - 1.0`.
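
As a minimal sketch of the formula and normalization above: `mix` is a hypothetical free function (the spec puts this logic inside `AudioMixer::push_cycles()`), and `ChannelOutputs` is repeated here only so the example compiles on its own.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ChannelOutputs {
    pub pulse1: u8,   // 0-15
    pub pulse2: u8,   // 0-15
    pub triangle: u8, // 0-15
    pub noise: u8,    // 0-15
    pub dmc: u8,      // 0-127
}

/// Linear-approximation mix of the five channels, normalized to [-1.0, 1.0].
pub fn mix(ch: ChannelOutputs) -> f32 {
    let pulse_out = 0.00752 * (f32::from(ch.pulse1) + f32::from(ch.pulse2));
    let tnd_out = 0.00851 * f32::from(ch.triangle)
        + 0.00494 * f32::from(ch.noise)
        + 0.00335 * f32::from(ch.dmc);
    (pulse_out + tnd_out) * 2.0 - 1.0
}
```

Under this normalization, all channels at zero map to -1.0 and all channels at maximum map to roughly 0.71, so the signal carries a DC offset until the high-pass filtering mentioned under the known simplifications is added.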

**Known simplifications:**

- This uses the linear approximation, not the more accurate nonlinear mixing (formula or lookup tables) that models real NES hardware. Nonlinear mixing can be added later as an enhancement.
- The current `repeat_n` resampling approach (nearest-neighbor) produces aliasing. A low-pass filter or band-limited interpolation can be added later.
- Real NES hardware applies two first-order high-pass filters (~90 Hz and ~440 Hz). Without these, enabling or disabling a channel will cause audible pops. Deferred to a future iteration.
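
For reference, the deferred high-pass stage could eventually take a shape like the sketch below. It is not part of this design: `HighPass` is a hypothetical helper implementing a generic first-order RC high-pass filter, and the coefficient formula is the textbook discrete-time form rather than anything taken from the project.

```rust
/// First-order high-pass filter (hypothetical helper, not part of the spec).
pub struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    pub fn new(cutoff_hz: f32, sample_rate_hz: f32) -> Self {
        // RC time constant for the cutoff, and the sample period.
        let rc = 1.0 / (2.0 * std::f32::consts::PI * cutoff_hz);
        let dt = 1.0 / sample_rate_hz;
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.alpha * (self.prev_out + input - self.prev_in);
        self.prev_in = input;
        self.prev_out = out;
        out
    }
}
```

A constant input decays toward zero, which is what removes the DC offset and softens steps when a channel switches on or off. Two instances (90 Hz and 440 Hz) chained in series would approximate the hardware behavior described above.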

#### 1.7 Runtime Integration

Update `NesRuntime::run_until_frame_complete_with_audio()` in `src/runtime/core.rs` to pass `ChannelOutputs` (from `self.bus.apu_channel_outputs()`) instead of the register slice to the mixer.

## 2. Lock-Free Ring Buffer

### Location

New file: `src/runtime/ring_buffer.rs`.

### Design

SPSC (single-producer, single-consumer) ring buffer using `AtomicUsize` for head/tail indices:

- **Capacity:** 4096 f32 samples (~85 ms at 48 kHz), enough to absorb frame timing jitter
- **Producer:** the emulation side writes samples after each frame via `push()` (called from the sink's `push_samples()`)
- **Consumer:** the cpal audio callback reads samples via `pop()`
- **Underrun (buffer empty):** consumer outputs silence (0.0)
- **Overrun (buffer full):** producer **drops new samples** (standard SPSC behavior: only the consumer moves the tail pointer)

```rust
use std::sync::atomic::AtomicUsize;

pub struct RingBuffer {
    buffer: Box<[f32]>,
    capacity: usize,
    head: AtomicUsize, // write position (producer only)
    tail: AtomicUsize, // read position (consumer only)
}

// API outline; method bodies omitted.
impl RingBuffer {
    pub fn new(capacity: usize) -> Self;
    pub fn push(&self, samples: &[f32]) -> usize; // returns samples actually written
    pub fn pop(&self, out: &mut [f32]) -> usize;  // returns samples actually read
    pub fn clear(&self);                          // reset both pointers (call when no concurrent access)
}
```

Thread safety: `RingBuffer` is `Send + Sync`. Shared via `Arc<RingBuffer>`. Because `push()` takes `&self`, the sample storage needs interior mutability (for example `UnsafeCell` with an `unsafe impl Sync`, or atomic slots).
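
One way to realize this API without `unsafe` is sketched below. It deviates from the struct above in two labeled ways, both assumptions of this example: samples are stored as `f32` bit patterns in `AtomicU32` slots instead of `Box<[f32]>`, and `head`/`tail` are free-running counters (masked on access) rather than wrapped positions, which requires a power-of-two capacity such as 4096.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

pub struct RingBuffer {
    buffer: Box<[AtomicU32]>, // f32 bit patterns (assumption: avoids unsafe)
    capacity: usize,
    head: AtomicUsize, // total samples written (producer only)
    tail: AtomicUsize, // total samples read (consumer only)
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        let buffer: Box<[AtomicU32]> = (0..capacity).map(|_| AtomicU32::new(0)).collect();
        Self { buffer, capacity, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side: writes as many samples as fit, drops the rest.
    pub fn push(&self, samples: &[f32]) -> usize {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let free = self.capacity - head.wrapping_sub(tail);
        let n = samples.len().min(free);
        for (i, s) in samples[..n].iter().enumerate() {
            let idx = head.wrapping_add(i) & (self.capacity - 1);
            self.buffer[idx].store(s.to_bits(), Ordering::Relaxed);
        }
        // Publish the new samples to the consumer.
        self.head.store(head.wrapping_add(n), Ordering::Release);
        n
    }

    /// Consumer side: reads up to `out.len()` samples, returns how many were read.
    pub fn pop(&self, out: &mut [f32]) -> usize {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let n = out.len().min(head.wrapping_sub(tail));
        for (i, slot) in out[..n].iter_mut().enumerate() {
            let idx = tail.wrapping_add(i) & (self.capacity - 1);
            *slot = f32::from_bits(self.buffer[idx].load(Ordering::Relaxed));
        }
        // Release the consumed slots back to the producer.
        self.tail.store(tail.wrapping_add(n), Ordering::Release);
        n
    }

    /// Reset both counters; call only when no other thread is using the buffer.
    pub fn clear(&self) {
        self.head.store(0, Ordering::Relaxed);
        self.tail.store(0, Ordering::Relaxed);
    }
}
```

Since every field is either an atomic or a plain `usize`, this version is `Send + Sync` automatically. An `UnsafeCell`-based buffer would match the `Box<[f32]>` layout more closely at the cost of a manual `unsafe impl Sync`.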

## 3. Desktop cpal Audio Backend

### Dependencies

Add to `crates/nesemu-desktop/Cargo.toml`:

```toml
cpal = "0.15"
```

### CpalAudioSink

```rust
use std::sync::{atomic::AtomicU32, Arc};

pub struct CpalAudioSink {
    _stream: cpal::Stream,  // keeps the audio stream alive
    ring: Arc<RingBuffer>,
    volume: Arc<AtomicU32>, // f32 bits stored atomically
}
```

- Implements `nesemu::AudioOutput`; `push_samples()` writes to the ring buffer
- Created when a ROM is loaded; the ring buffer is cleared on ROM change to prevent stale samples
- cpal callback: reads from the ring buffer, multiplies each sample by the volume, writes to the output buffer
- On pause: emulation stops producing samples, so the callback drains the buffer and then outputs silence (underrun behavior)
- On ROM change: the old stream is dropped, the ring buffer is cleared, and a new stream is created
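
The body of the cpal callback can be sketched independently of cpal itself. In this illustration `fill_output` is a hypothetical helper, and the `pop` closure stands in for `RingBuffer::pop`; the real callback would capture the `Arc<RingBuffer>` and the shared volume instead.

```rust
/// Fill one output buffer: scale what the ring buffer delivered, pad the rest with silence.
/// `pop` writes samples into the slice it is given and returns how many it wrote.
pub fn fill_output(out: &mut [f32], volume: f32, mut pop: impl FnMut(&mut [f32]) -> usize) {
    let read = pop(out).min(out.len());
    for sample in &mut out[..read] {
        *sample *= volume;
    }
    // Underrun: anything the ring buffer could not supply becomes silence.
    for sample in &mut out[read..] {
        *sample = 0.0;
    }
}
```

Keeping this logic in a plain function makes the underrun and volume behavior unit-testable without opening an audio device.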

### Error Handling

If no audio device is available, the requested format is unsupported, or the stream fails to build:

- Log the error to stderr
- Fall back to `NullAudio` behavior (discard samples silently)
- The emulator continues to work without sound

The cpal error callback also logs errors to stderr without crashing.

### Stream Configuration

- Sample rate: 48,000 Hz
- Channels: 1 (mono, since the NES is mono)
- Sample format: f32
- Buffer size: let cpal choose (typically 256–1024 frames)

### Volume

- `Arc<AtomicU32>` shared between the UI and the cpal callback
- Stored as the `f32` bit pattern: written with `f32::to_bits()`, read back with `f32::from_bits()`
- Default: 0.75 (75%)
- Applied in the cpal callback: `sample * volume`
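
A minimal sketch of the bit-pattern trick described above, using only the standard library. The helper names (`new_volume`, `set_volume`, `get_volume`), the clamp to [0.0, 1.0], and the choice of `Ordering::Relaxed` are assumptions of this example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Create the shared volume, e.g. `new_volume(0.75)` for the default.
pub fn new_volume(initial: f32) -> Arc<AtomicU32> {
    Arc::new(AtomicU32::new(initial.to_bits()))
}

/// UI side: store the slider value as raw f32 bits.
pub fn set_volume(volume: &AtomicU32, value: f32) {
    volume.store(value.clamp(0.0, 1.0).to_bits(), Ordering::Relaxed);
}

/// Audio callback side: load the bits and reinterpret them as f32.
pub fn get_volume(volume: &AtomicU32) -> f32 {
    f32::from_bits(volume.load(Ordering::Relaxed))
}
```

Relaxed ordering is enough here because the volume is a single independent value; no other memory needs to be synchronized with it.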

## 4. UI: Volume Slider

### Widget

`gtk::Scale` (horizontal) added to the header bar:

- Range: 0.0 to 1.0 (displayed as 0–100%)
- Default: 0.75
- The `connect_value_changed` handler atomically stores the new value into the shared volume

### Placement

In the header bar, after the existing control buttons (open, pause, reset), with a small speaker icon label.

## 5. Threading Model

- **GTK main thread:** runs emulation via `glib::timeout_add_local` (~16 ms tick), UI events, and volume slider updates
- **cpal OS thread:** the audio callback reads from the ring buffer; this is the only cross-thread boundary
- The ring buffer (`Arc<RingBuffer>`) and volume (`Arc<AtomicU32>`) are the only state shared between threads

## 6. Data Flow

```
CPU instruction step (GTK main thread)
  → APU.clock_cpu_cycle()  [updates internal channel state]
  → AudioMixer.push_cycles(cycles, apu.channel_outputs())
      → mix 5 channels → f32 sample
      → append to frame audio buffer (Vec<f32>)

Per frame (GTK main thread):
  → FrameExecutor collects audio_buffer
  → CpalAudioSink.push_samples(audio_buffer)
      → write to Arc<RingBuffer>

cpal callback (separate OS thread):
  → read from Arc<RingBuffer>
  → multiply by volume (Arc<AtomicU32>)
  → write to hardware audio buffer
```

## 7. Files Changed

| File | Change |
|------|--------|
| `src/native_core/apu/types.rs` | Add `ChannelOutputs` struct, new timer/sequencer fields to `Apu` and `ApuStateTail` |
| `src/native_core/apu/api.rs` | Add `channel_outputs()` method, update `save_state_tail`/`load_state_tail` |
| `src/native_core/apu/timing.rs` | Clock new timer/sequencer fields in `clock_cpu_cycle()` |
| `src/native_core/bus.rs` | Add `apu_channel_outputs()` |
| `src/runtime/audio.rs` | Rewrite mixer with 5-channel formula |
| `src/runtime/ring_buffer.rs` (new) | Lock-free SPSC ring buffer |
| `src/runtime/core.rs` | Pass `channel_outputs()` to mixer in `run_until_frame_complete_with_audio()` |
| `src/runtime/mod.rs` | Export `ring_buffer`, `ChannelOutputs` |
| `crates/nesemu-desktop/Cargo.toml` | Add `cpal` dependency |
| `crates/nesemu-desktop/src/main.rs` | Replace stub `AudioSink` with `CpalAudioSink`, add volume slider |

## 8. Testing

- Existing tests in `tests/public_api.rs` must continue to pass (they use `NullAudio`). **Note:** the regression hash test (`public_api_regression_hashes_for_reference_rom`) will produce a different audio hash because of the mixer change, so the expected hash must be updated.
- Unit test for the ring buffer: push/pop, underrun, overrun, clear
- Unit test for the mixer: known channel outputs → expected sample values
- Manual test: load a ROM and verify audible sound through the speakers