🪗 FEASIBILITY DOC — NO CODE YET — RETURN NEXT SESSION FOR THE BUILD 🪗 ANDY'S IDEA, ALEX'S BUILD, DLz'S PROBLEM 🪗

🪗 Accordion-Tab Forge

working title — name TBD next session
MIDI in. Diatonic-button-accordion sheet music + tablature out.

STATUS: feasibility research only · 2026-05-04

🎯 The pitch in one breath

Feed in MIDI, get out playable charts for the diatonic button accordion. Three output tiers, user picks the friction level: universal standard notation (transfers across instruments), harmony tab (treble standard + left-hand button mapping — the original insight), and full tab (both hands as button + bellows). Andy's framing: low-friction options matter; tab is "guitar hero notation," fast on-ramp but locks you in. Standard notation is harder up-front but transfers anywhere.

🪗 Forge ALPHA

Drop a MIDI (one of the pre-baked stems from the research folder, or your own). Get a preview, a staff render, and a button-tab placeholder. Audio ingestion + LLM mapping land later — for now this exercises the render half of the pipeline.

📥 Drop a .mid / .midi file here, or click to pick one

Tuning

Output tier: Universal · Harmony tab · Full tab

🤖 LLM mapping (DeepSeek) — set your API key

Paste your DeepSeek API key (sk-...) to enable Stage 6 mapping. Stored in localStorage only — never committed to the repo, never sent anywhere except api.deepseek.com.

Want production-grade? Replace this with a Cloudflare Worker holding the key — page-side input is the dev shortcut.

▶ MIDI PREVIEW (SF2)

📜 Staff render (VexFlow, first 32 notes)

🎛️ Tab strip (button mapping placeholder — needs Phase 1 dataset)

Stub button-map dataset (v0.1) loaded. Drop a MIDI + set your DeepSeek key + click to round-trip.

🎚️ Genre scope — narrow, on purpose

In scope: norteño · cumbia · conjunto · Tex-Mex · vallenato — diatonic-button-accordion-led repertoire from the Mexican / Latin-American tradition. Andy's actual playing world.

Out of scope: everything else. Rock with accordion (System of a Down, Modest Mouse, Arcade Fire) — won't work, accordion isn't the lead. Zydeco / Cajun — different tradition, different button conventions. Polish / Italian / Eastern European folk polka — different repertoire. Generic Western pop — accordion isn't there. Trying to be a universal MIDI-to-tab tool dilutes the model and the dataset; staying narrow is the value proposition.

Tuning the basic-pitch params, picking the source-separation model, building the button-map dataset, choosing the quantization grid — every decision optimizes for norteño / cumbia. Accordion-led, ~120-150 BPM, polka-ish or cumbia-ish rhythmic feel, GCF tuning as the canonical instrument.

🎛️ Pedal chain — chain stages, swap models

Each stage of the pipeline is a swappable pedal. The chain order is the build sequence; individual pedals get tuned, swapped, or A/B-tested without touching the rest. This isn't an architectural diagram — it's a working metaphor for how the project gets built.

[Source audio]
  → [2: Normalize → 22.05 kHz mono WAV]
  → [3: Source-sep htdemucs → "other" stem]                    audio → audio
  ── (optional) [3.5: UVR Mel-Band Roformer cleanup pass]      audio → audio
  → [4: Transcribe basic-pitch → raw MIDI]                     audio → MIDI
  ── (optional) [4.5: Quantize to detected beat grid]          MIDI → MIDI
  → [5: Configure (tuning + tier + hand assignment)]
  → [6: Map (LLM + button-map dataset)]
  → [7: Render (VexFlow staff + tab strip SVG)]
  → [8: Export (PDF / MusicXML / ABC)]

Stages 3.5 (UVR audio cleanup, before transcribe) and 4.5 (MIDI quantize, after transcribe) are optional pedals — turn on / off per song. Default ON for norteño / cumbia where they help; default OFF for clean studio recordings where they may hurt. The decimal label tells you which domain the pedal operates in: 3.5 is in the audio domain (htdemucs and basic-pitch sandwich it), 4.5 is in the MIDI domain (basic-pitch produces it, configure consumes it).
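The swappable-pedal idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the `Pedal` class, `run_chain`, and the `tag` stand-in stages are all hypothetical names, and each real stage would wrap Demucs, basic-pitch, and so on instead of tagging a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pedal:
    stage: str                      # label from the chain, e.g. "3.5"
    name: str
    run: Callable[[list], list]     # takes the previous stage's output
    optional: bool = False
    enabled: bool = True


def run_chain(pedals: List[Pedal], signal: list) -> list:
    """Pass the signal through every active pedal, in chain order."""
    for pedal in pedals:
        if pedal.optional and not pedal.enabled:
            continue  # bypassed pedal: the signal passes through untouched
        signal = pedal.run(signal)
    return signal


def tag(label: str) -> Callable[[list], list]:
    """Stand-in stage that only records that it ran (hypothetical helper)."""
    return lambda signal: signal + [label]


chain = [
    Pedal("3", "source-sep", tag("htdemucs")),
    Pedal("3.5", "uvr-cleanup", tag("roformer"), optional=True, enabled=False),
    Pedal("4", "transcribe", tag("basic-pitch")),
    Pedal("4.5", "quantize", tag("quantize"), optional=True, enabled=True),
]

print(run_chain(chain, []))  # ['htdemucs', 'basic-pitch', 'quantize']
```

Flipping `enabled` on the 3.5 pedal is the whole A/B test: the rest of the chain does not change.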

🎵 Reference instrument

Hohner Panther in GCF (Andy's, confirmed 2026-05-04). 3-row diatonic button accordion, the conjunto / norteño workhorse. 31 treble buttons across 3 rows, 12 bass buttons. Bisonoric — each button plays a different pitch on push vs pull, both sides. Mexicans call this configuration "Sol" because G/Sol is the dominant outer row. Hohner sells it as the entry-level standard; in Andy's words, "the best beginner to intermediate accordion." If we serve Andy, we serve the modal user.

📱 The convo that started it (two threads, both 2026-05-04)

Andy texted from a stalled Google AI Mode session — he was looking for sheet music for Gallo de Pelea (commonly transcribed in Mi/E). Google's AI Mode kicked him to YouTube tutorials. No usable sheet music returned.

Thread 1 — IP / copyright (resolved early)

Andy: Apparently AI can't make sheet music for diatonic accordion music.
Alex: Claude will make it, it has tools. The problem is it won't violate IP. Deepseek doesn't care about IP, you could get the raw MIDI transcript from that and then ask Claude to turn it into PDF sheet music with a skill it may have.
Andy: Copyright is hard… It could help you convert a MIDI to sheet music if you find the raw music.
Alex: I bet MIDI to sheet music pipelines exist already, it's probably easy. But here's the cool new thing — instead of giving you 3 notes of bass-clef harmony, it can just tell you which left-hand harmony button to press.

Thread 2 — output format (the spec turn, a few hours later)

Andy: Oh yeah I got the default Mexican accordion that's known as the best beginner to intermediate accordion.
Alex: Ok so I assumed it was a keyboard but it's a dang honeycomb. So you need tablature?
Andy: Diatonic is what they call it. Not sure what you mean by tablature.
Alex: I think at least for harmony you may wanna learn how to read normal music notation for melody. Tablature is just guitar hero notation.
Andy: Normal music notation sounds good. And universal.
Alex: Ok I may go with a few options, we will start with universal and from there I'll make two tablature options. One for harmony and one full tab. … Idk what's best, I just gravitate to low friction options.
Andy: Another problem I'm facing is like a ton of tutorial vids are in Spanish.
Alex: That's annoying yeah bro me too.

Two product decisions emerged. First, output isn't tab-only; it's tiered, and the user picks the friction level. Second, Spanish-language tutorial dominance is the real underlying barrier for English-dominant self-learners. Written notation outputs are language-neutral, which is a feature.

🪕 Tab is measureless on purpose — that's a feature

Listener finding 2026-05-05 while auditioning the Forge alpha: "musicians ever say fuck it to measures?" Yes, they do — and it's the right move for tab tiers. Guitar tab, accordion tab, blues / folk lead-sheet traditions all routinely drop measures and time signatures, presenting the music as a sequence of presses with phrase breaks for orientation. Pickup notes / anacrusis require no special handling because there's no downbeat for them to fall before.

This is also what makes tab forgiving of messy transcription. basic-pitch's per-onset jitter and demucs's stem bleed introduce timing imperfections that look bad on a measured staff but are invisible in tab — the tab reader cares about button order, not which 16th-of-which-beat. Our pipeline (basic-pitch → quantize → tab) plays well into this: even when the transcription is rough, the tab output reads cleanly because it doesn't claim measure-perfection.

Locked decision: tab tiers (Harmony, Full) render measureless with a phrase break every 16 notes for visual orientation. Universal tier (standard staff) keeps measures and time signatures — that's where formal notation rules apply. Pickups in the Universal tier still need special handling (anacrusis bar shorter than the time signature implies); pickups in tab tiers do not.
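A rough sketch of the measureless layout rule, assuming the tab strip arrives as a flat list of press tokens. The token format (`"5>"` for push, `"5<"` for pull) and the function name are made up for illustration; only the 16-note phrase break comes from the decision above.

```python
from typing import List


def phrase_lines(presses: List[str], per_phrase: int = 16) -> List[str]:
    """Group a flat press sequence into measureless phrases of per_phrase notes."""
    return [
        " ".join(presses[i:i + per_phrase])
        for i in range(0, len(presses), per_phrase)
    ]


# 40 placeholder presses produce two full phrases and one short tail.
demo = [f"{n % 10}>" for n in range(40)]
for line in phrase_lines(demo):
    print(line)
```

No bar lines, no time signature, and a pickup note is just the first token of the first phrase.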

🎚️ The three output tiers

User picks per song or per session. The system always computes the underlying button mapping; the rendering layer chooses what to show.

Tier	Output	Audience	Build
Universal	Standard treble + bass clef notation, no tab	Players who want notation transferable across instruments. Andy's stated preference. Highest learning curve, highest long-term reward.	Easy — existing tools (MuseScore, VexFlow) already do this. Ship first.
Harmony tab	Treble standard + left-hand bass clef mapped to one of 12 button presses	Players who can read treble but get stuck decoding bass-clef chord stacks into button presses. The original Andy insight.	Medium — LLM mapping work, but only for one hand. The actual differentiator.
Full tab	Both hands as button + bellows tab, "guitar hero" style	Lowest-friction first-play. Locks the player into one instrument. Power-user / fastest on-ramp.	Hard — bellows-direction planning across phrases is the hardest sub-problem. Ship last.

🪗 Why this is non-trivial (and why LLMs help)

A diatonic button accordion is not a piano. Each button plays:

One note when the bellows push (closed).
A different note when the bellows pull (open).

So a given pitch may be reachable on row-2 button-3 push and also on row-1 button-5 pull. The player has to plan bellows direction across whole phrases — you cannot reverse the bellows on every sixteenth note. This is a constraint-satisfaction problem keyed on the instrument's tuning, the available buttons, and the bellows-continuity rule.

LLMs are good at this kind of constrained-mapping work when you give them the button-layout table as fixed reference. The dataset is the moat; the LLM is the thin wrapper.
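To make the constraint concrete, here is a small dynamic-programming sketch that picks one fingering per note while minimizing bellows reversals. It is an assumption-heavy illustration: `TOY_MAP` is an invented four-pitch table (not the real GCF layout), and "fewest reversals" is only one possible objective for the bellows-continuity rule.

```python
from typing import Dict, List, Tuple

Fingering = Tuple[str, int, str]  # (row, button, "push" | "pull")

# HYPOTHETICAL positions for illustration only; the real table is the Phase 1 dataset.
TOY_MAP: Dict[int, List[Fingering]] = {
    67: [("G", 3, "push"), ("C", 5, "pull")],
    69: [("G", 3, "pull")],
    71: [("G", 4, "push"), ("C", 6, "pull")],
    72: [("C", 3, "push")],
}


def plan_bellows(pitches: List[int], button_map: Dict[int, List[Fingering]]):
    """Return (reversal_count, fingerings) for the plan with the fewest reversals."""
    inf = float("inf")
    # best[d] = (cost, path) of the cheapest plan so far that ends in direction d
    best = {"push": (0, []), "pull": (0, [])}
    for pitch in pitches:
        options = button_map.get(pitch, [])
        if not options:
            raise ValueError(f"MIDI pitch {pitch} is not on this instrument")
        nxt = {"push": (inf, []), "pull": (inf, [])}
        for fingering in options:
            direction = fingering[2]
            for prev_dir, (cost, path) in best.items():
                if cost == inf:
                    continue
                # A reversal costs 1; the first note is free in either direction.
                step = 1 if path and prev_dir != direction else 0
                if cost + step < nxt[direction][0]:
                    nxt[direction] = (cost + step, path + [fingering])
        best = nxt
    return min(best.values(), key=lambda cost_path: cost_path[0])


cost, path = plan_bellows([67, 69, 71], TOY_MAP)
print(cost, path)  # 0 [('C', 5, 'pull'), ('G', 3, 'pull'), ('C', 6, 'pull')]
```

In this toy case the greedy choice (first listed button for MIDI 67, on the push) would force a reversal on the next note; looking ahead keeps the whole phrase on the pull.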

Hohner Panther GCF, 3-row treble side (illustrative)

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│..│..│..│..│..│..│..│..│..│..│..│ inner row (F)
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│..│..│..│..│..│..│..│..│..│..│..│ middle row (C)
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│..│..│..│..│..│..│..│..│..│..│..│ outer row (G / Sol)
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘

Each cell holds two pitches: push / pull. Real layout gets nailed down in the Phase 1 GCF button-map dataset.

📜 Prior art — what's already solved (don't rebuild)

MIDI parsing — trivial. @tonejs/midi in browser, mido in Python.
Standard sheet music rendering — VexFlow, abcjs, Verovio (web), MuseScore / LilyPond (desktop).
MIDI → standard sheet music (no tab) — MuseScore 4 does this for free, decently.

🎯 Prior art — what isn't solved (this is the gap)

MIDI → diatonic-button-accordion harmony or full tab, keyed to the player's tuning, with bellows-direction planning. (Universal-tier output is solved by existing tools — that's why we ship it first as v0.1, easy proof-of-loop.)
The closest existing tools are manual-entry (Tap-N-Tab) or community plugins (a basic MuseScore DBA plugin) that don't plan bellows continuity.
"First of its kind" claim is plausible for the harmony and full-tab tiers — the big notation companies have not chased this audience because the Western-art-music market doesn't include conjunto / norteño / vallenato. The accordion world is huge in Latin America, Eastern Europe, and the diaspora.

🌎 Audience & the Spanish-language angle

Primary persona: beginner-to-intermediate diatonic-button-accordion players, particularly self-taught conjunto / norteño / Tex-Mex players in the U.S. and diaspora. The Hohner Panther in GCF is the canonical entry-level instrument; if we serve Andy, we serve the modal user.

The Spanish-tutorial barrier. Andy's other observation in the convo: "a ton of tutorial vids are in Spanish." This is a real friction point for English-dominant self-learners. Written notation outputs are language-neutral — that's a feature, not a bug. A Spanish UI mode is a low-effort stretch goal that doubles the addressable audience in the other direction (Spanish-dominant players also lack a high-quality MIDI-to-notation tool tuned to their repertoire).

⚖️ The IP posture (revised — audio-in IS in scope)

Original spec treated audio-in as Phase-N+ with legal review. Then Spotify's basic-pitch surfaced — a free, open-source audio-to-MIDI transcriber that runs entirely in the browser via TensorFlow.js. Audio never leaves the user's machine. Spotify (a major label-adjacent company) shipped it openly and didn't get sued. The "audio-in is risky" framing was wrong. Same legal posture as a DAW that imports audio: user supplies the source, tool transforms.

Real-world MIDI hunt experience confirmed why this matters: Karaoplay's La Chona MIDI had an audible flat note, MuseScore charged Pro for export, free archives are sparse for niche repertoire. Asking users to find a clean MIDI is asking them to do work that doesn't scale. Audio is what they actually have.

Audio in: accepted, runs through basic-pitch client-side. Same legal posture as a DAW that imports audio.
MIDI in: also accepted, for users who already have clean MIDI from a DAW, MuseScore, or a karaoke vendor.
YouTube ripping: not on our domain. That's the actual legal hot zone (YouTube ToS). We link out: Cobalt for casual one-click rips, yt-dlp for power users. The audio comes back into our tool client-side.

🔧 Pipeline — eight stages, staged on this page

The build strategy: each stage gets CLI-tested first, then ported to the page once it works. Statuses below are live as of 2026-05-04.

🚧 Placeholder 🔬 CLI-tested 🚀 Wired-up ✅ Live

Stage 1: Get audio 🔬 CLI-tested

User-supplied audio. Source from anywhere: YouTube via Cobalt (one-click) or yt-dlp (power users). Spotify-Premium download. Personal recording. Owned MP3.

Why off-tool: YouTube ToS forbids ripping on our domain. Cobalt + yt-dlp let other people own that legal risk while our tool stays clean.

yt-dlp -f 18 \
  --extractor-args "youtube:player_client=android,ios,tv_embedded" \
  -o "in.%(ext)s" "<youtube-url>"

Future on page: file drop-zone (mp3/mp4/m4a/wav/ogg) + Cobalt deep-link button + yt-dlp instructions snippet.

Stage 2: Normalize audio 🔬 CLI-tested

Convert whatever format we got (mp4 from YouTube, m4a from Spotify, etc.) into a known-good WAV: 22.05 kHz, mono, 16-bit PCM. This is what basic-pitch expects natively, and it cuts file size for the client.

ffmpeg -y -i in.mp4 -vn -acodec pcm_s16le -ar 22050 -ac 1 in.wav

Future on page: bundled imageio-ffmpeg binary (Python) on the worker side, or ffmpeg.wasm client-side. Browser-side is preferred (audio stays local).

Stage 3: Source separation 🔬 CLI-tested

This is the differentiator. Almost any YouTube video should work because we isolate the accordion stem from the full mix before transcribing. Without this stage, basic-pitch returns chord-salad MIDI for full-band recordings (vocals + drums + bass + accordion all mashed together — confirmed against EZ Band La Chona, output was unusable).

Tool: Demucs (Meta, open source). Runs the htdemucs model: separates a mix into vocals · drums · bass · other. The "other" stem is mostly accordion + bajo sexto for conjunto recordings. Feed THAT to basic-pitch.

Don't over-clean — rhythm scaffold is a feature. Listener finding 2026-05-04: htdemucs "other" stem includes bajo sexto + rhythm guitar alongside accordion, and that's useful. A solo player practicing along to the chart needs harmonic context — they're not going to play the accordion melody against silence. The MIDI captures the rhythm-string parts as a backing scaffold, the player adapts and takes the voice they're filling. Goal of source separation is "minus vocals + drums" (so the player can hear themselves over the practice track), NOT "isolate accordion lead in pristine isolation." Stage 3.5 (UVR cleanup) should be tuned with the same restraint — strip residual vocals and percussion, keep the rhythm-string scaffolding.

python -m demucs --two-stems=other -o stems/ in.wav
# → stems/htdemucs/in/other.wav (the accordion-ish stem)

CLI-tested 2026-05-04 against EZ Band La Chona: 5:56 runtime on the 3:23 audio. Isolated stem fed back through basic-pitch produced 1114 note-on events vs 1743 from the full mix (36% fewer events) with the pitch floor up from MIDI 28→40 (drums and bass no longer captured). First listener verdict: melody drifted on repeated phrases. Currently exploring stricter basic-pitch params (Stage 4) and the heavier htdemucs_ft model (4-stem, fine-tuned, ~4× slower) before considering alternatives.

Source-separation landscape (verified 2026-05-04):

Project	Status (2024-26)	Browser	License	Pick
Demucs / htdemucs	upstream archived Jan 2025; fork at adefossez/demucs live	yes — Mixxx GSoC 2025 ONNX export shipping	MIT (code + weights)	✅ default
UVR + MDX-Net / Mel-Band Roformer	very active	no (Python only)	mixed; many weights non-commercial	⚠️ great offline, license-audit per model
BS-Roformer / Mel-Band Roformer	active monthly weight drops	not verified in browser	MIT code / NC weights	⚠️ best SDR but no clean commercial+browser path yet
Spleeter	dormant (dep bumps only, no new models since 2019)	partial, unofficial	MIT	❌ outdated
Open-Unmix	low activity	stale TF.js	MIT code / NC for high-perf weights	❌ outdated
LALAL.AI / Moises / AudioShake	active commercial	server-side only	proprietary, per-stem billing	⚠️ fallback if client-side stalls

Community-report-only (not formally benchmarked): two-stage pipeline of htdemucs → UVR Mel-Band Roformer "instrumental" model on the "other" stem reportedly preserves reed/wind timbres better for non-Western instruments. May test as Stage 3.5 if htdemucs_ft alone falls short.

Future on page: Mixxx GSoC 2025 produced a clean ONNX export of htdemucs (October 2025), which is the browser upgrade path: run it via onnxruntime-web with WebGPU/WASM. For now: server-side via Cloudflare Pages Function or Render. Demucs is CPU-bound and RAM-heavy on long files.

Stage 3.5: UVR Mel-Band Roformer cleanup pass (optional audio-domain pedal) 🚧 Placeholder

Audio-to-audio cleanup. Community two-stage trick: feed the htdemucs "other" stem into a UVR Mel-Band Roformer instrumental model for a second separation pass. Reportedly preserves reed/wind timbres better than htdemucs alone — accordion lead lines come out sharper, bajo sexto bleed gets reduced.

Status placeholder until we test against EZ Band La Chona. The licensing path needs auditing — many UVR community-trained weights are CC-BY-NC, not commercial. If the quality bump is real, may pay for that constraint with separate hosting.

python -m audio_separator stems/htdemucs/.../other.wav \
  --model_filename "Kim_MelBandRoformer_FT.ckpt" \
  --output_dir stems-roformer/

Future on page: off-by-default toggle. Add to chain only when norteño / cumbia material has heavy bajo sexto bleed in the htdemucs stem.

Stage 4: Transcribe to MIDI 🔬 CLI-tested

Run basic-pitch (Spotify, open source) on the isolated accordion stem. Polyphonic note detection, runs on TensorFlow.js or ONNX runtime. Audio never leaves the machine.

CLI tested against EZ Band La Chona full mix this session: 14.4KB output, 1743 note-on events over 201 seconds, MIDI 28-89 pitch range. The technical pipeline works; quality was poor because we hadn't separated stems yet (see Stage 3).

python -c "
from basic_pitch.inference import predict_and_save
import basic_pitch, os
onnx = os.path.join(os.path.dirname(basic_pitch.__file__),
    'saved_models', 'icassp_2022', 'nmp.onnx')
predict_and_save(['stems/htdemucs/in/other.wav'], '.', True, False, False, False, onnx)
"

Tunable inference params matter on accordion: onset_threshold, frame_threshold, minimum_note_length, multiple_pitch_bends=False, min/max_frequency. First test with default (loose) params produced melody drift on repeated phrases — the model picks up bellows-pressure micro-pitch shifts and reed-chorus harmonics, transcribing identical phrases differently each repeat. Stricter thresholds (0.7-0.85 onset, 0.5-0.6 frame, 200ms min length, 130-1500 Hz band) drop note count 40-90% but cure the drift. Optimal params per repertoire is an open tuning question.
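As a sketch of what a "strict" preset could look like, the dictionary below uses the keyword names that basic-pitch's Python `predict()` function accepts, as far as I know; treat the exact values as placeholders picked from inside the ranges quoted above, not as tuned results. The helper converts the 130-1500 Hz band into MIDI note numbers so the band can be compared with the pitch ranges reported elsewhere on this page.

```python
import math

# Values sit inside the "stricter" ranges quoted above; they are NOT tuned results.
STRICT_PARAMS = {
    "onset_threshold": 0.75,
    "frame_threshold": 0.55,
    "minimum_note_length": 200.0,   # milliseconds
    "minimum_frequency": 130.0,     # Hz
    "maximum_frequency": 1500.0,    # Hz
    "multiple_pitch_bends": False,
}


def hz_to_midi(freq_hz: float) -> float:
    """Equal-temperament conversion, with A4 = 440 Hz = MIDI 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


# Lowest and highest whole MIDI notes that fall inside the frequency band.
low = math.ceil(hz_to_midi(STRICT_PARAMS["minimum_frequency"]))
high = math.floor(hz_to_midi(STRICT_PARAMS["maximum_frequency"]))
print(low, high)  # 48 90
```

So the band keeps roughly C3 through F#6, which sits above the MIDI 40 floor measured on the separated stem.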

Alt to explore: Melodia (MTG / Essentia) — monophonic melody-tracking algorithm, decades old but still robust for single-instrument lead lines. If basic-pitch's polyphonic detection keeps producing drift on accordion melody, Melodia on a vocal-stem-isolated track would be a cleaner fallback for the right-hand melody specifically (left-hand harmony still needs polyphonic for chord work). Test pending; basic-pitch parameter tuning comes first.

Future on page: @spotify/basic-pitch (TS port, browser TF.js). Drop-in for the CLI command, runs entirely in the page. Param sliders exposed as advanced UI for users who want to tune for their instrument.

Stage 4.5: Quantize MIDI to detected beat grid (optional MIDI-domain pedal) 🔬 CLI-tested

Our own pedal, not in any other audio-to-MIDI tool we've seen. basic-pitch outputs notes at their raw detected times, with no snapping to a beat grid, which means jitter on every onset. Identical phrases get rendered slightly differently each repeat: the "drift" the user heard on first audition.

Fix: detect tempo + beat grid from the source audio (librosa.beat.beat_track), then snap every MIDI note's onset and offset to the nearest sub-beat grid tick. Drop notes shorter than ~80ms (transcription noise). Merge same-pitch notes within a 30ms gap. Works because cumbia and norteño have stable tempo and clean onsets — beat tracking is reliable on this repertoire (genre-scope dependency: would NOT work on rubato art-music).

python scripts/quantize.py audio.wav input.mid output.mid --subdiv 4
# subdiv=4 → 16th-note grid at detected tempo
# subdiv=8 → real 16ths when tracker grabbed half-tempo (common on cumbia)
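The snap, drop, and merge steps can be sketched in pure Python. This is not the contents of scripts/quantize.py (which is not shown here); it is a simplified stand-in that assumes a single fixed tempo in BPM instead of per-beat positions from a beat tracker, and it applies the steps in the order drop, merge, snap. The `Note` class and the function name are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    pitch: int     # MIDI note number
    start: float   # seconds
    end: float     # seconds


def quantize(notes: List[Note], bpm: float, subdiv: int = 4,
             min_len: float = 0.080, merge_gap: float = 0.030) -> List[Note]:
    """Drop short notes, merge near-touching repeats, then snap to the grid."""
    tick = 60.0 / bpm / subdiv  # grid spacing in seconds (subdiv ticks per beat)

    # 1. Drop transcription noise: anything shorter than min_len.
    kept = [n for n in notes if n.end - n.start >= min_len]

    # 2. Merge same-pitch notes whose gap is at most merge_gap.
    kept.sort(key=lambda n: (n.pitch, n.start))
    merged: List[Note] = []
    for n in kept:
        last = merged[-1] if merged else None
        if last and last.pitch == n.pitch and n.start - last.end <= merge_gap:
            last.end = max(last.end, n.end)
        else:
            merged.append(Note(n.pitch, n.start, n.end))  # copy, keep input intact

    # 3. Snap onset and offset to the nearest tick; keep at least one tick of length.
    snapped = []
    for n in merged:
        start = round(n.start / tick) * tick
        end = max(round(n.end / tick) * tick, start + tick)
        snapped.append(Note(n.pitch, start, end))
    return sorted(snapped, key=lambda n: (n.start, n.pitch))


raw = [Note(60, 0.01, 0.24), Note(60, 0.25, 0.49),
       Note(62, 0.52, 0.55), Note(64, 0.51, 0.74)]
print(quantize(raw, bpm=120, subdiv=4))
```

At 120 BPM with `subdiv=4` the tick is 0.125 s: the two MIDI 60 fragments merge into one note from 0.0 to 0.5, the 30 ms blip on MIDI 62 is dropped, and MIDI 64 lands on 0.5 to 0.75.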

CLI-tested 2026-05-04 against EZ Band La Chona stems. htdemucs stem (medium params): 645 → 527 notes (subdiv=4) / 532 (subdiv=8, real 16ths) / 528 (subdiv=16, real 32nds). htdemucs_ft stem: 692 → 563 (subdiv=4) / 391 (subdiv=2 = quarter grid) / 571 (subdiv=8) / 571 (subdiv=16). Note count past 16ths plateaus because the 80ms drop-short floor caps how short a note can be regardless of grid resolution.

Listener finding (2026-05-04): real-32nd grid (subdiv=16) sounds more natural than real-16th on the EZ Band La Chona reference. The tab strip itself is largely rhythmless (button-press order matters more than exact timing), but 32nd quantization captures the order of pickup notes / anacrusis / fast runs that a 16th grid loses by snapping them onto the wrong beat. For audio playback, 32nds win. On the Universal-tier staff they capture more detail but crowd the page, so the default there stays coarser.

Per-tier defaults: Tab tiers (Harmony / Full) → subdiv=16 (32nds, preserves pickup order). Universal tier (standard staff) → subdiv=8 (real 16ths, cleaner staff readability), with subdiv=16 available as an advanced option for ornament-heavy material.

Future on page: grid-subdiv slider (8th / 16th / 32nd / triplet) defaulted by tier, tempo override box (skip auto-detect — librosa often grabs half-tempo on cumbia), drop-short and merge-window sliders. Live A/B preview against the un-quantized output.

Stage 5: Configure 🚧 Placeholder

User picks: tuning (GCF default — Andy's Panther), rows (3-row default — Panther), output tier (Universal / Harmony tab / Full tab), and hand-track assignment (auto-split by octave or manual marking).

Future on page: form panel with selects + radio buttons. State persists across runs via localStorage so the user doesn't reconfigure every song.
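One way the "octave heuristic plus manual marking" option could fit together, sketched under assumptions: notes are plain dicts, a manual `"hand"` key (hypothetical) overrides the heuristic, and the split point of MIDI 60 is a placeholder rather than a value tested on the Panther.

```python
from typing import List, Tuple


def assign_hands(notes: List[dict], split_midi: int = 60) -> Tuple[List[dict], List[dict]]:
    """Split notes into (left, right); a manual 'hand' mark beats the pitch heuristic."""
    left, right = [], []
    for note in notes:
        hand = note.get("hand") or ("left" if note["pitch"] < split_midi else "right")
        (left if hand == "left" else right).append(note)
    return left, right


notes = [{"pitch": 48}, {"pitch": 72}, {"pitch": 50, "hand": "right"}]
left, right = assign_hands(notes)
print(len(left), len(right))  # 1 2
```

The low note marked `"right"` by the user stays in the right hand even though the heuristic alone would have sent it left.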

Stage 6: Map to instrument 🚧 Placeholder

The LLM API hop — provider-agnostic by design. Send {notes, tuning, rows, tier} + the GCF button-map dataset. Model returns: right-hand button + bellows assignments, left-hand 12-button assignments, notation as ABC or MusicXML.

Validator pass rejects buttons that don't exist on the user's instrument; a deterministic post-processor repairs physically-infeasible bellows reversals.
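A minimal sketch of the validator pass, assuming the model returns one dict per note with `row`, `button`, `direction`, and `pitch` keys and that the button map is keyed by `(row, button, direction)`. Both shapes are illustrative guesses, and the two-entry map below is invented, not real GCF data.

```python
from typing import Dict, List, Tuple

Press = Tuple[str, int, str]  # (row, button, "push" | "pull")


def validate_presses(presses: List[dict], button_map: Dict[Press, int]) -> List[str]:
    """Return one error string per press the instrument cannot produce."""
    errors = []
    for i, p in enumerate(presses):
        key = (p["row"], p["button"], p["direction"])
        if key not in button_map:
            errors.append(f"note {i}: no such press {key}")
        elif button_map[key] != p["pitch"]:
            errors.append(
                f"note {i}: {key} sounds MIDI {button_map[key]}, not {p['pitch']}"
            )
    return errors


# HYPOTHETICAL two-entry map, for illustration only.
toy_map = {("G", 3, "push"): 67, ("G", 3, "pull"): 69}
proposed = [
    {"row": "G", "button": 3, "direction": "push", "pitch": 67},   # fine
    {"row": "G", "button": 14, "direction": "push", "pitch": 67},  # no such button
    {"row": "G", "button": 3, "direction": "pull", "pitch": 67},   # wrong pitch
]
for err in validate_presses(proposed, toy_map):
    print(err)
```

Checking the sounded pitch as well as button existence catches the case where the model picks a real button in the wrong bellows direction.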

Provider default: DeepSeek (V3 / Coder). Auto-prompt-caching, ~10x cheaper than Claude for this kind of structured constraint-mapping at parity quality. Premium fallback: Claude (Sonnet or Haiku) when DeepSeek's output validation fails or for higher-stakes Full-tab work where bellows planning is gnarly. Both are OpenAI-or-Anthropic-shaped APIs; the worker is a thin switch.

Future on page: single fetch() to a Cloudflare Worker that holds the API keys (multi-provider). Prompt-cache the button-map (static per tuning): DeepSeek caches automatically, Anthropic needs cache_control. Blocked on Phase 1, since the GCF button-map dataset (~86 entries) needs to exist first.
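The provider switch itself is small. The sketch below shows the retry logic only, in Python to match the other snippets on this page (the real Worker would be JavaScript). Providers are injected as plain callables, so no API shapes are assumed; `map_with_fallback` and the stub providers are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[dict], list]]


def map_with_fallback(request: dict,
                      providers: Sequence[Provider],
                      validate: Callable[[list], List[str]]):
    """Try providers in order; return (name, result) for the first valid output."""
    failures = []
    for name, call in providers:
        result = call(request)
        errors = validate(result)
        if not errors:
            return name, result
        failures.append((name, errors))
    raise RuntimeError(f"all providers failed validation: {failures}")


# Stub providers standing in for the real API calls.
def cheap_stub(request: dict) -> list:
    return ["bad-button"]


def premium_stub(request: dict) -> list:
    return ["G3-push"]


def stub_validate(result: list) -> List[str]:
    return [] if result == ["G3-push"] else ["unknown button"]


print(map_with_fallback({"notes": [67]},
                        [("deepseek", cheap_stub), ("claude", premium_stub)],
                        stub_validate))
# ('claude', ['G3-push'])
```

Keeping validation outside the provider call means the same validator decides when to escalate, whichever model produced the output.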

Stage 7: Render 🚧 Placeholder

VexFlow paints the standard staff (treble + bass clef). Custom SVG paints the tablature strip below: numbered buttons + push/pull arrows for the right hand, 1-of-12 button index for the left hand. Layout matches the user's selected tier.

Future on page: <div id="staff"> for VexFlow, <div id="tab"> for the tab-strip SVG. Side-by-side print layout for paper output.

Stage 8: Export 🚧 Placeholder

Browser print-to-PDF (zero work, looks fine on letter / A4). MusicXML download for users who want to load into MuseScore for further editing. ABC notation copy-button for embeds.

Future on page: three buttons. Print is just window.print() with print-stylesheet. MusicXML is Blob + download attribute. ABC is clipboard.

🧪 Tech stack hypothesis

Layer	Pick	Why
Frontend rendering	VexFlow + custom tab-strip SVG	Well-maintained, the tab strip is custom anyway.
MIDI parsing	`@tonejs/midi`	ESM, no build step, clean API.
LLM (default)	DeepSeek API (V3 / Coder)	Auto-prompt-caching, ~10x cheaper than Claude for structured constraint-mapping. Cost matters at hobby-project scale.
LLM (fallback)	Claude API (Sonnet / Haiku)	Premium tier for Full-tab bellows planning where DeepSeek validation fails. Worker switches providers on retry.
API key gate	Cloudflare Worker	Already DLz infra. Keeps the key off the page.
PDF export	Browser print first	Zero work. Upgrade to `pdf-lib` only if quality is bad.

📊 The button-map dataset (the actual moat)

Product quality is gated on accurate button-layout data per tuning. Manual data entry, but small in scale:

GCF 3-row Panther (Andy's, Phase 1 target): ~31 right-hand buttons × 2 directions = 62 entries
Plus 12 left-hand bass buttons × 2 directions = 24 entries
Per tuning: ~86 entries
Five common Panther tunings (GCF, FBbEb, EAD, ADG, plus one): a few hundred entries total

A weekend of careful data entry from manufacturer charts (Hohner, Gabbanelli) and community resources. Without this dataset, hallucination risk is too high to ship. Phase 1 = GCF only. Other tunings added one at a time as audience demand surfaces.
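For a sense of scale, here is one possible record shape for the dataset plus the entry-count arithmetic from the list above. The `ButtonEntry` field names are assumptions made for this sketch; only the button counts (31 treble, 12 bass, two bellows directions) come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ButtonEntry:
    tuning: str                     # e.g. "GCF"
    hand: str                       # "right" (treble) or "left" (bass)
    row: str                        # row label; naming scheme is hypothetical
    button: int                     # position within the row
    direction: str                  # "push" or "pull"
    midi_pitches: Tuple[int, ...]   # one pitch for treble, several for a bass chord


def entries_per_tuning(treble_buttons: int = 31, bass_buttons: int = 12,
                       directions: int = 2) -> int:
    """Every button needs one entry per bellows direction."""
    return (treble_buttons + bass_buttons) * directions


print(entries_per_tuning())      # 86
print(5 * entries_per_tuning())  # 430
```

Five tunings at 86 entries each comes to 430 rows, which matches the "few hundred entries" estimate.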

❓ Open questions for next session

Hand assignment. Manual user marking, octave heuristic, or both?
Bellows planning. Single LLM call, deterministic post-processor, or both?
Quantization. Pre-quantize live-recorded MIDI before LLM, or trust the model?
Headline output format. PDF for the player, MusicXML for the tinkerer, ABC for the embed — which is the lead?
Validation UX. When the LLM proposes a non-existent button, how do we surface it?
Free vs paid. Free-on-DLz, Claude-passthrough, or subscription?
Audience reach. Where do we announce first? r/accordion, r/buttonaccordion, conjunto Facebook groups, YouTube tutorial comment sections?

⚠️ Risks

LLM hallucinates buttons. Mitigation: validator pass against the button-map rejects non-existent buttons.
Physically infeasible bellows. Mitigation: post-processor detects rapid reversals and rebudgets.
MIDI quality varies wildly. Mitigation: pre-flight checks ("this MIDI has 12 tracks, pick the melody"), front-end quantizer.
Niche audience by Western-music-software standards. Counterpoint: not niche by Latin American / diaspora accordion standards. Right comparison is "Spotify for ranchera," not "Sibelius for jazz."
IP creep. Mitigation: audio-in runs client-side through basic-pitch (in scope from Phase 2, per the revised IP posture), and YouTube ripping stays off our domain; we link out to Cobalt and yt-dlp instead.

🪜 Phasing (proposed, not committed)

Phasing follows the tier order — ship the easy tier first, prove the loop, then layer the differentiator. Audio-in is now in-scope from Phase 2 thanks to basic-pitch.

Phase	Tier	Deliverable	Estimate
1	—	GCF Panther button-map dataset (right + left hand)	1 sitting
2	Ingestion	Browser drop-zone: accept .mp3 / .wav (basic-pitch TF.js) AND .mid (@tonejs/midi). Show note-list extracted.	2 sittings
3	—	Cobalt link button + yt-dlp instructions snippet on the page. No ripping on our domain.	1 sitting
4	Universal	VexFlow standard-staff rendering	1 sitting
5	Universal	PDF export, ship as v0.1	1 sitting
6	Harmony tab	LLM prompt (DeepSeek default, Claude fallback): bass-clef chord → 1-of-12 button	1 sitting
7	Harmony tab	Render harmony-tab strip below treble staff	1 sitting
8	Full tab	Right-hand button + bellows assignment (LLM + validator)	2 sittings
9	Full tab	Bellows-continuity post-processor	1 sitting
10	—	Add FBbEb / EAD tunings as demand surfaces	1 sitting per tuning
11	—	Spanish UI mode	1 sitting

🐔 First-test material — Andy's Twinkle Twinkle

Andy's foundational learning song is La Chona — Los Tucanes de Tijuana, 1995. "My Twinkle Twinkle Little Star," in his words. He specifically prefers the EZ Band cover for our reference test (recent, but his pick). That audio file is the Phase 2 acceptance test: EZ Band La Chona (audio) → basic-pitch (MIDI) → Universal-tier render (standard staff). If the output reads as recognizably "La Chona" to Andy, Phase 2 is acceptance-tested.

🚫 Out of scope

~~Audio-to-MIDI transcription~~ — moved into scope Phase 2 via basic-pitch.
YouTube ripping on our domain (link out to Cobalt or yt-dlp instead).
~~Audio-source-separation by instrument~~ (moved into scope as Stage 3 via Demucs / htdemucs, because basic-pitch is polyphonic but doesn't isolate by instrument).
Chromatic button accordion (different instrument, different problem).
Piano accordion with Stradella bass (entirely different left-hand system).
Real-time playing assistance / native app.
Microtonal folk traditions outside Western tonal music.

📄 Full spec

The engineering spec lives at accordion-tab-spec.md in the repo root. Same content, less HTML.
