Phideus
Interactive Neural Architecture Visualizations
Visualizations
Gate 4.3 — D0 Baseline (MERT + MIDI)
Full cross-modal architecture. D0 baseline: MERT audio encoder + MIDI transformer, trained with split learning rates. Serves as the foundation model (S=73.4%) for all Gate 4.x experiments.
Gate 4.3 — a4r Reverse Cross-Attention
Reverse cross-attention: descriptors (Q) organize encoder features (K/V). A4 spectral descriptor on audio, D4 interval descriptor on MIDI. S=82.0%.
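A minimal single-head NumPy sketch of the reverse direction; the projection weights are random placeholders and all dimensions are illustrative, not the actual Phideus configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reverse_cross_attention(desc, feats, d_k=32, seed=0):
    """Descriptor tokens act as queries; encoder features supply keys/values.

    desc:  (n_desc, d) descriptor tokens (e.g. spectral bands or intervals)
    feats: (n_tok, d)  encoder feature tokens
    Returns (n_desc, d_k): a descriptor-organized summary of the features.
    """
    rng = np.random.default_rng(seed)
    d = desc.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = desc @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_desc, n_tok)
    return attn @ V

out = reverse_cross_attention(np.ones((8, 64)), np.ones((188, 64)))
print(out.shape)  # (8, 32)
```

Note the output has one row per descriptor token: the descriptor pools the encoder sequence rather than the other way around.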
Gate 4.3 — d4a4 Concat Injection
Descriptors concatenated to encoder output before projection. A4 (8-band octave DSP) on audio, D4 (local intervals) on MIDI. S=83.8% (record).
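A minimal sketch of the injection step, assuming a single per-clip descriptor broadcast across time; the projection weight is a random placeholder and the shapes are illustrative:

```python
import numpy as np

def concat_inject(feats, desc, d_out=256, seed=0):
    """Tile a per-clip descriptor across time, concatenate it to the encoder
    output along the feature axis, then project back down.

    feats: (T, d_f) encoder output;  desc: (d_c,) e.g. 8-band octave energies.
    """
    rng = np.random.default_rng(seed)
    T, d_f = feats.shape
    tiled = np.tile(desc, (T, 1))                   # (T, d_c)
    fused = np.concatenate([feats, tiled], axis=1)  # (T, d_f + d_c)
    W = rng.standard_normal((fused.shape[1], d_out)) / np.sqrt(fused.shape[1])
    return fused @ W                                # (T, d_out)

out = concat_inject(np.zeros((100, 768)), np.zeros(8))
print(out.shape)  # (100, 256)
```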
Gate 4.3 — d4x-a4x Forward Cross-Attention
Forward cross-attention: encoder tokens (Q) query descriptor (K/V). The encoder "asks" the descriptor for guidance. Audio attention map [2400, 188]; full temporal resolution preserved.
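A sketch of the forward direction, mirroring the reverse case with the roles swapped; weights are random placeholders, and with 2400 encoder tokens querying a 188-token descriptor the attention map has the [2400, 188] shape quoted above:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def forward_cross_attention(feats, desc, d_k=32, seed=0):
    """Encoder tokens query the descriptor. The output keeps one row per
    encoder token, so the encoder's full temporal resolution is preserved."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = feats @ Wq, desc @ Wk, desc @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_tok, n_desc) = (2400, 188)
    return attn @ V                         # (n_tok, d_k)

out = forward_cross_attention(np.ones((2400, 64)), np.ones((188, 64)))
print(out.shape)  # (2400, 32)
```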
Gate 4.3 — d4r-a4r Mixed Descriptors + Reverse
Mixed descriptor strategy: A4 (spectral) on audio + D4 (intervals) on MIDI, both with reverse cross-attention. S=79.8%.
Gate 5A — T3 Third Tower
Three independent encoder towers converge into shared 256D space. T3 is a lightweight 2-layer transformer (d=256) with 3-way VICReg loss.
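A sketch of a 3-way VICReg objective under the assumption that the three-tower loss averages standard pairwise VICReg over the three tower pairs; the actual combination used in T3 may differ, and the coefficients below are the common defaults, not confirmed values:

```python
import numpy as np

def vicreg(za, zb, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """Pairwise VICReg: invariance (MSE) + variance hinge + covariance penalty."""
    n, d = za.shape
    inv = ((za - zb) ** 2).mean()
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.maximum(0.0, 1.0 - std).mean()   # push each dim's std toward 1
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))          # decorrelate dimensions
        return (off ** 2).sum() / d
    return (lam * inv + mu * (var_term(za) + var_term(zb))
            + nu * (cov_term(za) + cov_term(zb)))

def vicreg_3way(z1, z2, z3):
    """Assumed 3-way variant: average over the three tower pairs."""
    return (vicreg(z1, z2) + vicreg(z1, z3) + vicreg(z2, z3)) / 3.0

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 256))  # batch of 16 embeddings in the 256D space
print(vicreg_3way(z, z, z))
```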
Gate 4.3 — Bloque A Training Results
Hybrid adapter architecture. Adapters on frozen layers 0-1, direct unfreeze on layers 2-3. S=49.4%, hard negatives 88.4%.
Gate 3 — DANN Adversarial Analysis
Gradient reversal layer for domain-invariant embeddings. Audio and MIDI towers with domain classifier and adversarial training.
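The gradient reversal trick itself is small: identity on the forward pass, a sign flip (scaled by lambda) on the backward pass, so the feature towers learn to *confuse* the domain classifier. A framework-free sketch; in PyTorch this would be a custom `autograd.Function`:

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: forward is the identity; backward multiplies
    the incoming gradient by -lambda before passing it to the encoder."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features reach the domain classifier unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out  # encoder receives the reversed gradient

grl = GradReverse(lam=0.5)
x = np.array([1.0, 2.0])
assert np.allclose(grl.forward(x), x)                        # identity forward
assert np.allclose(grl.backward(np.ones(2)), [-0.5, -0.5])   # reversed backward
```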
HRM Architecture (Research)
L-Module (fast local) + H-Module (slow global) with Adaptive Computation Time. Q-learning decides when to halt the hierarchical processing loop.
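A toy sketch of the halting loop, assuming a Q-head that scores (halt, continue) from the H-state; all weights here are random placeholders, and the real HRM trains the Q-head with Q-learning rather than using it untrained:

```python
import numpy as np

def hrm_step(z_l, z_h, x, Wl, Wh):
    """One hierarchical step: the fast L-module updates from (L, H, input);
    the slow H-module then integrates the new L-state."""
    z_l = np.tanh(Wl @ np.concatenate([z_l, z_h, x]))
    z_h = np.tanh(Wh @ np.concatenate([z_h, z_l]))
    return z_l, z_h

def run_with_act(x, d=16, max_steps=8, seed=0):
    """ACT-style outer loop: stop when the 'halt' Q-value beats 'continue',
    or when the step budget runs out."""
    rng = np.random.default_rng(seed)
    Wl = rng.standard_normal((d, 3 * d)) * 0.1
    Wh = rng.standard_normal((d, 2 * d)) * 0.1
    Wq = rng.standard_normal((2, d)) * 0.1   # Q-values for (halt, continue)
    z_l, z_h = np.zeros(d), np.zeros(d)
    for step in range(1, max_steps + 1):
        z_l, z_h = hrm_step(z_l, z_h, x, Wl, Wh)
        q_halt, q_cont = Wq @ z_h
        if q_halt > q_cont:
            break
    return z_h, step

z_final, steps = run_with_act(np.ones(16))
print(steps)
```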
Constellation Tokens (UOEMD)
Dual symmetric encoders with sparse ratio tokens [B,T,48,5]. Factored latent space (shared+private) with modular encoder/decoder configurations.
JEPA-Lite (UOEMD)
Symmetric encoder paths without decoder. Bidirectional predictors with stop-gradient and InfoNCE contrastive alignment.
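A sketch of the contrastive alignment term, assuming predictor outputs are scored against stop-gradient targets with in-batch negatives; the temperature is an illustrative value. Plain NumPy carries no gradients, so the stop-gradient is only marked in a comment:

```python
import numpy as np

def info_nce(pred, target, tau=0.1):
    """InfoNCE over a batch: each predicted embedding should match its own
    target (diagonal) against the other batch items (in-batch negatives).
    In a framework, the target branch would be detach()-ed (stop-gradient)."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = (p @ t.T) / tau                         # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()                # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 64))
# Aligned pairs give a lower loss than mismatched ones:
print(info_nce(z, z), info_nce(z, rng.standard_normal((8, 64))))
```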
Roseta VAE (UOEMD)
Dual-domain variational autoencoder. Audio and vibration encoders with shared/private latent space factorization and InfoNCE contrastive loss.
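A sketch of one domain encoder's factored latent, assuming the latent vector is split into a shared block (aligned across audio and vibration, e.g. via the InfoNCE term) and a private block; the linear maps and dimensions are random placeholders for the real encoders:

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def encode_factored(x, d_shared=8, d_private=4, seed=0):
    """One domain encoder (audio or vibration) producing a factored latent:
    the first d_shared dims are the cross-domain block, the rest are private."""
    rng = np.random.default_rng(seed)
    d = d_shared + d_private
    W_mu = rng.standard_normal((d, x.shape[0])) * 0.1
    W_lv = rng.standard_normal((d, x.shape[0])) * 0.1
    mu, logvar = W_mu @ x, W_lv @ x
    z = reparameterize(mu, logvar, rng)
    return {"shared": z[:d_shared], "private": z[d_shared:]}

z_audio = encode_factored(np.ones(32))
print(len(z_audio["shared"]), len(z_audio["private"]))  # 8 4
```

Only the shared blocks of the two domains are pulled together by the contrastive loss; the private blocks are free to keep domain-specific detail.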
Built with the rendering engine from bbycroft/llm-viz by Brendan Bycroft.
Part of the Phideus research program.