Phideus
Interactive Neural Architecture Visualizations
Visualizations
Gate 4.3 — D0 Baseline (MERT + MIDI)
Full cross-modal architecture. D0 baseline: MERT audio encoder + MIDI transformer, trained with split learning rates. Serves as the foundation model (S=73.4%) for all Gate 4.x experiments.
Gate 4.3 — a4r Reverse Cross-Attention
Reverse cross-attention: descriptors (Q) organize encoder features (K/V). A4 spectral descriptor on audio, D4 interval descriptor on MIDI. S=82.0%.
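A minimal single-head NumPy sketch of the reverse direction; the projection weights are random placeholders and all dimensions are illustrative, not the actual Phideus configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reverse_cross_attention(desc, feats, d_k=32, seed=0):
    """Descriptor tokens act as queries; encoder features supply keys/values.

    desc:  (n_desc, d) descriptor tokens (e.g. spectral bands or intervals)
    feats: (n_tok, d)  encoder feature tokens
    Returns (n_desc, d_k): a descriptor-organized summary of the features.
    """
    rng = np.random.default_rng(seed)
    d = desc.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = desc @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_desc, n_tok)
    return attn @ V

out = reverse_cross_attention(np.ones((8, 64)), np.ones((188, 64)))
print(out.shape)  # (8, 32)
```

Note the output has one row per descriptor token: the descriptor pools the encoder sequence rather than the other way around.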
Gate 4.3 — d4a4 Concat Injection
Descriptors concatenated to encoder output before projection. A4 (8-band octave DSP) on audio, D4 (local intervals) on MIDI. S=83.8% (record).
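A minimal sketch of the injection step, assuming a single per-clip descriptor broadcast across time; the projection weight is a random placeholder and the shapes are illustrative:

```python
import numpy as np

def concat_inject(feats, desc, d_out=256, seed=0):
    """Tile a per-clip descriptor across time, concatenate it to the encoder
    output along the feature axis, then project back down.

    feats: (T, d_f) encoder output;  desc: (d_c,) e.g. 8-band octave energies.
    """
    rng = np.random.default_rng(seed)
    T, d_f = feats.shape
    tiled = np.tile(desc, (T, 1))                   # (T, d_c)
    fused = np.concatenate([feats, tiled], axis=1)  # (T, d_f + d_c)
    W = rng.standard_normal((fused.shape[1], d_out)) / np.sqrt(fused.shape[1])
    return fused @ W                                # (T, d_out)

out = concat_inject(np.zeros((100, 768)), np.zeros(8))
print(out.shape)  # (100, 256)
```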
Gate 4.3 — d4x-a4x Forward Cross-Attention
Forward cross-attention: encoder tokens (Q) query descriptor (K/V). The encoder "asks" the descriptor for guidance. Audio attention map [2400, 188]; full temporal resolution preserved.
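A sketch of the forward direction, mirroring the reverse case with the roles swapped; weights are random placeholders, and with 2400 encoder tokens querying a 188-token descriptor the attention map has the [2400, 188] shape quoted above:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def forward_cross_attention(feats, desc, d_k=32, seed=0):
    """Encoder tokens query the descriptor. The output keeps one row per
    encoder token, so the encoder's full temporal resolution is preserved."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = feats @ Wq, desc @ Wk, desc @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_tok, n_desc) = (2400, 188)
    return attn @ V                         # (n_tok, d_k)

out = forward_cross_attention(np.ones((2400, 64)), np.ones((188, 64)))
print(out.shape)  # (2400, 32)
```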
Gate 4.3 — d4r-a4r Mixed Descriptors + Reverse
Mixed descriptor strategy: A4 (spectral) on audio + D4 (intervals) on MIDI, both with reverse cross-attention. S=79.8%.
Gate 5A — T3 Third Tower
Three independent encoder towers converge into shared 256D space. T3 is a lightweight 2-layer transformer (d=256) with 3-way VICReg loss.
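A sketch of a 3-way VICReg objective under the assumption that the three-tower loss averages standard pairwise VICReg over the three tower pairs; the actual combination used in T3 may differ, and the coefficients below are the common defaults, not confirmed values:

```python
import numpy as np

def vicreg(za, zb, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """Pairwise VICReg: invariance (MSE) + variance hinge + covariance penalty."""
    n, d = za.shape
    inv = ((za - zb) ** 2).mean()
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.maximum(0.0, 1.0 - std).mean()   # push each dim's std toward 1
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))          # decorrelate dimensions
        return (off ** 2).sum() / d
    return (lam * inv + mu * (var_term(za) + var_term(zb))
            + nu * (cov_term(za) + cov_term(zb)))

def vicreg_3way(z1, z2, z3):
    """Assumed 3-way variant: average over the three tower pairs."""
    return (vicreg(z1, z2) + vicreg(z1, z3) + vicreg(z2, z3)) / 3.0

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 256))  # batch of 16 embeddings in the 256D space
print(vicreg_3way(z, z, z))
```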
Gate 4.3 — Bloque A Training Results
Hybrid adapter architecture. Adapters on frozen layers 0-1, direct unfreeze on layers 2-3. S=49.4%, hard negatives 88.4%.
Gate 3 — DANN Adversarial Analysis
Gradient reversal layer for domain-invariant embeddings. Audio and MIDI towers with domain classifier and adversarial training.
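The gradient reversal trick itself is small: identity on the forward pass, a sign flip (scaled by lambda) on the backward pass, so the feature towers learn to *confuse* the domain classifier. A framework-free sketch; in PyTorch this would be a custom `autograd.Function`:

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: forward is the identity; backward multiplies
    the incoming gradient by -lambda before passing it to the encoder."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features reach the domain classifier unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out  # encoder receives the reversed gradient

grl = GradReverse(lam=0.5)
x = np.array([1.0, 2.0])
assert np.allclose(grl.forward(x), x)                        # identity forward
assert np.allclose(grl.backward(np.ones(2)), [-0.5, -0.5])   # reversed backward
```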
HRM Architecture (Research)
L-Module (fast local) + H-Module (slow global) with Adaptive Computation Time. Q-learning decides when to halt the hierarchical processing loop.
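A toy sketch of the halting loop, assuming a Q-head that scores (halt, continue) from the H-state; all weights here are random placeholders, and the real HRM trains the Q-head with Q-learning rather than using it untrained:

```python
import numpy as np

def hrm_step(z_l, z_h, x, Wl, Wh):
    """One hierarchical step: the fast L-module updates from (L, H, input);
    the slow H-module then integrates the new L-state."""
    z_l = np.tanh(Wl @ np.concatenate([z_l, z_h, x]))
    z_h = np.tanh(Wh @ np.concatenate([z_h, z_l]))
    return z_l, z_h

def run_with_act(x, d=16, max_steps=8, seed=0):
    """ACT-style outer loop: stop when the 'halt' Q-value beats 'continue',
    or when the step budget runs out."""
    rng = np.random.default_rng(seed)
    Wl = rng.standard_normal((d, 3 * d)) * 0.1
    Wh = rng.standard_normal((d, 2 * d)) * 0.1
    Wq = rng.standard_normal((2, d)) * 0.1   # Q-values for (halt, continue)
    z_l, z_h = np.zeros(d), np.zeros(d)
    for step in range(1, max_steps + 1):
        z_l, z_h = hrm_step(z_l, z_h, x, Wl, Wh)
        q_halt, q_cont = Wq @ z_h
        if q_halt > q_cont:
            break
    return z_h, step

z_final, steps = run_with_act(np.ones(16))
print(steps)
```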
Constellation Tokens (UOEMD)
Dual symmetric encoders with sparse ratio tokens [B,T,48,5]. Factored latent space (shared+private) with modular encoder/decoder configurations.
JEPA-Lite (UOEMD)
Symmetric encoder paths without decoder. Bidirectional predictors with stop-gradient and InfoNCE contrastive alignment.
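A sketch of the contrastive alignment term, assuming predictor outputs are scored against stop-gradient targets with in-batch negatives; the temperature is an illustrative value. Plain NumPy carries no gradients, so the stop-gradient is only marked in a comment:

```python
import numpy as np

def info_nce(pred, target, tau=0.1):
    """InfoNCE over a batch: each predicted embedding should match its own
    target (diagonal) against the other batch items (in-batch negatives).
    In a framework, the target branch would be detach()-ed (stop-gradient)."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = (p @ t.T) / tau                         # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()                # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 64))
# Aligned pairs give a lower loss than mismatched ones:
print(info_nce(z, z), info_nce(z, rng.standard_normal((8, 64))))
```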
Roseta VAE (UOEMD)
Dual-domain variational autoencoder. Audio and vibration encoders with shared/private latent space factorization and InfoNCE contrastive loss.
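A sketch of one domain encoder's factored latent, assuming the latent vector is split into a shared block (aligned across audio and vibration, e.g. via the InfoNCE term) and a private block; the linear maps and dimensions are random placeholders for the real encoders:

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def encode_factored(x, d_shared=8, d_private=4, seed=0):
    """One domain encoder (audio or vibration) producing a factored latent:
    the first d_shared dims are the cross-domain block, the rest are private."""
    rng = np.random.default_rng(seed)
    d = d_shared + d_private
    W_mu = rng.standard_normal((d, x.shape[0])) * 0.1
    W_lv = rng.standard_normal((d, x.shape[0])) * 0.1
    mu, logvar = W_mu @ x, W_lv @ x
    z = reparameterize(mu, logvar, rng)
    return {"shared": z[:d_shared], "private": z[d_shared:]}

z_audio = encode_factored(np.ones(32))
print(len(z_audio["shared"]), len(z_audio["private"]))  # 8 4
```

Only the shared blocks of the two domains are pulled together by the contrastive loss; the private blocks are free to keep domain-specific detail.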
Built with the rendering engine from bbycroft/llm-viz by Brendan Bycroft.
Part of the Phideus research program.