Matrix Orthogonalization Improves Memory in Recurrent Models
- 06-30-2026 This work was funded by Paradigm.
- Transformers exhibit remarkable associative recall (AR) abilities: attention provides each token direct access to those preceding it, a mechanism that has been hard for other architectures, like recurrent neural networks (RNNs), to match.
- But in some domains, we can't afford the overhead of attention, whose cost grows quadratically with sequence length.
Sources: Ayushtambde