Psst - neuromorphic folks. Did you know that you can solve the SHD dataset with 90% accuracy using only 22 kB of parameter memory by quantising weights and delays? Check out our preprint with @pengfei-sun.bsky.social and @danakarca.bsky.social, or read the TLDR below. 👇🤖🧠🧪 arxiv.org/abs/2510.27434
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.232Z
We've known for a while now that adding learnable delays is a great way to improve performance for this and other temporal datasets. Just check out how often the word 'delay' crops up on @fzenke.bsky.social's leaderboard for the top performers at SHD.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.233Z
Most people (including us) are using axonal delays so the number of parameters scales well as you increase network size - O(n) for axonal delays compared to O(n²) for weights. You get a lot more bang for your buck if you add in delays.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.234Z
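To make the scaling concrete, here's a back-of-the-envelope count for a single fully connected layer. The layer sizes are hypothetical (not the architecture from the paper): weights grow quadratically with the number of neurons, axonal delays only linearly.

```python
# Back-of-the-envelope parameter counts for one fully connected layer with
# n presynaptic and n postsynaptic neurons. Sizes are hypothetical, chosen
# only to illustrate O(n^2) weights vs O(n) axonal delays.
for n in (128, 256, 512):
    n_weights = n * n   # one weight per synapse
    n_delays = n        # one axonal delay per presynaptic neuron
    print(f"n={n}: {n_weights} weights, {n_delays} delays "
          f"({n_delays / n_weights:.2%} overhead for adding delays)")
```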
So how small (in terms of memory usage) could you make your network while keeping good performance? We trained with learnable quantisation of both weights and delays at different bit budgets and measured performance. Here's SHD: the point marked I has almost optimal performance but uses hardly any bits.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.235Z
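If you want a feel for what "learnable quantisation" means in practice, here's a minimal sketch of a uniform b-bit quantiser trained with a straight-through estimator, a common way to backprop through rounding. It's a generic illustration, not necessarily the exact scheme in the preprint, and the bit widths and value ranges below are assumptions.

```python
import torch

def quantise_ste(x, n_bits, lo, hi):
    """Uniform n_bits quantisation on [lo, hi] with a straight-through estimator.

    Forward pass snaps values to 2**n_bits evenly spaced levels; the backward
    pass treats the rounding as the identity so gradients still flow.
    """
    n_levels = 2 ** n_bits - 1
    scale = (hi - lo) / n_levels
    x_c = x.clamp(lo, hi)
    x_q = torch.round((x_c - lo) / scale) * scale + lo
    return x_c + (x_q - x_c).detach()

# Example: 4-bit weights in [-1, 1] and 5-bit non-negative delays up to 25 ms
# (ranges are illustrative assumptions, not the paper's settings).
w = torch.randn(256, 256, requires_grad=True)
d = 25.0 * torch.rand(256)
d.requires_grad_()
w_q = quantise_ste(w, 4, -1.0, 1.0)
d_q = quantise_ste(d, 5, 0.0, 25.0)
```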
Optimal performance is at point II on the graph above, where 4-bit weights and 5-bit delays give 94% accuracy - only 2% off the best known performance on this dataset while using only 4% of the memory footprint. Or with 1.58-bit weights and 3-bit delays we get 90% accuracy using only 22 kB of memory.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.236Z
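The accounting behind numbers like these is simply memory ≈ n_weights × bits_per_weight + n_delays × bits_per_delay. Here's a quick sketch with placeholder layer sizes (SHD has 700 input channels and 20 classes, but the hidden size is an arbitrary assumption, so this illustrates the calculation rather than reproducing the paper's figures).

```python
# Rough memory accounting for a feedforward SNN on SHD (700 input channels,
# 20 classes). The hidden size is a placeholder assumption, so the numbers
# below illustrate the calculation rather than reproduce the paper's results.
n_in, n_hidden, n_out = 700, 128, 20

n_weights = n_in * n_hidden + n_hidden * n_out  # feedforward weights only
n_delays = n_hidden                             # one axonal delay per hidden neuron

def memory_kB(bits_per_weight, bits_per_delay):
    total_bits = n_weights * bits_per_weight + n_delays * bits_per_delay
    return total_bits / 8 / 1000

print(f"4-bit weights, 5-bit delays   : {memory_kB(4, 5):6.1f} kB")
print(f"1.58-bit weights, 3-bit delays: {memory_kB(1.58, 3):6.1f} kB")
print(f"32-bit floats throughout      : {memory_kB(32, 32):6.1f} kB")
```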
Just as a note here, the 1.58 bits for weights is because each weight can take one of three values: positive, negative, or zero. So that's log₂(3) ≈ 1.58 bits per weight. Depending on the sparsity level, a sparse matrix format might be able to use even fewer bits on average.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.237Z
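The "even fewer bits on average" point is just entropy coding: if most weights are zero, the average information per weight drops below log₂(3). A quick illustration, with the sparsity level chosen arbitrarily:

```python
from math import log2

# Dense ternary code: every weight is one of three values.
print(f"log2(3) = {log2(3):.2f} bits per weight")

# With a sparse weight matrix, the entropy of the weight distribution is a
# lower bound on the average bits any lossless format needs. The 80% sparsity
# here is an arbitrary choice for illustration.
p_zero = 0.8
p_pos = p_neg = (1 - p_zero) / 2
entropy = -sum(p * log2(p) for p in (p_zero, p_pos, p_neg))
print(f"entropy at {p_zero:.0%} zeros = {entropy:.2f} bits per weight")
```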
What we find is that the most important delays are the long ones: selectively pruning the longest delays leads to a sharper fall in performance than pruning the shorter ones. This is actually a problem for devices where long delays are expensive. We can reduce it with regularisation, but more work is needed.
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.238Z
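One simple way to regularise against expensive long delays is to add a penalty on delay length to the training loss. This is a hedged sketch of that idea, not necessarily the regularisation used in the paper; the delay range and penalty strength are illustrative.

```python
import torch

def delay_length_penalty(delays, strength=1e-3):
    # L1 penalty on delay length: pushes learnable delays towards zero,
    # so the network only keeps the long delays it really needs.
    return strength * delays.abs().sum()

# Usage: add the penalty to the task loss before backprop. The task loss here
# is a placeholder for the real training objective.
delays = 25.0 * torch.rand(128)
delays.requires_grad_()
task_loss = torch.tensor(0.0)
loss = task_loss + delay_length_penalty(delays)
loss.backward()
```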
So here's our challenge to the community: how few bits do you need to solve the SHD dataset at 90%+ accuracy? Let us know if you can beat 22 kB. Just to motivate you, here's something you can do with just 16 kB of code: youtu.be/oITx9xMrAcM?... Time to bring the PC demoscene spirit to SNNs?
— Dan Goodman (@neural-reckoning.org) 2025-11-13T17:40:46.239Z
Exploiting heterogeneous delays for efficient computation in low-bit neural networks