Attention is just Kernel Smoothing?
I was reading this blog post, where the author explains that attention is just a re-invention of kernel smoothing.
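To make the claimed correspondence concrete, here is a minimal sketch (my own, not from the linked post) comparing standard dot-product softmax attention for a single query against a Nadaraya-Watson kernel smoother with an exponential kernel K(q, k) = exp(q · k). The function names and shapes are illustrative assumptions.

```python
# Sketch: softmax attention as Nadaraya-Watson kernel smoothing.
# Assumes single-query dot-product attention without the 1/sqrt(d) scaling;
# adding the scaling just changes the kernel's bandwidth.
import numpy as np

def kernel_smoother(query, keys, values, kernel):
    """Nadaraya-Watson estimate: sum_i K(q, k_i) v_i / sum_j K(q, k_j)."""
    weights = np.array([kernel(query, k) for k in keys])
    weights = weights / weights.sum()
    return weights @ values

def softmax_attention(query, keys, values):
    """Standard single-query dot-product attention: softmax(q K^T) V."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())  # numerically stabilized softmax
    weights = weights / weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)
K = rng.normal(size=(8, d))
V = rng.normal(size=(8, 3))

# With the exponential kernel, the smoother reproduces attention exactly
# (the softmax's max-subtraction cancels in the normalization).
exp_kernel = lambda q, k: np.exp(q @ k)
print(np.allclose(kernel_smoother(q, K, V, exp_kernel),
                  softmax_attention(q, K, V)))  # True
```

In other words, the attention weights are exactly the normalized kernel weights of a Nadaraya-Watson estimator; what differs in practice is that the queries, keys, and values are learned projections rather than raw data points.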