THIS IS A DRAFT v1 as of 9 June 2022. Posting it early. Beware of bugs/errors. Will remove this notice once they’re cleaned up. Will also post full code soon.
I present the transformer architecture from the ground up in Julia in this post. While my main purpose is to understand it in full detail without relying on any framework, I’m also hoping to try this out as a way to teach machine learning … i.e. assuming whole-program gradient calculation as a primitive. I won’t be explaining the why of the transformer architecture – for which I refer to Jay Alammar’s awesome tutorial – but mostly the how.
This is a post by my kid. His first post! The rest of the text below is his attempt at explaining odd and even numbers.
As a TLA+ newbie, I found the fairness (weak and strong) formulae in TLA+ a bit hard going initially, so I’m sharing the way I managed to wrap my head around them in case it is of help to others. What I’m hoping to get to is being able to “read” the formulae in a chunked manner so that they make logical sense to me.