Transformers: Attention in Disguise
We discuss the Transformer, a purely attention-based architecture that outperforms recurrent network-based models while being more efficient and more parallelizable.
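As a quick illustration of why a purely attention-based model parallelizes so well, here is a minimal sketch of scaled dot-product attention, the core operation inside the Transformer. The function name, shapes, and toy data are my own illustrative choices and not code from the post itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V have shape (seq_len, d_k). Every position attends to every
    other position via a single pair of matrix multiplies, so the whole
    sequence is processed at once rather than one token at a time as in
    a recurrent network.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq_len, seq_len) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted sum of value vectors

# Toy example: self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```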
I describe ELMo, a recently released set of neural word representations that is pushing the state of the art in pretraining methodology for natural language processing.
A discussion of the fundamental deep learning algorithms that people new to the field should learn, along with a recommended course of study.
Given all the recent buzz around artificial intelligence, I discuss three reasons why we are seeing such widespread interest in the field today.
A discussion of the most important skills needed to be an effective machine learning engineer or data scientist.
Following my attendance at the 18th Annual Meeting on Discourse and Dialogue, I summarize the most promising directions for future dialogue research, as gleaned from discussions with other researchers at the conference.