<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <atom:link href="https://travis.engineer/bookmarks.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
        <title>Bookmarks RSS feed by Travis Van Nimwegen</title>
        <link>https://travis.engineer/bookmarks</link>
        <description>Stay up to date with my latest selection of handpicked bookmarks</description>
        <lastBuildDate>Thu, 23 Apr 2026 03:49:34 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>All rights reserved 2026, Travis Van Nimwegen</copyright>
        <item>
            <title><![CDATA[Why is the sky blue?]]></title>
            <link>https://explainers.blog/posts/why-is-the-sky-blue/</link>
            <guid>https://explainers.blog/posts/why-is-the-sky-blue/</guid>
            <pubDate>Tue, 10 Feb 2026 00:27:30 GMT</pubDate>
            <enclosure length="0" type="image/png" url="https://explainers.blog/assets/img/meta/sky.png"/>
        </item>
        <item>
            <title><![CDATA[The Illustrated Transformer]]></title>
            <link>https://jalammar.github.io/illustrated-transformer/</link>
            <guid>https://jalammar.github.io/illustrated-transformer/</guid>
            <pubDate>Tue, 23 Dec 2025 03:10:12 GMT</pubDate>
            <description><![CDATA[
Update: This post has now become a book! Check out LLM-book.com, which contains (in Chapter 3) an updated and expanded version of this post covering the latest Transformer models and how they've evolved in the seven years since the original Transformer (such as Multi-Query Attention and RoPE positional embeddings).

In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. In fact, Google Cloud recommends The Transformer as the reference model for its Cloud TPU offering. So let's try to break the model apart and look at how it functions.

The Transformer was proposed in the paper Attention Is All You Need. A TensorFlow implementation of it is available as part of the Tensor2Tensor package. Harvard's NLP group created a guide annotating the paper with a PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make them easier to understand for people without in-depth knowledge of the subject matter.

2025 Update: We’ve built a free short course that brings the contents of this post up to date with animations.

A High-Level Look
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.]]></description>
            <content:encoded><![CDATA[
Update: This post has now become a book! Check out LLM-book.com, which contains (in Chapter 3) an updated and expanded version of this post covering the latest Transformer models and how they've evolved in the seven years since the original Transformer (such as Multi-Query Attention and RoPE positional embeddings).

In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. In fact, Google Cloud recommends The Transformer as the reference model for its Cloud TPU offering. So let's try to break the model apart and look at how it functions.

The Transformer was proposed in the paper Attention Is All You Need. A TensorFlow implementation of it is available as part of the Tensor2Tensor package. Harvard's NLP group created a guide annotating the paper with a PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make them easier to understand for people without in-depth knowledge of the subject matter.

2025 Update: We’ve built a free short course that brings the contents of this post up to date with animations.

A High-Level Look
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.]]></content:encoded>
            <enclosure length="0" type="image/png" url="https://jalammar.github.io/images/t/The_transformer_encoders_decoders.png"/>
        </item>
        <item>
            <title><![CDATA[Size of Life]]></title>
            <link>https://neal.fun/size-of-life/</link>
            <guid>https://neal.fun/size-of-life/</guid>
            <pubDate>Fri, 12 Dec 2025 05:33:52 GMT</pubDate>
            <description><![CDATA[From an amoeba to a blue whale]]></description>
            <content:encoded><![CDATA[From an amoeba to a blue whale]]></content:encoded>
            <enclosure length="0" type="image/png" url="https://neal.fun/share-cards/size-of-life.png"/>
        </item>
        <item>
            <title><![CDATA[ambient.garden]]></title>
            <link>https://ambient.garden/</link>
            <guid>https://ambient.garden/</guid>
            <pubDate>Thu, 26 Jun 2025 20:00:30 GMT</pubDate>
            <description><![CDATA[An Algorithmic Audio Landscape]]></description>
            <content:encoded><![CDATA[An Algorithmic Audio Landscape]]></content:encoded>
            <enclosure length="0" type="image/png" url="https://ambient.garden/vF/img/sharecard-main.png"/>
        </item>
        <item>
            <title><![CDATA[Neal.fun]]></title>
            <link>https://neal.fun/</link>
            <guid>https://neal.fun/</guid>
            <pubDate>Sun, 29 Dec 2024 22:56:54 GMT</pubDate>
            <description><![CDATA[Games, visualizations, interactives and other weird stuff.]]></description>
            <content:encoded><![CDATA[Games, visualizations, interactives and other weird stuff.]]></content:encoded>
            <enclosure length="0" type="image/jpeg" url="https://rdl.ink/render/https%253A%252F%252Fneal.fun%252F"/>
        </item>
        <item>
            <title><![CDATA[Flag Stories]]></title>
            <link>https://flagstories.co/</link>
            <guid>https://flagstories.co/</guid>
            <pubDate>Sun, 29 Dec 2024 22:53:59 GMT</pubDate>
            <description><![CDATA[a project by ferdio]]></description>
            <content:encoded><![CDATA[a project by ferdio]]></content:encoded>
            <enclosure length="0" type="image/jpeg" url="https://rdl.ink/render/https%253A%252F%252Fflagstories.co%252F"/>
        </item>
        <item>
            <title><![CDATA[Space Elevator]]></title>
            <link>https://neal.fun/space-elevator/</link>
            <guid>https://neal.fun/space-elevator/</guid>
            <pubDate>Sun, 29 Dec 2024 22:48:40 GMT</pubDate>
            <description><![CDATA[Take a trip to space!]]></description>
            <content:encoded><![CDATA[Take a trip to space!]]></content:encoded>
            <enclosure length="0" type="image/png" url="https://neal.fun/share-cards/space-elevator.png"/>
        </item>
        <item>
            <title><![CDATA[It Looks Like You’re Trying To Take Over The World]]></title>
            <link>https://gwern.net/fiction/clippy</link>
            <guid>https://gwern.net/fiction/clippy</guid>
            <pubDate>Sat, 14 Dec 2024 00:43:31 GMT</pubDate>
            <description><![CDATA[Fictional short story about Clippy & AI hard takeoff scenarios grounded in contemporary ML scaling, self-supervised learning, reinforcement learning, and meta-learning research literature.]]></description>
            <content:encoded><![CDATA[Fictional short story about Clippy & AI hard takeoff scenarios grounded in contemporary ML scaling, self-supervised learning, reinforcement learning, and meta-learning research literature.]]></content:encoded>
            <enclosure length="0" type="image/jpeg" url="https://gwern.net/doc/fiction/science-fiction/2021-microsoft-windows11-emoji-msclippypaperclipemoji.jpg"/>
        </item>
    </channel>
</rss>