
Transformers from Scratch: Part 2 | by Paula Ceccon Ribeiro | Jul, 2023

Photo by Jeffery Ho on Unsplash

In my previous article, I covered the key ideas behind the Transformer model, in particular the attention mechanism. In this post, I walk through the architecture of the Transformer model, and we will code it from scratch.

The full source code shown in this article can be found here:

The Transformer model uses an encoder-decoder architecture, which is depicted in the following figure:

Figure 1: Model architecture, from “Attention Is All You Need”

As can be seen, it consists of two components: an encoder and a decoder. These components play essential roles in tasks such as sequence-to-sequence learning, language translation, text generation, and more.
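To make the overall data flow concrete before we build anything ourselves, here is a minimal orientation sketch in PyTorch (my assumption for the code language; it uses the built-in `nn.Transformer` rather than the from-scratch modules discussed in this series) showing an encoder stack feeding a decoder stack, as in Figure 1:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the base model in "Attention Is All You Need"
# uses d_model=512, 6 layers, and 8 attention heads.
d_model, nhead, num_layers = 512, 8, 6

# The built-in module mirrors Figure 1: an encoder stack whose output
# is consumed by a decoder stack through cross-attention.
model = nn.Transformer(
    d_model=d_model,
    nhead=nhead,
    num_encoder_layers=num_layers,
    num_decoder_layers=num_layers,
    batch_first=True,
)

src = torch.rand(2, 10, d_model)  # (batch, source length, d_model), already embedded
tgt = torch.rand(2, 7, d_model)   # (batch, target length, d_model)

out = model(src, tgt)             # decoder output: (2, 7, d_model)
print(out.shape)
```

This is only meant to show how the two components connect; the rest of the article builds the equivalent pieces by hand.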

The encoder takes an input sequence and converts it into a fixed-length representation, often called a context vector or latent representation. The main purpose of the encoder is to capture the relevant information from the input sequence and encode it into a more meaningful, compressed representation that the decoder can then use. In the context of natural language processing, the encoder processes a sequence of words or tokens and creates a context vector that encodes the essential information from the input text.
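As a preview, the following is a minimal sketch of a single encoder layer in PyTorch. It leans on `nn.MultiheadAttention` for brevity, and the default hyperparameters (`d_model=512`, 8 heads, `d_ff=2048`) are simply the base configuration from the paper, not necessarily the values used later in this series:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: self-attention followed by a feed-forward block."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, src_padding_mask=None):
        # Self-attention: every position can attend to every other position.
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=src_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))   # residual connection + layer norm
        # Position-wise feed-forward network, again with residual + layer norm.
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x
```

The full encoder simply stacks several of these layers on top of the token embeddings plus positional encodings.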

The decoder, on the other hand, takes the representation produced by the encoder and generates an output sequence step by step. It is responsible for decoding that representation into a sequence of tokens or words in a way that aligns with the desired output. During decoding, the decoder attends to the encoder output and to the previously generated tokens in order to make informed decisions about the next token to generate.
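A decoder layer adds two things on top of the encoder layer: a causal mask on its self-attention, so a position can only see earlier tokens, and a cross-attention block over the encoder output. The sketch below follows the same assumptions as the encoder sketch above (PyTorch, `nn.MultiheadAttention`, illustrative hyperparameters):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One Transformer decoder layer: masked self-attention, cross-attention, feed-forward."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads,
                                               dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads,
                                                dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory, tgt_mask=None):
        # Masked self-attention: each target position attends only to earlier positions.
        attn_out, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
        tgt = self.norm1(tgt + self.dropout(attn_out))
        # Cross-attention: the decoder attends to the encoder output ("memory").
        attn_out, _ = self.cross_attn(tgt, memory, memory)
        tgt = self.norm2(tgt + self.dropout(attn_out))
        # Position-wise feed-forward network.
        tgt = self.norm3(tgt + self.dropout(self.ff(tgt)))
        return tgt

def causal_mask(size):
    # Boolean mask where True marks positions that may NOT be attended to
    # (everything above the diagonal, i.e. future tokens).
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)
```

At training time the whole target sequence is fed in at once with `tgt_mask=causal_mask(tgt.size(1))`; at inference time tokens are generated one at a time, each step conditioned on the encoder output and the tokens produced so far.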
