Generative AI Introduction

To understand generative AI, it is important to first understand the technology powering it: Large Language Models (LLMs). In simple terms, an LLM is a model that has been shown a very large amount of written language and has learned to predict how a piece of text should continue.

Use Cases

LLMs are used for sentiment analysis, translation, summarization, and content creation across text, image, and audio.

How they work

The way LLMs are used is that they take instructions, called Prompts, which are passed through the Model, and the Model returns a Completion. For example, you can have the following steps:

(user) Prompt: “What is the capital of the USA?”

Model (Offers Completion): “Washington, DC”
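The prompt → model → completion flow above can be sketched in code. The model here is a stand-in (a simple lookup), not a real LLM, just to make the shape of the interaction concrete:

```python
# Toy sketch of the prompt -> model -> completion flow.
# `toy_model` is a stand-in for a real LLM: it looks up canned answers
# instead of actually generating text.
def toy_model(prompt: str) -> str:
    canned = {"What is the capital of the USA?": "Washington, DC"}
    return canned.get(prompt, "(a real model would generate a completion here)")

prompt = "What is the capital of the USA?"       # the Prompt
completion = toy_model(prompt)                   # passed through the Model
print(completion)                                # the Completion: Washington, DC
```

With a real LLM the lookup would be replaced by a call to the model, but the interface, text in and text out, stays the same.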

History of LLMs

The mathematical idea behind LLMs comes from a Google paper called “Attention Is All You Need”, which proposed the transformer architecture built on self-attention. The architecture is composed of two main parts: the encoder and the decoder.

Encoders

The encoder layer is basically in charge of taking an input (a sentence), tokenizing it (converting each word to a number), and converting those tokens to embeddings (large vectors of numeric values, e.g. 512 numbers or more, that capture the underlying meaning of the word/sentence).
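The tokenize-then-embed steps can be shown with a toy vocabulary. The embedding table below is random for illustration; in a real model its values are learned during training, and the vector size would be 512 or more rather than 8:

```python
# Minimal sketch of tokenization and embedding lookup.
# Toy vocabulary and a random embedding table (real models learn these values).
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}           # word -> token id
embed_dim = 8                                    # real models use 512+
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embed_dim))

sentence = "the cat sat"
token_ids = [vocab[w] for w in sentence.split()]  # tokenize: words -> numbers
embeddings = embedding_table[token_ids]           # look up one vector per token
print(token_ids)          # [0, 1, 2]
print(embeddings.shape)   # (3, 8): three tokens, eight numbers each
```

Each word ends up as a row of numbers, and it is these vectors, not the raw words, that the rest of the network operates on.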

Encoders can be used on their own for tasks such as sentiment analysis, predicting a missing word in a sentence, etc.

Decoders

Decoders are used to generate new information from embeddings; they are the generating arm of the transformer architecture. Decoders can be used for summarizing a text, answering questions, and other forms of generating new textual information. Tools like ChatGPT are primarily decoder-based.
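The generating behaviour of a decoder can be sketched as a loop: at each step the model scores every token in the vocabulary, the best one is appended, and generation stops at an end token. The scoring function below is a hard-coded stand-in, not a real decoder:

```python
# Toy sketch of autoregressive (token-by-token) decoding with greedy selection.
vocab = ["Washington", ",", "DC", "<end>"]

def toy_scores(generated):
    # Stand-in for decoder logits: always favor the next token
    # in a fixed target sequence.
    order = {0: "Washington", 1: ",", 2: "DC", 3: "<end>"}
    target = order[len(generated)]
    return [1.0 if tok == target else 0.0 for tok in vocab]

generated = []
while True:
    scores = toy_scores(generated)
    next_tok = vocab[scores.index(max(scores))]   # greedy: pick the top score
    if next_tok == "<end>":                       # stop at the end token
        break
    generated.append(next_tok)
print(" ".join(generated))
```

A real decoder computes those scores from the embeddings and everything generated so far, but the loop structure, score, pick, append, repeat, is the same.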

Encoders and Decoders

Encoders and decoders together are used in what are described as sequence-to-sequence tasks, such as translating between languages.

Prompt Engineering (How to make LLMs work better)

To make LLMs respond better, techniques have been proposed to guide the AI model. This is where prompt engineering comes in. One prompt engineering technique is showing the model examples of how a question should be answered, called In-Context Learning. There are 3 forms of in-context learning:

  • Zero-shot inference: no examples are given to the model.

  • One-shot inference: one example is given to the model along with the question.

  • Few-shot inference: a few examples are given to the model along with the question.
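The three forms differ only in how many worked examples are prepended to the question. A small sketch makes this concrete; the review texts below are made up for illustration:

```python
# Building zero-, one-, and few-shot prompts for a sentiment task.
# Same question each time; only the number of prepended examples changes.
question = "Review: 'Great battery life.' Sentiment:"
examples = [
    "Review: 'The screen cracked in a week.' Sentiment: negative",
    "Review: 'Fast shipping and works perfectly.' Sentiment: positive",
]

def build_prompt(question, examples, shots):
    # Prepend `shots` examples, then the question, one item per line.
    return "\n".join(examples[:shots] + [question])

zero_shot = build_prompt(question, examples, 0)   # no examples
one_shot = build_prompt(question, examples, 1)    # one example
few_shot = build_prompt(question, examples, 2)    # several examples
print(few_shot)
```

Whichever variant is sent, the whole string goes to the model as a single prompt; the examples simply show it the pattern to imitate before it sees the real question.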