This past year has been a whirlwind of advancements in artificial intelligence, making AI a topic that is impossible to ignore. If you’re anything like me, the barrage of new terminology and concepts can seem daunting.

Let’s navigate this together by breaking down what I believe are the ten most important terms for following AI conversations, starting with the basics.

None of these sections provide a complete picture—by any means—but I hope that they give you a good foundation to build upon. With that said, let’s jump right in.

Generative AI (GenAI)

Generative AI is a leap in artificial intelligence where the focus shifts from imitation to generation. Using the data patterns it has learned, Generative AI creates new content, whether that’s text, images, or even music. Unlike traditional AI, which might simply replicate existing examples, Generative AI produces original works.

Generative AI falls under the umbrella of the broader AI hierarchy, as shown in the image below. The term generative refers to its ability to apply complex algorithms to imagine, create, and generate something new.

ChatGPT, released in November 2022, made Generative AI widely known to the public by reaching one million users in five days, a record for any site or online tool to that point. ChatGPT showed the potential of Generative AI to interact and respond at a level that wasn’t seen before. Generative AI focuses not just on understanding but on crafting something anew.

Generative AI as a Subset of Artificial Intelligence

Large Language Model (LLM)

Large Language Models (LLMs) are a class of Generative AI systems that have been trained on huge sets of data. Unlike standard language models, LLMs contain billions or even trillions of parameters, which lets them generate remarkably human-like output that smaller models could not achieve. The ‘Large’ in Large Language Models refers to this massive scale of data and parameters.

Although no exact threshold has been universally set, around one billion parameters generally qualifies a language model as large, though this could change. GPT-2 was the first to break the one-billion-parameter mark, in February 2019.

Up to a certain point, the larger the model, the wider the range of tasks it can handle in a human-like manner. There are many other considerations too, such as the quality of the data used to train the model.

The following diagram gives an idea of the LLMs that are out there, including their sizes relative to each other over time.

Comparing the Relative Sizes of Various Large Language Models over time

Multimodality

Multimodality, a term often used now, refers to processing various types of information, including text, images, audio, and video.

It also means that nearly any combination of these can work together. For instance, describing a picture in words to generate an image is an example of Text-to-Image multimodality. Other examples include:

  • Image-to-text
  • Image-to-video
  • Audio-to-text
  • Or any number of combinations

Some have labeled 2023 the “Year of AI” (see Collins Dictionary word of the year). A large part of this is that Generative AI doesn’t just output text; there has also been amazing progress in generating images, video, audio, and more. For example, you can now take a 1-5 minute recording of your voice and create a digital version of yourself that produces new audio content sounding just like you (try it at ElevenLabs). Taking it even further, you can do the same with video at HeyGen. Many similar tools are emerging, as seen on the site There’s An AI For That (TAAFT), which lists over 10,000 AI tools.

What used to take an expert designer hours of work is becoming accessible to non-designers with amazing creativity and ease. Many tools are still hit or miss, but the field is progressing at breakneck speed.

Semantic Understanding/Natural Language Processing (NLP)

This continues to blow my mind. It’s astounding how current LLMs grasp the intent behind your questions, mimicking human understanding of language. Gone are the days of carefully phrasing questions for computers or machines. Because they can analyze words, meanings, and sentence structures, LLMs engage in what is called Natural Language Processing, making interactions feel surprisingly natural.

Try this experiment with ChatGPT: ask it to summarize a complex paragraph, document, or set of meeting notes, and see how it captures the essence with remarkable accuracy. It can determine the intent of the document in short or long form, depending on what you ask. You can carry on full conversations with many of these Generative AI tools. Sites like Character.ai even let people create characters with different personalities and converse with them for fun. Just last week, ChatGPT turned on the voice feature for non-paid accounts. Carrying on a human-like voice conversation with a machine has been achieved.

Until Generative AI, systems like phone auto-attendants would let you move through prompts with voice instructions. Those followed decision-tree structures with a limited dictionary of words. They were pretty good, but it was hard to trust that they would provide the right answer unless you used exactly the right words at the right time. Today’s Generative AI is in a whole new class, letting you converse as if you were talking to a friend, interacting as though it fully understands the semantics of your discussion, and carrying on a complete conversation.

Prompting / Prompt Engineering

This is a new area of expertise that refers to asking the right questions (prompts) of AI tools so the output is tailored to your unique needs. While LLMs support semantic understanding in an amazing way, you can take it even further by using different techniques. It’s kind of like programming the AI to perform specific tasks in a manner that you define. We’re even starting to see companies offering full-time Prompt Engineer positions.

There are many styles of prompt engineering (see this article on 5 prompt frameworks), but thanks to the semantic understanding of the new LLMs, you don’t necessarily need to follow any particular framework. That said, people are discovering that some methods are more effective than others, and this is changing rapidly as the LLMs are tuned to handle different situations.

Let’s take a look at one example of prompt engineering:

Using an LLM site like ChatGPT or Claude as the AI tool, try first asking (prompting) with the following:

Ask me 5 questions about myself, and then create a short humorous made up story about me. Ask the questions 1 at a time.

Amazingly, the AI will lead you through five questions, one at a time, and only after all five are answered will it create a funny story about you.
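If you prefer to experiment in code, here is a minimal, interactive sketch of running that same prompt through an API. It assumes the OpenAI Python client (pip install openai) with an OPENAI_API_KEY set in your environment; the model name is an assumption, and any current chat model would do.

from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": (
    "Ask me 5 questions about myself, and then create a short humorous "
    "made up story about me. Ask the questions 1 at a time.")}]

# Unlike the ChatGPT app, the raw API keeps no history for you,
# so we carry the conversation in the messages list ourselves.
while True:
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages
    ).choices[0].message.content
    print("\nAI:", reply)
    messages.append({"role": "assistant", "content": reply})
    answer = input("You (blank to stop): ")
    if not answer:
        break
    messages.append({"role": "user", "content": answer})

Notice that the prompt itself does the “programming”: the same loop, given a different prompt, performs a completely different task.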

When you’re working with LLMs, think big. Be creative. You’ll be surprised what you’re able to guide it to do.

Transformers

Transformers are the architecture behind this new world of Generative AI. They were introduced in a 2017 paper titled “Attention Is All You Need” by researchers at Google.

Transformers are a type of neural network whose key innovation is weighing the importance of different words in a sentence, regardless of their position. This helps the model understand context and the relationships between words, since the meaning of a word can depend heavily on the words around it. The model then generates text based on the probability of which word comes next.

Transformers are particularly powerful at predicting the next word in a sequence. They consider the entire input sequence and are autoregressive, meaning they predict one word at a time, using the previously generated words to guide each subsequent prediction. They don’t start with the end in mind, but thanks to the sheer size, diversity, and quality of their training data, combined with this autoregressive approach, their output can be remarkably human-like.

This methodology has also made huge advances in translation, question-answering, summarization and much more.

The image below gives a visual of how the Transformers methodology breaks sentences into tokens (the blocks), which, for simplicity’s sake, we can treat as roughly equivalent to words (the exact mapping depends on the LLM). Notice how the model recommends which word can come after the opening phrase “My pet goat has a”: it shows the five most likely next words along with how likely each one would be.
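To make this concrete, here is a minimal sketch that reproduces the idea in the figure using the freely available GPT-2 model. It assumes the Hugging Face transformers library and PyTorch are installed; the article doesn’t name a toolkit, so treat that choice as an assumption.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("My pet goat has a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # scores for every token at every position

# The scores at the last position rank every possible next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.1%}")

The five printed tokens and their probabilities are exactly the kind of ranking the figure illustrates.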

Tools utilizing LLMs can adjust their output settings, allowing for high precision or a more lenient approach that considers multiple options. One such setting is temperature, which controls the randomness of the LLM’s output. For a creative situation like writing a humorous story, more freedom gives a wider range of possibilities, leading to more fun stories. Conversely, for more factual situations like generating educational textbook materials, a stricter setting will provide a better result.
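As an illustration, here is a small sketch of temperature-based sampling. The candidate words and scores are made up for the “My pet goat has a” example, not taken from a real model.

import numpy as np

def sample_next_word(words, scores, temperature=1.0):
    scaled = np.array(scores) / temperature  # low temp sharpens, high temp flattens
    probs = np.exp(scaled - scaled.max())    # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(words, p=probs)

words = ["beard", "name", "habit", "tail", "favorite"]  # hypothetical candidates
scores = [2.1, 1.7, 1.2, 0.9, 0.4]                      # hypothetical model scores

print(sample_next_word(words, scores, temperature=0.2))  # almost always "beard"
print(sample_next_word(words, scores, temperature=1.5))  # far more variety

Dividing by a small temperature exaggerates the gap between scores, so the top word nearly always wins; a large temperature shrinks the gap, giving the long-shot words a real chance.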

While this is just one aspect of the Transformers model, this gives a hint at how associated words are used. For a great interactive way to see more about Transformers, check out this site.

Transformers predict the most likely next word in a sequence

Generative Pre-trained Transformer (GPT)

Now that we’ve learned what Generative and Transformers mean, the term GPT will make a lot more sense. The acronym stands for three words:

G – Generative. As described above, new content is generated, rather than following strict decision trees or programming logic.
P – Pre-trained. The models are trained in advance on a large set of data. They just wait to be asked questions.
T – Transformers. See the section above about the Transformers model.

Speaking of pre-training, training GPT-3 cost $4.6 million¹, while GPT-4’s training cost over $100 million².

It’s helpful to know that there is a difference between GPT and ChatGPT. GPT-3.5 (and GPT-4) is the LLM, while ChatGPT is an application that provides an extremely accessible user interface for interacting with the LLM in a natural way. ChatGPT is more than just an interface, since it takes care of things like maintaining your conversation history, supporting personalized settings, and much more. In essence, though, it’s an application that utilizes the GPT LLM.

OpenAI, the organization responsible for developing models like GPT-3.5 and GPT-4, coined the term ‘GPT’ for their Large Language Models and ‘ChatGPT’ for their general-purpose chat applications. While ‘GPT’ is a designation unique to OpenAI, the underlying transformer-based architecture has become the foundation for most contemporary LLMs.

Inference

We covered pre-training above under GPT; inference is the other side of the coin. While pre-training builds the model, inference refers to the processing that occurs when the model generates a response from its pre-trained patterns. Essentially, inference means getting an answer from the LLM.

A simplistic way to look at it is to consider building and using a dictionary (you know, those old-fashioned books with lots of words). A dictionary takes extensive time and effort to compile, index, and publish; that parallels the pre-training of an LLM. Once it’s published, people still have to put in effort to use it: pick up the dictionary, find the word on the right page, and read the entry. In the LLM world, the parallel to using the dictionary is called inference.

There are smaller LLMs that can run on a desktop computer, or even a phone. Even though a model is pre-trained, it still requires processing power at inference time to produce answers, and various companies are competing to provide hardware and cloud solutions that make this faster and less expensive. For well-known tools like ChatGPT, Claude, or Microsoft Copilot, the costs and complexities of inference are hidden from us.
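For a taste of inference on your own machine, here is a minimal sketch using a small open model. It assumes the Hugging Face transformers library; the model choice (distilgpt2) is just one small example that runs on a laptop CPU.

from transformers import pipeline

# Loading the model parallels publishing the dictionary; each call below
# is an act of inference, the "looking up" that produces an answer.
generator = pipeline("text-generation", model="distilgpt2")
result = generator("The capital of France is", max_new_tokens=10)
print(result[0]["generated_text"])

Even this tiny model makes the split tangible: a slow one-time download and load, then quick repeated calls.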

Fine-Tuning

As mentioned above, these LLMs are pre-trained on a set of information. However, pre-trained information quickly gets out of date, and the model only knows what it was trained on at the time of training.

Fine-tuning is the ability to overlay new information on an existing language model. Without the expense of retraining the entire model, you can add just the new or additional information you need. For example, it could be your business information, current statistics, a particular business vertical, or new trends that you want to target. Think of it as honing the skills of an AI to make it an expert in a particular field.

Many open-source and even some commercial LLMs can now be fine-tuned. While the capability is spreading, the process remains challenging without the proper knowledge and infrastructure, though it is becoming steadily easier with new tools.
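As one concrete example, commercial fine-tuning can be as simple as uploading examples and starting a job. Here is a hedged sketch using the OpenAI Python client; the file name is hypothetical, and the exact models available for fine-tuning change over time.

from openai import OpenAI

client = OpenAI()

# Each line of the JSONL file holds one example conversation in chat format.
training_file = client.files.create(
    file=open("my_business_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # an assumption; check which models are fine-tunable
)
print(job.id)  # the job runs remotely; poll or check the dashboard for status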

There are alternatives to fine-tuning, like prompt injection and Retrieval-Augmented Generation (RAG). These include relevant information in-flight (at inference time) rather than at training time, but that’s a topic for another time.
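To give just a hint of the “in-flight” idea, here is a toy illustration where the retrieved context is hard-coded; a real RAG system would fetch it from a document store or vector database.

# Retrieved at inference time, not baked in during training:
retrieved_context = "Store hours: Mon-Fri 9am-6pm, Sat 10am-4pm, closed Sun."
question = "Is the store open on Saturday afternoon?"

prompt = (
    f"Answer using only this context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)
# The assembled prompt is then sent to any LLM, e.g., via the client shown earlier.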

Hallucination

In AI speak, hallucinating refers to instances where a language model generates information that is not grounded in reality or in the input provided; it produces plausible but incorrect or nonsensical output. For example, it may hallucinate a fact about a historical event that never occurred.

Hallucination in action.

Since these language models generate new content, sometimes they can literally generate something completely new that isn’t based on fact at all. This is a real problem for modern AI, but it’s not a showstopper, given the many positive benefits and the ability to cross-reference factual information. It’s important to be aware that these models aren’t perfect, and everything should be verified when you’re retrieving important factual information.

That same gray area bordering on hallucination is what provides the benefit when you ask for a new original bedtime story or fresh marketing ideas. That said, none of us want a hallucinating AI model when it comes to evaluating a medical scan.

While hallucinating AI models are a problem, they are being worked on and are improving rapidly. Interestingly, you can reduce AI hallucinations yourself. For example, simply asking the model not to lie leads to more accurate results. (Don’t do that when writing a creative story, though, because in that case you want more variety.) Here’s an article with good guidance if you would like to read further.
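As a tiny illustration of that guidance, an instruction like the following (a hypothetical system prompt, worded my own way) nudges a model toward admitting uncertainty instead of inventing facts:

system_prompt = (
    "Answer factually. If you are not certain of an answer, say "
    "'I don't know' rather than guessing or making something up."
)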

Bonus item: AGI

Artificial General Intelligence (AGI), the holy grail of AI research, aims to create a machine that understands, learns, and applies intelligence as well as a human. Think of the Terminator movies. The goal of those pursuing AGI is a machine with cognitive functions like reasoning, problem-solving, learning from experience, and self-awareness.

At this time, AI is very human-like and extremely impressive. Yet there’s a difference between appearing human-like and actually possessing human-like understanding and adaptability.

With the release of these new Large Language Models, AI is getting closer to this holy grail. However, opinions differ on how close we are to achieving AGI. Some argue that AGI has already been attained, while others estimate we are still months, years, or even decades away. These varying perspectives stem in part from differences in how AGI is defined, as well as from different forecasts for the pace at which AI capabilities are advancing.

Regardless of when the threshold for AGI is ultimately met, modern Generative AI represents significant progress toward more human-like artificial intelligence.

Summary

While I realize that I have only scratched the surface, I hope this overview of key AI terms has been helpful and that it lays a foundation to make it easier for you to understand the plethora of terms and concepts needed to navigate the rapidly growing area of Generative AI.

  1. https://heits.digital/articles/gpt3-overview ↩︎
  2. https://en.wikipedia.org/wiki/GPT-4 ↩︎
