The Basics of AI, Generative AI and Large Language Models

Created Oct 9, 2024 | Last updated Oct 22, 2024

Like many engineers, I sigh when Generative AI is mentioned because of the hype, which is annoying to say the least and counterproductive. Don’t get me wrong: Machine Learning, Large Language Models and Generative AI are fantastic technologies, but setting unrealistic expectations creates disappointment when the technology fails to live up to the hype, which in turn leads to scepticism.

AI and its sub-fields are fascinating topics but impenetrable to most people outside the tech community. In this post we’ll take a look at some of the basics of AI and what they mean.

# 1. What are AI, ML, NLP and LLMs?

Artificial Intelligence is a family of technologies that simulates human intelligence, hence the name. It’s been around for many years but has gained a lot of attention recently due to advancements in Natural Language Processing (NLP), which have made the technology accessible to people beyond the technology community. These advancements, in the form of Large Language Models (LLMs), can ingest huge amounts of training data and generate language outputs.

There are too many branches of AI to describe here, so to keep things simple we’ll focus on just three: Machine Learning (ML), Computer Vision and Natural Language Processing (NLP).

There’s a lot of debate about whether Large Language Models are a subset of Natural Language Processing because there are differences, but a simple view might be that Large Language Models are the intersection of Machine Learning and Natural Language Processing.

Whilst Natural Language Processing systems are used for a broad range of things, from text-to-speech to sentiment analysis, Large Language Models focus on complex language processing tasks and on outputting human-readable text; the focus on text is key. Large Language Models have received a lot of attention because they’re useful and applicable to a broad range of everyday tasks, the most famous example being ChatGPT.

If you want to know more about the technical differences between Large Language Models and Natural Language Processing technologies, there’s a great article by Softermii here.

# 2. Generative AI

If you’re not confused already, let’s introduce Generative AI, which is a superset of Large Language Models. Generative AI systems, as the name suggests, are designed to generate new content based on the data they were trained on, which could be text, audio, images or video. Generating content is the key here.

The key difference between Large Language Models and Generative AI is that the former deal with text and natural language tasks, whereas the latter goes beyond just text. Large Language Models are a subset of Generative AI.

# 3. It’s All About Training

We all know good teachers matter, and teaching, or rather training, a Large Language Model is the key. Two Large Language Models can be identical in every other measure but perform very differently depending on the depth and breadth of the data they’ve been trained with.

The number of parameters a model learns during training determines its size: the more parameters, the bigger the model. Training an LLM is expensive, and the bigger the model, the more electricity it uses. LLMs use a LOT of energy.
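To make the link between parameter count and model size a bit more concrete, here’s a rough back-of-the-envelope sketch. It assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but my assumption rather than a fixed rule, and it uses GPT-3’s published figure of 175 billion parameters. The function name `model_size_gb` is made up for this example.

```python
def model_size_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Estimate the storage needed for a model's parameters, in gigabytes.

    bytes_per_parameter=2 assumes 16-bit storage (an assumption for this sketch);
    use 4 for 32-bit storage.
    """
    return num_parameters * bytes_per_parameter / 1e9


# GPT-3 has roughly 175 billion parameters.
GPT3_PARAMETERS = 175_000_000_000

size_16_bit = model_size_gb(GPT3_PARAMETERS)
size_32_bit = model_size_gb(GPT3_PARAMETERS, bytes_per_parameter=4)

print(f"16-bit storage: {size_16_bit:,.0f} GB")
print(f"32-bit storage: {size_32_bit:,.0f} GB")
```

This only counts the storage for the parameters themselves. Training needs considerably more memory and compute on top of that, which is where the cost and energy use come from.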

OpenAI’s GPT-3 model required 1,287 MWh of electricity to train. To put this in context, 1 MWh of energy is enough to power 330 UK homes for 1 hour, so GPT-3 consumed enough electricity to power roughly 424,000 UK homes for 1 hour! That’s why LLM training is dominated by a handful of big companies, including OpenAI, Google DeepMind, Anthropic and Cohere; the top five accounted for 88% of all LLM revenues¹.
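If you want to check the arithmetic, here’s a minimal sketch using only the two figures quoted above (1,287 MWh to train GPT-3, and 1 MWh powering 330 UK homes for 1 hour). The function and constant names are just for illustration.

```python
# Figures quoted in the text above.
GPT3_TRAINING_MWH = 1_287
HOMES_POWERED_PER_MWH = 330  # UK homes powered for 1 hour by 1 MWh


def homes_powered_for_one_hour(energy_mwh: int, homes_per_mwh: int = HOMES_POWERED_PER_MWH) -> int:
    """Convert an amount of energy in MWh into 'UK homes powered for 1 hour'."""
    return energy_mwh * homes_per_mwh


homes = homes_powered_for_one_hour(GPT3_TRAINING_MWH)
print(f"{homes:,} UK homes for 1 hour")  # prints "424,710 UK homes for 1 hour"
```

The exact result is 424,710, which is where the rounded figure of 424,000 comes from.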

The volume of training data is just one consideration. The number of topics and the amount of information on each topic (i.e. breadth and depth) is another. One way to think of this is to imagine you have a bucket which you can fill either with a little bit of information on lots of topics or with lots of information on a few topics.

Model training is complex and hard, which is a significant barrier. Sure, you can get a working prototype up quickly with an off-the-shelf Large Language Model, but if you have a use case that the model wasn’t trained on or designed for, then it needs to be trained. Any task that’s specific to your company or sector, or requires domain expertise (e.g. health, finance or engineering), will need a specially trained Large Language Model, and that requires investment and technical expertise.

Unless you’ve got the right technical expertise and budget, I’d suggest looking for a commercial solution that’s been designed for your specific use case before you decide to go it alone.

# 4. Where Do You Start?

LLMs, like all technologies, are a tool for getting a job done. Start by identifying the problems you think LLMs can help with, asking questions such as:

- What tasks would benefit from using an LLM? E.g. internal staff, back-office processes and customer experiences.
- Are there any relevant examples or case studies to research?
- How do you quantify the benefit? What is the RoI?
- Is there a commercial pre-trained LLM available for your specific scenario?
- How can you test the model and run a trial?
- What skills do you need? Are they skills you have in-house or do you need a partner?
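
For the RoI question, a simple starting point is to compare the yearly benefit with the yearly cost. The sketch below is only an illustration: every number in it is hypothetical, and the function name `simple_roi` is made up for this example. Swap in your own estimates.

```python
def simple_roi(annual_benefit: float, annual_cost: float) -> float:
    """Return RoI as a fraction: (benefit - cost) / cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be greater than zero")
    return (annual_benefit - annual_cost) / annual_cost


# Hypothetical figures, purely for illustration.
hours_saved_per_week = 20      # staff time the LLM is expected to save
cost_per_staff_hour = 40.0     # fully loaded hourly cost
working_weeks_per_year = 46
annual_llm_cost = 23_000.0     # licences, integration and support

annual_benefit = hours_saved_per_week * cost_per_staff_hour * working_weeks_per_year
roi = simple_roi(annual_benefit, annual_llm_cost)

print(f"Annual benefit: {annual_benefit:,.0f}")
print(f"RoI: {roi:.0%}")
```

A trial is a good way to replace the guessed figures, especially the hours saved, with measured ones.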

# 5. References

¹ Large Language Model (LLM) Market Size to Exceed USD 82.1 Billion by 2033
