Understanding How AI Works
Learn the fundamentals of large language models to use them more effectively and understand their capabilities and limitations.
Why Understanding AI Matters for Students
Understanding how AI tools work helps you use them more effectively, recognise their limitations, and make better decisions about when and how to apply them to your studies. You do not need to be a computer scientist to benefit from this knowledge.
What Is a Large Language Model?
Large language models (LLMs) like ChatGPT, Claude, and Gemini are AI systems trained to process and generate human language. Think of them as sophisticated pattern-recognition systems that have learnt from enormous amounts of written text.
These models do not "think" or "understand" in the way humans do. Instead, they predict which token (a word or piece of a word) is likely to come next, based on patterns learnt during training. When you ask a question, the model builds its response one token at a time, each time choosing a next token that fits the text so far.
What They Are
- Pattern-matching systems trained on vast amounts of text
- Statistical models that predict likely word sequences
- Tools that can process and generate human-like text
- Systems that learn relationships between concepts and words
What They Are Not
- Not conscious beings with understanding or awareness
- Not databases that look up factual information
- Not connected to the internet in real-time (unless explicitly enabled)
- Not infallible sources of truth
How LLMs Are Trained
Stage 1: Pre-Training
The model is trained on massive datasets of text from books, websites, academic papers, and other sources, by repeatedly learning to predict the next token in that text. During this phase, it picks up patterns, relationships between words, grammar, facts about the world, and reasoning patterns.
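To make "learning patterns from text" concrete, here is a deliberately tiny stand-in: a bigram model that simply counts which word follows which in a short made-up corpus. Real LLMs use neural networks, work on tokens rather than whole words, and consider far more context than one previous word; the function names and corpus below are hypothetical and only illustrate the idea of learning next-word statistics from examples.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram_counts(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if it never appeared."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A hypothetical miniature "training set".
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram_counts(corpus)
print(most_likely_next(model, "the"))  # cat
print(most_likely_next(model, "sat"))  # on
```

The "model" has no idea what a cat is; it only knows that "cat" followed "the" more often than anything else in its training text.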
Stage 2: Fine-Tuning
After pre-training, models are further trained to be more helpful, accurate, and safe. This often combines supervised fine-tuning on example conversations with reinforcement learning from human feedback (RLHF), in which human reviewers compare and rank different responses, and those rankings are used to teach the model to prefer certain types of answer.
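In one widely described RLHF recipe, the human rankings are first used to train a separate "reward model" that gives each response a score, and the language model is then optimised towards high-scoring responses. The sketch below shows only the pairwise loss such a reward model can be trained with (a Bradley-Terry style loss, minus the log of the sigmoid of the score difference). The scores are made-up numbers, and individual AI developers may use different methods.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(chosen - rejected)).

    It is small when the response humans preferred gets the higher score
    and large when the model scores the rejected response higher.
    """
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores for a pair of responses to the same prompt.
print(preference_loss(2.0, -1.0))  # agrees with the human ranking: small loss
print(preference_loss(-1.0, 2.0))  # disagrees with the human ranking: large loss
```

Training nudges the reward model's scores to reduce this loss across many ranked pairs, so its scores come to reflect what reviewers preferred.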
Key Concepts Every Student Should Understand
Tokens
AI models break text into small units called tokens. A token might be a word, part of a word, or punctuation. The model processes and generates text one token at a time.
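The toy tokeniser below shows how a single word can become several tokens. Real tokenisers (for example, those based on byte-pair encoding) learn vocabularies of tens of thousands of pieces, so the actual split of any given word depends on the model. The greedy longest-match rule and the tiny vocabulary here are hypothetical simplifications.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedily split text into the longest pieces found in `vocab`.

    Any character not covered by the vocabulary becomes its own token.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, shrinking until one matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


VOCAB = {"un", "believ", "able", " ", "!"}  # hypothetical vocabulary
print(tokenize("unbelievable!", VOCAB))  # ['un', 'believ', 'able', '!']
```

One word and one punctuation mark become four tokens, which is why token counts are usually higher than word counts.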
Context Window
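The context window is the maximum amount of text, measured in tokens, that the model can take into account at once. Everything you have written in a conversation, plus the model's responses, counts towards this limit. In a long conversation, the tool you are using typically drops or condenses older text once the limit is reached, so the model may "forget" details from earlier on.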
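Here is a minimal sketch of one way a chat tool might keep a conversation within the context window: keep the newest messages and drop the oldest ones that no longer fit. This assumes a simple "drop oldest first" policy and approximates tokens by counting words; real tools count actual tokens and may summarise instead of dropping. The function name is hypothetical.

```python
def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits max_tokens.

    Size is approximated here by whitespace-separated words, not real tokens.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message.split())
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "Hi there",
    "Explain photosynthesis briefly please",
    "Plants convert light into chemical energy",
    "Thanks",
]
print(fit_to_context(conversation, 8))
# ['Plants convert light into chemical energy', 'Thanks']
```

With a budget of 8 "tokens", the first two messages no longer fit, so the model would never see them.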
Temperature
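Temperature controls how "creative" or "random" the model's responses are by sharpening or flattening the probabilities it assigns to possible next tokens. Lower temperature (around 0-0.3) makes responses more focused and deterministic. Higher temperature (around 0.7-1.0) makes responses more varied and creative. The exact range and default vary between tools, and many chat apps do not let you change it.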
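Temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before converting them into probabilities with the softmax function. The sketch below assumes three hypothetical candidate tokens with made-up scores. A temperature of exactly 0 would mean dividing by zero, so tools usually treat 0 as "always pick the top token"; this sketch simply rejects it.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(scores, 0.2)   # top token gets nearly all the probability
high = softmax_with_temperature(scores, 1.0)  # probability is spread more evenly
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

The ranking of the tokens never changes; temperature only changes how strongly the top choice dominates.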
Probabilistic Generation
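The model does not always pick the single most probable next token. It assigns a probability to every possible next token and then samples from those probabilities, so likely tokens are chosen most often but not every time. This is why the same question asked several times can produce slightly different answers.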
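The difference between always taking the top token and sampling can be sketched with Python's standard `random` module. The candidate tokens and probabilities below are hypothetical, as if the prompt were "The capital of France is".

```python
import random


def greedy_next(candidates: list[str], probs: list[float]) -> str:
    """Always return the single most probable candidate."""
    return candidates[probs.index(max(probs))]


def sample_next(candidates: list[str], probs: list[float], rng: random.Random) -> str:
    """Pick one candidate at random, weighted by its probability."""
    return rng.choices(candidates, weights=probs, k=1)[0]


candidates = ["Paris", "the", "a"]  # hypothetical next tokens
probs = [0.6, 0.25, 0.15]           # hypothetical probabilities
rng = random.Random()               # unseeded, so results differ between runs

print(greedy_next(candidates, probs))  # Paris
print([sample_next(candidates, probs, rng) for _ in range(5)])
```

Greedy selection gives the same answer every time, while sampling usually gives "Paris" but sometimes starts the answer differently.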
How AI Generates Responses
When you send a message to an AI, here is what happens:
Input Processing
Your message is broken into tokens, each token is mapped to a numeric ID, and each ID is converted into a list of numbers (an embedding) that the model can process.
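The input-processing step can be sketched as two lookups: token to ID, then ID to a vector. The vocabulary, the 4-number vectors, and the random values are all hypothetical; real models use vocabularies of many thousands of tokens and vectors with hundreds or thousands of numbers whose values are learnt during training.

```python
import random

# Hypothetical tiny vocabulary mapping each token to an integer ID.
VOCAB_IDS = {"how": 0, "do": 1, "plants": 2, "grow": 3, "?": 4}
EMBED_DIM = 4  # real models use far larger vectors

# Embedding table: one vector per token ID. Random here, learnt in real models.
rng = random.Random(42)
EMBEDDINGS = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB_IDS]


def encode(tokens: list[str]) -> list[int]:
    """Map each token to its ID."""
    return [VOCAB_IDS[t] for t in tokens]


def embed(ids: list[int]) -> list[list[float]]:
    """Look up the vector for each token ID."""
    return [EMBEDDINGS[i] for i in ids]


ids = encode(["how", "do", "plants", "grow", "?"])
vectors = embed(ids)
print(ids)                            # [0, 1, 2, 3, 4]
print(len(vectors), len(vectors[0]))  # 5 4
```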
Pattern Matching
The model analyses patterns in your input and relates them to patterns it learnt during training.
Token Prediction
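The model calculates a probability for every possible next token, taking the entire conversation so far into account. Fine-tuning has shaped these probabilities to favour helpful responses. One token is then selected from them.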
Iterative Generation
This process repeats, with each new token added to the input and influencing what comes next, until the model produces a special end-of-sequence token or a length limit is reached.
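The four steps above end in a loop that can be sketched as follows. The `NEXT_TOKEN` table and the `<start>`/`<end>` markers are hypothetical stand-ins for a trained model; unlike a real LLM, which weighs the whole context, this toy `predict_next` looks only at the last token, and it always returns one fixed choice instead of sampling.

```python
# Hypothetical next-token table standing in for a trained model.
NEXT_TOKEN = {
    "<start>": "Plants",
    "Plants": "need",
    "need": "light",
    "light": ".",
    ".": "<end>",
}


def predict_next(context: list[str]) -> str:
    """Stand-in for a model; this toy version only looks at the last token."""
    return NEXT_TOKEN[context[-1]]


def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    """Predict one token at a time until an end token or the length limit."""
    context = list(prompt)
    generated: list[str] = []
    while len(generated) < max_tokens:
        token = predict_next(context)
        if token == "<end>":
            break  # the model signals that the response is complete
        generated.append(token)
        context.append(token)  # each new token becomes part of the input
    return generated


print(" ".join(generate(["<start>"])))      # Plants need light .
print(generate(["<start>"], max_tokens=2))  # ['Plants', 'need']
```

The second call shows the length limit cutting a response off mid-sentence, which can also happen with real models.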
What This Means for Your Studies
- Always verify important facts: The model generates plausible-sounding text, not guaranteed truth.
- Understand that AI "hallucinates": Sometimes it confidently states false information because it is predicting likely-sounding text, not retrieving facts.
- Be specific in your prompts: Better input leads to better output because it helps the model identify the most relevant patterns.
- Use AI as a tool, not an authority: Think critically about responses and use your own judgement.
- Be aware of the knowledge cut-off: Models are trained on data up to a certain date and cannot know about events after that unless they have web search capabilities.