Today, the average person’s understanding of artificial intelligence (AI) is shaped by the likes of ChatGPT and Bard, Google’s recently launched ChatGPT competitor. These systems provide conversational experiences to users thanks to the large language models (LLMs) that power them. Indeed, LLMs are transforming the world of AI, powering the advancements being made across numerous sectors.
You may not realize it, but these machines have significantly impacted our daily lives, and their influence is poised to expand further. But how did LLMs achieve such prominence? Let’s dive deeper into their journey.
Early Days of Natural Language Processing
The history of natural language processing (NLP) as we know it – the ability of AI to understand and interact in human language – dates back to the 1960s. As noted in Understanding Large Language Models (LLMs) in the World of Artificial Intelligence, a chatbot called ELIZA was one of the earliest systems to mimic conversation. Compared to today’s sophisticated models, ELIZA was rudimentary: limited to small word banks and rigid conversation patterns, it nevertheless helped lay the foundation for today’s LLMs.
The Birth of Statistical Models
The emergence of statistical models in the 1980s and 90s paved the way for more sophisticated NLP systems. No longer confined to fixed word banks or pattern-matching algorithms alone, these models used probability and statistics to predict which word or phrase was likely to come next in a given sequence. N-gram language models, for instance, were developed to anticipate the next word in a sentence based on the previous ‘n’ words.
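To make the idea concrete, here is a toy sketch of a bigram model (an n-gram with n = 2) in Python; the tiny corpus and the helper function are invented purely for illustration and are far too small for anything realistic.

```python
# A toy bigram model: estimate the probability of the next word from
# counts of word pairs observed in a small, made-up corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigram_counts[word]
    total = sum(counts.values())
    best, freq = counts.most_common(1)[0]
    return best, freq / total

print(predict_next("the"))  # ('cat', 0.5) for this toy corpus
```

Real n-gram systems worked the same way, just with n of 3 to 5 and counts gathered over millions of sentences, plus smoothing to handle word sequences never seen in training.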
Here Come the Deep Learning Models
Several notable models emerged from the 2010s onward with the advancement of deep learning – a subset of machine learning that uses neural networks with multiple layers (deep neural networks) to improve the accuracy of predictions and decision-making. These models revolutionized the AI industry, with applications ranging from Google Search and Siri to autonomous vehicles.
Pioneering models such as Google’s Word2Vec, Stanford’s GloVe, and Facebook’s FastText took NLP to a whole new level by building word embeddings: representations of text in which words with similar meanings have similar vector representations. This was a drastic step away from the old practice of treating each word as a unique, unrelated entity. It allowed models to understand semantic relationships between words, significantly improving their comprehension and generation of language.
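As a hedged illustration, the sketch below trains word embeddings on a made-up four-sentence corpus using the open-source gensim library; real embeddings are trained on billions of words, so the toy vectors here only demonstrate the mechanics.

```python
# A minimal word-embedding sketch, assuming the gensim library is installed.
# The corpus is invented for illustration and far too small to learn
# meaningful semantics.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# Train a small skip-gram Word2Vec model on the toy corpus.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

# Each word is now a dense vector; words used in similar contexts end up
# with similar vectors.
print(model.wv["king"][:5])                  # first few dimensions of the vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between two words
```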
Transformer Models Emerge
The release of the Transformer architecture (Vaswani et al., 2017) initiated the next pivotal shift in NLP. The model is built around self-attention mechanisms, which weigh the relevance and relationships of every word in a sentence to every other word, not just the preceding ones. Transformer models led to drastically improved machine interpretation and generation of human language.
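For the curious, here is a bare-bones NumPy sketch of the scaled dot-product attention described in the Transformer paper; the toy matrices are random and stand in for the learned query, key, and value projections of a real model.

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                 # weighted mix of value vectors

# Four tokens, each represented by an 8-dimensional vector (random toy data).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```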
Prominent among these models is Google’s BERT (Bidirectional Encoder Representations from Transformers), launched in 2018 and enhanced in 2020. BERT uses bidirectional training, allowing the model to understand a word’s context from both its left and its right within a sentence. With BERT, Google not only enhanced Search but also opened up a new way for AI to grasp the intricacies of language much as humans do, considering not only the words themselves but also their order.
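A quick, hedged way to see bidirectional context at work is masked-word prediction. The sketch below assumes the Hugging Face transformers library is installed and that the public bert-base-uncased checkpoint can be downloaded.

```python
# Masked-word prediction with a public BERT checkpoint (requires the
# transformers library and a network connection to download the model).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses the context on both sides of the [MASK] token to rank candidates.
for prediction in unmasker("The doctor told the patient to take the [MASK] twice a day."):
    print(prediction["token_str"], round(prediction["score"], 3))
```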
Enter the Giants: GPT and T5
The advent of OpenAI’s Generative Pre-Trained Transformer (GPT) brought the field of NLP another game-changing milestone. Unlike its predecessors, GPT was trained unsupervised on a diversity of internet text. Yet, it can generate coherent and contextually relevant sentences by predicting the next word in a sequence. This feature made it remarkably adept at understanding and generating human-like text.
OpenAI pushed things further still with GPT-2 and its successors GPT-3 and GPT-4, each delivering astonishing performance improvements. The GPT family propelled machine learning capabilities to unprecedented heights, proving capable of tasks such as translation, question answering, and even crafting whole articles.
Around the same time as GPT’s progression, Google was working on the Text-to-Text Transfer Transformer, or T5. Launched in 2019, T5 presented a radical shift in how natural language understanding tasks are approached. Instead of designing a unique model for each individual NLP task, it converted them all into a text-to-text problem, in which the model is trained to predict a target sequence of text given an input sequence. This approach introduced a unified framework for handling any language-related task, from machine translation to summarization, further boosting the language model’s versatility and efficiency.
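To illustrate the text-to-text framing, the sketch below assumes the Hugging Face transformers library, the sentencepiece package, and the public t5-small checkpoint; each task is expressed simply as a text prefix on the input.

```python
# T5's text-to-text framing: every task is a string-to-string mapping,
# selected by a task prefix (requires transformers, sentencepiece, and a
# network connection to download the t5-small checkpoint).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The weather is nice today.",
    "summarize: Large language models convert many NLP tasks into a single "
    "text-to-text format, which simplifies training and deployment.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```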
The Challenges of LLMs
Despite their remarkable advancements and widespread applications, LLMs are not without their challenges. As noted in the report Custom GPTs from OpenAI May Leak Sensitive Information, these AI models have raised concerns over privacy and ethics. But as long as companies continue to engage in rigorous security and ethical review processes and strive for transparency in their applications, these challenges can be mitigated.