12 of the best large language models – TechTarget

You forgot to provide an Email Address.
This email address doesn’t appear to be valid.
This email address is already registered. Please log in.
You have exceeded the maximum character limit.
Please provide a Corporate Email Address.
Please check the box if you want to proceed.
Please check the box if you want to proceed.
By submitting my Email address I confirm that I have read and accepted the Terms of Use and Declaration of Consent.
Large language models are the dynamite behind the generative AI boom of 2023. However, they’ve been around for a while.
LLMs are black box AI systems that use deep learning on extremely large datasets to understand and generate new text. Modern LLMs began taking shape in 2014 when the attention mechanism — a machine learning technique designed to mimic human cognitive attention — was introduced in a research paper titled “Neural Machine Translation by Jointly Learning to Align and Translate.” In 2017, that attention mechanism was honed with the introduction of the transformer model in another paper, “Attention Is All You Need.”
Some of the most well-known language models today are based on the transformer model, including the generative pre-trained transformer series of LLMs and bidirectional encoder representations from transformers (BERT).
ChatGPT, which runs on a set of language models from OpenAI, attracted more than 100 million users just two months after its release in 2022. Since then, many competing models have been released. Some belong to big companies such as Google and Microsoft; others are open source.
Constant developments in the field can be difficult to keep track of. Here are some of the most influential models, both past and present. Included in it are models that paved the way for today’s leaders as well as those that could have a significant effect in the future.
This article is part of
Below are some of the most relevant large language models today. They do natural language processing and influence the architecture of future models.
BERT is a family of LLMs that Google introduced in 2018. BERT is a transformer-based model that can convert sequences of data to other sequences of data. BERT’s architecture is a stack of transformer encoders and features 342 million parameters. BERT was pre-trained on a large corpus of data then fine-tuned to perform specific tasks along with natural language inference and sentence text similarity. It was used to improve query understanding in the 2019 iteration of Google search.
Falcon 40B is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute. It is open source and was trained on English data. The model is available in two smaller variants as well: Falcon 1B and Falcon 7B (1 billion and 7 billion parameters). Amazon has made Falcon 40B available on Amazon SageMaker. It is also available for free on GitHub.
Just three days into its public release in November 2022, Galactica was Meta’s LLM designed specifically for scientists. It was trained on a collection of academic material — 48 million papers, lecture notes, textbooks and websites. As most models do, it produced AI “hallucinations” that members of the scientific community deemed unsafe because they sounded authoritative. This made them hard to detect quickly and were generated in a domain that requires little margin for error.
GPT-3 is OpenAI’s large language model with more than 175 billion parameters, released in 2020. GPT-3 uses a decoder-only transformer architecture. In September 2022, Microsoft announced it had exclusive use of GPT-3’s underlying model. GPT-3 is 10 times larger than its predecessor. GPT-3’s training data includes Common Crawl, WebText2, Books1, Books2 and Wikipedia.
GPT-3 is the last of the GPT series of models in which OpenAI made the parameter counts publicly available. The GPT series was first introduced in 2018 with OpenAI’s paper “Improving Language Understanding by Generative Pre-Training.”
GPT-3.5 is an upgraded version of GPT-3 with fewer parameters. GPT-3.5 was fine-tuned using reinforcement learning from human feedback. GPT-3.5 is the version of GPT that powers ChatGPT. There are several models, with GPT-3.5 turbo being the most capable, according to OpenAI. GPT-3.5’s training data extends to September 2021.
It was also integrated into the Bing search engine but has since been replaced with GPT-4.
GPT-4 is the largest model in OpenAI’s GPT series, released in 2023. Like the others, it’s a transformer-based model. Unlike the others, its parameter count has not been released to the public, though there are rumors that the model has more than 170 trillion. OpenAI describes GPT-4 as a multimodal model, meaning it can process and generate both language and images as opposed to being limited to only language. GPT-4 also introduced a system message, which lets users specify tone of voice and task.
GPT-4 demonstrated human-level performance in multiple academic exams. At the model’s release, some speculated that GPT-4 came close to artificial general intelligence (AGI), which means it is as smart or smarter than a human. GPT-4 powers Microsoft Bing search, is available in ChatGPT Plus and will eventually be integrated into Microsoft Office products.
LaMDA (Language Model for Dialogue Applications) is a family of LLMs developed by Google Brain announced in 2021. LaMDA used a decoder-only transformer language model and was pre-trained on a large corpus of text. In 2022, LaMDA gained widespread attention when then-Google engineer Blake Lemoine went public with claims that the program was sentient. It was built on the Seq2Seq architecture.
Large Language Model Meta AI (Llama) is Meta’s LLM released in 2023. The largest version is 65 billion parameters in size. Llama was originally released to approved researchers and developers but is now open source. Llama comes in smaller sizes that require less computing power to use, test and experiment with.
Llama uses a transformer architecture and was trained on a variety of public data sources, including webpages from CommonCrawl, GitHub, Wikipedia and Project Gutenberg. Llama was effectively leaked and spawned many descendants, including Vicuna and Orca.
Orca was developed by Microsoft and has 13 billion parameters, meaning it’s small enough to run on a laptop. It aims to improve on advancements made by other open source models by imitating the reasoning procedures achieved by LLMs. Orca achieves the same performance as GPT-4 with significantly fewer parameters and is on par with GPT-3.5 for many tasks. Orca is built on top of the 13 billion parameter version of LLaMA.
The Pathways Language Model is a 540 billion parameter transformer-based model from Google powering its AI chatbot Bard. It was trained across multiple TPU 4 Pods — Google’s custom hardware for machine learning. PaLM specializes in reasoning tasks such as coding, math, classification and question answering. PaLM also excels at decomposing complex tasks into simpler subtasks.
PaLM gets its name from a Google research initiative to build Pathways, ultimately creating a single model that serves as a foundation for multiple use cases. There are several fine-tuned versions of PaLM, including Med-PaLM 2 for life sciences and medical information as well as Sec-PaLM for cybersecurity deployments to speed up threat analysis.
Phi-1 is a transformer-based language model from Microsoft. At just 1.3 billion parameters, Phi-1 was trained for four days on a collection of textbook-quality data. Phi-1 is an example of a trend toward smaller models trained on better quality data and synthetic data.
“We’ll probably see a lot more creative scaling down work: prioritizing data quality and diversity over quantity, a lot more synthetic data generation, and small but highly capable expert models,” wrote Andrej Karpathy, former director of AI at Tesla and OpenAI employee, in a tweet.
Phi-1 specializes in Python coding and has fewer general capabilities because of its smaller size.
StableLM is a series of open source language models developed by Stability AI, the company behind image generator Stable Diffusion. There are 3 billion and 7 billion parameter models available and 15 billion, 30 billion, 65 billion and 175 billion parameter models in progress at time of writing. StableLM aims to be transparent, accessible and supportive.
Although LLMs are a recent phenomenon, their precursors go back decades. Learn how recent precursor Seq2Seq and distant precursor ELIZA set the stage for modern LLMs.
Seq2Seq is a deep learning approach used for machine translation, image captioning and natural language processing. It was developed by Google and underlies some of their modern LLMs, including LaMDA. Seq2Seq also underlies AlexaTM 20B, Amazon’s large language model. It uses a mix of encoders and decoders.
ELIZA was an early natural language processing program created in 1966. It is one of the earliest examples of a language model. ELIZA simulated conversation using pattern matching and substitution. ELIZA, running a certain script, could parody the interaction between a patient and therapist by applying weights to certain keywords and responding to the user accordingly. The creator of ELIZA, Joshua Weizenbaum, wrote a book on the limits of computation and artificial intelligence.
Generative AI challenges that businesses should consider
Generative AI ethics: 8 biggest concerns
Generative AI landscape: Potential future trends
Generative models: VAEs, GANs, diffusion, transformers, NeRFs
AI content generators to explore
‘Network fabric’ is a general term used to describe underlying data network infrastructure as a whole.
Loose coupling is an approach to interconnecting the components in a system, network or software application so that those …
Nessus is a platform developed by Tenable that scans for security vulnerabilities in devices, applications, operating systems, …
Google Play Protect is a malware protection and detection service built into Android devices that use Google Mobile Services.
Insecure deserialization is a vulnerability in which untrusted or unknown data is used to inflict a denial-of-service attack, …
An orphan account, also referred to as an orphaned account, is a user account that can provide access to corporate systems, …
Sustainability risk management (SRM) is a business strategy that aligns profit goals with a company’s environmental, social and …
An executive dashboard is a computer interface that displays the key performance indicators (KPIs) that corporate officers need …
A change management strategy is a plan for or systematic approach to dealing with a transition or transformation in an …
Workforce planning is the strategy used by employers to anticipate labor needs and deploy workers most effectively, usually with …
Benefits administration is the process of assembling and managing the benefits an organization provides to employees.
A 360 review (360-degree review) is a continuous performance management strategy aimed at helping employees at all levels obtain …
Multichannel marketing refers to the practice of companies interacting with customers via multiple channels, both direct and …
Demand generation is the process of creating and cultivating interest in a product or service with the goal of generating …
Account-based experience (ABX) is a business-to-business (B2B) strategy in which the sales, marketing and customer success …
All Rights Reserved, Copyright 1999 – 2023, TechTarget

Privacy Policy
Cookie Preferences
Do Not Sell or Share My Personal Information