MM Ep-44 – Long vs Short…The Battle of Language Models!

Large Language Models vs Small Language Models

30th November 2022!

ChatGPT has taken the world by storm. It launched as a free research preview, powered by a model from OpenAI’s GPT-3.5 series.

ChatGPT is powered by a large language model (LLM). LLM-based systems can engage in natural-language conversations and are widely used for content generation and summarization.

These models provide users with information and assistance, and can even simulate human-like interaction.

In today’s episode of Monday Muse, let’s look at the language model behind ChatGPT and the trade-offs between different sizes of language model in natural language processing (NLP).

Historically…

NLP research dates back to the 1960s and ’70s; statistical approaches to NLP took hold in the 1990s.

The classic lightweight NLP architecture is the n-gram model, which dates back to the early days of statistical language modeling.

N-gram models are based on the probability distribution of sequences of n words in a given text: each word is predicted from the n−1 words that precede it, using counts gathered from a corpus.
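
To make this concrete, here is a minimal bigram (n = 2) sketch in Python. The toy corpus and the count-based probability estimate are purely illustrative; real n-gram models use smoothing and far larger corpora.

```python
from collections import Counter, defaultdict

# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word (bigrams).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def bigram_prob(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev) from raw counts."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

print(bigram_prob("the", "cat"))  # "cat" follows "the" in 2 of 4 cases -> 0.5
```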

This is the spirit of a Small Language Model (SLM), which uses a relatively limited number of parameters compared to its larger counterparts.

These models are designed for tasks requiring language understanding but are constrained by computational resources, training data availability, or specific use-case requirements.

Small language models can include traditional rule-based systems, simpler machine learning models, or compact neural network architectures.

Two things that make them work –

  • SLMs have a smaller number of parameters, which limits their ability to capture complex language patterns and relationships but makes them computationally efficient.
  • Small language models are often tailored for specific tasks, such as sentiment analysis, named entity recognition, or keyword extraction, where a more lightweight approach is sufficient (see the sketch after this list).
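
As a rough illustration of such a compact, task-specific model, here is a sketch of a tiny sentiment classifier built with scikit-learn. The four training sentences are invented for the example; a real system would use a proper labelled corpus.

```python
# A small, task-specific model: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data, purely for illustration.
texts = [
    "great product, loved it",
    "terrible service, very slow",
    "works exactly as expected",
    "awful experience, would not recommend",
]
labels = ["pos", "neg", "pos", "neg"]

# The entire "model" is a few thousand learned weights - tiny next to an LLM.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["loved it, great experience"]))  # likely ['pos']
```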

OpenAI wanted to do it differently…

When OpenAI set out to create their language model, they wanted to do it differently…

1) A higher scale of parameters: Models like GPT-3 have hundreds of billions of parameters (GPT-3 itself has 175 billion), enabling them to capture intricate language patterns and generate contextually relevant text across a wide range of tasks.
2) Versatility: They wanted to employ a large language model because such models are more versatile and can generalize across various language tasks without task-specific fine-tuning. They excel at understanding context, generating coherent text, and performing complex language-related tasks.
3) Computational Requirements: Since large language models demand substantial computational resources for training and inference, OpenAI has continually needed more high-end computing resources.
4) Training Data: They trained the large language models on massive and diverse datasets to capture a broad understanding of language.

The result of this new approach?

The response to ChatGPT’s natural, human-like interaction, powered by an LLM, has been overwhelming.

ChatGPT became the fastest-growing consumer internet app of its time, gathering an estimated 100 million monthly users in just two months. Facebook, for comparison, took around four and a half years to hit 100 million users.

The 175-billion-parameter model behind ChatGPT receives around 10 million queries a day.

As you can see, a Large Language Model is a sophisticated natural language processing (NLP) model with a vast number of parameters, typically on the order of hundreds of millions or even billions.

Large language models leverage deep learning architectures, such as transformers, and are often pre-trained on extensive datasets to capture intricate language patterns and nuances. Examples of large language models include OpenAI’s GPT-3, BERT, and similar state-of-the-art models.

Four things set LLMs apart, in sharp contrast to Small Language Models:

1) Scale of Parameters: LLMs are characterized by a massive number of parameters, enabling them to capture complex language patterns and relationships. This scale allows for a broad understanding of context and semantics.
2) Versatility: Large language models are versatile and can generalize across various language tasks without task-specific fine-tuning. They exhibit the ability to perform well on a wide range of NLP tasks, from language translation to sentiment analysis.
3) Contextual Understanding: These models excel in contextual understanding, considering the relationships between words and phrases in a given context. This is crucial for generating coherent and contextually relevant text.
4) Computational Requirements: Training and fine-tuning large language models demand substantial computational resources, including powerful GPUs or TPUs. Inference with these models can also be computationally intensive.

The Solution of the Future?

Now that you have understood the dynamics of LLMs and SLMs, the question is – are LLMs the solution for everything going forward?

Not really…

Small Language Models are not computationally intensive and are a good fit for several common use cases (a quick code sketch of two of them follows the list):

  1. Digital marketing agencies use compact spaCy models for keyword extraction. This type of model helps identify and extract relevant keywords from content, optimizing websites for search engines and improving their online visibility.
    https://spacy.io/usage/models
  2. In the healthcare sector, the small language model ClinicalBERT is employed for Named Entity Recognition (NER). These models can identify and categorize entities such as medical conditions, treatments, and medications in clinical texts, aiding information extraction for research and patient care. https://github.com/kexinhuang12345/clinicalBERT
  3. The lightweight Sumy library is used for text summarization, particularly of news articles. Organizations implement it to automatically generate concise summaries of lengthy articles, giving users quick insight into the content.
    https://miso-belica.github.io/sumy/
  4. The lightweight TextBlob library finds applications in sentiment analysis on social media. Companies leverage such tools to analyze user comments and posts, gaining insight into public sentiment about products, services, or brand reputation.
    https://textblob.readthedocs.io/en/dev/
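
Here is a rough sketch of the first and fourth use cases above: keyword (noun-chunk) extraction with spaCy and sentiment polarity with TextBlob. It assumes both packages are installed along with spaCy’s small English model (en_core_web_sm); check the linked docs for the exact setup on your version.

```python
# Assumes: pip install spacy textblob
#          python -m spacy download en_core_web_sm
import spacy
from textblob import TextBlob

nlp = spacy.load("en_core_web_sm")

text = "The new phone has an excellent camera but the battery life is disappointing."

# Keyword extraction: noun chunks are a cheap proxy for candidate keywords.
doc = nlp(text)
print([chunk.text for chunk in doc.noun_chunks])
# e.g. ['The new phone', 'an excellent camera', 'the battery life']

# Sentiment analysis: polarity in [-1, 1], subjectivity in [0, 1].
blob = TextBlob(text)
print(blob.sentiment.polarity, blob.sentiment.subjectivity)
```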

Large Language Models will be employed where computationally intensive contextual understanding and rich relationships matter, as in the examples below (a short translation sketch follows the list):

  1. OpenAI’s GPT-3: companies employ large language models like GPT-3 to power advanced chatbots and virtual assistants.
  2. Google’s BERT (Bidirectional Encoder Representations from Transformers) is utilized for language-understanding tasks such as search ranking and extractive summarization.
  3. Salesforce’s Einstein Language employs language models for sentiment analysis to understand customer feedback and opinions.
  4. Facebook’s M2M-100 enables accurate, context-aware translation between many language pairs.
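
As a hedged sketch of the translation use case above, here is how M2M-100 can be run through the Hugging Face transformers library. The checkpoint name facebook/m2m100_418M is the publicly released small variant; the exact translation you get may differ from the comment.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Load the smallest public M2M-100 checkpoint (418M parameters).
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

french_text = "La vie est belle."
tokenizer.src_lang = "fr"  # tell the tokenizer the source language
encoded = tokenizer(french_text, return_tensors="pt")

# Force the decoder to start generating in the target language (English).
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
# expected something like: ['Life is beautiful.']
```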

These use cases highlight the versatility of large language models in addressing various language-related tasks and applications across different industries. Companies leverage the power of these models to enhance user experiences, automate content creation, and derive valuable insights from vast amounts of textual data.

In Conclusion

In the realm of natural language processing, the battle between large and small language models will continue.

Advocates for Large Language Models argue that their vast parameter scale allows for unparalleled versatility.

LLMs, exemplified by GPT-3 and BERT, showcase impressive contextual understanding, excelling in tasks ranging from language translation to sentiment analysis without task-specific fine-tuning.

Small Language Models (SLMs) champion efficiency and task specificity. SLMs, designed to be lightweight and computationally frugal, shine in targeted applications where computational resources are constrained.

These models may lack the broad generalization of their larger counterparts but prove invaluable for specialized tasks requiring nimble processing.

While LLMs boast a grand arsenal, SLMs offer a strategic advantage in resource efficiency and task-focused performance.

Choosing between them comes down to a question of RoI.

That is why, going forward, LLMs may increasingly be consumed as a service (priced per token).

That’s all for today. Hope it was useful.

Till next week!