The Opportunities and Risks of Foundation Models

Read the first blog in the series reviewing different foundation models here.

The opportunities of foundation models

The sheer size of foundation models grants them an unexpected ability to perform tasks they were never explicitly trained on. Trained on a huge corpus of unlabeled text, GPT-3 learned to respond to tasks posed as natural language prompts. The 175-billion-parameter GPT-3 can answer questions, summarize content, analyze sentiment, and even write stories. Needless to say, such emergent capabilities are incredibly exciting.

This is why we call foundation models few-shot learners (Figure 1): only a handful of examples, often supplied directly in the prompt, are needed for these models to learn a specific task. In some cases, they can even perform tasks without being given any demonstration at all. In such instances, foundation models act as zero-shot learners.

Figure 1. Examples of one-shot and few-shot learning (Source)

In the case of a language model, the model develops a broad set of skills and pattern recognition abilities during unsupervised pretraining. After that, it is not trained on any task-specific dataset. Instead, it uses those pattern recognition abilities to identify the desired task at inference time, achieving “few-shot” or “zero-shot” learning.
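To make the “few-shot at inference time” idea concrete, here is a minimal sketch of a few-shot prompt. It uses the Hugging Face transformers library with GPT-2 as a freely available stand-in for a model like GPT-3; the prompt, model choice, and labels are illustrative assumptions rather than details from the research above. A small model will follow the pattern far less reliably, but the mechanics are the same: the demonstrations live in the prompt, and the model’s weights are never updated.

```python
# Minimal sketch of few-shot prompting: the "training examples" live
# entirely in the prompt, and the model is never updated.
# GPT-2 is only a freely available stand-in for a large foundation model
# such as GPT-3; a small model follows the pattern far less reliably.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

few_shot_prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: The battery lasts all day. Sentiment: Positive\n"
    "Review: The screen cracked within a week. Sentiment: Negative\n"
    "Review: Setup was quick and painless. Sentiment:"
)

# The model simply completes the prompt; the predicted label is read off
# the generated continuation. No gradient updates, no task-specific dataset.
output = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(output[0]["generated_text"])
```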

Therein lies an incredible opportunity for data practitioners. Before the advent of foundation models, practitioners had to worry about the architecture of their models and train them on domain-specific data. Yet there are no guarantees of the quantity or quality of real-world data, both of which are key factors in the performance of the trained model.

Those days are gone. The rise of foundation models drastically reduces the need to build models from scratch. Practitioners can either use a foundation model as is or fine-tune it slightly for the task at hand. Either way, they no longer need to spend an exorbitant amount of compute training a model from scratch.
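For readers curious what the “fine-tune slightly” route looks like in practice, below is a rough sketch using the Hugging Face Trainer API. The checkpoint (distilbert-base-uncased), dataset (IMDB), and hyperparameters are illustrative assumptions only; the point is that the architecture and most of the compute come for free from pretraining.

```python
# Rough sketch of fine-tuning a pretrained checkpoint on one downstream task.
# Model, dataset, and hyperparameters are illustrative choices only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Tokenize a small slice of task data; no architecture is designed from scratch.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=256),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-sentiment",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()  # a light pass over task data, not full pretraining
```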


Same architecture, different tasks

This leads to the consolidation of methodologies for building ML systems across multiple applications. Researchers dubbed this phenomenon “homogenization”. 

Today, the majority of state-of-the-art NLP models are derived from large language models. As a result, any improvements in foundation models can immediately benefit most NLP tasks. (Though, on the flip side, this also means that many of these state-of-the-art NLP models share the same weaknesses.)

Homogenization extends beyond NLP to other research communities. Today, Transformer-based modeling approaches are prevalent for processing images, speech, tabular data, and even protein sequences. It is not difficult to envision a future where similar multimodal models (e.g., image to video or audio) are derived from other foundation models. Homogenization points to a consolidation of effort and a unified set of tools for developing foundation models.
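One way to appreciate this unification of tooling: the same generic interface loads Transformer checkpoints for text, images, and speech. The sketch below uses Hugging Face’s AutoModel class; the specific checkpoints are common public models chosen purely for illustration.

```python
# Illustration of homogenization: one Transformer toolkit, several modalities.
# The checkpoints below are common public models used only as examples.
from transformers import AutoModel

text_encoder = AutoModel.from_pretrained("bert-base-uncased")              # text
image_encoder = AutoModel.from_pretrained("google/vit-base-patch16-224")   # images
audio_encoder = AutoModel.from_pretrained("facebook/wav2vec2-base-960h")   # speech

# Same loading code, same training/inference APIs downstream.
for name, m in [("text", text_encoder), ("image", image_encoder), ("audio", audio_encoder)]:
    print(name, type(m).__name__, sum(p.numel() for p in m.parameters()))
```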

The risks of using foundation models

Homogenization comes with risks. Future machine learning systems built on foundation models will share single points of failure. Once adversaries find an exploit in a foundation model, they can leverage the same vulnerability to wreak havoc across multiple systems.

The risk of harm 

When asked for images of people in various occupations, DALL·E 2 produced white male lawyers and female Asian flight attendants (Figure 2). CLIP has similarly been shown to exhibit gender and age bias.

Figure 2. DALL·E 2 produces images of white men when asked for an image of a lawyer, while synthetic images of flight attendants are dominated by Asian women. (Source: Vox)

These are far from isolated incidents. AI models are notorious for being biased and unfair, and the problem is amplified for foundation models, which learn from huge amounts of (biased) data. It is close to impossible to filter out all harmful content before training. (OpenAI’s attempt to remove violent and hateful content from DALL·E 2’s training dataset is akin to finding needles in a haystack.) Inevitably, foundation models perpetuate the human biases embedded in their training data.

Alas, such harm is often overlooked because it is judged to be outweighed by the benefits of foundation models. OpenAI noted in their GPT-3 paper “that words such as violent, terrorism and terrorist co-occurred at a greater rate with Islam than with other religions…” Yet that did not stop GPT-3 in its tracks. “Since we’re being so restrictive anyway… it felt like it was an okay thing to do,” said Sandhini Agarwal, a researcher on OpenAI’s policy team, according to Vox.

The risk of misinformation

The powerful generative capability of foundation models heightens the risk of misinformation. At the height of the Ukraine-Russia war, a video of Ukrainian President Volodymyr Zelensky circulated on social media. Observers were stunned to see Zelensky calling on his citizens to stop fighting Russian soldiers. Except the video was a deepfake.

Researchers from Georgetown University demonstrated how GPT-3 could generate believable misinformation. They found that GPT-3 could elaborate on a biased narrative, rewrite news articles with a different conclusion, and even devise new narratives that could form the basis of conspiracy theories (Figure 3).

Figure 3. Example of politically charged headlines generated by GPT-3 (Source)

Given the opportunities, and despite the risks, foundation models are here to stay. What will the future bring? We’ll discuss just that in our final installment. Stay tuned!

Read our benchmark report on leveraging synthetic data for hands-on-wheel detection.