Introduction to LLMs

A Foundational Introduction to Large Language Models!

How to Use This Page

  • Browse through the collection of 12 influential research papers on Large Language Models (LLMs).
  • Click on the "Open PDF" button to view the research paper.
  • Click on the "Listen to Podcast" button to open the audio player.
  • I recommend going through the papers in order; if you can't, at least read the first three and the last one.

Attention Is All You Need

Introduces the Transformer architecture, which has become the foundation for many modern LLMs.

Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

Open PDF
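
The heart of the Transformer is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, where the 1/√d_k scaling keeps dot products from growing with dimension and saturating the softmax. A minimal single-head NumPy sketch (no masking or batching; the shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 positions, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)  # -> shape (4, 8)
```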

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Presents BERT, which pre-trains deep bidirectional representations with a masked-language-model objective and set new state-of-the-art results across a wide range of NLP tasks.

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Open PDF
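
BERT's pre-training objective (the masked language model) hides about 15% of the input tokens and trains the model to predict them from context on both sides. A sketch of the masking scheme, using the 80/10/10 split from the paper (the replacement vocabulary here is a made-up toy):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15):
    """Of the ~15% selected tokens: 80% become [MASK], 10% a random
    token, 10% stay unchanged; the model must predict the originals."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                  # prediction target
            r = random.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(random.choice(["cat", "run", "blue"]))  # toy vocab
            else:
                masked.append(tok)              # kept as-is
        else:
            masked.append(tok)
            labels.append(None)                 # no loss at this position
    return masked, labels

print(mask_tokens("the quick brown fox jumps over the lazy dog".split()))
```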

Language Models are Few-Shot Learners

Introduces GPT-3, a 175-billion-parameter model, and demonstrates that sufficiently large language models can perform new tasks from just a few in-context examples, with no fine-tuning.

Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Open PDF
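
"Few-shot" here means conditioning on a handful of demonstrations placed in the prompt at inference time; the weights are never updated. A sketch of how such a prompt is assembled, using the English-to-French translation format from the paper (the word pairs are illustrative):

```python
demos = [("cheese", "fromage"), ("house", "maison"), ("dog", "chien")]

def few_shot_prompt(demos, query):
    """Build a GPT-3-style in-context prompt: a task description,
    k demonstrations, then the query for the model to complete."""
    lines = ["Translate English to French:"]
    lines += [f"{en} => {fr}" for en, fr in demos]
    lines.append(f"{query} =>")
    return "\n".join(lines)

print(few_shot_prompt(demos, "book"))
# The model is expected to continue with "livre" -- learned purely in context.
```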

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Proposes T5, a unified framework that casts every NLP task as text-to-text: the model takes text as input and produces text as output, with a task prefix selecting the behavior.

Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Open PDF
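
In the T5 framing, every task is reduced to mapping an input string to an output string; a short task prefix tells the model which behavior is wanted. The prefixes below match those used in the paper; the summarization strings are truncated illustrations:

```python
# Every task is string -> string; only the prefix distinguishes them.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews tuesday to survey the damage ...",
     "authorities dispatched crews to survey the damage ..."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
]
for source, target in examples:
    print(f"input:  {source}\ntarget: {target}\n")
```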

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Shows that BERT was significantly undertrained and presents an improved pretraining recipe (more data, longer training, larger batches, dynamic masking, no next-sentence prediction) that achieves state-of-the-art results.

Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Open PDF
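
One of RoBERTa's changes is dynamic masking: rather than fixing each sequence's mask once during preprocessing (so the model sees the identical masked sentence every epoch), a fresh mask is sampled each time the sequence is fed to the model. A small sketch of the difference:

```python
import random

def sample_mask(tokens, mask_prob=0.15):
    """Sample a fresh set of positions to mask."""
    return [i for i in range(len(tokens)) if random.random() < mask_prob]

sentence = "the quick brown fox jumps over the lazy dog".split()

static = sample_mask(sentence)  # BERT-style: fixed at preprocessing time
for epoch in range(3):
    # RoBERTa-style: a new mask per pass, so targets vary across epochs
    print(f"epoch {epoch}: static={static}, dynamic={sample_mask(sentence)}")
```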

LLaMA: Open and Efficient Foundation Language Models

Introduces LLaMA, a family of foundation models ranging from 7B to 65B parameters, trained exclusively on publicly available data; LLaMA-13B outperforms the much larger GPT-3 on most benchmarks.

Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

Open PDF
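
Part of LLaMA's efficiency comes from architectural choices borrowed from prior work: RMSNorm pre-normalization, SwiGLU activations, and rotary positional embeddings (RoPE). As one example, a minimal sketch of RMSNorm, which drops LayerNorm's mean-centering and normalizes by the root mean square alone:

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm: x / rms(x) * gain -- cheaper than LayerNorm because
    no mean statistics are computed."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return gain * x / rms

d_model = 8
x = np.random.default_rng(1).normal(size=(4, d_model))
y = rmsnorm(x, gain=np.ones(d_model))  # learned per-feature gain, here all ones
```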

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Presents BLOOM, an open-access 176-billion-parameter model trained on 46 natural languages and 13 programming languages.

Authors: BigScience Workshop

Open PDF

PaLM: Scaling Language Modeling with Pathways

Introduces PaLM, a 540-billion-parameter model trained with Google's Pathways system, with strong few-shot results and notable gains on reasoning tasks.

Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al.

Open PDF

Training Compute-Optimal Large Language Models

Shows that contemporary large models are significantly undertrained for their compute budgets: for compute-optimal training, model size and training tokens should be scaled in equal proportion. Introduces Chinchilla (70B parameters, 1.4T tokens), which outperforms the much larger Gopher.

Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

Open PDF
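
The headline result (the "Chinchilla" law) is that parameter count N and training tokens D should grow in roughly equal proportion, landing near D ≈ 20·N, where earlier practice grew N much faster than D. Using the common approximation C ≈ 6·N·D training FLOPs, a back-of-the-envelope allocator (the 20:1 ratio is a rule of thumb; the paper's fitted coefficients vary by method):

```python
import math

def chinchilla_allocation(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget C ~= 6*N*D under the constraint D = r*N:
    solving 6*N*(r*N) = C gives N = sqrt(C / (6*r))."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own budget (~5.8e23 FLOPs):
N, D = chinchilla_allocation(5.8e23)
print(f"params ~ {N/1e9:.0f}B, tokens ~ {D/1e12:.1f}T")
# -> about 70B parameters and 1.4T tokens, matching the paper's model.
```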

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Introduces FlashAttention, an exact attention algorithm that is IO-aware: it tiles the computation to minimize reads and writes between GPU high-bandwidth memory and on-chip SRAM, giving large speedups and memory savings without approximation.

Authors: Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

Open PDF
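
The core trick is to compute attention block by block in fast on-chip SRAM, never materializing the full seq × seq score matrix. That depends on an "online" softmax that folds in one block of scores at a time while carrying a running max and normalizer. A NumPy sketch of the rescaling for a single query row (illustrative only; the real kernel fuses this with tiled matrix multiplies):

```python
import numpy as np

def online_softmax_weighted_sum(score_blocks, value_blocks):
    """Compute softmax(scores) @ V one block at a time, keeping only a
    running max m, normalizer l, and unnormalized accumulator acc."""
    m, l, acc = -np.inf, 0.0, 0.0
    for s, v in zip(score_blocks, value_blocks):
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale old stats to the new max
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ v
        m = m_new
    return acc / l

# Check the blocked result against the monolithic computation:
rng = np.random.default_rng(2)
s, v = rng.normal(size=12), rng.normal(size=(12, 4))
blocked = online_softmax_weighted_sum(np.split(s, 3), np.split(v, 3))
w = np.exp(s - s.max())
full = (w / w.sum()) @ v
assert np.allclose(blocked, full)
```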

LoRA: Low-Rank Adaptation of Large Language Models

Presents LoRA, which freezes the pretrained weights and injects trainable low-rank matrices into each Transformer layer, reducing the number of trainable parameters for fine-tuning by orders of magnitude with little or no loss in quality.

Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Open PDF
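
LoRA freezes the pretrained weight W and learns only a low-rank update ΔW = B·A, with B of shape d×r, A of shape r×k, and r ≪ min(d, k); the adapted layer computes Wx + (α/r)·BAx. A sketch of the forward pass and the parameter arithmetic (shapes and α are illustrative; the initializations follow the paper):

```python
import numpy as np

d, k, r, alpha = 1024, 1024, 8, 16                  # r << d, k
rng = np.random.default_rng(3)

W = rng.normal(size=(d, k))         # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01  # trainable, Gaussian init (per the paper)
B = np.zeros((d, r))                # trainable, zero init, so the update starts at 0

def lora_forward(x):
    """h = W x + (alpha / r) * B (A x); only A and B receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))

h = lora_forward(np.ones(k))
full, adapter = d * k, r * (d + k)
print(f"trainable fraction: {adapter / full:.4%}")  # ~1.56% here; far smaller at larger d, k
```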

The Llama 3 Herd of Models

Introduces the Llama 3 family of foundation models, including a flagship 405B-parameter model, with improved multilingual, coding, reasoning, and tool-use capabilities.

Authors: Llama Team, AI @ Meta

Open PDF

Found this helpful?

Get in touch and let's see how I can help!