A collection of arXiv papers on the Gemma model family
This post is also published on Medium and LinkedIn.
Gemma is not just a model; it is a family of models and tools built from the same research and technology used to create Google's Gemini models.
Gemma was released just 10 months ago, and the number of innovations Google has released around it since then is impressive.
Here is a summary of the arXiv papers describing the main innovations Google has introduced in Gemma over the last 10 months:
- Gemma 1: the first model family, announced in February 2024: a decoder-only transformer with 18/28 layers (for the 2B/7B models respectively), GeGLU activations and a 256k-token vocabulary. Text-only, English-only, and with an 8K-token context window.
- RecurrentGemma: a variant of Gemma 1 built on the innovative Griffin architecture, which essentially replaces the transformer's attention layers with new linear recurrent layers, with the objective of reducing the quadratic complexity of standard attention.
- Griffin: a detailed description of the Griffin architecture, which is the basis of the RecurrentGemma model. Griffin mixes gated linear recurrences with local attention blocks.
- CodeGemma 1: a variant of Gemma 1 with additional fine-tuning for coding tasks.
- PaliGemma 1: a 3B Vision-Language Model (VLM) that generates text from images, combining a SigLIP vision encoder with the Gemma 1-2B language model.
- SigLIP: one of the key technologies behind PaliGemma. Unlike standard contrastive learning with softmax normalization, this technique applies a sigmoid loss to image-text pairs independently and does not require a global view of the pairwise similarities for normalization.
- Gemma 2: a new model family announced in May 2024 (just four months after Gemma 1), in three sizes: 2B, 9B and 27B. The main architectural differences include interleaved local-global attention layers and grouped-query attention. Notably, the smaller 2B and 9B models are trained with knowledge distillation.
- ShieldGemma: content moderation models built on Gemma 2.
- Gemma Scope: interpretability is a challenge in LLMs due to their architecture and computational requirements. Gemma Scope is a collection of hundreds of open sparse autoencoders (SAEs) that can help understand the inner workings of Gemma 2-9B and Gemma 2-2B (currently only those two models).
- DataGemma: fine-tuned Gemma 2 models grounded in the real-world data of Data Commons.
- PaliGemma 2: a new model announced in December 2024, upgraded to the Gemma 2 language models (2B, 9B and 27B) while keeping the same SigLIP encoder used in PaliGemma 1. The main improvements include long, detailed captioning, as well as advanced vision use cases such as music score recognition and chest X-ray report generation.
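The pairwise sigmoid loss behind SigLIP, mentioned above, can be sketched in a few lines. This is a minimal pure-Python illustration under simplifying assumptions (in the actual method, the temperature and bias are learnable parameters and the loss is computed over large batches on accelerators), not the reference implementation:

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def siglip_loss(logits, temperature=1.0, bias=0.0):
    """Pairwise sigmoid loss over an n x n image-text similarity matrix.

    logits[i][j] is the similarity between image i and text j.
    Matching pairs (the diagonal) get label +1, all others -1.
    Each pair is scored independently, so no softmax normalization
    over the whole batch is needed (the key difference from a
    standard contrastive loss). Temperature and bias are fixed
    scalars here for simplicity; SigLIP learns them.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * (temperature * logits[i][j] + bias))
    return total / n
```

With well-aligned embeddings (high similarity on the diagonal, low off it), the loss is close to zero; misaligned pairs drive it up. Because each term depends only on its own pair, the loss parallelizes trivially across devices without gathering the full similarity matrix.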
Non-Google researchers have also contributed papers to the Gemma ecosystem:
- GemmAr: describes InstAr-500k, a new Arabic instruction dataset, and the subsequent fine-tuning of Gemma 1-7B on several downstream tasks.
- ColPali: a PaliGemma 1-3B extension that adds document understanding and retrieval capabilities, operating directly on images of documents.
Deploying open models has never been easier with a cloud platform like Vertex AI, which supports deployment, fine-tuning, evaluation and many other tasks for Gemma.
- For Gemma 1, you can use the free (and no sign-in required) demo playground in Vertex AI Model Garden here.
- For Gemma 2, you can use the free demo playground in Vertex AI Model Garden here.
You can also find pre-trained models and code on Hugging Face and Kaggle.
If you have read or published a paper on Gemma that is not included above, I would appreciate it if you could share it in the comments.
Blog posts in chronological order
[1] Blog post Gemma 1 announcement
[2] Blog post New Gemma 1 variants
[3] Blog post PaliGemma 1, Gemma 2, and an Upgraded Responsible AI Toolkit
[4] Blog post Gemma 1 explained: Gemma models family architectures
[5] Blog post Gemma 1 explained: RecurrentGemma Architectures
[6] Blog post Gemma 1 explained: PaliGemma Architectures
[7] Blog post Gemma 1 explained: What’s new in Gemma 2
[8] Blog post Advancing Multilingual AI with Gemma 2 and a $150K Challenge
[9] Blog post ShieldGemma and Gemma Scope
[10] Blog post DataGemma
[11] Blog post PaliGemma 2