大模型发展史 (History of Large Models)
by WZhang, published 2026-02-07

Abstract: Reading classic papers from the development of large language models and multimodal models, translating and interpreting them to deepen understanding.


Papers to translate and interpret

| Year | Method | Paper | Link |
|------|--------|-------|------|
| 2017 | Transformer | Attention Is All You Need | https://arxiv.org/pdf/1706.03762 |
| 2018 | GPT-1 | Improving Language Understanding by Generative Pre-Training | https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |
| 2018 | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/pdf/1810.04805 |
| 2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf |
| 2019 | MoCo | Momentum Contrast for Unsupervised Visual Representation Learning | https://arxiv.org/pdf/1911.05722 |
| 2020 | GPT-3 | Language Models are Few-Shot Learners | https://arxiv.org/pdf/2005.14165 |
| 2020 | ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/pdf/2010.11929 |
| 2020 | SimCLR | A Simple Framework for Contrastive Learning of Visual Representations | https://arxiv.org/pdf/2002.05709 |
| 2020 | DETR | End-to-End Object Detection with Transformers | https://arxiv.org/pdf/2005.12872 |
| 2020 | MoCo v2 | Improved Baselines with Momentum Contrastive Learning | https://arxiv.org/pdf/2003.04297 |
| 2021 | CLIP | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/pdf/2103.00020 |
| 2021 | ViLT | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | https://arxiv.org/pdf/2102.03334 |
| 2021 | GLIP | Grounded Language-Image Pre-training | https://arxiv.org/pdf/2112.03857 |
| 2021 | Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/pdf/2103.14030 |
| 2021 | MAE | Masked Autoencoders Are Scalable Vision Learners | https://arxiv.org/pdf/2111.06377 |
| 2021 | MoCo v3 | An Empirical Study of Training Self-Supervised Vision Transformers | https://arxiv.org/pdf/2104.02057 |
| 2022 | BLIP | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | https://arxiv.org/pdf/2201.12086 |
| 2023 | BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | https://arxiv.org/pdf/2301.12597 |
| 2023 | LLaMA | LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/pdf/2302.13971 |
| 2023 | Llama 2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | https://arxiv.org/pdf/2307.09288 |
| 2023 | GPT-4 | GPT-4 Technical Report | https://arxiv.org/pdf/2303.08774 |
| 2024 | Llama 3 | The Llama 3 Herd of Models | https://arxiv.org/pdf/2407.21783 |
| 2024 | Sora | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | https://arxiv.org/pdf/2402.17177 |
| 2025 | Qwen3 | Qwen3 Technical Report | https://arxiv.org/pdf/2505.09388 |
| 2025 | DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | https://arxiv.org/pdf/2501.12948 |