A History of Large Model Development
by WZhang
published 2026-02-07
Abstract: Reading classic papers from the development of large models and multimodal models, translating and interpreting them to deepen my understanding.
Papers to be translated and interpreted
| Year | Method | Paper | Link |
|---|---|---|---|
| 2017 | Transformer | Attention is All You Need | https://arxiv.org/pdf/1706.03762 |
| 2018 | GPT-1 | Improving Language Understanding by Generative Pre-Training | https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |
| 2018 | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/pdf/1810.04805 |
| 2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf |
| 2020 | GPT-3 | Language Models are Few-Shot Learners | https://arxiv.org/pdf/2005.14165 |
| 2020 | ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/pdf/2010.11929 |
| 2020 | SimCLR | A Simple Framework for Contrastive Learning of Visual Representations | https://arxiv.org/pdf/2002.05709 |
| 2020 | DETR | End-to-End Object Detection with Transformers | https://arxiv.org/pdf/2005.12872 |
| 2021 | CLIP | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/pdf/2103.00020 |
| 2021 | ViLT | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | https://arxiv.org/pdf/2102.03334 |
| 2021 | GLIP | Grounded Language-Image Pre-training | https://arxiv.org/pdf/2112.03857 |
| 2021 | Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/pdf/2103.14030 |
| 2021 | MAE | Masked Autoencoders Are Scalable Vision Learners | https://arxiv.org/pdf/2111.06377 |
| 2019 | MoCo | Momentum Contrast for Unsupervised Visual Representation Learning | https://arxiv.org/pdf/1911.05722 |
| 2020 | MoCo v2 | Improved Baselines with Momentum Contrastive Learning | https://arxiv.org/pdf/2003.04297 |
| 2021 | MoCo v3 | An Empirical Study of Training Self-Supervised Vision Transformers | https://arxiv.org/pdf/2104.02057 |
| 2022 | BLIP | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | https://arxiv.org/pdf/2201.12086 |
| 2023 | BLIP 2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | https://arxiv.org/pdf/2301.12597 |
| 2023 | Llama | LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/pdf/2302.13971 |
| 2023 | Llama 2 | Llama 2: Open Foundation and Fine-Tuned Chat Models | https://arxiv.org/pdf/2307.09288 |
| 2024 | Llama 3 | The Llama 3 Herd of Models | https://arxiv.org/pdf/2407.21783 |
| 2024 | Sora | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | https://arxiv.org/pdf/2402.17177 |
| 2023 | GPT-4 | GPT-4 Technical Report | https://arxiv.org/pdf/2303.08774 |
| 2025 | Qwen 3 | Qwen3 Technical Report | https://arxiv.org/pdf/2505.09388 |
| 2025 | DeepSeek-R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | https://arxiv.org/pdf/2501.12948 |