Lil'Log (Lilian Weng)

51 articles archived

Lil'Log (Lilian Weng) research 12mo ago

Why We Think

Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant…

Lil'Log (Lilian Weng) research 17mo ago

Reward Hacking in Reinforcement Learning

Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task. Reward hacking exists because RL environments are often imperfect, and it is…

Lil'Log (Lilian Weng) research 22mo ago

Extrinsic Hallucinations in LLMs

Hallucination in large language models usually refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. As a term, hallucination has been somewhat generalized to cases when the model makes mistakes. Here, I would like to narrow down the…

Lil'Log (Lilian Weng) research 25mo ago

Diffusion Models for Video Generation

Diffusion models have demonstrated strong results on image synthesis in recent years. Now the research community has started working on a harder task: using them for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it…

Lil'Log (Lilian Weng) research 27mo ago

Thinking about High-Quality Human Data

[Special thank you to Ian Kivlichan for many useful pointers (e.g. the 100+ year old Nature paper “Vox populi”) and nice feedback. 🙏] High-quality data is the fuel for modern deep learning model training. Most of the task-specific labeled data comes from human…

Lil'Log (Lilian Weng) research 31mo ago

Adversarial Attacks on LLMs

The use of large language models in the real world has been strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF).…

Lil'Log (Lilian Weng) research 35mo ago

LLM Powered Autonomous Agents

Building agents with an LLM (large language model) as their core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories,…

Lil'Log (Lilian Weng) research 38mo ago

Prompt Engineering

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward desired outcomes without updating the model weights. It is an empirical science and the effect of prompt engineering methods can vary a lot among…

Lil'Log (Lilian Weng) research 40mo ago

The Transformer Family Version 2.0

Many new Transformer architecture improvements have been proposed since my last post on “The Transformer Family” about three years ago. Here I did a big refactoring and enrichment of that 2020 post: restructuring the hierarchy of sections and improving many…

Lil'Log (Lilian Weng) research 40mo ago

Large Transformer Model Inference Optimization

[Updated on 2023-01-24: add a small section on Distillation.] Large transformer models are mainstream nowadays, producing SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a…

Lil'Log (Lilian Weng) research 44mo ago

Some Math behind Neural Tangent Kernel

Neural networks are well known to be over-parameterized and can often easily fit data with near-zero training loss while still generalizing decently on the test dataset. Although all these parameters are initialized at random, the optimization process can consistently lead to…

Lil'Log (Lilian Weng) research 47mo ago

Generalized Visual Language Models

Processing images to generate text, such as image captioning and visual question-answering, has been studied for years. Traditionally such systems rely on an object detection network as a vision encoder to capture visual features and then produce text via a text decoder. Given a…

Lil'Log (Lilian Weng) research 49mo ago

Learning with not Enough Data Part 3: Data Generation

Here comes Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let’s consider two approaches for generating synthetic data for training. Augmented data. Given a set of existing training samples, we can apply a variety of augmentation, distortion and…

Lil'Log (Lilian Weng) research 51mo ago

Learning with not Enough Data Part 2: Active Learning

This is part 2 of what to do when facing a limited amount of labeled data for supervised learning tasks. This time we will get some amount of human labeling work involved, but within a budget limit, and therefore we need to be smart when selecting which samples to label.

Lil'Log (Lilian Weng) research 54mo ago

Learning with not Enough Data Part 1: Semi-Supervised Learning

When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.

Lil'Log (Lilian Weng) research 56mo ago

How to Train Really Large Models on Many GPUs?

[Updated on 2022-03-13: add expert choice routing.] [Updated on 2022-06-10: Greg and I wrote a shortened and upgraded version of this post, published on OpenAI Blog: “Techniques for Training Large Neural Networks”]

Lil'Log (Lilian Weng) research 58mo ago

What are Diffusion Models?

[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references).] [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen.] [Updated on 2022-08-31: Added latent…

Lil'Log (Lilian Weng) research 60mo ago

Contrastive Representation Learning

The goal of contrastive representation learning is to learn an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised settings. When working with…

Lil'Log (Lilian Weng) research 62mo ago

Reducing Toxicity in Language Models

Large pretrained language models are trained over a sizable collection of online data. They unavoidably acquire certain toxic behavior and biases from the Internet. Pretrained language models are very powerful and have shown great success in many NLP tasks. However, to safely…

Lil'Log (Lilian Weng) research 65mo ago

Controllable Neural Text Generation

[Updated on 2021-02-01: Updated to version 2.0 with several works added and many typos fixed.] [Updated on 2021-05-26: Add P-tuning and Prompt Tuning in the “prompt design” section.] [Updated on 2021-09-19: Add “unlikelihood training”.]

Lil'Log (Lilian Weng) research 67mo ago

How to Build an Open-Domain Question Answering System?

[Updated on 2020-11-12: add an example on closed-book factual QA using OpenAI API (beta).] A model that can answer any question with regard to factual knowledge can lead to many useful and practical applications, such as working as a chatbot or an AI assistant🤖. In this post, we…

Lil'Log (Lilian Weng) research 70mo ago

Neural Architecture Search

Although most popular and successful model architectures are designed by human experts, it doesn’t mean we have explored the entire network architecture space and settled on the best option. We would have a better chance of finding the optimal solution if we adopt a…

Lil'Log (Lilian Weng) research 72mo ago

Exploration Strategies in Deep Reinforcement Learning

[Updated on 2020-06-17: Add “exploration via disagreement” in the “Forward Dynamics” section.] Exploitation versus exploration is a critical topic in Reinforcement Learning. We’d like the RL agent to find the best solution as fast as possible.…

Lil'Log (Lilian Weng) research 74mo ago

The Transformer Family

[Updated on 2023-01-27: After almost three years, I did a big refactoring update of this post to incorporate a bunch of new Transformer models since 2020. The enhanced version of this post is here: The Transformer Family Version 2.0. Please refer to that post on this topic.]

Lil'Log (Lilian Weng) research 76mo ago

Curriculum for Reinforcement Learning

[Updated on 2020-02-03: mentioning PCG in the “Task-Specific Curriculum” section.] [Updated on 2020-02-04: Add a new “curriculum through distillation” section.]

Lil'Log (Lilian Weng) research 79mo ago

Self-Supervised Representation Learning

[Updated on 2020-01-09: add a new section on Contrastive Predictive Coding.] [Updated on 2020-04-13: add a “Momentum Contrast” section on MoCo, SimCLR and CURL.] [Updated on 2020-07-08: add a “Bisimulation” section on DeepMDP and DBC.] [Updated on…

Lil'Log (Lilian Weng) research 81mo ago

Evolution Strategies

Stochastic gradient descent is a universal choice for optimizing deep learning models. However, it is not the only option. With black-box optimization algorithms, you can evaluate a target function $f(x): \mathbb{R}^n \to \mathbb{R}$, even when you don’t know the precise…

Lil'Log (Lilian Weng) research 83mo ago

Meta Reinforcement Learning

In my earlier post on meta-learning, the problem is mainly defined in the context of few-shot classification. Here I would like to explore further the cases when we try to “meta-learn” Reinforcement Learning (RL) tasks by developing an agent that can solve unseen…

Lil'Log (Lilian Weng) research 85mo ago

Domain Randomization for Sim2Real Transfer

In robotics, one of the hardest problems is how to make your model transfer to the real world. Due to the sample inefficiency of deep RL algorithms and the cost of data collection on real robots, we often need to train models in a simulator which theoretically provides an…

Lil'Log (Lilian Weng) research 87mo ago

Are Deep Neural Networks Dramatically Overfitted?

[Updated on 2019-05-27: add the section on Lottery Ticket Hypothesis.] If you are like me, entering the field of deep learning with experience in traditional machine learning, you may often ponder over this question: Since a typical deep neural network has so many…

Lil'Log (Lilian Weng) research 88mo ago

Generalized Language Models

[Updated on 2019-02-14: add ULMFiT and GPT-2.] [Updated on 2020-02-29: add ALBERT.] [Updated on 2020-10-25: add RoBERTa.] [Updated on 2020-12-13: add T5.] [Updated on 2020-12-30: add GPT-3.] [Updated on 2021-11-13: add XLNet, BART and ELECTRA; Also updated the Summary…

Lil'Log (Lilian Weng) research 89mo ago

Object Detection Part 4: Fast Detection Models

In Part 3, we reviewed models in the R-CNN family. All of them are region-based object detection algorithms. They can achieve high accuracy but could be too slow for certain applications such as autonomous driving. In Part 4, we only focus on fast object detection models,…

Lil'Log (Lilian Weng) research 90mo ago

Meta-Learning: Learning to Learn Fast

[Updated on 2019-10-01: thanks to Tianhao, we have this post translated into Chinese!]

Lil'Log (Lilian Weng) research 92mo ago

Flow-based Deep Generative Models

So far, I’ve written about two types of generative models, GAN and VAE. Neither of them explicitly learns the probability density function of real data, $p(\mathbf{x})$ (where $\mathbf{x} \in \mathcal{D}$), because it is really hard! Taking the generative model…

Lil'Log (Lilian Weng) research 94mo ago

From Autoencoder to Beta-VAE

[Updated on 2019-07-18: add a section on VQ-VAE & VQ-VAE-2.] [Updated on 2019-07-26: add a section on TD-VAE.] Autoencoder was invented to reconstruct high-dimensional data using a neural network model with a narrow bottleneck layer in the middle (oops, this is probably not true…

Lil'Log (Lilian Weng) research 96mo ago

Attention? Attention!

[Updated on 2018-10-28: Add Pointer Network and the link to my implementation of Transformer.] [Updated on 2018-11-06: Add a link to the implementation of Transformer model.] [Updated on 2018-11-18: Add Neural Turing Machines.] [Updated on 2019-07-18: Correct the mistake on…

Lil'Log (Lilian Weng) research 97mo ago

Implementing Deep Reinforcement Learning Models with Tensorflow + OpenAI Gym

The full implementation is available in lilianweng/deep-reinforcement-learning-gym. In the previous two posts, I have introduced the algorithms of many deep reinforcement learning models. Now it is time to get our hands dirty and practice how to implement the models in the…

Lil'Log (Lilian Weng) research 98mo ago

Policy Gradient Algorithms

[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.] [Updated on 2018-09-30: add a new policy gradient method, TD3.] [Updated on 2019-02-09: add SAC with automatically adjusted temperature.] [Updated on 2019-06-26: Thanks to Chanseok, we have a version…

Lil'Log (Lilian Weng) research 100mo ago

A (Long) Peek into Reinforcement Learning

[Updated on 2020-09-03: Updated the algorithm of SARSA and Q-learning so that the difference is more pronounced.] [Updated on 2021-09-19: Thanks to 爱吃猫的鱼, we have this post in Chinese.]

Lil'Log (Lilian Weng) research 101mo ago

The Multi-Armed Bandit Problem and Its Solutions

The algorithms are implemented for Bernoulli bandit in lilianweng/multi-armed-bandit. Exploitation vs Exploration: The exploration vs exploitation dilemma exists in many aspects of our life. Say, your favorite restaurant is right around the corner. If you go there every day, you…

Lil'Log (Lilian Weng) research 101mo ago

Object Detection for Dummies Part 3: R-CNN Family

[Updated on 2018-12-20: Remove YOLO here. Part 4 will cover multiple fast object detection algorithms, including YOLO.] [Updated on 2018-12-27: Add bbox regression and tricks sections for R-CNN.] In the series of “Object Detection for Dummies”, we started with basic…

Lil'Log (Lilian Weng) research 102mo ago

Object Detection for Dummies Part 2: CNN, DPM and Overfeat

Part 1 of the “Object Detection for Dummies” series introduced: (1) the concept of image gradient vector and how HOG algorithm summarizes the information across all the gradient vectors in one image; (2) how the image segmentation algorithm works to detect regions…

Lil'Log (Lilian Weng) research 103mo ago

Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS

I’ve never worked in the field of computer vision and have no idea how the magic could work when an autonomous car is configured to tell apart a stop sign from a pedestrian in a red hat. To motivate myself to look into the maths behind object recognition and detection…

Lil'Log (Lilian Weng) research 104mo ago

Learning Word Embedding

Human vocabulary comes in free text. In order to make a machine learning model understand and process the natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is to do a one-hot encoding in which each…

Lil'Log (Lilian Weng) research 104mo ago

Anatomize Deep Learning with Information Theory

Professor Naftali Tishby passed away in 2021. Hope the post can introduce his cool idea of information bottleneck to more people. Recently I watched the talk “Information Theory in Deep Learning” by Prof Naftali Tishby and found it very interesting. He presented how…

Lil'Log (Lilian Weng) research 106mo ago

From GAN to WGAN

[Updated on 2018-09-30: thanks to Yoonju, we have this post translated into Korean!] [Updated on 2019-04-18: this post is also available on arXiv.] Generative adversarial network (GAN) has shown great results in many generative tasks to replicate the real-world rich content such…

Lil'Log (Lilian Weng) research 106mo ago

How to Explain the Prediction of a Machine Learning Model?

Machine learning models have started penetrating critical areas like health care, justice systems, and the financial industry. Thus to figure out how the models make decisions and make sure the decision-making process is aligned with the ethical requirements or legal…

Lil'Log (Lilian Weng) research 107mo ago

Predict Stock Prices Using RNN: Part 2

In the Part 2 tutorial, I would like to continue the topic on stock price prediction and to endow the recurrent neural network that I have built in Part 1 with the capability of responding to multiple stocks. In order to distinguish the patterns associated with different price…

Lil'Log (Lilian Weng) research 107mo ago

Predict Stock Prices Using RNN: Part 1

This is a tutorial on how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn. If you don’t know what a recurrent neural network or LSTM cell is, feel free to check my…

Lil'Log (Lilian Weng) research 108mo ago

An Overview of Deep Learning for Curious People

(The post originated from my talk for WiMLDS x Fintech meetup hosted by Affirm.) I believe many of you have watched or heard of the games between AlphaGo and professional Go player Lee Sedol in 2016. Lee has the highest rank of nine dan and many world championships. No…

Lil'Log (Lilian Weng) research 308mo ago

FAQ

Q: How can I get an update when a new post comes out? A: I announce new posts on my Twitter account @lilianweng. There is also an RSS feed. Q: What tool do you use for plotting? A: I'm using Google Presentation (cloud version of PowerPoint). Q: What if I see something…

