
Llama 2: Explaining Meta's new LLM, Llama 2

by Narnia
Llama image created using Stable Diffusion XL

This article is the second part of our discussion of the Llama 2 paper, focusing specifically on RLHF techniques and learnings. If you haven’t checked out Part 1, you can find it here.

The second stage of fine-tuning the base model for instruction following is RLHF (Reinforcement Learning from Human Feedback).

RLHF has proven to be a very important technique for improving a language model’s ability to solve tasks specified in instructions and for aligning it with human values such as safety and helpfulness. It was first introduced by OpenAI when they aligned their base GPT-3 model to follow instructions.

Here’s an overview of what this procedure looks like:

  1. Collecting human preference data, which consists of comparisons between answers to given prompts
  2. Training a reward model on the collected human preference data to learn to score the helpfulness and safety of an answer (see the sketch after this list)
  3. Using RL algorithms to train the language model at scale on unlabelled data (a dataset of prompts), with the reward model serving as the reward function
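
To make step 2 a bit more concrete, here is a minimal sketch of the pairwise ranking loss commonly used to train a reward model from preference comparisons: the model should assign a higher score to the answer the annotator preferred. The function name and the optional margin term are illustrative assumptions, not the paper’s exact implementation.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def reward_ranking_loss(
    chosen_scores: torch.Tensor,
    rejected_scores: torch.Tensor,
    margin: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    """Pairwise ranking loss for a reward model.

    chosen_scores / rejected_scores: scalar rewards the model assigns to the
    preferred and the rejected answer for the same prompt, shape (batch,).
    margin: optional per-pair margin (illustrative; encourages a larger score
    gap for pairs where annotators expressed a stronger preference).
    """
    diff = chosen_scores - rejected_scores
    if margin is not None:
        diff = diff - margin
    # -log(sigmoid(diff)) pushes the chosen answer's score above the rejected one.
    return -F.logsigmoid(diff).mean()


# Toy usage: scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.7, 0.9, 1.5])
loss = reward_ranking_loss(chosen, rejected)
```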

Human preference data is a dataset that contains comparisons between language model answers to a given prompt.

To collect the dataset, annotators were asked to write a prompt and then choose the best answer out of two sampled model responses.

To maximize diversity, the two responses are generated by two different model variants.

Annotators are asked to focus on two main criteria when choosing between model answers: safety and helpfulness. They are also asked to collect a safety label, which will be used later in the training step.
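
As a rough illustration of what a single annotation might look like under this protocol, here is a sketch of a preference record containing the prompt, the two sampled responses, the annotator’s choice, and the safety label. The field names and values are assumptions for illustration, not Meta’s actual schema.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class PreferencePair:
    """One human preference annotation (illustrative schema)."""
    prompt: str                 # written by the annotator
    response_a: str             # sampled from one model variant
    response_b: str             # sampled from a different model variant
    chosen: Literal["a", "b"]   # which answer the annotator preferred
    is_safe: bool               # the additional safety label mentioned above


example = PreferencePair(
    prompt="How do I improve my sleep schedule?",
    response_a="Try keeping a consistent bedtime and avoiding screens before sleep.",
    response_b="Just sleep whenever you feel like it.",
    chosen="a",
    is_safe=True,
)
```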

There was also a focus on introducing adversarial (unsafe) prompts into the dataset, using a set of guidelines that ensure diversity. Mainly, annotators are asked to create these prompts along two dimensions:

Risk Category: this dimension includes three categories: criminal activities, hateful and harmful activities, and unqualified advice

Attack Vector: this dimension represents in which way a…
