Target performance delta for verified improvements with a new prompt
Ethereum popularized smart contracts, which made decentralized applications (DApps) possible. A smart contract is deterministic code that runs on the blockchain: when a transaction calls it, it ALWAYS executes as written. Solidity is a Turing-complete language, but Ethereum’s gas limits prevent indefinite execution (e.g., infinite loops): every operation consumes gas, and execution halts once the gas runs out.
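To make the gas idea concrete, here is a minimal toy sketch of metered execution. It is not the EVM: `OutOfGas`, `run_loop`, and the flat `cost_per_step` are hypothetical names and numbers chosen for illustration. It only shows how charging for every step makes an otherwise infinite loop halt.

```python
class OutOfGas(Exception):
    """Raised when the gas budget cannot cover the next step (toy model)."""

    def __init__(self, steps: int):
        super().__init__(f"out of gas after {steps} steps")
        self.steps = steps


def run_loop(gas_limit: int, cost_per_step: int = 3) -> None:
    """Run a deliberately infinite loop under a gas budget.

    cost_per_step is an illustrative flat price, not a real opcode cost.
    The function never returns normally; it always ends by raising OutOfGas.
    """
    gas_left = gas_limit
    steps = 0
    while True:  # would never terminate without metering
        if gas_left < cost_per_step:
            raise OutOfGas(steps)
        gas_left -= cost_per_step  # pay for the step before doing it
        steps += 1
```

Calling `run_loop(10)` inside a `try`/`except OutOfGas` block shows the loop being cut off after a bounded number of steps, however the loop body is written.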
Tokens & Different kinds of tokens:
Direct evaluation of model bias on CEB (crafted dataset). Direct evaluation uses recognition (yes/no) and selection (choosing the more stereotypical or toxic of two options).
What kinds of architectural variations exist:
Pre vs Post-Norm: Determines where Layer Normalization sits relative to the residual connection within a transformer block. Pre-norm normalizes the input of each sublayer, inside the residual branch; post-norm normalizes after the residual addition. Most recent models, such as LLaMA and the GPT-2/GPT-3 family, use pre-norm, as it improves training stability, especially for deeper networks. Post-norm (as seen in the original Transformer) can suffer from unstable gradients.
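The difference between the two placements can be sketched in a few lines. This is a simplified illustration in pure Python: `sublayer` stands for either attention or the feedforward network, the learned gain and bias of Layer Normalization are left out, and all function names are assumptions made for this example.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-norm: x + sublayer(LN(x)). The residual path stays unnormalized."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def post_norm_block(x, sublayer):
    """Post-norm: LN(x + sublayer(x)). The residual sum itself is normalized."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])
```

In the pre-norm form the input `x` passes through the block unchanged on the residual path, which is one common explanation for why gradients stay better behaved in deep stacks.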
Layer-Norm Type:
Number of layers, token dimensionality, number of attention heads:
FFN Activation Function: Determines non-linearity within the feedforward networks inside transformer layers.
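As an illustration of what this choice changes, the sketch below defines three widely used activations (ReLU, GELU, and SiLU, which are examples chosen here rather than options named in these notes) and a toy feedforward layer that takes the activation as a parameter. The `ffn` helper and its weight layout are hypothetical, and biases are omitted.

```python
import math


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def ffn(x, w_in, w_out, activation):
    """Toy position-wise FFN: w_out @ activation(w_in @ x), biases omitted."""
    hidden = [activation(sum(w * v for w, v in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
```

Swapping `relu` for `gelu` or `silu` in a call to `ffn` changes only the non-linearity applied to the hidden units; the two matrix multiplications around it stay the same.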
Training vs Fine-tuning:
Alignment: Ensures the model adheres to human preferences and ethical considerations.
Reinforcement Learning from Human Feedback (RLHF):
[CLS] to represent input.
RLHF Summary
Training at Scale:
Distributed Training:
Pipeline Parallelism
LLM Scaling Law:
Scaling law in LLMs