Research Roadmap: RLHF & Alignment
From InstructGPT to DPO to ORPO. Read the 7 most important alignment papers in order, and understand how LLMs are made to follow human intent.