How DPO Cuts Text Degeneration by 59% Without Retraining From Scratch
DharmaOCR's methodology shows that Direct Preference Optimization isn't just for chat alignment. Applied after supervised fine-tuning, DPO reduced text degeneration by an average of 59.4% across five vision-language model families, with no exceptions.
Dr. Sana Okafor