ByteDance study finds that asking LMMs questions beats making them transcribe text for long-document training
Problem: This preprint addresses the limitations of traditional long-document training methods for large multimodal models (LMMs), particularly the ineffectiveness of transcription-based approaches on lengthy, image-heavy documents. Existing literature...