Kog

1 primary article · 0 secondary mentions

Primary coverage

Notable research · Kog

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

Problem: This preprint addresses the gap in real-time inference for large language models (LLMs) on standard GPUs, focusing on single-request decoding speed. Existing inference software stacks are not...

May 29, 2026