Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
Problem
The paper addresses how to adapt dispatch objective weights in three-sided marketplaces, specifically food delivery services such as DoorDash. Traditional reinforcement learning approaches often struggle...