4:00 pm Wednesday, February 21, 2018
Random Structures Seminar : Regret of Queueing Bandits by Sanjay Shakkottai (UT Austin) in RLM 11.176
We consider a variant of the multiarmed bandit (MAB) problem where jobs or tasks queue for service, and service rates of different servers (agents) may be unknown. Such (queueing+learning) problems are motivated by a vast range of service systems, including supply and demand in online platforms (e.g., Uber, Lyft, Airbnb, Upwork, etc.), order flow in financial markets (e.g., limit order books), communication systems, and supply chains. We study algorithms that minimize queue-regret: the expected difference between the queue-lengths (backlogs) obtained by the algorithm, and those obtained by a genie-aided matching algorithm that knows exact service rates. A naive view of this problem would suggest that queue-regret could grow logarithmically: since queue-regret cannot be larger than classical regret, results for the standard MAB problem give algorithms that ensure queue-regret increases no more than logarithmically in time. Our work shows surprisingly more complex behavior — specifically, the optimal queue-regret decreases with time and scales as O(1/t). We next consider holding-cost regret in multi-class (multiple types of tasks) multi-server (servers/agents have task-type dependent service rate) systems. Holding costs correspond to a system where a linear cost (with respect to time spent in the queue) is incurred for each incomplete task. We consider learning-based variants of the c-mu rule – a classic and well-studied scheduling policy that is used when server/agent service rates are known. We develop algorithms that result in constant expected holding-cost regret (independent of time). The key insight that allows such a regret bound is that service systems we consider exhibit explore-free learning, where no penalty is (eventually) incurred for exploring and learning server/agent rates. We finally discuss the implications of our results on building platforms for matching tasks to servers/agents. Base on joint work with Subhashini Krishnasamy, Rajat Sen, Ari Arapostathis and Ramesh Johari.
Submitted by