Personalized web services strive to adapt their services
(advertisements, news articles, etc.) to individual users by making use
of both content and user information. Despite a few recent advances,
this problem remains challenging for at least two reasons. First, web
services feature dynamically changing pools of content, rendering
traditional collaborative filtering methods inapplicable. Second, the
scale of most web services of practical interest calls for solutions
that are fast in both learning and computation. In this work, we model
personalized recommendation of news articles
as a contextual bandit problem, a principled approach in which a
learning algorithm sequentially selects articles to serve users based
on contextual information about the users and articles, while
simultaneously adapting its article-selection strategy based on
user-click feedback to maximize total user clicks.
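
The select-observe-update loop described above can be illustrated with a minimal sketch. It is not the algorithm of this work: it assumes a simple epsilon-greedy rule with one linear click model per article, trained by stochastic gradient steps. The class name, the two-feature user context, and the click probabilities in the simulation are all hypothetical and chosen only for illustration. Models are created lazily, so articles may enter or leave the candidate pool between rounds.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical contextual bandit: one linear click model per article."""

    def __init__(self, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.n_features = n_features
        self.epsilon = epsilon      # probability of exploring a random article
        self.lr = lr                # step size for the gradient update
        self.rng = random.Random(seed)
        self.weights = {}           # article id -> weights, created lazily

    def _score(self, article, context):
        # Estimated click probability: dot product of weights and context.
        w = self.weights.setdefault(article, [0.0] * self.n_features)
        return sum(wi * xi for wi, xi in zip(w, context))

    def select(self, articles, context):
        # Explore with probability epsilon, otherwise serve the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(articles)
        return max(articles, key=lambda a: self._score(a, context))

    def update(self, article, context, click):
        # One squared-error gradient step on the served article's model only.
        error = click - self._score(article, context)
        w = self.weights[article]
        for i, xi in enumerate(context):
            w[i] += self.lr * error * xi


def simulate(rounds=5000, seed=1):
    """Run the policy against a made-up click model; return policy and CTR."""
    rng = random.Random(seed)
    # Assumed ground truth: each user type prefers a different article.
    click_prob = {"sports": [0.8, 0.1], "politics": [0.1, 0.8]}
    policy = EpsilonGreedyLinearBandit(n_features=2, seed=seed)
    clicks = 0
    for _ in range(rounds):
        context = rng.choice([[1.0, 0.0], [0.0, 1.0]])  # one-hot user type
        article = policy.select(list(click_prob), context)
        p = sum(pi * xi for pi, xi in zip(click_prob[article], context))
        click = 1 if rng.random() < p else 0            # simulated feedback
        policy.update(article, context, click)
        clicks += click
    return policy, clicks / rounds


if __name__ == "__main__":
    _, ctr = simulate()
    print(f"simulated click-through rate: {ctr:.3f}")
```

In this toy setting, a policy that ignores context would click through at about 0.45, so any rate well above that reflects the benefit of conditioning on the user. Epsilon-greedy is used here only because it is short; rules based on confidence bounds explore more efficiently.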