Offline evaluation of recommender systems: all pain and no gain?
M. Levy. RepSys '13, 2013-10-12. DOI: 10.1145/2532508.2532509
A large-scale offline evaluation -- with a big money prize attached -- established recommender systems as a niche discipline worth researching, and one where robust and reproducible experiments would be easy. But since then critiques within academia have shown up shortcomings in the most appealingly objective evaluation metrics, war stories from the commercial front line have suggested that correlation between offline metrics and bottom-line gains in production may be non-existent, and several subsequent academic competitions have come under fierce criticism from both advisors and participants.
In this talk I will draw on practical experience at Last.fm and Mendeley, as well as insights from others, to offer some opinions about offline evaluation of recommender systems: whether we still need it at all, what value we can hope to draw from it, how best to do it if we have to, and how to make the experience less painful than it is right now.