SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065030
F. Hieber, S. Riezler
{"title":"Improved answer ranking in social question-answering portals","authors":"F. Hieber, S. Riezler","doi":"10.1145/2065023.2065030","DOIUrl":"https://doi.org/10.1145/2065023.2065030","url":null,"abstract":"Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality content challenging but all the more important. We present an approach to answer ranking and show the usefulness of features that explicitly model answer quality. Furthermore, we introduce the idea of leveraging snippets of web search results for query expansion in answer ranking. We present an evaluation setup that avoids spurious results reported in earlier work. Our results show the usefulness of our features and query expansion techniques, and point to the importance of regularization when learning from noisy data.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117123297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065033
M. Atzmüller
{"title":"Analysis of communities in social media","authors":"M. Atzmüller","doi":"10.1145/2065023.2065033","DOIUrl":"https://doi.org/10.1145/2065023.2065033","url":null,"abstract":"Social media have already woven themselves into the very fabric of everyday life. There are a variety of applications and associated computational social systems. Furthermore, we observe the emergence of more mobile and ubiquitous applications. Various social applications provide for a broad range of user interaction and communication. In this setting, data mining and analysis play a central role, e.g., for automatically detecting associations and relationships, and identifying interesting topics. In particular, in this talk I will consider the discovery and analysis of communities, e.g., concerning users and user-generated content. Such communities can be applied, for example, for personalization or generating recommendations. However, while there exists a range of community mining options, a thorough evaluation and assessment typically relies on existing gold-standard data or costly user studies. This talk presents approaches for the analysis of communities and descriptive patterns in social media. Methods for mining and assessing communities and descriptive patterns will be introduced. The proposed analysis methodology provides for a cost-efficient approach for identifying descriptive and user-interpretable communities, since the assessment is performed using secondary data that is easy to acquire. In this talk, I will provide examples for the presented analysis techniques using social data from real-world systems. In particular, I will focus on data from the social bookmarking system BibSonomy (http://www.bibsonomy.org), and from the social conference guidance system Conferator (http://www.conferator.org).","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128570702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065031
Enrique Vallés, Paolo Rosso
{"title":"Detection of near-duplicate user generated contents: the SMS spam collection","authors":"Enrique Vallés, Paolo Rosso","doi":"10.1145/2065023.2065031","DOIUrl":"https://doi.org/10.1145/2065023.2065031","url":null,"abstract":"Today, the number of spam text messages has grown, mainly because companies are looking for free advertising. For users, it is very important to filter these kinds of spam messages, which can be viewed as near-duplicate texts because they are mostly created from templates. The identification of spam text messages is a very hard and time-consuming task that involves carefully scanning hundreds of text messages. Therefore, since the task of near-duplicate detection can be seen as a specific case of plagiarism detection, we investigated whether plagiarism detection tools could be used as filters for spam text messages. Moreover, we solve the near-duplicate detection problem on the basis of a clustering approach using the CLUTO framework. We carried out some preliminary experiments on the SMS Spam Collection, which was recently made available for research purposes. The results were compared with the ones obtained with CLUTO. Although plagiarism detection tools detect a good number of near-duplicate SMS spam messages, even better results are obtained with the CLUTO clustering tool.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123685164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065035
Claudia Peersman, Walter Daelemans, L. V. Vaerenbergh
{"title":"Predicting age and gender in online social networks","authors":"Claudia Peersman, Walter Daelemans, L. V. Vaerenbergh","doi":"10.1145/2065023.2065035","DOIUrl":"https://doi.org/10.1145/2065023.2065035","url":null,"abstract":"A common characteristic of communication on online social networks is that it happens via short messages, often using non-standard language variations. These characteristics make this type of text a challenging text genre for natural language processing. Moreover, in these digital communities it is easy to provide a false name, age, gender and location in order to hide one's true identity, providing criminals such as pedophiles with new possibilities to groom their victims. It would therefore be useful if user profiles can be checked on the basis of text analysis, and false profiles flagged for monitoring. This paper presents an exploratory study in which we apply a text categorization approach for the prediction of age and gender on a corpus of chat texts, which we collected from the Belgian social networking site Netlog. We examine which types of features are most informative for a reliable prediction of age and gender on this difficult text type and perform experiments with different data set sizes in order to acquire more insight into the minimum data size requirements for this task.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133148632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065039
Sheila Kinsella, Vanessa Murdock, Neil O'Hare
{"title":"\"I'm eating a sandwich in Glasgow\": modeling locations with tweets","authors":"Sheila Kinsella, Vanessa Murdock, Neil O'Hare","doi":"10.1145/2065023.2065039","DOIUrl":"https://doi.org/10.1145/2065023.2065039","url":null,"abstract":"Social media such as Twitter generate large quantities of data about what a person is thinking and doing in a particular location. We leverage this data to build models of locations to improve our understanding of a user's geographic context. Understanding the user's geographic context can in turn enable a variety of services that allow us to present information, recommend businesses and services, and place advertisements that are relevant at a hyper-local level. In this paper we create language models of locations using coordinates extracted from geotagged Twitter data. We model locations at varying levels of granularity, from the zip code to the country level. We measure the accuracy of these models by the degree to which we can predict the location of an individual tweet, and further by the accuracy with which we can predict the location of a user. We find that we can meet the performance of the industry standard tool for predicting both the tweet and the user at the country, state and city levels, and far exceed its performance at the hyper-local level, achieving a three- to ten-fold increase in accuracy at the zip code level.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127023942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065042
Giacomo Inches, A. Basso, F. Crestani
{"title":"On the generation of rich content metadata from social media","authors":"Giacomo Inches, A. Basso, F. Crestani","doi":"10.1145/2065023.2065042","DOIUrl":"https://doi.org/10.1145/2065023.2065042","url":null,"abstract":"This contribution proposes a framework to generate auxiliary rich TV content metadata by processing social network data. Based on simple criteria to identify authoritative social media sources, we have analysed Twitter short messages relative to TV program content and devised a method to compute their informative value. We have extracted dozens of features and characterized such social data in terms of quality and relevancy. This is a first step towards integrating relevant social media information to enhance the description of TV content as well as for generating recommendations based on social data.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121390775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065040
D. Correa, A. Sureka
{"title":"Mining tweets for tag recommendation on social media","authors":"D. Correa, A. Sureka","doi":"10.1145/2065023.2065040","DOIUrl":"https://doi.org/10.1145/2065023.2065040","url":null,"abstract":"Automatic tag recommendation or annotation can help in improving the efficiency of text-based information retrieval on online social media services like Blogger, Last.FM, Flickr and YouTube. In this work, we investigate alternate solutions for tag recommendations by employing a Wisdom of Crowd approach in a mashup framework. In particular, we mine tweets on Twitter and use their hashtag(s) and content to annotate videos on Flickr, Photobucket, YouTube, Dailymotion and SoundCloud. We crawl Twitter to collect a random sample of tweets containing Flickr, Photobucket, YouTube, Dailymotion and SoundCloud URLs. We then recommend tags for these services using hashtag(s) and content present in tweets. We use a hybrid technique (automated and manual) to validate our results on different subsets (presence / absence of hashtags, presence / absence of media tags) of data. Experimental results demonstrate that the proposed solution approach is effective and reliable.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125885980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065036
Guangyu Wu, Martin Harrigan, P. Cunningham
{"title":"Characterizing Wikipedia pages using edit network motif profiles","authors":"Guangyu Wu, Martin Harrigan, P. Cunningham","doi":"10.1145/2065023.2065036","DOIUrl":"https://doi.org/10.1145/2065023.2065036","url":null,"abstract":"Good Wikipedia articles are authoritative sources due to the collaboration of a number of knowledgeable contributors. This is the many eyes idea. The edit network associated with a Wikipedia article can tell us something about its quality or authoritativeness. In this paper we explore the hypothesis that the characteristics of this edit network are predictive of the quality of the corresponding article's content. We characterize the edit network using a profile of network motifs and we show that this network motif profile is predictive of the Wikipedia quality classes assigned to articles by Wikipedia editors. We further show that the network motif profile can identify outlier articles particularly in the 'Featured Article' class, the highest Wikipedia quality class.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123634157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065027
Olga Streibel, R. Alnemr
{"title":"Trend-based and reputation-versed personalized news network","authors":"Olga Streibel, R. Alnemr","doi":"10.1145/2065023.2065027","DOIUrl":"https://doi.org/10.1145/2065023.2065027","url":null,"abstract":"Web users, while collaborating over social networks and micro-blogging services, also contribute to news coverage worldwide. News feeds come from mainstream media as well as from social networks. Often, feeds from social networks are more up-to-date and, from the user's point of view, more credible than those that come from mainstream media. However, the overwhelming amount of information requires users to filter through it personally until they find what they really need. In this paper, we describe our idea of a personalized news network built on current Web technologies and our research projects by filtering Twitter and Facebook messages using both trend mining and reputation approaches. Based on the example of the Egyptian revolution, we explain the main idea of personalized news.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128229361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMUC '11 | Pub Date: 2011-10-28 | DOI: 10.1145/2065023.2065028
Alexandra Roshchina, J. Cardiff, Paolo Rosso
{"title":"A comparative evaluation of personality estimation algorithms for the twin recommender system","authors":"Alexandra Roshchina, J. Cardiff, Paolo Rosso","doi":"10.1145/2065023.2065028","DOIUrl":"https://doi.org/10.1145/2065023.2065028","url":null,"abstract":"The appearance of the so-called recommender systems has led to the possibility of reducing the information overload experienced by individuals searching among online resources. One of the areas of application of recommender systems is the online tourism domain where sites like TripAdvisor allow people to post reviews of various hotels to help others make a good choice when planning their trip. As the number of such reviews grows in size every day, clearly it is impractical for the individual to go through all of them. We propose the TWIN (\"Tell me What I Need\") Personality-based Recommender System that analyzes the textual content of the reviews and estimates the personality of the user according to the Big Five model to suggest the reviews written by \"twin-minded\" people. In this paper we compare a number of algorithms to select the better option for personality estimation in the task of user profile construction.","PeriodicalId":341071,"journal":{"name":"SMUC '11","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}