iTrack: Correlating user activity with system data

V. Mann, Anilkumar Vishnoi
Published in: 2012 IEEE Network Operations and Management Symposium
Publication date: 2012-04-16
DOI: 10.1109/NOMS.2012.6212031 (https://doi.org/10.1109/NOMS.2012.6212031)

Abstract

Human error has been identified as one of the major factors behind system outages and network downtime in a number of previous research papers and surveys. Gartner statistics show that almost 40% of unplanned application downtime is caused by operator errors, such as unintentional changes to network configuration that result in a network outage, patch installations, and service restarts. Yet system admin activities on production IT systems are rarely logged and monitored properly. Existing tools for tracking user activities produce either too much information, without any hint of a potential outage scenario, or too little information to be useful.

In this paper, we describe the design and implementation of iTrack, a framework for monitoring user activities and correlating them with system data. iTrack uses commonly available native monitoring and diagnostic utilities on operating systems to monitor system events as well as system admin activity, correlates these two sets of information, and categorizes the activity as potentially abnormal or harmful based on its impact on the system in terms of file system, network and process activities. We demonstrate the usefulness of iTrack through several use cases and real-world examples, such as detecting and diagnosing system outages in real time, conducting post-mortem analysis of outages, and maintaining audit logs.

Our experimental evaluation of iTrack confirms that its monitoring overhead, in terms of CPU time, activity completion time and data generated, is within the tolerance range of most production systems. In cases where the overhead was found to be unacceptable, we detect the underlying cause and provide solutions. These solutions improve performance by up to 20% and 90% in terms of managed server and iTrack server CPU utilization, respectively, and by up to 2 times in terms of completion time of certain system admin activities on the managed server.
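The abstract does not say how iTrack performs the correlation or the categorization. As a rough illustration only, the idea of pairing admin activity with system events can be sketched as a time-window join followed by a simple threshold rule. Everything below is an assumption made for illustration and is not taken from the paper: the `AdminCommand` and `SystemEvent` record types, the 5-second window, and the per-category threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical record types: the paper's actual data model is not described
# in the abstract.
@dataclass
class AdminCommand:
    user: str
    command: str
    timestamp: float  # seconds since some common epoch


@dataclass
class SystemEvent:
    kind: str  # assumed categories: "file", "network", or "process"
    detail: str
    timestamp: float


def correlate(
    commands: List[AdminCommand],
    events: List[SystemEvent],
    window: float = 5.0,  # assumed window length, not from the paper
) -> List[Tuple[AdminCommand, List[SystemEvent]]]:
    """Pair each command with the events seen within `window` seconds after it."""
    pairs = []
    for cmd in commands:
        related = [e for e in events if 0 <= e.timestamp - cmd.timestamp <= window]
        pairs.append((cmd, related))
    return pairs


def categorize(related: List[SystemEvent], threshold: int = 3) -> str:
    """Assumed rule: flag the activity if any one event category reaches `threshold`."""
    counts: Dict[str, int] = {}
    for event in related:
        counts[event.kind] = counts.get(event.kind, 0) + 1
    if any(count >= threshold for count in counts.values()):
        return "potentially abnormal"
    return "normal"
```

A fixed window is the simplest possible join. A real deployment would more likely attribute events to a command through process or session identifiers, which this sketch does not attempt.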