Internet Browsing History Data Analysis for Automatic Negative Content Website Identification (Case Study: TRUST+™ Positif)

2018 5th International Conference on Data and Software Engineering (ICoDSE). Pub Date: 2018-11-01. DOI: 10.1109/ICODSE.2018.8705919

Army Aristofany, G. A. Putri Saptawati, Y. Asnar

Citations: 1

Abstract

A negative content website is a website that contains one or more of the following elements: pornography, violence and coercion involving children, incitement to anarchy, or gambling. The number of negative content websites grows along with the development of the internet. The number of internet users who are potentially exposed to negative content is also increasing, because internet access has become cheaper and more devices support internet use. The authorities have undertaken several programs in response. Among them is the creation of the TRUST+™ Positif system, which holds a large list of negative website addresses. ISPs (Internet Service Providers) block negative content websites by referring to this system. The number of websites on the TRUST+™ Positif list increases when new negative content websites are reported or after the back-crawling process is run. The problem is that growth of the list depends heavily on external reports and on the capability of the TRUST+™ Positif back-crawling engine. Therefore, we apply a data mining process to ISP internet browsing history data to identify new negative content websites. The data mining is done using an association algorithm. Several techniques for preparing the user browsing history data are tried in order to find the best results according to the browsing patterns that may arise. To reduce the number of identification errors, we filter out any websites believed to have no negative content. The result is that, although the output of the association algorithm can be used to identify negative content websites, more than 75% of the results are either not negative content websites or still need their content validated.
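The abstract does not say which association algorithm, thresholds, or session definition were used, so the following is only a rough sketch of the general idea under stated assumptions. It counts how often an unlisted domain appears in the same browsing session as an already blocklisted domain, computes rule confidence for "blocklisted domain → other domain", and drops domains on an allowlist of sites believed to be clean. The function name `candidate_sites`, the session format, the `.example` domains, and the threshold values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def candidate_sites(sessions, blocklist, allowlist,
                    min_support=2, min_confidence=0.5):
    """Return {domain: best confidence} for rules 'blocklisted -> domain'.

    sessions  : iterable of lists of domains visited together (assumed format)
    blocklist : set of domains already known to carry negative content
    allowlist : set of domains believed to be clean (filtered out)
    """
    antecedent_count = defaultdict(int)  # sessions containing each blocklisted domain
    pair_count = defaultdict(int)        # sessions containing both domains of a pair

    for session in sessions:
        domains = set(session)
        known = domains & blocklist
        # Only unlisted, non-allowlisted domains can become candidates.
        others = domains - blocklist - allowlist
        for k in known:
            antecedent_count[k] += 1
            for o in others:
                pair_count[(k, o)] += 1

    candidates = {}
    for (k, o), n in pair_count.items():
        confidence = n / antecedent_count[k]
        if n >= min_support and confidence >= min_confidence:
            candidates[o] = max(candidates.get(o, 0.0), confidence)
    return candidates


# Toy data (hypothetical): four sessions, one known negative domain.
sessions = [
    ["bad-a.example", "unknown-x.example", "search.example"],
    ["bad-a.example", "unknown-x.example"],
    ["bad-a.example", "news.example"],
    ["news.example", "search.example"],
]
blocklist = {"bad-a.example"}
allowlist = {"search.example", "news.example"}

result = candidate_sites(sessions, blocklist, allowlist)
print(result)
```

In this toy run only `unknown-x.example` survives as a candidate. Co-occurrence alone says nothing about what a site actually contains, which is consistent with the abstract's finding that most results still need their content validated.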
