{"title":"非结构化数据的自动化管理工具","authors":"M. Ceglowski, A. Coburn, J. Cuadrado","doi":"10.1109/WI.2003.1241266","DOIUrl":null,"url":null,"abstract":"The rapidly growing quantity of online data has created a need for automated, content-based categorization and search tools. We describe an open-source, Web-based archive management, which uses latent semantic indexing, coupled with vector clustering techniques, to provide users with a fully searchable and automatically categorized interface to a data collection. The default English document parser included in the project uses part-of-speech tagging and recursive maximal noun phrase extraction to create a more effective term list for LSI than traditional stop list techniques. The archive interface supports multiple user views of the data collection. Advanced search features are implemented through relevance feedback, and do not require users to learn a query syntax.","PeriodicalId":403574,"journal":{"name":"Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An automated management tool for unstructured data\",\"authors\":\"M. Ceglowski, A. Coburn, J. Cuadrado\",\"doi\":\"10.1109/WI.2003.1241266\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapidly growing quantity of online data has created a need for automated, content-based categorization and search tools. We describe an open-source, Web-based archive management, which uses latent semantic indexing, coupled with vector clustering techniques, to provide users with a fully searchable and automatically categorized interface to a data collection. The default English document parser included in the project uses part-of-speech tagging and recursive maximal noun phrase extraction to create a more effective term list for LSI than traditional stop list techniques. The archive interface supports multiple user views of the data collection. Advanced search features are implemented through relevance feedback, and do not require users to learn a query syntax.\",\"PeriodicalId\":403574,\"journal\":{\"name\":\"Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WI.2003.1241266\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI.2003.1241266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An automated management tool for unstructured data
The rapidly growing quantity of online data has created a need for automated, content-based categorization and search tools. We describe an open-source, Web-based archive management, which uses latent semantic indexing, coupled with vector clustering techniques, to provide users with a fully searchable and automatically categorized interface to a data collection. The default English document parser included in the project uses part-of-speech tagging and recursive maximal noun phrase extraction to create a more effective term list for LSI than traditional stop list techniques. The archive interface supports multiple user views of the data collection. Advanced search features are implemented through relevance feedback, and do not require users to learn a query syntax.