{"title":"识别故障模式的文本挖掘技术","authors":"Francina Malan, J. L. Jooste","doi":"10.1108/jqme-02-2020-0012","DOIUrl":null,"url":null,"abstract":"PurposeThe purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.Design/methodology/approachThe paper has both a theoretical and experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification is considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.FindingsFrom the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.Originality/valueThis work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affects model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.","PeriodicalId":16938,"journal":{"name":"Journal of Quality in Maintenance Engineering","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Text mining techniques for identifying failure modes\",\"authors\":\"Francina Malan, J. L. Jooste\",\"doi\":\"10.1108/jqme-02-2020-0012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PurposeThe purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.Design/methodology/approachThe paper has both a theoretical and experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification is considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.FindingsFrom the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.Originality/valueThis work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affects model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.\",\"PeriodicalId\":16938,\"journal\":{\"name\":\"Journal of Quality in Maintenance Engineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-02-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Quality in Maintenance Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1108/jqme-02-2020-0012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, INDUSTRIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quality in Maintenance Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/jqme-02-2020-0012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}
Text mining techniques for identifying failure modes
PurposeThe purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.Design/methodology/approachThe paper has both a theoretical and experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification is considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.FindingsFrom the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.Originality/valueThis work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affects model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.
期刊介绍:
This exciting journal looks at maintenance engineering from a positive standpoint, and clarifies its recently elevatedstatus as a highly technical, scientific, and complex field. Typical areas examined include: ■Budget and control ■Equipment management ■Maintenance information systems ■Process capability and maintenance ■Process monitoring techniques ■Reliability-based maintenance ■Replacement and life cycle costs ■TQM and maintenance