{"title":"Automatic image captioning in Thai for house defect using a deep learning-based approach","authors":"Manadda Jaruschaimongkol, Krittin Satirapiwong, Kittipan Pipatsattayanuwong, Suwant Temviriyakul, Ratchanat Sangprasert, Thitirat Siriborvornratanakul","doi":"10.1007/s43674-023-00068-w","DOIUrl":null,"url":null,"abstract":"<div><p>This study aims to automate the reporting process of house inspections, which enables prospective buyers to make informed decisions. Currently, the inspection report generated by an inspector involves inserting all defect images into a spreadsheet software and manually captioning each image with identified defects. To the best of our knowledge, there are no previous works or datasets that have automated this process. Therefore, this paper proposes a new image captioning dataset for house defect inspection, which is benchmarked with three deep learning-based models. Our models are based on the encoder–decoder architecture where three image encoders (i.e., VGG16, MobileNet, and InceptionV3) and one GRU-based decoder with an additive attention mechanism of Bahdanau are experimented. The experimental results indicate that, despite similar training losses in all models, VGG16 takes the least time to train a model, while MobileNet achieves the highest BLEU-1 to BLEU-4 scores of 0.866, 0.850, 0.823, and 0.728, respectively. However, InceptionV3 is suggested as the optimal model, since it outperforms the others in terms of accurate attention plots and its BLEU scores are comparable to the best scores obtained by MobileNet.</p></div>","PeriodicalId":72089,"journal":{"name":"Advances in computational intelligence","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in computational intelligence","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43674-023-00068-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
This study aims to automate the reporting process of house inspections, enabling prospective buyers to make informed decisions. Currently, an inspector produces the report by inserting all defect images into spreadsheet software and manually captioning each image with the identified defects. To the best of our knowledge, no previous work or dataset has automated this process. This paper therefore proposes a new image captioning dataset for house defect inspection and benchmarks it with three deep learning-based models. The models follow an encoder–decoder architecture in which three image encoders (VGG16, MobileNet, and InceptionV3) are paired with a GRU-based decoder using Bahdanau's additive attention mechanism. The experimental results indicate that, despite similar training losses across all models, VGG16 requires the least training time, while MobileNet achieves the highest BLEU-1 to BLEU-4 scores of 0.866, 0.850, 0.823, and 0.728, respectively. However, InceptionV3 is suggested as the optimal model, since it produces the most accurate attention plots and its BLEU scores are comparable to the best scores obtained by MobileNet.
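The abstract does not include code, but the described architecture (a pretrained CNN encoder feeding a GRU decoder with Bahdanau's additive attention) follows a well-known pattern. Below is a minimal sketch in TensorFlow/Keras; the framework choice, layer sizes, and class names are assumptions for illustration, not the authors' implementation.

```python
import tensorflow as tf

# Encoder: a pretrained CNN used as a fixed feature extractor.
# InceptionV3 is shown here; VGG16 or MobileNet from tf.keras.applications
# can be swapped in the same way, as the paper compares all three.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
cnn_encoder = tf.keras.Model(base.input, base.output)
# For 299x299 inputs the output is (batch, 8, 8, 2048); it is typically
# flattened to (batch, 64, 2048) so each of the 64 spatial locations
# becomes one "region" that the decoder can attend to.


class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention (Bahdanau et al., 2015) over image regions."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects image features
        self.W2 = tf.keras.layers.Dense(units)  # projects decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # scores each spatial region

    def call(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, units)
        hidden_with_time = tf.expand_dims(hidden, 1)  # (batch, 1, units)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        attention_weights = tf.nn.softmax(score, axis=1)  # (batch, num_regions, 1)
        # Weighted sum of region features -> context vector for this time step.
        context = tf.reduce_sum(attention_weights * features, axis=1)
        return context, attention_weights  # weights also yield the attention plots


class GRUDecoder(tf.keras.Model):
    """One-step caption decoder: previous token + attended image context -> next token."""

    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, x, features, hidden):
        # x: (batch, 1) previous token ids
        context, attention_weights = self.attention(features, hidden)
        x = self.embedding(x)  # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x)            # output: (batch, 1, units)
        x = self.fc1(output)
        x = tf.reshape(x, (-1, x.shape[2]))    # (batch, units)
        logits = self.fc2(x)                   # (batch, vocab_size)
        return logits, state, attention_weights
```

At inference time the decoder is called once per output token, feeding back its own prediction and hidden state; the returned attention weights can be reshaped to the 8x8 feature grid and overlaid on the input image to produce the attention plots the paper uses to compare encoders.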
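The reported BLEU-1 to BLEU-4 scores can be computed for any candidate caption with NLTK's sentence_bleu, as sketched below. The Thai tokens are hypothetical examples for illustration, not taken from the paper's dataset, and the paper does not state which BLEU implementation it used.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Hypothetical tokenized captions ("the wall has a crack" vs. model output).
reference = [["ผนัง", "มี", "รอย", "แตก"]]  # list of reference token lists
candidate = ["ผนัง", "มี", "รอย", "ร้าว"]   # candidate token list

smooth = SmoothingFunction().method1
for n in range(1, 5):
    # BLEU-n weights n-gram orders 1..n equally (e.g., BLEU-2 = (0.5, 0.5)).
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```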