控制符号对文本简化影响的研究

Zihao Li, M. Shardlow, Saeed Hassan
{"title":"控制符号对文本简化影响的研究","authors":"Zihao Li, M. Shardlow, Saeed Hassan","doi":"10.18653/v1/2022.tsar-1.14","DOIUrl":null,"url":null,"abstract":"Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"An Investigation into the Effect of Control Tokens on Text Simplification\",\"authors\":\"Zihao Li, M. Shardlow, Saeed Hassan\",\"doi\":\"10.18653/v1/2022.tsar-1.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.\",\"PeriodicalId\":247582,\"journal\":{\"name\":\"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)\",\"volume\":\"132 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.tsar-1.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.tsar-1.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

最近关于文本简化的工作主要集中在使用控制令牌来进一步提高技术水平。然而,如果不深入理解控制令牌背后的机制,就不容易进一步改进。一个未被探索的因素是标记化策略,我们也在探索。在本文中,我们(1)重新实现了ACCESS,(2)探索了不同控制令牌的影响,(3)测试了不同令牌化策略的影响,(4)展示了单独的控制令牌如何影响性能。我们将分别展示四种控制令牌的性能变化。我们还揭示了控制令牌的设计如何影响性能,并提出了一些设计控制令牌的建议,这些建议也适用于其他可控文本生成任务。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An Investigation into the Effect of Control Tokens on Text Simplification
Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信