{"title":"Comparing a Hand-crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation","authors":"Despoina Mouratidis, Katia Lida Kermanidis","doi":"10.26615/issn.2683-0078.2019_008","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_008","url":null,"abstract":"The automatic evaluation of machine translation (MT) has proven to be a very significant research topic. Most automatic evaluation methods focus on evaluating MT output by computing similarity scores that represent translation quality. This work targets the performance of MT evaluation. We present a general scheme for learning to classify parallel translations of two MT model outputs and one human (reference) translation, using linguistic information. We apply this scheme in three experiments using neural networks (NN): the first uses string-based hand-crafted features (Exp1); the second uses embeddings learned automatically by a NN from the reference and the two MT outputs, one from a statistical machine translation (SMT) model and the other from a neural machine translation (NMT) model (Exp2); and the third (Exp3) combines information from the other two experiments. The languages involved are English (EN), Greek (GR) and Italian (IT), and the segments are from the educational domain. The proposed language-independent learning scheme, which combines information from the two experiments (Exp3), achieves higher classification accuracy than models using BLEU score information and other classification approaches such as Random Forest (RF) and Support Vector Machine (SVM).","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116665582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Proactive MWE Terminological Platform for Cross-Lingual Mediation in the Age of Big Data","authors":"Benjamin Ka-Yin T'sou, Ka-Po Chow, Junru Nie, Yuan Yuan, Hong Kong Chilin Ltd.","doi":"10.26615/issn.2683-0078.2019_014","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_014","url":null,"abstract":"The emergence of China as a global economic power in the 21st Century has brought about surging needs for cross-lingual and cross-cultural mediation, typically performed by translators. Advances in Artificial Intelligence and Language Engineering have been bolstered by machine learning and suitable Big Data cultivation. They have helped to meet some of the translator’s needs, though the technical specialists have not kept pace with the practical and expanding requirements in language mediation. One major technical and linguistic hurdle involves words outside the vocabulary of the translator or of the lexical database he/she consults, especially Multi-Word Expressions (Compound Words) in technical subjects. A further problem is the multiplicity of renditions of a term in the target language. This paper discusses a proactive approach following the successful extraction and application of sizable bilingual Multi-Word Expressions (Compound Words) for language mediation in technical subjects, which do not fall within the expertise of typical translators, who have inadequate appreciation of the range of new technical tools available to help them. Our approach draws on the personal reflections of translators and teachers of translation and is based on prior R&D efforts relating to 300,000 comparable Chinese-English patents. The protocol we have subsequently developed aims to be proactive in meeting four identified practical challenges in technical translation (e.g. patents). It has broader economic implications in the Age of Big Data (Tsou et al., 2015) and the Trade War, as the workload, if not the challenges, increasingly cannot be met by currently available front-line translators. We shall demonstrate how new tools can be harnessed to spearhead the application of language technology not only in language mediation but also in the “teaching” and “learning” of translation. The paper shows how a better appreciation of translators’ needs may enhance the contributions of the technical specialists, and thus the resulting synergetic benefits.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129760835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translation Quality Assessment Tools and Processes in Relation to CAT Tools","authors":"Viktoriya Petrova","doi":"10.26615/issn.2683-0078.2019_011","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_011","url":null,"abstract":"Modern translation QA tools are the latest attempt to overcome the inevitable subjective component of human revisers. This paper analyzes the current situation in the translation industry with respect to those tools and their relationship with CAT tools. The adoption of international standards has set the basic frame that defines “quality”. Because developing a universal QA tool is clearly impossible, all of the existing ones offer a wide variety of settings for the user to choose from. A brief comparison is made between the most popular standalone QA tools. To verify their results in practice, the QA outputs of two of those tools have been compared. Polls covering a period of 12 years have been collected, in which participants explained what practices they adopted in order to guarantee quality.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128925067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison between Automatic and Human Subtitling: A Case Study with Game of Thrones","authors":"Sabrina Baldo de Brébisson","doi":"10.26615/issn.2683-0078.2019_001","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_001","url":null,"abstract":"In this submission, I would like to share my experiences with the software DeepL and the comparative analysis I have made against the human subtitling offered by the DVD version of the corpus I have chosen as the topic of my study: the eight seasons of Game of Thrones. The idea is to study whether the version proposed by an automatic translation program could be used as a first draft for the professional subtitler. It is expected that the latter would work on the form of the subtitles, that is to say mainly on their length, in a second step.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115683901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a Frame-Semantic Machine Translation Evaluation Metric","authors":"Oliver Czulo, Tiago Timponi Torrent, E. Matos, Alexandre Diniz da Costa, Debanjana Kar","doi":"10.26615/issn.2683-0078.2019_004","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_004","url":null,"abstract":"We propose a metric for machine translation evaluation based on frame semantics which does not require the use of reference translations or human corrections, but is aimed at comparing the original and the translated output directly. The metric is described on the basis of an existing manual frame-semantic annotation of a parallel corpus with an English original and Brazilian Portuguese and German translations. We discuss implications of our metric's design, including the potential of scaling it to multiple languages.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114207865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Corpus of Croatian-Italian Administrative Texts","authors":"Marija Brkic Bakaric, Ivana Lalli Paćelat","doi":"10.26615/issn.2683-0078.2019_002","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_002","url":null,"abstract":"Parallel corpora constitute a unique resource for providing assistance to human translators. The selection and preparation of the parallel corpora also condition the quality of the resulting MT engine. Since Croatian is a national language and Italian is officially recognized as a minority language in seven cities and twelve municipalities of Istria County, a large number of parallel texts are produced on a daily basis. However, there have been no attempts at using these texts to compile a parallel corpus. A domain-specific, sentence-aligned parallel Croatian-Italian corpus of administrative texts would be of high value in creating different language tools and resources. The aim of this paper is, therefore, to explore the value of parallel documents which are publicly available, mostly in PDF format, and to investigate the use of automatically built dictionaries in corpus compilation. The effects of the document format (and, consequently, of sentence splitting) and of the dictionary input on the sentence alignment process are evaluated manually.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126157111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Success Story of Mitra Translations","authors":"Mina Ilieva, M. Kancheva","doi":"10.26615/issn.2683-0078.2019_016","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_016","url":null,"abstract":"Technologies, with their constant updates and innovative nature, have drastically and irreversibly transformed this small business into a leading brand on the translation market, along with just a few other LSPs integrating translation software solutions. Now, we constantly follow new developments in software updates and online platforms, and we successfully keep up with every new trend in the field of translation, localization, transcreation, revision, post-editing, etc. Ultimately, we are positive that proper implementation of technology (with a focus on quality, cost and time) and hard work are the stepping stones on the way to becoming a trusted translation services provider.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130928098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Influences the Features of Post-editese? A Preliminary Study","authors":"Sheila Castilho, Natália Resende, R. Mitkov","doi":"10.26615/issn.2683-0078.2019_003","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_003","url":null,"abstract":"While a number of studies have shown evidence of translationese phenomena, that is, statistical differences between original texts and translated texts (Gellerstam, 1986), studies searching for translationese features in post-edited texts (what has been called “post-editese” (Daems et al., 2017)) have produced mixed results. This paper reports a preliminary study aimed at identifying the presence of post-editese features in machine-translated post-edited texts and at understanding how they differ from translationese features. We test the influence of factors such as post-editing (PE) levels (full vs. light), translation proficiency (professionals vs. students) and text domain (news vs. literary). Results show evidence of post-editese features, especially in light PE texts and in certain domains.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133241701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus Linguistics, Translation and Error Analysis","authors":"M. Stambolieva","doi":"10.26615/issn.2683-0078.2019_012","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_012","url":null,"abstract":"The paper presents a study of the French Imparfait and its functional equivalents in Bulgarian and English in view of applications in machine translation and error analysis. The aims of the study are: 1/ based on the analysis of a corpus of text, to validate/revise earlier research on the values of the French Imparfait, 2/ to define the contextual factors pointing to the realisation of one or another value of the forms, 3/ based on the analysis of aligned translations, to identify the translation equivalents of these values, 4/ to formulate translation rules, 5/ based on the analysis of the translation rules, to refine the annotation modules of the environment used – the NBU e-Platform for language teaching and research.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125236464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Punster’s Amanuensis: The Proper Place of Humans and Machines in the Translation of Wordplay","authors":"Tristan Miller","doi":"10.26615/issn.2683-0078.2019_007","DOIUrl":"https://doi.org/10.26615/issn.2683-0078.2019_007","url":null,"abstract":"The translation of wordplay is one of the most extensively researched problems in translation studies, but it has attracted little attention in the fields of natural language processing and machine translation. This is because today’s language technologies treat anomalies and ambiguities in the input as things that must be resolved in favour of a single “correct” interpretation, rather than preserved and interpreted in their own right. But if computers cannot yet process such creative language on their own, can they at least provide specialized support to translation professionals? In this paper, I survey the state of the art relevant to computational processing of humorous wordplay and put forth a vision of how existing theories, resources, and technologies could be adapted and extended to support interactive, computer-assisted translation.","PeriodicalId":313947,"journal":{"name":"Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130284759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}