Using Natural Language Processing to Search for Textual References
Brett Graham
Ancient Manuscripts in Digital Culture, 2019-05-14
DOI: 10.1163/9789004399297_008
Citations: 3
Abstract
In natural languages, as opposed to computer languages like C or Pascal, the words and syntax are not artificially defined; instead, they develop naturally. Typical examples of natural languages are those spoken in human communication, such as English, French, and Japanese. However, the term natural language can also refer to written text, such as Facebook postings, emails, or even text messages. As well as changing over time, natural languages vary among different cultures and people groups. So, for example, the words and syntax that a teenager might use to write a text message on their phone are likely to be different from the words and syntax that Shakespeare used to write Othello. Within computer science, the term Natural Language Processing (NLP) refers to the way computers are programmed to understand natural languages. At a basic level, NLP involves three steps – lexical analysis, syntax analysis, and semantic analysis. The complexity of each of these steps is perhaps best illustrated by looking at how three well-known programs incorporate NLP; namely, Microsoft Word, the Google search engine, and Apple's Siri. If you were to type (or copy and paste) the following string – "Can I be worn jeens to church?" – into Microsoft Word, it would perform simple lexical analysis by grouping the characters into tokens (i.e., words), using the whitespace and punctuation as separators. Having done this, the program would then consult its dictionary and recognize that "jeens" is not a valid entry. As a result, it would place this word in red, somewhat like this: