Journal of Data Analysis and Information Processing: Latest Articles, Page 7

Towards Kikamba Computational Grammar

Journal of Data Analysis and Information Processing, Pub Date: 2019-09-12, DOI: 10.4236/jdaip.2019.74015

Benson Kituku, Wanjiku Ng'ang'a, Lawrence Muchemi

Abstract: The under-resourced Kikamba language has few language technology tools, because the more efficient and popular data-driven approaches to developing them suffer from data sparseness owing to the lack of digitized corpora. To address this challenge, we have developed a computational grammar for Kikamba within the multilingual Grammatical Framework (GF) toolkit. GF uses the interlingua rule-based translation approach. We followed a morphology-driven strategy: we first developed regular expressions for morphological inflection and then developed the syntax rules. The grammar was evaluated on one hundred sentences in both English and Kikamba. The results were an encouraging 4-gram BLEU score of 83.05% and a position-independent error rate (PER) of 10.96%. Finally, we have contributed language technology resources for Kikamba, including multilingual machine translation, a morphology analyzer, a computational grammar that provides a platform for developing multilingual applications, and the ability to generate a variety of bilingual corpora pairing Kikamba with every language currently defined in GF, making it easier to experiment with data-driven approaches.

Citations: 4
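
The abstract reports a position-independent error rate (PER) but does not give its formula. The following is a minimal sketch that assumes one common bag-of-words formulation, PER = 1 - (matches - max(0, N_hyp - N_ref)) / N_ref. It also assumes whitespace tokenization and a single reference per sentence. The function names are hypothetical, and this is not necessarily the authors' evaluation script.

```python
from collections import Counter


def sentence_per_errors(hypothesis, reference):
    """Return (errors, reference_length) for one hypothesis/reference pair.

    Word order is ignored: tokens are compared as multisets (bags of words).
    Assumed formulation: errors = N_ref - matches + max(0, N_hyp - N_ref).
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # Multiset intersection counts each shared token min(hyp_count, ref_count) times.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    # Unmatched reference words, plus any surplus hypothesis words, count as errors.
    errors = len(ref) - matches + max(0, len(hyp) - len(ref))
    return errors, len(ref)


def corpus_per(hypotheses, references):
    """Corpus-level PER: total errors divided by total reference length."""
    if len(hypotheses) != len(references):
        raise ValueError("need one reference per hypothesis")
    total_errors = total_ref = 0
    for hyp, ref in zip(hypotheses, references):
        errors, ref_len = sentence_per_errors(hyp, ref)
        total_errors += errors
        total_ref += ref_len
    return total_errors / total_ref
```

For example, `corpus_per(["d c b a"], ["a b c d"])` is 0.0, because PER ignores word order. Order errors are what BLEU's n-gram matching penalizes, which is why the two metrics are usually reported together.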

Bayesian Non-Parametric Mixture Model with Application to Modeling Biological Markers

Journal of Data Analysis and Information Processing, Pub Date: 2019-09-12, DOI: 10.4236/jdaip.2019.74009

M. K. Peter, L. Mbugua, A. Wanjoya

Abstract: The effect of a treatment on patient outcomes can be assessed through its impact on biological events. Following treated patients over a period of time helps determine whether their biomarkers change. It is important to study how biomarkers change in response to treatment, and whether individuals located in separate centers, which may have different distributions, can be clustered together. The study is motivated by a Bayesian non-parametric mixture model, which is more flexible than Bayesian parametric models and can borrow information across centers, allowing them to be grouped together. To this end, this research modeled biological markers, taking surrogate markers into consideration. The study employed the nested Dirichlet process prior, which can be placed on the distributions of several centers, so that centers drawn from the same Dirichlet process component are automatically clustered together. Samples from the posterior were drawn using a Markov chain Monte Carlo algorithm. The model is illustrated with a simulation study that assesses its performance on simulated data. The simulation study showed that the model was capable of grouping the data into distinct clusters.

Citations: 0
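
A minimal generative sketch can make the nested Dirichlet process prior in the abstract more concrete. It uses truncated stick-breaking to simulate grouped data from the prior only; it is not the paper's MCMC posterior sampler. Several choices here are illustrative assumptions: the truncation levels `K` and `L`, the concentration parameters, the N(0, 3^2) base measure, and the unit-variance Gaussian kernel. Centers that receive the same outer label share one distribution, which is how the nDP clusters centers.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick absorbs the leftover mass so weights sum to 1
    return weights


def sample_nested_dp(n_centers, n_obs, alpha=1.0, beta=1.0, K=10, L=20, seed=0):
    """Simulate grouped data from a truncated nested DP mixture of normals.

    Returns (center_labels, data): center_labels[j] is the outer component
    (the shared distribution) of center j, and data[j] holds its observations.
    """
    rng = random.Random(seed)
    # Outer DP: weights over K candidate distributions G_1..G_K.
    outer_weights = stick_breaking(alpha, K, rng)
    # Each G_k is itself a DP draw: L atoms (component means) with weights.
    distributions = []
    for _ in range(K):
        weights = stick_breaking(beta, L, rng)
        means = [rng.gauss(0.0, 3.0) for _ in range(L)]  # assumed base measure N(0, 3^2)
        distributions.append((weights, means))
    # Assign each center to one distribution; a shared label means a shared cluster.
    center_labels = rng.choices(range(K), weights=outer_weights, k=n_centers)
    data = []
    for k in center_labels:
        weights, means = distributions[k]
        chosen = rng.choices(means, weights=weights, k=n_obs)
        data.append([rng.gauss(mu, 1.0) for mu in chosen])  # assumed unit-variance kernel
    return center_labels, data
```

A smaller `alpha` concentrates the outer weights on fewer components, so more centers tend to share a distribution. Posterior inference, as in the paper, would then reassign labels and atoms given observed data.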

Tweet Sentiment Analysis (TSA) for Cloud Providers Using Classification Algorithms and Latent Semantic Analysis

Journal of Data Analysis and Information Processing, Pub Date: 2019-09-12, DOI: 10.4236/jdaip.2019.74016

Ioannis Karamitsos, Saeed Albarhami, Charalampos Apostolopoulos

Abstract: The availability and advancement of cloud computing service models such as IaaS, SaaS, and PaaS, which introduce on-demand self-service, auto scaling, easy maintenance, and pay-as-you-go pricing, have dramatically transformed the way organizations design and operate their datacenters. However, some organizations still have concerns such as security, governance, lack of expertise, and migration. The purpose of this paper is to examine cloud computing customers' opinions, feedback, attitudes, and emotions toward cloud computing services using sentiment analysis. The associated aim is to help people and organizations understand the benefits and challenges of cloud services from the general public's perspective, as well as opinions about existing cloud providers, focusing on three main providers: Azure, Amazon Web Services (AWS), and Google Cloud. The methodology is based on sentiment analysis applied to tweets extracted from Twitter via its search API. We extracted a sample of 11,000 tweets selected using relevant hashtags and keywords, with a roughly equal share for each cloud provider. The analysis first combines all tweets to find the overall polarity toward cloud computing, then splits them by provider to find the polarity for each. The Bing and NRC lexicons are used to measure the polarity and emotion of the terms in the tweets. Across all cloud providers, the overall polarity classification is 68.5% positive and 31.5% negative. More specifically, Azure shows 63.8% positive and 36.2% negative tweets, Google Cloud 72.6% positive and 27.4% negative, and AWS 69.1% positive and 30.9% negative.

Citations: 11
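
The lexicon-based polarity step described in the abstract can be sketched as follows. The tiny word lists stand in for the Bing lexicon and are hypothetical, as are the function names. The tie rule is also an assumption: tweets with equal positive and negative counts are treated as neutral and excluded from the percentages. The authors' exact scoring may differ.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon standing in for the Bing positive/negative word lists.
POSITIVE = {"fast", "reliable", "great", "easy", "love"}
NEGATIVE = {"outage", "slow", "expensive", "down", "hate"}


def tweet_polarity(text):
    """Return 'positive', 'negative' or 'neutral' from lexicon word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_shares(tweets):
    """Percent positive/negative per provider, ignoring neutral tweets.

    `tweets` is an iterable of (provider, text) pairs.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for provider, text in tweets:
        label = tweet_polarity(text)
        if label != "neutral":
            counts[provider][label] += 1
    shares = {}
    for provider, c in counts.items():
        total = c["positive"] + c["negative"]
        shares[provider] = {k: 100.0 * v / total for k, v in c.items()}
    return shares
```

Passing all tweets under a single provider key gives the overall polarity. Keeping the real provider keys gives the per-provider breakdown reported in the abstract.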

Origin of Dynamic Correlations of Words in Written Texts

Journal of Data Analysis and Information Processing, Pub Date: 2019-09-12, DOI: 10.4236/jdaip.2019.74014

Hiroshi Ogura, Hiromi Amano, Masato Kondo

Abstract: In a previous study, we introduced dynamical aspects of written texts by regarding the serial sentence number, from the first to the last sentence of a given text, as discretized time. Using this definition of a textual timeline, we defined an autocorrelation function (ACF) for word occurrences and demonstrated its utility both for representing dynamic word correlations and for measuring word importance within the text. In this study, we seek the stochastic process governing the occurrences of a given word that has strong dynamic correlations. This is valuable because words exhibiting strong dynamic correlations play a central role in developing or organizing textual contexts. In the course of this search, we find that additive binary Markov chain theory is useful for describing strong dynamic word correlations, in the sense that it can reproduce the characteristics of autocovariance functions (an unnormalized version of ACFs) observed in actual written texts. Using this theory, we propose a model for the time-varying probability of word occurrence in each sentence of a text. The proposed model considers hierarchical document structures such as chapters, sections, subsections, paragraphs, and sentences. Because such a hierarchical structure is common to most documents, our model of word occurrence probability is broadly applicable for interpreting dynamic word correlations in actual written texts. The main contributions of this study are, therefore, demonstrating the usefulness of additive binary Markov chain theory for analyzing dynamic correlations in written texts and offering a new model of word occurrence probability that takes the common hierarchical structure of documents into account.

Citations: 3
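
The additive binary Markov chain mentioned in the abstract can be simulated in a few lines. This sketch assumes the standard additive form P(a_i = 1 | past) = a_bar + sum over r = 1..N of F(r) * (a_{i-r} - a_bar). Here a_bar is the mean occurrence probability and F is a memory function of depth N. Clipping to [0, 1] and the example memory functions are assumptions, and this is not the authors' hierarchical time-varying probability model. The sketch also computes the empirical autocovariance, the unnormalized ACF the abstract refers to.

```python
import random


def simulate_additive_markov(n, mean_prob, memory, seed=0):
    """Generate a binary sequence from an additive binary Markov chain.

    memory[r - 1] is F(r), the influence of the symbol r steps back.
    The first len(memory) symbols are drawn independently with mean_prob.
    """
    rng = random.Random(seed)
    depth = len(memory)
    seq = [int(rng.random() < mean_prob) for _ in range(min(depth, n))]
    for i in range(depth, n):
        p = mean_prob + sum(
            memory[r - 1] * (seq[i - r] - mean_prob) for r in range(1, depth + 1)
        )
        p = min(1.0, max(0.0, p))  # clip so p stays a valid probability
        seq.append(int(rng.random() < p))
    return seq


def autocovariance(seq, lag):
    """Empirical autocovariance C(lag) = mean(a_i * a_{i+lag}) - mean(a)^2."""
    n = len(seq) - lag
    mean = sum(seq) / len(seq)
    return sum(seq[i] * seq[i + lag] for i in range(n)) / n - mean ** 2
```

A positive memory function makes occurrences cluster, so the autocovariance stays positive over several lags. With `memory=[0.0]` the chain reduces to independent Bernoulli trials, whose autocovariance at positive lags is zero up to sampling noise.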

Blockchain in Smart Cities: Exploring Possibilities in Terms of Opportunities and Challenges

Journal of Data Analysis and Information Processing, Pub Date: 2019-08-15, DOI: 10.4236/JDAIP.2019.73008

Raed A. Salha, Maher A. El-Hallaq, Abdelkhalek I. Alastal

Citations: 24

Time Series Analysis of Energy Intensity, Value Added Tax and Corporate Income Tax: A Case Study of the Non-Ferrous Metal Industry, Jiangxi Province, China

Journal of Data Analysis and Information Processing, Pub Date: 2019-07-26, DOI: 10.4236/JDAIP.2019.73007

Wen-rong Pan, D. Lai, Yu Song, J. Follis

Abstract: Unprecedented industrialization and urbanization have led to poor energy efficiency in China. In response, the Chinese government has set goals to reduce energy consumption, which may include implementing new tax policies. In this paper, we investigate the relationship between energy intensity, an indicator of the efficiency of energy consumption, and two sources of government revenue in China: value-added tax (VAT) and corporate income tax. As a case study, we developed a Granger co-integration model to analyze the dynamic relationship among energy intensity, VAT, and corporate income tax in the non-ferrous metal industry of Jiangxi Province, China, between 1996 and 2010. Augmented Dickey-Fuller tests were used to validate the model. In our time series analyses, we found that, when controlling for corporate income tax, a one log unit increase in VAT resulted in a decrease of 1.17 log units in energy intensity. However, when controlling for VAT, a one log unit increase in corporate income tax resulted in an increase of 0.34 log units in energy intensity. Understanding the relationship between energy intensity and taxation in energy-intensive industries can greatly advance China's goal of reducing energy consumption. We believe our findings add to this ongoing discussion.

Citations: 1
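
The reported effects read most naturally as elasticities in a log-log relation. Assume natural logarithms and the long-run form below, with EI for energy intensity and CIT for corporate income tax. These symbols and the equation itself are illustrative assumptions, since the abstract does not give the estimated equation. Under these assumptions, a one-unit increase in ln VAT means multiplying VAT by e (about 2.718), and the coefficients translate into multiplicative changes in energy intensity as follows:

```latex
\ln \mathrm{EI}_t = \beta_0 + \beta_1 \ln \mathrm{VAT}_t + \beta_2 \ln \mathrm{CIT}_t + \varepsilon_t,
\qquad \hat\beta_1 = -1.17,\quad \hat\beta_2 = 0.34

\Delta \ln \mathrm{VAT} = 1 \;\Rightarrow\; \frac{\mathrm{EI}_{\text{new}}}{\mathrm{EI}_{\text{old}}} = e^{-1.17} \approx 0.310,
\qquad
\Delta \ln \mathrm{CIT} = 1 \;\Rightarrow\; \frac{\mathrm{EI}_{\text{new}}}{\mathrm{EI}_{\text{old}}} = e^{0.34} \approx 1.405
```

Equivalently, holding CIT fixed, a 1% increase in VAT is associated with roughly a 1.17% decrease in energy intensity. Holding VAT fixed, a 1% increase in CIT is associated with roughly a 0.34% increase.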

Road Traffic Accident Scenario, Pattern and Forecasting in Bangladesh

Journal of Data Analysis and Information Processing, Pub Date: 2019-03-04, DOI: 10.4236/JDAIP.2019.72003

S. Hossain, Omor Faruque

Abstract: The main aim of this research is to understand the scenario, injurious effects, and pattern of road traffic accidents (RTA) in Bangladesh. We also forecast the future magnitude of RTA so that decision makers can take appropriate precautions. The study provides an assessment of RTA in Bangladesh and their impact, based on data collected for the period 1971 to 2017, and attempts to identify the main causes of road accidents and to assess the severity of the situation. The general trends show that the numbers of RTA, deaths, and injuries increased gradually, with small fluctuations, from 1971 to 2007, and have followed a slow decreasing trend since 2007. Although the numbers of RTA and deaths have decreased in recent years, the ratio of deaths to accidents has increased significantly. The number of registered vehicles per 10,000 people increased moderately throughout the period, with a sharp increase from 2009. Dhaka division has the highest share of RTA (34%) and of RTA deaths (32%). Barisal and Sylhet divisions have the lowest share of RTA (4%), and Barisal division has the lowest share of RTA deaths (3%). The maximum number of injuries occurred between ages 21 and 30, while the maximum number of deaths occurred between ages 11 and 30. Most RTA and RTA deaths are caused by people being run over by vehicles and by head-on collisions. The severity of road accidents and the number of deaths are higher during festive periods, because people travel more frequently than usual. The time plot shows a decrease from 2012 to 2015 and an increase from 2015 to 2017. An additive time series model is applied, which includes estimation of the trend, seasonal variation, and random variation using the triple exponential smoothing method. We forecast RTA, eliminating the seasonal impact, for the next three consecutive years (2018-2020) with 95% confidence intervals using the Holt-Winters exponential smoothing technique.

Citations: 4
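
The abstract describes an additive model with trend, seasonal, and random components, estimated by triple exponential (Holt-Winters) smoothing. A minimal sketch of the additive Holt-Winters recursions follows. The simple initialization from the first two seasons, the fixed smoothing constants, and the season length `m` are assumptions. The 95% prediction intervals reported in the paper are not computed here.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing) point forecasts.

    level:  l_t = alpha*(y_t - s_{t-m}) + (1 - alpha)*(l_{t-1} + b_{t-1})
    trend:  b_t = beta*(l_t - l_{t-1}) + (1 - beta)*b_{t-1}
    season: s_t = gamma*(y_t - l_t) + (1 - gamma)*s_{t-m}
    h-step forecast: last level + h*last trend + latest seasonal term for that phase
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization from the first two seasons (one of several common choices).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]  # indexed by phase t % m
    for t, obs in enumerate(y):
        prev_season = season[t % m]
        prev_level = level
        level = alpha * (obs - prev_season) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * prev_season
    n = len(y)
    # Time n-1+h has phase (n-1+h) % m; reuse the most recent seasonal term for it.
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]
```

In practice the smoothing constants are chosen by minimizing in-sample one-step errors rather than being fixed by hand. Prediction intervals come from an assumed error model on top of these recursions.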

Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function

Journal of Data Analysis and Information Processing, Pub Date: 2019-03-04, DOI: 10.4236/JDAIP.2019.72004

Hiroshi Ogura, Hiromi Amano, Masato Kondo

Abstract: In this study, we regard written texts as time series data and investigate dynamic correlations of word occurrences using an autocorrelation function (ACF). After defining an ACF formula suitable for expressing the dynamic correlations of words, we use it to calculate ACFs for frequent words in 12 books. The ACFs obtained fall into two groups. One group shows dynamic correlations and is well described by a modified Kohlrausch-Williams-Watts (KWW) function; the other shows no correlations and is fitted by a simple step-down function. A word with the former type of ACF is called a Type-I word, and a word with the latter is called a Type-II word. We also show that the ACFs of Type-II words can be derived theoretically by assuming that the stochastic process governing word occurrence is a homogeneous Poisson point process. Based on the fitting of the ACFs by KWW and step-down functions, we propose a measure of word importance that expresses the extent to which a word is important in a particular text. The validity of the measure is confirmed using Kleinberg's burst detection algorithm.

Citations: 4
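
The idea of treating sentence number as discrete time can be sketched as follows. Each sentence yields a 0/1 indicator of whether the word occurs, and the ACF is computed on that indicator sequence. The normalization used here is an assumption: lag-averaged autocovariance divided by the overall variance. It may differ from the formula defined in the paper. The KWW fit and the Type-I/Type-II classification are not reproduced, and the function names are hypothetical.

```python
def occurrence_series(sentences, word):
    """0/1 indicator per sentence: does `word` occur in sentence i?

    `sentences` is a list of token lists (one list per sentence, in text order).
    """
    return [int(word in sentence) for sentence in sentences]


def word_acf(series, max_lag):
    """Normalized autocorrelation of a binary occurrence series for lags 0..max_lag."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of sentences")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        raise ValueError("word occurs in every sentence or in none")
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(pairs)) / pairs
        acf.append(cov / var)
    return acf
```

A word that clusters in particular chapters or sections yields an ACF that decays slowly with lag. A word scattered as in a homogeneous Poisson process yields values near zero at every positive lag, which matches the step-down shape described for Type-II words.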

Technology Foresight Research of Industrial Robot Based on Patent Analysis

Journal of Data Analysis and Information Processing, Pub Date: 2019-03-04, DOI: 10.4236/JDAIP.2019.72005

Xionghui Wen

Citations: 3

Ordinal Outcome Modeling: The Application of the Adaptive Moment Estimation Optimizer to the Elastic Net Penalized Stereotype Logit

Journal of Data Analysis and Information Processing, Pub Date: 2019-02-22, DOI: 10.4236/jdaip.2019.71002

A. Williams

Citations: 1