{"title":"Joint Characterization of Multiscale Information in High Dimensional Data","authors":"D. Sousa, C. Small","doi":"10.54364/AAIML.2021.1113","DOIUrl":"https://doi.org/10.54364/AAIML.2021.1113","url":null,"abstract":"High dimensional feature spaces can contain information onmultiple scales. At global scales, spanning an entire feature space, covariance structure among dimensions can determine topology and intrinsic dimensionality. In addition, local scale information can be captured by the structure of low-dimensionalmanifolds embeddedwithin the high-dimensional feature space. Such manifolds may not easily be resolved by the global covariance structure. Analysis tools that preferentially operate at one scale can be ineffective at capturing all the information present in cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed Stochastic Neighbor Embedding (t-SNE) to characterize local manifold structure, also comparing against a second approach for characterization of local manifold structure, Laplacian Eigenmaps (LE). Using both low dimensional synthetic images and high dimensional imaging spectroscopy data, we show that joint characterization is capable of detecting and isolating signals which are not evident from either algorithm alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local manifolds (clustering), and PCA renders this map interpretable by providing global, physically meaningful structure. LE provides additional useful context by reinforcing and refining the feature space topology found by PCA, simplifying structural interpretation, clarifying endmember identification and highlighting new potential endmembers which are not evident from other methods alone. This approach is illustrated using hyperspectral imagery of agriculture resolving crop-specific, field scale, differencesin vegetation reflectance. The fundamental premise of joint characterization could easily be extended to other high dimensional datasets, including image time series and nonimage data. The approach may prove particularly useful for other geospatial data since both robust manifold structure (due to spatial autocorrelation) and physically interpretable global variance structure (due to physical generative processes) are frequently present.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126146331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Mid-Market Strategies for Big Data Governance","authors":"Ken Knapton","doi":"10.21203/rs.3.rs-114534/v1","DOIUrl":"https://doi.org/10.21203/rs.3.rs-114534/v1","url":null,"abstract":"\u0000 Many data scientists are struggling to adopt effective data governance practices as they transition from traditional data analysis to big data analytics. Data governance of big data requires new strategies to deal with the volume, variety, and velocity attributes of big data. The purpose of this qualitative multiple case study was to explore big data governance strategies employed by data scientists to provide a holistic perspective of those data for making decisions. The participants were 10 data scientists employed in multiple mid-market companies in the greater Salt Lake City, Utah area who have strategies to govern big data. This study’s data collection included semi-structured in-depth individual interviews (n = 10) and analysis of process documentation relating to big data governance in those organizations (n = 4). Through thematic analysis, 4 major themes emerged from the study: ensuring business centricity, striving for simplicity, establishing data source protocols, and designing for security. The strategies outlined in this study can lead to positive social change by proactively addressing the ethical use of personally identifiable information in big data. By implementing strategies relating to the segregation of duties, encryption of data, and personal information, data scientists can mitigate contemporary concerns relating to the use of private information in big data analytics.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122547565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InkGAN: Generative Adversarial Networks for Ink-And-Wash Style Transfer of Photographs","authors":"Keyi Yu, Yu Wang, Sihan Zeng, Chendi Liang, Xiaoyu Bai, Dachi Chen, Wenping Wang","doi":"10.54364/aaiml.2023.1171","DOIUrl":"https://doi.org/10.54364/aaiml.2023.1171","url":null,"abstract":"In this work, we present a novel approach for Chinese Ink-and-Wash style transfer using a GAN structure. The proposed method incorporates a specially designed smooth loss tailored for this style transfer task, and an end-to-end framework that seamlessly integrates various components for efficient and effective image style transferring. To demonstrate the superiority of our approach, comparative results against other popular style transfer methods such as CycleGAN is presented. The experimentation showcased the notable improvements achieved with our proposed method in terms of preserving the intricate details and capturing the essence of the Chinese Ink-and-Wash style. Furthermore, an ablation study is conducted to evaluate the effectiveness of each loss component in our framework. We conclude in the end and anticipate that our findings will inspire further advancements in this domain and foster new avenues for artistic expression in the digital realm.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122835581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification and Counting of Sorghum Panicles Using Artificial Intelligence Based Drone Field Phenotyping","authors":"M. Mbaye, A. Audebert","doi":"10.54364/aaiml.2021.1115","DOIUrl":"https://doi.org/10.54364/aaiml.2021.1115","url":null,"abstract":"One of the most promising and difficult challenges for field phenotyping is accurate and reliable counting of sorghum panicles using drone imagery both from RGB and multispectral cameras. In this paper, we present a hybrid Machine Learning method for sorghum panicle identification and counting.The methodology first consists in building a Machine Learning classifier following the two most used methods in the literature for drone and agriculture applications: Support Vector Machine Learning (SVM) and, Artificial Neural Networks (ANN). The present dataset includes 5300 images, and 60% of the dataset were used for training and 20% for testing and validation. Following the results obtained from these models, image segmentation using super-pixel affinity propagation and k-means clustering was used based on simple linear iterative clustering. With an accuracy of 99%, SVM gave a superior performance also in terms of precision and kappa when compared to the ANN model whose accuracy was 98%. Concerning the SVM, a radial basis kernel was used, and the sigma parameter was kept constant at a value of 5.6 determined analytically.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122035544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning to Detect Age From Handwriting","authors":"Najla AL-Qawasmeh, C. Suen","doi":"10.54364/aaiml.2022.1126","DOIUrl":"https://doi.org/10.54364/aaiml.2022.1126","url":null,"abstract":"Handwriting analysis is the science of determining an individual’s personality from his or her handwriting by assessing features such as slant, pen pressure, word spacing, and other factors. Handwriting analysis has a wide range of uses and applications, including dating and socialising, roommates and landlords, business and professional, employee hiring, and human resources. This study used the ResNet and GoogleNet CNN architectures as fixed feature extractors from handwriting samples. SVM was used to classify the writer’s gender and age based on the extracted features. We built an Arabic dataset named FSHS to analyse and test the proposed system. In the gender detection system, applying the automatic feature extraction method to the FSHS dataset produced accuracy rates of 84.9% and 82.2% using ResNet and GoogleNet, respectively. While the age detection system using the automatic feature extraction method achieved accuracy rates of 69.7% and 61.1% using ResNet and GoogleNet, respectively","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129788223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Brain Is Processing Information: Then, Why Does Research Into Human Brain Disorders So Firmly Avoids Information Processing Modeling?","authors":"E. Diamant","doi":"10.54364/aaiml.2022.1142","DOIUrl":"https://doi.org/10.54364/aaiml.2022.1142","url":null,"abstract":"Brain disorders are a rapidly growing global health problem affecting millions of people worldwide. To date, however, no effective problem treatment is available because the efforts are directed not at the possible roots of disorders (which are still unknown) but only at the disorders’ symptoms. To better understand disorders’ causes and their underlying mechanisms, a large amount of scientific research is focused on developing disease models that rely on biological mechanisms taken from various fields of knowledge, such as genetics, molecular biology, neural and behavioral processes. It is extremely surprising that information processing mechanisms are never mentioned in this regard. (despite the fact that the dictum “The brain is processing information” is widely spread and generally accepted in the research community). A possible answer might be – the research community does not know “what information is”. To reverse this bizarre situation, I introduced my own definition of information. In biological systems information is represented as text strings written with nucleotide letters and amino acid signs. which makes information a physical entity with distinctive physical properties: length, weight, structure. Consequently, brain information processing is regarded as a chain of interconnected neurons (neuron network) with information flowing between successive network stages. In the course of information processing, only part of the processed information is advanced within the network. The remaining (not used) part of the information must be destroyed, demolished, and led out from the neuron for further recycling and utilization. Nature has provided the brain with genetic mechanisms for such “information waste” processing and utilization. But over time and especially in course of human aging such mechanisms become damaged and dysfunctional. Consequently, neurons clogged with the “information waste” become damaged and dysfunctional. And the brain disorders mounting. “Genetical engineering” can serve us as a remedy and the answer to neuronal disorders expansion.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"232 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131222252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why Cauchy Membership Functions: Reliability","authors":"Javier Viaña, Stephan Ralescu, Kelly Cohen, V. Kreinovich, A. Ralescu","doi":"10.54364/aaiml.2022.1125","DOIUrl":"https://doi.org/10.54364/aaiml.2022.1125","url":null,"abstract":"An important step in designing a fuzzy system is the elicitation of the membership functions for the fuzzy sets used. Often the membership functions are obtained from data in a traininglike manner. They are expected to match or be at least compatible with those obtained from experts knowledgeable of the domain and the problem being addressed. In cases when neither are possible, e.g., insufficient data or unavailability of experts, we are faced with the question of hypothesizing the membership function. We have previously argued in favor of Cauchy membership functions (thus named because their expression is similar to that of the Cauchy distributions) and supported this choice from the point of view of efficiency of training. This paper looks at the same family of membership functions from the point of view of reliability","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"430 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124235827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical Evidence That the Power of Artificial Neural Networks Limits Strong AI","authors":"R. Englert, J. Muschiol","doi":"10.54364/aaiml.2022.1122","DOIUrl":"https://doi.org/10.54364/aaiml.2022.1122","url":null,"abstract":"A famous definition of AI is based on the terms weak and strong AI from McCarthy. An open question is the characterization of these terms, i.e., the transition from weak to strong. Nearly no research results are known for this complex and important question. In this paper we investigate how the size and structure of a Neural Network (NN) limits the learnability of a training sample, and thus, can be used to discriminate weak and strong AI (domains). Furthermore, the size of the training sample is a primary parameter for the training effort estimation with the big O function. The needed training repetitions may also limit the learning tractability and will be investigated. The results are illustrated with an analysis of a feedforward NN and a training sample for language with 1,000 words including the effort for the training repetitions.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124606240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning to support geographical origin traceability of Coffea Arabica","authors":"E. A. N. Fernandes, G. A. Sarriés, Yuniel T. Mazola, Robson C. de Lima, Gustavo N. Furlan, M. Bacchi","doi":"10.54364/aaiml.2022.1118","DOIUrl":"https://doi.org/10.54364/aaiml.2022.1118","url":null,"abstract":"The species, variety and geographic origin of coffee directly influence the characteristics of the coffee beans and, consequently, the quality of the beverage. The added economic value that these features bring to the product has boosted the use of non-designative tools for authentication purposes. In this work, the feasibility of implementing a traceability system for Arabica coffee by country of origin was investigated using quality attributes and supervised machine learning algorithms: Multilayer Perceptron (MLP), Random Forest (RF), Random Tree (RT) and Sequential Minimal Optimization (SMO). We use an available database containing quality parameters for coffee beans produced in 15 countries, including the largest exporters and importers. Overall, Ethiopia, Kenya and Uganda had the highest coffee quality index (Total Cup Points). Differences between countries were found with 99% confidence using Robusta Multivariate Data Science with original data and 98% accuracy using Bootstrapping resampling method and Supervised Machine Learning algorithms. The model obtained by RF provided the best classification accuracy. The most important attributes to discriminate Arabica coffee by country of origin, in descending order, were body, moisture, total cup points, cupper points, acidity, aftertaste, flavor, aroma, balance, sweetness and uniformity. The coffee variety proved to be a promising variable to increase accuracy and can be incorporated among the quality attributes for classification and grading of coffee beans.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124767121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why Learning and Machine Learning Are Different","authors":"J. Végh, Ádám-József Berki","doi":"10.54364/aaiml.2021.1109","DOIUrl":"https://doi.org/10.54364/aaiml.2021.1109","url":null,"abstract":"Machine learning intends to be a biology-mimicking learningmethod, implemented bymeans of technical computing. Their technology and methods, however, differ very much; mainly because technological computing is based on the time-unaware classic computing paradigm. Based on the time-aware computing paradigm, the paper discovers the mechanism of biological information storing and learning; furthermore, it explains, why biological and technological information handling and learning are entirely different. The consequences of the huge difference in transmission speed in those computing systems may remain hidden in “toy”-level technological systems but comes to the light in systems having large size and/or mimicking neuronal operations. The biology-mimicking technological operations are in resemblance to the biological operations only when using time-unaware computing paradigm. The difference leads also to the need of introducing “training” mode (with desperately low efficiency) in technological learning, while biological systems have the ability of life-long learning. It is at least misleading to use technological learning methods to complement biological learning studies. The examples show evidence for the effect of transmission time in published experiments.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125674729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}