{"title":"An Accessible Python based Author Identification Process","authors":"Anthony F. Breitzman","doi":"10.25080/gerudo-f2bc6f59-003","DOIUrl":"https://doi.org/10.25080/gerudo-f2bc6f59-003","url":null,"abstract":"—Author identification also known as ‘author attribution’ and more recently ‘forensic linguistics’ involves identifying true authors of anonymous texts. The Federalist Papers are 85 documents written anonymously by a combination of Alexander Hamilton, John Jay, and James Madison in the late 1780’s supporting adoption of the American Constitution. All but 12 documents have confirmed authors based on lists provided before the author’s deaths. Mosteller and Wallace in 1963 provided evidence of authorship for the 12 disputed documents, however the analysis is not readily accessible to non-statisticians. In this paper we replicate the analysis but in a much more accessible way using modern text mining methods and Python. One surprising result is the usefulness of filler-words in identifying writing styles. The method described here can be applied to other authorship questions such as linking the Unabomber manifesto with Ted Kaczynski, identifying Shakespeare’s collaborators, etc. Although the question of authorship of the Federalist Papers has been studied before, what is new in this paper is we highlight a process and tools that can be easily used by Python programmers, and the methods do not rely on any knowledge of statistics or machine learning.","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116938521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"causal-curve: tools to perform causal inference given a continuous treatment","authors":"R. Kobrosly","doi":"10.25080/majora-1b6fd038-01d","DOIUrl":"https://doi.org/10.25080/majora-1b6fd038-01d","url":null,"abstract":"","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129722150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Statistics with Python, No Resampling Necessary","authors":"C. Lindsey","doi":"10.25080/gerudo-f2bc6f59-005","DOIUrl":"https://doi.org/10.25080/gerudo-f2bc6f59-005","url":null,"abstract":"—TensorFlow Probability is a powerful library for statistical analysis in Python. Using TensorFlow Probability’s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates and a quantified degree of belief in the results. Resampling methods like Markov Chain Monte Carlo can also be used to perform Bayesian analysis. As an alternative, we show how to use numerical optimization to estimate model parameters, and then show how numerical differentiation can be used to get a quantified degree of belief. How to perform simulation in Python to corroborate our results is also demonstrated.","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123868103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EEG-to-fMRI Neuroimaging Cross Modal Synthesis in Python","authors":"David Calhas","doi":"10.25080/gerudo-f2bc6f59-007","DOIUrl":"https://doi.org/10.25080/gerudo-f2bc6f59-007","url":null,"abstract":"—Electroencepholography (EEG) and functional magnetic resonance imaging (fMRI) are two ways of recording brain activity; the former provides good time resolution but poor spatial resolution, while the converse is true for the latter. Recently, deep neural network models have been developed that can synthesize fMRI activity from EEG signals, and vice versa. Because these generative models simulate data, they make it easier for neuroscientists to test ideas about how EEG and fMRI signals relate to each other, and what both signals tell us about how the brain controls behavior. To make it easier for researchers to access these models, and to standardize how they are used, we developed a Python package, EEG-to-fMRI, which provides cross modal neuroimaging synthesis functionalities. This is the first open source software enabling neuroimaging synthesis. Our main focus is for this package to help neuroscience, machine learning, and health care communities. This study gives an in-depth description of this package, along with the theoretical foundations and respective results.","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130255900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NumPy – Annual Update","authors":"Inessa Pawson","doi":"10.25080/majora-1b6fd038-026","DOIUrl":"https://doi.org/10.25080/majora-1b6fd038-026","url":null,"abstract":"","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131178499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inside NumPy: Preparing for the next decade","authors":"R. Gommers, Sebastian Berg, Matti Picus, Tyler Reddy, S. Walt, Charles R. Harris","doi":"10.25080/majora-7ddc1dd1-01d","DOIUrl":"https://doi.org/10.25080/majora-7ddc1dd1-01d","url":null,"abstract":"","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132688651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Python to Model Biomass Pyrolysis Reactors","authors":"G. Wiggins","doi":"10.25080/MAJORA-7DDC1DD1-01A","DOIUrl":"https://doi.org/10.25080/MAJORA-7DDC1DD1-01A","url":null,"abstract":"","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115039856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Microsimulation and Activity Allocation in Python: An Update on the Likeness Toolkit","authors":"Joseph V. Tuccillo, James D. Gaboardi","doi":"10.25080/gerudo-f2bc6f59-00c","DOIUrl":"https://doi.org/10.25080/gerudo-f2bc6f59-00c","url":null,"abstract":"—Understanding human security and social equity issues within human systems requires large-scale models of population dynamics that simulate high-fidelity representations of individuals and access to essential activities (work/school, social, errands, health). Likeness is a Python toolkit that provides these capabilities for Oak Ridge National Laboratory’s (ORNL) UrbanPop spatial microsimulation project. In step with the initial development phase for Likeness (2021 - 2022), we built out several foundational examples of work/school and health service access. In this paper, we describe expansion and scaling of Likeness capabilities to metropolitan areas in the United States. We then provide an integrated demonstration of our methods based on a case study of Leon County, FL and perform validation exercises on 1) neighborhood demographic composition and 2) visits by demographic cohorts (gender/age) obtained from point of interest (POI) footfall data for essential services (grocery stores). Taking into account lessons learned from our case study, we scope improvements to our model as well as provide a roadmap of the anticipated Likeness development cycle into 2023 - 2024.","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121960650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Reduction Network","authors":"Haoyin Xu, Haw-minn Lu, J. Unpingco","doi":"10.25080/gerudo-f2bc6f59-012","DOIUrl":"https://doi.org/10.25080/gerudo-f2bc6f59-012","url":null,"abstract":"—Multidimensional categorical data is widespread but not easily visualized using standard methods. For example, questionnaire (e.g. survey) data generally consists of questions with categorical responses (e.g., yes/no, hate/dislike/neutral/like/love). Thus, a questionnaire with 10 questions, each with five mutually exclusive responses, gives a dataset of 5 10 possible observations, an amount of data that would be hard to reasonably collect. Hence, this type of dataset is necessarily sparse. Popular methods of handling categorical data include one-hot encoding (which exacerbates the dimensionality problem) and enumeration, which applies an unwarranted and potentially misleading notional order to the data. To address this, we introduce a novel visualization method named Data Reduction Network (DRN). Using a network-graph structure, the DRN denotes each categorical feature as a node with interrelationships between nodes denoted by weighted edges. The graph is statistically reduced to reveal the strongest or weakest path-wise relationships between features and to reduce visual clutter. A key advantage is that it does not “lose” features, but rather represents interrelationships across the entire categorical feature set without eliminating weaker relationships or features. Indeed, the graph representation can be inverted so that instead of visualizing the strongest interrelationships, the weakest can be surfaced. The DRN is a powerful visualization tool for multi-dimensional categorical data and in particular data derived from surveys and questionaires.","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"136-137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117146549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualize 3D scientific data in a Pythonic way like matplotlib","authors":"T. Koyama","doi":"10.25080/majora-1b6fd038-01c","DOIUrl":"https://doi.org/10.25080/majora-1b6fd038-01c","url":null,"abstract":"","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129149854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}