An XML application for genomic data interoperation

K. Cheung, Yang Liu, Anuj Kumar, M. Snyder, M. Gerstein, P. Miller
Proceedings 2nd Annual IEEE International Symposium on Bioinformatics and Bioengineering (BIBE 2001)
DOI: 10.1109/BIBE.2001.974417 (https://doi.org/10.1109/BIBE.2001.974417)
Publication date (as listed): 2001-03-04
Citations: 2

Abstract

As the eXtensible Markup Language (XML) becomes a popular, if not standard, language for exchanging data over the Internet/Web, a growing number of genome Web sites make their data available in XML format. Publishing genomic data in XML is of limited use, however, without software applications that can take advantage of XML technology to process these data. This paper illustrates the usefulness of XML in representing and interoperating genomic data between two different data sources (Snyder's laboratory at Yale and SGD at Stanford). In particular, we compare the locations of transposon insertions in yeast DNA sequences, as identified by BLAST searches, with the chromosomal locations of the yeast open reading frames (ORFs) stored in SGD. Such a comparison allows us to characterize the transposon insertions by indicating whether they fall into any ORFs (which may encode proteins that possess essential biological functions). To implement this XML-based interoperation, we used NCBI's "blastall" (which offers an XML output option) and SGD's yeast nucleotide sequence dataset to establish a local BLAST server. We also converted SGD's ORF location data file (which is available in tab-delimited format) into an XML document based on the BIOML (BIOpolymer Markup Language) standard.
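
The two steps described above, converting tab-delimited ORF locations into XML and then testing whether an insertion position falls inside an ORF, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the element and attribute names (`orfs`, `orf`, `name`, `chromosome`, `start`, `end`) are hypothetical and are not taken from the BIOML vocabulary, the column layout of the input is assumed, and the coordinates are invented rather than drawn from SGD.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tab-delimited ORF rows: name, chromosome, start, end.
# The names and coordinates are invented for illustration, not SGD data.
ORF_TSV = "ORF_A\t1\t100\t400\nORF_B\t1\t900\t600\nORF_C\t2\t50\t250\n"


def orfs_to_xml(tsv_text):
    """Convert tab-delimited ORF rows into a small XML tree."""
    root = ET.Element("orfs")
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for name, chrom, start, end in rows:
        ET.SubElement(root, "orf", name=name, chromosome=chrom,
                      start=start, end=end)
    return root


def orfs_hit(root, chrom, position):
    """Return names of ORFs on `chrom` whose span contains `position`."""
    hits = []
    for orf in root.iter("orf"):
        if orf.get("chromosome") != str(chrom):
            continue
        # Assumption: a file may list some ORFs with start > end
        # (for example on the reverse strand), so sort the pair first.
        low, high = sorted((int(orf.get("start")), int(orf.get("end"))))
        if low <= position <= high:
            hits.append(orf.get("name"))
    return hits


orf_doc = orfs_to_xml(ORF_TSV)
print(ET.tostring(orf_doc, encoding="unicode"))
print(orfs_hit(orf_doc, 1, 750))  # position inside ORF_B's span
print(orfs_hit(orf_doc, 2, 300))  # position outside every ORF
```

In a real pipeline the insertion positions would come from parsed BLAST XML output rather than hard-coded integers, and the ORF document would follow whatever schema the chosen standard prescribes.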