Identifying candidate genes using the BioWarehouse: a case study
Pouliot, Y.; Lee, T.J.; Wagner, V.; Karp, P.D.
Systems Engineering, 2005. ICSEng 2005. 18th International Conference on
16-18 Aug. 2005, Page(s): 332-340
Digital Object Identifier: 10.1109/ICSENG.2005.47
Summary: The BioWarehouse is an open source data warehousing environment focused on supporting bioinformatics databases (DBs). Operating on the MySQL or Oracle relational database management systems (RDBMSs), BioWarehouse integrates public source DBs such as Swiss-Prot and GenBank into a unified normalized schema operating under a single DB management system. BioWarehouse also imposes partial semantic normalization on the source data, thus decreasing semantic heterogeneity and facilitating multi-DB queries using the Structured Query Language (SQL). As an application case study of the BioWarehouse, we have identified candidate genes for "orphan" activities, defined as activities for which no cognate gene sequences exist. Of the enzymatic activities that have been assigned an enzyme commission (EC) number, 1,356 (36%) are orphans (Karp, 2004). Such high prevalence is problematic, given that many of these activities have been known for decades and often perform essential functions. Most notably, the existence of orphans introduces gaps in sequence data that significantly limit the accuracy of genome annotation and metabolic pathway prediction. Fortunately, with more than 200 genomes sequenced to completion, and with the availability of systems such as BioWarehouse, the computational identification of candidate genes associated with orphan activities can be envisioned. The BioWarehouse's conglomeration of databases, combined with Oracle 10g's native integration of analytical tools into SQL queries (such as the basic local alignment search tool (BLAST) and POSIX regular expressions), enabled us to identify a small number of high-confidence candidate genes associated with a specific orphan activity. We describe the complex queries used in this work to illustrate the value of the data warehousing approach to bioinformatics research.
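
The idea of an orphan activity as "an activity with no associated sequence" maps naturally onto an anti-join once activity and sequence records sit in one relational schema. The sketch below is only an illustration of that query pattern, not the queries described in the paper. It uses Python's built-in sqlite3 module rather than MySQL or Oracle, and the table names (activity, protein_sequence), column names, identifiers (EC-A, SEQ1, db_one) and all rows are hypothetical placeholders, not the BioWarehouse schema or real data.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a hypothetical two-table schema.

    Assumption: these tables and rows are invented for illustration and do
    not reflect the actual BioWarehouse schema or any real EC assignments.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activity (
            ec_number TEXT PRIMARY KEY,
            name      TEXT
        );
        CREATE TABLE protein_sequence (
            accession TEXT PRIMARY KEY,
            ec_number TEXT REFERENCES activity(ec_number),
            source_db TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?)",
        [
            ("EC-A", "activity A"),
            ("EC-B", "activity B"),
            ("EC-C", "activity C"),
            ("EC-D", "activity D"),
        ],
    )
    # Sequences loaded from two made-up source databases into one table.
    conn.executemany(
        "INSERT INTO protein_sequence VALUES (?, ?, ?)",
        [
            ("SEQ1", "EC-A", "db_one"),
            ("SEQ2", "EC-A", "db_two"),
            ("SEQ3", "EC-C", "db_one"),
        ],
    )
    return conn


def find_orphan_activities(conn):
    """Return identifiers of activities that have no sequence row at all."""
    rows = conn.execute(
        """
        SELECT a.ec_number
        FROM activity AS a
        LEFT JOIN protein_sequence AS p ON p.ec_number = a.ec_number
        WHERE p.accession IS NULL   -- keep only activities with no match
        ORDER BY a.ec_number
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_toy_warehouse()
    orphans = find_orphan_activities(conn)
    total = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
    print(orphans, f"{len(orphans)}/{total} activities are orphans")
```

Because every source is loaded into the same schema, a single LEFT JOIN covers sequences from all of them at once; without a warehouse, the same check would need a separate lookup against each source database.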
