Taxonomic Database

The taxonomic database used is an Illumina-curated version of the May 2013 release of the Greengenes Consortium Database (greengenes.secondgenome.com/downloads). Here are the current statistics for that database:
To get taxonomies down to the species level, we used the Greengenes SQL database files (gg_13_5.sql.gz). Specifically, our database started with everything contained in the Greengenes clones, isolates, and symbionts tables. From there, we applied a set of filters:
The Greengenes database had a number of classifications placed in the wrong field, for example improper genus or species names, or clone or strain IDs placed in the species field. We developed a program to help identify and clean up these entries. Ambiguous epithets and classifications (sp, aff, cf, genosp, genomosp) were removed because they effectively mean the same thing as an empty taxonomic level. Listeria monocytogenes (GenBank entry X56153.1), Listeria innocua (GenBank entry FJ774235.1), and PhiX (NCBI reference sequence: NC_001422) were added to the database to support internal research projects.
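The removal of ambiguous epithets can be illustrated with a short Python sketch. This is not the cleanup program mentioned above, whose details are not described here. The function name `clean_species` and the token-matching rule are assumptions made for illustration. Only the list of epithets comes from the text.

```python
# Ambiguous epithets listed in the text. A species field that contains
# one of them is treated the same as an empty taxonomic level.
AMBIGUOUS_EPITHETS = {"sp", "aff", "cf", "genosp", "genomosp"}


def clean_species(species):
    """Return the species string, or "" if it holds an ambiguous epithet.

    Hypothetical helper: the matching rule (whitespace-separated tokens,
    case-insensitive, trailing periods ignored) is an assumption.
    """
    # Normalize each token so that "sp." and "SP" both match "sp".
    tokens = [token.rstrip(".").lower() for token in species.split()]
    if any(token in AMBIGUOUS_EPITHETS for token in tokens):
        return ""
    return species.strip()
```

Under these assumptions, a field such as `Bacillus sp.` is blanked, while `Listeria monocytogenes` passes through unchanged.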
© 2014 Illumina, Inc. All rights reserved.
15055861 A