Metaphlan2 marker database used by curatedMetagenomicData
grp2009 • 0
@grp2009-11057

Which version of the MetaPhlAn2 marker database was used to generate the marker and bug abundance data in curatedMetagenomicData? I am looking for some species that are represented in v20 of the MetaPhlAn2 marker database but can't find them in the curatedMetagenomicData package. If an earlier marker database was used, are there plans to rerun the analyses using a more complete database?

@levi-waldron-3429
CUNY Graduate School of Public Health a…

We used MetaPhlAn 2.0, with ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic). Note that each dataset shows only clades that were present in at least one specimen. Are you sure that the species you are looking for is present in at least one dataset in curatedMetagenomicData? All species in the phylogeny are available as follows:

suppressPackageStartupMessages(library(curatedMetagenomicData))
suppressPackageStartupMessages(library(ape))
tree <- getMetaphlanTree()
length(tree$tip.label)
#> [1] 4088
head(tree$tip.label)
#> [1] "s__Methanopyrus_kandleri"
#> [2] "s__Methanothermus_fervidus"
#> [3] "s__Methanothermobacter_thermautotrophicus"
#> [4] "s__Methanothermobacter_marburgensis"
#> [5] "s__Methanobacterium_formicicum"


Alternatively, you can get the complete list of row names directly from the datasets, including all levels of taxonomy. This list may be missing anything from the above tree that was not present in any dataset.

suppressPackageStartupMessages(library(curatedMetagenomicData))
suppressPackageStartupMessages(library(magrittr))
suppressMessages(res <- curatedMetagenomicData("*metaphlan*", dryrun = FALSE))
all <- lapply(res, rownames) %>%
  unlist() %>%
  unique()
length(all)
#> [1] 4690
head(all)
#> [1] "k__Bacteria"                   "k__Viruses"
#> [3] "k__Bacteria|p__Proteobacteria" "k__Bacteria|p__Actinobacteria"
#> [5] "k__Bacteria|p__Firmicutes"     "k__Viruses|p__Viruses_noname"
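
To check whether one particular species appears among those row names, something along these lines might work. This is only a sketch in base R: `all_taxa` is a small made-up stand-in for the `all` vector built above, and it assumes the row names are "|"-delimited lineages whose last field is the clade itself, as in the output shown.

```r
# Toy stand-in for the `all` vector built above (illustrative values only).
all_taxa <- c(
  "k__Bacteria",
  "k__Bacteria|p__Proteobacteria",
  "k__Archaea|p__Euryarchaeota|c__Methanopyri|o__Methanopyrales|f__Methanopyraceae|g__Methanopyrus|s__Methanopyrus_kandleri"
)

# The last "|"-delimited field is the clade that the row describes.
leaf <- sub(".*\\|", "", all_taxa)

# Keep species-level rows only (leaf clade starts with "s__").
species <- unique(leaf[startsWith(leaf, "s__")])

"s__Methanopyrus_kandleri" %in% species  # TRUE for this toy vector
"s__Escherichia_coli" %in% species       # FALSE for this toy vector
```

With the real data, replacing `all_taxa` with `all` should give the set of species detected in at least one dataset.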

I can add that I've confirmed there have been no changes to the MetaPhlAn2 database since the version that was used to generate curatedMetagenomicData. We'll do a complete update after MetaPhlAn3 is released in the coming months.

Thanks Levi. A few questions:

- Are viruses not included in the output of getMetaphlanTree? If I run it (with simplify=F to show full taxonomy), I see Archaea and Bacteria but no Viruses.

- Is it possible to distinguish an organism that was not found in a sample from an organism that was not looked for? This is where it would be useful to know which Metaphlan marker database version was used (v20?).


- You're right about viruses not being included in the phylogenetic tree. That's because the tree was generated by PhyloPhlAn, which builds a microbial tree of life using 400 universal proteins (bacterial and archaeal proteins, that is): https://bitbucket.org/nsegata/phylophlan/wiki/Home. More generally, although I'm not an evolutionary biologist, I understand that "Viruses cannot be included in the tree of life because they do not share characteristics with cells, and no single gene is shared by all viruses or viral lineages. While cellular life has a single, common origin, viruses are polyphyletic – they have many evolutionary origins." (taken from http://www.virology.ws/2009/03/19/viruses-and-the-tree-of-life/).

- The marker database version reflected in all cMD versions up to now is mpa_v20_m200.
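
Knowing the database version, "not found" and "not looked for" could in principle be told apart by comparing three sets. The following is a rough sketch only: `db_species` is a hypothetical vector of species covered by the marker database (how to extract it from mpa_v20_m200 is not shown here), `detected_species` is a hypothetical vector of species seen in at least one dataset, and `classify_species` is an illustrative helper, not part of curatedMetagenomicData.

```r
# Hypothetical helper: label each queried species by where it appears.
classify_species <- function(query, db_species, detected_species) {
  ifelse(
    query %in% detected_species,
    "detected in at least one sample",
    ifelse(
      query %in% db_species,
      "looked for, not detected",
      "not in marker database"
    )
  )
}

# Toy inputs (illustrative values only, not real database contents).
db_species <- c("s__Methanopyrus_kandleri", "s__Methanothermus_fervidus")
detected_species <- c("s__Methanopyrus_kandleri")
query <- c(
  "s__Methanopyrus_kandleri",
  "s__Methanothermus_fervidus",
  "s__Hypothetical_species"
)

status <- classify_species(query, db_species, detected_species)
data.frame(species = query, status = status)
```

A species labelled "not in marker database" could never show up in the abundance tables, whereas "looked for, not detected" means markers existed but no sample contained them.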


Excellent, thanks!