Is edgeR suitable for differential abundance of UNIref features from metatranscriptomics from HUMANnN2?
1
0
Entering edit mode
Nick • 0
@nick-24226
Last seen 3.4 years ago

Hi,

My question is whether the statistical methods used by edgeR are suitable for detecting significant differences in abundance of UNIref protein family features as generated by HUMAnN2?

HUMAnN2 maps all the microbial (non-host) reads from my metatranscriptome samples onto the uniref protein families, and outputs the counts for each family, normalised to average gene family length (in RPK units).

I have a continuous metadata variable which I would like to correlate with gene family abundance using edgeR/Limma-Voom. I can't see why this wouldn't be a suitable method, but would welcome outside input on this.

HUMAnN also recommends additional tools for normalising data to library size. I wouldn't normally do any normalisation before importing read counts into edgeR though, so I'm thinking this isn't necessary, as it's handled by edgeR normalisation. Would anyone agree with this?

Finally, the data from HUMMAnN comes normalised to "gene length". Would it be better to try to recalculate read counts from the RPK output (i.e. multiple by mean length of gene family) before importing into edgeR, as edgeR input is assumed to be non-normalised, or would it be possible to get away without doing this (and just don't try any kind of gene length normalisation in edgeR.

All input on this is much appreciated

edgeR HUMAnN2.0 • 1.2k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

I'm the senior author of the edgeR project. The HUMAnN2.0 data is not suitable for input to edgeR as it comes out. However I've been in discussions with the HUMANnN2 Lab recently about how we might extract data suitable for edgeR from the HUMANnN2 calculations. So it is something we are exploring, an area of current research.

While not ideal, the best approach for the time being might be a limma-trend (not limma-voom) analysis on log(RPK + 0.01) where RPK is the HUMANnN2 relative abundance values.

ADD COMMENT
0
Entering edit mode

Thanks so much for this qualified answer! Very promising to hear that this is being discussed. I would certainly think HUMANnN2 -> edgeR would be a very intuitive and powerful way to analyse this type of data.

Thanks for your suggestion, I will try this approach

ADD REPLY
0
Entering edit mode

Would you include a library size (RPKM) normalisation in this, or just log(RPK + 0.01) as you suggest?

My code is either:

# load packages

library(edgeR)
library(limma)

# No lib size normalisation

# normalise data 

data = log2(data + 0.01)

# generate model matrix

mmp <- model.matrix(~meta$Parab)

# fit linear model

lmfitp <- lmFit(data, mmp)
lmfitp <- eBayes(lmfitp, trend=TRUE)

OR

# read data into edgeR

dds <- DGEList(data)

# normalisation with library size

logCPM = cpm(dds, log=T, prior.count = 0.01)

# generate model matrix

mmp <- model.matrix(~meta$Parab)

# fit linear model

lmfitp <- lmFit(logCPM, mmp)
lmfitp <- eBayes(lmfitp, trend=TRUE)

It seems more sensible to normalise to library size

ADD REPLY
0
Entering edit mode

I mean the first. As far as I know, whatever comes out of HUMAnN2 is already normalized for library size. Try computing column sums to see. In any case, you can't use cpm or DGEList (or any edgeR functions) because you don't have counts.

ADD REPLY
0
Entering edit mode

I think the HUMAnN manual states that by default abundances are returned in RPK units, not normalised for library size, and ColSums of the output varies between samples. I think there is a humann2 accessory script to normalise for library size though, so in this case the best approach is.

  1. normalise to library size using accessory script
  2. take log using logData <- log(data + 0.01)
  3. fit linear model with
lmfit <- lmFit(logData, mmp)
lmfit <- eBayes(lmfit, trend=TRUE)
ADD REPLY

Login before adding your answer.

Traffic: 1029 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6