Using R package 'limma' on pre-normalized counts and the effect of non-equivalent reads/sample on limma linear model
1
0
Entering edit mode
tealfurn • 0
@tealfurn-11619
Last seen 4.8 years ago

Greetings!

I am doing RNA-seq on a microbiome sample. I therefore get many (~70% of all reads) multi-mapped reads due to species similarities (even using a clustered reference database).

These are good reads and real hits, but this causes the # of reads/sample to be largely inflated and would not be correct to use for normalization. I think if I normalize to the "real" # of reads I can overcome this problem, but I would have to feed into limma-voom pipeline (below).

Q1: where in the limma-voom pipeline (below) should I modify to prevent it from trying to normalize?

Q2: if there are an unequal # of reads that limma  "sees", or if I remove "voom" when it tries to do the model, does that ruin the modeling itself?

__basic limma-voom pipeline__

1. make DESeq object

y = DGEList(counts = RPKMs, genes = info.matrix, remove.zeros = TRUE)

2. voom

y = calcNormFactors(y, method = "TMM", doWeighting=TRUE)

v=voom(y,design.matrix,plot = TRUE, normalize="quantile")

3. limma

fit = lmFit(v,design.matrix)

fit2 = contrasts.fit(fit, contrast.matrix)

fit2 = eBayes(fit2)

2
Entering edit mode
@steve-lianoglou-2771
Last seen 1 day ago
Denali

If all you've got is RPKMs, then don't go the DGEList -> voom route, but rather log2(rpkms + 1) your data and use the limma-trend pipeline.

Going off of the code you provided, that might look something like this:

rpkms.all <- log2(RPKMs + 1)
## remove lowly expressed genes, the threshold I'm using is just an example
rpkms <- rpkms.all[rowMeans(rpkms.all) >= 1), ]
fit <- lmFit(rpkms, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2, trend=TRUE)
...

Searching this site for "voom" or "llimma trend" will provide more references to dig further if this isn't sufficient