I am doing RNA-seq on a microbiome sample. I therefore get many (~70% of all reads) multi-mapped reads due to species similarities (even using a clustered reference database).
These are good reads and real hits, but this causes the # of reads/sample to be largely inflated and would not be correct to use for normalization. I think if I normalize to the "real" # of reads I can overcome this problem, but I would have to feed into limma-voom pipeline (below).
Q1: where in the limma-voom pipeline (below) should I modify to prevent it from trying to normalize?
Q2: if there are an unequal # of reads that limma "sees", or if I remove "voom" when it tries to do the model, does that ruin the modeling itself?
__basic limma-voom pipeline__
1. make DESeq object
y = DGEList(counts = RPKMs, genes = info.matrix, remove.zeros = TRUE)
y = calcNormFactors(y, method = "TMM", doWeighting=TRUE)
v=voom(y,design.matrix,plot = TRUE, normalize="quantile")
fit = lmFit(v,design.matrix)
fit2 = contrasts.fit(fit, contrast.matrix)
fit2 = eBayes(fit2)
summary(decideTests(fit2, adjust.method = "BH"))