Hello all,
I am working with microarray RNA data and using the voom/limma pipeline. I have repeated measures, so I am using duplicateCorrelation to account for the correlated measurements.
My main question stems from the fact that even after a Bonferroni correction, I am still finding that ~50% of my 800 markers are significant.
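For concreteness, with 800 markers a Bonferroni correction at 0.05 amounts to requiring a raw p-value below 0.05/800 = 6.25e-05. A small base-R sketch of that arithmetic, using made-up p-values (p_raw is hypothetical and not taken from this data set):

```r
# Bonferroni arithmetic for 800 tests; p_raw holds hypothetical p-values
n_markers <- 800
alpha <- 0.05
threshold <- alpha / n_markers            # per-test cutoff on the raw p-value

p_raw <- c(1e-6, 5e-5, 1e-4, 0.01)        # made-up values for illustration

# p.adjust multiplies each p-value by n and caps the result at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = n_markers)
p_manual <- pmin(1, p_raw * n_markers)    # the same adjustment by hand

# Comparing adjusted p to alpha is equivalent to comparing raw p to threshold
significant <- p_bonf < alpha
```

So a marker only survives the correction if its raw p-value is very small, which is why roughly 400 of 800 passing seems surprising.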
The variable of interest, Time, is coded 0/1. demo.long contains my demographic information, and mRNA holds the raw counts, which have been converted into a DGEList object. I was told that you can run voom twice to get more accurate estimates when dealing with repeated measurements.
Here is my code:
design <- with(demo.long, model.matrix(~ Time))
mRNA <- calcNormFactors(object = mRNA, method = "TMM")
colnames(mRNA$counts) <- gsub("B|-REF", "", colnames(mRNA))
v <- voom(mRNA, design, plot = FALSE, normalize.method = "quantile")
dc <- duplicateCorrelation(v, design, block = as.factor(demo.long$ID))
vf <- voom(v, design = design, block = as.factor(demo.long$ID), correlation = dc$consensus.correlation, normalize.method = "quantile")
fit.ref <- lmFit(v, design = design, block = as.factor(demo.long$ID), correlation = dc$consensus.correlation)
fit.refEb <- eBayes(fit.ref)
tT.ref <- topTable(fit.refEb,
                   coef = 2,
                   number = nrow(mRNA),
                   sort.by = "logFC",
                   adjust.method = "bonferroni")
I am curious if there is anything fundamentally wrong with the code above and if there is a glaring reason that could explain the high rate of significant markers.
Thank you.
If this is microarray data, why do you have counts?
I apologize, I only noticed that after posting. The data were stored in .RCC files and were read into R with read.markup.RCC from the NanoStringNorm package.
Well, NanoString data are quite different from microarray data; see the discussions here. You may want to fix your post if you want people who have actually analyzed NanoString data to get involved.
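Separately from the platform question, the code in the original post does not quite follow the two-pass pattern usually suggested for voom with duplicateCorrelation. The second voom call is given the EList v rather than the DGEList mRNA, the final lmFit uses v rather than vf, and the correlation is not re-estimated after the second pass. Below is a minimal sketch of that pattern on simulated counts. The names demo.sim and y and all simulation parameters are hypothetical stand-ins, not the poster's data, and normalization is left at the voom default. Whether voom suits NanoString counts at all is a separate question that this sketch does not settle.

```r
library(limma)
library(edgeR)

set.seed(1)
n_subj  <- 6     # hypothetical number of subjects
n_genes <- 200   # hypothetical number of markers

# Hypothetical stand-in for demo.long: each subject measured at Time 0 and 1
demo.sim <- data.frame(ID   = factor(rep(seq_len(n_subj), each = 2)),
                       Time = rep(c(0, 1), times = n_subj))

# Simulated counts with a per-subject effect so that repeated measures correlate
subj_eff <- matrix(rnorm(n_genes * n_subj, sd = 0.5), n_genes, n_subj)
mu       <- 100 * exp(subj_eff[, as.integer(demo.sim$ID)])
counts   <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = n_genes,
                   dimnames = list(paste0("g", seq_len(n_genes)), NULL))

y <- DGEList(counts)
y <- calcNormFactors(y, method = "TMM")
design <- model.matrix(~ Time, data = demo.sim)

# Pass 1: voom without blocking, then a first estimate of the correlation
v  <- voom(y, design)
dc <- duplicateCorrelation(v, design, block = demo.sim$ID)

# Pass 2: voom again on the DGEList (not on the EList v), now with blocking,
# then re-estimate the correlation from the updated weights
v  <- voom(y, design, block = demo.sim$ID,
           correlation = dc$consensus.correlation)
dc <- duplicateCorrelation(v, design, block = demo.sim$ID)

# Fit on the second-pass object using the re-estimated correlation
fit <- lmFit(v, design, block = demo.sim$ID,
             correlation = dc$consensus.correlation)
fit <- eBayes(fit)
tab <- topTable(fit, coef = "Time", number = Inf,
                sort.by = "P", adjust.method = "bonferroni")
```

Note that sort.by only changes the row order of the table; it has no effect on which markers fall below the adjusted p-value cutoff.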