Question: Interpretation of p-value plot
0
13 months ago by
karlien.springael0 wrote:

Hello,

I have to analyse a microarray for differential expressed genes. The used microarray has the following accession number: GSE36980. I just picked 4 control arrays and 4 alzheimer arrays for the analysis. I made the next code:

library('affy')

library('limma')

#Create dataframe
pd <- data.frame(experiment = c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control'))
sampleNames(data.raw) <- c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control')

#preprocessing

data.bg.norm <- affy::rma(data.raw)

expression <- data.bg.norm@assayData$exprs ###Differential expression### #make model X <- model.matrix(~0+factor(data.bg.norm$experiment))
colnames(X) <- c("Alzheimer","Control")

#Fit model -> Fit linear model for each gene given a series of arrays
LmFit <- lmFit(expression,X)
mc <- makeContrasts("Alzheimer-Control", levels = X)
c.fit <- contrasts.fit(LmFit, mc)
eb <- eBayes(c.fit)

# extract the p values of the F tests
modFpvalue <- eb\$F.p.value

hist(modFpvalue)

#Which genes are significant
results <- decideTests(eb, method = "nestedF",adjust.method= 'BH',p.value =0.05)

neut_indices_array <- which(results == 0)
neg_indices_array <- which(results == -1)
pos_indices_array <- which(results == 1)

When I run this code I get a histogram of modFpvalues which seems to show differential expressed genes, because the distribution of the p-values isn't uniform. But the output of the last step shows that not a single gene is differential expressed (after adjustment). I don't understand this, I thought that if the  pvalue distribution isn't uniform, you'll have differential expressed genes anyway but apparently not.

modified 13 months ago by Gordon Smyth37k • written 13 months ago by karlien.springael0
0
13 months ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

No, your assumption is not entirely correct. While a p-value histogram that is enriched for small p-values will indeed tend to yield significantly DE genes, it will not necessarily do so at the reasonably stringent FDR cutoff that you have selected. The enrichment needs to be of a certain magnitude before you can achieve FDR < 0.05.

If you analysed the whole GEO dataset, instead of just picking a subset of n=4 AD arrays and n=4 controls at random, you might get more interesting results. n=4 would very rarely be anywhere near a large enough sample size for studying a human disease like Alzheimer's. I would expect to need dozens of cases and dozens of controls at least, and even then very careful batch correct (RUV perhaps) would probably be needed.

I wonder why you are using method="nestedF". Since you have only two groups here, nestedF will just give the same result as would a moderated t-test. If you don't understand well what nestedF does, it would be better to stick to the limma defaults.