Interpretation of p-value plot
1
0
Entering edit mode
@karlienspringael-15749
Last seen 6.6 years ago

Hello,

I have to analyse a microarray for differential expressed genes. The used microarray has the following accession number: GSE36980. I just picked 4 control arrays and 4 alzheimer arrays for the analysis. I made the next code:

library('affy')

library('limma')

data.raw <- ReadAffy("GSM907796.CEL.gz","GSM907808.CEL.gz","GSM907797.CEL.gz","GSM907809.CEL.gz","GSM907798.CEL.gz","GSM907810.CEL.gz","GSM907799.CEL.gz","GSM907811.CEL.gz") #Frontal

#Create dataframe
pd <- data.frame(experiment = c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control'))
sampleNames(data.raw) <- c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control')
metaData <- data.frame(labelDescription = c('experiment'))
phenoData(data.raw) <- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)
protocolData(data.raw)<- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)

#preprocessing

data.bg.norm <- affy::rma(data.raw)

expression <- data.bg.norm@assayData$exprs

###Differential expression###
#make model
X <- model.matrix(~0+factor(data.bg.norm$experiment))
colnames(X) <- c("Alzheimer","Control")

#Fit model -> Fit linear model for each gene given a series of arrays
LmFit <- lmFit(expression,X)
mc <- makeContrasts("Alzheimer-Control", levels = X)
c.fit <- contrasts.fit(LmFit, mc)
eb <- eBayes(c.fit)

# extract the p values of the F tests
modFpvalue <- eb$F.p.value

#make a plot of the non-adjusted and adjusted pvalues

hist(p.adjust(modFpvalue,method='BH'),probability='TRUE')
hist(modFpvalue)

#Which genes are significant
results <- decideTests(eb, method = "nestedF",adjust.method= 'BH',p.value =0.05)
 

neut_indices_array <- which(results == 0)
neg_indices_array <- which(results == -1)
pos_indices_array <- which(results == 1)

 

 

When I run this code I get a histogram of modFpvalues which seems to show differential expressed genes, because the distribution of the p-values isn't uniform. But the output of the last step shows that not a single gene is differential expressed (after adjustment). I don't understand this, I thought that if the  pvalue distribution isn't uniform, you'll have differential expressed genes anyway but apparently not.

p values differential expression limma • 1.1k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

No, your assumption is not entirely correct. While a p-value histogram that is enriched for small p-values will indeed tend to yield significantly DE genes, it will not necessarily do so at the reasonably stringent FDR cutoff that you have selected. The enrichment needs to be of a certain magnitude before you can achieve FDR < 0.05.

If you analysed the whole GEO dataset, instead of just picking a subset of n=4 AD arrays and n=4 controls at random, you might get more interesting results. n=4 would very rarely be anywhere near a large enough sample size for studying a human disease like Alzheimer's. I would expect to need dozens of cases and dozens of controls at least, and even then very careful batch correct (RUV perhaps) would probably be needed.

I wonder why you are using method="nestedF". Since you have only two groups here, nestedF will just give the same result as would a moderated t-test. If you don't understand well what nestedF does, it would be better to stick to the limma defaults.

ADD COMMENT

Login before adding your answer.

Traffic: 1009 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6