Question: Interpretation of p-value plot
0
gravatar for karlien.springael
17 months ago by
karlien.springael0 wrote:

Hello,

I have to analyse a microarray for differential expressed genes. The used microarray has the following accession number: GSE36980. I just picked 4 control arrays and 4 alzheimer arrays for the analysis. I made the next code:

library('affy')

library('limma')

data.raw <- ReadAffy("GSM907796.CEL.gz","GSM907808.CEL.gz","GSM907797.CEL.gz","GSM907809.CEL.gz","GSM907798.CEL.gz","GSM907810.CEL.gz","GSM907799.CEL.gz","GSM907811.CEL.gz") #Frontal

#Create dataframe
pd <- data.frame(experiment = c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control'))
sampleNames(data.raw) <- c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control')
metaData <- data.frame(labelDescription = c('experiment'))
phenoData(data.raw) <- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)
protocolData(data.raw)<- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)

#preprocessing

data.bg.norm <- affy::rma(data.raw)

expression <- data.bg.norm@assayData$exprs

###Differential expression###
#make model
X <- model.matrix(~0+factor(data.bg.norm$experiment))
colnames(X) <- c("Alzheimer","Control")

#Fit model -> Fit linear model for each gene given a series of arrays
LmFit <- lmFit(expression,X)
mc <- makeContrasts("Alzheimer-Control", levels = X)
c.fit <- contrasts.fit(LmFit, mc)
eb <- eBayes(c.fit)

# extract the p values of the F tests
modFpvalue <- eb$F.p.value

#make a plot of the non-adjusted and adjusted pvalues

hist(p.adjust(modFpvalue,method='BH'),probability='TRUE')
hist(modFpvalue)

#Which genes are significant
results <- decideTests(eb, method = "nestedF",adjust.method= 'BH',p.value =0.05)
 

neut_indices_array <- which(results == 0)
neg_indices_array <- which(results == -1)
pos_indices_array <- which(results == 1)

 

 

When I run this code I get a histogram of modFpvalues which seems to show differential expressed genes, because the distribution of the p-values isn't uniform. But the output of the last step shows that not a single gene is differential expressed (after adjustment). I don't understand this, I thought that if the  pvalue distribution isn't uniform, you'll have differential expressed genes anyway but apparently not.

ADD COMMENTlink modified 17 months ago by Gordon Smyth38k • written 17 months ago by karlien.springael0
Answer: Interpretation of p-value plot
0
gravatar for Gordon Smyth
17 months ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

No, your assumption is not entirely correct. While a p-value histogram that is enriched for small p-values will indeed tend to yield significantly DE genes, it will not necessarily do so at the reasonably stringent FDR cutoff that you have selected. The enrichment needs to be of a certain magnitude before you can achieve FDR < 0.05.

If you analysed the whole GEO dataset, instead of just picking a subset of n=4 AD arrays and n=4 controls at random, you might get more interesting results. n=4 would very rarely be anywhere near a large enough sample size for studying a human disease like Alzheimer's. I would expect to need dozens of cases and dozens of controls at least, and even then very careful batch correct (RUV perhaps) would probably be needed.

I wonder why you are using method="nestedF". Since you have only two groups here, nestedF will just give the same result as would a moderated t-test. If you don't understand well what nestedF does, it would be better to stick to the limma defaults.

ADD COMMENTlink modified 17 months ago • written 17 months ago by Gordon Smyth38k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 214 users visited in the last hour