no significant differences in gene expression-limma
2
0
Entering edit mode
@sabet-julia-a-6404
Last seen 9.6 years ago

Hi all,
I have just analyzed some Affymetrix Mouse Gene 2.0 ST arrays using limma and I found no significantly differentially expressed genes (according to adjusted p value being <0.05).  I tried a few different filtering methods, including no filter, and I tried to analyze males and females separately (making separate esets) as well as together. I adjusted for batch dates as well.  I have a sample size of 36 (12 per group).  I did the quality control analysis on Affymetrix Expression Console (not in R) and everything passed according to Affy's quick reference guide and then I proceeded to read in the CEL files and analyze using the code below.  Is there any step that I have left out, or anything else I can do to try to find significant differences?  Any quality control steps in R that I should do besides what I've done already, or should I try a different multiple comparison method besides BH?  My advisor refuses to believe that there could be no significant differences...
Thank you for your help.
Julia

#Load libraries
library(limma)
library(oligo)
#Load CEL files from working directory
celFiles <- list.celfiles()
affyRaw <- read.celfiles(celFiles)
#Normalize/background correct
librarypd.mogene.2.0.st)
eset <- rma(affyRaw)
#Save expression set to an output file
write.exprs(eset,file="maleLog2transformedNormalizeddata02202014.txt")
#Adding gene annotation to eset
library(mogene20sttranscriptcluster.db)
my_frame <- data.frame(exprs(eset))
mogene20sttranscriptcluster()
Annot <-
data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM), paste, collapse=", "),
SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste, collapse=", "),
DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste, collapse=", "))
all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)
write.table(all,file="maleannotateddata02202014.txt",sep="\t")
#Filter background (very low expression) probes
library(genefilter)
ind <- genefilter(eset, filterfun(kOverA(12, 6)))
eset.filt <- eset[ind,]
dim(eset.filt)
#Or filter by using antigenomic probesets as a measure of
background(filtered more genes out)
librarypd.mogene.2.0.st)
con <- dbpd.mogene.2.0.st)
dbGetQuery(con, "select * from type_dict;")
#type of probes in my array:
table(dbGetQuery(con, "select type from featureSet;")[,1])
antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner
join featureSet on core_mps.fsetid=featureSet.fsetid where
featureSet.type='4';")
bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, probs=0.95)
minval <- max(bkg)
ind <- genefilter(eset, filterfun(kOverA(12, minval)))
eset.filtB<-eset[ind,]
dim(eset.filtB)
#Filter out control probes and background probes
library(affycoretools)
eset.filtC <- getMainProbes(eset)
library(genefilter)
ind <- genefilter(eset.filtC, filterfun(kOverA(12, 6)))
eset.dblfiltmale <- eset.filtC[ind,]
dim(eset)
dim(eset.filtC)
dim(eset.dblfilt)
#Read targets file into R
targets <- readTargets("Targets.txt", row.names="Name")
#Make design matrix with only diet as covariate
f <- factor(targets$PaternalDiet, levels=c("c","d","s"))
design <- model.matrix(~0+f)
colnames(design)<-c("c","d","s")
fit <- lmFit(eset.filtB, design)
#Make contrast matrix and models
contrast.matrix <- makeContrasts(d-c, s-d, s-c, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
#Show most significant differentially expressed genes- do for each coefficient separately
topTable(fit2, coef=1, adjust="BH")
#Show top F stats(sig. differences in any contrast)
topTableF(fit2, number=30)
# Make design matrix for 2 factors (sex and diet), adjusted for batch(date)
DS <- paste(targets$PaternalDiet, targets$Sex, sep=".")
DS<-factor(DS,
levels=c("c.female","d.female","s.female","c.male","d.male","s.male"))
design <- model.matrix(~0+DS+Date, targets)
colnames(design) <- gsub("targets", "", colnames(design))
colnames(design)[7:9] <- paste0("Date", 1:3)
fit <- lmFit(eset.dblfilt, design)
cont.matrix <- makeContrasts(
DvsCinMale=DSd.male-DSc.male,
SvsCinMale=DSs.male-DSc.male,
SvsDinMale=DSs.male-DSd.male,
DvsCinFemale=DSd.female-DSc.female,
SvsCinFemale=DSs.female-DSc.female,
SvsDinFemale=DSs.female-DSd.female,
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
topTable(fit2, coef=1, adjust="BH")
topTableF(fit2, number=30)
Annotation • 3.3k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

Dear Julia,

Have you made some plots of your data?  For example, plotMDS().  One should always do this, and it's not the same as an Affy Expression Console quality analysis.

Do the treated and untreated groups look different in an MDS plot?  If they do, there you will certainly get DE.  If they don't, then the plot will show you which samples are not doing what you think they should, and then you can follow up why this might be so.

Best wishes
Gordon

PS. There's no rule that says you must use FDR < 0.05.  Larger cutoffs can be used.

ADD COMMENT
0
Entering edit mode

Thank you Gordon.  I looked up the R documentation for plotMDS and did the following code, using my normalized eset, and came up with the following error.  Could you or someone else tell me what I did wrong and if I am missing some code?  I am new to R and had never heard of plotMDS before...I don't really understand what dim.plot is.
Thank you!

> mds<-plotMDS(eset.dblfiltmale, top=500,
labels=colnames(eset.dblfiltmale), col=NULL, cex=1, dim.plot=c(1,2),
ndim=max(dim.plot), gene.selection="pairwise",
xlab=paste("Dimension",dim.plot[1]),
ylab=paste("Dimension",dim.plot[2]))
Error in plotMDS.default(eset.dblfiltmale, top = 500, labels =
colnames(eset.dblfiltmale),  :
  object 'dim.plot' not found
ADD REPLY
0
Entering edit mode
Hi, On Mon, Feb 24, 2014 at 7:13 AM, Sabet, Julia A <julia.sabet at="" tufts.edu=""> wrote: > Thank you Gordon. I looked up the R documentation for plotMDS and did the following code, using my normalized eset, and came up with the following error. Could you or someone else tell me what I did wrong and if I am missing some code? I am new to R and had never heard of plotMDS before...I don't really understand what dim.plot is. > Thank you! > >> mds<-plotMDS(eset.dblfiltmale, top=500, labels=colnames(eset.dblfiltmale), col=NULL, cex=1, dim.plot=c(1,2), ndim=max(dim.plot), gene.selection="pairwise", xlab=paste("Dimension",dim.plot[1]), ylab=paste("Dimension",dim.plot[2])) > Error in plotMDS.default(eset.dblfiltmale, top = 500, labels = colnames(eset.dblfiltmale), : > object 'dim.plot' not found This is failing because you have are calling a function (`dim`) on an object (`dim.plot`) that is not defined. A simpler example that triggers the same error is like so: R> fn <- function(x, y) x + y R> fn(x=1:10, y=max(x)) Operating on "undefined" arguments in a *function signature* is different than operating on undefined objects when you *call* the function (which is what I'm doing here). I remember a decent tutorial on function/variable scoping in R that called this "split-horizon" scoping, but google is only pointing me to this: http://blog.moertel.com/posts/2006-01-20-wondrous-oddities-rs- function-call-semantics.html Which I don't think is the one I had in mind, but it does touch on this subject a bit. Anyway, your solution (for now) is to call plotMDS without `ndim=max(dim.plot)`. In the long term, the solution would be to get a better handle on some of R's calling semantics ;-) Hope that helps, -steve -- Steve Lianoglou Computational Biologist Genentech
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

Dear Julia,

I see that you have read the help page for plotMDS(), which is good, and that you have tried to copy the usage line given in the help page into your R session.

This isn't at all necessary.  In practice, you just type

   plotMDS(eset.dblfiltmale)

perhaps adding a few arguments if you want to customize the plot.

I can see that the R style of documentation can be a bit counter-intuitive, but the purpose of the usage line on the help page is to describe the defaults for all the arguments.  The reason that the defaults are set is so that you don't have to set them yourself.  To get an idea of how the function is used in practice it is best to look at the example code given at the bottom of the help page or else at the case studies in the limma User's Guide.

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode

Thank you for your explanation, Gordon!  I was finally able to figure out how to generate the plot using the following code, with labels corresponding to my 3 different diet groups (the example on the help page definitely helped):

>plotMDS(mds,  col=c(rep("black",12), rep("red",12), rep("blue",12)),
labels= c(rep("c",12), rep("d",12), rep("s",12)))

What resulted was a plot with 2 distinct clusters, but each cluster has a mix of diet groups represented.  I'm pretty sure the clusters are according to batch date, because when I made PCA plots, they look about the same, with the same number of samples in each cluster.  I have already included batch date in my limma models to account for this.  I also performed the arrayQualityMetrics function and there were no outlier arrays that were found.  Is there anything else I should do to try to identify differentially expressed genes?  I am pretty sure there are at least some differences, since we observed differences in body weight among the mice.  Thank you

ADD REPLY
0
Entering edit mode

There's no need to guess whether the clusters correspond to batch date.  Try

   plotMDS(mds, labels=batchdate)

to confirm whether or not this is so.

You could also try looking at MDS dimensions 2, 3 and 4, for example by

   plotMDS(mds, dim=c(3,4))

If you don't see the diets separating on any dimension, then it is hard to argue that you really have systematic differential expression.

> I also performed the arrayQualityMetrics function and there were
> no outlier arrays that were found.

The issue is with the RNA samples themselves and what they represent, rather than with the quality of the microarrays.

> Is there anything else I should do to try to identify differentially
> expressed genes?  I am pretty sure there are at least some differences,
> since we observed differences in body weight among the mice.

You could try:

  ?arrayWeights

Best wishes
Gordon

ADD REPLY
0
Entering edit mode

Thanks for your help. It just occurred to me- should plotMDS be done on normalized data, raw data, filtered data? All of the above?

ADD REPLY
0
Entering edit mode

If you are trying to compare to the DE analysis, then plotMDS should be run on the same data (normalized, filtered) as for DE analysis.

Gordon

ADD REPLY

Login before adding your answer.

Traffic: 881 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6