lmfit zero genes fit model

0

Entering edit mode

Urska Cvek ▴ 20

@urska-cvek-2094

Last seen 9.6 years ago

Hello BioC, I am using the limma package and the lmfit function, as described in section 8.7 here: http://bioconductor.org/packages/1.9/bioc/vignettes/limma/inst/doc/use rsguide.pdf. I have used this same package before, on a different data set, and it worked nice, so I am thinking there's something up with my data that I don't understand completely. I have 8 Affy chips, for a factorial design with Deletion and Strain as the two factors: Filename Deletion Strain Ht-2002.cel h t2 Ht-2003.cel h t2 Hn2-2002.cel h n2 Hn2-2003.cel h n2 Lt-2002.cel l t2 Lt-2003.cel l t2 Ln2-2002.cel l n2 Ln2-2003.cel l n2 readAffy function is used to read in the AffyBatch object "a" using the above PhenoData. Since these experiments were done as replicate pairs, I first normalize the data by pairs. normalize(a[,1:2]), and repeate for the other three normalized.a = merge(tmp1,tmp2) normalized.a = merge(normalized.a, tmp3) normalized.a = merge(normalized.a, tmp4) x=rma(normalized.a, normalize=FALSE) I look at the boxplots for the raw intensities, and expression set "x" intensities, and the second set looks much better. TS <- paste(pd$Deletion, pd$Strain, sep=".") # extract all combinations in one vector TS # fit a model with a coefficient for each of the four factor combinations # and then extract the comparisons of interest as contrasts TS <- factor(TS, levels=c("h.t2", "h.n2", "l.t2", "l.n2")) design <- model.matrix(~0+TS) > design h.t2 h.n2 low.t2 low.n2 1 1 0 0 0 2 1 0 0 0 3 0 1 0 0 4 0 1 0 0 5 0 0 1 0 6 0 0 1 0 7 0 0 0 1 8 0 0 0 1 colnames(design) <- levels(TS) fit <- lmFit(x, design) cont.matrix <- makeContrasts( H.n2vst2=h.t2-h.n2, L.n2vst2=l.t2-l.n2, Diff=(h.n2-h.t2)-(l.n2-l.t2), levels=design) > cont.matrix Contrasts Levels H.n2vsi2 L.n2vsi2 Diff h.t2 1 0 -1 h.n2 -1 0 1 low.t2 0 1 1 low.n2 0 -1 -1 fit2 <- contrasts.fit(fit, cont.matrix) fit2 <- eBayes(fit2) # display the counts results <- decideTests(fit2) vennDiagram(results) This vennDiagram is empty (has zeros for counts). There are no genes that would be differentially expressed in other of the two linear models, which I find very surprising. I know that it's possible that not many genes would be differentially expressed, but I don't understand why there are no genes that would fit the model at all. I have tried just using the rma on the whole set of 8 chips as well, without the separate normalization step, but this does not yield much luck either. I also added the fourth replicate to the experiment, making it a total of 12 chips, but that chip had such different distribution that I removed it from the analysis. What should I do to be able to answer the above questions? Is the caused by my data and I cannot find such genes? Thank you in advance, your help is appreciated. U.C.

Normalization affy limma Normalization affy limma • 1.1k views

ADD COMMENT • link updated 17.1 years ago by James W. MacDonald 65k • written 17.1 years ago by Urska Cvek ▴ 20

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 7 hours ago

United States

Hi Urska, Urska Cvek wrote: > Hello BioC, > > I am using the limma package and the lmfit function, as described in > section 8.7 here: > http://bioconductor.org/packages/1.9/bioc/vignettes/limma/inst/doc/u sersguide.pdf. > I have used this same package before, on a different data set, and it > worked nice, so I am thinking there's something up with my data that I > don't understand completely. > > I have 8 Affy chips, for a factorial design with Deletion and Strain > as the two factors: > Filename Deletion Strain > Ht-2002.cel h t2 > Ht-2003.cel h t2 > Hn2-2002.cel h n2 > Hn2-2003.cel h n2 > Lt-2002.cel l t2 > Lt-2003.cel l t2 > Ln2-2002.cel l n2 > Ln2-2003.cel l n2 > > readAffy function is used to read in the AffyBatch object "a" using > the above PhenoData. > Since these experiments were done as replicate pairs, I first > normalize the data by pairs. There are very few instances where this would be either necessary or a good thing to do. IMO you are much better off just running rma() on all chips together. > > normalize(a[,1:2]), and repeate for the other three > > normalized.a = merge(tmp1,tmp2) > normalized.a = merge(normalized.a, tmp3) > normalized.a = merge(normalized.a, tmp4) > > x=rma(normalized.a, normalize=FALSE) > > I look at the boxplots for the raw intensities, and expression set "x" > intensities, and the second set looks much better. > > TS <- paste(pd$Deletion, pd$Strain, sep=".") # extract all > combinations in one vector > TS > > # fit a model with a coefficient for each of the four factor combinations > # and then extract the comparisons of interest as contrasts > TS <- factor(TS, levels=c("h.t2", "h.n2", "l.t2", "l.n2")) > design <- model.matrix(~0+TS) > > >>design > > h.t2 h.n2 low.t2 low.n2 > 1 1 0 0 0 > 2 1 0 0 0 > 3 0 1 0 0 > 4 0 1 0 0 > 5 0 0 1 0 > 6 0 0 1 0 > 7 0 0 0 1 > 8 0 0 0 1 > > colnames(design) <- levels(TS) > fit <- lmFit(x, design) > > cont.matrix <- makeContrasts( > H.n2vst2=h.t2-h.n2, L.n2vst2=l.t2-l.n2, > Diff=(h.n2-h.t2)-(l.n2-l.t2), > levels=design) > > >>cont.matrix > > Contrasts > Levels H.n2vsi2 L.n2vsi2 Diff > h.t2 1 0 -1 > h.n2 -1 0 1 > low.t2 0 1 1 > low.n2 0 -1 -1 > > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > > # display the counts > results <- decideTests(fit2) > vennDiagram(results) > > This vennDiagram is empty (has zeros for counts). There are no genes > that would be differentially expressed in other of the two linear > models, which I find very surprising. I know that it's possible that > not many genes would be differentially expressed, but I don't > understand why there are no genes that would fit the model at all. > I have tried just using the rma on the whole set of 8 chips as well, > without the separate normalization step, but this does not yield much > luck either. I also added the fourth replicate to the experiment, > making it a total of 12 chips, but that chip had such different > distribution that I removed it from the analysis. > > What should I do to be able to answer the above questions? Is the > caused by my data and I cannot find such genes? I would start by looking at the output of topTable() for each of your coefficients (or even just topTable(fit2), which will output overall F-stats for your contrasts). It may well be that you don't have much evidence for differential expression. Depending on the true differences between your sample types (actual difference in mRNA amounts as well as the number of differentially expressed genes), you may not have enough replication to detect anything. Note that by default you are using the original BH version of FDR to adjust for multiple comparisons. The number of 'significant' probesets you get will be determined by the range of p-values as well as the number of probesets you are testing. You can increase your power to detect differences by first filtering out those probesets that aren't considered interesting by some criterion (such as low variance, being called absent by the mas5 algorithm, small expression values, etc.), and then fitting your model to the remaining probesets. You might also consider using unadjusted p-values, and using a more stringent p-value cutoff. Something like 0.005. There is nothing really magical about using FDR or any other p-value adjustment - they are just useful for cutting long lists of genes down to a manageable length and giving you something you can say globally about the set of 'significant' probesets. The ordering of your gene list will be the same regardless. You may just have fewer significant probesets than you would expect to see by chance. Best, Jim > > Thank you in advance, your help is appreciated. > > U.C. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD COMMENT • link 17.1 years ago James W. MacDonald 65k

Login before adding your answer.