multi-level, multi-factor results interpretation.

0

Entering edit mode

Ingrid Lindquist ▴ 40

@ingrid-lindquist-5181

Last seen 9.7 years ago

Hi Simon, I am working with a multi-factor experiment that, in a sense, doesn't have biological replicates (replicates were done at different times, and the differences in the chemistry/instrumentation is enough to be a rather large component of variance) and has three levels. By three levels, I mean there are 2 different treatments and 1 control. When I push the workflow through, everything works fine, but I'm having a hard time delineating which level is responsible for a gene being identified as significantly differentially expressed. I understand that (in the following example) ConditionX, ConditionY, AlternativeVariableold are log2 fold changes in relation to "ctrl" (in the case of condition) or "new" (in the case of AlternativeVariable). Can I assume that the largest FC of these three fields within each gene instance is likely the reason that gene had a padj within significance? My plan is to separate out the dataset so that I'm only running 2 levels at once (x vs ctrl; y vs ctrl), but I'm hoping you can shed light on how I can interpret these results I've mentioned here. The output I'm referring to has the following headers (generated by the last line (write.table) in my workflow below): Gene Pval Padj Intercept ConditionX ConditionY AlternativeVariableold Deviance Converged My workflow: counts <- (file, header=TRUE, row.names=1) Design<- data.frame( + row.names = colnames(SampledCounts), + Condition = c( "x", "ctrl", "ctrl", "x", "y", "y"), + AlterativeVariable= c( "old", "new", "old", "new", "new", "old")) library(DESeq) cdsFull <- newCountDataSet (counts, Design) cdsFull <-estimateSizeFactors(cdsFull) cdsFull <-estimateDispersions(cdsFull, method="blind", sharingMode="fit-only") fit1<-fitNbinomGLMs(cdsFull, count ~ condition + AlternativeVariable) fit0<-fitNbinomGLMs(cdsFull, count ~ AlternativeVariable) pvalsGLM <-nbinomGLMTest(fit1, fit0) padjGLM <-p.adjust(pvalsGLM, method = "BH") write.table(data.frame(geneID=row.names(counts(cdsFull)), pval=pvalsGLM, padj=padjGLM, fit1), file= "result.txt") Thanks for your help, Ingrid

• 1.1k views

ADD COMMENT • link updated 12.1 years ago by Wolfgang Huber ★ 13k • written 12.1 years ago by Ingrid Lindquist ▴ 40

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 4 weeks ago

EMBL European Molecular Biology Laborat…

Dear Ingrid I wonder whether you really sent the code that you ran, because 'AlterativeVariable' is sometimes spellt with 'n' and sometimes without, and R wouldn't normally tolerate that. The question you are asking is not specific for NGS data or for DESeq, but rather concerns multivariate and multilevel ANOVA in general. I see two main options, as you suggest: 1. test for more specific hypotheses, such as ctrl vs x and ctrl vs y separately. 2. do a scatterplot with your significant hits (ctrl-vs-x on one axis, and ctrl-vs-y on the other; possibly old vs new on the third) to get an overview of the hits and cluster them into subgroups. Best wishes Wolfgang Apr/20/12 11:37 PM, Ingrid Lindquist scripsit:: > Hi Simon, > > I am working with a multi-factor experiment that, in a sense, doesn't > have biological replicates (replicates were done at different times, and > the differences in the chemistry/instrumentation is enough to be a > rather large component of variance) and has three levels. By three > levels, I mean there are 2 different treatments and 1 control. When I > push the workflow through, everything works fine, but I'm having a hard > time delineating which level is responsible for a gene being identified > as significantly differentially expressed. I understand that (in the > following example) ConditionX, ConditionY, AlternativeVariableold are > log2 fold changes in relation to "ctrl" (in the case of condition) or > "new" (in the case of AlternativeVariable). Can I assume that the > largest FC of these three fields within each gene instance is likely the > reason that gene had a padj within significance? My plan is to separate > out the dataset so that I'm only running 2 levels at once (x vs ctrl; y > vs ctrl), but I'm hoping you can shed light on how I can interpret these > results I've mentioned here. > > The output I'm referring to has the following headers (generated by the > last line (write.table) in my workflow below): > > Gene Pval Padj Intercept ConditionX ConditionY AlternativeVariableold > Deviance Converged > > > My workflow: > > counts <- (file, header=TRUE, row.names=1) > > Design<- data.frame( > + row.names = colnames(SampledCounts), > + Condition = c( "x", "ctrl", "ctrl", "x", "y", "y"), > + AlterativeVariable= c( "old", "new", "old", "new", "new", "old")) > > library(DESeq) > > cdsFull <- newCountDataSet (counts, Design) > cdsFull <-estimateSizeFactors(cdsFull) > cdsFull <-estimateDispersions(cdsFull, method="blind", > sharingMode="fit-only") > fit1<-fitNbinomGLMs(cdsFull, count ~ condition + AlternativeVariable) > fit0<-fitNbinomGLMs(cdsFull, count ~ AlternativeVariable) > pvalsGLM <-nbinomGLMTest(fit1, fit0) > padjGLM <-p.adjust(pvalsGLM, method = "BH") > write.table(data.frame(geneID=row.names(counts(cdsFull)), pval=pvalsGLM, > padj=padjGLM, fit1), file= "result.txt") > > Thanks for your help, > Ingrid > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Best wishes Wolfgang Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber

ADD COMMENT • link 12.1 years ago Wolfgang Huber ★ 13k

Login before adding your answer.