DESeq /edgeR design problem for a small RNASeq
2
0
Entering edit mode
@eduardo-andres-leon-5963
Last seen 10.3 years ago
Hi all, I'm trying to analyse data coming from a Small RNAseq with a "complex" design. I have 2 different kind of mice,a wild type and a mutant. The mutant has an insert to over express a gene of interest. and It's a different mouse from the wild type because the wild type with the insert did't born. Second, in order to over express a gene, we add a drug, so we have 4 possibilities : WTsD (wild type no treated) WTcD (wild type treated) . Just to see if adding the drug is also affecting the mouse expression profile MTsD (mouse with the insertion) . In this case, to see if the insertion is also affecting the normal expression profile of miRNAs. MTcD (mouse with the insertion + drug) . How the over expression affects. Of course this is also a time course design because the drug is added at 7 days, 9 days and 11 days, So first of all, I'd like to see DE miRNAs taking into account just the overexpression of the gene (that is by removing the changes caused for the drug and for the mouse type ). I've tried with DESeq with the following code : library(DESeq) #Reading count data data<-read.table(file="ALL_MTcD_9.5",header=TRUE,row.names=1) #Design Design<-data.frame( row.names=colnames(data), treatment =c("untreated","untreated","untreated","treated","treated" ,"treated","untreated","untreated","untreated","treated","treated","tr eated"), mouseType =c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","MT","MT") ) cdsFull<-newCountDataSet(data,Design) cdsFull<-estimateSizeFactors(cdsFull) cdsFull<-estimateDispersions(cdsFull) #fit with the mouse and the drug experimento<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment) #fit with the mouse + drug + interaction from mouse:drug experimento_todos_factores<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment + mouseType:treatment) pvalsGLM<-nbinomGLMTest(experimento_todos_factores,experimento) padjGLM<-p.adjust(pvalsGLM,method="BH") experimento$pval<-pvalsGLM experimento$adj.pval<-padjGLM This gives me 21 DE miRNAs. As I'm not really sure if this design is correct. I've never seen something like this. So I've also used the edgeR package using the example 4.4 in the vignette. library(edgeR) Mouse<-factor(c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","MT" ,"MT")) Treatment<-factor(c("Untreated","Untreated","Untreated","Treated","Tre ated","Treated","Untreated","Untreated","Untreated","Treated","Treated ","Treated")) data.frame(Sample=colnames(y),Mouse,Treatment) design<-model.matrix(~Mouse+Treatment) rownames(design)<-colnames(y) y<-estimateGLMCommonDisp(y,design,verbose=TRUE) #Estimation of the gene-wise dispersion y<-estimateGLMTrendedDisp(y,design) y<-estimateGLMTagwiseDisp(y,design) #Diff expresion fit<-glmFit(y,design) lrt<-glmLRT(fit) topedgeR<-topTags(lrt,n=100) So I guess that I'm testing the drug adjusting for baseline differences between the 2 mouse types. And I have 26 DE miRNAs The problem here is that there is only 5 common miRNAs between DESeq and edgeR. Is the design correct ?what I'm doing wrong ? Thanks in advance ! [[alternative HTML version deleted]]
RNASeq edgeR DESeq RNASeq edgeR DESeq • 2.6k views
ADD COMMENT
0
Entering edit mode
Simon Anders ★ 3.8k
@simon-anders-3855
Last seen 4.4 years ago
Zentrum für Molekularbiologie, Universi…
Hi Eduardo > The problem here is that there is only 5 common miRNAs between DESeq > and edgeR. This probably has little to do with the differences between edgeR and DESeq and much more with the fact that you ask the two tools two different questions. With DESeq, you did this: > #fit with the mouse and the drug > experimento<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment) > #fit with the mouse + drug + interaction from mouse:drug > experimento_todos_factores<-fitNbinomGLMs(cdsFull,count ~ mouseType + > treatment + mouseType:treatment) > > pvalsGLM<-nbinomGLMTest(experimento_todos_factores,experimento) So, you compare a reduced model that accounts for the main effects that treatment and mouseType have with a full model that also includes an interaction between the two factors. So your null hypothesis is: "Both mouseType and treatment might influence the genes' expression strengths, but independently so, i.e. the effect of drug treatment on a gene's expression does not depend on the mouseType (and likely, the effect of the mouseType does not depend on whether the mouse was treated." Or, on other words: You are looking for gene, where the strength of the effect of the drug treatment is _different_ between the two mouse types. With edgeR, you do: > design<-model.matrix(~Mouse+Treatment) fit<-glmFit(y,design) > lrt<-glmLRT(fit) If I remember correctly, edgeR drops the _last_ factor to get a reduced model. So you are comparing with the model "~ Mouse" and if you do the same with DESeq, you will get similar results. This time, you simply ask which genes' expression is affected by the treatment (whilst controlling for mouseType), but you do not ask whether the strength of the drug's effect depends on mouse type. > The mutant has an insert to over express a gene of interest. and It's > a different mouse from the wild type because the wild type with the > insert did't born. This is a very unfortunate experimental design. As your two mice differ in more than just the gene of interest, you will not be able to argue that the differences you find between the two mice is due to this gene and not one of the other genes in which they differ! If the mutation was lethal in the strain you tried first but not in the second strain, why did you keep using the first strain as control rather than using wild-type mice of the second strain as controls? Simon
ADD COMMENT
0
Entering edit mode
Dear Simon thanks for your reply. Your suggestion at the end of the mail, was my starting point (MTsD vs MTcD). But in this case I was loosing the effect that the drug could add to the model. The only problem here is to remove the effect of the drug (That I can infer from the comparison between WTsD vs WTcD. But this also add a new variable ... the mice). So I don't know how to deal with it neither in edgeR nor in DESeq may be : A design without taking into account the mouseType (that is assuming, the WT and the MT are more or less very similar), but using a overexpression covariate: data<-read.table(file="ALL_MTcD_9.5_NO_MTsD1",header=TRUE,row.names=1) # #Design for WTcD + MTsD + MTcD # Design<-data.frame( row.names=colnames(data[4:11]), treatment =c("treated","treated","treated","untreated","untreated","treated","tr eated","treated"), OverExpression=c("NO","NO","NO","NO","NO","YES","YES","YES") ) Design$treatment<-relevel(Design$treatment,ref="untreated") cdsFull<-newCountDataSet(data[4:11],Design) cdsFull<-estimateSizeFactors(cdsFull) cdsFull<-estimateDispersions(cdsFull,sharingMode="maximum") All_variables<-fitNbinomGLMs(cdsFull,count ~ treatment + OverExpression) drug<-fitNbinomGLMs(cdsFull,count ~ treatment) pvalsGLM<-nbinomGLMTest(All_variables,drug) padjGLM<-p.adjust(pvalsGLM,method="BH") experimento$pval<-pvalsGLM experimento$adj.pval<-padjGLM What do you think ? Thanks in advance 2013/6/3 Simon Anders <anders@embl.de> > Hi Eduardo > > > The problem here is that there is only 5 common miRNAs between DESeq >> and edgeR. >> > > This probably has little to do with the differences between edgeR and > DESeq and much more with the fact that you ask the two tools two > different questions. > > With DESeq, you did this: > > #fit with the mouse and the drug >> experimento<-fitNbinomGLMs(**cdsFull,count ~ mouseType + treatment) >> #fit with the mouse + drug + interaction from mouse:drug >> experimento_todos_factores<-**fitNbinomGLMs(cdsFull,count ~ mouseType + >> treatment + mouseType:treatment) >> >> pvalsGLM<-nbinomGLMTest(**experimento_todos_factores,**experimento) >> > > So, you compare a reduced model that accounts for the main effects that > treatment and mouseType have with a full model that also includes an > interaction between the two factors. So your null hypothesis is: "Both > mouseType and treatment might influence the genes' expression strengths, > but independently so, i.e. the effect of drug treatment on a gene's > expression does not depend on the mouseType (and likely, the effect of > the mouseType does not depend on whether the mouse was treated." Or, on > other words: You are looking for gene, where the strength of the effect > of the drug treatment is _different_ between the two mouse types. > > With edgeR, you do: > > design<-model.matrix(~Mouse+**Treatment) fit<-glmFit(y,design) >> lrt<-glmLRT(fit) >> > > If I remember correctly, edgeR drops the _last_ factor to get a reduced > model. So you are comparing with the model "~ Mouse" and if you do the > same with DESeq, you will get similar results. This time, you simply ask > which genes' expression is affected by the treatment (whilst controlling > for mouseType), but you do not ask whether the strength of the drug's > effect depends on mouse type. > > > The mutant has an insert to over express a gene of interest. and It's >> a different mouse from the wild type because the wild type with the >> insert did't born. >> > > This is a very unfortunate experimental design. As your two mice differ in > more than just the gene of interest, you will not be able to argue that the > differences you find between the two mice is due to this gene and not one > of the other genes in which they differ! > > If the mutation was lethal in the strain you tried first but not in the > second strain, why did you keep using the first strain as control rather > than using wild-type mice of the second strain as controls? > > Simon > > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > -- =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Eduardo On 03/06/13 20:41, Eduardo Andr?s Le?n wrote: > Your suggestion at the end of the mail, was my starting point (MTsD vs > MTcD). But in this case I was loosing the effect that the drug could add > to the model. > > The only problem here is to remove the effect of the drug (That I can > infer from the comparison between WTsD vs WTcD. But this also add a new > variable ... the mice). Sorry, you lost me now. I thought mutation and mouse stain are aliased? What biological question do you want to address? We need some more details here; an abstract discussion won't lead anywhere. Simon
ADD REPLY
0
Entering edit mode
Hi Simon, thank you again for your help. I'll try to explain the idea of the experiment. Basically we want to know what happen in tense of expression when you over-express a gen, for example oct2, while mouse development. To do that, we insert a gene construction with a promoter + oct2 gene. This promoter is sensible to a drug, so when you add the drug, oct2 is expressed. This drug is added at different times , from 4 days to 7, from 6 days to 9 and from 8 days to 11. These are my MTcD mice ( mutant + drug ). I have also mutants mice with no drug (MTsD). So, if I don't take into account, at this moment, the time course and I start to analyze the data from the 9.5 days, I look for DEG from MTcD vs MTsD. At the end of this analysis, I have DEG for two different reasons : Oct2 overexpression Drug effect So, in order to remove the possible effect of adding the drug, we have another different kind of mouse (they don't have the gene construction to overexpress oct2) that we can use. So we have a wild type mouse that has been treated with the same drug at the same days of the MTcD. We call it, WTcD. And of course we have also a WT mouse with no drug treatment (WTsD). So comparing the WTcD vs WTsD, I can infer the possible effect of the drug and I'd like to remove this effect from my mutants. More or less like : (MTsD - MTcD) - (WTsD -WTcD) = Real effect of the overexpression. Do you know how to do that ? Again thanks for your help. 2013/6/3 Simon Anders <anders@embl.de> > Hi Eduardo > > > On 03/06/13 20:41, Eduardo Andrés León wrote: > >> Your suggestion at the end of the mail, was my starting point (MTsD vs >> MTcD). But in this case I was loosing the effect that the drug could add >> to the model. >> >> The only problem here is to remove the effect of the drug (That I can >> infer from the comparison between WTsD vs WTcD. But this also add a new >> variable ... the mice). >> > > Sorry, you lost me now. I thought mutation and mouse stain are aliased? > > What biological question do you want to address? We need some more details > here; an abstract discussion won't lead anywhere. > > Simon > -- =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Eduardo Thanks for the exaplanation. I now see what yiu are heading at. On 04/06/13 11:01, Eduardo Andr?s Le?n wrote: > (MTsD - MTcD) - (WTsD -WTcD) = Real effect of the overexpression. Yes, this is precisely what is called an "interaction effect" and described in R's formula notation with ":" operator. So, you use, as full model: full_model <- fitNbinomGLMs( cds, count ~ mutation + drug + mutation:drug) and as reduced model reduced_model <- fitNbinomGLMs( cds, count ~ mutation + drug ) and then compare the two: pvals <- nbinomGLMTest( full_model, reduced_model ) (As an alternative for the notation "~ A + B + A:B", you will often see the equivalent abbreviated notation "~ A * B".) If you deal with such questions more often, it may be helpful to read a textbook on linear models and Anova. This is all quite standard stuff. Simon
ADD REPLY
0
Entering edit mode
Hi Simon, I hope not to deal with these kind of experiments anymore ;) I'm glad to know that I'm doing it correctly. About ANOVA, I did some experiments with the edgeR anova-like test using the glmLRT function and the coef parameter. But as usual, both results were quite different. About the next step, the time course analysis, can this be done with DESeq, using the time as a new covariate (as with the drug) ? Thanks again ! On 4 Jun 2013, at 11:52, "Simon Anders" <anders@embl.de> wrote: > Hi Eduardo > > Thanks for the exaplanation. I now see what yiu are heading at. > > On 04/06/13 11:01, Eduardo Andrés León wrote: >> (MTsD - MTcD) - (WTsD -WTcD) = Real effect of the overexpression. > > Yes, this is precisely what is called an "interaction effect" and described in R's formula notation with ":" operator. > > So, you use, as full model: > > full_model <- fitNbinomGLMs( cds, count ~ mutation + drug + mutation:drug) > > and as reduced model > > reduced_model <- fitNbinomGLMs( cds, count ~ mutation + drug ) > > and then compare the two: > > pvals <- nbinomGLMTest( full_model, reduced_model ) > > (As an alternative for the notation "~ A + B + A:B", you will often see the equivalent abbreviated notation "~ A * B".) > > If you deal with such questions more often, it may be helpful to read a textbook on linear models and Anova. This is all quite standard stuff. > > Simon =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Eduardo On 04/06/13 12:13, Eduardo Andr?s Le?n wrote: > About ANOVA, I did some experiments with the edgeR anova-like test using > the glmLRT function and the coef parameter. > > But as usual, both results were quite different. Just to clarify: What you do with DESeq's GLM facilities is also ANOVA (or actually, both edgeR and DESeq do ANODEV, but this is just a technicality), and you should get roughly similar results. Are you sure you tested _for_the_interaction_ with edgeR? > About the next step, the time course analysis, can this be done with > DESeq, using the time as a new covariate (as with the drug) ? In principle, yes. But now things get really complicated because you need to choose which time points to compare (as you have more than two). Simon
ADD REPLY
0
Entering edit mode
Hi Simon, For edgeR I was using this code : rawdata<-read.delim(file="ALL_MTcD_9.5_NO_MTsD1",header=TRUE,row.names =1) y<-DGEList(counts=rawdata,genes=rawdata[,0]) y<-calcNormFactors(y) plotMDS(y,xlab="Mouse factor",ylab="Treatment Factor",xlim=c(-0.4,0.4),ylim=c(-0.4,0.4)) Mouse<-factor(c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","MT" )) Treatment<-factor(c("Untreated","Untreated","Untreated","Treated","Tre ated","Treated","Untreated","Untreated","Treated","Treated","Treated") ) data.frame(Sample=colnames(y),Mouse,Treatment) design<-model.matrix(~Mouse+Treatment+Mouse:Treatment) rownames(design)<-colnames(y) #Overall dispersion of the dataset y<-estimateGLMCommonDisp(y,design,verbose=TRUE) #Estimation of the gene-wise dispersion y<-estimateGLMTrendedDisp(y,design) y<-estimateGLMTagwiseDisp(y,design) #Diff expresion fit<-glmFit(y,design) lrt<-glmLRT(fit,contrast=c(1,1-1)) top<-topTags(lrt,n=100) head(top$table) but it gives me an error : lrt<-glmLRT(fit,contrast=c(1,1-1)) Error in glmfit$coefficients %*% contrast : non-conformable arguments On 4 Jun 2013, at 12:18, "Simon Anders" <anders@embl.de> wrote: > Hi Eduardo > > On 04/06/13 12:13, Eduardo Andrés León wrote: >> About ANOVA, I did some experiments with the edgeR anova-like test using >> the glmLRT function and the coef parameter. >> >> But as usual, both results were quite different. > > Just to clarify: What you do with DESeq's GLM facilities is also ANOVA (or actually, both edgeR and DESeq do ANODEV, but this is just a technicality), and you should get roughly similar results. Are you sure you tested _for_the_interaction_ with edgeR? > >> About the next step, the time course analysis, can this be done with >> DESeq, using the time as a new covariate (as with the drug) ? > > In principle, yes. But now things get really complicated because you need to choose which time points to compare (as you have more than two). > > Simon > =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia
Dear Eduardo, The design matrix you've used for edgeR is not correct, as I think you've already come to understand in your discussions with Simon. Your four group experiment is of the type covered by Section 3.3 in the edgeR User's guide. Why not have a read and follow the instructions? I think that trying to interpret model formula in R may be causing you some confusion. I recommend the simpler more intuitive approach described in Section 3.3.1 of the edgeR User's Guide. edgeR gives you a way to explicitly say that you want: (MTsD - MTcD) - (WTsD -WTcD) instead of trying to understand the meaning of coefficients in linear models. BTW, please show sessionInfo() output. Are you using the latest version of Bioconductor? If not, upgrade! Best wishes Gordon > Date: Mon, 3 Jun 2013 13:06:55 +0200 > From: Eduardo Andr?s Le?n <eandres at="" cnio.es=""> > To: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Subject: [BioC] DESeq /edgeR design problem for a small RNASeq > > Hi all, > I'm trying to analyse data coming from a Small RNAseq with a "complex" design. > > I have 2 different kind of mice,a wild type and a mutant. The mutant has > an insert to over express a gene of interest. and It's a different mouse > from the wild type because the wild type with the insert did't born. > Second, in order to over express a gene, we add a drug, so we have 4 > possibilities : > > WTsD (wild type no treated) > WTcD (wild type treated) . Just to see if adding the drug is also affecting the mouse expression profile > MTsD (mouse with the insertion) . In this case, to see if the insertion is also affecting the normal expression profile of miRNAs. > MTcD (mouse with the insertion + drug) . How the over expression affects. > > Of course this is also a time course design because the drug is added at > 7 days, 9 days and 11 days, > > So first of all, I'd like to see DE miRNAs taking into account just the > overexpression of the gene (that is by removing the changes caused for > the drug and for the mouse type ). > > I've tried with DESeq with the following code : > > library(DESeq) > > #Reading count data > data<-read.table(file="ALL_MTcD_9.5",header=TRUE,row.names=1) > > #Design > Design<-data.frame( > row.names=colnames(data), > treatment =c("untreated","untreated","untreated","treated","treated ","treated","untreated","untreated","untreated","treated","treated","t reated"), > mouseType =c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","MT","MT") > ) > > cdsFull<-newCountDataSet(data,Design) > cdsFull<-estimateSizeFactors(cdsFull) > cdsFull<-estimateDispersions(cdsFull) > > #fit with the mouse and the drug > experimento<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment) > #fit with the mouse + drug + interaction from mouse:drug > experimento_todos_factores<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment + mouseType:treatment) > > pvalsGLM<-nbinomGLMTest(experimento_todos_factores,experimento) > padjGLM<-p.adjust(pvalsGLM,method="BH") > > experimento$pval<-pvalsGLM > experimento$adj.pval<-padjGLM > > > This gives me 21 DE miRNAs. > > As I'm not really sure if this design is correct. I've never seen something like this. So I've also used the edgeR package using the example 4.4 in the vignette. > > library(edgeR) > > Mouse<-factor(c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","M T","MT")) > Treatment<-factor(c("Untreated","Untreated","Untreated","Treated","T reated","Treated","Untreated","Untreated","Untreated","Treated","Treat ed","Treated")) > data.frame(Sample=colnames(y),Mouse,Treatment) > design<-model.matrix(~Mouse+Treatment) > rownames(design)<-colnames(y) > > y<-estimateGLMCommonDisp(y,design,verbose=TRUE) > #Estimation of the gene-wise dispersion > y<-estimateGLMTrendedDisp(y,design) > y<-estimateGLMTagwiseDisp(y,design) > > #Diff expresion > fit<-glmFit(y,design) > lrt<-glmLRT(fit) > topedgeR<-topTags(lrt,n=100) > > So I guess that I'm testing the drug adjusting for baseline differences > between the 2 mouse types. And I have 26 DE miRNAs > > The problem here is that there is only 5 common miRNAs between DESeq and > edgeR. > > Is the design correct ?what I'm doing wrong ? > > > Thanks in advance ! > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Thanks Gordon for your replay. I really appreciate it If fact, I did the analysis with a different design having same results. But it's true that the example in the 3.3.1 is easier to understand. So if a change my edgeR code to use this design/contrast, the results are comparable with DESeq : > library(limma) > library(edgeR) > rawdata<-read.delim(file="ALL_MTcD_9.5_NO_MTsD1",header=TRUE,row.nam es=1) > y<-DGEList(counts=rawdata,genes=rawdata[,0]) > y An object of class "DGEList" $counts WTsD_1_9.5 WTsD_2_9.5 WTsD_3_9.5 WTcD_1_9.5 WTcD_2_9.5 WTcD_3_9.5 MTsD_2_9.5 MTsD_3_9.5 MTcD_1_9.5 MTcD_2_9.5 MTcD_3_9.5 mmu-let-7a-1-3p 4 7 1 5 4 4 5 3 8 5 2 mmu-let-7a-2-3p 2 1 0 2 1 0 0 0 0 0 1 mmu-let-7a-5p 195 310 263 748 1091 630 164 187 194 188 265 mmu-let-7b-3p 1 0 0 0 3 0 0 0 2 1 0 mmu-let-7b-5p 2 6 3 11 59 17 1 1 18 13 6 1268 more rows ... $samples group lib.size norm.factors WTsD_1_9.5 1 672649 1 WTsD_2_9.5 1 913140 1 WTsD_3_9.5 1 805774 1 WTcD_1_9.5 1 884400 1 WTcD_2_9.5 1 951267 1 6 more rows ... $genes data frame with 0 columns and 5 rows 1268 more rows ... > y<-calcNormFactors(y) > targets<-readTargets(row.names="FileName") > targets FileName treatment mouseType WTsD_1_9.5 WTsD_1_9.5 untreated WT WTsD_2_9.5 WTsD_2_9.5 untreated WT WTsD_3_9.5 WTsD_3_9.5 untreated WT WTcD_1_9.5 WTcD_1_9.5 treated WT WTcD_2_9.5 WTcD_2_9.5 treated WT WTcD_3_9.5 WTcD_3_9.5 treated WT MTsD_2_9.5 MTsD_2_9.5 untreated MT MTsD_3_9.5 MTsD_3_9.5 untreated MT MTcD_1_9.5 MTcD_1_9.5 treated MT MTcD_2_9.5 MTcD_2_9.5 treated MT MTcD_3_9.5 MTcD_3_9.5 treated MT > Group<-factor(paste(targets$treatment,targets$mouseType,sep="_")) > cbind(targets,Group=Group) FileName treatment mouseType Group WTsD_1_9.5 WTsD_1_9.5 untreated WT untreated_WT WTsD_2_9.5 WTsD_2_9.5 untreated WT untreated_WT WTsD_3_9.5 WTsD_3_9.5 untreated WT untreated_WT WTcD_1_9.5 WTcD_1_9.5 treated WT treated_WT WTcD_2_9.5 WTcD_2_9.5 treated WT treated_WT WTcD_3_9.5 WTcD_3_9.5 treated WT treated_WT MTsD_2_9.5 MTsD_2_9.5 untreated MT untreated_MT MTsD_3_9.5 MTsD_3_9.5 untreated MT untreated_MT MTcD_1_9.5 MTcD_1_9.5 treated MT treated_MT MTcD_2_9.5 MTcD_2_9.5 treated MT treated_MT MTcD_3_9.5 MTcD_3_9.5 treated MT treated_MT > > design<-model.matrix(~0+Group) > colnames(design)<-levels(Group) > rownames(design)<-colnames(y) > > #Overall dispersion of the dataset > y<-estimateGLMCommonDisp(y,design,verbose=TRUE) Disp = 0.01289 , BCV = 0.1135 > #Estimation of the gene-wise dispersion > y<-estimateGLMTrendedDisp(y,design) > y<-estimateGLMTagwiseDisp(y,design) > fit<-glmFit(y,design) > contrast<-makeContrasts( + OverExOnMutants=(untreated_MT-treated_MT)-(untreated_WT- treated_WT) + ,levels=design) > lrt<-glmLRT(fit,contrast=contrast[,"OverExOnMutants"]) > top<-topTags(lrt,n=100) > head(top$table) logFC logCPM LR PValue FDR mmu-miR-222-3p 1.1568204 8.674346 34.79073 3.671169e-09 4.673398e-06 mmu-miR-423-5p -1.3875732 9.346753 32.55137 1.160833e-08 7.388699e-06 mmu-miR-92b-3p 0.8393514 14.808914 30.76576 2.911292e-08 1.235358e-05 mmu-let-7a-5p 1.4829445 8.738643 30.18508 3.927182e-08 1.249826e-05 mmu-miR-615-5p 2.0255706 5.640921 29.67739 5.102658e-08 1.299137e-05 mmu-miR-342-5p 2.5054672 5.530717 24.93025 5.944228e-07 1.260493e-04 > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] gplots_2.11.0.1 MASS_7.3-26 KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 BiocInstaller_1.10.1 RColorBrewer_1.0-5 [9] DESeq_1.12.0 lattice_0.20-15 locfit_1.5-9.1 edgeR_3.2.3 limma_3.16.5 AnnotationDbi_1.22.5 Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 bitops_1.0-5 DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0 IRanges_1.18.1 RSQLite_0.11.4 splines_3.0.1 [9] stats4_3.0.1 survival_2.37-4 tools_3.0.1 XML_3.95-0.2 xtable_1.7-1 Thank you very much ! On 5 Jun 2013, at 03:35, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Eduardo, > > The design matrix you've used for edgeR is not correct, as I think you've already come to understand in your discussions with Simon. > > Your four group experiment is of the type covered by Section 3.3 in the edgeR User's guide. Why not have a read and follow the instructions? > > I think that trying to interpret model formula in R may be causing you some confusion. I recommend the simpler more intuitive approach described in Section 3.3.1 of the edgeR User's Guide. edgeR gives you a way to explicitly say that you want: > > (MTsD - MTcD) - (WTsD -WTcD) > > instead of trying to understand the meaning of coefficients in linear models. > > BTW, please show sessionInfo() output. Are you using the latest version of Bioconductor? If not, upgrade! > > Best wishes > Gordon > >> Date: Mon, 3 Jun 2013 13:06:55 +0200 >> From: Eduardo Andr?s Le?n <eandres@cnio.es> >> To: "<bioconductor@r-project.org>" <bioconductor@r-project.org> >> Subject: [BioC] DESeq /edgeR design problem for a small RNASeq >> >> Hi all, >> I'm trying to analyse data coming from a Small RNAseq with a "complex" design. >> >> I have 2 different kind of mice,a wild type and a mutant. The mutant has an insert to over express a gene of interest. and It's a different mouse from the wild type because the wild type with the insert did't born. > >> Second, in order to over express a gene, we add a drug, so we have 4 possibilities : >> >> WTsD (wild type no treated) >> WTcD (wild type treated) . Just to see if adding the drug is also affecting the mouse expression profile >> MTsD (mouse with the insertion) . In this case, to see if the insertion is also affecting the normal expression profile of miRNAs. >> MTcD (mouse with the insertion + drug) . How the over expression affects. >> >> Of course this is also a time course design because the drug is added at 7 days, 9 days and 11 days, >> >> So first of all, I'd like to see DE miRNAs taking into account just the overexpression of the gene (that is by removing the changes caused for the drug and for the mouse type ). >> >> I've tried with DESeq with the following code : >> >> library(DESeq) >> >> #Reading count data >> data<-read.table(file="ALL_MTcD_9.5",header=TRUE,row.names=1) >> >> #Design >> Design<-data.frame( >> row.names=colnames(data), >> treatment =c("untreated","untreated","untreated","treated","treated ","treated","untreated","untreated","untreated","treated","treated","t reated"), >> mouseType =c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT","MT","MT") >> ) >> >> cdsFull<-newCountDataSet(data,Design) >> cdsFull<-estimateSizeFactors(cdsFull) >> cdsFull<-estimateDispersions(cdsFull) >> >> #fit with the mouse and the drug >> experimento<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment) >> #fit with the mouse + drug + interaction from mouse:drug >> experimento_todos_factores<-fitNbinomGLMs(cdsFull,count ~ mouseType + treatment + mouseType:treatment) >> >> pvalsGLM<-nbinomGLMTest(experimento_todos_factores,experimento) >> padjGLM<-p.adjust(pvalsGLM,method="BH") >> >> experimento$pval<-pvalsGLM >> experimento$adj.pval<-padjGLM >> >> >> This gives me 21 DE miRNAs. >> >> As I'm not really sure if this design is correct. I've never seen something like this. So I've also used the edgeR package using the example 4.4 in the vignette. >> >> library(edgeR) >> >> Mouse<-factor(c("WT","WT","WT","WT","WT","WT","MT","MT","MT","MT"," MT","MT")) >> Treatment<-factor(c("Untreated","Untreated","Untreated","Treated"," Treated","Treated","Untreated","Untreated","Untreated","Treated","Trea ted","Treated")) >> data.frame(Sample=colnames(y),Mouse,Treatment) >> design<-model.matrix(~Mouse+Treatment) >> rownames(design)<-colnames(y) >> >> y<-estimateGLMCommonDisp(y,design,verbose=TRUE) >> #Estimation of the gene-wise dispersion >> y<-estimateGLMTrendedDisp(y,design) >> y<-estimateGLMTagwiseDisp(y,design) >> >> #Diff expresion >> fit<-glmFit(y,design) >> lrt<-glmLRT(fit) >> topedgeR<-topTags(lrt,n=100) >> >> So I guess that I'm testing the drug adjusting for baseline differences between the 2 mouse types. And I have 26 DE miRNAs >> >> The problem here is that there is only 5 common miRNAs between DESeq and edgeR. >> >> Is the design correct ?what I'm doing wrong ? >> >> >> Thanks in advance ! >> > > ______________________________________________________________________ > The information in this email is confidential and intended solely for the addressee. > You must not disclose, forward, print or use it without the permission of the sender. > ______________________________________________________________________ =================================================== Eduardo Andrés León Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063) e-mail: eandres@cnio.es Fax: (+34) 91 224 69 76 Unidad de Bioinformática Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas C.P.: 28029 Zip Code: 28029 C/. Melchor Fernández Almagro, 3 Madrid (Spain) http://bioinfo.cnio.es http://bioinfo.cnio.es/people/eandres =================================================== [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 667 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6