(no subject)

0

Entering edit mode

Liu, XiaoChuan ▴ 30

@liu-xiaochuan-5812

Last seen 11.3 years ago

Dear Dr. Simon Anders, I have some questions on the multi-factor test in DESeq. Here I have total 8 RNA-seq samples. There are two factors: condition(Wildtype and Knockout), and treatment(Picrotoxin and Control). And each has two biological replicates. Details I have attached (see Table 1 in attachment). Now I want to use the generalized linear models to do the ANOVA-like analysis for these two factors and want to find the differentially expressed genes caused by the two factors. The part 2 in attachment are the different GLMs tests by DESeq. And the cut-off are chosen as P-value<0.05 and FDR<0.05 separately. The number of significant differentially expressed genes are counted by cut-offs. Details see Part 2 in attachment. I also pick up raw counts for all the genes, then I try to use the function getVarianceStabilizedData() to transfer the raw counts to variance stabilizing transformation (VST). After that, I directly use the ANOVA function in R to calculate the P-value for each genes based on the VST value. When I get all P-value for all genes, I also do the p.adjust to calculate the FDR by method = "BH". Results see Part 3 in attachment. Here are my questions: 1. What is the meaning when I use ?count ~ 1?? Here 1 is a cut- off? Or other meaning? I saw you give an example like this in DESeq Reference Manual. So I try to follow using it. But I do not know the meaning for test. 2. In Part 2, I did 6 different GLMs tests by DESeq. And I also do the overlap amongst results. The overlap sometimes is very small. Why are they so different? How to explain it? Could you give me some comments? 3. In Part 3, I also do the overlap with 6 results in Part 2. But the overlap are very small. I wonder if I make a mistake to misuse the variance stabilizing transformation? If I want to directly use the ANOVA function in R to calculate co-factor P-value, could I use the raw count? Or How to normalize the raw counts then I can use ANOVA function in R? Thank you very much! Looking forward to your response! Best, Leo 1. Table1 condition treatment WP1 Wildtype Picrotoxin WC1 Wildtype Control KP1 Knockout Picrotoxin KC1 Knockout Control WP2 Wildtype Picrotoxin WC2 Wildtype Control KP2 Knockout Picrotoxin KC2 Knockout Control 2. Multi-factor test fit1 = fitNbinomGLMs( cds, count ~ treatment + condition ) fit0 = fitNbinomGLMs( cds, count ~ treatment ) pvalsGLM1 = nbinomGLMTest( fit1, fit0 ) padjGLM1 = p.adjust( pvalsGLM1, method="BH" ) Dataset1 P-value<0.05: 615 FDR<0.05: 20 fit3 = fitNbinomGLMs( cds, count ~ condition + treatment ) fit2 = fitNbinomGLMs( cds, count ~ condition ) pvalsGLM2 = nbinomGLMTest( fit3, fit2 ) padjGLM2 = p.adjust( pvalsGLM2, method="BH" ) Dataset2 P-value<0.05: 1959 FDR<0.05: 870 fit1 = fitNbinomGLMs( cds, count ~ treatment + condition ) fit4 = fitNbinomGLMs( cds, count ~ 1 ) pvalsGLM3 = nbinomGLMTest( fit1, fit4 ) padjGLM3 = p.adjust( pvalsGLM3, method="BH" ) Dataset3 P-value<0.05: 1767 FDR<0.05: 778 fit5 = fitNbinomGLMs( cds, count ~ condition * treatment ) fit0 = fitNbinomGLMs( cds, count ~ treatment ) pvalsGLM4 = nbinomGLMTest( fit5, fit0 ) padjGLM4 = p.adjust( pvalsGLM4, method="BH" ) Dataset4 P-value<0.05: 393 FDR<0.05: 26 fit5 = fitNbinomGLMs( cds, count ~ condition * treatment ) fit2 = fitNbinomGLMs( cds, count ~ condition ) pvalsGLM5 = nbinomGLMTest( fit5, fit2 ) padjGLM5 = p.adjust( pvalsGLM5, method="BH" ) Dataset5 P-value<0.05: 1450 FDR<0.05: 665 fit5 = fitNbinomGLMs( cds, count ~ condition * treatment ) fit4 = fitNbinomGLMs( cds, count ~ 1 ) pvalsGLM6 = nbinomGLMTest( fit5, fit4 ) padjGLM6 = p.adjust( pvalsGLM6, method="BH" ) Dataset6 P-value<0.05: 1465 FDR<0.05: 657 Overlap(P-value<0.05) Dataset1 Dataset2 Dataset3 Dataset4 Dataset5 Dataset6 Dataset1 Dataset2 240 Dataset3 455 1511 Dataset4 356 178 342 Dataset5 220 1423 1378 184 Dataset6 396 1282 1432 328 1247 Overlap(FDR<0.05) Dataset1 Dataset2 Dataset3 Dataset4 Dataset5 Dataset6 Dataset1 Dataset2 10 Dataset3 20 720 Dataset4 18 15 26 Dataset5 9 658 652 15 Dataset6 20 618 653 26 602 3. Two way ANOVA After VST? P-value<0.05: 671 FDR<0.05: 0 Overlap(P-value<0.05) Dataset1 Dataset2 Dataset3 Dataset4 Dataset5 Dataset6 Two way ANOVA After VST 35 77 68 34 69 69 -------------- next part -------------- A non-text attachment was scrubbed... Name: Anova.pdf Type: application/pdf Size: 106447 bytes Desc: Anova.pdf URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130308="" e5c68473="" attachment.pdf="">

DESeq DESeq • 3.4k views

ADD COMMENT • link updated 12.8 years ago by Deisy Gysi ▴ 20 • written 12.8 years ago by Liu, XiaoChuan ▴ 30

0

Entering edit mode

Simon Anders ★ 3.8k

@simon-anders-3855

Last seen 5.4 years ago

Zentrum für Molekularbiologie, Universi…

Dear Leo On 08/03/13 16:54, Liu, XiaoChuan wrote: [...] > 1. What is the meaning when I use ?count ~ 1?? Here 1 is a > cut-off? Or other meaning? I saw you give an example like this in > DESeq Reference Manual. So I try to follow using it. But I do not > know the meaning for test. No, "~ 1" means that no factors except for the intercept should be used in the model. This formula notation is not specific to DESeq, it is a part of R and hence discussed in most textbooks covering R. (If you are unfamiliar with linear models and the associated concepts, the books by Dobson or Dalgaard might be a useful read.) > 2. In Part 2, I did 6 different GLMs tests by DESeq. And I also > do the overlap amongst results. The overlap sometimes is very small. > Why are they so different? How to explain it? Could you give me some > comments? Sorry, I don't understand the question. You perform tests for different hypotheses and are surprised to get different results? You will need to tell us more specifically what biological hypotheses you wish to test, and then we can maybe advise you how to formulate this as a linear model. > > 3. In Part 3, I also do the overlap with 6 results in Part 2. > But the overlap are very small. I wonder if I make a mistake to > misuse the variance stabilizing transformation? If I want to directly > use the ANOVA function in R to calculate co-factor P-value, could I > use the raw count? Or How to normalize the raw counts then I can use > ANOVA function in R? No, ordinary-least-square (OLS) ANOVA requires data to be homoscedastic and this count data is not. This is, after all, the whole point of either using GLMs on the raw count data, or OLS ANOVA on variance-stabilized data. I would have expected some similarity between the results of a GLM ANODEV anaysis of the count data and the OLS ANOVA analysis on the vST data, but as you did not post the code you used, it is hard to say whether you may have made a mistake. Simon

ADD COMMENT • link 12.8 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Dear Simon, For question 1: Thanks very much for your suggestion! I will read it. For question2: Our biological hypotheses is that whether we can find the differentially expressed genes caused by the two factors amongst these samples? So do you think how to formulate this as a linear model? Like this belove? fit5 = fitNbinomGLMs( cds, count ~ condition * treatment ) fit4 = fitNbinomGLMs( cds, count ~ 1 ) pvalsGLM6 = nbinomGLMTest( fit5, fit4 ) padjGLM6 = p.adjust( pvalsGLM6, method="BH" ) For question3: I did the OLS ANOVA analysis on the vST data. Do you think it is correct? The codes are attached. Thanks! Leo -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Simon Anders Sent: Friday, March 08, 2013 11:13 AM To: bioconductor at r-project.org Subject: Re: [BioC] (no subject) Dear Leo On 08/03/13 16:54, Liu, XiaoChuan wrote: [...] > 1. What is the meaning when I use ?count ~ 1?? Here 1 is a > cut-off? Or other meaning? I saw you give an example like this in > DESeq Reference Manual. So I try to follow using it. But I do not know > the meaning for test. No, "~ 1" means that no factors except for the intercept should be used in the model. This formula notation is not specific to DESeq, it is a part of R and hence discussed in most textbooks covering R. (If you are unfamiliar with linear models and the associated concepts, the books by Dobson or Dalgaard might be a useful read.) > 2. In Part 2, I did 6 different GLMs tests by DESeq. And I also > do the overlap amongst results. The overlap sometimes is very small. > Why are they so different? How to explain it? Could you give me some > comments? Sorry, I don't understand the question. You perform tests for different hypotheses and are surprised to get different results? You will need to tell us more specifically what biological hypotheses you wish to test, and then we can maybe advise you how to formulate this as a linear model. > > 3. In Part 3, I also do the overlap with 6 results in Part 2. > But the overlap are very small. I wonder if I make a mistake to > misuse the variance stabilizing transformation? If I want to directly > use the ANOVA function in R to calculate co-factor P-value, could I > use the raw count? Or How to normalize the raw counts then I can use > ANOVA function in R? No, ordinary-least-square (OLS) ANOVA requires data to be homoscedastic and this count data is not. This is, after all, the whole point of either using GLMs on the raw count data, or OLS ANOVA on variance-stabilized data. I would have expected some similarity between the results of a GLM ANODEV anaysis of the count data and the OLS ANOVA analysis on the vST data, but as you did not post the code you used, it is hard to say whether you may have made a mistake. Simon _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: code.txt URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130308="" dd02123e="" attachment.txt="">

ADD REPLY • link 12.8 years ago Liu, XiaoChuan ▴ 30

0

Entering edit mode

Dear Simon, Here are my code for ANOVA test after VST in DESeq, Do you think it is right to do ANOVA test by VST? Thanks! Code: CountTable <- read.table("all.samples.count.txt",header=TRUE, row.names=1) library('DESeq') Design<-data.frame(row.names=colnames(CountTable), condition = c("Wild type","Wildtype","Knockout","Knockout","Wildtype","Wildtype","Knockout ","Knockout"), treatment = c("Picrotoxin","Control","Picrotoxin","Cont rol","Picrotoxin","Control","Picrotoxin","Control")) cds <- newCountDataSet( CountTable, Design ) cds <- estimateSizeFactors( cds ) cds <- estimateDispersions( cds ) x<-getVarianceStabilizedData(cds) CountTable_new <- x nrow_number<-nrow(CountTable_new) ncol_number<-ncol(CountTable_new) all_P_value<-0; for(i in 1:nrow_number){ t1<-as.matrix(Design$condition) t2<-as.matrix(Design$treatment) t3<-as.matrix(CountTable_new[i,]) anovaTable<-matrix(c(t1,t2,t3),nrow=ncol_number,ncol=3,dimnames = list(c(1:ncol_number),c("condition","treatment","expression"))) anovaTable_data<-as.data.frame(anovaTable) anovaTable_data$expression<-as.integer(anovaTable_data$expression) int <- aov(expression ~ condition*treatment, data= anovaTable_data) int_matrix<-as.matrix(anova(int)) p_value=int_matrix[3,5] all_P_value<-c(all_P_value,p_value) } all_P_value<-all_P_value[2:(nrow_number+1)] all_P_value<-as.matrix(all_P_value) anova<-matrix(c(CountTable_new,all_P_value,all_P_value),nrow=nrow_numb er,ncol=(ncol_number+2),dimnames = list(rownames(CountTable_new),c(col names(CountTable_new),"p_value","q_value"))) anova<-as.data.frame(anova) anova$q_value<-p.adjust(as.numeric(anova$p_value),method = "BH") output_file_name="two_way_ANOVA.txt" write.table( anova, file=output_file_name, quote= FALSE, sep="\t", row.names= TRUE ) Leo >3. In Part 3, I also do the overlap with 6 results in Part 2. > But the overlap are very small. I wonder if I make a mistake to > misuse the variance stabilizing transformation? If I want to directly > use the ANOVA function in R to calculate co-factor P-value, could I > use the raw count? Or How to normalize the raw counts then I can use > ANOVA function in R? No, ordinary-least-square (OLS) ANOVA requires data to be homoscedastic and this count data is not. This is, after all, the whole point of either using GLMs on the raw count data, or OLS ANOVA on variance-stabilized data. I would have expected some similarity between the results of a GLM ANODEV anaysis of the count data and the OLS ANOVA analysis on the vST data, but as you did not post the code you used, it is hard to say whether you may have made a mistake. Simon _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.8 years ago Liu, XiaoChuan ▴ 30

0

Entering edit mode

Hi On 14/03/13 16:57, Liu, XiaoChuan wrote: > Here are my code for ANOVA test after VST in DESeq, > Do you think it is right to do ANOVA test by VST? Thanks! Doing a VST followed by ANOVA is not entirely wrong but it is certainly less suitable then a GLM. I don't see why you don't want to do your analysis in the manner suggested in the vignette. You ANOVA code gives you four p values per gene, and, of course, you can get four equivalent p values with GLM tests, too. The reason why I haven't told you how is that I doubt that you are interested in all four of them, and as you have not told us what biological question you want to address, I cannot say which one would be the one relevant for you. So, I stick to my previous advice: Please read up on the concept of "interactions" in linear models and ANOVA, because without that, it is unlikely that you manage to make sense of your data. And once you know whether you are looking for an interaction or a main effect, ask again. Simon

ADD REPLY • link 12.8 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Hi, Does anybody aware of any package in R or know how we can map metabolites to genes. I mean all of the enzymatic genes which are involved in any reactions in which the metabolite of interest is participating as substrate or product or both. Thanks, Nooshin [[alternative HTML version deleted]]

ADD REPLY • link 12.8 years ago nooshin ▴ 300

0

Entering edit mode

Dear Nooshin, I believe that the Metscape 2 add-on to Cytoscape can do what you want. Best wishes, Rich On Mar 14, 2013, at 2:59 PM, n omranian wrote: > Hi, > > Does anybody aware of any package in R or know how we can map metabolites to genes. > I mean all of the enzymatic genes which are involved in any reactions in which the metabolite of interest is participating as substrate or product or both. > > Thanks, > Nooshin > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.8 years ago Richard Friedman ★ 2.0k

0

Entering edit mode

Dear Rich, Thanks a lot for your reply. I will check it and inform you about the result. but nothing exists in R? Thanks. Nooshin --- On Thu, 3/14/13, Richard Friedman <friedman@cancercenter.columbia.edu> wrote: From: Richard Friedman <friedman@cancercenter.columbia.edu> Subject: Re: [BioC] metabolite to gene - mapping To: "n omranian" <n_omranian@yahoo.com> Cc: "'bioconductor@r-project.org'" <bioconductor@r-project.org> Date: Thursday, March 14, 2013, 10:31 PM Dear Nooshin, I believe that the Metscape 2 add-on to Cytoscape can do what you want. Best wishes, Rich On Mar 14, 2013, at 2:59 PM, n omranian wrote: > Hi, > > Does anybody aware of any package in R or know how we can map metabolites to genes. > I mean all of the enzymatic genes which are involved in any reactions in which the metabolite of interest is participating as substrate or product or both. > > Thanks, > Nooshin > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 12.8 years ago nooshin ▴ 300

0

Entering edit mode

On Mar 14, 2013, at 3:04 PM, n omranian wrote: > > Dear Rich, > Thanks a lot for your reply. I will check it and inform you about the result. but nothing exists in R? > Thanks. > Nooshin > Not of which I am aware but maybe someone else on the list knows. Rich

ADD REPLY • link 12.8 years ago Richard Friedman ★ 2.0k

0

Entering edit mode

Hi Richard, The Reactome.db package contains the entire reactome database. It is not all exposed via select (on account of the fact that reactome is so enormous). But by using the DBI and RSQLite packages, along with the included database connection, you should be able to get out anything from there that you wanted to extract. And if you are willing to help me find the thing you wish to expose (the reactome database schema is quite complex), I can probably help you make it accessible via select. Marc On 03/14/2013 12:06 PM, Richard Friedman wrote: > On Mar 14, 2013, at 3:04 PM, n omranian wrote: > >> Dear Rich, >> Thanks a lot for your reply. I will check it and inform you about the result. but nothing exists in R? >> Thanks. >> Nooshin >> > Not of which I am aware but maybe someone else on the list knows. > Rich > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.8 years ago Marc Carlson ★ 7.2k

0

Entering edit mode

One more thing: It might also be worth your time to have a look at the KEGGREST package (currently it's in devel). http://www.bioconductor.org/packages/2.12/bioc/html/KEGGREST.html Marc On 03/20/2013 12:38 PM, Marc Carlson wrote: > Hi Richard, > > The Reactome.db package contains the entire reactome database. It is > not all exposed via select (on account of the fact that reactome is so > enormous). But by using the DBI and RSQLite packages, along with the > included database connection, you should be able to get out anything > from there that you wanted to extract. > > And if you are willing to help me find the thing you wish to expose > (the reactome database schema is quite complex), I can probably help > you make it accessible via select. > > > Marc > > > > On 03/14/2013 12:06 PM, Richard Friedman wrote: >> On Mar 14, 2013, at 3:04 PM, n omranian wrote: >> >>> Dear Rich, >>> Thanks a lot for your reply. I will check it and inform you about >>> the result. but nothing exists in R? >>> Thanks. >>> Nooshin >>> >> Not of which I am aware but maybe someone else on the list knows. >> Rich >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.8 years ago Marc Carlson ★ 7.2k

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

> Date: Tue, 12 Mar 2013 13:11:25 -0400 > From: Richard Friedman <friedman at="" cancercenter.columbia.edu=""> > To: "Ryan C. Thompson" <rct at="" thompsonclan.org=""> > Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] QuasiSeq vs DSS > > Dear Ryan, > > Thank you for your response. > 3 questions: > 1. If I had just a simple pairwise comparison is it known DSS or > QuasiSeq better? > 2. I was unaware that an approximate implementation of QuasiSeq was > available in edgeR. If so, is it known hor it compare to the ordinairy > EdgeR on the one hand and the full QuasiSeq on the other. As Ryan has pointed out, the quasi-likelihood code in edgeR is not an approximation. It is an independent implementation that leverages the additional capabilities of the limma and edgeR packages. For example, the edgeR quasi-likelihood code on the devel repository has a robust empirical Bayes implementation that is very promising. > 3. And I guess that the third question is for Gordon - Is using DSS and > QuasiSeq (or EdgeR) together desireable and if so, are there plans to > incorporate DSS into QuasiSeq (EdgeR). No, one cannot combine DSS (or any other) shrinkage with quasi- likelihood shrinkage. One cannot shrink values that have already been shrunk. There is no need need for this anyway. > My note was planning ahead. I will still be in the microarray world for > a more few weeks before I return to learning RNASeq. I wanted to know > what the best practice is. If you (or anybody out there) develops a > script to meld the two methods, I am sure that it would be interesting > to the list. I doubt that there is a single best method, although several methods are quite good. RNA-seq data analysis is a very fast moving area and I don't think that it has settled down yet. Most method authors probably feel that their own method is best. However different people get different results from different simulation comparisons, so it would be wise to reserve judgement when new papers appear. Best wishes Gordon > Best wishes, > Rich > > > > > > On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote: > >> Dear Rich, >> From what I can tell, it should be possible. The development version of DESeq2 implements the DSS "squeezing" method combined with edgeR's Cox-Reid dispersion estimation. You could use DESeq2 to estimate dispersions, and then copy those dispersion values into an edgeR DGEList object. Then you can use edgeR::glmQLFTest, which implements (approximately) the QuasiSeq method. >> I have not had time yet to investigate putting these packages together in this way, but it is something I plan to look at. I'm certain that the combination is technically possible, and I'm reasonably sure that the result would be statistically meaningful. >> -Ryan Thompson >> On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at="" cancercenter.columbia.edu=""> wrote: >> Dear List. >> >> The papers on DSS (included in Bioconductor): >> >> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves >> differential expression detection in RNA-seq data. Biostatistics. 2013 >> Apr;14(2):232-43. >> >> and QuasiSeq (included in CRAN): >> >> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression >> in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. >> Stat Appl Genet Mol Biol. 2012 >> >> both give evidence of superior performance to edgeR (if I understand them correctly). >> >> Have the two methods been compared? >> Can the 2 methods been combined (with DSS estimating the dispersion used in >> the quasi-negative bionomial disribution used in QuasiSeq)? >> >> I would appreciate any insight with respect to what is the overall best >> method for differential expression in RNASeq available at present. >> >> Thanks and best wishes, >> Rich >> >> >> Richard A. Friedman, PhD >> Associate Research Scientist, >> Biomedical Informatics Shared Resource >> Herbert Irving Comprehensive Cancer Center (HICCC) >> Lecturer, >> Department of Biomedical Informatics (DBMI) >> Educational Coordinator, >> Center for Computational Biology and Bioinformatics (C2B2)/ >> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ >> Columbia Initiative in Systems Biology >> Room 824 >> Irving Cancer Research Center >> Columbia University >> 1130 St. Nicholas Ave >> New York, NY 10032 >> (212)851-4765 (voice) >> friedman at cancercenter.columbia.edu >> http://cancercenter.columbia.edu/~friedman/ >> >> Fritz Lang. Didn't he do "Star Trek". >> -Rose Friedman, age 16 >> ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 12.8 years ago Gordon Smyth 53k

0

Entering edit mode

Dear Rich, > Date: Tue, 12 Mar 2013 13:11:25 -0400 > From: Richard Friedman <friedman at="" cancercenter.columbia.edu=""> > To: "Ryan C. Thompson" <rct at="" thompsonclan.org=""> > Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] QuasiSeq vs DSS > > Dear Ryan, > > Thank you for your response. > 3 questions: > 1. If I had just a simple pairwise comparison is it known DSS or > QuasiSeq better? > 2. I was unaware that an approximate implementation of QuasiSeq was > available in edgeR. If so, is it known how it compare to the ordinairy > EdgeR on the one hand and the full QuasiSeq on the other. As Ryan has pointed out, the quasi-likelihood code in edgeR is not an approximation. It is an independent implementation that leverages the additional capabilities of the limma and edgeR packages. For example, the edgeR quasi-likelihood code on the devel repository has a robust empirical Bayes implementation that is very promising. > 3. And I guess that the third question is for Gordon - Is using DSS and > QuasiSeq (or EdgeR) together desireable and if so, are there plans to > incorporate DSS into QuasiSeq (EdgeR). No, one cannot combine DSS (or any other) shrinkage with quasi- likelihood shrinkage. One cannot shrink values that have already been shrunk. There is no need need for this anyway. > My note was planning ahead. I will still be in the microarray world for > a more few weeks before I return to learning RNASeq. I wanted to know > what the best practice is. If you (or anybody out there) develops a > script to meld the two methods, I am sure that it would be interesting > to the list. I doubt that there is a single best method, although several methods are quite good. RNA-seq data analysis is a very fast moving area and I don't think that it has settled down yet. Most method authors probably feel that their own method is best. However different people get different results from different simulation comparisons, so it would be wise to reserve judgement when new papers appear. Moreover, what may be best for one dataset may not be best for another. Best wishes Gordon > Best wishes, > Rich > > > On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote: > >> Dear Rich, >> From what I can tell, it should be possible. The development version of >> DESeq2 implements the DSS "squeezing" method combined with edgeR's >> Cox-Reid dispersion estimation. You could use DESeq2 to estimate >> dispersions, and then copy those dispersion values into an edgeR >> DGEList object. Then you can use edgeR::glmQLFTest, which implements >> (approximately) the QuasiSeq method. I have not had time yet to >> investigate putting these packages together in this way, but it is >> something I plan to look at. I'm certain that the combination is >> technically possible, and I'm reasonably sure that the result would be >> statistically meaningful. >> -Ryan Thompson >> On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at="" cancercenter.columbia.edu=""> wrote: >> Dear List. >> >> The papers on DSS (included in Bioconductor): >> >> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves >> differential expression detection in RNA-seq data. Biostatistics. 2013 >> Apr;14(2):232-43. >> >> and QuasiSeq (included in CRAN): >> >> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential >> expression in RNA-sequence data using quasi-likelihood with shrunken >> dispersion estimates. Stat Appl Genet Mol Biol. 2012 >> >> both give evidence of superior performance to edgeR (if I understand >> them correctly). >> >> Have the two methods been compared? Can the 2 methods been combined >> (with DSS estimating the dispersion used in the quasi-negative >> bionomial disribution used in QuasiSeq)? >> >> I would appreciate any insight with respect to what is the overall best >> method for differential expression in RNASeq available at present. >> >> Thanks and best wishes, >> Rich >> >> >> Richard A. Friedman, PhD >> Associate Research Scientist, >> Biomedical Informatics Shared Resource >> Herbert Irving Comprehensive Cancer Center (HICCC) >> Lecturer, >> Department of Biomedical Informatics (DBMI) >> Educational Coordinator, >> Center for Computational Biology and Bioinformatics (C2B2)/ >> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ >> Columbia Initiative in Systems Biology >> Room 824 >> Irving Cancer Research Center >> Columbia University >> 1130 St. Nicholas Ave >> New York, NY 10032 >> (212)851-4765 (voice) >> friedman at cancercenter.columbia.edu ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 12.8 years ago Gordon Smyth 53k

0

Entering edit mode

Deisy Gysi ▴ 20

@deisy-gysi-5842

Last seen 11.3 years ago

Hi I'm having some troubles in the importation of an archive .GFF3, i've read the Help and the vinhtes so many times, made so many things and still don't work! > gffpath = system.file("gff3/chr1_1_247249719.gff3", package="gwascat") > library(rtracklayer) > c1 = import(gffpath, asRangedData = FALSE) Erro em .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : solving row 975: range cannot be determined from the supplied arguments (too many NAs) Here is part of the database... I cannot found the error!! The first line is the line 975, where R says that there is "too many NAs": chr1 UCSC cytoband 7100000 9200000 . + . Parent=Sequence:Cytoband:1p36.23;Alias=p36.23;Alias=1p36.23;Stain=gpos 25 chr1 UCSC chromosome 5300000 7100000 . + . ID=Sequence:Cytoband:1p36.31;Alias=p36.31;Alias=1p36.31;Stain=gneg chr1 UCSC cytoband 5300000 7100000 . + . Parent=Sequence:Cytoband:1p36.31;Alias=p36.31;Alias=1p36.31;Stain=gneg chr1 UCSC chromosome 2300000 5300000 . + . ID=Sequence:Cytoband:1p36.32;Alias=p36.32;Alias=1p36.32;Stain=gpos25 chr1 UCSC cytoband 2300000 5300000 . + . Parent=Sequence:Cytoband:1p36.32;Alias=p36.32;Alias=1p36.32;Stain=gpos 25 chr1 UCSC chromosome . 2300000 . + . ID=Sequence:Cytoband:1p36.33;Alias=p36.33;Alias=1p36.33;Stain=gneg chr1 UCSC cytoband . 2300000 . + . Parent=Sequence:Cytoband:1p36.33;Alias=p36.33;Alias=1p36.33;Stain=gneg Thanks in advance for help! -- Atenciosamente, Deisy Morselli Gysi Bacharel em Biotecnologia (PUC-PR) Graduanda em Estatística (UFPR) Aluna de Iniciação Científica (PIBIC/UFPR) [[alternative HTML version deleted]]

ADD COMMENT • link 12.8 years ago Deisy Gysi ▴ 20

0

Entering edit mode

Hello, What version of gwascat and rtracklayer are you using? Please provide the output of sessionInfo(). I think you have an old version of gwascat. The chr1 data you are trying to load doesn't exist in the package in release or devel and has been replaced by chr17 data. The most current vignette for gwascat is located here, http://bioconductor.org/packages/2.12/bioc/html/gwascat.html Help with updating R and Bioconductor packages is here, http://bioconductor.org/install/ Valerie On 03/22/2013 06:40 AM, Deisy Gysi wrote: > Hi I'm having some troubles in the importation of an archive .GFF3, i've > read the Help and the vinhtes so many times, made so many things and still > don't work! > >> gffpath = system.file("gff3/chr1_1_247249719.gff3", package="gwascat") >> library(rtracklayer) >> c1 = import(gffpath, asRangedData = FALSE) > Erro em .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : > solving row 975: range cannot be determined from the supplied arguments > (too many NAs) > > > Here is part of the database... > I cannot found the error!! > > The first line is the line 975, where R says that there is "too many NAs": > > chr1 UCSC cytoband 7100000 9200000 . + . > Parent=Sequence:Cytoband:1p36.23;Alias=p36.23;Alias=1p36.23;Stain=gp os25 > chr1 UCSC chromosome 5300000 7100000 . + . > ID=Sequence:Cytoband:1p36.31;Alias=p36.31;Alias=1p36.31;Stain=gneg > chr1 UCSC cytoband 5300000 7100000 . + . > Parent=Sequence:Cytoband:1p36.31;Alias=p36.31;Alias=1p36.31;Stain=gneg > chr1 UCSC chromosome 2300000 5300000 . + . > ID=Sequence:Cytoband:1p36.32;Alias=p36.32;Alias=1p36.32;Stain=gpos25 > chr1 UCSC cytoband 2300000 5300000 . + . > Parent=Sequence:Cytoband:1p36.32;Alias=p36.32;Alias=1p36.32;Stain=gp os25 > chr1 UCSC chromosome . 2300000 . + . > ID=Sequence:Cytoband:1p36.33;Alias=p36.33;Alias=1p36.33;Stain=gneg > chr1 UCSC cytoband . 2300000 . + . > Parent=Sequence:Cytoband:1p36.33;Alias=p36.33;Alias=1p36.33;Stain=gneg > > > Thanks in advance for help! > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 12.8 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.