Problem elaborating contrasts for DESeq2 analysis
2
0
Entering edit mode
@mikelove
Last seen 5 hours ago
United States
hi Marine, Note that you are using an older version of DESeq2 and Bioconductor (DESeq2 v1.4 was released in April 2014). So if you are searching by Google for documentation, it might not apply to your software (we print the version number on the first page of the vignettes). You can always get the correct documentation by using: browseVignettes("DESeq2") and ?results for the man page for the results() function. If you can update to the release version of Bioconductor (by updating to R 3.1.0), you can obtain the contrast you are interested in with: results(dds, contrast=list(c("conditionA1","conditionA2","conditionA3"), c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3, -1/3)) If you aren't able to update to R 3.1.0 and the release version of Bioconductor, you can hack your contrast of interest for v1.2 like so: design(dds) <- ~ condition + 0 dds <- DESeq(dds, betaPrior=FALSE) # note that the resultsNames(dds) will not be labelled correctly # with design having + 0 (they still have the _vs_ in the names) # but the following provides the correct contrast attr(dds, "modelMatrix") results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3) Mike On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer <marine.rohmer at="" mgx.cnrs.fr=""> wrote: > Dear bioconductor mailing list, > > I want to do a differential analysis using the GLM with DESeq2. > We have 18 RNA samples, including 6 samples subjected to treatment A, 6 > others subjected to treatment B, and the last 6 subjected to treatment C. > Each treatment group is divided into 3 colonies, in which one we have 2 > replicates (leading us to 6 samples per treatment). > Colonies A1, A2 and A3 have been treated with the A treatment, colonies B* > with the B treatment, and colonies C* with the C treatment. > (A picture of the experiment design is attached to this email). > > These treatments are what we want to study, being aware of the "colony > effect". > > It is important to note that both factors "colony" and "treatment" are in > linear correlation, so that the classic contrast "A - B" does not work (--> > "design matrix not of full rank"). > That's why I want to study, for example, the contrast : (A1+A2+A3 - > B1-B2-B3)/3 . > > I've followed the DESeq2 documentation, and it seems I can do that with the > numeric contrast vector as described in the 3.2. section. > My problem is that when typing "resultsNames(ddsCtrst)", I don't have "A1", > "A2", "A3" etc... as described in the doc, but directly some "A2_vs_A1", > "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I wanted. > I don't know why I have these "_vs_something" because I did not write them > myself. > > This is my code : > > ### Preparing the analysis > library(DESeq2) > # Using edgeR to have the readDGE function > library(edgeR) > > # fileT is my target.txt tabulated file, containing : > # files group description colony > # path/to/countsFile.txt A A1_1 A1 > # path/to/countsFile.txt A A1_2 A1 > # path/to/countsFile.txt A A2_1 A2 > # etc... > > targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header = TRUE, > comment.char="" ) > d2 <- readDGE(targets, columns=c(1,2), header = FALSE) > d.RLE <- d2 > d.RLE <- calcNormFactors(d.RLE,method="RLE") > > ### Preparing the design and contrast > descexp=c(d2$samples$colony) > # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc... > descexp=data.frame(condition=descexp) > >> descexp > > condition > 1 A1 > 2 A1 > 3 A2 > 4 A2 > 5 A3 > 6 A3 > 7 B1 > 8 B1 > 9 B2 > 10 B2 > 11 B3 > 12 B3 > 13 C1 > 14 C1 > 15 C2 > 16 C2 > 17 C3 > 18 C3 > > dds <- DESeqDataSetFromMatrix(countData = > d2$counts,colData=descexp,design=~condition) > > ### Checking the condition : each colony in two replicates >> >> dds$condition > > [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3 > Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3 > > ### Analysis > dds <- DESeq(dds) > > ### Now when typing resultsNames(dds) : >> >> resultsNames(dds) > > [1] "Intercept" "condition_A2_vs_A1" "condition_A3_vs_A1" > [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1" > [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1" > > ### Whereas I want something like described in the DESeq2 doc, that is to > say without the "_vs_something"... > > Does anyone know what is wrong is my code (or my experiment design) ? > I hope my message is understable. Don't hesitate to ask me other details. > > Thank you for advance for your attention, > > Best regards, > > M. R. > > NB : > >> sessionInfo() > > R version 3.0.2 (2013-09-25) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C > [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 > [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 > [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] edgeR_3.4.0 limma_3.18.2 DESeq2_1.2.5 > [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3 > [7] XVector_0.2.0 IRanges_1.20.5 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 > [4] DBI_0.2-7 genefilter_1.44.0 grid_3.0.2 > [7] lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 > [10] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 > [13] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 > [16] xtable_1.7-1
edgeR DESeq2 edgeR DESeq2 • 2.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 5 hours ago
United States
hi Marine, I heard about some stalled runs for v1.2 (but only heard about them after the release cycle had ended). I haven't heard of any stalled runs for the current release. anyway, I believe the stalls were related to rows with very low counts, but I'm not certain. You might try first subsetting: dds = dds[rowMeans(counts(dds)) > 5,]. If this doesn't work you'll have to upgrade to the release version. On Tue, Jul 22, 2014 at 11:18 AM, Marine Rohmer <marine.rohmer at="" mgx.cnrs.fr=""> wrote: > Hi Michael, > > First of all thank you a lot for your answer! > So you think that the "_vs_" names are only due to the older version ? > On my workplace I can't update these packages but I will try to install them > on my own computer in the latest version, and see if it goes better. > > Waiting for that, I've tried as you suggested : > > dds <- > DESeqDataSetFromMatrix(countData=d2$counts,colData=descexp,design=~c ondition+0) > dds <- DESeq(dds, betaPrior=FALSE) > > This second line turns but never ends (At least within 1 hour). > > Thank you for your attention, > > Best regards, > > Marine > > > Le 2014-07-21 16:32, Michael Love a ?crit : > >> hi Marine, >> >> Note that you are using an older version of DESeq2 and Bioconductor >> (DESeq2 v1.4 was released in April 2014). So if you are searching by >> Google for documentation, it might not apply to your software (we >> print the version number on the first page of the vignettes). You can >> always get the correct documentation by using: >> browseVignettes("DESeq2") and ?results for the man page for the >> results() function. >> >> If you can update to the release version of Bioconductor (by updating >> to R 3.1.0), you can obtain the contrast you are interested in with: >> >> results(dds, contrast=list(c("conditionA1","conditionA2","conditionA3"), >> c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3, >> -1/3)) >> >> If you aren't able to update to R 3.1.0 and the release version of >> Bioconductor, you can hack your contrast of interest for v1.2 like so: >> >> design(dds) <- ~ condition + 0 >> dds <- DESeq(dds, betaPrior=FALSE) >> # note that the resultsNames(dds) will not be labelled correctly >> # with design having + 0 (they still have the _vs_ in the names) >> # but the following provides the correct contrast >> attr(dds, "modelMatrix") >> results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3) >> >> Mike >> >> On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer >> <marine.rohmer at="" mgx.cnrs.fr=""> wrote: >>> >>> Dear bioconductor mailing list, >>> >>> I want to do a differential analysis using the GLM with DESeq2. >>> We have 18 RNA samples, including 6 samples subjected to treatment A, 6 >>> others subjected to treatment B, and the last 6 subjected to treatment C. >>> Each treatment group is divided into 3 colonies, in which one we have 2 >>> replicates (leading us to 6 samples per treatment). >>> Colonies A1, A2 and A3 have been treated with the A treatment, colonies >>> B* >>> with the B treatment, and colonies C* with the C treatment. >>> (A picture of the experiment design is attached to this email). >>> >>> These treatments are what we want to study, being aware of the "colony >>> effect". >>> >>> It is important to note that both factors "colony" and "treatment" are in >>> linear correlation, so that the classic contrast "A - B" does not work >>> (--> >>> "design matrix not of full rank"). >>> That's why I want to study, for example, the contrast : (A1+A2+A3 - >>> B1-B2-B3)/3 . >>> >>> I've followed the DESeq2 documentation, and it seems I can do that with >>> the >>> numeric contrast vector as described in the 3.2. section. >>> My problem is that when typing "resultsNames(ddsCtrst)", I don't have >>> "A1", >>> "A2", "A3" etc... as described in the doc, but directly some "A2_vs_A1", >>> "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I >>> wanted. >>> I don't know why I have these "_vs_something" because I did not write >>> them >>> myself. >>> >>> This is my code : >>> >>> ### Preparing the analysis >>> library(DESeq2) >>> # Using edgeR to have the readDGE function >>> library(edgeR) >>> >>> # fileT is my target.txt tabulated file, containing : >>> # files group description colony >>> # path/to/countsFile.txt A A1_1 A1 >>> # path/to/countsFile.txt A A1_2 A1 >>> # path/to/countsFile.txt A A2_1 A2 >>> # etc... >>> >>> targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header = >>> TRUE, >>> comment.char="" ) >>> d2 <- readDGE(targets, columns=c(1,2), header = FALSE) >>> d.RLE <- d2 >>> d.RLE <- calcNormFactors(d.RLE,method="RLE") >>> >>> ### Preparing the design and contrast >>> descexp=c(d2$samples$colony) >>> # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc... >>> descexp=data.frame(condition=descexp) >>> >>>> descexp >>> >>> >>> condition >>> 1 A1 >>> 2 A1 >>> 3 A2 >>> 4 A2 >>> 5 A3 >>> 6 A3 >>> 7 B1 >>> 8 B1 >>> 9 B2 >>> 10 B2 >>> 11 B3 >>> 12 B3 >>> 13 C1 >>> 14 C1 >>> 15 C2 >>> 16 C2 >>> 17 C3 >>> 18 C3 >>> >>> dds <- DESeqDataSetFromMatrix(countData = >>> d2$counts,colData=descexp,design=~condition) >>> >>> ### Checking the condition : each colony in two replicates >>>> >>>> >>>> dds$condition >>> >>> >>> [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3 >>> Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3 >>> >>> ### Analysis >>> dds <- DESeq(dds) >>> >>> ### Now when typing resultsNames(dds) : >>>> >>>> >>>> resultsNames(dds) >>> >>> >>> [1] "Intercept" "condition_A2_vs_A1" "condition_A3_vs_A1" >>> [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1" >>> [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1" >>> >>> ### Whereas I want something like described in the DESeq2 doc, that is to >>> say without the "_vs_something"... >>> >>> Does anyone know what is wrong is my code (or my experiment design) ? >>> I hope my message is understable. Don't hesitate to ask me other details. >>> >>> Thank you for advance for your attention, >>> >>> Best regards, >>> >>> M. R. >>> >>> NB : >>> >>>> sessionInfo() >>> >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 >>> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 >>> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] edgeR_3.4.0 limma_3.18.2 DESeq2_1.2.5 >>> [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3 >>> [7] XVector_0.2.0 IRanges_1.20.5 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 >>> [4] DBI_0.2-7 genefilter_1.44.0 grid_3.0.2 >>> [7] lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 >>> [10] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 >>> [13] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 >>> [16] xtable_1.7-1
ADD COMMENT
0
Entering edit mode
@marine-rohmer-6662
Last seen 9.6 years ago
Hi Michael, First of all thank you a lot for your answer! So you think that the "_vs_" names are only due to the older version ? On my workplace I can't update these packages but I will try to install them on my own computer in the latest version, and see if it goes better. Waiting for that, I've tried as you suggested : dds <- DESeqDataSetFromMatrix(countData=d2$counts,colData=descexp,design=~con dition+0) dds <- DESeq(dds, betaPrior=FALSE) This second line turns but never ends (At least within 1 hour). Thank you for your attention, Best regards, Marine Le 2014-07-21 16:32, Michael Love a ?crit?: > hi Marine, > > Note that you are using an older version of DESeq2 and Bioconductor > (DESeq2 v1.4 was released in April 2014). So if you are searching by > Google for documentation, it might not apply to your software (we > print the version number on the first page of the vignettes). You can > always get the correct documentation by using: > browseVignettes("DESeq2") and ?results for the man page for the > results() function. > > If you can update to the release version of Bioconductor (by updating > to R 3.1.0), you can obtain the contrast you are interested in with: > > results(dds, > contrast=list(c("conditionA1","conditionA2","conditionA3"), > c("conditionB1","conditionB2","conditionB3")), listValues=c(1/3, > -1/3)) > > If you aren't able to update to R 3.1.0 and the release version of > Bioconductor, you can hack your contrast of interest for v1.2 like so: > > design(dds) <- ~ condition + 0 > dds <- DESeq(dds, betaPrior=FALSE) > # note that the resultsNames(dds) will not be labelled correctly > # with design having + 0 (they still have the _vs_ in the names) > # but the following provides the correct contrast > attr(dds, "modelMatrix") > results(dds, contrast = c(1,1,1,-1,-1,-1,0,0,0) / 3) > > Mike > > On Mon, Jul 21, 2014 at 4:46 AM, Marine Rohmer > <marine.rohmer at="" mgx.cnrs.fr=""> wrote: >> Dear bioconductor mailing list, >> >> I want to do a differential analysis using the GLM with DESeq2. >> We have 18 RNA samples, including 6 samples subjected to treatment A, >> 6 >> others subjected to treatment B, and the last 6 subjected to treatment >> C. >> Each treatment group is divided into 3 colonies, in which one we have >> 2 >> replicates (leading us to 6 samples per treatment). >> Colonies A1, A2 and A3 have been treated with the A treatment, >> colonies B* >> with the B treatment, and colonies C* with the C treatment. >> (A picture of the experiment design is attached to this email). >> >> These treatments are what we want to study, being aware of the "colony >> effect". >> >> It is important to note that both factors "colony" and "treatment" are >> in >> linear correlation, so that the classic contrast "A - B" does not work >> (--> >> "design matrix not of full rank"). >> That's why I want to study, for example, the contrast : (A1+A2+A3 - >> B1-B2-B3)/3 . >> >> I've followed the DESeq2 documentation, and it seems I can do that >> with the >> numeric contrast vector as described in the 3.2. section. >> My problem is that when typing "resultsNames(ddsCtrst)", I don't have >> "A1", >> "A2", "A3" etc... as described in the doc, but directly some >> "A2_vs_A1", >> "A2_vs_A1" etc... so that I can't do my numeric contrast vector as I >> wanted. >> I don't know why I have these "_vs_something" because I did not write >> them >> myself. >> >> This is my code : >> >> ### Preparing the analysis >> library(DESeq2) >> # Using edgeR to have the readDGE function >> library(edgeR) >> >> # fileT is my target.txt tabulated file, containing : >> # files group description colony >> # path/to/countsFile.txt A A1_1 A1 >> # path/to/countsFile.txt A A1_2 A1 >> # path/to/countsFile.txt A A2_1 A2 >> # etc... >> >> targets <- read.delim(file = fileT, stringsAsFactors = FALSE,header = >> TRUE, >> comment.char="" ) >> d2 <- readDGE(targets, columns=c(1,2), header = FALSE) >> d.RLE <- d2 >> d.RLE <- calcNormFactors(d.RLE,method="RLE") >> >> ### Preparing the design and contrast >> descexp=c(d2$samples$colony) >> # "condition" is my colony factor "A1", "A2", "A3", "B1", "B2" etc... >> descexp=data.frame(condition=descexp) >> >>> descexp >> >> condition >> 1 A1 >> 2 A1 >> 3 A2 >> 4 A2 >> 5 A3 >> 6 A3 >> 7 B1 >> 8 B1 >> 9 B2 >> 10 B2 >> 11 B3 >> 12 B3 >> 13 C1 >> 14 C1 >> 15 C2 >> 16 C2 >> 17 C3 >> 18 C3 >> >> dds <- DESeqDataSetFromMatrix(countData = >> d2$counts,colData=descexp,design=~condition) >> >> ### Checking the condition : each colony in two replicates >>> >>> dds$condition >> >> [1] A1 A1 A2 A2 A3 A3 B1 B1 B2 B2 B3 B3 C1 C1 C2 C2 C3 C3 >> Levels: A1 A2 A3 B1 B2 B3 C1 C2 C3 >> >> ### Analysis >> dds <- DESeq(dds) >> >> ### Now when typing resultsNames(dds) : >>> >>> resultsNames(dds) >> >> [1] "Intercept" "condition_A2_vs_A1" "condition_A3_vs_A1" >> [4] "condition_B1_vs_A1" "condition_B2_vs_A1" "condition_B3_vs_A1" >> [7] "condition_C1_vs_A1" "condition_C2_vs_A1" "condition_C3_vs_A1" >> >> ### Whereas I want something like described in the DESeq2 doc, that is >> to >> say without the "_vs_something"... >> >> Does anyone know what is wrong is my code (or my experiment design) ? >> I hope my message is understable. Don't hesitate to ask me other >> details. >> >> Thank you for advance for your attention, >> >> Best regards, >> >> M. R. >> >> NB : >> >>> sessionInfo() >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 >> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 >> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets >> methods >> [8] base >> >> other attached packages: >> [1] edgeR_3.4.0 limma_3.18.2 DESeq2_1.2.5 >> [4] RcppArmadillo_0.3.920.1 Rcpp_0.10.6 >> GenomicRanges_1.14.3 >> [7] XVector_0.2.0 IRanges_1.20.5 BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 >> [4] DBI_0.2-7 genefilter_1.44.0 grid_3.0.2 >> [7] lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 >> [10] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 >> [13] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 >> [16] xtable_1.7-1
ADD COMMENT

Login before adding your answer.

Traffic: 615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6