Search
Question: DEXSeq continous variable in model
0
4.2 years ago by
United States
Alex Gutteridge650 wrote:
On 30-07-2014 10:13, Alex Gutteridge wrote: > On 29-07-2014 10:00, Alex Gutteridge wrote: >> Hi, >> >> I have a time course RNASeq experiment and I'd like to detect DEU >> between early and late stages. I am trying to use 'Age' as a >> continuous variable in my design, but I'm getting an error which I >> suspect is related to this e.g: >> >>> summary(sample.data.ipsc[,1:6]) >> Sample CellType Subtype Donor Age >> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : 2.000 >> Class :character HBR : 0 D4721 : 0 1st Qu.: 4.000 >> Mode :character L2 : 0 D4749 : 0 Median : 8.000 >> L3 : 0 D6002 : 0 Mean : 7.821 >> L4 : 0 D6005 : 0 3rd Qu.:10.500 >> L5 : 0 D6008 : 0 Max. :14.000 >> (Other): 0 (Other): 0 >> Passage >> 42:3 >> 48:7 >> 49:5 >> 52:6 >> 56:7 >> >>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >> sampleData=sample.data.ipsc, >> design= ~ sample + exon + Age:exon, >> flattenedfile=flattenedFile) >> >>> BPPARAM = MulticoreParam(workers=24) >> >>> dxd = estimateSizeFactors(dxd) >>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", >>> BPPARAM=BPPARAM) >> Error: 8346 errors; first error: >> Error in FUN(1:3[[1L]], ...): Non-factor in model frame >> >> For more information, use bplasterror(). To resume calculation, >> re-call >> the function and set the argument 'BPRESUME' to TRUE or wrap the >> previous call in bpresume(). >> >> First traceback: >> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = >> BPPARAM) >> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed, >> mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM), >> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal else >> FALSE) >> 27: lapply(seq_len(cores), inner.do) >> 26: lapply(seq_len(cores), inner.do) >> 25: FUN(1:24[[1L]], ...) >> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) >> 23: parallel:::sendMaster(...) >> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) >> 21: tryCatch(expr, error = function(e) >> >> Can DEXSeq accept a model with a continuous variable? Does it make >> sense to do so? (I do the same thing with DESeq2 to detected DE and it >> works fine). Is this error due to that? Note that all the other steps >> seem to run fine and I can get results (though I don't have many >> significant hits - not sure if this is related or not). If not what is >> best practice? Just split the data set into 'early' and 'late' samples >> and run that as a factor? >> >> Alex Gutteridge > > To sort of answer my own question: cutting the age vector into two > bins (early <8 weeks and late >8 weeks) and using that factor seems to > work and give many significant hits: > >> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2) > >> dxd = DEXSeqDataSetFromHTSeq(countFiles, > sampleData=sample.data.ipsc, > design= ~ sample + exon + AgeBin:exon, > flattenedfile=flattenedFile) > >> BPPARAM = MulticoreParam(workers=12) > >> dxd = estimateSizeFactors(dxd) >> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >> dxd = testForDEU(dxd, BPPARAM=BPPARAM) > > Which makes me think that DEXSeq wasn't doing what I hoped with the > previous analysis (quite apart from the error). Is there any way to > rejig the design to cope with a continuous variable as opposed to a > factor? I *think* the underlying biological question makes sense: Are > there exons showing DEU that correlates with time? > > Alex Gutteridge I'm just going to bump this question one last time as I'd love to know if there's a better solution to what I have now. In short: I have a reasonable detailed time course RNASeq experiment (8 timepoints; 2-14 weeks; 3-4 replicates at all time points except one) and would like to detect time dependent DEU using DEXSeq. Slicing the timecourse into two bins (at 8 weeks) and doing a straight early vs late comparison works and detects plenty of interesting stuff. Trying to model age as a continuous variable gives an error when determining fold changes and doesn't report any significant DEU. See first plot here for an example (exons 27/28 show strong DEU comparing <8 weeks to >8 week samples) shown with a DEXSeq plot: http://dx.doi.org/10.6084/m9.figshare.1124172 But the 8 week split is entirely arbitrary, really time (Age) is a continuous variable and so I'd like to model it as such (maybe this is just OCD on my behalf, but I'm sure I must also be loosing some power by crudely splitting the timecourse like this; also DESeq2 lets me model the genewise DE with a continuous variable). The second plot from the figshare link shows the RPKMs of those two exons (27/28). On top of a background of increased gene-wise expression there is clearly a time dependent effect on the relative usage of these two exons, which it must be possible to model better than a crude split at 8w (black dashed line). Is there any trick to do this with DEXSeq? Alex Gutteridge
modified 4.2 years ago by Alejandro Reyes1.6k • written 4.2 years ago by Alex Gutteridge650
1
4.2 years ago by
Alejandro Reyes1.6k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.6k wrote:
Hi Alex, Yes, I think that the testing using DEXSeq should also work with the continuous model! I had some discussions within the lab and there was a misunderstanding from my side with regards to the fold changes with continuous variables. Wolfgang (cc-ed) clarified to me that if f(x) is the expression level for condition x, the log fold change is then the derivate of f(x). As you mention this is how DESeq2 deals with it, but the current implementation for DEXSeq does not deal with this at the moment. I will work on this and keep you posted, in the meantime the breakdown that you mention is probably the best! Alejandro > Hi Alejandro, > > Thanks for the reply! For your first question I will have to get back > to you as I need to rerun DEXSeq with the continuous design and make > that comparison - it'll take a few hours. I assume from your reply > that the continuous variable *should* work in fact? > > For your second question I guess what I was expecting was to see the > fold change implied from the the model from a unit change in Age (in > this case 1 week). This is for instance how DESeq2 reports it. But I > agree when it comes to actually plotting then a factorial breakdown is > probably simplest. > > Alex Gutteridge > > On 01-08-2014 10:29, Alejandro Reyes wrote: >> Dear Alex Gutteridge, >> >> Thanks for your detailed e-mails and sorry for the delay in the reply. >> >> About the fact that you do get significant hits if you specify factors >> instead of your continuous variable, Is it a thresholding effect or >> the exons that you detect when specifying the factorial design have >> large p-values with the continuous design? >> >> About the exon fold changes error, this function does not support >> continuous variables. I am unsure how would a fold change be with a >> continuous variable? I would maybe partition the "Age" into more than >> 2 factors based on e.g. quantiles or simply make plots like the one >> you showed! >> >> Bests, >> Alejandro Reyes >> >> >> >>> On 30-07-2014 10:13, Alex Gutteridge wrote: >>>> On 29-07-2014 10:00, Alex Gutteridge wrote: >>>>> Hi, >>>>> >>>>> I have a time course RNASeq experiment and I'd like to detect DEU >>>>> between early and late stages. I am trying to use 'Age' as a >>>>> continuous variable in my design, but I'm getting an error which I >>>>> suspect is related to this e.g: >>>>> >>>>>> summary(sample.data.ipsc[,1:6]) >>>>> Sample CellType Subtype Donor Age >>>>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : 2.000 >>>>> Class :character HBR : 0 D4721 : 0 1st Qu.: 4.000 >>>>> Mode :character L2 : 0 D4749 : 0 Median : 8.000 >>>>> L3 : 0 D6002 : 0 Mean : 7.821 >>>>> L4 : 0 D6005 : 0 3rd Qu.:10.500 >>>>> L5 : 0 D6008 : 0 Max. :14.000 >>>>> (Other): 0 (Other): 0 >>>>> Passage >>>>> 42:3 >>>>> 48:7 >>>>> 49:5 >>>>> 52:6 >>>>> 56:7 >>>>> >>>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>>>> sampleData=sample.data.ipsc, >>>>> design= ~ sample + exon + Age:exon, >>>>> flattenedfile=flattenedFile) >>>>> >>>>>> BPPARAM = MulticoreParam(workers=24) >>>>> >>>>>> dxd = estimateSizeFactors(dxd) >>>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", >>>>>> BPPARAM=BPPARAM) >>>>> Error: 8346 errors; first error: >>>>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame >>>>> >>>>> For more information, use bplasterror(). To resume calculation, >>>>> re-call >>>>> the function and set the argument 'BPRESUME' to TRUE or wrap the >>>>> previous call in bpresume(). >>>>> >>>>> First traceback: >>>>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = >>>>> BPPARAM) >>>>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>>>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>>>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed, >>>>> mc.silent = !BPPARAM$verbose, mc.cores = >>>>> bpworkers(BPPARAM), >>>>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal >>>>> else FALSE) >>>>> 27: lapply(seq_len(cores), inner.do) >>>>> 26: lapply(seq_len(cores), inner.do) >>>>> 25: FUN(1:24[[1L]], ...) >>>>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) >>>>> 23: parallel:::sendMaster(...) >>>>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) >>>>> 21: tryCatch(expr, error = function(e) >>>>> >>>>> Can DEXSeq accept a model with a continuous variable? Does it make >>>>> sense to do so? (I do the same thing with DESeq2 to detected DE >>>>> and it >>>>> works fine). Is this error due to that? Note that all the other steps >>>>> seem to run fine and I can get results (though I don't have many >>>>> significant hits - not sure if this is related or not). If not >>>>> what is >>>>> best practice? Just split the data set into 'early' and 'late' >>>>> samples >>>>> and run that as a factor? >>>>> >>>>> Alex Gutteridge >>>> >>>> To sort of answer my own question: cutting the age vector into two >>>> bins (early <8 weeks and late >8 weeks) and using that factor seems to >>>> work and give many significant hits: >>>> >>>>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2) >>>> >>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>>> sampleData=sample.data.ipsc, >>>> design= ~ sample + exon + AgeBin:exon, >>>> flattenedfile=flattenedFile) >>>> >>>>> BPPARAM = MulticoreParam(workers=12) >>>> >>>>> dxd = estimateSizeFactors(dxd) >>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>> >>>> Which makes me think that DEXSeq wasn't doing what I hoped with the >>>> previous analysis (quite apart from the error). Is there any way to >>>> rejig the design to cope with a continuous variable as opposed to a >>>> factor? I *think* the underlying biological question makes sense: Are >>>> there exons showing DEU that correlates with time? >>>> >>>> Alex Gutteridge >>> >>> I'm just going to bump this question one last time as I'd love to know >>> if there's a better solution to what I have now. In short: I have a >>> reasonable detailed time course RNASeq experiment (8 timepoints; 2-14 >>> weeks; 3-4 replicates at all time points except one) and would like to >>> detect time dependent DEU using DEXSeq. Slicing the timecourse into two >>> bins (at 8 weeks) and doing a straight early vs late comparison works >>> and detects plenty of interesting stuff. Trying to model age as a >>> continuous variable gives an error when determining fold changes and >>> doesn't report any significant DEU. >>> >>> See first plot here for an example (exons 27/28 show strong DEU >>> comparing <8 weeks to >8 week samples) shown with a DEXSeq plot: >>> >>> http://dx.doi.org/10.6084/m9.figshare.1124172 >>> >>> But the 8 week split is entirely arbitrary, really time (Age) is a >>> continuous variable and so I'd like to model it as such (maybe this is >>> just OCD on my behalf, but I'm sure I must also be loosing some >>> power by >>> crudely splitting the timecourse like this; also DESeq2 lets me model >>> the genewise DE with a continuous variable). >>> >>> The second plot from the figshare link shows the RPKMs of those two >>> exons (27/28). On top of a background of increased gene-wise expression >>> there is clearly a time dependent effect on the relative usage of these >>> two exons, which it must be possible to model better than a crude split >>> at 8w (black dashed line). Is there any trick to do this with DEXSeq? >>> >>> Alex Gutteridge >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Alejandro, Just to confirm that your testing was right. I'm not sure what I did wrong the first time I ran the continuous model, but when I reran it over the weekend it worked much better. Apologies for the erroneous bug report! Nice to see my instinct was right and the continuous model gives much more power (see dxr.age.padj vs dxr1 results below - E27/E28 show DEU), this is a really cool result for us, so thank you! It would still be nice to get a log2fold column for the continuous model though, so I will keep an eye out for the next version. > subset(dxr.age.padj, groupID == "ENSG00000169432" & padj < 0.1) LRT p-value: full vs reduced DataFrame with 3 rows and 10 columns groupID featureID exonBaseMean dispersion <character> <character> <numeric> <numeric> ENSG00000169432:E027 ENSG00000169432 E027 169.71429 0.03685301 ENSG00000169432:E028 ENSG00000169432 E028 161.35714 0.03184911 ENSG00000169432:E019 ENSG00000169432 E019 64.89286 0.03007973 stat pvalue padj <numeric> <numeric> <numeric> ENSG00000169432:E027 54.62674 1.457392e-13 2.834817e-10 ENSG00000169432:E028 36.52308 1.508693e-09 6.930432e-07 ENSG00000169432:E019 8.76212 3.075513e-03 4.810069e-02 genomicData countData transcripts <granges> <matrix> <list> ENSG00000169432:E027 2:-:[167160541, 167160632] 411 494 432 ... ######## ENSG00000169432:E028 2:-:[167160748, 167160839] 144 222 126 ... ######## ENSG00000169432:E019 2:-:[167140963, 167140995] 86 151 96 ... ######## > subset(dxr1, groupID == "ENSG00000169432" & padj < 0.1) LRT p-value: full vs reduced DataFrame with 2 rows and 13 columns groupID featureID exonBaseMean dispersion <character> <character> <numeric> <numeric> ENSG00000169432:E027 ENSG00000169432 E027 169.7143 0.06699247 ENSG00000169432:E028 ENSG00000169432 E028 161.3571 0.04371863 stat pvalue padj X.1.99.8. X.8.14. <numeric> <numeric> <numeric> <numeric> <numeric> ENSG00000169432:E027 21.16153 4.221541e-06 0.001944887 18.08856 22.48066 ENSG00000169432:E028 20.71878 5.319188e-06 0.002257030 22.56749 18.23336 log2fold_.1.99.8._.8.14. genomicData <numeric> <granges> ENSG00000169432:E027 -0.3136070 2:-:[167160541, 167160632] ENSG00000169432:E028 0.3076653 2:-:[167160748, 167160839] countData transcripts <matrix> <list> ENSG00000169432:E027 411 494 432 ... ######## ENSG00000169432:E028 144 222 126 ... ######## On 03-08-2014 18:39, Alejandro Reyes wrote: > Hi Alex, > > Yes, I think that the testing using DEXSeq should also work with the > continuous model! > > I had some discussions within the lab and there was a misunderstanding > from my side with regards to the fold changes with continuous > variables. Wolfgang (cc-ed) clarified to me that if f(x) is the > expression level for condition x, the log fold change is then the > derivate of f(x). As you mention this is how DESeq2 deals with it, > but the current implementation for DEXSeq does not deal with this at > the moment. I will work on this and keep you posted, in the meantime > the breakdown that you mention is probably the best! > > Alejandro > >> Hi Alejandro, >> >> Thanks for the reply! For your first question I will have to get back >> to you as I need to rerun DEXSeq with the continuous design and make >> that comparison - it'll take a few hours. I assume from your reply >> that the continuous variable *should* work in fact? >> >> For your second question I guess what I was expecting was to see the >> fold change implied from the the model from a unit change in Age (in >> this case 1 week). This is for instance how DESeq2 reports it. But I >> agree when it comes to actually plotting then a factorial breakdown is >> probably simplest. >> >> Alex Gutteridge >> >> On 01-08-2014 10:29, Alejandro Reyes wrote: >>> Dear Alex Gutteridge, >>> >>> Thanks for your detailed e-mails and sorry for the delay in the >>> reply. >>> >>> About the fact that you do get significant hits if you specify >>> factors >>> instead of your continuous variable, Is it a thresholding effect or >>> the exons that you detect when specifying the factorial design have >>> large p-values with the continuous design? >>> >>> About the exon fold changes error, this function does not support >>> continuous variables. I am unsure how would a fold change be with a >>> continuous variable? I would maybe partition the "Age" into more than >>> 2 factors based on e.g. quantiles or simply make plots like the one >>> you showed! >>> >>> Bests, >>> Alejandro Reyes >>> >>> >>> >>>> On 30-07-2014 10:13, Alex Gutteridge wrote: >>>>> On 29-07-2014 10:00, Alex Gutteridge wrote: >>>>>> Hi, >>>>>> >>>>>> I have a time course RNASeq experiment and I'd like to detect DEU >>>>>> between early and late stages. I am trying to use 'Age' as a >>>>>> continuous variable in my design, but I'm getting an error which I >>>>>> suspect is related to this e.g: >>>>>> >>>>>>> summary(sample.data.ipsc[,1:6]) >>>>>> Sample CellType Subtype Donor Age >>>>>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : >>>>>> 2.000 >>>>>> Class :character HBR : 0 D4721 : 0 1st Qu.: >>>>>> 4.000 >>>>>> Mode :character L2 : 0 D4749 : 0 Median : >>>>>> 8.000 >>>>>> L3 : 0 D6002 : 0 Mean : >>>>>> 7.821 >>>>>> L4 : 0 D6005 : 0 3rd >>>>>> Qu.:10.500 >>>>>> L5 : 0 D6008 : 0 Max. >>>>>> :14.000 >>>>>> (Other): 0 (Other): 0 >>>>>> Passage >>>>>> 42:3 >>>>>> 48:7 >>>>>> 49:5 >>>>>> 52:6 >>>>>> 56:7 >>>>>> >>>>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>>>>> sampleData=sample.data.ipsc, >>>>>> design= ~ sample + exon + Age:exon, >>>>>> flattenedfile=flattenedFile) >>>>>> >>>>>>> BPPARAM = MulticoreParam(workers=24) >>>>>> >>>>>>> dxd = estimateSizeFactors(dxd) >>>>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>>>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", >>>>>>> BPPARAM=BPPARAM) >>>>>> Error: 8346 errors; first error: >>>>>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame >>>>>> >>>>>> For more information, use bplasterror(). To resume calculation, >>>>>> re-call >>>>>> the function and set the argument 'BPRESUME' to TRUE or wrap the >>>>>> previous call in bpresume(). >>>>>> >>>>>> First traceback: >>>>>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = >>>>>> BPPARAM) >>>>>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>>>>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>>>>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = >>>>>> BPPARAM$setSeed, >>>>>> mc.silent = !BPPARAM$verbose, mc.cores = >>>>>> bpworkers(BPPARAM), >>>>>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal >>>>>> else FALSE) >>>>>> 27: lapply(seq_len(cores), inner.do) >>>>>> 26: lapply(seq_len(cores), inner.do) >>>>>> 25: FUN(1:24[[1L]], ...) >>>>>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = >>>>>> TRUE)) >>>>>> 23: parallel:::sendMaster(...) >>>>>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) >>>>>> 21: tryCatch(expr, error = function(e) >>>>>> >>>>>> Can DEXSeq accept a model with a continuous variable? Does it make >>>>>> sense to do so? (I do the same thing with DESeq2 to detected DE >>>>>> and it >>>>>> works fine). Is this error due to that? Note that all the other >>>>>> steps >>>>>> seem to run fine and I can get results (though I don't have many >>>>>> significant hits - not sure if this is related or not). If not >>>>>> what is >>>>>> best practice? Just split the data set into 'early' and 'late' >>>>>> samples >>>>>> and run that as a factor? >>>>>> >>>>>> Alex Gutteridge >>>>> >>>>> To sort of answer my own question: cutting the age vector into two >>>>> bins (early <8 weeks and late >8 weeks) and using that factor seems >>>>> to >>>>> work and give many significant hits: >>>>> >>>>>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2) >>>>> >>>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>>>> sampleData=sample.data.ipsc, >>>>> design= ~ sample + exon + >>>>> AgeBin:exon, >>>>> flattenedfile=flattenedFile) >>>>> >>>>>> BPPARAM = MulticoreParam(workers=12) >>>>> >>>>>> dxd = estimateSizeFactors(dxd) >>>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>>> >>>>> Which makes me think that DEXSeq wasn't doing what I hoped with the >>>>> previous analysis (quite apart from the error). Is there any way to >>>>> rejig the design to cope with a continuous variable as opposed to a >>>>> factor? I *think* the underlying biological question makes sense: >>>>> Are >>>>> there exons showing DEU that correlates with time? >>>>> >>>>> Alex Gutteridge >>>> >>>> I'm just going to bump this question one last time as I'd love to >>>> know >>>> if there's a better solution to what I have now. In short: I have a >>>> reasonable detailed time course RNASeq experiment (8 timepoints; >>>> 2-14 >>>> weeks; 3-4 replicates at all time points except one) and would like >>>> to >>>> detect time dependent DEU using DEXSeq. Slicing the timecourse into >>>> two >>>> bins (at 8 weeks) and doing a straight early vs late comparison >>>> works >>>> and detects plenty of interesting stuff. Trying to model age as a >>>> continuous variable gives an error when determining fold changes and >>>> doesn't report any significant DEU. >>>> >>>> See first plot here for an example (exons 27/28 show strong DEU >>>> comparing <8 weeks to >8 week samples) shown with a DEXSeq plot: >>>> >>>> http://dx.doi.org/10.6084/m9.figshare.1124172 >>>> >>>> But the 8 week split is entirely arbitrary, really time (Age) is a >>>> continuous variable and so I'd like to model it as such (maybe this >>>> is >>>> just OCD on my behalf, but I'm sure I must also be loosing some >>>> power by >>>> crudely splitting the timecourse like this; also DESeq2 lets me >>>> model >>>> the genewise DE with a continuous variable). >>>> >>>> The second plot from the figshare link shows the RPKMs of those two >>>> exons (27/28). On top of a background of increased gene-wise >>>> expression >>>> there is clearly a time dependent effect on the relative usage of >>>> these >>>> two exons, which it must be possible to model better than a crude >>>> split >>>> at 8w (black dashed line). Is there any trick to do this with >>>> DEXSeq? >>>> >>>> Alex Gutteridge >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
0
4.2 years ago by
Devon Ryan200
Germany
Devon Ryan200 wrote:
Hi Alex, It's best to CC the package maintainer with questions such as these (I've done so on this reply). Keep in mind that it's the height of vacation season, so I wouldn't be overly surprised if it takes longer to get a reply. Best, Devon -- Devon Ryan, Ph.D. Email: dpryan@dpryan.com Laboratory for Molecular and Cellular Cognition German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn Germany <devon.ryan@dzne.de> On Fri, Aug 1, 2014 at 9:47 AM, Alex Gutteridge <alexg@ruggedtextile.com> wrote: > On 30-07-2014 10:13, Alex Gutteridge wrote: > >> On 29-07-2014 10:00, Alex Gutteridge wrote: >> >>> Hi, >>> >>> I have a time course RNASeq experiment and I'd like to detect DEU >>> between early and late stages. I am trying to use 'Age' as a >>> continuous variable in my design, but I'm getting an error which I >>> suspect is related to this e.g: >>> >>> summary(sample.data.ipsc[,1:6]) >>>> >>> Sample CellType Subtype Donor Age >>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : 2.000 >>> Class :character HBR : 0 D4721 : 0 1st Qu.: 4.000 >>> Mode :character L2 : 0 D4749 : 0 Median : 8.000 >>> L3 : 0 D6002 : 0 Mean : 7.821 >>> L4 : 0 D6005 : 0 3rd Qu.:10.500 >>> L5 : 0 D6008 : 0 Max. :14.000 >>> (Other): 0 (Other): 0 >>> Passage >>> 42:3 >>> 48:7 >>> 49:5 >>> 52:6 >>> 56:7 >>> >>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>>> >>> sampleData=sample.data.ipsc, >>> design= ~ sample + exon + Age:exon, >>> flattenedfile=flattenedFile) >>> >>> BPPARAM = MulticoreParam(workers=24) >>>> >>> >>> dxd = estimateSizeFactors(dxd) >>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", BPPARAM=BPPARAM) >>>> >>> Error: 8346 errors; first error: >>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame >>> >>> For more information, use bplasterror(). To resume calculation, re-call >>> the function and set the argument 'BPRESUME' to TRUE or wrap the >>> previous call in bpresume(). >>> >>> First traceback: >>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = >>> BPPARAM) >>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed, >>> mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM), >>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal else >>> FALSE) >>> 27: lapply(seq_len(cores), inner.do) >>> 26: lapply(seq_len(cores), inner.do) >>> 25: FUN(1:24[[1L]], ...) >>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) >>> 23: parallel:::sendMaster(...) >>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) >>> 21: tryCatch(expr, error = function(e) >>> >>> Can DEXSeq accept a model with a continuous variable? Does it make >>> sense to do so? (I do the same thing with DESeq2 to detected DE and it >>> works fine). Is this error due to that? Note that all the other steps >>> seem to run fine and I can get results (though I don't have many >>> significant hits - not sure if this is related or not). If not what is >>> best practice? Just split the data set into 'early' and 'late' samples >>> and run that as a factor? >>> >>> Alex Gutteridge >>> >> >> To sort of answer my own question: cutting the age vector into two >> bins (early <8 weeks and late >8 weeks) and using that factor seems to >> work and give many significant hits: >> >> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2) >>> >> >> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>> >> sampleData=sample.data.ipsc, >> design= ~ sample + exon + AgeBin:exon, >> flattenedfile=flattenedFile) >> >> BPPARAM = MulticoreParam(workers=12) >>> >> >> dxd = estimateSizeFactors(dxd) >>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>> >> >> Which makes me think that DEXSeq wasn't doing what I hoped with the >> previous analysis (quite apart from the error). Is there any way to >> rejig the design to cope with a continuous variable as opposed to a >> factor? I *think* the underlying biological question makes sense: Are >> there exons showing DEU that correlates with time? >> >> Alex Gutteridge >> > > I'm just going to bump this question one last time as I'd love to know if > there's a better solution to what I have now. In short: I have a reasonable > detailed time course RNASeq experiment (8 timepoints; 2-14 weeks; 3-4 > replicates at all time points except one) and would like to detect time > dependent DEU using DEXSeq. Slicing the timecourse into two bins (at 8 > weeks) and doing a straight early vs late comparison works and detects > plenty of interesting stuff. Trying to model age as a continuous variable > gives an error when determining fold changes and doesn't report any > significant DEU. > > See first plot here for an example (exons 27/28 show strong DEU comparing > <8 weeks to >8 week samples) shown with a DEXSeq plot: > > http://dx.doi.org/10.6084/m9.figshare.1124172 > > But the 8 week split is entirely arbitrary, really time (Age) is a > continuous variable and so I'd like to model it as such (maybe this is just > OCD on my behalf, but I'm sure I must also be loosing some power by crudely > splitting the timecourse like this; also DESeq2 lets me model the genewise > DE with a continuous variable). > > The second plot from the figshare link shows the RPKMs of those two exons > (27/28). On top of a background of increased gene-wise expression there is > clearly a time dependent effect on the relative usage of these two exons, > which it must be possible to model better than a crude split at 8w (black > dashed line). Is there any trick to do this with DEXSeq? > > > Alex Gutteridge > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane. > science.biology.informatics.conductor > [[alternative HTML version deleted]]
0
4.2 years ago by
Alejandro Reyes1.6k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.6k wrote:
Dear Alex Gutteridge, Thanks for your detailed e-mails and sorry for the delay in the reply. About the fact that you do get significant hits if you specify factors instead of your continuous variable, Is it a thresholding effect or the exons that you detect when specifying the factorial design have large p-values with the continuous design? About the exon fold changes error, this function does not support continuous variables. I am unsure how would a fold change be with a continuous variable? I would maybe partition the "Age" into more than 2 factors based on e.g. quantiles or simply make plots like the one you showed! Bests, Alejandro Reyes > On 30-07-2014 10:13, Alex Gutteridge wrote: >> On 29-07-2014 10:00, Alex Gutteridge wrote: >>> Hi, >>> >>> I have a time course RNASeq experiment and I'd like to detect DEU >>> between early and late stages. I am trying to use 'Age' as a >>> continuous variable in my design, but I'm getting an error which I >>> suspect is related to this e.g: >>> >>>> summary(sample.data.ipsc[,1:6]) >>> Sample CellType Subtype Donor Age >>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : 2.000 >>> Class :character HBR : 0 D4721 : 0 1st Qu.: 4.000 >>> Mode :character L2 : 0 D4749 : 0 Median : 8.000 >>> L3 : 0 D6002 : 0 Mean : 7.821 >>> L4 : 0 D6005 : 0 3rd Qu.:10.500 >>> L5 : 0 D6008 : 0 Max. :14.000 >>> (Other): 0 (Other): 0 >>> Passage >>> 42:3 >>> 48:7 >>> 49:5 >>> 52:6 >>> 56:7 >>> >>>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >>> sampleData=sample.data.ipsc, >>> design= ~ sample + exon + Age:exon, >>> flattenedfile=flattenedFile) >>> >>>> BPPARAM = MulticoreParam(workers=24) >>> >>>> dxd = estimateSizeFactors(dxd) >>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", BPPARAM=BPPARAM) >>> Error: 8346 errors; first error: >>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame >>> >>> For more information, use bplasterror(). To resume calculation, re-call >>> the function and set the argument 'BPRESUME' to TRUE or wrap the >>> previous call in bpresume(). >>> >>> First traceback: >>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = >>> BPPARAM) >>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM) >>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed, >>> mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM), >>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal >>> else FALSE) >>> 27: lapply(seq_len(cores), inner.do) >>> 26: lapply(seq_len(cores), inner.do) >>> 25: FUN(1:24[[1L]], ...) >>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) >>> 23: parallel:::sendMaster(...) >>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) >>> 21: tryCatch(expr, error = function(e) >>> >>> Can DEXSeq accept a model with a continuous variable? Does it make >>> sense to do so? (I do the same thing with DESeq2 to detected DE and it >>> works fine). Is this error due to that? Note that all the other steps >>> seem to run fine and I can get results (though I don't have many >>> significant hits - not sure if this is related or not). If not what is >>> best practice? Just split the data set into 'early' and 'late' samples >>> and run that as a factor? >>> >>> Alex Gutteridge >> >> To sort of answer my own question: cutting the age vector into two >> bins (early <8 weeks and late >8 weeks) and using that factor seems to >> work and give many significant hits: >> >>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2) >> >>> dxd = DEXSeqDataSetFromHTSeq(countFiles, >> sampleData=sample.data.ipsc, >> design= ~ sample + exon + AgeBin:exon, >> flattenedfile=flattenedFile) >> >>> BPPARAM = MulticoreParam(workers=12) >> >>> dxd = estimateSizeFactors(dxd) >>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM) >>> dxd = testForDEU(dxd, BPPARAM=BPPARAM) >> >> Which makes me think that DEXSeq wasn't doing what I hoped with the >> previous analysis (quite apart from the error). Is there any way to >> rejig the design to cope with a continuous variable as opposed to a >> factor? I *think* the underlying biological question makes sense: Are >> there exons showing DEU that correlates with time? >> >> Alex Gutteridge > > I'm just going to bump this question one last time as I'd love to know > if there's a better solution to what I have now. In short: I have a > reasonable detailed time course RNASeq experiment (8 timepoints; 2-14 > weeks; 3-4 replicates at all time points except one) and would like to > detect time dependent DEU using DEXSeq. Slicing the timecourse into two > bins (at 8 weeks) and doing a straight early vs late comparison works > and detects plenty of interesting stuff. Trying to model age as a > continuous variable gives an error when determining fold changes and > doesn't report any significant DEU. > > See first plot here for an example (exons 27/28 show strong DEU > comparing <8 weeks to >8 week samples) shown with a DEXSeq plot: > > http://dx.doi.org/10.6084/m9.figshare.1124172 > > But the 8 week split is entirely arbitrary, really time (Age) is a > continuous variable and so I'd like to model it as such (maybe this is > just OCD on my behalf, but I'm sure I must also be loosing some power by > crudely splitting the timecourse like this; also DESeq2 lets me model > the genewise DE with a continuous variable). > > The second plot from the figshare link shows the RPKMs of those two > exons (27/28). On top of a background of increased gene-wise expression > there is clearly a time dependent effect on the relative usage of these > two exons, which it must be possible to model better than a crude split > at 8w (black dashed line). Is there any trick to do this with DEXSeq? > > Alex Gutteridge > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor