DESeq2 : when the variable of interest is not a factor

Marie Sémon ▴ 50

@marie-semon-5275

Last seen 9.6 years ago

Dear all,

I wish to find genes whose expression profile correlates with a quantitative variable of interest (say, the weight of the individuals). Is it possible, and good practice, to use DESeq2 for this?

On the one hand, the vignette clearly describes factor (qualitative) variables; on the other hand, there is this sentence on p. 8: "If the variable of interest is not a factor, the log2 fold change can be interpreted as the amount of doubling observed on average for every unit of change."

Many thanks for your help.

With kind regards,
Marie

DESeq2 DESeq2 • 2.9k views

ADD COMMENT • link updated 10.6 years ago by Jose M Garcia Manteiga ▴ 310 • written 10.6 years ago by Marie Sémon ▴ 50

Michael Love 41k

@mikelove

Last seen 2 hours ago

United States

Hi Marie,

Yes, you can use quantitative (continuous-valued) variables in the design formula just as you would use a factor variable. The statistical test is then a test of the null hypothesis that the coefficient (log2 fold change) associated with the variable is equal to 0.

Mike
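To make the sentence quoted from the vignette concrete, here is a minimal Python sketch (not DESeq2 code) of what a log2 fold change "per unit of change" means for a continuous covariate. It assumes a log2-linear model for the expected count; the intercept, the coefficient and the weights are made-up values chosen only for illustration. In DESeq2 itself the covariate would simply appear in the design formula, for example `~ weight`, where `weight` is a hypothetical numeric column of the sample table.

```python
# Toy illustration only: hypothetical numbers, not DESeq2 output.

def expected_count(intercept, lfc_per_unit, x):
    """Expected count under a log2-linear model: 2 ** (intercept + lfc_per_unit * x)."""
    return 2.0 ** (intercept + lfc_per_unit * x)

intercept = 3.0      # hypothetical log2 expected count at weight 0
lfc_per_unit = 0.5   # hypothetical log2 fold change per unit of weight

mu_at_10 = expected_count(intercept, lfc_per_unit, 10)
mu_at_11 = expected_count(intercept, lfc_per_unit, 11)
mu_at_12 = expected_count(intercept, lfc_per_unit, 12)

# Each extra unit multiplies the expected count by 2 ** lfc_per_unit.
fold_one_unit = mu_at_11 / mu_at_10
fold_two_units = mu_at_12 / mu_at_10

print(f"fold change for +1 unit:  {fold_one_unit:.3f}")
print(f"fold change for +2 units: {fold_two_units:.3f}")
```

With a coefficient of 0.5, the expected count doubles for every two units of the covariate, which is the "amount of doubling per unit" reading of the coefficient. Note that the size of the coefficient depends on the units in which the covariate is recorded.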

ADD COMMENT • link 10.6 years ago Michael Love 41k

Jose M Garcia Manteiga ▴ 310

@jose-m-garcia-manteiga-6046

Last seen 7.1 years ago

Italy

Dear all,

I would like to take this opportunity to put a related question to the statisticians, since I have often had it in mind for microarray as well as RNA-Seq data. What about a PLS (partial least squares, or projection to latent structures) multivariate regression to find groups of variables (latent variables) that correlate with a continuous variable? Are there any caveats in using it with a read-count table? Some papers use PLS, and its "factor" version PLS-DA, to select variables from transcriptomics data, but I would like to know the opinions of people in the field.

Thanks in advance.

Regards,
Jose

ADD COMMENT • link 10.6 years ago Jose M Garcia Manteiga ▴ 310

Hi Jose,

For other techniques applied to count data, we recommend first applying the variance stabilizing transformation, or the rlog transformation if the sequencing depth varies greatly between samples, to the count table (for details see the manual pages for varianceStabilizingTransformation or rlogTransformation, or the DESeq2 vignette). The transformed data, with more constant variance throughout the dynamic range, are more appropriate for downstream methods that assume independent and identically distributed errors.

Mike
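To show what "more constant variance throughout the dynamic range" means, here is a toy Python sketch. It is not DESeq2's transformation: it assumes pure Poisson counts, for which the square root is the classical variance-stabilizing transform, whereas DESeq2 fits its transformations to the data's own mean-dispersion relationship. The three mean levels and the helper names are illustrative choices.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k, computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def variance_under_poisson(lam, transform):
    """Variance of transform(X) for X ~ Poisson(lam), by direct summation."""
    upper = int(lam + 12 * math.sqrt(lam) + 20)  # far enough into the upper tail
    ks = range(upper + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    values = [transform(k) for k in ks]
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))

means = [10, 100, 1000]  # low, medium and high expression (hypothetical)
raw_var = {lam: variance_under_poisson(lam, float) for lam in means}
sqrt_var = {lam: variance_under_poisson(lam, math.sqrt) for lam in means}

for lam in means:
    print(f"mean {lam:>5}: var(count) = {raw_var[lam]:8.2f}, "
          f"var(sqrt(count)) = {sqrt_var[lam]:.3f}")
```

On the raw scale the variance grows with the mean, so highly expressed genes would dominate any method that treats all values as equally noisy. After the transform the variance stays close to 0.25 at every mean level, which is the property that methods assuming identically distributed errors rely on.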

ADD REPLY • link 10.6 years ago Michael Love 41k

Thanks Michael,

I guessed as much, because that is what the vignette states for PCA, and it makes perfect sense. Thank you.

I read your reply saying that a continuous variable can be added to the design formula. If the p-value is then associated with the hypothesis that the coefficient is different from 0, how would the coefficient inform you of the 'slope' of the correlation? How would it relate to an 'r'-like regression coefficient, for instance? I suppose what we want is a parameter that measures, for each gene, the strength of the correlation (like r, from -1 to 1), plus some kind of 'goodness' of the correlation. Could these be the coefficient and p-value in DESeq2?

Thanks again,
Jose

ADD REPLY • link 10.6 years ago Jose M Garcia Manteiga ▴ 310

Hi Jose,

There is no clear analogue of correlation for real-valued data in the framework of count-based GLMs. Positive coefficients beta (log2 fold changes), as given in the GLM formula in the vignette, indicate multiplicative increases in expected counts, and negative coefficients indicate multiplicative decreases. It would not make sense for these effect sizes to lie on a bounded range like [-1, 1].

If you want to do other kinds of statistical analysis on the counts, we offer the transformations as a recommended first step. After transformation the data are on the log2 scale of counts.

Mike
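The difference between a slope-like effect size and a bounded correlation can be sketched in plain Python. This is a toy example on made-up values that stand in for one gene's transformed (log2-scale) expression; it uses ordinary least squares and a Pearson correlation, not the negative binomial GLM that DESeq2 fits, and `slope_and_r` is a hypothetical helper.

```python
import math

def slope_and_r(x, y):
    """Least-squares slope of y on x, and the Pearson correlation of x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data: weights of six individuals and one gene's log2-scale expression.
weight_kg = [55, 60, 65, 70, 75, 80]
log2_expr = [6.1, 6.4, 7.1, 7.3, 8.0, 8.2]
weight_g = [w * 1000 for w in weight_kg]  # same weights, different units

slope_kg, r_kg = slope_and_r(weight_kg, log2_expr)
slope_g, r_g = slope_and_r(weight_g, log2_expr)

print(f"slope per kg: {slope_kg:.5f}   r: {r_kg:.3f}")
print(f"slope per g:  {slope_g:.8f}   r: {r_g:.3f}")
```

The slope (a log2 change per unit) shrinks by a factor of 1000 when the weights are expressed in grams, so it has no natural bounded range, while the correlation is unchanged and always lies in [-1, 1]. The two quantities answer different questions: the slope says how much expression changes per unit, the correlation says how tightly the points follow a line.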

ADD REPLY • link 10.6 years ago Michael Love 41k

Try looking at independent component analysis, which is like PCA of qualitative data. In addition, multivariate methods of the MDS type, covering many measurement scales (nominal, continuous, etc.), were developed in depth by Jan de Leeuw and colleagues many years ago. LISREL, a software package for structural equation modeling (SEM), is yet another approach, and it may have been extended to cover hidden constructs for categorical factors.

ADD REPLY • link 10.6 years ago lep ▴ 90
