Question: DESeq2: when the variable of interest is not a factor
Marie Sémon wrote (Wed, Sep 11, 2013, 11:45 AM):
Dear all,

I wish to find genes whose expression profile correlates with a quantitative variable of interest (say, the weight of the individuals). Is it possible, and good practice, to use DESeq2 for this?

On one hand, the vignette clearly describes factor (qualitative) variables; on the other hand, there is this sentence on p. 8: "If the variable of interest is not a factor, the log2 fold change can be interpreted as the amount of doubling observed on average for every unit of change."

Many thanks for your help.

With kind regards,
Marie
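To make the quoted vignette sentence concrete: with a continuous covariate, the log2 fold change acts as a slope on the log2 scale, so two samples whose covariate differs by Δ units are expected to differ in mean expression by a factor of 2^(β·Δ). The minimal Python sketch below shows only that arithmetic; the function names and the coefficient value are hypothetical and are not DESeq2 output.

```python
def expected_fold_change(log2_fc_per_unit: float, delta: float) -> float:
    """Expected multiplicative change in mean expression for a covariate change of `delta` units."""
    # log2(mean) is modelled as linear in the covariate, so effects add on the log2 scale
    return 2.0 ** (log2_fc_per_unit * delta)


def rescale_coefficient(log2_fc_per_unit: float, new_units_per_old_unit: float) -> float:
    """Coefficient after re-expressing the covariate in smaller units (kg -> g: 1000)."""
    # The same effect is spread over more units, so the per-unit slope shrinks
    return log2_fc_per_unit / new_units_per_old_unit


beta_per_kg = 0.1  # hypothetical coefficient: expression doubles for every 10 kg
print(expected_fold_change(beta_per_kg, 10))     # samples 10 kg apart
print(expected_fold_change(beta_per_kg, -10))    # a sample 10 kg lighter
beta_per_g = rescale_coefficient(beta_per_kg, 1000)
print(expected_fold_change(beta_per_g, 10_000))  # the same 10 kg difference, in grams
```

The fold change between two given individuals does not depend on the units chosen; only the per-unit coefficient does, which is worth keeping in mind when reading results for a continuous variable.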
Michael Love wrote:
hi Marie,

Yes, you can use quantitative/continuous-valued variables in the design formula just as you would use a factor variable. The statistical test is then equivalent to testing the null hypothesis that the coefficient (log2 fold change) associated with the variable is equal to 0.

Mike
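As a rough illustration of the test described above: DESeq2's default Wald test divides the estimated coefficient by its standard error and compares that ratio with a standard normal distribution. The sketch below shows only this final step, under that assumption; the function name is hypothetical, and the coefficient and standard error are made-up numbers rather than the result of a negative binomial fit.

```python
from statistics import NormalDist


def wald_p_value(log2_fc: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient == 0, using a standard normal reference."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = log2_fc / se
    # 2 * P(Z <= -|z|) is numerically safer than 2 * (1 - P(Z <= |z|)) for large |z|
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical: 0.5 log2 units per unit of the covariate, standard error 0.25 (z = 2)
print(wald_p_value(0.5, 0.25))
```

The sign of the coefficient gives the direction of the association; the p-value only says whether the slope is distinguishable from 0.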
Jose M Garcia Manteiga wrote (Wed, Sep 11, 2013, 1:31 PM):
Dear all,

I'd like to take this opportunity to add another question for the statisticians, since I have often had the same question in mind, for microarrays as well as for RNA-Seq. What about a PLS (partial least squares, or projection to latent structures) type of multivariate regression to find groups of variables (latent variables) that correlate with a continuous variable? Is there any caveat in using it with a read-count table? There are some papers that use PLS, and its "factor" version PLS-DA, to select variables from transcriptomics data, but I would like to hear the opinions of people in the field.

Thanks in advance.

Regards,
Jose
------------------------------------------------------------
Jose M. Garcia Manteiga PhD
Data analyst in Functional Genomics
Center for Translational Genomics and Bioinformatics
DIBIT2-A3 Room 21
San Raffaele Scientific Institute
Via Olgettina 58
20132 Milano
Italy
Office: +39 02 26439114
Michael Love wrote (Sep 11, 2013, 2:14 PM):

hi Jose,

As for using other techniques on count data, we recommend first applying the variance stabilizing transformation, or the rlog transformation (if the sequencing depth varies greatly between samples), to the count table (for more details, see the manual pages for varianceStabilizingTransformation or rlogTransformation, or the DESeq2 vignette). The transformed data, which have more constant variance throughout the dynamic range, are more appropriate for downstream methods that assume independent and identically distributed errors.

Mike
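The idea behind a variance stabilizing transformation can be sketched for the simplest case. Assume counts follow a negative binomial distribution with mean μ and a single, known dispersion α, so the variance is μ + αμ². Then g(μ) = 2·asinh(√(αμ))/√α has derivative 1/√(μ + αμ²), and by the delta method the transformed values have variance close to 1 whatever the mean. This is a simplified stand-in with hypothetical names and a made-up dispersion; it is not DESeq2's varianceStabilizingTransformation (which fits the dispersion as a function of the mean) and not the rlog.

```python
import math


def nb_variance(mu: float, alpha: float) -> float:
    """Negative binomial variance for mean mu and dispersion alpha."""
    return mu + alpha * mu * mu


def nb_vst(x: float, alpha: float) -> float:
    """VST for NB counts with one known dispersion alpha > 0 (simplified, not DESeq2's)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Derivative of this function is 1 / sqrt(nb_variance(x, alpha))
    return 2.0 * math.asinh(math.sqrt(alpha * x)) / math.sqrt(alpha)


def approx_transformed_variance(mu: float, alpha: float, h: float = 1e-4) -> float:
    """Delta-method variance of nb_vst(X) for X with mean mu: g'(mu)^2 * Var[X]."""
    slope = (nb_vst(mu + h, alpha) - nb_vst(mu - h, alpha)) / (2 * h)  # central difference
    return slope ** 2 * nb_variance(mu, alpha)


alpha = 0.1  # hypothetical dispersion
for mu in (5.0, 50.0, 500.0, 5000.0):
    # Raw variance grows with the mean; the transformed variance stays close to 1
    print(mu, nb_variance(mu, alpha), round(approx_transformed_variance(mu, alpha), 4))
```

Without such a transformation, the highest-count genes would dominate variance-driven methods such as PCA or PLS simply because their raw variance is larger.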
Jose M Garcia Manteiga wrote:

Thanks Michael,

I guessed as much about the transformation, because that is what the vignette states for PCA, and it makes perfect sense. Thank you.

I also read your reply saying that you can add a continuous variable to the design formula. If the p-value is then associated with the hypothesis that the coefficient is different from 0, how does the coefficient inform you about the 'slope' of the correlation? How would it relate to an r-like correlation coefficient, for instance? I guess what we want is a parameter that measures, for each gene, the strength of the correlation (like r, from -1 to 1), and then some measure of the 'goodness' of the correlation. Could these be the coefficient and the p-value in DESeq2?

Thanks again,
Jose