Question: RNA-seq differentially expressed gene finding methods
Son Pham wrote, 5.0 years ago:
Dear all,

I know that we have very good packages (edgeR, DESeq) that calculate the list of differentially expressed genes between two conditions (with replicates) from raw counts. But I do not know what is wrong with the following simple approach (and whether other people have been using it):

1. Get the (estimated) TPM/FPKM for each gene in each sample.
2. Do a t-test between the two groups on each gene.
3. Adjust the p-values for multiple testing (p-adj).

Thanks,
Son.
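For concreteness, the three steps in the question could be sketched as below. This is only an illustration of the proposed approach, not an endorsement of it: the gene names and TPM values are made up, "a t-test" is read here as the classic pooled-variance two-sample test, and the p-value is obtained by numerically integrating the t density with the standard library instead of calling a statistics package.

```python
import math

def student_t_statistic(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df).
    Assumes the pooled variance is non-zero (true for the toy data below)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t|,
    using Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Step 1: hypothetical TPM values, (group 1 replicates, group 2 replicates).
tpm = {
    "geneA": ([5.1, 4.8, 5.5, 5.0], [9.7, 10.4, 9.9, 10.8]),
    "geneB": ([2.0, 2.4, 1.9, 2.2], [2.1, 2.3, 2.0, 2.2]),
    "geneC": ([30.0, 41.0, 28.0, 35.0], [33.0, 29.0, 40.0, 36.0]),
}
genes = list(tpm)
# Step 2: one t-test per gene.
pvals = [two_sided_p(*student_t_statistic(*tpm[g])) for g in genes]
# Step 3: multiple-testing adjustment.
padj = bh_adjust(pvals)
for g, p, q in zip(genes, pvals, padj):
    print(f"{g}: p={p:.4g}, p-adj={q:.4g}")
```

Note that each gene is tested using only its own handful of values, which is the point the answers in this thread take issue with.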
Answer: RNA-seq differentially expressed gene finding methods
Son Pham wrote, 5.0 years ago:
Thank you Richard, Devon and Paul for the very insightful answers.

I completely agree that the approach I raised above is inappropriate when the group size is small (3, 4, ...). But when the group size is large enough (> 20 or 30), the sampling distribution of the mean will be close to normal, and that is why I believe that the t-test is OK.

-Son.

(In reply to Paul Geeleher's message of Fri, Sep 5, 2014, 10:05 AM.)
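The large-sample argument above can be checked with a small simulation. The sketch below assumes, purely for illustration, that a gene's values follow a skewed exponential distribution (not a claim about real expression data) and estimates how skewed the distribution of the group mean still is for group sizes 3 and 30. It speaks only to the normality of the mean, not to any other assumption of the t-test.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def skew_of_sample_means(n, reps=20000, seed=0):
    """Skewness of the mean of n exponential draws, estimated over reps repeats."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

for n in (3, 30):
    # For exponential data the theoretical skewness of the mean is 2 / sqrt(n).
    print(n, round(skew_of_sample_means(n), 3), round(2 / math.sqrt(n), 3))
```

The skewness of the mean shrinks like 1/sqrt(n), so it is much closer to zero at n = 30 than at n = 3, although it does not vanish.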
Wolfgang Huber replied, 5.0 years ago:

Son,

of course you are right. Here's an excerpt from our 2010 Genome Biology paper:

Conclusions

Why is it necessary to develop new statistical methodology for sequence count data? If large numbers of replicates were available, questions of data distribution could be avoided by using non-parametric methods, such as rank-based or permutation tests. However, it is desirable (and possible) to consider experiments with smaller numbers of replicates per condition. In order to compare an observed difference with an expected random variation, we can improve our picture of the latter in two ways: first, we can use distribution families, such as normal, Poisson and negative binomial distributions, in order to determine the higher moments, and hence the tail behavior, of statistics for differential expression, based on observed low order moments such as mean and variance. Second, we can share information, for instance, distributional parameters, between genes, based on the notion that data from different genes follow similar patterns of variability. Here, we have described an instance of such an approach, ...

Btw, the t-test can be perfectly "valid" even if the data are non-normal, in particular when they are fatter-tailed. The test then just loses power, sometimes badly so. I find it odd that so many people worry about that so much. Correlations between samples (e.g. "batch effects") are much more problematic.

Best wishes
Wolfgang

(In reply to Son Pham's message of 05 Sep 2014, 19:31.)
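The remark about permutation tests needing many replicates can be made concrete. The sketch below is a minimal exact two-sample permutation test on the absolute difference in means, written with the standard library only; the function name and the toy values are illustrative assumptions, not code from any package mentioned in the thread.

```python
from itertools import combinations
from math import comb

def permutation_p(a, b):
    """Exact two-sided permutation p-value for the absolute difference in means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)

    def abs_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_diff(sum(a))
    hits = count = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        if abs_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count

# Perfectly separated groups with 3 replicates each: only the observed
# labelling and its mirror image are as extreme, so p = 2 / C(6, 3).
print(permutation_p([1, 2, 3], [10, 11, 12]))  # 0.1
print(2 / comb(20, 10))  # smallest achievable p with 10 replicates per group
```

With three replicates per condition the smallest possible p-value is 0.1, before any multiple-testing adjustment, which illustrates why distributional assumptions and information sharing between genes are attractive for small experiments.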
Answer: RNA-seq differentially expressed gene finding methods
Richard Friedman wrote, 5.0 years ago:
Dear Son,

The t-test assumes a normal distribution, which is appropriate for continuous variables. RNA-seq data deals with counts (discrete entities). A negative binomial distribution (edgeR, DESeq) or a mean-dependent variance (voom) is much more appropriate. Also, the three methods mentioned above estimate variability better, using information from all genes through empirical Bayesian methods, than does the one-gene-at-a-time frequentist t-test.

Best wishes,
Rich

Richard A. Friedman, PhD
Associate Research Scientist, Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Columbia University

(In reply to Son Pham's message of Sep 5, 2014, 12:44 PM.)
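Two ideas in this answer can be written down in a few lines: the negative binomial variance grows faster than the mean, and empirical Bayes methods pull each gene's variance estimate toward a value learned from all genes. The sketch below uses the standard negative binomial variance function and a weighted-average shrinkage formula of the kind used for moderated variances. It is a simplified stand-in, not the actual procedure of edgeR, DESeq or voom, and the prior variance and prior degrees of freedom are hypothetical inputs here, whereas the packages estimate such quantities from the data.

```python
def nb_variance(mu, dispersion):
    """Negative binomial variance: Poisson part (mu) plus dispersion * mu^2."""
    return mu + dispersion * mu ** 2

def moderated_variance(s2_gene, df_gene, s2_prior, df_prior):
    """Shrink a per-gene variance toward a prior value, weighting each
    by its degrees of freedom."""
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)

# A gene with mean count 100 and dispersion 0.1 is far more variable than Poisson.
print(nb_variance(100, 0.0), nb_variance(100, 0.1))

# With only 2 residual degrees of freedom, a noisy per-gene variance of 4.0
# is pulled strongly toward a (hypothetical) prior value of 1.0.
print(moderated_variance(4.0, 2, 1.0, 4))
```

The fewer replicates a gene has, the more weight the shared prior carries, which is how information from all genes stabilizes each individual test.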
Paul Geeleher replied, 5.0 years ago:

Hi Son,

My understanding is that the approach you describe could be considered valid for large enough numbers of samples. However, RNA-seq experiments will typically have smaller numbers (<30) of samples per condition, meaning that a t-test is not valid (because RNA-seq data isn't normally distributed). And while I don't think that a t-test is "invalid" given enough samples, it's very difficult to justify using such a method when much better powered methods have been invented specifically for this type of data.

Paul

Dr. Paul Geeleher, PhD
Section of Hematology-Oncology, Department of Medicine
The University of Chicago
www.bioinformaticstutorials.com

(In reply to Richard Friedman's message of Fri, Sep 5, 2014, 11:52 AM.)
Answer: RNA-seq differentially expressed gene finding methods
Devon Ryan (Germany) wrote, 5.0 years ago:
N.B., I forgot to CC the list originally.

Hi Son,

To add a bit to Richard's response, there's also the issue that conversion to FPKM/RPKM/TPM loses precision information. For example, suppose two samples in a group produce values of 1.0 and 1.2 for some gene (these can be any of the aforementioned metrics). It's rarely the case that the number of mapped reads (or even those aligning to genes) is constant across samples, so it's quite likely that one of those numbers was derived from more data than the other, meaning that we'd like to weight estimates of the group measure toward it. That's impossible with only FPKM etc. values, since this information has been lost.

Best,
Devon

Devon Ryan, Ph.D.
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE), Bonn, Germany

(In reply to Son Pham's message of Sep 5, 2014, 6:44 PM.)
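The 1.0 versus 1.2 example can be played through with numbers. The sketch below invents read counts and library sizes that reproduce those two values, uses counts per million as a simple stand-in for FPKM/RPKM/TPM (gene length is ignored), and assumes pure Poisson counting noise when quoting a relative standard error. All of these are illustrative assumptions.

```python
import math

def cpm(count, library_size):
    """Counts per million mapped reads."""
    return count / library_size * 1e6

# Hypothetical (gene count, total mapped reads) for two samples in one group.
samples = [(10, 10_000_000), (60, 50_000_000)]

values = [cpm(c, n) for c, n in samples]       # roughly 1.0 and 1.2
unweighted_mean = sum(values) / len(values)    # treats both samples alike

# Pooling the raw counts weights each sample by how much data it contributed.
pooled = cpm(sum(c for c, _ in samples), sum(n for _, n in samples))

# Under Poisson noise the relative standard error of a count is 1 / sqrt(count).
rel_se = [1 / math.sqrt(c) for c, _ in samples]

print(values, unweighted_mean, pooled, rel_se)
```

The pooled estimate sits closer to the value from the deeper sample, and the two normalized values carry visibly different uncertainty, yet neither fact can be recovered once only the normalized values are kept.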
Answer: RNA-seq differentially expressed gene finding methods
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 5.0 years ago:
Dear Son,

The problem has little to do with normality or group size and more to do with the fact that FPKM values can have very different variances depending on the size of the original count. This creates a problem for the t-test, which assumes equal variances.

See the voom paper for discussion of this:

http://genomebiology.com/2014/15/2/R29

Best wishes
Gordon

(In reply to Son Pham's message of Fri, 5 Sep 2014 10:31:25 -0700.)
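The dependence of the variance on the count size can be illustrated with a short calculation. The sketch below uses the log-cpm transformation described in the voom paper (offsets of 0.5 and 1) and, as a rough delta-method approximation for negative binomial counts, takes the variance of the log count to be about 1/mu + dispersion. Turning that approximation directly into inverse-variance weights is a simplification made here for illustration; voom itself derives its precision weights from a trend fitted to the data. The counts and dispersion are hypothetical.

```python
import math

def log_cpm(count, library_size):
    """log2 counts per million with offsets that keep zero counts finite."""
    return math.log2((count + 0.5) / (library_size + 1.0) * 1e6)

def approx_var_log_count(mu, dispersion):
    """Approximate variance of ln(count): squared coefficient of variation."""
    return 1.0 / mu + dispersion

dispersion = 0.05  # hypothetical biological variability
for mu in (5, 50, 500):
    var = approx_var_log_count(mu, dispersion)
    # Inverse variance as a stand-in for a precision weight.
    print(mu, round(var, 4), round(1.0 / var, 2))

print(log_cpm(0, 999_999))
```

Low counts are several times more variable on the log scale than high counts for the same gene, so samples and genes do not share a common variance, which is the equal-variance problem described above.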
Gordon Smyth replied, 5.0 years ago:

For previous discussion on this list see

https://stat.ethz.ch/pipermail/bioconductor/2013-May/052802.html

This and the voom paper discuss what one needs to do to make t-tests work well in the RNA-seq context.

Gordon

(In reply to Gordon K Smyth's message of Sun, 7 Sep 2014.)