RNA-seq differentially expressed gene finding methods

Son Pham ▴ 40

@son-pham-6721

Last seen 9.6 years ago

Dear all,

I know that we have very good packages (edgeR, DESeq) that calculate the list of differentially expressed genes between 2 conditions (with replicates) from raw counts. But I do not know what is wrong with the following simple approach (and whether other people have been using it):

1. Get the (estimated) TPM/FPKM for each gene in each sample.
2. Do a t-test between the two groups on each gene.
3. Adjust the p-values for multiple testing (p-adj).

Thanks,
Son
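For readers unfamiliar with step 3, here is a minimal sketch of one common multiple-testing adjustment. The question does not name a method, so the choice of Benjamini-Hochberg is an assumption, and the function name `bh_adjust` and the p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from step 2.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
example_adjusted = bh_adjust(example_pvalues)
```

Adjusted values are never smaller than the raw p-values and are capped at 1, so thresholding them controls the false discovery rate rather than the per-gene error rate.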

• 3.6k views

ADD COMMENT • link updated 9.6 years ago by Gordon Smyth 50k • written 9.6 years ago by Son Pham ▴ 40

Son Pham ▴ 40

@son-pham-6721

Last seen 9.6 years ago

Thank you Richard, Devon and Paul for the very insightful answers.

I completely agree that the approach I raised above is inappropriate when the group size is small (3, 4, ...). But when the group size is large enough (> 20 or 30), the sampling distribution of the mean will be close to normal, and that is why I believe that the t-test is OK.

-Son

ADD COMMENT • link 9.6 years ago Son Pham ▴ 40

Son, of course you are right. Here's an excerpt from our 2010 Genome Biology paper:

Conclusions

Why is it necessary to develop new statistical methodology for sequence count data? If large numbers of replicates were available, questions of data distribution could be avoided by using non-parametric methods, such as rank-based or permutation tests. However, it is desirable (and possible) to consider experiments with smaller numbers of replicates per condition. In order to compare an observed difference with an expected random variation, we can improve our picture of the latter in two ways: first, we can use distribution families, such as normal, Poisson and negative binomial distributions, in order to determine the higher moments, and hence the tail behavior, of statistics for differential expression, based on observed low order moments such as mean and variance. Second, we can share information, for instance, distributional parameters, between genes, based on the notion that data from different genes follow similar patterns of variability. Here, we have described an instance of such an approach, ...

Btw, the t-test can be perfectly "valid" even if the data are non-Normal, in particular when they have fatter tails. The test then just loses power, sometimes badly so. I find it odd that so many people worry about that so much. Correlations between samples (e.g. "batch effects") are much more problematic.

Best wishes
Wolfgang
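The excerpt mentions permutation tests as a distribution-free option when many replicates are available. A minimal sketch of a two-group permutation test on the difference in means follows; the function name, the number of permutations and the example values are all hypothetical, and this is not the method of any package discussed in this thread.

```python
import random


def permutation_pvalue(group_a, group_b, n_permutations=999, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_difference(pooled))
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if abs(mean_difference(pooled)) >= observed - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up expression values for one gene, 20 samples per group.
p_separated = permutation_pvalue(list(range(1, 21)), list(range(101, 121)))
p_identical = permutation_pvalue([1, 2, 3], [1, 2, 3])
```

The smallest achievable p-value is 1 / (n_permutations + 1), and with only 3 or 4 samples per group the number of distinct relabellings is too small to reach genome-wide significance, which is one reason such tests need large groups.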

ADD REPLY • link 9.6 years ago Wolfgang Huber ★ 13k

Richard Friedman ▴ 80

@richard-friedman-6273

Last seen 9.6 years ago

Dear Son,

The t-test assumes a normal distribution, which is appropriate for continuous variables. RNA-seq data deals with counts (discrete entities). A negative binomial distribution (edgeR, DESeq) or a mean-dependent variance (voom) is much more appropriate. Also, the 3 methods mentioned above estimate variability better, using information from all genes via empirical Bayesian methods, than does the one-gene-at-a-time frequentist t-test.

Best wishes,
Rich

Richard A. Friedman, PhD
Associate Research Scientist, Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer, Department of Biomedical Informatics (DBMI)
Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
Columbia Department of Systems Biology
Room 824, Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at c2b2.columbia.edu
http://friedman.c2b2.columbia.edu/

"There is nothing in my Contemporary Jewish Literature course that is either contemporary, Jewish, or literature".
-Rose Friedman, age 17
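To make the "mean-dependent variance" point concrete, here is a small sketch of the negative binomial variance function in the parametrization commonly used for RNA-seq counts, variance = mean + dispersion * mean^2. The dispersion value of 0.1 and the mean counts are hypothetical, chosen only for illustration.

```python
def nb_variance(mean, dispersion):
    """Negative binomial variance: Poisson noise plus a quadratic term."""
    return mean + dispersion * mean ** 2


def poisson_variance(mean):
    """Poisson variance equals the mean (the dispersion = 0 special case)."""
    return mean


# Hypothetical mean counts and dispersion.
assumed_dispersion = 0.1
variance_table = {
    mean: (poisson_variance(mean), nb_variance(mean, assumed_dispersion))
    for mean in (10, 100, 1000)
}
```

In this sketch the variance is neither constant across genes nor simply proportional to the mean, which is the kind of relationship a per-gene equal-variance t-test has no way to exploit.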

ADD COMMENT • link 9.6 years ago Richard Friedman ▴ 80

Hi Son,

My understanding is that the approach you describe could be considered valid for large enough numbers of samples. However, RNA-seq experiments will typically have smaller numbers (<30) of samples per condition, meaning that a t-test is not valid (because RNA-seq data isn't normally distributed). Still, while I don't think that a t-test is "invalid" given enough samples, it's very difficult to justify using such a method when much better powered methods have been invented specifically for this type of data.

Paul

--
Dr. Paul Geeleher, PhD
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St., KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com

ADD REPLY • link 9.6 years ago Paul Geeleher ★ 1.3k

Devon Ryan ▴ 200

@devon-ryan-6054

Last seen 8.3 years ago

Germany

N.B., I forgot to CC the list originally.

Hi Son,

To add a bit to Richard's response, there's also the issue that conversion to FPKM/RPKM/TPM loses precision information. For example, suppose two samples in a group produce values of 1.0 and 1.2 for some gene (these can be any of the aforementioned metrics). It's rarely the case that the number of mapped reads (or even those aligning to genes) is constant across samples, so it's quite likely that one of those numbers was derived from more data than the other, meaning that we'd like to weight estimates of the group measure toward it. That would be impossible with only FPKM/etc. values, since we lose this information.

Best,
Devon

____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Tel: +49 (0)178 298-6067
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany
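A small numeric sketch of this point, using counts per million (CPM) instead of FPKM to leave gene length out of it. The counts and library sizes are invented so that the normalized values come out as 1.0 and 1.2, and the Poisson noise model is an assumption made purely for illustration.

```python
import math


def counts_per_million(count, library_size):
    """Scale a raw count by the library size (reads per million)."""
    return count * 1e6 / library_size


def poisson_relative_sd(count):
    """Relative standard deviation of a Poisson count: sqrt(n) / n."""
    return 1.0 / math.sqrt(count)


# Hypothetical samples: (raw count for the gene, total mapped reads).
sample_a = (10, 10_000_000)
sample_b = (120, 100_000_000)

cpm_a = counts_per_million(*sample_a)
cpm_b = counts_per_million(*sample_b)
relative_sd_a = poisson_relative_sd(sample_a[0])
relative_sd_b = poisson_relative_sd(sample_b[0])
```

The two normalized values look equally trustworthy, yet under this model the first rests on 10 reads and carries roughly three times the relative uncertainty of the second, which rests on 120. Once only the normalized values are kept, that difference cannot be recovered.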

ADD COMMENT • link 9.6 years ago Devon Ryan ▴ 200

Gordon Smyth 50k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Dear Son,

The problem has little to do with normality or group size and more to do with the fact that FPKM values can have very different variances depending on the size of the original count. This creates a problem for the t-test, which assumes equal variances.

See the voom paper for discussion of this:

http://genomebiology.com/2014/15/2/R29

Best wishes
Gordon
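The dependence of variance on count size can be seen in a toy simulation. The sketch below assumes pure Poisson sampling noise and equal library sizes (so that a log-count differs from a log-FPKM only by a constant); the means of 5 and 50, the 0.5 offset used to avoid taking the log of zero, and all function names are choices made for this illustration.

```python
import math
import random


def draw_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method; suitable for small means)."""
    threshold = math.exp(-mean)
    k = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def log_count_variance(mean, n_draws=2000, seed=0):
    """Sample variance of log2(count + 0.5) for simulated Poisson counts."""
    rng = random.Random(seed)
    logs = [math.log2(draw_poisson(mean, rng) + 0.5) for _ in range(n_draws)]
    centre = sum(logs) / n_draws
    return sum((x - centre) ** 2 for x in logs) / (n_draws - 1)


# Hypothetical low-count and high-count genes.
low_variance = log_count_variance(5)
high_variance = log_count_variance(50)
```

On the log scale the low-count gene is far noisier than the high-count gene even though nothing biological differs between them, so observations on the same gene from samples of different depth do not share a common variance.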

ADD COMMENT • link 9.6 years ago Gordon Smyth 50k

For previous discussion on this list see

https://stat.ethz.ch/pipermail/bioconductor/2013-May/052802.html

This and the voom paper discuss what one needs to do to make t-tests work well in the RNA-seq context.

Gordon

ADD REPLY • link 9.6 years ago Gordon Smyth 50k
