Question: identifying consistently expressed genes between replicates
8.3 years ago by
Wendy Qiao360
Wendy Qiao360 wrote:
Hi all,

I am comparing a number of cell types and want to find the signature genes of each cell type. I used the limma package to do this. The signature genes of a given cell type are found from the fold difference between the given cell type and the grand mean of all the cell types, together with the BH-adjusted p-values. I want to add another condition to test the consistency of expression levels of the selected genes within each cell type. I can do this by looking at the standard deviations of gene expression between replicates. I am just wondering whether there is any function in limma or another Bioconductor package to do this.

Thank you in advance,
Wendy
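A minimal base R sketch of the standard-deviation check described above. The data are simulated, and the names `es.mx` (expression matrix) and `samplenames` (one cell-type label per array) are borrowed from later in the thread; everything else is an assumption made for illustration. For a one-way layout with equal replicate numbers, the pooled within-cell-type standard deviation is the square root of the mean per-type variance.

```r
# Toy data: 5 genes x 6 arrays, 3 cell types with 2 replicates each (all hypothetical).
set.seed(1)
samplenames <- rep(c("BASO1", "BCELLA1", "BCELLA2"), each = 2)
es.mx <- matrix(rnorm(5 * 6, mean = 8), nrow = 5,
                dimnames = list(paste0("gene", 1:5), NULL))

# Standard deviation between replicates, per gene (rows) and per cell type (columns).
f <- factor(samplenames)
sd.by.type <- t(apply(es.mx, 1, function(y) tapply(y, f, sd)))

# Pooled within-cell-type SD per gene. With equal replicate numbers the pooled
# variance is simply the mean of the per-type variances.
pooled.sd <- sqrt(rowMeans(sd.by.type^2))
```

Genes with a large `pooled.sd`, or a large entry in one column of `sd.by.type`, are the ones whose replicates disagree.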
limma • 556 views
modified 8.3 years ago by Gordon Smyth37k • written 8.3 years ago by Wendy Qiao360
Answer: identifying consistently expressed genes between replicates
8.3 years ago by
Florence Cavalli50 wrote:
Hi Wendy,

Depending on your number of cell types, you can try the SpeCond package for your analysis. It detects condition-specific gene expression, which in your case means cell-type-specific gene expression. Let me know if you have any questions about it.

Best,
Florence
Answer: identifying consistently expressed genes between replicates
8.3 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:
Dear Wendy,

From your email, I assume that you have found signature genes by comparing each cell type to all the other cell types treated as one group. As you have correctly observed, this does not take account of consistency within the other cell types.

Another way to find signature genes, which I think is superior, is to choose as signature genes those genes that are uniquely higher or lower in the relevant cell type with respect to each of the other cell types individually. In other words, a positive signature gene is higher in the relevant cell type against every other cell type, not just against the average of the other cell types. This was the method used to find stem cell signature genes in:

Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat ML, Gyorki DE, Ward T, Partanen A, Feleppa F, Huschtscha LI, Thorne HJ; kConFab, Fox SB, Yan M, French JD, Brown MA, Smyth GK, Visvader JE, Lindeman GJ. Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nature Medicine 2009.

If you do it this way, consistency within the cell types is automatically taken care of, because the t-tests will only choose genes with consistent behaviour. limma can do all the relevant pairwise tests for you in a couple of lines; then use decideTests() to choose the signature genes.

Best wishes
Gordon
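To make the difference between the two criteria concrete, here is a small base R sketch on made-up group means. The numbers, the cutoff `lfc`, and the choice of BASO1 as the target are all hypothetical, and no statistical testing is involved; it only illustrates the logic of "higher than the average of the others" versus "higher than every other cell type".

```r
# Hypothetical mean log2 expression of two genes in four cell types.
means <- rbind(
  geneA = c(BASO1 = 10, BCELLA1 = 6,  BCELLA2 = 6, BCELLA3 = 6),
  geneB = c(BASO1 = 10, BCELLA1 = 11, BCELLA2 = 4, BCELLA3 = 4)
)
target <- "BASO1"
others <- setdiff(colnames(means), target)

# Criterion 1: target versus the average of the other cell types.
vs.average <- means[, target] - rowMeans(means[, others])

# Criterion 2: target versus each other cell type individually
# (one column of differences per other cell type).
vs.each <- means[, target] - means[, others]

lfc <- 1  # hypothetical log2 fold-change cutoff
up.vs.average <- vs.average > lfc
up.vs.all <- apply(vs.each > lfc, 1, all)
```

Both genes pass the average-based criterion, but only geneA passes the pairwise one: geneB is lower in BASO1 than in BCELLA1, which the average hides.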
Dear Gordon,

Thank you very much for your information.

You are right: I am comparing each cell type to the average of all the others. Ideally, I want to compare each cell type to the others pairwise and find the signature genes as you suggested. I tried this before, but I am afraid that I did not take full advantage of limma, as I am new to it.

Here is my problem. I am comparing 24 blood cell types (92 arrays in total). The steps I took are shown below. Writing out the pairwise comparisons takes dozens of lines. I then used topTable to find overexpressed genes from each comparison, and finally took the 'intersect'. I believe that there is an easier way to do all the pairwise comparisons and use decideTests(). Would you mind giving me some hints on that?

Thank you very much.
Wendy

f <- factor(samplenames)  # samplenames = column names of the 92 arrays; replicates share the same name
design <- model.matrix(~0+f)
fit <- lmFit(es.mx, design)
fit <- eBayes(fit)

contrast.matrix <- makeContrasts(fBASO1-fBCELLA1, fBASO1-fBCELLA2.....

    fBASO1 fBCELLA1 fBCELLA2 fBCELLA3 ...
1        1        0        0        0 ...
2        1        0        0        0 ...
3        1        0        0        0 ...
4        0        1        0        0 ...
...
92       0        0        0        0 ...
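The hand-typed list of contrasts can also be generated programmatically. The sketch below builds the contrast names and the corresponding matrix in base R for a hypothetical subset of four cell types; the coefficient names follow the `f` prefix that `model.matrix(~0+f)` produces in the code above.

```r
# Hypothetical subset of cell-type names; the real analysis has 24.
celltypes <- c("BASO1", "BCELLA1", "BCELLA2", "BCELLA3")
coef.names <- paste0("f", celltypes)  # column names produced by model.matrix(~0+f)

target <- "fBASO1"
others <- setdiff(coef.names, target)

# Contrast strings of the form "fBASO1-fBCELLA1".
contrast.names <- paste(target, others, sep = "-")

# Contrast matrix: +1 for the target and -1 for one other cell type per column.
contrast.matrix <- matrix(0, nrow = length(coef.names), ncol = length(others),
                          dimnames = list(coef.names, contrast.names))
contrast.matrix[target, ] <- 1
contrast.matrix[cbind(others, contrast.names)] <- -1
```

With limma loaded, `makeContrasts(contrasts = contrast.names, levels = design)` should give the same matrix from the character vector, and either version can be passed to `contrasts.fit()`.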
Hi Wendy,

First, let me mention that fit$sigma holds the between-replicate standard deviation for each gene, which is probably what you were looking for in your original post.

Second, here is a way to compare each cell type with each of the others. Suppose you want signature genes for BCELLA2. The following will compare all other cell types back to BCELLA2:

f <- factor(samplenames)
BCELLA2vs <- relevel(f, ref="BCELLA2")
design <- model.matrix(~BCELLA2vs)
fit <- eBayes(lmFit(es.mx, design))

Now do all the pairwise tests, asking for FDR better than 0.1 and fold change at least 1.5 (you can choose the settings you want):

results <- decideTests(fit[,-1], p=0.1, lfc=log2(1.5))

You can find the indices of positive signature genes that are up in all comparisons by:

i <- apply(results>0,1,all)

or negative signature genes by

i <- apply(results<0,1,all)

However, you have so many cell types, some of which are probably quite similar. You might allow some of these comparisons to be non-significant. Suppose you decide to restrict to genes that are up in BCELLA2 vs 20 out of the 23 other cell types:

i <- rowSums(results>0) >= 20

You can see that any variation of this is quite easy.

Best wishes
Gordon
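The following is a self-contained sketch of this workflow on simulated data, so that it can be run without the original arrays. The simulated matrix, the four cell types, and the spiked-in effect are assumptions made for illustration; only the limma calls themselves come from the thread. Note that with BCELLA2 as the reference level each coefficient is OtherCell minus BCELLA2, so genes that are higher in BCELLA2 show up as negative results, as pointed out later in the thread.

```r
library(limma)

set.seed(42)
# Simulated stand-in for the real data: 200 genes, 4 cell types x 3 replicates.
celltypes <- c("BASO1", "BCELLA1", "BCELLA2", "BCELLA3")
samplenames <- rep(celltypes, each = 3)
es.mx <- matrix(rnorm(200 * 12, mean = 8, sd = 0.3), nrow = 200,
                dimnames = list(paste0("gene", 1:200), NULL))
# Spike in 10 genes that are 3 log2 units (8-fold) higher in BCELLA2 only.
is.ref <- samplenames == "BCELLA2"
es.mx[1:10, is.ref] <- es.mx[1:10, is.ref] + 3

f <- factor(samplenames)
BCELLA2vs <- relevel(f, ref = "BCELLA2")
design <- model.matrix(~BCELLA2vs)
fit <- eBayes(lmFit(es.mx, design))

# Drop the intercept; each remaining coefficient is OtherCell - BCELLA2.
results <- decideTests(fit[, -1], p.value = 0.1, lfc = log2(1.5))

# Negative in every comparison means higher in BCELLA2 than in every other type.
up.in.BCELLA2 <- rowSums(results < 0) == ncol(results)
```

Replacing `== ncol(results)` with a smaller threshold gives the relaxed "most but not all comparisons" variant described above.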
written 8.3 years ago by Gordon Smyth37k

Hi Wendy,

It occurred to me after sending my last email that the code I gave you will compute contrasts in the form OtherCell-BCELLA2 rather than BCELLA2-OtherCell, so you need to use results<0 for positive signature genes and results>0 for negative, i.e., the other way around from my email.

Best wishes
Gordon
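The sign convention behind this correction can be checked in base R without limma. The sketch below uses four made-up expression values for a single gene; with BCELLA2 as the reference level, the fitted coefficient for BASO1 is the BASO1 mean minus the BCELLA2 mean.

```r
# Hypothetical expression values for one gene: higher in BCELLA2 than in BASO1.
y <- c(5.0, 5.2, 9.0, 9.2)
f <- factor(c("BASO1", "BASO1", "BCELLA2", "BCELLA2"))

# Make BCELLA2 the reference level, as in the design used above.
BCELLA2vs <- relevel(f, ref = "BCELLA2")
design <- model.matrix(~BCELLA2vs)

# Ordinary least-squares fit of the same model.
coefs <- coef(lm(y ~ BCELLA2vs))

# Intercept = mean of the reference level (BCELLA2);
# second coefficient = mean(BASO1) - mean(BCELLA2), i.e. OtherCell - BCELLA2.
baso.minus.ref <- coefs[["BCELLA2vsBASO1"]]
```

Here `baso.minus.ref` is negative even though the gene is higher in BCELLA2, which is why positive signature genes correspond to `results<0`.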