Analyzing arrays with no replicates

0

Entering edit mode

McGee, Monnie ▴ 300

@mcgee-monnie-1108

Last seen 9.6 years ago

Dear BioC, I am trying to analyze a microarray data set with 4 samples and 54K genes. None of the samples are replicated. I would like to know which genes are differentially expressed among the 4 samples. Here are the details of my data: > print(affy.object) AffyBatch object size of arrays=1164x1164 features (42345 kb) cdf=HG-U133_Plus_2 (54675 affyids) number of samples=4 number of genes=54675 annotation=hgu133plus2 I have several questions: 1. I used rma(affy.object) to obtain normalized, background corrected expression levels. However, rma uses median polish to obtain expression values. I read in the achieves of this list that median polish was not a good idea when there are no replicates, and that rlm would be better. In order to use rlm, one needs an expression set, which means I have to specify some sort of summary method (using expresso) or take median polish (rma). If not median polish, which summary method? I have also read on the list that reviewers of some journals consider that median polish is the best method, and, if it is not used, the analysis is suspect (resulting in a reanalysis or a rejection). So, if I don't use median polish (or rma) how do I justify the method I use? 2. When I do my.eset <- rma(affy.object), I obtain an object of class "exprSet", but I think the phenoData attribute is incorrect; > print(my.eset) Expression Set (exprSet) with 54675 genes 4 samples phenoData object with 1 variables and 4 cases varLabels sample: arbitrary numbering > pData(my.eset) sample Control 1 Treat1 2 Treat2 3 Treat3 4 It seems to me that the phenoData object should have 4 variables and 1 case. In other words, phenoData sees my treatments as cases (or patients) and says that I have one variable. 3. Once I have expression levels (and a correct eset), what is the best way to do non-specific filtering? Clearly, I don't want to test all 54K genes. I tried the following, suggested in one of the R working papers: >library(genefilter) > f1 <- pOverA(0.25, log2(100)) > f2 <- function(x) (IQR(x) > 0.5) > ff <- filterfun(f1,f2) > selected <- genefilter(my.eset,ff) >sum(selected) [1] 57 Maybe 57 genes is a great number, but it seems much smaller than the examples I've seen in papers. Also, one of the AFFX genes is in this filtered sample. Since they are housekeeping genes, I would think that they aren't supposed to pass through the filter. 4. Once I have my filtered genes, how can I test for differential expression among the 4 groups? I've played around with mt.maxT (with the test="f" option), but I haven't been able to get the proper class labels. Thanks for your help and suggestions! Obviously, I am new to this. I find the work fascinating, but I realize every day that I have so much more to understand. Thanks, Monnie Monnie McGee, Ph.D. Assistant Professor Department of Statistical Science Southern Methodist University Ph: 214-768-2462 Fax: 214-768-4035

Microarray Microarray • 1.1k views

ADD COMMENT • link updated 19.2 years ago by James W. MacDonald 65k • written 19.2 years ago by McGee, Monnie ▴ 300

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

> 4. Once I have my filtered genes, how can I test for differential > expression among the 4 groups? I've played around with mt.maxT (with the > test="f" option), but I haven't been able to get the proper class labels. You won't be able to use any standard t-test. A t-test requires you to have a standard deviation within each group. In microarray analysis, almost all analyses are done only within gene, so since you have no replicates, you cannot do a t-test. In fact, although limma would (I think) allow you do do such an analysis because it uses all the genes to determine a "pooled" standard deviation, I know that such an analysis will not pass any kind of peer review. If you are using this for hypothesis generation, probably using fold-change is just as useful as hypothesis testing. Others may correct me on this (and I would like to hear what others think), but your experimental design SERIOUSLY limits any statistics you attempt. Sean

ADD COMMENT • link 19.2 years ago Sean Davis 21k

0

Entering edit mode

I agree with Sean. There are some ad hoc things you can do, but they basically amount to finding a cut-off for the fold-change and they rely heavily on the assumption that all genes have the same variance (which is unlikely). You need a "between array" measure of variance and you do not have one. --Naomi At 04:02 PM 2/20/2005, Sean Davis wrote: >>4. Once I have my filtered genes, how can I test for differential >>expression among the 4 groups? I've played around with mt.maxT (with the >>test="f" option), but I haven't been able to get the proper class labels. > >You won't be able to use any standard t-test. A t-test requires you to >have a standard deviation within each group. In microarray analysis, >almost all analyses are done only within gene, so since you have no >replicates, you cannot do a t-test. In fact, although limma would (I >think) allow you do do such an analysis because it uses all the genes to >determine a "pooled" standard deviation, I know that such an analysis will >not pass any kind of peer review. If you are using this for hypothesis >generation, probably using fold-change is just as useful as hypothesis >testing. Others may correct me on this (and I would like to hear what >others think), but your experimental design SERIOUSLY limits any >statistics you attempt. > >Sean > >_______________________________________________ >Bioconductor mailing list >Bioconductor@stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Bioinformatics Consulting Center Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111

ADD REPLY • link 19.2 years ago Naomi Altman ★ 6.0k

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

McGee, Monnie wrote: > > 1. I used rma(affy.object) to obtain normalized, background > corrected expression levels. However, rma uses median polish to > obtain expression values. I read in the achieves of this list that > median polish was not a good idea when there are no replicates, and > that rlm would be better. In order to use rlm, one needs an > expression set, which means I have to specify some sort of summary > method (using expresso) or take median polish (rma). If not median > polish, which summary method? I have also read on the list that > reviewers of some journals consider that median polish is the best > method, and, if it is not used, the analysis is suspect (resulting in > a reanalysis or a rejection). So, if I don't use median polish (or > rma) how do I justify the method I use? I think you are confused. RMA doesn't care if there are any replicates or not. It is simply used to convert PM data to an expression measure for each gene on each chip. > > 2. When I do my.eset <- rma(affy.object), I obtain an object of class > "exprSet", but I think the phenoData attribute is incorrect; > >> print(my.eset) > > Expression Set (exprSet) with 54675 genes 4 samples phenoData object > with 1 variables and 4 cases varLabels sample: arbitrary numbering > >> pData(my.eset) > > sample Control 1 Treat1 2 Treat2 3 Treat3 4 > > It seems to me that the phenoData object should have 4 variables and > 1 case. In other words, phenoData sees my treatments as cases (or > patients) and says that I have one variable. You can change the phenoData slot at any time. See ?exprSet for more information. > > > 3. Once I have expression levels (and a correct eset), what is the > best way to do non-specific filtering? Clearly, I don't want to test > all 54K genes. I tried the following, suggested in one of the R > working papers: > > >> library(genefilter) f1 <- pOverA(0.25, log2(100)) f2 <- function(x) >> (IQR(x) > 0.5) ff <- filterfun(f1,f2) selected <- >> genefilter(my.eset,ff) sum(selected) > > [1] 57 > > Maybe 57 genes is a great number, but it seems much smaller than the > examples I've seen in papers. Also, one of the AFFX genes is in this > filtered sample. Since they are housekeeping genes, I would think > that they aren't supposed to pass through the filter. You will likely have to do some sort of pre-filtering before looking at your fold change data (see answers from Sean Davis and Naomi Altman). Without any replication there is no reason to expect that an AFFX gene would be removed by a filter, because there is no reason to expect that these probes are any less noisy than any other, and without any measure of variability you will not be able to distinguish differentially expressed genes from noisy genes. Jim -- James W. MacDonald University of Michigan Affymetrix and cDNA Microarray Core 1500 E Medical Center Drive Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD COMMENT • link 19.2 years ago James W. MacDonald 65k

Login before adding your answer.