Minimum expression threshold?
Bob
Thanks for your reply. I'm still a bit puzzled by the range of expression values returned by RMA. Running

    library(affy)
    my.affy <- ReadAffy()
    my.eset <- rma(my.affy)
    summary(exprs(my.eset[, 1]))

returns

     Min.   : 3.083
     1st Qu.: 5.711
     Median : 6.973
     Mean   : 7.144
     3rd Qu.: 8.491
     Max.   :13.848

This shows that the minimum expression value is 3.083, but the tissue I'm using cannot be expressing all of these genes (I'm hybridizing to the U133A chip). So I guess there are two main questions:

* Should RMA only be used for comparative studies? What if someone wanted to create a database of all genes expressed in tissue X? (Not that I'm doing this, but what if?) What I'd like to do is filter the gene list so I can cut down on the number of tests in the multiple testing routine (and hence get a less severe multiple-testing adjustment).
* What exactly is the expression value that RMA returns? I know it is log2 transformed, but I don't understand what it corresponds to.

Sorry if these questions are answered somewhere; I've looked, but maybe not well enough. Thanks in advance.

"Rafael A. Irizarry" <ririzarr@jhsph.edu> wrote:

hi! i don't know of any good references. in practice i don't like to arbitrarily decide on such cut-offs. this could be very problematic if you use MAS 5.0, but with other expression measures such as PM-only Li-Wong and RMA you usually don't need this filtering step.

sorry i can't be of more help,
rafael

On Tue, 26 Aug 2003, Bob wrote:

> Hello,
>
> I have started using Bioconductor (which is great, by the way), and I have a question regarding how to choose a minimum expression threshold.
>
> I have read in the Affymetrix CEL files, calculated expression using rma(), and now have a data frame with ~22k expression values across 14 samples (using the U133A chip). There are expression values for each Affy probe set, although it is probably not true that this tissue expresses all 22k genes. My question is: how do I choose a threshold above which I consider a gene to be expressed?
>
> In addition (please correct me if I'm wrong), testing only the expressed genes (or at least not all of the probe sets) will give better adjusted p-values with the multtest package.
>
> Can someone point me in the right direction, or point out some good references on this topic?
>
> Thanks.
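A minimal sketch of the filtering idea in Bob's question, run on simulated data rather than a real RMA matrix (all object names, sizes and effect sizes are hypothetical). It uses one common form of "non-specific" filtering: genes are ranked by their interquartile range (IQR) across all samples, without looking at the group labels, and only the top quarter go on to testing. Ignoring the labels in the filter matters; selecting genes by the group difference itself would make the adjusted p-values over-optimistic. Bioconductor's genefilter package offers ready-made helpers for this kind of filter.

```r
# Hypothetical illustration: simulated log2 values stand in for
# exprs(my.eset); nothing here reproduces the data in the thread.
set.seed(1)
n.genes <- 2000
n.samples <- 14
group <- rep(c("A", "B"), each = n.samples / 2)

# Most genes sit at a flat low-level floor; 300 genes vary across samples,
# and the first 50 of those also differ between groups A and B.
expr <- matrix(rnorm(n.genes * n.samples, mean = 4, sd = 0.2), nrow = n.genes)
varying <- 1:300
expr[varying, ] <- expr[varying, ] +
  rnorm(length(varying) * n.samples, mean = 4, sd = 1)
expr[1:50, group == "B"] <- expr[1:50, group == "B"] + 2

# Per-gene two-sample t-test
pvals <- apply(expr, 1, function(x) t.test(x[group == "A"], x[group == "B"])$p.value)

# Non-specific filter: keep the 25% of genes with the largest IQR
# across all samples (the group labels are not used here)
iqrs <- apply(expr, 1, IQR)
keep <- iqrs >= quantile(iqrs, 0.75)

# Benjamini-Hochberg adjustment with and without the filter
adj.all <- p.adjust(pvals, method = "BH")
adj.kept <- p.adjust(pvals[keep], method = "BH")

c(tests.all = length(adj.all), tests.kept = length(adj.kept),
  hits.all = sum(adj.all < 0.05), hits.kept = sum(adj.kept < 0.05))
```

Whether such a filter helps depends on the data; as Rafael notes, with RMA a filtering step is usually not needed, and the 25% cut-off here is arbitrary.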
Wolfgang Huber
Hi Bob,

> This shows the minimum expression value is 3.083 - but the tissue I'm
> using cannot be expressing all of these genes (I'm hybridizing to the
> U133A chip).
> * What exactly is the expression value that RMA returns? I know it is
> log2 transformed, but I don't understand what it corresponds to.

Before the log2 transformation there is a background correction, and what it does is "push up", or bias, the data a little upwards to make sure that all values are positive, even for unexpressed genes. This has the advantage of making the variance of the log2 values (and of log-ratios) more stable.

> * Should RMA only be used for comparative studies?
> What if someone wanted to create a database of all genes expressed in
> tissue X? (not that I'm doing this, but what if?)

How do you define "expressed in tissue X"? How many copies of the transcript per cell, in what fraction of the cells in the population? Also, there is always some background signal, even if very small, from unspecific hybridisation and the like, so at the lower end of the intensity spectrum it is probably not possible to find a threshold that yields both a low false-positive rate and a high true-positive rate.

A more sensitive criterion for deciding whether a gene is actually expressed may be to look for correlation of its expression values with those of other genes or with phenotypic variables. (If it is not expressed, the "noise" should show no such correlations.)

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/mga/whuber
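A small simulation may make the "push up" effect concrete. The function below is a sketch of the normal-plus-exponential convolution model that RMA's background correction is based on: observed intensity O = S + B, with specific signal S exponential (rate alpha) and background B normal (mean mu, sd sigma); the corrected value is the conditional mean of S given O, restricted to [0, O], so it is always positive. The parameter values are made up (affy estimates them from each array's PM intensities), and only background is simulated, i.e. an unexpressed gene.

```r
# Sketch of a normal + exponential background correction.
# Assumed model: O = S + B, S ~ Exponential(rate = alpha), B ~ Normal(mu, sigma^2).
# Returns the conditional mean of S given O = o, with S restricted to [0, o];
# the result always lies strictly between 0 and o.
bg.expect <- function(o, mu, sigma, alpha) {
  a <- o - mu - alpha * sigma^2
  a + sigma * (dnorm(a / sigma) - dnorm((o - a) / sigma)) /
    (pnorm(a / sigma) + pnorm((o - a) / sigma) - 1)
}

set.seed(2)
mu <- 100; sigma <- 20; alpha <- 1 / 200   # made-up parameter values

# Probe intensities for an unexpressed gene: background only
observed <- rnorm(1000, mean = mu, sd = sigma)

naive <- observed - mu                       # plain subtraction of the background mean
adjusted <- bg.expect(observed, mu, sigma, alpha)

sum(naive <= 0)          # roughly half: log2 would give NaN or -Inf for these
all(adjusted > 0)        # model-based values are all positive
summary(log2(adjusted))  # unexpressed probes sit at a floor above 0 on the log2 scale
```

Taking log2 of the adjusted values keeps every probe finite, which is the point made above. RMA's later quantile normalization and median-polish summarization are not reproduced, so the numbers will not match the minimum of 3.083 in Bob's output.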
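The correlation criterion suggested above can also be sketched on simulated data (the phenotype, gene counts and effect sizes are all hypothetical). For each gene, test the correlation of its expression values with a phenotypic variable across samples and adjust for multiple testing; genes that are only background noise should not show such correlations. Note the asymmetry: a non-significant result does not show that a gene is unexpressed, since an expressed gene may simply not vary with the chosen phenotype.

```r
set.seed(3)
n.samples <- 14
phenotype <- rnorm(n.samples)   # hypothetical continuous phenotype, one value per sample

# 100 simulated genes: noise around a low floor on the log2 scale...
expr <- matrix(rnorm(100 * n.samples, mean = 4, sd = 0.3), nrow = 100)
# ...except genes 1-10, which also track the phenotype
expr[1:10, ] <- expr[1:10, ] +
  matrix(phenotype, nrow = 10, ncol = n.samples, byrow = TRUE)

# Pearson correlation test of each gene against the phenotype
pvals <- apply(expr, 1, function(x) cor.test(x, phenotype)$p.value)
adj <- p.adjust(pvals, method = "BH")

which(adj < 0.05)   # genes whose values co-vary with the phenotype
```

The same loop works with another gene's profile in place of `phenotype`, which corresponds to the "correlation with other genes" variant.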

Dear Dr. Huber,

I downloaded Affymetrix microarray data from the GEO database. After finally playing around in RStudio, I managed to get a CSV file from the CEL files. This file consists of 11 columns: Affymetrix ID (sometimes duplicated, with suffixes such as s_at, x_at and a_at), logFC, AverageExpression, t, P.value, adj.p.val, B, ensembl-gene-id, gene biotype and external gene name. I don't understand what threshold I should use to get a list of upregulated and downregulated protein-coding genes from such a table. Which column should I use?

Thank you,
Amruta
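One common way to answer this, sketched under assumptions. The column names suggest a limma `topTable` result (limma names them logFC, AveExpr, t, P.Value, adj.P.Val and B) joined with biomaRt annotation. The sign of logFC gives the direction of change for the fitted contrast, and the adjusted p-value (Benjamini-Hochberg by default in `topTable`) gives significance after multiple-testing correction; a gene list is usually filtered on both. The table below is made up, the column names are assumptions to be matched to the actual CSV header, and the 0.05 and 1 cut-offs are common conventions, not rules.

```r
# Made-up stand-in for the CSV; in practice something like
#   res <- read.csv("my_results.csv", stringsAsFactors = FALSE)
# with the column names below changed to those in the actual file.
res <- data.frame(
  probe_id           = c("p1_at", "p2_s_at", "p3_at", "p4_x_at", "p5_at", "p6_a_at", "p7_at"),
  logFC              = c(1.8, 2.1, -2.3, 0.4, 1.2, -1.5, -1.1),
  adj.p.val          = c(0.001, 0.004, 0.01, 0.002, 0.20, 0.03, 0.04),
  gene_biotype       = c(rep("protein_coding", 5), "lncRNA", "protein_coding"),
  external_gene_name = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"),
  stringsAsFactors   = FALSE
)

fdr.cutoff <- 0.05   # adjusted p-value threshold (a convention, not a rule)
min.lfc    <- 1      # |logFC| >= 1 on the log2 scale: at least a two-fold change

sig <- subset(res, gene_biotype == "protein_coding" &
                   adj.p.val < fdr.cutoff &
                   abs(logFC) >= min.lfc)

# Several probe sets can map to the same gene; keep the most significant one
sig <- sig[order(sig$adj.p.val), ]
sig <- sig[!duplicated(sig$external_gene_name), ]

up   <- sig$external_gene_name[sig$logFC > 0]
down <- sig$external_gene_name[sig$logFC < 0]
up
down
```

Keeping only the most significant probe set per gene is a simplification; if probe sets for the same gene disagree in direction, that gene deserves a closer look rather than an automatic choice.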
