Question

Ranking genes according to tissue specificity

Cei Abreu-Goodger ▴ 830

@cei-abreu-goodger-4433

Last seen 9.1 years ago

Mexico

Hello all,

I'm new to R/BioC, but I've been trying to use them for the following analysis. I apologize if this email is a bit long, but I bet someone on this list could point me in the right direction.

I have the GNF dataset with Affy expression data from 61 mouse tissues (each with 1 biological replicate, 122 CEL files in total). In the end I would like to obtain, for each tissue, the gene list sorted according to the specificity of expression in that tissue. That is, genes whose expression is highest in that tissue relative to the other tissues (although their absolute expression levels could be low) at the top, and genes whose expression is lowest in that tissue (although their absolute expression levels could be high) at the bottom. Ideally, I would also like to have some confidence value (p-value?) associated with each gene.

Initially, I downloaded the pre-normalized (MAS or gcRMA) files and did all the manipulation with Perl scripts. For each probe X, I took its expression values Xi (i = 1..61), one per tissue, and replaced each value with (Xi - mean(X)) / std_dev(X), essentially a Z-score. In this way, the "Z-score" represents how specifically expressed a particular gene is in a particular tissue, taking into account the standard deviation of that gene's expression levels.

One of the first problems with this is that I am only processing a subset of the probes, since I only use those with a RefSeq transcript. So I thought it would be better to re-normalize everything considering only the subset of transcripts that I will be analyzing. Is this correct?

I think for my particular case I'm better off with an RMA/gcRMA summary. I can see that I can use the "subset" parameter to select only the probesets I want. Also, I can't make much use of the A/M/P calls of the MAS analysis, since I don't want the low-expression values to be cut off. I read a couple of papers that compared these and other methods, and decided to try gcRMA first.
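To make the Z-score step above concrete, here is a minimal sketch in plain Python of the transformation and the per-tissue ranking. It is only an illustration of the arithmetic, not the Perl scripts mentioned above: the function names, the three tissue names, and the probe values are all made up, and the sample standard deviation is an assumption, since the post does not say which form was used.

```python
from statistics import mean, stdev

def tissue_z_scores(expression):
    """Convert each probe's per-tissue values to Z-scores across tissues.

    `expression` maps a probe ID to a list of expression values,
    one value per tissue, in a fixed tissue order.
    """
    z = {}
    for probe, values in expression.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (an assumption)
        # A probe with identical values everywhere has no specificity.
        z[probe] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def rank_probes_for_tissue(z, tissue_index):
    """Return probe IDs sorted from most to least specific for one tissue."""
    return sorted(z, key=lambda probe: z[probe][tissue_index], reverse=True)

# Hypothetical toy data: 3 tissues, 3 probes (log-scale values).
tissues = ["liver", "brain", "heart"]
expression = {
    "probe_A": [9.0, 2.0, 2.5],  # high in liver only
    "probe_B": [3.0, 3.1, 2.9],  # nearly flat across tissues
    "probe_C": [1.0, 8.0, 7.5],  # low in liver only
}

z = tissue_z_scores(expression)
liver_ranking = rank_probes_for_tissue(z, tissues.index("liver"))
print(liver_ranking)
```

In this toy example the nearly flat probe_B ranks first for brain: its standard deviation is tiny, so a 0.1 difference becomes a large Z-score. That sensitivity to low-variance probes is worth keeping in mind when reading such rankings.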
I guess my main questions, apart from asking for general suggestions, are:

1) At what point do I use the biological replicates?
2) Is there a package that I can use to obtain "relative" expression levels among all the tissues? I can find many examples of how to get relative expression levels when comparing two cases (or a few more, but always comparing in pairs). How can I best compare each tissue to "all the rest"?

Thanks for your time,
Cei

probe affy gcrma • 1.1k views

17.1 years ago Cei Abreu-Goodger ▴ 830