Hi Xiaowei,
On 12/30/2010 12:55 PM, Xiaowei Guan wrote:
> Dear Bioconductor,
>
> I have this question about how to compress the gene expression dataset from
> probe sets denoted to gene denoted values.
>
> The analysis has two simultaneous goals, first is to convert the probe sets
> to gene names. Second is to convert the probe sets values into just one
> summary gene expression value of the associated gene.
>
> For example, we have 21 probes that corresponding to only 4 genes, Is there
> any package will fulfill the goal of deriving a summary information of a
> probe set corresponding to a specific gene?
Your terminology isn't very exact here. If I assume you are using
Affymetrix chips, then a probe is quite different from a probe set. If I
further assume that any time you say 'probe' you actually mean probe set
(e.g., 'we have 21 probe [sets] that correspond to only 4 genes'), then
there are a couple of ways you can go here, and each has its own
positive and negative aspects.
You could use one of the MBNI re-mapped CDF packages, which map the
probes to genes, transcripts, etc. (depending on the package), so each
probe set uniquely measures a single entity. The positive aspect of
these CDF packages is that you no longer have multiple probe sets for
each gene. The negative aspect is that the number of probes per probe
set is highly variable, so the accuracy of the measurements will vary as
well (and this is usually not accounted for when doing downstream
analyses).
Alternatively, you could choose just one probe set for each gene, based
on something like the most variability between your sample types, or the
largest difference. There is a function findLargest() in the genefilter
package that can help, and there may be others as well. The positive for
this approach is that you again only have one probe set per gene. The
negative is that you are making the (unfounded IMO) assumption that you
can determine which probe set is measuring a given gene on such a
simple criterion.
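
To make the "one probe set per gene" idea concrete, here is a minimal
sketch of the selection logic in plain Python (not R, and not what
findLargest() does internally). The probe set IDs, gene symbols, and
expression values are all made up for illustration, and sample variance
is just one possible ranking criterion.

```python
from statistics import variance

# Hypothetical toy data: probe set ID -> (gene symbol, expression per sample)
expr = {
    "1001_at": ("GENEA", [5.0, 5.1, 4.9, 5.0]),
    "1002_at": ("GENEA", [6.0, 8.0, 6.5, 9.0]),
    "1003_at": ("GENEB", [7.0, 7.2, 7.1, 6.9]),
    "1004_s_at": ("GENEB", [3.0, 3.0, 3.1, 3.0]),
}


def pick_most_variable(expr):
    """Return {gene: probe_set_id}, keeping the most variable probe set per gene."""
    best = {}  # gene -> (probe_set_id, variance)
    for probe_id, (gene, values) in expr.items():
        score = variance(values)  # sample variance across arrays
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe_id, score)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}


selected = pick_most_variable(expr)
print(selected)
```

Swapping variance() for another statistic (for example, an absolute
difference between group means) changes the criterion but not the
structure, and the same caveat applies: the winner is chosen by a simple
rule, not by knowing which probe set really measures the gene.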
>
> Another question is: if there are no gene assignments for our data (only
> probes names here),is there any way to assign the genes to each probe
> simultaneous when getting the summary information of the probe set? When
> download the annotation file. I noticed there are some probes which have two
> or more gene names, and in such case, we want to have two same columns with
> different gene names.
If there are no gene assignments to a probe set, that is because it
likely doesn't measure any transcript. Most of the Affy chips were
designed years ago, and given the fluid nature of gene annotations it
isn't unusual for some of them to no longer be considered to measure a
known gene.
As for your second question, that's easy. Just make another row and tack
on the second gene name. It's probably not advisable, however. In these
cases you aren't sure what the probe set is measuring, so attributing
the data to two genes is a fairly risky proposition.
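
If you do go the duplicate-row route, the bookkeeping could look roughly
like the sketch below (plain Python, hypothetical IDs and values). It
assumes the annotation lists multiple gene symbols in one field separated
by "///"; check how your own annotation file delimits them. Probe sets
with no gene assignment are simply dropped.

```python
# Hypothetical annotation: probe set ID -> gene symbol field
annotation = {
    "1001_at": "GENEA",
    "1002_at": "GENEB /// GENEC",  # annotated to two genes
    "1003_at": "",                 # no gene assignment
}

# Hypothetical expression values per probe set (two samples)
values = {
    "1001_at": [5.0, 5.1],
    "1002_at": [6.0, 8.0],
    "1003_at": [7.0, 7.2],
}


def expand_rows(annotation, values, sep="///"):
    """Return one (gene, probe_set_id, values) row per gene symbol."""
    rows = []
    for probe_id, field in annotation.items():
        genes = [g.strip() for g in field.split(sep) if g.strip()]
        for gene in genes:
            # The same measurements are repeated under each gene name.
            rows.append((gene, probe_id, values[probe_id]))
    return rows


rows = expand_rows(annotation, values)
for row in rows:
    print(row)
```

Keeping the probe set ID in each row makes it obvious later that two
"genes" share identical data, which is exactly the ambiguity described
above.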
Best,
Jim
> Thank you so much!
>
> Best,
> Xiaowei
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826