Hi all,
I have a question about the annotation of probesets on Affymetrix
GeneChips. I find that sometimes two probesets refer to the same gene.
For example, on the HG_U95Av2 there are two probesets (1369_s_at and
35372_r_at) that both point to the same gene, IL8. What is the
scientific reason for such duplication?
I understand that the signal from two probesets would be affected by
dye-labeling and hybridization effects in addition to mRNA
abundance. What, then, is the point of having two probesets that might
give different results for the same gene?
Please send any pointers/references that you find appropriate.
Thanks for your consideration.
With thanks,
Saroj
Saroj Mohapatra wrote:
> Hi all,
>
> I have a small curiosity regarding annotation of probesets in affy
> GeneChips. I find that some times 2 probe sets refer to the same
> gene.
>
> For example, in the HG_U95Av2, there are 2 probesets (1369_s_at and
> 35372_r_at) both point to the same gene IL8. I wonder what is the
> scientific reason for such a duplication?
There can be a number of reasons for such duplication. The first and
foremost is probably that we are typically measuring "transcript"
expression rather than gene expression, except when there is only one
transcript for a given gene. If there is more than one transcript, it
may be necessary to have more than one probeset to capture all of them.
I would say that, in general, most modern arrays cover many genes more
than once; one certainly cannot assume that each gene is represented
only once.
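A quick way to see how often a gene is covered more than once is to
invert the probeset-to-gene annotation. This is only a minimal sketch
in plain Python: the two IL8 entries are taken from the question, while
the other rows and the function names are made-up placeholders, not part
of any Bioconductor package.

```python
from collections import defaultdict

# Probeset -> gene symbol. The two IL8 rows come from the question in this
# thread; the remaining rows are hypothetical placeholders.
ANNOTATION = {
    "1369_s_at": "IL8",
    "35372_r_at": "IL8",
    "hypothetical_1_at": "GENE_A",
    "hypothetical_2_at": "GENE_B",
}


def probesets_by_gene(annotation):
    """Invert a probeset -> gene mapping into gene -> sorted probeset list."""
    by_gene = defaultdict(list)
    for probeset, gene in annotation.items():
        by_gene[gene].append(probeset)
    return {gene: sorted(ids) for gene, ids in by_gene.items()}


def multiply_covered(annotation):
    """Return only the genes represented by more than one probeset."""
    grouped = probesets_by_gene(annotation)
    return {gene: ids for gene, ids in grouped.items() if len(ids) > 1}


print(multiply_covered(ANNOTATION))
# {'IL8': ['1369_s_at', '35372_r_at']}
```

With a real annotation table in place of ANNOTATION, the same grouping
would list every gene that needs a decision about how to handle its
redundant probesets.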
> I understand that the signal from 2 probesets would be affected by
> dye-labeling effect and hybridization effect in addition to mRNA
> abundance. What is then the point of having 2 probe sets which might
> give different results for the same gene?
They often give similar results, but sometimes not. A certain amount of
redundancy is probably a good thing, although it can be a headache.
Sean
Affy spots multiple probesets for the same "gene", especially when
there is an extension like _s_at or _r_at, meaning "similarity
constraint" and "rules dropped" respectively, for the selection of the
probes in the probeset.
See
http://www.affymetrix.com/Auth/support/downloads/manuals/data_analysis_fundamentals_manual.pdf
pages 93-94 for more info.
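The suffix can be read off the probeset ID mechanically. The sketch
below is only an illustration: the meanings in the table are the two
given above, the label for a plain _at ID is my own wording, and any
other suffix is reported as not covered rather than guessed. The
function name classify_probeset is hypothetical.

```python
import re

# Suffix meanings as stated in this thread. Other suffixes exist on Affy
# chips but are intentionally not described here.
SUFFIX_MEANING = {
    "_s_at": "similarity constraint",
    "_r_at": "rules dropped",
    "_at": "plain probeset (no extra suffix)",
}

# Stem, then an optional single-letter tag, then the trailing "_at".
_PATTERN = re.compile(r"^(?P<stem>.+?)(?P<suffix>(?:_[a-z])?_at)$")


def classify_probeset(probeset_id):
    """Split a probeset ID into (stem, suffix, meaning)."""
    match = _PATTERN.match(probeset_id)
    if match is None:
        return (probeset_id, None, "not recognized")
    suffix = match.group("suffix")
    meaning = SUFFIX_MEANING.get(suffix, "not covered in this sketch")
    return (match.group("stem"), suffix, meaning)


for probeset in ("1369_s_at", "35372_r_at"):
    print(classify_probeset(probeset))
# ('1369', '_s_at', 'similarity constraint')
# ('35372', '_r_at', 'rules dropped')
```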
Sometimes they also spot multiple specific _at probesets for the same
gene to measure alternative transcripts.
But sometimes probesets give different intensities even if they target
the same transcript. There could be multiple reasons for that, such as
the GC content of the probes, non-specificity of the mismatch probes,
artifacts on the chip, etc.
David
On Apr 24, 2006, at 3:14, Sean Davis wrote:
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
Hi, Saroj,
How have you been?
As far as I know, the different probe sets correspond to different
regions of the gene. I don't know why Affy does this. Probably they
originally thought that probe sets for the same gene but different
regions would serve as a "set of probe sets", a second-layer
confirmation of the gene's expression, but it turned out that
different probe sets for the same gene sometimes show different
expression too. Sometimes this is because not all of the probe sets
hybridize to the coding region of the gene, so in our analysis we only
consider the expression of the coding-region probe sets, which of
course takes some BLAST work.
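Once each probe set has been aligned to the transcript (for example
from BLAST hits), the coding-region filter is just an interval overlap
test. The following is a rough sketch under that assumption; the
coordinates, probe set names, and function names are all hypothetical
and only illustrate the idea.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def coding_region_probesets(alignments, cds):
    """Keep probe sets whose aligned span overlaps the coding sequence.

    alignments: probe set name -> (start, end) on the transcript
    cds: (start, end) of the coding sequence on the same transcript
    """
    cds_start, cds_end = cds
    return sorted(
        name
        for name, (start, end) in alignments.items()
        if overlaps(start, end, cds_start, cds_end)
    )


# Hypothetical coordinates: one probe set inside the coding region and one
# far downstream in the 3' untranslated region.
alignments = {"probeset_A": (150, 350), "probeset_B": (1200, 1500)}
print(coding_region_probesets(alignments, cds=(100, 400)))
# ['probeset_A']
```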
Hope other experts can give better ideas about this!
Bin
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of Saroj
Mohapatra
Sent: Sun 4/23/2006 6:02 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Duplicated probesets for the same gene
As Sean mentioned, there are possibly many reasons for multiple
probesets. First, they may be intended to interrogate splice variants.
Second, these probesets are based on UniGene build 95, which is very old
(the current build is #190), and many ESTs or Riken genes may have been
mapped in the intervening period to genes that already existed on the
chip.
In addition, many of the probesets contain probes that are now known to
either interrogate unrelated sequences or not map to any known sequence.
You can now download the re-mapped cdfs that are provided by the
Molecular and Behavioral Neuroscience Institute (MBNI) at the University
of Michigan directly from BioC. These cdfs contain probesets that have
been re-mapped based on the current UniGene, Ensembl, Entrez Gene,
RefSeq, or TIGR annotations. The benefits of using these cdfs are
twofold. First, there is only one probeset per gene (this may not be
true of RefSeq; I think there may be some redundancy there, but am not
sure). Second, any probe that interrogates multiple transcripts or no
longer maps to the genome has been removed, so theoretically you should
get better data.
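The idea behind such a re-mapping can be sketched in a few lines. This
is not the MBNI procedure, only an illustration of the two rules just
described, simplified to the gene level: drop probes that hit more than
one gene or nothing at all, then pool the survivors into one probeset
per gene. The probe names, gene names, and the remap_probes function are
hypothetical.

```python
from collections import defaultdict


def remap_probes(probe_hits):
    """Group probes into one probeset per gene, keeping unambiguous probes only.

    probe_hits: probe name -> list of genes the probe sequence maps to.
    Probes with no hit or with hits to several genes are discarded.
    """
    remapped = defaultdict(list)
    for probe, genes in probe_hits.items():
        unique_genes = set(genes)
        if len(unique_genes) == 1:
            remapped[next(iter(unique_genes))].append(probe)
    return {gene: sorted(probes) for gene, probes in remapped.items()}


# Hypothetical mapping results for five probes.
probe_hits = {
    "p1": ["IL8"],
    "p2": ["IL8"],
    "p3": ["IL8", "GENE_X"],  # cross-hybridizing: dropped
    "p4": [],                 # no longer maps anywhere: dropped
    "p5": ["GENE_Y"],
}
print(remap_probes(probe_hits))
# {'IL8': ['p1', 'p2'], 'GENE_Y': ['p5']}
```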
The major downside (for me at least) is the loss of the easy preprocess
==> analyze ==> annotate pipeline provided by the affy, limma, and
annaffy packages. However, Steffen Durinck has kindly modified his
biomaRt code to allow for an alternate affy ==> limma ==> biomaRt ==>
annotate analysis pipeline. Anybody interested in such things can take a
look at the prettyOutput vignette in biomaRt.
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
Thanks to Sean, Bin, David and Jim, I now have a much better
understanding of the issues.
I am going to try the re-mapped cdfs.
Sincerely,
Saroj