Duplicated probesets for the same gene
@saroj-mohapatra-1446
Last seen 9.6 years ago
Hi all,

I have a small question about the annotation of probesets on Affymetrix GeneChips. I find that sometimes two probesets refer to the same gene. For example, on the HG_U95Av2 there are two probesets (1369_s_at and 35372_r_at) that both point to the same gene, IL8. What is the scientific reason for such duplication?

I understand that the signal from two probesets would be affected by dye-labeling and hybridization effects in addition to mRNA abundance. What, then, is the point of having two probesets that might give different results for the same gene?

Please send any pointers or references that you find appropriate.

Thanks for your consideration.

With thanks,
Saroj
@sean-davis-490
Last seen 3 months ago
United States
Saroj Mohapatra wrote:

> Hi all,
>
> I have a small curiosity regarding annotation of probesets in affy GeneChips. I find that some times 2 probe sets refer to the same gene.
>
> For example, in the HG_U95Av2, there are 2 probesets (1369_s_at and 35372_r_at) both point to the same gene IL8. I wonder what is the scientific reason for such a duplication?

There can be a number of reasons for such duplication. The first and foremost is probably that we are typically measuring "transcript" expression rather than gene expression, except in the case that there is only one transcript for a given gene. If there is more than one transcript, it may be necessary to have more than one probeset to capture all of them.

I would say that in general, most modern arrays cover many genes more than once; one can certainly not make any assumptions about each gene being represented only once.

> I understand that the signal from 2 probesets would be affected by dye-labeling effect and hybridization effect in addition to mRNA abundance. What is then the point of having 2 probe sets which might give different results for the same gene?

They often give similar results, but sometimes not. A certain amount of redundancy is probably a good thing, although it can be a headache.

Sean
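To check how often a gene is represented more than once on a given chip, one can group probeset IDs by gene symbol. What follows is only a minimal Python sketch, assuming the annotation is available as (probeset ID, gene symbol) pairs. Only the two IL8 probesets come from this thread; the other rows and the function name `find_multi_probeset_genes` are hypothetical and used purely for illustration.

```python
from collections import defaultdict


def find_multi_probeset_genes(annotation):
    """Return {gene_symbol: [probeset_ids]} for genes hit by more than one probeset."""
    by_gene = defaultdict(list)
    for probeset_id, symbol in annotation:
        if symbol:  # skip probesets that have no gene assignment
            by_gene[symbol].append(probeset_id)
    # Keep only genes that are represented by two or more probesets.
    return {gene: sorted(ids) for gene, ids in by_gene.items() if len(ids) > 1}


# The IL8 rows are taken from the question; the remaining rows are
# hypothetical placeholders, not real HG_U95Av2 annotation.
annotation = [
    ("1369_s_at", "IL8"),
    ("35372_r_at", "IL8"),
    ("hypo_1_at", "GENE_A"),
    ("hypo_2_at", None),
]

duplicates = find_multi_probeset_genes(annotation)
print(duplicates)
```

Genes that appear in the result are the ones for which a per-gene summary would need an explicit decision about how to handle the redundant probesets.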
On Apr 24, 2006, at 3:14, Sean Davis wrote:

> There can be a number of reasons for such duplication. [...]

Affy spots multiple probesets for the same "gene". This is especially the case when the probeset ID has an extension like _s_at or _r_at, which stand for "similarity constraint" and "rules dropped", respectively, in the selection of the probes for the probeset. See http://www.affymetrix.com/Auth/support/downloads/manuals/data_analysis_fundamentals_manual.pdf, pages 93-94, for more info.

Sometimes they also spot multiple specific (_at) probesets for the same gene to measure alternative transcripts. But sometimes probesets give different intensities even if they target the same transcript. There could be multiple reasons for that, such as the GC content of the probes, non-specific binding to the mismatch probes, artifacts on the chip, etc.

David
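The suffix convention described above can be read off the probeset ID mechanically. The following is a hedged Python sketch, not an official Affymetrix parser: it interprets only the two suffixes named in this post (_s_at and _r_at), treats a plain _at ID as a specific probeset, and reports any other single-letter suffix without guessing its meaning. The function name `classify_probeset` is an assumption made for this example.

```python
import re

# Suffix meanings exactly as given in this post. Other suffixes exist on
# Affymetrix arrays, but they are deliberately not interpreted here.
SUFFIX_MEANING = {
    "s": "similarity constraint",
    "r": "rules dropped",
}

# <base>[_<single lowercase letter>]_at
_PROBESET_RE = re.compile(r"^(?P<base>.+?)(?:_(?P<tag>[a-z]))?_at$")


def classify_probeset(probeset_id):
    """Describe a probeset ID based on its suffix."""
    match = _PROBESET_RE.match(probeset_id)
    if match is None:
        return "not an _at probeset ID"
    tag = match.group("tag")
    if tag is None:
        return "specific (plain _at)"
    return SUFFIX_MEANING.get(tag, "other suffix (_%s_at)" % tag)


for pid in ("1369_s_at", "35372_r_at", "1000_at"):
    print(pid, "->", classify_probeset(pid))
```

Applied to the two IL8 probesets from the question, this flags both as non-standard probesets, which may already explain part of any disagreement between them.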
Ye, Bin ▴ 150
@ye-bin-1280
Last seen 9.6 years ago
Hi Saroj,

How have you been?

As far as I know, the different probesets correspond to different regions of the gene. I don't know why Affy does this. Probably they originally thought that probesets for different regions of the same gene would serve as a kind of "set of probesets", a second layer of confirmation of the gene's expression, but it turned out that different probesets for the same gene sometimes show different expression too. Sometimes this is because not all of the probesets hybridize to the coding region of the gene, so in our analysis we only consider the expression of the coding-region probesets, which of course takes some BLAST work.

I hope other experts can give better ideas about this!

Bin

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of Saroj Mohapatra
Sent: Sun 4/23/2006 6:02 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Duplicated probesets for the same gene

> Hi all, I have a small curiosity regarding annotation of probesets in affy GeneChips. [...]
As Sean mentioned, there are possibly many reasons for multiple probesets. First, they may be intended to interrogate splice variants. Second, these probesets are based on UniGene build 95, which is very old (the current build is #190), and many ESTs or RIKEN genes may have been mapped in the intervening period to genes that already existed on the chip.

In addition, many of the probesets contain probes that are now known either to interrogate unrelated sequences or not to map to any known sequence.

You can now download the re-mapped cdfs that are provided by the Molecular and Behavioral Neuroscience Institute (MBNI) at the University of Michigan directly from BioC. These cdfs contain probesets that have been re-mapped based on the current UniGene, Ensembl, Entrez Gene, RefSeq, or TIGR annotations. The benefits of using these cdfs are twofold. First, there is only one probeset per gene (this may not be true of RefSeq; I think there may be some redundancy there, but am not sure). Second, any probe that interrogates multiple transcripts or no longer maps to the genome has been removed, so theoretically you should get better data.

The major downside (for me at least) is the loss of the easy preprocess ==> analyze ==> annotate pipeline provided by the affy, limma, and annaffy packages. However, Steffen Durinck has kindly modified his biomaRt code to allow for an alternate affy ==> limma ==> biomaRt ==> annotate analysis pipeline. Anybody interested in such things can take a look at the prettyOutput vignette in biomaRt.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
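The two rules mentioned above for the re-mapped cdfs (drop probes that hit more than one target or no target at all, then form one probeset per gene) can be illustrated with a small Python sketch. This is not the MBNI procedure, only a toy version under stated assumptions: each probe is assumed to come with the set of gene IDs its sequence currently matches, matching is done at the gene level rather than the transcript level, and all probe IDs, `GENE_B`, and the function name `remap_probes` are hypothetical.

```python
from collections import defaultdict


def remap_probes(probe_to_genes):
    """Regroup probes into one probeset per gene.

    probe_to_genes maps a probe ID to the set of gene IDs that the probe
    sequence currently matches. Probes matching several genes or none are
    dropped.
    """
    probesets = defaultdict(list)
    for probe_id, genes in probe_to_genes.items():
        if len(genes) == 1:  # keep only probes that hit exactly one gene
            (gene,) = genes
            probesets[gene].append(probe_id)
    return {gene: sorted(ids) for gene, ids in probesets.items()}


# Hypothetical probe-level mapping, made up for illustration only.
probe_to_genes = {
    "p1": {"IL8"},
    "p2": {"IL8"},
    "p3": {"IL8", "GENE_B"},  # matches two genes: dropped
    "p4": set(),              # no longer maps anywhere: dropped
    "p5": {"GENE_B"},
}

remapped = remap_probes(probe_to_genes)
print(remapped)
```

The point of the sketch is that, after regrouping, each gene has exactly one probeset, so the question of conflicting probesets for the same gene does not arise at the summary level.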
Thanks to Sean, Bin, David and Jim, I now have a much better understanding of the issues. I am going to try the re-mapped cdfs.

Sincerely,
Saroj