Question about getting summary gene expression information from the probe sets associated with one specific gene


James W. MacDonald 68k


Hi Xiaowei,

On 12/30/2010 12:55 PM, Xiaowei Guan wrote:

> Dear Bioconductor,
>
> I have this question about how to compress the gene expression dataset from probe set-denoted to gene-denoted values.
>
> The analysis has two simultaneous goals: first, to convert the probe sets to gene names; second, to convert the probe set values into just one summary gene expression value for the associated gene.
>
> For example, we have 21 probes that correspond to only 4 genes. Is there any package that will fulfill the goal of deriving summary information for a probe set corresponding to a specific gene?

You are not using very precise language here. If I assume you are using Affymetrix chips, then a probe is quite different from a probe set. If I further assume that any time you say 'probe' you actually mean probe set (e.g., 'we have 21 probe [sets] that correspond to only 4 genes'), then there are a couple of ways you can go here, and each has its own positive and negative aspects.

You could use one of the MBNI re-mapped CDF packages, which map the probes to genes, transcripts, etc. (depending on the package), so that each probe set uniquely measures a single entity. The positive aspect of these CDF packages is that you no longer have multiple probe sets for each gene. The negative aspect is that the number of probes per probe set is highly variable, so the accuracy of the measurements will vary as well (and this is usually not accounted for in downstream analyses).

Alternatively, you could choose just one probe set for each gene, based on something like the most variability between your sample types, or the largest difference. There is a function findLargest() in the genefilter package that can help, and there may be others as well. The positive for this approach is that you again have only one probe set per gene. The negative is that you are making the (unfounded, IMO) assumption that you can determine which probe set is measuring a given gene using such a simple criterion.

> Another question is: if there are no gene assignments for our data (only probe names here), is there any way to assign the genes to each probe at the same time as getting the summary information of the probe set? When I downloaded the annotation file, I noticed there are some probes which have two or more gene names, and in such cases we want to have two identical columns with different gene names.

If there are no gene assignments for a probe set, that is because it likely doesn't measure any transcript. Most of the Affy chips were designed years ago, and given the fluid nature of gene annotations it isn't unusual for some probe sets to no longer be considered to measure a known gene.

As for your second question, that's easy. Just make another row and tack on the second gene name. It's probably not advisable, however. In these cases you aren't sure what the probe set is measuring, so attributing the data to two genes is a fairly risky proposition.

Best,

Jim

> Thank you so much!
>
> Best,
> Xiaowei
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
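To make the "one probe set per gene" option from the answer above concrete, here is a minimal sketch in plain Python. It is not the genefilter API and does not reproduce findLargest(); it only illustrates the selection idea, assuming variance across samples as the ranking statistic. The probe set IDs, gene names, expression values, and the function name `one_probe_set_per_gene` are all hypothetical. A probe set annotated to two genes is given one row per gene (the "make another row" approach, which the answer cautions is risky), and unannotated probe sets are dropped.

```python
from statistics import pvariance

# Hypothetical toy data: probe set ID -> expression values across four samples.
expression = {
    "ps_1": [7.1, 7.3, 7.0, 7.2],
    "ps_2": [5.0, 8.0, 5.5, 8.5],
    "ps_3": [6.0, 6.1, 6.0, 6.2],
    "ps_4": [4.0, 4.0, 9.0, 9.0],
}

# Hypothetical annotation: probe set ID -> gene symbols
# (a probe set may map to no gene, one gene, or several genes).
annotation = {
    "ps_1": ["GENE_A"],
    "ps_2": ["GENE_A"],
    "ps_3": ["GENE_B", "GENE_C"],
    "ps_4": [],
}


def one_probe_set_per_gene(expression, annotation, stat=pvariance):
    """For each gene, keep the probe set with the largest value of `stat`."""
    best = {}  # gene -> (statistic, probe set ID)
    for probe_set, values in expression.items():
        score = stat(values)
        # A probe set with several genes contributes one candidate per gene;
        # a probe set with no genes contributes nothing.
        for gene in annotation.get(probe_set, []):
            if gene not in best or score > best[gene][0]:
                best[gene] = (score, probe_set)
    return {gene: probe_set for gene, (_, probe_set) in best.items()}


# Gene -> chosen probe set, then gene -> expression values of that probe set.
selected = one_probe_set_per_gene(expression, annotation)
gene_level = {gene: expression[ps] for gene, ps in selected.items()}
```

Any other per-probe-set statistic (for example, an absolute difference between group means) could be passed as `stat`. The caveat from the answer still applies: a simple criterion like this does not establish which probe set actually measures the gene.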

Annotation GO cdf probe affy convert

updated 15.1 years ago by Xiaowei Guan • written 15.1 years ago by James W. MacDonald


Xiaowei Guan ▴ 90


Thanks Jim. I will try the packages you suggested. Really appreciate your help.

Best,
Xiaowei

On Mon, Jan 3, 2011 at 1:00 PM, James W. MacDonald <jmacdon@med.umich.edu> wrote:

> [...]

written 15.1 years ago by Xiaowei Guan
