SCAN UPC probability - validation of gene expression results using UPC
@w-evan-johnson-5447
Last seen 28 days ago
United States
Hey Gadi,

Thanks for your post. To answer your questions:

#1: The UPC probability represents the "probability the gene is expressed at a level above the background." So the cutoff really depends on how confident you want to be. If a cutoff of 0.5 (the gene is at least as likely active as not in the sample) is good enough for you, then 0.5 is fine. If you are okay with "the gene might be active", then you can use 0.2 or 0.3. If you want to be nearly certain it is active, then use something like 0.9, 0.95, 0.99 or 1.0. It's hard to give more detail without knowing exactly what you are trying to do.

Also, with regard to filtering genes with low probability: if the gene is inactive in all your samples, then yes, it's a good idea to filter it before downstream analysis and ComBat. However, suppose the gene was inactive in the controls but activated by the treatment. In that case you don't want to filter it! So it is probably safe to say that filtering by UPC is good as long as most (or all) of your samples have a low UPC for the gene/probe, and the UPC is NOT correlated with your outcome variables.

#2: Yes, the output is logged. I don't remember whether it is log2 or natural log. Steve might be able to help here.

Thanks!
Evan

On Nov 9, 2012, at 6:00 AM, bioconductor-request@r-project.org wrote:

> Message: 31
> Date: Thu, 8 Nov 2012 13:27:24 -0800 (PST)
> From: "Gadi Miron [guest]" <guest@bioconductor.org>
> To: bioconductor@r-project.org, gadi.miron@gmail.com
> Cc: "SCAN.UPC Maintainer" <stephen.piccolo@hsc.utah.edu>
> Subject: [BioC] SCAN UPC probability - validation of gene expression results using UPC
> Message-ID: <20121108212724.0EB24142BCB@mamba.fhcrc.org>
>
> Hello all,
>
> I currently use SCAN to normalize Affymetrix HGU133A2 arrays, and ComBat to treat batch effects in the normalized arrays. I would like to implement the new UPC option, and I have some questions:
>
> 1. From which probability is it safe to assume that a feature represents true signal? Should I cut off at 0.5, or can I only use features marked as 1? Furthermore, should all features with a low probability of being true signal be filtered out of downstream analysis, and should this filter be applied before or after using ComBat for batch effect correction?
>
> 2. The expression values that SCAN outputs: are these normalized log2 values?
>
> Thank you in advance!
>
> Gadi Miron
> gadi.miron@gmail.com
>
> -- output of sessionInfo():
>
> normalized1 = UPC(celFilePath, "output_file1.txt")
>
> --
> Sent via the guest posting facility at bioconductor.org.
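Evan's filtering rule (drop a probe only when it looks inactive everywhere, not when its UPC tracks the outcome) can be sketched in base R. This is a minimal illustration rather than part of SCAN.UPC: the `upc` matrix, the `group` factor, and the `cutoff` and `min_frac` values are all made up for the example, and "active in at least half of the samples of any one group" is just one possible way to encode the rule.

```r
# Hypothetical UPC matrix: rows are probes, columns are samples.
upc <- rbind(
  always_off = c(0.02, 0.05, 0.01, 0.03, 0.04, 0.02),
  always_on  = c(0.97, 0.99, 0.95, 0.98, 0.96, 0.99),
  treat_only = c(0.03, 0.02, 0.05, 0.92, 0.95, 0.90)
)
colnames(upc) <- c("ctl1", "ctl2", "ctl3", "trt1", "trt2", "trt3")

# Hypothetical outcome variable, one entry per column of upc.
group <- factor(c("ctl", "ctl", "ctl", "trt", "trt", "trt"))

cutoff   <- 0.5  # assumed UPC threshold for calling a probe "active"
min_frac <- 0.5  # assumed fraction of a group that must be active

# TRUE where a probe is called active in a sample.
active <- upc >= cutoff

# Fraction of active samples per probe, computed separately for each group.
frac_by_group <- vapply(
  levels(group),
  function(g) rowMeans(active[, group == g, drop = FALSE]),
  numeric(nrow(upc))
)

# Keep a probe if it is active in enough samples of at least one group,
# so a gene that is off in controls but on after treatment survives.
keep <- apply(frac_by_group, 1, max) >= min_frac

kept_probes <- names(keep)[keep]
print(kept_probes)
```

Here `always_off` is dropped while `treat_only` is retained, even though it is inactive in half of all samples. The same `keep` vector could then be used to subset the expression matrix before batch correction, as Evan suggests.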
hgu133a2 safe SCAN.UPC • 1.4k views
@stephen-piccolo-6761
Last seen 3.7 years ago
United States
Hi Gadi,

The SCAN values are log2-transformed and then centered around zero.

-Steve
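To make "log2-transformed and then centered around zero" concrete, here is a toy base R illustration. It is only a sketch: the `raw` intensities are invented, and subtracting the per-array mean is an assumption about what "centered" means here. SCAN's actual procedure may center differently.

```r
# Hypothetical raw probe intensities for a single array.
raw <- c(120, 450, 80, 2300, 960)

# Log2 transform.
log2_values <- log2(raw)

# Assumption: centering means subtracting the array's mean log2 value.
centered <- log2_values - mean(log2_values)

# Natural-log values differ from log2 values only by a constant factor,
# so they can be converted by dividing by log(2).
from_natural <- log(raw) / log(2)

print(round(centered, 3))
```

After centering, a value of 1 means the probe sits one log2 unit (a factor of two on the raw scale) above the array's centre, and negative values are below it.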