Hey Gadi,
Thanks for your post. To answer your questions:
#1: The UPC probability represents the "probability the gene is
expressed at a level above the background." So the right cutoff really
depends on how confident you want to be. If being 50% confident that the
gene is active in the sample (i.e. the gene is at least as likely to be
expressed as not) is enough for you, then 0.5 is fine. If you are okay
with "the gene might be active," then you can use 0.2 or 0.3. If you want
to be nearly certain it is active (i.e. the gene is almost certainly
expressed), then use something like 0.9, 0.95, 0.99 or 1.0. It's hard
to give you more detail without knowing exactly what you are trying to
do.
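To see how the choice of cutoff changes the calls, here is a minimal base-R sketch. It assumes the UPC values have already been read into a numeric matrix with probes as rows and samples as columns; the small matrix below is made up for illustration, and the helper `call_active()` is hypothetical (it is not part of SCAN.UPC).

```r
# Made-up UPC values for illustration only: rows = probes, columns = samples.
upc <- matrix(
  c(0.05, 0.10, 0.02,  0.08,
    0.45, 0.60, 0.55,  0.40,
    0.92, 0.97, 0.995, 0.96,
    0.32, 0.25, 0.35,  0.15),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:4))
)

# Hypothetical helper: call a probe "active" in a sample when its UPC
# meets or exceeds the chosen cutoff.
call_active <- function(upc, cutoff) upc >= cutoff

# Count how many probe/sample pairs are called active at each cutoff.
cutoffs <- c(0.2, 0.3, 0.5, 0.9, 0.95, 0.99)
n_active <- sapply(cutoffs, function(k) sum(call_active(upc, k)))
names(n_active) <- cutoffs
print(n_active)
```

Raising the cutoff trades sensitivity for confidence: fewer probes are called active, but each call is more certain.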
Also, with regard to filtering genes with low probability: if the gene
is inactive in all your samples, then yes, it's a good idea to filter
it out before downstream analysis and ComBat. However, suppose the gene
was inactive in the controls but activated by the treatment; in this
case you don't want to filter it! So it is probably safe to filter by
UPC as long as most (or all) of your samples have a low UPC for the
gene/probe, and the UPC is NOT correlated with your outcome variables.
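A minimal base-R sketch of that filtering rule, again assuming a probes-by-samples UPC matrix plus a treatment label for each sample (both made up here). Instead of a formal correlation test, it uses a simpler stand-in: a probe is dropped only if its UPC is below the cutoff in nearly all samples of every group, so a probe that is off in the controls but on after treatment is kept. The function `keep_probe()`, the 0.5 cutoff and the 90% fraction are hypothetical, illustrative choices, not part of SCAN.UPC.

```r
# Made-up UPC values: rows = probes, columns = samples.
upc <- rbind(
  off_everywhere = c(0.02, 0.05, 0.03, 0.04, 0.06, 0.01),
  on_everywhere  = c(0.97, 0.95, 0.99, 0.96, 0.98, 0.94),
  induced        = c(0.03, 0.05, 0.02, 0.91, 0.95, 0.97)
)
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"))

# Hypothetical rule: drop a probe only if, within EVERY group, at least
# `min_frac_low` of the samples have UPC below `cutoff`.
keep_probe <- function(upc, group, cutoff = 0.5, min_frac_low = 0.9) {
  group <- droplevels(factor(group))
  low <- upc < cutoff
  low_in_all_groups <- rep(TRUE, nrow(upc))
  for (g in levels(group)) {
    # Fraction of this group's samples in which each probe is "low".
    frac_low <- rowMeans(low[, group == g, drop = FALSE])
    low_in_all_groups <- low_in_all_groups & (frac_low >= min_frac_low)
  }
  !low_in_all_groups
}

keep <- keep_probe(upc, group)
upc_filtered <- upc[keep, , drop = FALSE]
print(rownames(upc_filtered))
```

In practice you would use `keep` to subset the SCAN expression matrix (assuming the same probe order) before running ComBat.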
#2: Yes, the output is log-transformed, but I don't remember whether it
is log2 or natural log. Steve might be able to help here.
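Whichever base it turns out to be, converting between them is a one-line change of base, log2(x) = ln(x) / ln(2). A minimal base-R sketch with made-up values:

```r
# Made-up natural-log expression values.
ln_vals <- c(2.5, 5.0, 7.5)

# Change of base: log2(x) = ln(x) / ln(2).
log2_vals <- ln_vals / log(2)

# Sanity check: same result as un-logging and taking log2 directly.
stopifnot(isTRUE(all.equal(log2_vals, log2(exp(ln_vals)))))
print(round(log2_vals, 3))
```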
Thanks!
Evan
On Nov 9, 2012, at 6:00 AM, bioconductor-request@r-project.org wrote:
> ------------------------------
>
> Message: 31
> Date: Thu, 8 Nov 2012 13:27:24 -0800 (PST)
> From: "Gadi Miron [guest]" <guest@bioconductor.org>
> To: bioconductor@r-project.org, gadi.miron@gmail.com
> Cc: "SCAN.UPC Maintainer" <stephen.piccolo@hsc.utah.edu>
> Subject: [BioC] SCAN UPC probability - validation of gene expression
> results using UPC
> Message-ID: <20121108212724.0EB24142BCB@mamba.fhcrc.org>
>
>
> Hello all,
>
> I currently use SCAN to normalize Affymetrix HGU133A2 arrays, and
> ComBat to correct for batch effects in the normalized arrays. I would
> like to use the new UPC option, and I have the following questions:
>
> 1. Above what probability is it safe to assume that a feature
> represents true signal? Should I use a cutoff of 0.5, or should I only
> use features marked as 1? Furthermore, should all features with a low
> probability of being true signal be filtered out of downstream
> analysis, and should this filter be applied before or after using
> ComBat for batch-effect correction?
>
> 2. Are the expression values output by SCAN normalized log2 values?
>
> Thank you in advance!
>
> Gadi Miron
> gadi.miron@gmail.com
>
> -- output of sessionInfo():
>
> normalized1 = UPC(celFilePath, "output_file1.txt")
>
> --
> Sent via the guest posting facility at bioconductor.org.