Wolfgang Huber
EMBL European Molecular Biology Laborat…
Hi Paul,

Given your description, one possibility to explore is a
variance-stabilising transformation. For example, DESeq provides one
that smoothly interpolates between the square-root function for low
counts and the log transformation for higher counts; see Section 6
(and 7) of the vignette.
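
To make the "square root for low counts, log for high counts" behaviour concrete, here is a minimal Python sketch. It is not DESeq's implementation. It assumes a negative-binomial-style variance of the form mean + dispersion * mean^2, for which the variance-stabilising transformation has a closed form. The function name `nb_vst` and the dispersion value are illustrative assumptions.

```python
import math

def nb_vst(count, dispersion):
    """Closed-form variance-stabilising transform, ASSUMING
    variance = mean + dispersion * mean**2 (an illustrative model,
    not what DESeq fits from the data)."""
    return 2.0 / math.sqrt(dispersion) * math.asinh(math.sqrt(dispersion * count))

dispersion = 0.1  # hypothetical value, chosen only for illustration

# Low counts: the transform tracks 2 * sqrt(count).
# High counts: each doubling of the count adds a near-constant amount,
# which is the signature of a log transformation.
for count in (0.01, 1, 10, 100, 1000, 10000):
    print(count, nb_vst(count, dispersion), 2.0 * math.sqrt(count))
```

Under this assumed variance model, the step per doubling at high counts approaches log(2) / sqrt(dispersion).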
Best wishes
Wolfgang
On Jan 31, 2013, at 8:57 AM, Paul Harrison <paul.harrison at monash.edu> wrote:
> Hello,
>
> We have been using voom and limma for some time now, and while we're
> fairly happy with it, it seems to produce significance levels that
> are on the conservative side. We also use edgeR to produce more
> optimistic results, but don't entirely trust the significance levels
> that it reports. I am looking for something in-between these
> extremes, and want to run an idea past this list as a sanity check.
> I would especially value Gordon and Charity's comments if they have
> time.
>
> The voom log transformation is essentially:
>
> log2( (count+0.5) / library.size )
>
> It then does some clever things with weights. What I'm considering
> instead is
>
> log2( count / library.size + moderation.amount / mean.library.size )
>
> where moderation.amount is much larger than 0.5, say 5. A couple of
> things here:
>
> - Instead of down-weighting low counts, I'm trying to get rid of the
> extra variation from low counts by artificially left censoring the
> data.
>
> - I'm using the mean of the library sizes because I want the left
> censor to be in the same place for each sample even if the library
> sizes are different, so that if a gene is entirely switched off in
> one condition it won't look variable just because there is a
> different left censor in each sample.
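
The two formulas quoted above can be compared directly for a gene with zero counts. The following is a small Python sketch that transcribes them as written in the message; it is not limma's voom code. The function names and the library sizes are made up for illustration.

```python
import math

def voom_style_log(count, library_size):
    # log2( (count+0.5) / library.size ), as written in the message
    return math.log2((count + 0.5) / library_size)

def moderated_log(count, library_size, mean_library_size, moderation_amount=5.0):
    # log2( count / library.size + moderation.amount / mean.library.size )
    return math.log2(count / library_size + moderation_amount / mean_library_size)

# Hypothetical library sizes for two samples (illustrative values only).
library_sizes = [5_000_000, 20_000_000]
mean_size = sum(library_sizes) / len(library_sizes)

# A gene with zero counts in both samples: the first transform gives a
# floor that depends on each sample's library size, the second gives the
# same floor for every sample.
for size in library_sizes:
    print(size, voom_style_log(0, size), moderated_log(0, size, mean_size))
```

With these made-up sizes, the zero-count values of the first formula differ by log2(4) = 2 between the samples, while the moderated version gives one shared value.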
>
> I'm also using this transformation to create heatmaps.
>
> This seems to be working with the data set I am working with: I get
> more significant results and they seem reasonable by eye. It seems
> to me that even if this approach isn't ideal it should at least be
> safe; at worst it will cause limma to reduce the df.prior and
> produce less significant results. Anything I've missed?
>
> --
> Paul Harrison
>
> Victorian Bioinformatics Consortium / Monash University
>