Hi Dario,
I have CC'd the Bioconductor mailing list.
On Wed, May 7, 2014 at 11:00 PM, Dario Strbenac
<dstr7320 at uni.sydney.edu.au> wrote:
> Hello,
>
> As section 5.3 of the vignette explains, the transformed data can be
> used for applications like clustering of samples. I was considering
> the best way to use it instead for clustering genes of a time-series
> experiment. I would have to account for gene length to make different
> genes comparable. This could be done after the transformation, by
> dividing by appropriate constants.
I would divide before the transformation or subtract after it, since
log2(x * k) = log2(x) + log2(k), where k is some row-wise constant.
Both DESeq2 transformations are log2-like.
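To illustrate the identity, here is a minimal sketch using plain log2 as a stand-in for the DESeq2 transformations (which are only log2-like, so the match will be approximate there). The matrix `x` and the gene lengths in `len_kb` are made-up toy values, not from any real data set:

```r
# Toy values only: 3 genes x 4 samples, with hypothetical gene lengths in kb.
x <- matrix(c(100, 200, 400, 800,
               50,  50, 100, 100,
               10,  20,  10,  20),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("t", 1:4)))
len_kb <- c(gene1 = 2, gene2 = 1, gene3 = 0.5)  # assumed lengths, one per row

# Option A: divide each row by its gene length, then take log2.
a <- log2(x / len_kb)

# Option B: take log2 first, then subtract log2(length) from each row.
b <- sweep(log2(x), 1, log2(len_kb), "-")

# Compare the two results.
all.equal(a, b)
```

With a real rlog or VST matrix, only the second form (subtracting log2 of the length from each row) applies directly to the transformed values.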
But I would also suggest you might want to center the genes before
clustering:
mat <- assay(rld)
matcenter <- sweep( mat, 1, rowMeans(mat), "-" )
Now each gene should have mean 0. This makes sense if you are
interested in clustering genes which have the same trend, but maybe
different expression strength ("up, down, up", etc.).
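As a rough sketch of why centering helps, here is a toy example (made-up values, not real rlog output) where two genes share a trend at different expression levels and a third gene has the opposite trend. The choice of Euclidean distance and `hclust` is just one option for illustration:

```r
# Toy matrix standing in for assay(rld): 3 genes x 4 time points.
mat <- rbind(geneA = c(1, 3, 1, 3),   # "up, down, up" at low level
             geneB = c(6, 8, 6, 8),   # same trend as geneA, higher level
             geneC = c(5, 3, 5, 3))   # opposite trend

# Subtract each gene's mean so every row is centered at 0.
matcenter <- sweep(mat, 1, rowMeans(mat), "-")
rowMeans(matcenter)

# Cluster genes on the centered values and cut the tree into 2 groups.
d <- dist(matcenter)
hc <- hclust(d)
groups <- cutree(hc, k = 2)
groups
```

After centering, geneA and geneB have identical rows, so they fall into the same cluster even though their expression strength differs.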
> Also, the counts used are probabilistically assigned counts to
> transcripts by RSEM. Are you aware of any previous studies which use
> the transformed data for such an analysis?
Not off the top of my head.
Mike
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia