DESeq2 Regularised Log for Clustering of Genes

Michael Love 43k

@mikelove

United States

Hi Dario,

I'm CC'ing the Bioconductor mailing list.

On Wed, May 7, 2014 at 11:00 PM, Dario Strbenac <dstr7320 at uni.sydney.edu.au> wrote:

> Hello,
>
> As section 5.3 of the vignette explains, the transformed data can be used for applications like clustering of samples. I was considering the best way to use it instead for clustering genes of a time-series experiment. I would have to account for gene length to make different genes comparable. This could be done after the transformation, by dividing by appropriate constants.

I would divide before transformation or subtract after transformation, as log2(x * k) = log2(x) + log2(k), where k is some row-wise constant. Both DESeq2 transformations are log2-like.

But I would also suggest you might want to center the genes before clustering:

    mat <- assay(rld)
    matcenter <- sweep(mat, 1, rowMeans(mat), "-")

Now each gene should have mean 0. This makes sense if you are interested in clustering genes which have the same trend, but maybe different expression strength ("up, down, up", etc.).

> Also, the counts used are probabilistically assigned counts to transcripts by RSEM. Are you aware of any previous studies which use the transformed data for such an analysis?

Not off the top of my head.

Mike

> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia

Clustering DESeq2 • 2.3k views

updated 11.8 years ago by Simon Anders ★ 3.8k • written 11.8 years ago by Michael Love 43k
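The effect of the centering step suggested in the post above can be tried on a toy matrix. The following is a minimal sketch in base R, assuming a small hypothetical matrix in place of `assay(rld)` and assuming Euclidean distance with average linkage, neither of which is specified in the thread.

```r
# Toy stand-in for assay(rld): 4 genes x 4 time points on a log2-like scale.
# These values are hypothetical; with real data use mat <- assay(rld).
mat <- rbind(
  geneA = c(2, 4, 2, 4),
  geneB = c(10, 12, 10, 12),  # same trend as geneA, higher level
  geneC = c(5, 3, 5, 3),
  geneD = c(9, 7, 9, 7)       # same trend as geneC, higher level
)
colnames(mat) <- paste0("t", 1:4)

# Subtract each gene's mean so that every row has mean 0.
matcenter <- sweep(mat, 1, rowMeans(mat), "-")

# Cluster genes (rows) on the raw and on the centered values.
# Distance and linkage are illustrative choices, not taken from the thread.
raw_clusters      <- cutree(hclust(dist(mat), method = "average"), k = 2)
centered_clusters <- cutree(hclust(dist(matcenter), method = "average"), k = 2)

print(raw_clusters)
print(centered_clusters)
```

On this toy input, clustering the raw values groups genes by expression level, while clustering the centered values groups them by trend.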

Simon Anders ★ 3.8k

@simon-anders-3855

Zentrum für Molekularbiologie, Universi…

Hi Dario

On Wed, May 7, 2014 at 11:00 PM, Dario Strbenac wrote:

>> As section 5.3 of the vignette explains, the transformed data can
>> be used for applications like clustering of samples. I was
>> considering the best way to use it instead for clustering genes of
>> a time-series experiment. I would have to account for gene length
>> to make different genes comparable.

Actually, no. I don't think accounting for gene length is necessary. It depends on your distance metric: Do you want to consider two genes as similar (and hence would want them to cluster together) if they have similar absolute expression strength, or rather if they have a similar profile of _changes_ during the time course?

I would expect that the latter is more helpful for analysing time-course data, and that you will hence get biologically more meaningful clusters if you normalize each gene's expression by its expression strength at time 0. At the natural scale, this means division by, and at the log scale, subtraction of the time-0 (or: control) value. In either case, gene length cancels out.

This also means that, in case of a design with replicates or with factors besides time point, it might be preferable to not use DESeq2's rlog transform, but rather use DESeq2's normal workflow to estimate shrunken log fold changes for contrasts of all later time points against zero time and then perform clustering on these values. (Thinking about it, we should maybe consider adding a section in the vignette to demonstrate this approach.)

Simon

11.8 years ago • Simon Anders ★ 3.8k
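The claim in the post above that gene length cancels out after subtracting the time-0 value can be checked numerically. This is a minimal sketch in base R under stated assumptions: the expression values and gene lengths are made up, the signal is assumed to scale exactly with gene length, and plain `log2` stands in for a log2-like transformation (for rlog the cancellation would hold only approximately).

```r
set.seed(1)
n_genes <- 5

# Hypothetical length-independent expression at four time points.
expr <- matrix(
  runif(n_genes * 4, min = 1, max = 50),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), c("t0", "t1", "t2", "t3"))
)

# Hypothetical gene lengths; the observed signal is assumed to scale with length.
gene_length <- c(500, 1200, 3000, 800, 10000)
signal <- expr * gene_length  # row i is multiplied by gene_length[i]

# Log scale: subtract the time-0 column from every later time point.
log_signal <- log2(signal)
lfc_vs_t0  <- log_signal[, -1] - log_signal[, 1]

# The same quantity computed without any length factor.
log_expr      <- log2(expr)
lfc_no_length <- log_expr[, -1] - log_expr[, 1]

# TRUE if the length factor has cancelled (up to floating-point error).
print(isTRUE(all.equal(lfc_vs_t0, lfc_no_length)))
```

The resulting matrix of changes relative to time 0 has one column per later time point and could be passed to a clustering routine in place of the transformed values.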

Hello,

Those are two good ideas for analysing profiles. We are aware that the majority of published clustering methods are designed for analysing profiles. However, biologists have told us that transcription factors usually appear at levels many times lower than the genes they cause the transcription of, leading us to explore clustering with absolute levels.

> I would divide before transformation or subtract after transformation

Only the subtraction after transformation would be possible, because dividing before would cause non-integer values, causing the creation of a DESeqDataSet to fail, wouldn't it?

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia

11.8 years ago • Dario Strbenac ★ 1.6k

On Sat, May 10, 2014 at 9:00 PM, Dario Strbenac <dstr7320 at uni.sydney.edu.au> wrote:

> Hello,
>
> Those are two good ideas for analysing profiles. We are aware that the majority of published clustering methods are designed for analysing profiles. However, biologists have told us that transcription factors usually appear at levels many times lower than the genes they cause the transcription of, leading us to explore clustering with absolute levels.
>
>> I would divide before transformation or subtract after transformation
>
> Only the subtraction after transformation would be possible, because dividing before would cause non-integer values, causing the creation of a DESeqDataSet to fail, wouldn't it?

Yes, you can subtract afterward.

> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

11.8 years ago • Michael Love 43k
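The exchange above settles on subtracting a per-gene constant after the transformation. The following minimal sketch in base R illustrates why, assuming hypothetical counts and gene lengths (in kb) and using plain `log2` as a stand-in for the log2-like values in `assay(rld)`; with rlog the equivalence shown at the end would be approximate only.

```r
# Hypothetical integer counts: 3 genes x 2 samples.
counts <- matrix(
  c(10L, 200L,
    35L, 400L,
    80L,  15L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)

# Hypothetical gene lengths in kb, in the same row order as counts.
gene_length_kb <- c(g1 = 0.5, g2 = 3, g3 = 1.2)

# Dividing before the transformation produces non-integer values,
# which is the problem raised in the thread.
divided <- counts / gene_length_kb
has_non_integer <- any(divided != round(divided))
print(has_non_integer)

# Subtracting log2(length) after the transformation, row by row.
log_counts <- log2(counts)  # stand-in for assay(rld)
adjusted   <- sweep(log_counts, 1, log2(gene_length_kb), "-")

# For plain log2 the two routes agree: log2(x / k) = log2(x) - log2(k).
print(isTRUE(all.equal(adjusted, log2(divided))))
```

Here `sweep` with margin 1 applies one offset per gene, the same pattern as the centering code earlier in the thread.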
