Question

using SC3 and scater for bulk-RNAseq

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Germany

Hello

I was wondering whether it makes sense (or is even possible) to use the SC3 and scater packages to analyse bulk RNA-seq data. We have a data set from human cancer with ~350 samples. Among these I have three risk groups, with multiple abnormalities within each, which I would like to treat the way cell types and sub-types are treated in scRNA-seq.

As I don't have spike-ins in the dataset, I was thinking of either using the normalised values from a previous DESeq2 analysis I did, or normalising with size factors calculated from the geometric mean.

I would appreciate any ideas.

thanks

Assa

sc3 scater scRNAseq • 2.5k views

score 1 · Answer 1 · 2017-03-10

Hi Assa,

It is certainly possible, and makes sense, to use scater for bulk RNA-seq data. I expect that SC3 would also work well on bulk RNA-seq data. The authors of that package could offer more insight, but a priori I think it would perform well in this setting too.

Normalised values (if they are on the log scale) from a previous DESeq2 analysis should be fine as expression values for input to scater and SC3. However, if you were to start with raw count data, then I would construct an SCESet object in scater from the count matrix and then normalise with size-factor methods designed for bulk RNA-seq data. In the "normaliseExprs" function in scater you can apply TMM normalisation (from edgeR) or the DESeq size-factor normalisation approach to obtain log2-scale normalised expression values that would provide appropriate input for SC3.

The TMM and DESeq size-factor methods designed for bulk RNA-seq should be a little better than using the geometric mean, though the difference might be small if your libraries are similar.

Best,
Davis

Davis McCarthy, NHMRC Early Career Fellow, Stegle Group, EMBL-EBI, Cambridge, UK (www.ebi.ac.uk)
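To make the size-factor idea concrete, here is a minimal sketch of the median-of-ratios calculation commonly described for DESeq, written in plain Python purely to show the arithmetic. It is not scater's or DESeq2's code; the function names `size_factors` and `log2_normalised` and the pseudocount of 1 are assumptions made for this illustration. In practice you would use the packages themselves.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch only).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with no zero counts, so every log is finite.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of each gene's geometric mean across samples (the reference).
    log_geo_means = [sum(r) / n_samples for r in log_rows]
    factors = []
    for j in range(n_samples):
        # Ratio of sample j to the reference, gene by gene, on the log scale.
        log_ratios = [r[j] - g for r, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalised(counts, factors, pseudocount=1.0):
    """Divide by the size factors, then take log2 with an assumed pseudocount."""
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
            for row in counts]


# Toy data: the second sample was sequenced twice as deeply as the first.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
toy_factors = size_factors(toy_counts)
toy_expr = log2_normalised(toy_counts, toy_factors)
```

In the toy data the second size factor comes out at twice the first, so genes that differ only by sequencing depth end up with the same normalised value in both samples.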

score 1 · Answer 2 · 2017-03-10

Is it possible? Yes. Counts are counts are counts, and scater will process them regardless of their origin.

Is it sensible? Well, I guess so; most clustering algorithms don't care where the counts came from. However, a lot of the subtleties handled by scater (e.g., QC on single cells, normalization) are not relevant for bulk data. You might as well just run cpm with log=TRUE (from edgeR) and feed that directly into the clustering algorithms.
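For anyone unsure what a log-CPM transformation does to the counts, below is a rough Python sketch of the arithmetic. It loosely follows my understanding of how edgeR's cpm with log=TRUE applies a prior count scaled to library size, but that correspondence is an assumption and this is not edgeR's code. The function name `log_cpm` and the default prior count of 2 are chosen for illustration.

```python
import math


def log_cpm(counts, prior_count=2.0):
    """log2 counts-per-million with a library-size-scaled prior (sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("every sample needs at least one count")
    mean_lib = sum(lib_sizes) / n_samples
    # Scale the prior so deeper libraries get a proportionally larger prior.
    priors = [prior_count * size / mean_lib for size in lib_sizes]
    # Pad the library size so the ratio stays strictly below one.
    adj_libs = [size + 2.0 * p for size, p in zip(lib_sizes, priors)]
    return [[math.log2((row[j] + priors[j]) / adj_libs[j] * 1e6)
             for j in range(n_samples)]
            for row in counts]


# Toy data: the second sample is the first sequenced at twice the depth.
toy_logcpm = log_cpm([[10, 20], [0, 0], [90, 180]])
```

The prior keeps zero counts finite on the log scale, and scaling it by library size means two samples that differ only in depth get identical values, which is what you want before clustering.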

score 1 · Answer 3 · 2017-03-10

Hi Assa,

Regarding scater, Davis and Aaron have replied extensively above.

Regarding SC3, I will second Davis: using SC3 in general should be OK. But keep in mind that we optimised the range of eigenvectors used for clustering (4%-7% of N, where N is the number of cells; see the paper for details: http://biorxiv.org/content/early/2016/09/02/036558 ) specifically for scRNA-seq data. For bulk data this range may no longer be optimal, but it should still be OK, because we cut all the noisy eigenvectors anyway.

You can change the range of eigenvectors using the d_region_min and d_region_max parameters.
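As a quick worked example of what the 4%-7% range means for a data set of ~350 samples, here is a small Python sketch. The helper `eigenvector_range` is hypothetical and not part of SC3; it takes whole-number percentages to avoid floating-point rounding, and rounding the lower bound down and the upper bound up is an assumption. Check the SC3 documentation for the units d_region_min and d_region_max actually expect.

```python
def eigenvector_range(n_samples, min_pct=4, max_pct=7):
    """Return (lowest, highest) number of eigenvectors for a percentage range.

    Hypothetical helper: floors the lower bound and ceils the upper bound,
    using integer arithmetic throughout.
    """
    lowest = (min_pct * n_samples) // 100        # floor of min_pct% of N
    highest = -(-(max_pct * n_samples) // 100)   # ceiling of max_pct% of N
    return max(lowest, 1), highest


print(eigenvector_range(350))  # → (14, 25)
```

So with the default range, roughly 14 to 25 eigenvectors would be in play for 350 samples; narrowing or widening the two parameters shifts that window accordingly.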

Cheers,

Vladimir