using SC3 and scater for bulk-RNAseq
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Germany

Hello

I was wondering whether it makes sense (or is even possible) to use the SC3 and scater packages to analyse bulk RNA-seq data. We have a data set from human cancer with ~350 samples. Within those I have three risk groups and multiple abnormalities, which I would like to treat the way cell types and sub-types are treated in scRNA-seq.

As I don't have spike-ins in the dataset, I was thinking of either using the normalised values from a previous DESeq2 analysis I did, or normalising with size factors calculated from the geometric mean.

I would appreciate any ideas.

Thanks

Assa

sc3 scater scRNAseq • 2.1k views
davis ▴ 90
@davis-8868
United Kingdom
Hi Assa

It is certainly possible and makes sense to use scater for bulk RNA-seq data. I expect that SC3 would also work well on bulk RNA-seq data. The authors of that package could offer more insight, but a priori I think it would perform well in this setting as well.

Normalised values (if they are on the log scale) from a previous DESeq2 analysis should be fine as expression values for input to scater and SC3.

However, if you were to start with raw count data, then I would construct an SCESet object in scater from the count matrix and then normalise with size-factor methods designed for bulk RNA-seq data. In the "normaliseExprs" function in scater you can apply TMM normalisation (from edgeR) or the DESeq size-factor normalisation approach to obtain log2-scale normalised expression values that would provide appropriate input for SC3.

The TMM and DESeq size-factor methods designed for bulk RNA-seq should be a little better than using the geometric mean, though the difference might be small if your libraries are similar.

Best

Davis McCarthy
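For readers unfamiliar with the DESeq size-factor approach mentioned above, here is a minimal sketch of the median-of-ratios idea in plain Python. It is an illustration of the arithmetic only, not the DESeq2 or scater implementation; the function names `size_factors` and `log2_normalised` and the pseudocount of 1 are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Log geometric mean per gene; genes with a zero in any sample are skipped.
    log_geo_means = [
        sum(math.log(c) for c in row) / n_samples if all(c > 0 for c in row) else None
        for row in counts
    ]
    if all(lgm is None for lgm in log_geo_means):
        raise ValueError("every gene has at least one zero count")
    factors = []
    for j in range(n_samples):
        # Log ratio of each usable gene in sample j to its geometric mean.
        log_ratios = [
            math.log(row[j]) - lgm
            for row, lgm in zip(counts, log_geo_means)
            if lgm is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalised(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount)."""
    sf = size_factors(counts)
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, sf)] for row in counts]


# Toy data: sample 2 is sequenced at twice the depth of sample 1.
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy)
norm = log2_normalised(toy)
```

In the toy data the second size factor is twice the first, so the first three genes end up with identical normalised values in both samples. In practice one would rely on the R packages named in the reply rather than hand-rolled code.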
Aaron Lun ★ 28k
@alun
The city by the bay

Is it possible? Yes. Counts are counts are counts, and scater will process them regardless of their origin.

Is it sensible? Well, I guess so; most clustering algorithms don't care where the counts came from. However, a lot of the subtleties in scater (e.g., QC on single cells, normalization) are not relevant for bulk data. You might as well just run cpm with log=TRUE (from edgeR) and feed the result directly into the clustering algorithms.
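As a rough picture of what a log-CPM transformation does, the sketch below computes log2 counts-per-million in plain Python. It is a simplification and not edgeR's exact formula: edgeR's cpm with log=TRUE scales its prior count by library size, so its values will differ slightly. The function name `log2_cpm` and the fixed `prior` of 1 are assumptions for illustration.

```python
import math


def log2_cpm(counts, prior=1.0):
    """Simplified log2 counts-per-million.

    counts: list of genes, each a list of raw counts, one per sample.
    prior: constant added to the CPM value to avoid log2(0) (an assumption
           of this sketch, not edgeR's library-size-scaled prior count).
    """
    n_samples = len(counts[0])
    # Library size is the total count in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + prior) for j in range(n_samples)]
        for row in counts
    ]


# Toy data: the second sample has twice the library size of the first.
toy_counts = [[10, 20], [90, 180]]
logcpm = log2_cpm(toy_counts)
```

Because each count is divided by its sample's library size, the two toy samples get identical log-CPM values despite the twofold difference in depth. A matrix like this (genes by samples) is the kind of input a clustering algorithm would take.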

@vladimir-kiselev-9342
Sanger Institute, Cambridge, UK

Hi Assa,

Regarding scater, Davis and Aaron have replied extensively above.

Regarding SC3, I will second Davis: using SC3 should in general be OK. But keep in mind that we optimised the range of eigenvectors used for clustering (4%-7% of N, where N is the number of cells; see the paper for details: http://biorxiv.org/content/early/2016/09/02/036558 ) specifically for scRNA-seq data. For bulk data this range may no longer be optimal, but it should still be OK, because we cut all the noisy eigenvectors anyway.

You can change the range of eigenvectors using the d_region_min and d_region_max parameters.
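To get a feel for what the 4%-7% range means for a data set of ~350 samples, the following sketch works out the corresponding number of eigenvectors. The default fractions come from the reply above; the rounding (floor for the lower bound, ceiling for the upper) and the function name `eigenvector_range` are assumptions of this example and may not match SC3's internals exactly.

```python
import math


def eigenvector_range(n_samples, d_region_min=0.04, d_region_max=0.07):
    """Return (lowest, highest) number of eigenvectors for a given sample count.

    Rounding outward (floor/ceil) is an assumption made for this sketch.
    """
    lower = math.floor(d_region_min * n_samples)
    upper = math.ceil(d_region_max * n_samples)
    return lower, upper


low, high = eigenvector_range(350)
print(f"eigenvectors used: {low} to {high}")
```

With 350 samples and the default fractions this gives roughly 14 to 25 eigenvectors. Raising or lowering d_region_min and d_region_max shifts that window accordingly.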

Cheers,

Vladimir
