Question: using SC3 and scater for bulk-RNAseq
Assa Yeroslaviz (Munich, Germany) wrote, 8 months ago:

Hello

I was wondering whether it makes sense (or is even possible) to use the SC3 and scater packages to analyse bulk RNA-seq data. We have a data set from human cancer with ~350 samples. Among those I have three risk groups, with multiple abnormalities within each, which I would like to treat the way cell types and sub-types are treated in scRNA-seq.

As I don't have spike-ins in the dataset, I was thinking of either using the normalised values from a previous DESeq2 analysis I did or normalising with size factors calculated from the geometric mean.

I would appreciate any ideas.

Thanks

Assa

davis (United Kingdom) wrote, 8 months ago:

Hi Assa,

It is certainly possible, and makes sense, to use scater for bulk RNA-seq data. I expect that SC3 would also work well on bulk RNA-seq data. The authors of that package could offer more insight, but a priori I think it would perform well in this setting as well.

Normalised values (if they are on the log scale) from a previous DESeq2 analysis should be fine as expression values for input to scater and SC3. However, if you were to start with raw count data, then I would construct an SCESet object in scater from the count matrix and then normalise with size-factor methods designed for bulk RNA-seq data. In the "normaliseExprs" function in scater you can apply TMM normalisation (from edgeR) or the DESeq size-factor normalisation approach to obtain log2-scale normalised expression values that would provide appropriate input for SC3.

The TMM and DESeq size-factor methods designed for bulk RNA-seq should be a little better than using the geometric mean, though the difference might be small if your libraries are similar.

Best,

Davis

On 10/03/2017 12:27, Assa Yeroslaviz [bioc] wrote:
> Question: using SC3 and scater for bulk-RNAseq
> Post tags: sc3, scater, scRNAseq

--
Davis McCarthy
NHMRC Early Career Fellow
Stegle Group
EMBL-EBI, Cambridge, UK
www.ebi.ac.uk
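
For readers who want to see what a size-factor normalisation of raw counts involves, here is a minimal sketch of a median-of-ratios calculation, the general idea behind DESeq-style size factors, where each gene's geometric mean across samples serves as the reference. It is written in plain Python purely for illustration and is not the code used by scater, edgeR or DESeq2. The function names, the pseudocount of 1 and the toy matrix are assumptions made for this example.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Illustrative median-of-ratios size factors (hypothetical helper).

    counts: list of gene rows, each row a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with a positive count in every sample, so all logs are finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a positive count in every sample")
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalised(counts, factors, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2 (pseudocount assumed)."""
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
            for row in counts]


# Toy data: the second sample is sequenced at twice the depth of the first.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
toy_factors = median_ratio_size_factors(toy_counts)
toy_expr = log2_normalised(toy_counts, toy_factors)
```

In the toy matrix the second sample's size factor comes out at twice the first, so the three genes that differ only by sequencing depth end up with equal normalised values in both samples.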
Aaron Lun (Cambridge, United Kingdom) wrote, 8 months ago:

Is it possible? Yes. Counts are counts are counts, and scater will process them regardless of their origin.

Is it sensible? Well, I guess so; most clustering algorithms don't care where the counts came from. However, a lot of the subtleties with scater (e.g., QC on single cells, normalization) are not relevant for bulk data. You might as well just run cpm with log=TRUE (from edgeR) and feed that directly into the clustering algorithms.
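
To make the log-CPM suggestion concrete, here is a rough sketch of what a log2 counts-per-million transform with a prior count does. It is plain Python for illustration only, not edgeR's code. The way the prior count is scaled by library size is an assumption modelled on my understanding of cpm with log=TRUE and has not been checked against edgeR output. The function name and the default prior of 2 are likewise assumptions.

```python
import math


def log_cpm(counts, prior_count=2.0):
    """Illustrative log2 counts-per-million (hypothetical helper, not edgeR).

    counts: list of gene rows, each row a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Library size = total count in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if any(lib == 0 for lib in lib_sizes):
        raise ValueError("every sample needs at least one count")
    mean_lib = sum(lib_sizes) / n_samples
    # Assumed behaviour: the prior count is scaled in proportion to library size,
    # so deeper samples are not shrunk less than shallow ones.
    scaled_prior = [prior_count * lib / mean_lib for lib in lib_sizes]
    adjusted_lib = [lib + 2.0 * p for lib, p in zip(lib_sizes, scaled_prior)]
    return [
        [math.log2((row[j] + scaled_prior[j]) / adjusted_lib[j] * 1e6)
         for j in range(n_samples)]
        for row in counts
    ]


# Toy data: two genes, two samples, the second sample at twice the depth.
toy_log_cpm = log_cpm([[10, 20], [90, 180]])
```

The resulting genes-by-samples matrix of log2 values is the kind of input a generic clustering algorithm would take. The prior count keeps zero counts finite after the log.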

Vladimir Kiselev (Sanger Institute, Cambridge, UK) wrote, 8 months ago:

Hi Assa,

Regarding scater, Davis and Aaron have replied extensively above.

Regarding SC3, I will second Davis - using SC3 in general should be OK. But keep in mind that we optimised the range of eigenvectors used for clustering (4%-7% of N, where N is the number of cells, see paper for details - http://biorxiv.org/content/early/2016/09/02/036558 ) specifically for scRNA-seq data. For bulk data this range may no longer be optimal, but it should still be OK, because we cut all the noisy eigenvectors anyway.

You can change the range of eigenvectors using the d_region_min and d_region_max parameters.
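
As a quick illustration of what the 4%-7% range means for a given N, the arithmetic can be sketched in plain Python. This is not SC3 code. The function name is hypothetical, and rounding the lower bound down and the upper bound up is an assumption about how the fractions are turned into whole eigenvector counts.

```python
import math
from fractions import Fraction


def eigenvector_range(n_samples, d_region_min=0.04, d_region_max=0.07):
    """Return (first, last) number of eigenvectors for a 4%-7% style range.

    Hypothetical helper. Fractions are parsed from their decimal text so that
    values such as 0.07 * 100 do not pick up floating-point rounding error.
    """
    lower = Fraction(str(d_region_min)) * n_samples
    upper = Fraction(str(d_region_max)) * n_samples
    # Assumed rounding: floor for the lower bound, ceiling for the upper bound.
    first = max(1, math.floor(lower))
    last = max(first, math.ceil(upper))
    return first, last


# With the ~350 samples mentioned in the question:
bulk_range = eigenvector_range(350)
```

Under these assumptions, ~350 samples gives a range of 14 to 25 eigenvectors. Passing different values for d_region_min and d_region_max shows how the range would widen or narrow.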

Cheers,

Vladimir
