Question: scRNA-seq data analysis using scater
16 months ago, hrishi27n0 wrote:

Hello All,

I am trying to analyze data from a single-cell RNA sequencing experiment, and I am considering the scater package for QC and normalization. There are a few things I would like to know before I start analyzing this dataset. All your help and suggestions are much appreciated. This is my first attempt at analyzing scRNA-seq data, so I apologize in advance if my questions are confusing.

Questions:

1) The sequencing lab is using an unpublished protocol, and they have provided the read counts and the spike-in counts in separate files. Do I need to combine these two files? I am asking because I want to use the "feature_controls" option of the calculateQCMetrics method.

2) After the initial QC, I see that the total count for almost all of my wells (cells) is above 50k, and the number of genes detected is above 10k. I am removing genes with zero expression, and I am also filtering out genes with very low average expression across all cells (using the mean count across all cells). Do I need to do any other filtering for cells or genes?
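The two gene filters described above (drop genes with zero counts everywhere, then drop genes with a low mean count) can be sketched as follows. This is a language-neutral illustration in plain Python, not scater code; the gene names, counts and the `min_mean` cutoff are hypothetical, and a sensible cutoff depends on your data.

```python
from statistics import mean


def filter_genes(counts, min_mean=1.0):
    """Keep genes that are detected somewhere and have a mean count >= min_mean.

    counts maps a gene name to its per-cell counts (one list entry per cell).
    min_mean is a hypothetical cutoff chosen only for illustration.
    """
    kept = {}
    for gene, row in counts.items():
        if all(c == 0 for c in row):
            continue  # never detected in any cell
        if mean(row) < min_mean:
            continue  # too lowly expressed on average across all cells
        kept[gene] = row
    return kept


# Hypothetical toy matrix: 3 genes x 4 cells
counts = {
    "GeneA": [0, 0, 0, 0],
    "GeneB": [0, 1, 0, 1],
    "GeneC": [5, 12, 8, 20],
}
filtered = filter_genes(counts, min_mean=1.0)
print(sorted(filtered))  # ['GeneC']
```

With any positive `min_mean`, the all-zero check is technically redundant (a zero row has mean 0), but keeping it separate makes the two filtering steps explicit.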

16 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

For your first question: does the "read counts" file already contain counts for the spike-in transcripts? If so, that's all you need to make an SCESet object in scater; just supply the count matrix as countData= in the constructor. Otherwise, you'll first have to rbind the matrix of gene counts with that of the spike-in counts. Note that you don't need to know the concentrations of the spike-ins to use most scater functions.
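The rbind step amounts to stacking the spike-in rows under the gene rows, with one caveat: rbind() on two matrices pairs columns by position, so both tables must list the cells in the same order. A minimal plain-Python sketch of that logic, assuming each table is a mapping from feature name to per-cell counts plus a list of cell names (the function name `rbind_counts` and all example data are hypothetical):

```python
def rbind_counts(gene_rows, gene_cells, spike_rows, spike_cells):
    """Stack spike-in rows under gene rows, analogous to R's rbind() on matrices.

    gene_rows / spike_rows map a feature name to its per-cell counts;
    gene_cells / spike_cells give the cell (column) order of each table.
    Returns the combined table and a boolean mask flagging spike-in rows.
    """
    if gene_cells != spike_cells:
        # Columns are paired by position, so reorder spike-ins to match genes
        index = {cell: i for i, cell in enumerate(spike_cells)}
        if set(index) != set(gene_cells):
            raise ValueError("gene and spike-in tables describe different cells")
        spike_rows = {
            name: [row[index[cell]] for cell in gene_cells]
            for name, row in spike_rows.items()
        }
    overlap = set(gene_rows) & set(spike_rows)
    if overlap:
        raise ValueError(f"feature names in both tables: {sorted(overlap)}")
    combined = dict(gene_rows)  # gene rows first ...
    combined.update(spike_rows)  # ... then spike-in rows, as rbind would
    is_spike = [name in spike_rows for name in combined]
    return combined, is_spike


genes = {"GeneA": [3, 0, 7], "GeneB": [10, 4, 2]}
spikes = {"ERCC-00002": [50, 41, 60], "ERCC-00130": [5, 9, 4]}
combined, is_spike = rbind_counts(genes, ["c1", "c2", "c3"],
                                  spikes, ["c2", "c1", "c3"])
print(combined["ERCC-00002"])  # [41, 50, 60]
print(is_spike)                # [False, False, True, True]
```

The `is_spike` mask is the kind of row indicator you would use to mark the spike-ins as feature controls; check the calculateQCMetrics documentation for your scater version for the exact form it expects.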

As for your second question, have a look at https://www.bioconductor.org/help/workflows/simpleSingleCell/.

Aaron,

Thanks for the reply. My "read count" file doesn't include the ERCCs: a grep for "^ERCC-" returns nothing. The ERCCs are provided in a separate file instead, so I guess I will have to rbind the spike-in data onto the gene count matrix.
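The "^ERCC-" check is worth repeating on both files before combining them, to confirm the spike-ins live only in the second file and will not be counted twice. A small plain-Python sketch using the standard `re` module; the row names below are made up for illustration:

```python
import re

ERCC_PATTERN = re.compile(r"^ERCC-")


def find_spike_ins(feature_names):
    """Return the feature names matching '^ERCC-', like the grep above."""
    return [name for name in feature_names if ERCC_PATTERN.match(name)]


# Hypothetical row names from the two files
gene_file_rows = ["GeneA", "GeneB", "GeneC"]
spike_file_rows = ["ERCC-00002", "ERCC-00130"]

print(find_spike_ins(gene_file_rows))   # []
print(find_spike_ins(spike_file_rows))  # ['ERCC-00002', 'ERCC-00130']
```

If the gene file returned any matches, those rows would need to be dropped (or the spike-in file skipped) before the rbind.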

