Question: newSCESet and SingleCellExperiment basic differences and TMM Normalization
7 days ago, hamza_karakurt wrote:

Hello everyone,

I am trying to replicate a single-cell RNA-seq data analysis, and the main problem is package versions. The old results were generated with newSCESet and normalized with the normaliseExprs command (TMM method), but these are not available in the newest versions of R, Bioconductor and the packages, so I used the SingleCellExperiment command instead.

However, I still do not understand the differences between the outputs of newSCESet and SingleCellExperiment. I have looked but could not find any information. The main change I found is that pData and fData were converted to colData and rowData.

The objects look the same, but in the end the results look different.

I also want to do TMM normalization, but the normalizeExprs command is also deprecated. Is there another way to do TMM normalization?

 

Thank you.

modified 6 days ago • written 7 days ago by hamza_karakurt
7 days ago, Aaron Lun (Cambridge, United Kingdom) wrote:

There are a number of aspects of your post that need addressing, so let's do it one at a time.

The first is the switch from SCESet to SingleCellExperiment. This happened a while ago, motivated by the superiority of the SummarizedExperiment class as a general data container in terms of stability, flexibility and usability. From a user perspective, this simply involves changing the constructor call (from newSCESet() to SingleCellExperiment()), and the various accessors (e.g., fData() to rowData(), pData() to colData()). Not particularly hard, and it also allows you to interface with any SummarizedExperiment-compatible packages, e.g., iSEE, DESeq2.
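To make the constructor and accessor changes concrete, here is a minimal sketch on toy data. The object names (counts, cell.info, gene.info) and the simulated values are made up for illustration, and the old-style call appears only as a comment, since newSCESet() is no longer available.

```r
# Sketch only: toy data; `counts`, `cell.info` and `gene.info` are hypothetical names.
library(SingleCellExperiment)

set.seed(1)
counts <- matrix(rpois(200, lambda = 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
cell.info <- DataFrame(batch = rep(c("A", "B"), each = 5))  # per-cell metadata
gene.info <- DataFrame(symbol = rownames(counts))           # per-gene metadata

# Old style (no longer available), for comparison only:
#   sce <- newSCESet(countData = counts, ...)
#   pData(sce); fData(sce)

# Current style: same information, different constructor and accessors.
sce <- SingleCellExperiment(
    assays  = list(counts = counts),
    colData = cell.info,
    rowData = gene.info
)

colData(sce)$batch      # takes the place of pData(sce)$batch
rowData(sce)$symbol     # takes the place of fData(sce)$symbol
counts(sce)[1:3, 1:3]   # the count matrix itself is unchanged
```

The container change by itself should not alter any numbers; differences in results are more likely to come from the downstream normalization steps.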

As for TMM normalization: we've known for a while that this is a poor choice of normalization method for single-cell RNA-seq data with lots of zeroes; see https://doi.org/10.1186/s13059-016-0947-7 for a study of this. (Similar criticisms apply to DESeq's default normalization.) Thus, we no longer recommend using TMM normalization and have removed all functions that do so. I would suggest using alternatives like scran::computeSumFactors(); see the simpleSingleCell workflow for how it's done. That said, if you insist on using TMM, you can simply call edgeR::calcNormFactors() directly on your count matrix and multiply the result by the library sizes to get the "TMM size factors". The multiplication is important: calcNormFactors() alone only yields the normalization factors, and these need to be scaled by the library sizes to obtain the size factors (yes, there's a difference between these two terms!).
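The distinction between normalization factors and size factors can be sketched as follows. This is an illustration on simulated counts, not a recommendation: the matrix is a toy one, and the final centering step is an optional convention I have assumed here so that the size factors average to one.

```r
# Sketch only: simulated counts with genes in rows and cells in columns.
library(edgeR)

set.seed(1)
counts <- matrix(rnbinom(1000 * 50, mu = 20, size = 2), nrow = 1000)

lib.sizes <- colSums(counts)                             # library size per cell
norm.factors <- calcNormFactors(counts, method = "TMM")  # TMM normalization factors

# Size factor = normalization factor * library size.
tmm.size.factors <- norm.factors * lib.sizes

# Optional (assumed convention): rescale so the size factors have unit mean.
tmm.size.factors <- tmm.size.factors / mean(tmm.size.factors)
```

Using norm.factors alone as size factors would ignore differences in sequencing depth between cells, which is why the multiplication matters.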

The situation with normalizeExprs is a bit more complicated because it tries to do three things at once: TMM normalization, log-transformation and batch correction. I didn't write this function, but I hated it. It doesn't have a single purpose; it's just cobbled together from three separate functions that might as well be called separately. Separate calls would require a bit more writing, but at least the user (and reader of the code) understands what is happening. A reader seeing a call to normalizeExprs() would find it hard to figure out what the function does. If we had to use a single function, it should instead be called:

calcTMMFactorsAndNormalizeAndRemoveBatchEffects

... which we can all agree is a stupid name. I deprecated normalizeExprs() because it was better for users to be explicit about what they wanted to do and call the relevant functions directly.

modified 7 days ago • written 7 days ago by Aaron Lun

Hey Aaron,

Thank you for your answer. I will use calcNormFactors in scater. After this line, I need to multiply my SumFactors with my counts to normalize, right?

Or, after computeSumFactors, does the normalize(sce) command not do the job directly? I am working on unique barcoded single-cell RNA-seq.

 

Thanks again.

written 6 days ago by hamza_karakurt

For your first question: get your terminology right, otherwise this discussion will be very confusing. calcNormFactors is from edgeR. It returns TMM normalization factors, one per cell. Each of these needs to be multiplied by the library size of the corresponding cell to obtain the size factor. You can then save the size factors into the SingleCellExperiment object with sizeFactors(sce) <- tmm.size.factors, and run normalize to compute log-transformed normalized expression values.
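Put together, the TMM route might look like the sketch below. The data are simulated and the object names are illustrative. One assumption to flag: the thread refers to scater's normalize(), but later scater releases replaced it with logNormCounts(), which is what the sketch calls so that it runs on a recent installation.

```r
# Sketch only: simulated data; assumes a recent scater that provides logNormCounts().
library(SingleCellExperiment)
library(edgeR)
library(scater)

set.seed(1)
counts <- matrix(rnbinom(1000 * 50, mu = 20, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("cell", 1:50)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# TMM normalization factors (edgeR) times library sizes gives the size factors.
tmm.size.factors <- calcNormFactors(counts(sce), method = "TMM") * colSums(counts(sce))
sizeFactors(sce) <- tmm.size.factors

# In the scater version discussed in this thread: sce <- normalize(sce)
# Later replacement, used here:
sce <- logNormCounts(sce)

logcounts(sce)[1:3, 1:3]   # log-transformed normalized expression values
```

Note that the counts are never multiplied by the size factors by hand; the normalization function divides by them internally before the log-transformation.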

For your second question, I'm not sure what you're actually asking. Running computeSumFactors will compute the size factors and store them in the SingleCellExperiment object (assuming that the input was also an SCE object). Running normalize will then compute log-transformed normalized expression values.
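For comparison, the computeSumFactors route can be sketched the same way. Again the data are simulated (200 cells, so that the default pool sizes are usable), the names are illustrative, and logNormCounts() stands in for normalize() on the assumption of a recent scater release.

```r
# Sketch only: simulated data; assumes recent scran and scater releases.
library(SingleCellExperiment)
library(scran)
library(scater)

set.seed(1)
counts <- matrix(rnbinom(1000 * 200, mu = 20, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("cell", 1:200)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# With an SCE as input, the size factors are computed and stored in the object.
sce <- computeSumFactors(sce)
summary(sizeFactors(sce))

# In the scater version discussed in this thread: sce <- normalize(sce)
sce <- logNormCounts(sce)
```

On real data with distinct cell populations, the simpleSingleCell workflow mentioned above is the place to look for the recommended usage.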

written 6 days ago by Aaron Lun