How to identify real cells in 10X RNA-seq?
xingxd16 ▴ 20

Dear all,

  • I use 10X for single-cell RNA-seq, but I found that the cell-calling algorithm in cellranger does not fit my samples. I compared cellranger 2.0 and cellranger 3.0 on the same sample and found that the cell numbers are quite different. I expect about 6000 cells; cellranger 2.0 calls 3442 cells, while cellranger 3.0 calls 10275 cells. I think version 2.0 gives too few and version 3.0 too many, so I want to use the DropletUtils package to distinguish real cells from the background of cell fragments. I found two functions for this, "barcodeRanks" and "emptyDrops".

  • For "barcodeRanks", 2082 cells are above knee.point and 4085 cells are above inflection.point, while the "emptyDrops" function gives me 8605 cells. In this case, I think the inflection.point is more reasonable.

  • My questions are:

  • (1) Can I rely only on a total UMI threshold at inflection.point to call real cells?

  • (2) Why do the methods vary so much in the number of cells they call?

  • (3) I find that "emptyDrops" is similar to cellranger 3.0 in that it tends to give more cells in most cases. So in which cases should I choose "emptyDrops"?

  • (4) What is the difference between knee.point and inflection.point? Are both of these points OK for calling cells? I find that in most cases knee.point gives too few cells and inflection.point is better!

  • (5) I have 8 samples and I want a uniform method to call cells. For 6 of the samples I think inflection.point is reasonable and suitable, but for one sample I think knee.point is better, and for another I think "emptyDrops" is better. What should I do in this situation? Can I use different methods in the DropletUtils package to call cells, depending on the specific situation of each sample? Is that allowed and accepted?

Best
single-cell 10X Cell calling DropletUtils cellranger • 7.5k views

Suggest putting some paragraph breaks in your post.


Sorry, this is my first time using it. I am not familiar with the format!


Or you can just edit your original post, you know.


BTW, "DropletUtils" has a very user-friendly tutorial here.

Aaron Lun ★ 28k

cellranger 2.0 calls 3442 cells, while cellranger 3.0 calls 10275 cells.

This is a result of a change to the cell calling method in version 3, see the "Calling cell barcodes" section here.

I think version 2.0 gives too few and version 3.0 too many

Why? Based on your expectation of 6000 cells? Expectations are not always correct, especially when you're dealing with a complex population that is difficult to quantify.

In this case, I think the inflection.point is more reasonable.

I see no justification for this position. It seems like you're just trying to force the analysis to give you the number of cells that you want, and you're ignoring what the data is actually showing you.

(1) Can I rely only on a total UMI threshold at inflection.point to call real cells?

No, this is less sensitive and probably less specific than emptyDrops. I don't know why you would do this.

(2) Why do the methods vary so much in the number of cells they call?

Because they operate on different principles. I suggest you read the emptyDrops paper. In particular, calling cells based on total UMI counts can result in entire cell type populations being discarded.

(3) I find that "emptyDrops" is similar to cellranger 3.0 in that it tends to give more cells in most cases. So in which cases should I choose "emptyDrops"?

That's because the cell calling in CellRanger version 3.0 is based on the emptyDrops algorithm (see the link above). So, obviously, they're going to be similar. They are not exactly the same because (i) of various technical reasons related to the implementation, and (ii) they use a different UMI count threshold for defining high-confidence cells. I don't necessarily agree with their changes in (ii), but whatever, it's not my code or problem.

(4) What is the difference between knee.point and inflection.point? Are both of these points OK for calling cells? I find that in most cases knee.point gives too few cells and inflection.point is better!

No.

Can I use different methods in the DropletUtils package to call cells, depending on the specific situation of each sample?

Just use emptyDrops for all samples and stop worrying about it.


Thanks for the reply!

  • Agreed, "emptyDrops" is more reasonable in theory. But for one of my samples, I loaded about 12000 cells into the 10X machine, and the capture efficiency is about 57% according to the official 10X documentation, so I expect 6000-7000 cells. But emptyDrops or cellranger 3.0 gives me about 15000 cells. How can it be true? I think it is impossible to get more cells than I put in! If I use emptyDrops for all samples, the cell numbers imply that the capture efficiency for all my samples is near 100%.
  • And I find that the cells called by emptyDrops but not by inflection.point all have high expression of mitochondrial genes, HBB genes and IGHG plasma cell genes. I may not be interested in these cells...
  • Why are knee.point and inflection.point not OK? cellranger 2.0 just uses a hard threshold, the 99% quantile divided by 10, to identify real cells.

How can it be true? I think it is impossible to get more cells than I put in!

The most obvious explanation is that the sample contains more cells than you think. I'm not sure how you're counting your cells, but viability markers may underestimate the cell count if you have damaged cells. Now, you might say that you don't care about damaged cells, but in some populations (e.g., neurons) the majority of cells are damaged after dissociation, so ignoring them would be throwing out the baby with the bathwater.

If you are absolutely confident in your cell quantification, then the discrepancy warrants further investigation, rather than sweeping it under the carpet and pretending that 10X gave you the expected number of cells.

And I find that the cells called by emptyDrops but not by inflection.point all have high expression of mitochondrial genes, HBB genes and IGHG plasma cell genes. I may not be interested in these cells...

But they are still cells. Whether or not you are interested in them is irrelevant to the cell calling. If you don't want them, it is better to explicitly filter them out (e.g., based on a threshold on the mitochondrial proportion) rather than pretending that they didn't exist in the first place.

And besides, some of these sound interesting. High expression of hemoglobins corresponds to erythrocytes or their precursors, while high IGHG corresponds to mature B cells. Are you sure you want to throw these out? I would only filter on the mitochondrial content to remove damaged cells, as discussed in the emptyDrops paper and here.
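
To make the "explicitly filter them out" step concrete, here is a minimal sketch of filtering already-called cells on their mitochondrial proportion. It is plain Python on invented toy counts, not DropletUtils code. The barcodes, the `MT-` prefix convention and the 0.2 cutoff are all assumptions for illustration; in practice the cutoff would be chosen from the data.

```python
# Toy per-cell UMI counts; barcodes, genes and numbers are invented for illustration.
cells = {
    "AAAC": {"HBB": 95, "MT-CO1": 5},
    "AAAG": {"ACTB": 40, "MT-CO1": 60},
    "AAAT": {"IGHG1": 80, "ACTB": 10, "MT-ND1": 10},
}

# Assumption: mitochondrial gene symbols start with "MT-" (human naming convention).
MITO_PREFIX = "MT-"


def mito_fraction(gene_counts):
    """Return the fraction of a cell's UMIs assigned to mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(MITO_PREFIX))
    return mito / total


def filter_cells(cells, max_mito_fraction):
    """Keep cells whose mitochondrial fraction does not exceed the cutoff."""
    return {
        barcode: counts
        for barcode, counts in cells.items()
        if mito_fraction(counts) <= max_mito_fraction
    }


# Hypothetical cutoff of 20%; this is a quality-control step applied AFTER cell calling.
kept = filter_cells(cells, max_mito_fraction=0.2)
print(sorted(kept))
```

Note that the hemoglobin-rich and IGHG-rich barcodes survive this filter; only the barcode dominated by mitochondrial reads is dropped.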

Why the knee.point and inflection.point is not Ok ? In cellranger 2.0 , it just use a hard threshold by 99% quantily divided by 10 to identify real cells.

The total count provides little information on whether a given droplet contains a cell. When you're talking about the knee and inflection point, you're actually looking at the shape of the distribution of total counts - in particular, the separation between the high count libraries that correspond to cells, and the low count libraries that correspond to empty droplets. This separation can be poor in complex populations with many cell types of differing RNA content, which makes it impossible to choose a threshold that retains all cell types while also removing empty droplets. The knee and inflection points represent the undesirable compromises that have to be made if you only use the total count for cell calling - the former is too conservative, while the latter often pulls in too many empty droplets.
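
A toy illustration of that compromise, as a sketch only: the droplet totals below are invented, and the two cutoffs merely stand in for a conservative (knee-like) and a lenient (inflection-like) threshold. They are not computed by barcodeRanks.

```python
# Invented droplets: (total UMI count, true identity).
droplets = [
    (5000, "large cell"), (4200, "large cell"), (6100, "large cell"),
    (300, "small cell"), (350, "small cell"), (280, "small cell"),
    (20, "empty"), (40, "empty"), (320, "empty"), (390, "empty"), (15, "empty"),
]


def apply_threshold(droplets, threshold):
    """Call every droplet with total > threshold a cell.

    Returns (number of real cells kept, number of empty droplets kept).
    """
    kept = [kind for total, kind in droplets if total > threshold]
    cells_kept = sum(1 for kind in kept if kind != "empty")
    empties_kept = sum(1 for kind in kept if kind == "empty")
    return cells_kept, empties_kept


# Hypothetical conservative cutoff: no empties, but the small cell type is lost.
strict = apply_threshold(droplets, 1000)
# Hypothetical lenient cutoff: all cells kept, but empties with ambient RNA come along.
lenient = apply_threshold(droplets, 250)
print(strict, lenient)
```

Because the small cells and the fuller empty droplets overlap in total count, no single cutoff in this toy data keeps all six cells and rejects all empties.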

As for CellRanger version 2 - if you can justify those parameters, then you deserve a reward. But you probably can't, and neither could the 10X people, so that's why they switched to emptyDrops.
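
For reference, the version 2 rule mentioned above (99% quantile divided by 10) can be sketched as follows. This is not 10X code: taking the quantile over the top N barcodes, with N the expected number of cells, is my understanding of their description, and the nearest-rank percentile and the toy totals are assumptions.

```python
import math


def cellranger2_style_threshold(totals, expected_cells):
    """Sketch of the hard cutoff: 99th percentile of the top N totals, divided by 10.

    Assumptions: N is the user-supplied expected cell number, and the percentile
    is taken with the nearest-rank method.
    """
    top = sorted(totals, reverse=True)[:expected_cells]
    ascending = sorted(top)
    rank = math.ceil(0.99 * len(ascending))  # nearest-rank percentile, 1-based
    return ascending[rank - 1] / 10


# Invented total UMI counts per barcode.
totals = [12000, 9000, 8000, 7000, 6500, 900, 700, 60, 40, 10]

threshold = cellranger2_style_threshold(totals, expected_cells=5)
called = [t for t in totals if t > threshold]
print(threshold, len(called))
```

In this toy input the barcodes with 900 and 700 UMIs are rejected purely because of their totals, whatever they actually contain, and both the "99%" and the "divide by 10" are fixed constants rather than something estimated from the data.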


Wow, now it is clear to me after reading your paper and the new tutorial you provided. Thanks a lot, Aaron.

  • Now I know that cell calling and quality control are totally different steps. I always wanted the cell calling step alone to give me the cells I expected. Yes, I agree with you, you are right!
  • I noticed that in the new tutorial you changed the default FDR threshold from 1% to 0.1%, and I find that it only affects the cells flagged as "Limited". I will use 0.1% in my research too; I think it is more reasonable!
  • BTW, I have read some papers that used a hard threshold such as the knee point or RNA content within 10-fold. Now I think that in many cases they miss not only many cells but also whole cell types. So the "emptyDrops" method deserves to be recommended to more users.

Best
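
The effect of moving the FDR threshold from 1% to 0.1% can be sketched with a Benjamini-Hochberg adjustment in plain Python. The p-values are invented, and this is only an illustration of how an FDR cutoff is applied, not a reimplementation of emptyDrops, which derives its p-values from Monte Carlo simulations.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-barcode p-values for the null hypothesis "this droplet is empty".
pvalues = [0.02, 0.0001, 0.5, 0.003, 0.0003]
fdr = benjamini_hochberg(pvalues)

cells_at_1_percent = sum(1 for q in fdr if q <= 0.01)
cells_at_01_percent = sum(1 for q in fdr if q <= 0.001)
print(cells_at_1_percent, cells_at_01_percent)
```

With these toy values the stricter threshold drops one barcode that the 1% threshold would have called as a cell.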


I see that DropletUtils has moved from 1.2.2 to 1.3.10 very quickly. What are the new features in the new version? I found that the barcodeRanks results can no longer be accessed with "$" alone; there is now an attribute called "metadata". What else is new? I have already processed my samples with version 1.2.2; do I have to update the package? Can the results of emptyDrops be reproduced?

Thanks


If you want to ask a new question, use the "Ask Question" button at the top.


Hi, I posted a new question about limma here. Could you help answer it and give some advice?

Best Regards


That question has nothing to do with this question. Don't add irrelevant comments.
