Question

size factor not 1


mankadeep2 ▴ 40


Hi all, I calculated size factors for RNA-seq data using estimateSizeFactors(). They were:

           1a        1c        2b        2c        1b        2c        3c
    1.7260367 1.8360566 0.7666819 0.7158603 1.3751402 0.7572188 0.5521436

As you can see, the size factors vary a lot between samples, which I take to imply outliers. What should I do in this case? Should I proceed with the DE analysis, or is there a way to make the data look better?

Thanks in advance

deseq2 • 630 views

updated 4.5 years ago by swbarnes2 ★ 1.4k • written 4.5 years ago by mankadeep2 ▴ 40

score 1 · Answer 1 · 2020-06-09


Michael Love 43k


This just means the samples were not sequenced to the same depth, which is a normal part of a DESeq2 analysis. Ideally, the variation in sequencing depth should not be correlated with the covariate of interest; when it is, that is referred to as technical confounding (see another recent post). "Ideally" here means you should try to avoid this when you are sequencing the samples.
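To make the link between sequencing depth and size factors concrete, here is a minimal sketch of the median-of-ratios idea behind size factor estimation. It is plain Python written for illustration, not DESeq2's code: the function name `size_factors` and the toy counts are made up, and it assumes genes containing a zero count are simply dropped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count table.

    counts: list of rows, one per gene, each a list of per-sample counts.
    (Hypothetical helper for illustration only.)
    """
    n_samples = len(counts[0])
    # Keep genes with no zero counts so the geometric mean is positive.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (made up): sample B is sample A sequenced at twice the depth.
toy = [
    [10, 20],
    [100, 200],
    [50, 100],
    [7, 14],
]
sf = size_factors(toy)
print([round(f, 3) for f in sf])  # prints [0.707, 1.414]
```

In this toy case the two samples differ only in depth, and the size factors differ by exactly that factor of two. Size factors away from 1 are therefore expected whenever depth differs between samples.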

4.5 years ago • Michael Love 43k


So, if I inferred correctly, I can proceed with the data. However, for future sequencing experiments, I should try to avoid confounding factors. Just a follow-up question: Would doing sva help?

4.5 years ago • mankadeep2 ▴ 40


I didn't say there was confounding, only that it would be a concern if there were.
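One rough way to see whether depth tracks the grouping is to compare size factors across groups. The sketch below uses the values posted in the question. The grouping is an assumption: the thread never states the design, so treating the leading digit of each label as the condition of interest is purely hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Size factors as posted in the question (the label "2c" appears twice there).
posted = [
    ("1a", 1.7260367), ("1c", 1.8360566), ("2b", 0.7666819),
    ("2c", 0.7158603), ("1b", 1.3751402), ("2c", 0.7572188),
    ("3c", 0.5521436),
]

# ASSUMPTION: the leading digit of each label is the condition of interest.
# The original post does not say this; swap in the real design if it differs.
by_group = defaultdict(list)
for label, sf in posted:
    by_group[label[0]].append(sf)

# Mean size factor per (hypothetical) group.
group_means = {g: mean(v) for g, v in sorted(by_group.items())}
for g, m in group_means.items():
    print(f"group {g}: n={len(by_group[g])}, mean size factor={m:.3f}")
```

If the group means differ markedly under the real design, sequencing depth is correlated with the covariate of interest, which is the situation described above as technical confounding. If the leading digit is not the condition of interest, this comparison says nothing about confounding.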

4.5 years ago • Michael Love 43k

score 1 · Answer 2 · 2020-06-09

The people loading the samples on the instrument should already be aiming for equal numbers of reads from each sample. Sometimes the nature of the samples prevents that: if, say, one sample undergoes a highly toxic treatment, it may not yield much RNA. So there can be unavoidable confounding with library size.

You might be able to ask for reruns of the samples that have fewer reads, but a three-fold difference is okay and you can proceed with it. If a sample had a tenth of the average number of reads, that would be a problem.
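The two rules of thumb above can be checked against the posted values with a few lines of arithmetic. This sketch assumes size factors are an acceptable stand-in for relative read depth; the one-tenth threshold comes from the answer above and is not a DESeq2 setting.

```python
from statistics import mean

# Size factors as posted in the question.
size_factors = [1.7260367, 1.8360566, 0.7666819, 0.7158603,
                1.3751402, 0.7572188, 0.5521436]

# Spread between the deepest and shallowest sample.
fold_range = max(size_factors) / min(size_factors)

# Samples below a tenth of the average (threshold taken from the answer above).
avg = mean(size_factors)
very_low = [sf for sf in size_factors if sf < 0.1 * avg]

print(f"max/min fold difference: {fold_range:.2f}")
print(f"samples below a tenth of the average: {len(very_low)}")
```

For these values the spread is a little over three-fold and no sample falls below a tenth of the average, which matches the advice to proceed.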