How to interpret plotBCV plots in edgeR
davide.bau (@davidebau-12368)

Hi all,

I have just started using edgeR and I have a question about the plotBCV function.

When plotting the average log CPM against the BCV (plotBCV), I get a plot in which several patterns can be recognized (see attached image). I was wondering if anyone has ever seen such behaviour and how it could be explained or interpreted.
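The x-axis of plotBCV() is each gene's average log2 counts per million (logCPM). The hypothetical helper below is a rough Python sketch of what that axis measures. It assumes a genes × samples matrix of raw counts, raw library sizes and a fixed prior count. edgeR's own aveLogCPM() uses normalized library sizes and a model-based average, so its values will differ somewhat.

```python
import numpy as np

def approx_ave_log_cpm(counts, prior_count=0.5):
    """Approximate average log2-CPM per gene (rows = genes, columns = samples).

    Hypothetical helper for illustration only; not an edgeR function.
    Adds a fixed prior count to every count and uses raw library sizes.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample (column sums)
    # Offset the library size by 2 * prior_count so the added counts are accounted for
    log_cpm = np.log2((counts + prior_count) / (lib_size + 2 * prior_count) * 1e6)
    return log_cpm.mean(axis=1)  # average across samples for each gene
```

Genes with many zero counts sit at the far left of this axis. That is where single-cell-like data tend to produce striped or clustered BCV patterns.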

Best regards,

Davide


Here is the plotMDS image:


The MDS plot shows very big differences between the samples. Even samples 2 and 3 are separated by a distance of about 4. That corresponds to a leading log2-fold-change of about 4, i.e. a fold change of about 2^4 = 16. So no two of the samples are close together. It's impossible to say more without knowing which samples are replicates and which correspond to different treatments.
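The 16-fold figure comes from how plotMDS() defines distance. By default it takes, for each pair of samples, the 500 genes with the largest absolute log2-fold-changes between that pair. The distance is the root-mean-square of those values, and the plotted coordinates approximate these distances in two dimensions. Below is a minimal Python sketch of that pairwise distance. It assumes the log-CPM values are already computed, and the helper name is hypothetical.

```python
import numpy as np

def leading_logfc_distance(logcpm_a, logcpm_b, top=500):
    """Root-mean-square of the `top` largest absolute log2 differences.

    Hypothetical helper mimicking pairwise gene selection; not a limma/edgeR function.
    """
    diff = np.asarray(logcpm_a, dtype=float) - np.asarray(logcpm_b, dtype=float)
    top = min(top, diff.size)
    largest = np.sort(np.abs(diff))[-top:]  # top genes chosen for this pair only
    return float(np.sqrt(np.mean(largest ** 2)))

# Two samples differing by 4 log2 units in every gene
a = np.zeros(1000)
b = a + 4.0
d = leading_logfc_distance(a, b)
print(d, 2 ** d)  # 4.0 16.0: a distance of d corresponds to roughly 2**d-fold
```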


Each sample contains around 20-30 pluripotent cells.


For each sample, what proportion of the counts are zeros? How many genes do you detect (with a positive count) in each sample?
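Both diagnostics come straight from the raw count matrix. In R, colMeans(counts == 0) gives the first. The Python sketch below uses a hypothetical helper and a toy 4-gene matrix, not real data.

```python
import numpy as np

def zero_summary(counts):
    """Per-sample proportion of zero counts and number of detected genes.

    Hypothetical helper; counts is a genes x samples array of raw read counts.
    """
    counts = np.asarray(counts)
    prop_zero = (counts == 0).mean(axis=0)  # fraction of genes with a zero count
    n_detected = (counts > 0).sum(axis=0)   # number of genes with a positive count
    return prop_zero, n_detected

counts = np.array([[0, 5], [3, 0], [0, 0], [7, 1]])  # toy data: 4 genes, 2 samples
prop_zero, n_detected = zero_summary(counts)
print(prop_zero)   # [0.5 0.5]
print(n_detected)  # [2 2]
```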

The samples are four biological replicates. The proportion of zero counts and the number of genes with a positive count in each sample are:

Sample  Zero counts  Genes detected (count > 0)
1       73%          10386
2       69%          11685
3       63%          14156
4       57%          16219


Well, the RNA-seq libraries look somewhat like single-cell libraries, although not as extreme. Perhaps the samples are just very variable.

Note that normalization is always an issue for single-cell RNA-seq, and may be for your data as well. The default TMM normalization applied by calcNormFactors() in edgeR might not be appropriate.


Thanks a lot Gordon!

I will try another normalization method.

@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

That is indeed a highly suspect BCV plot, and I would probably not trust the results of a differential expression test on this data until you figure out what is causing this. Can you provide the code, design matrix, and/or data that you used to generate this plot? Also, can you please show the MDS plot of the data (see ?plotMDS)?


Hi Ryan,

The samples consist of few cells, i.e. we are performing bulk RNA-seq analysis with a small sample size. Do you think this might be the cause of the high variability?

Davide


Not quite sure what you mean by "few cells". Do you mean that it's almost like single-cell RNA-seq, with just a few cells contributing RNA to each sample?

What proportion of the counts are zeros?

@gordon-smyth
WEHI, Melbourne, Australia

There's a serious problem either with your data or with the model you have fitted to it. The plot shows BCV values that are enormous, half of them over 200%. There are also distinct subgroups of genes. There appears to be very strong systematic lack of fit in the data, including some special effects associated with particular subsets of genes.
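To see how large a BCV of 200% is, recall how edgeR parameterizes the negative binomial. The BCV is the square root of the dispersion φ, and the count variance is μ + φμ². A BCV of 2 therefore means φ = 4. The hypothetical helper below computes the coefficient of variation of a count for illustrative BCV values at a mean of 1000. These numbers are not taken from this dataset.

```python
import math

def nb_cv(mu, bcv):
    """Coefficient of variation of a negative binomial count.

    Hypothetical helper using edgeR's parameterization:
    dispersion phi = bcv**2 and variance = mu + phi * mu**2.
    """
    phi = bcv ** 2
    var = mu + phi * mu ** 2
    return math.sqrt(var) / mu

# 0.1 and 0.4: typical BCVs; 2.0: roughly the level seen in this plot
for bcv in (0.1, 0.4, 2.0):
    print(bcv, round(nb_cv(1000, bcv), 3))
```

At high counts the Poisson term becomes negligible and the CV approaches the BCV itself. A BCV of 2 therefore means the standard deviation is about twice the mean. The edgeR User's Guide quotes about 0.4 as typical for human data and about 0.1 for genetically identical model organisms.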

I don't know anything about your data, so I can't diagnose the problem, but I would do some basic quality checking going right back to an early stage. We do RNA-seq analyses all the time with small numbers of samples, so small sample size isn't the explanation.