Question

How to interpret plotBCV plots in edgeR

davide.bau • 0

@davidebau-12368

Hi all,

I have just started using edgeR and have a question about the plotBCV function.

When I plot the BCV against the average log CPM (plotBCV), I get a plot in which several distinct patterns can be recognized (see attached image). Has anyone seen this behaviour before, and how could it be explained or interpreted?

Thank you in advance for your help.

Best regards,

Davide

rnaseq edger • 10k views

ADD COMMENT • link 7.2 years ago davide.bau • 0

Here is the plotMDS image:

ADD REPLY • link 7.2 years ago davide.bau • 0

The MDS plot shows very large differences between the samples. Even samples 2 and 3 are separated by a distance of about 4 on the log2 scale, which corresponds to a leading fold change of about 2^4 = 16-fold. So no two of the samples are close together. It's impossible to say more without knowing which samples are replicates and which correspond to different treatments.
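
The conversion used above can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the plotMDS distance is a leading log2 fold change; the function name `leading_fold_change` is made up for this example and is not part of edgeR.

```python
def leading_fold_change(log2_distance: float) -> float:
    """Convert an MDS distance on the log2 scale to a fold change."""
    return 2.0 ** log2_distance


# A distance of 4 between two samples corresponds to a 16-fold change.
for distance in (1, 2, 4):
    print(f"distance {distance} -> about {leading_fold_change(distance):g}-fold")
```

Because the scale is logarithmic, each extra unit of distance doubles the fold change, so distances of 4 or more between replicates would be unusually large.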

ADD REPLY • link 7.2 years ago Gordon Smyth 50k

Each sample contains around 20-30 pluripotent cells.

ADD REPLY • link 7.2 years ago davide.bau • 0

For each sample, what proportion of the counts are zeros? How many genes do you detect (with a positive count) in each sample?
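
The two summaries asked for here are simple to compute from a raw count matrix. Below is a minimal sketch in plain Python; the matrix values and the helper name `zero_summary` are hypothetical and only serve to show the calculation.

```python
# Toy count matrix (hypothetical values): rows are genes, columns are samples.
counts = [
    [0, 3, 0, 12],
    [5, 0, 0, 7],
    [0, 0, 1, 0],
    [9, 4, 0, 2],
    [0, 0, 0, 1],
]


def zero_summary(matrix):
    """Return (proportion of zero counts, number of detected genes) per sample."""
    n_genes = len(matrix)
    summaries = []
    for column in zip(*matrix):  # zip(*matrix) iterates over samples
        detected = sum(1 for count in column if count > 0)
        summaries.append((1 - detected / n_genes, detected))
    return summaries


for i, (prop_zero, detected) in enumerate(zero_summary(counts), start=1):
    print(f"sample {i}: {prop_zero:.0%} zeros, {detected} genes detected")
```

In R, the same numbers should be obtainable with something like `colMeans(counts == 0)` and `colSums(counts > 0)` on the count matrix.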

ADD REPLY • link 7.2 years ago Gordon Smyth 50k

The samples are four biological replicates. The proportion of zero counts in each sample is:

Sample Zeros
1      73%
2      69% 
3      63% 
4      57%

The number of genes with a positive count per sample is:

Sample  Genes_detected
1       10386 
2       11685 
3       14156 
4       16219

ADD REPLY • link 7.2 years ago davide.bau • 0

Well, the RNA-seq libraries seem somewhat like single-cell libraries, although not as extreme. Perhaps the samples are just very variable.

Note that normalization is always an issue for single-cell RNA-seq, and may be for your data as well. The default use of calcNormFactors() in edgeR might not be appropriate.

ADD REPLY • link 7.2 years ago Gordon Smyth 50k

Thanks a lot Gordon!

I will try another normalization method.

ADD REPLY • link 7.2 years ago davide.bau • 0

score 1 · Answer 1 · 2017-02-15

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Scripps Research, La Jolla, CA

That is indeed a highly suspect BCV plot, and I would probably not trust the results of a differential expression test on this data until you figure out what is causing this. Can you provide the code, design matrix, and/or data that you used to generate this plot? Also, can you please show the MDS plot of the data (see ?plotMDS)?

See also the related thread: "edgeR: plotBCV, gof() and plotMDS, for outlier detection"

ADD COMMENT • link 7.2 years ago Ryan C. Thompson ★ 7.9k

Hi Ryan,

Thanks for your reply.

The samples consist of few cells, i.e. we are performing bulk RNA-seq analysis with a small sample size. Do you think this might be the cause of the high variability?

Davide

ADD REPLY • link 7.2 years ago davide.bau • 0

Not quite sure what you mean by "few cells". Do you mean that it's almost like single cell RNA-seq, with just a few cells contributing RNA for each sample?

What proportion of the counts are zeros?

ADD REPLY • link 7.2 years ago Gordon Smyth 50k

score 1 · Answer 2 · 2017-02-15

There's a serious problem either with your data or with the model you have fitted to it. The plot shows BCV values that are enormous, half of them over 200%. There are also distinct subgroups of genes. There appears to be a very strong systematic lack of fit in the data, including some special effects associated with particular subsets of genes.

I don't know anything about your data, so can't diagnose, but I would do some basic quality checking right back to an early stage. We do RNA-seq analyses all the time with small numbers of samples, so small sample size isn't the explanation.
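
For a sense of scale on the BCV values mentioned above: in edgeR the BCV is the square root of the negative binomial dispersion, and the negative binomial variance is mean + dispersion × mean². The sketch below, with illustrative function names and an arbitrary mean count of 100, compares a BCV of 200% with the values of roughly 0.1 to 0.4 that the edgeR documentation describes as typical for well-controlled experiments.

```python
import math


def dispersion_from_bcv(bcv: float) -> float:
    """BCV is the square root of the NB dispersion, so dispersion = BCV^2."""
    return bcv ** 2


def nb_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance: Poisson part plus biological part."""
    return mean + dispersion * mean ** 2


mean_count = 100  # arbitrary expected count, chosen only for illustration
for bcv in (0.1, 0.4, 2.0):
    dispersion = dispersion_from_bcv(bcv)
    sd = math.sqrt(nb_variance(mean_count, dispersion))
    print(f"BCV {bcv:.0%}: dispersion {dispersion:g}, SD about {sd:.0f} at mean {mean_count}")
```

A BCV of 200% means the standard deviation between replicates is about twice the expected count itself, which is why such values point to a problem with the data or the fitted model.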