How to interpret plotBCV plots in edgeR
davide.bau • 0
@davidebau-12368
Last seen 4.4 years ago

Hi all,

I just started to use edgeR and I have a question related to the plotBCV function.

When plotting the average log CPM against the BCV (plotBCV), I get a plot in which several distinct patterns can be recognized (see attached image). Has anyone seen such behaviour before, and how could it be explained or interpreted?

Thank you in advance for your help.

Best regards,

Davide

 


 

 

rnaseq edger • 5.5k views

Here is the plotMDS image:


The MDS plot shows very big differences between the samples. Even samples 2 and 3 are separated by a distance of about 4 on the log2 scale, which corresponds to a leading fold change of about 16-fold. So no two of the samples are close together. It's impossible to say more without knowing which samples are replicates and which correspond to different treatments.
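
The arithmetic behind that figure can be sketched in a couple of lines of base R. This assumes, as is the case for edgeR's plotMDS, that distances on the plot are leading log2 fold changes; the variable names are illustrative only.

```r
# Distance between two samples as read off the MDS plot (log2 scale)
mds_distance <- 4

# Convert the log2 distance back to a fold change on the original scale
leading_fold_change <- 2^mds_distance
print(leading_fold_change)
```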


Each sample contains around 20-30 pluripotent cells.


For each sample, what proportion of the counts are zeros? How many genes do you detect (with a positive count) in each sample?
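
Both quantities can be computed directly from the count matrix. Below is a minimal sketch in base R; the toy matrix `counts` (genes in rows, samples in columns) and its values are made up for illustration and would be replaced by the real data.

```r
# Hypothetical toy count matrix: 5 genes (rows) x 4 samples (columns)
counts <- matrix(
  c(0, 0, 5, 0,
    3, 0, 0, 1,
    0, 2, 0, 0,
    7, 0, 4, 9,
    0, 0, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("S", 1:4))
)

# Proportion of zero counts in each sample
prop_zero <- colMeans(counts == 0)

# Number of genes detected (positive count) in each sample
n_detected <- colSums(counts > 0)

print(data.frame(prop_zero, n_detected))
```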

The samples correspond to four biological replicates. The proportion of zero counts in each sample is:

Sample  Zeros
1       73%
2       69%
3       63%
4       57%

The number of genes with a positive count in each sample is:

Sample  Pos_Counts
1       10386
2       11685
3       14156
4       16219

 


Well, the RNA-seq libraries seem somewhat like single cell libraries, although not as extreme. Perhaps the samples are just very variable.

Note that normalization is always an issue for single cell RNA-seq, and may be for your data as well. Default use of calcNormFactors() in edgeR might not be appropriate.


Thanks a lot Gordon!

I will try another normalization method.

@ryan-c-thompson-5618
Last seen 13 months ago
Scripps Research, La Jolla, CA

That is indeed a highly suspect BCV plot, and I would probably not trust the results of a differential expression test on this data until you figure out what is causing this. Can you provide the code, design matrix, and/or data that you used to generate this plot? Also, can you please show the MDS plot of the data (see ?plotMDS)?
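
For reference, a minimal sketch of the kind of edgeR code that produces these two plots. The original poster's actual code is not shown in the thread, so everything here is an assumption: the counts are simulated, and the intercept-only design simply treats all four samples as replicates of one group.

```r
library(edgeR)

# Simulated stand-in data (assumption): 1000 genes x 4 samples
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 20, size = 5), ncol = 4)
colnames(counts) <- paste0("S", 1:4)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)            # default TMM normalization

# Intercept-only design: all samples treated as replicates (assumption)
design <- matrix(1, nrow = 4, ncol = 1)
y <- estimateDisp(y, design)

plotMDS(y)   # distances between samples
plotBCV(y)   # BCV against average log CPM
```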

See also: edgeR: plotBCV, gof() and plotMDS, for outlier detection


Hi Ryan,

Thanks for your reply. 

The samples consist of few cells, i.e. we are performing bulk RNA-seq analysis with a small sample size. Do you think this might be the cause of the high variability?

Davide 

 

 


Not quite sure what you mean by "few cells". Do you mean that it's almost like single cell RNA-seq, with just a few cells contributing RNA for each sample?

What proportion of the counts are zeros?

@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia

There's a serious problem either with your data or with the model you have fitted to it. The plot shows BCV values that are enormous, with half of them over 200%. There are also subgroups of genes. There appears to be very strong systematic lack of fit in the data, including some special effects associated with particular subsets of genes.
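
To put 200% in context: in edgeR the BCV is the square root of the negative binomial dispersion, so a BCV can be converted to a dispersion by squaring it. The sketch below is base R arithmetic only; the comparison value of 0.4 is an illustrative figure of the kind the edgeR User's Guide quotes for human data, not something measured from this dataset.

```r
# BCV is the square root of the negative binomial dispersion
bcv <- 2.0                 # 200%, as read off the plot
dispersion <- bcv^2
print(dispersion)

# Illustrative comparison value (assumption): a BCV of 40%
bcv_typical <- 0.4
dispersion_typical <- bcv_typical^2
print(dispersion_typical)
```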

I don't know anything about your data, so can't diagnose, but I would do some basic quality checking right back to an early stage. We do RNA-seq analyses all the time with small numbers of samples, so small sample size isn't the explanation.
