Question: assess quality of data
22 months ago, bilcodygm wrote:

Dear all,

Please help me. I have RNA-seq data, which I normalized with TMM and then tested with the likelihood ratio test in edgeR.

When I look at the BCV plot, I see this:

The red dots are the significant genes in the dataset.

Is this OK? Are the data points not too widely spread?

Thank you for your advice.

rnaseq edger limma-voom
modified 22 months ago by Michael Love • written 22 months ago by bilcodygm

Can you clarify what it is that is concerning you? What do you mean by "wide spread"? What did you expect the plot to look like?

Obviously the dispersion of your data is large, with the average BCV at nearly 100%. However, you still have DE genes, so the separation between your groups must also be large.
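
To put a BCV near 100% in perspective: in edgeR the BCV is the square root of the negative binomial dispersion, so a BCV of 1 corresponds to a dispersion of 1. The following is a minimal sketch in plain Python, not edgeR code. The helper names and the mean count of 100 are assumptions chosen only for illustration. It computes the coefficient of variation of the counts implied by the negative binomial variance, mu + dispersion * mu^2.

```python
import math

def nb_variance(mu, dispersion):
    """Negative binomial variance: Poisson part (mu) plus biological part."""
    return mu + dispersion * mu ** 2

def count_cv(mu, bcv):
    """Coefficient of variation of the counts for a given mean and BCV."""
    dispersion = bcv ** 2  # BCV is the square root of the dispersion
    return math.sqrt(nb_variance(mu, dispersion)) / mu

# Hypothetical mean count, used only to make the numbers concrete.
mu = 100.0
for bcv in (0.1, 0.4, 1.0):
    print(f"BCV={bcv:.1f}  dispersion={bcv ** 2:.2f}  count CV={count_cv(mu, bcv):.3f}")
```

At moderate to high counts the Poisson term is small, so the count CV is close to the BCV itself: with a BCV near 1, replicate-to-replicate variation is roughly as large as the expression level.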

We don't know anything else about your data. What do you want us to comment on?

How does this question relate to DESeq2 or limma (neither of which you have used)?

modified 22 months ago • written 22 months ago by Gordon Smyth

Thank you for your reply!

I was perhaps too quick to add the limma and DESeq2 tags. What I have done is compare several normalization methods (TMM, upper quartile, quantile and DESeq2's default method) in combination with several tests (LRT and QLF in edgeR, and voom-eBayes in limma). They all show this wide variance/dispersion range and a high BCV. So I was just wondering how to interpret this: is it simply highly variable data, probably because it is human tissue? I do indeed still have DE genes.

The adjusted versus unadjusted p-values also do not indicate strange behaviour, except for the quantile-voom-eBayes method. There, the significant p-values seem to cave in more drastically than with the other methods, but since all the other methods show significant p-values plus a plateau of non-significant p-values, I would argue that this is not so relevant. In addition, the quantile-voom-eBayes method was by far the most conservative: while the other methods yielded 200-388 significant genes, it returned only 50. Is this a known property of this method?
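
For readers comparing adjusted and unadjusted p-values across methods, here is a minimal sketch of the adjustment itself. It assumes the adjusted values are Benjamini-Hochberg FDR values, which is the default in edgeR's topTags and limma's topTable but is not stated in the post. The function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(p < 0.05 for p in adj)
print(f"significant at 0.05: raw={n_raw}, BH-adjusted={n_adj}")
```

Because each adjusted value depends on the whole p-value distribution, a method that produces slightly larger raw p-values for the top genes can end up with a much shorter list after adjustment.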

On the whole, I have a feeling that the dataset is simply highly variable, with high dispersion. The differentially expressed genes will now have to be validated somehow.

written 22 months ago by bilcodygm