Question

DESeq2 MA plot of encode data

1

Entering edit mode

tonja.r ▴ 80

@tonjar-7565

Last seen 7.5 years ago

United Kingdom

Hello,

I took encode data RNAseq from mouse G1E cell line and G1E treated with 24 hours of estradiol.

I had following data:
dataset A
G1E, 2 replicates, single end reads
G1E-treated, 2 replicates, single end reads

dataset B
G1E, 2 replicates,paired end reads
G1E-treated, 2 replicates, paired end reads

I did differential analysis within each dataset and for following MA plots. Number of DE genes in dataset A is 4038 and in dataset B is 2169. When I do an intersection of DE genes from both datasets I get 604 up-regulated and and 520 down-regulated and the genes correspond to the expected pathways. However, I am very concerned about the MA plots I get and if DESeq2 normalization is appropriate in this case.
dataset A:

with MLE:

PCA plot:

dataset B:

with MLE

and PCA plot

edit: I did the same analysis but with edgeR and this issue is not that striking.

dataset A:

dataset B:

I also plotted LFC of two datasets:
DESeq2:

edgeR:

Also, the number of disagreeing genes in DESeq2 and edgeR.

I am a bit concerned about such discordance by the two methods. Do you have any ideas about the reasons for that?

deseq2 normalization edger • 3.7k views

ADD COMMENT • link 8.3 years ago • updated 8.2 years ago tonja.r ▴ 80

score 1 · Accepted Answer · 2016-01-25

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 10 hours ago

United States

Can you make the plots with wider y axis and with MLE=TRUE (see help page for plotMA and addMLE argument of results). Also can you make PCA plots for these samples? It appears the treatment really had an extreme effect on these cell lines. How much estradiol was applied in each dataset?

ADD COMMENT • link 8.3 years ago Michael Love 41k

0

Entering edit mode

I edited my question by adding the plots you asked for. 24 hours differentiation of G1E-ER4 cells with 10 nM beta-estradiol.

ADD REPLY • link 8.3 years ago tonja.r ▴ 80

0

Entering edit mode

So to me this looks like the treatment had an extreme effect on the samples (see PC1 is nearly 100% of the variance separating treated from pre-treated).

To compare across dataset, I would plot log2FoldChange against log2FoldChange instead of p-values. The p-values depend on the power of the experiment, which depends on sample size (here only 2 vs 2) and on the technical and biological noise. But the effect size estimates should be relatively consistent, if both datasets are of high enough quality.

I don't see a problem with the normalization here (in the unshrunken plots, the y=0 axis goes through the middle of the cloud of points).

It's curious that the bulk of one dataset is up and the other down, are you sure you are comparing treated / pre-treated identically in both datasets?

ADD REPLY • link 8.2 years ago Michael Love 41k

0

Entering edit mode

Yes, I do compare them identically and I checked if I switched columns in one of the datasets and it is not the case. It is also a bit strange to me that the number of DE genes in dataset B is twice less then in dataset A.

I also checked the read lengths in dataset A and it turned out that in the cell line G1E (untreated) one replicate (g1e_1) was sequenced with 36bp read length and another (g1e_2) with 55bp read length. And one can see that in PCA plot. However, treated replicates were both sequenced with 36bp read lengths. Can be the reason for finding more DE genes than in dataset B?

ADD REPLY • link 8.2 years ago tonja.r ▴ 80

1

Entering edit mode

The number of genes which pass an FDR threshold is highly variable, especially with very few samples, where the effect of technical noise and difficulty in estimating parameters have a large effect. So I would not be surprised to see this number change across two datasets like this. I would be more concerned about the general shape of differential expression (that the bulk is up- in one dataset and down- in another) and I would compare LFC directly, not look at p-value sets.

ADD REPLY • link 8.2 years ago Michael Love 41k

0

Entering edit mode

I edited my question with new plots and edgeR results.

ADD REPLY • link 8.2 years ago tonja.r ▴ 80

1

Entering edit mode

It looks like you are filtering low count genes for the edgeR analysis, which are therefore cut off from the MA plot.

For a more clear comparison, why don't you also remove the same genes from the DESeq2 analysis that you remove from the edgeR analysis.

ADD REPLY • link 8.2 years ago Michael Love 41k