Question

Comparing two RNASeq protocols using DESeq2

0

Entering edit mode

pkachroo ▴ 10

@pkachroo-11576

Last seen 4.4 years ago

Hi,

I recently performed RNASeq using two different protocols on 2 strains in triplicates. I anticipate similar results using the two protocols and wanted to compare the DE genes between two protocols, when comparing strain2 to strain1. I used DESeq2 to analyze differential expression, and for protocol-1, strain-2 versus strain-1 gave 53 DE genes and for protocol-2, strain-2 versus strain-1 gave 46 DE genes. 20 genes are DE in protocol-1 only, 33 DE in both, 13 genes DE in protocol-2 only. For the genes DE in only one protocol (see table below), i looked at the fold-changes (FC), and although fold-changes are very similar, adjusted p-values (bonferroni) are significant for one protocol and not in the other (protocol). I wonder what contributes to higher adj-pvalues and NA values for following genes in protocol-2 ?

	FC(protocol-1)	FC(protocol-2)	adj-p(protocol-1)	adj-p(protocol-2)
gene1	1.8	1.7	0.000135209	0.381779936
gene2	1.6	1.5	2.30819E-05	0.20413359
gene3	1.6	1.7	0.000446964	0.058520602
gene4	1.6	1.2	0.000199375	1
gene5	1.5	1.4	0.004449978	0.962902825
gene6	-1.8	-1.5	0.009131046	0.200042882
gene7	-1.8	-1.7	2.38294E-08	0.279792112
gene8	-1.5	-1.4	0.000163779	0.172985361
gene9	1.7	1.2	0.000674295	NA
gene10	1.7	1.5	0.028869571	0.202590227
gene11	1.5	1.2	1.01927E-19	1
gene12	-1.9	-1.4	1.41247E-10	1
gene13	1.7	-1.1	0.011046489	NA
gene14	40.0	4.7	0	NA
gene15	43.5	3.3	0	NA
gene16	6.8	7.9	6.66528E-63	NA
gene17	7.2	4.6	1.4105E-79	NA
gene18	-1.5	-1.3	0.001077303	1
gene19	-1.6	-1.2	0.006786376	NA
gene20	-1.5	-1.3	5.78916E-17	0.003348031

Priyanka

deseq2 adjusted pvalue • 2.0k views

ADD COMMENT • link updated 8.3 years ago by Michael Love 43k • written 8.3 years ago by pkachroo ▴ 10

score 0 · Answer 1 · 2016-12-29

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 12 days ago

United States

Try looking at the counts for these specific genes with plotCounts() for each protocol, and additionally making PCA plots to see the overall within-group variability for the two protocols.

ADD COMMENT • link 8.3 years ago Michael Love 43k

0

Entering edit mode

Hi Micheal,

I looked at the PCA plots for 5 strains using the two protocols and they look fairly overlapping to me (PCA plots and other files in the shared folder link below). I also looked at the normalized counts, fold change,lfcSE and it appears that for some genes a slightly higher lfcSE in one protocol lowers wald's-stat and hence the significance (e.g. gene 1 in the attached spreadsheet). Is that right ? While for other genes, e.g. genes 14, 15, 16, the fold changes seem over-shrunk and high cooks-distance for protocol-2, explaining the NA's in adjusted-pvalues. I wonder what is the reason behind such shrinkage for those genes and high cooks distance.

For files and plots click: https://drive.google.com/drive/folders/0B3ZNauEYziEHMW90LU1QU3ZWWmM?usp=sharing

Priyanka

ADD REPLY • link 8.2 years ago pkachroo ▴ 10

0

Entering edit mode

The easiest way for me to follow is if you could show the plotCounts() plot for the specific genes which you suspect something is different across protocol but it should not be.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Below are the plotCounts for gene 1 and gene 14 using the two protocols. Both genes are differentially expressed (fold change 1.5 or more and adj-pval<0.05) between strain2 (S2) versus strain 1(S1) in protocol 1 but not in protocol2.

ADD REPLY • link 8.2 years ago pkachroo ▴ 10

0

Entering edit mode

Thank you, this is helpful in seeing what might be going on. Can you re-run on both protocols with DESeq(dds, betaPrior=FALSE) and results(dds, cooksCutoff=FALSE)? The first suggestion, betaPrior=FALSE for the inference, is going to be default in the next version of DESeq2 anyway, with LFC shrinkage split into a second step for visualization. Gene 1 might still not pass, because the observed difference is within the range of the within-group dispersion and the number of replicates is only 3 (giving a small p-value but not small enough for multiple test correction), but it should make the results more concordant for the other genes.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

After re-running with DESeq(dds, betaPrior=FALSE) and results(dds, cooksCutoff=FALSE), I looked at the DE genes and this time both protocol called more DE genes, genes for protocol2 with adjpavlue as NA (e.g. gene 14) were now differentially expressed, but gene1, as you thought, did not pass the threshold. Overall, the overlap between the two protocols (common DE genes) for the strains I looked at increased from 50 to 55%.

ADD REPLY • link 8.2 years ago pkachroo ▴ 10

0

Entering edit mode

That makes more sense. You can expect variation in the DE sets which reflects the natural variation of performing experiments (and only having 3 vs 3 samples per protocol).

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Thanks Micheal. I made a PCA plot of expression data from 5 strains done using both protocols and although each protocol retains the relationship between strains, expression data generated using two protocol appears distinct (PC1). I wonder what is contributing to this ? The spread of the counts, using boxplots before and after normalization look similar.

ADD REPLY • link 8.2 years ago pkachroo ▴ 10

0

Entering edit mode

I don't know what the source of the difference is. It could be a shift in counts for enough genes that it contributes such a percent of variance when performing PCA on the top 500 genes by variance (see ?plotPCA).

ADD REPLY • link 8.2 years ago Michael Love 43k