Question

EdgeR: Replicate samples diverge in MDS plot

fawazfebin

Hi

I am analysing an RNA-seq dataset from an experiment deposited in GEO. Reads were aligned to the reference genome with STAR, and read counts per gene were obtained with the Subread package (featureCounts). The resulting counts.txt file was loaded into edgeR for differential expression analysis. In the data-exploration step, the MDS plot shows considerable divergence between replicates of the same condition. Is this amount of divergence acceptable for an edgeR analysis, and can I proceed to the next steps of the differential expression analysis?

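```r
library(edgeR)

# featureCounts output: the Geneid column becomes the row names; the next five
# columns (Chr, Start, End, Strand, Length) are annotation, so counts start at column 6
countdata <- read.table("counts.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- c("sensitive1","sensitive2","resistant1","resistant2")
condition <- c(1,1,2,2)   # 1 = sensitive, 2 = resistant

dge <- DGEList(counts=countdata, group=condition)
dge$samples

# keep genes with CPM > 1 in at least 2 samples, then recompute library sizes
countsPerMillion <- cpm(dge)
countCheck <- countsPerMillion > 1
keep <- which(rowSums(countCheck) >= 2)
dge <- dge[keep, , keep.lib.sizes=FALSE]

dge <- calcNormFactors(dge, method="TMM")
plotMDS(dge)
```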

The MDS plot is here: https://imgur.com/XPDq93d. I would be grateful for any guidance.

Febin@GC

Many thanks for your guidance, and for the quick response!

fawazfebin

score 2 · Answer 1 · 2017-09-27

Ryan C. Thompson

Icahn School of Medicine at Mount Sinai…

It's impossible to know for sure with only 4 samples, but one possible explanation is that dimension 1 reflects a sample-quality problem in resistant2 that perturbed its gene expression measurements and made it diverge from the other samples, while dimension 2 reflects the effect of interest, sensitive vs. resistant. Either way, with so few samples I don't think there is anything you can do to correct for this; you pretty much have to take the data as they are. With more samples, I would have recommended using sva to correct for this kind of problem.
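To check whether one replicate (here resistant2) is the odd one out, a few per-sample diagnostics can be printed before fitting any model. The sketch below is an illustration, not part of the original answer. It assumes simulated counts as a hypothetical stand-in for counts.txt (with noise deliberately injected into resistant2), applies the question's CPM filter and TMM normalization, and then prints library sizes and normalization factors, pairwise log-CPM correlations, and the MDS coordinates obtained with `plotMDS(..., plot = FALSE)`. The simulation parameters and variable names are assumptions.

```r
library(edgeR)

# HYPOTHETICAL data: simulated negative-binomial counts standing in for the
# question's counts.txt (2 sensitive + 2 resistant). Replace `counts` with real data.
set.seed(1)
ngenes <- 2000
samples <- c("sensitive1", "sensitive2", "resistant1", "resistant2")
mu <- matrix(rgamma(ngenes, shape = 2, rate = 0.02), nrow = ngenes, ncol = 4)
mu[1:100, 3:4] <- mu[1:100, 3:4] * 4          # 100 genes truly up in resistant
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)), samples))
# Mimic a quality problem in resistant2 only: perturb 300 genes in that sample
noisy <- sample(ngenes, 300)
counts[noisy, "resistant2"] <- round(counts[noisy, "resistant2"] * runif(300, 0.2, 5))

group <- factor(c("sensitive", "sensitive", "resistant", "resistant"),
                levels = c("sensitive", "resistant"))
dge <- DGEList(counts = counts, group = group)
keep <- rowSums(cpm(dge) > 1) >= 2              # same filter as in the question
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge, method = "TMM")

# 1. Library sizes and TMM factors: a value far from the rest flags a sample
print(dge$samples)

# 2. Pairwise log-CPM correlations: a perturbed replicate usually correlates
#    less well with every other sample, including its own group
lcpm <- cpm(dge, log = TRUE, prior.count = 2)
sample_cor <- cor(lcpm)
print(round(sample_cor, 3))
mean_cor <- (colSums(sample_cor) - 1) / (ncol(sample_cor) - 1)
print(round(mean_cor, 3))

# 3. MDS coordinates without drawing, to see which sample drives each dimension
mds <- plotMDS(dge, plot = FALSE)
coords <- cbind(dim1 = mds$x, dim2 = mds$y)
rownames(coords) <- colnames(dge)
print(coords)
```

With only four samples these numbers can suggest, but not prove, that a single replicate is responsible for dimension 1.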

Thanks Ryan! I'll check whether more samples are available for the same experiment and use the sva package if needed.

Febin@GC

Is sva correction possible with six samples?

fawazfebin

I'm not sure; it might be possible. It depends on how severe the confounding effect is and how many samples it affects. sva and similar methods work best with many samples.

Ryan C. Thompson

score 1 · Answer 2 · 2017-09-27

As Ryan says, you have to work with the data you have. You can set robust=TRUE when you run estimateDisp() in edgeR. If the problems with the resistant2 sample are confined to a particular group of genes, robust estimation will isolate those genes as outliers, so they do not distort the dispersion estimates for the other genes, and the analysis will run fine. If the problems with resistant2 are widespread, the extra variability among the resistant replicates will simply reduce the number of DE genes you find at any given significance level.
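A minimal sketch of where `robust = TRUE` goes. The answer only mentions `estimateDisp()`; the downstream test shown here assumes edgeR's quasi-likelihood pipeline (`glmQLFit()` / `glmQLFTest()`, which also accept `robust = TRUE`) as one possible choice, not as the answer's prescription. The counts are simulated (hypothetical, with extra noise injected into resistant2); with real data, substitute the filtered, TMM-normalized DGEList from the question.

```r
library(edgeR)

# HYPOTHETICAL data: simulated counts, 2 sensitive + 2 resistant, with a
# quality problem mimicked in resistant2. Replace `counts` with real data.
set.seed(2)
ngenes <- 2000
samples <- c("sensitive1", "sensitive2", "resistant1", "resistant2")
mu <- matrix(rgamma(ngenes, shape = 2, rate = 0.02), nrow = ngenes, ncol = 4)
mu[1:100, 3:4] <- mu[1:100, 3:4] * 4          # 100 genes truly up in resistant
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)), samples))
noisy <- sample(ngenes, 300)
counts[noisy, "resistant2"] <- round(counts[noisy, "resistant2"] * runif(300, 0.2, 5))

group <- factor(c("sensitive", "sensitive", "resistant", "resistant"),
                levels = c("sensitive", "resistant"))
dge <- DGEList(counts = counts, group = group)
keep <- rowSums(cpm(dge) > 1) >= 2
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge, method = "TMM")
design <- model.matrix(~ group)                # coef 2 = resistant vs sensitive

# robust = TRUE: the empirical Bayes prior is estimated robustly, so genes with
# outlying (hyper-variable) dispersions get a smaller prior.df and do not
# inflate the prior used for all other genes
dge <- estimateDisp(dge, design, robust = TRUE)
print(summary(dge$prior.df))

# Quasi-likelihood fit and test; robust = TRUE protects the QL prior as well
fit <- glmQLFit(dge, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
print(topTags(res, n = 10))
print(summary(decideTests(res)))
```

`plotBCV(dge)` can then be used to see how widely the gene-wise dispersions scatter around the trend.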