EdgeR: Replicate samples diverge in MDS plot
fawazfebin ▴ 60
@fawazfebin-14053
Last seen 3.8 years ago

Hi

I am analysing RNA-seq data from an experiment selected from GEO. Reads were aligned to the reference genome with STAR, and transcripts were quantified with the Subread package. The output, 'counts.txt', was fed into edgeR for differential expression analysis. The data exploration step (MDS plot) revealed considerable divergence between replicates of the same condition. Is this degree of divergence acceptable for an edgeR analysis? Can I proceed to the next steps of the differential expression analysis?

 

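> library(edgeR)
> # Read the count table; the first column holds the gene IDs
> countdata <- read.table("counts.txt", header=TRUE, row.names=1)
> # Drop the annotation columns so that only the per-sample counts remain
> countdata <- countdata[, 6:ncol(countdata)]
> colnames(countdata) <- c("sensitive1", "sensitive2", "resistant1", "resistant2")
> condition <- c(1, 1, 2, 2)
> dge <- DGEList(counts=countdata, group=condition)
> dge$samples
> # Keep genes with more than 1 count per million in at least 2 samples
> countsPerMillion <- cpm(dge)
> countCheck <- countsPerMillion > 1
> keep <- which(rowSums(countCheck) >= 2)
> dge <- dge[keep, ]
> # TMM normalisation, then the exploratory MDS plot
> dge <- calcNormFactors(dge, method="TMM")
> plotMDS(dge)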

Here is the URL of the plot image: https://imgur.com/XPDq93d. I would be grateful for your guidance.

Febin@GC

edger plotmds • 1.5k views
Many thanks for your guidance, and for the quick response as well!
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

It's impossible to know for sure with only 4 samples, but one possible explanation is that dimension 1 represents some sort of sample quality issue in the resistant2 sample that perturbed the measurements of gene expression in that sample, causing it to diverge from the others, while dimension 2 represents the effect of interest, sensitive vs resistant. In any case, with so few samples, I don't think there's anything you can do to correct for this. You pretty much just have to take the data as is. If you had more samples, I would have recommended using sva to correct for such problems.
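For intuition about what the two dimensions summarise, here is a rough base-R sketch of the kind of distance plotMDS works from by default: the root-mean-square of the largest log2 fold changes between each pair of samples. The counts are simulated (not the data from this thread), a group of genes in resistant2 is inflated on purpose to mimic a sample-specific artefact, and the log-CPM formula is a simplified stand-in for what edgeR computes, so treat it as an illustration only.

```r
# Simulated counts, NOT the poster's data: 2000 genes x 4 samples
set.seed(1)
n_genes <- 2000
mu <- rexp(n_genes, rate = 1 / 200) + 5
samples <- c("sensitive1", "sensitive2", "resistant1", "resistant2")
counts <- matrix(rnbinom(n_genes * 4, mu = mu, size = 10), ncol = 4,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Hypothetical artefact: inflate 300 genes in resistant2 only
counts[1:300, "resistant2"] <- counts[1:300, "resistant2"] * 4

# Simplified log2 counts per million with a small offset to avoid log(0)
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Pairwise distance: RMS of the `top` largest log2 fold changes per pair
top <- 500
n <- ncol(log_cpm)
d <- matrix(0, n, n, dimnames = list(samples, samples))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    sq <- sort((log_cpm[, i] - log_cpm[, j])^2, decreasing = TRUE)[seq_len(top)]
    d[i, j] <- d[j, i] <- sqrt(mean(sq))
  }
}

# Classical multidimensional scaling into two dimensions
coords <- cmdscale(as.dist(d), k = 2)
print(round(d, 2))
print(round(coords, 2))
```

In this toy setup the distances involving resistant2 come out larger than the distance between the two sensitive replicates, which is the pattern a single perturbed sample would produce. It cannot tell you whether that is what happened in the real data.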


Thanks Ryan! I shall check for the availability of more samples for the same experiment and use sva package if needed.

Febin@GC


Is sva correction possible with six samples?

 


I'm not sure. It might be possible. It depends on the severity of the confounding effect and the number of samples affected. sva and similar methods work best with lots of samples.

@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

As Ryan says, you have to work with the data you have. You can set robust=TRUE when you run estimateDisp() in edgeR. If the problems with the resistant2 sample are confined to a particular group of genes, then this will isolate those genes and the analysis will run fine. If the problems with resistant2 are widespread, then the variability of the resistant samples will simply reduce the number of DE genes you will find at any significance level.
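A minimal sketch of where robust=TRUE fits in, assuming a simple two-group design. The counts below are simulated placeholders rather than the data from this thread. The filterByExpr() step and the quasi-likelihood steps (glmQLFit, glmQLFTest) are one common edgeR route chosen here for illustration, not something prescribed in the answer above.

```r
library(edgeR)

# Simulated counts, NOT the poster's data: 1000 genes x 4 samples
set.seed(42)
n_genes <- 1000
mu <- rexp(n_genes, rate = 1 / 300) + 10
counts <- matrix(rnbinom(n_genes * 4, mu = mu, size = 20), ncol = 4,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 c("sensitive1", "sensitive2",
                                   "resistant1", "resistant2")))

group <- factor(c("sensitive", "sensitive", "resistant", "resistant"),
                levels = c("sensitive", "resistant"))

dge <- DGEList(counts = counts, group = group)

# Filter lowly expressed genes and normalise (TMM is the default method)
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# Two-group design: the second coefficient is resistant vs sensitive
design <- model.matrix(~ group)

# Robust dispersion estimation, as suggested above
dge <- estimateDisp(dge, design, robust = TRUE)

# Assumed downstream steps: quasi-likelihood fit and test
fit <- glmQLFit(dge, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

With only two replicates per group, the residual degrees of freedom are very low, so expect limited power however the dispersions are estimated.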
