Question

Differential expression analysis of multiple RNA-seq datasets using DESeq2


rezashiralmohammad • 0

@rezashiralmohammad-21811

Iran

Hi, I am working on differential expression analysis of multiple leukemia RNA-seq datasets retrieved from SRA. One of my datasets contains both normal and leukemic samples, whereas the other two contain only leukemic samples. Although I set the normal samples as the reference level, the sample distance matrix plot clusters the samples by dataset, regardless of whether they are normal or leukemic. Moreover, the list of significantly differentially expressed genes produced by DESeq2 changes when I use samples from multiple datasets instead of one. I think this problem arises from the different library preparation and sequencing protocols of each dataset (batch effects), if I am right. I would be grateful if someone could help me fix this issue so that I can obtain the correct gene list and plot.

Sample Distance Matrix

deseq2 cancer • 3.1k views

ADD COMMENT • link updated 5.2 years ago by Michael Love 43k • written 5.2 years ago by rezashiralmohammad • 0


A general comment: yes, you are combining completely different experiments, and batch effects are almost certainly dominating any biological differences. I doubt that this can be meaningfully corrected: only a single dataset contains normals, so standard batch correction methods do not apply. I'd just focus your DE analysis on that dataset. I realize that it is tempting to include more samples to gain power, but in situations like this it does more harm than good. For the future, when you have non-technical questions (ones that do not require the developers' expert opinion on how the tools work under the hood), I suggest asking at biostars.org, since it simply has a larger user base and this community is mainly for technical support of the Bioc packages. There are also plenty of threads on batch correction and the problems that come up when only one study contains both conditions.
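To illustrate why the extra leukemic-only datasets add nothing to the normal-versus-leukemic comparison, here is a toy sketch in plain Python. It is not DESeq2 code: it fits an ordinary least-squares model `value ~ dataset + condition` for a single hypothetical gene, and the dataset names, sample values, and helper functions are all made up for illustration.

```python
from statistics import mean

# Hypothetical log-expression values for one gene; all numbers are made up.
# Only dataset "A" contains normal samples, as in the question.
samples = [
    # (dataset, condition, value)
    ("A", "normal", 5.0), ("A", "normal", 5.4),
    ("A", "leukemic", 7.1), ("A", "leukemic", 6.9),
    ("B", "leukemic", 9.8), ("B", "leukemic", 10.2),
    ("C", "leukemic", 3.1), ("C", "leukemic", 2.9),
]

def design_row(dataset, condition):
    """One design-matrix row: intercept, dataset B, dataset C, leukemic."""
    return [1.0, float(dataset == "B"), float(dataset == "C"),
            float(condition == "leukemic")]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def condition_effect(data):
    """Least-squares estimate of the leukemic-vs-normal coefficient."""
    x = [design_row(d, c) for d, c, _ in data]
    y = [v for _, _, v in data]
    p = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)[3]

# Estimate using all three datasets.
full = condition_effect(samples)

# Plain difference of group means inside dataset A only.
within_a = (mean(v for d, c, v in samples if d == "A" and c == "leukemic")
            - mean(v for d, c, v in samples if d == "A" and c == "normal"))

# Shift every sample from datasets B and C by a large "batch effect".
shifted = [(d, c, v + 100.0 if d != "A" else v) for d, c, v in samples]

print(full, within_a, condition_effect(shifted))
```

All three printed values agree (about 1.8 for these made-up numbers): once a dataset term is in the model, the condition coefficient is determined entirely by the one dataset that has both groups, and the leukemic-only datasets only set their own dataset coefficients. Leaving the dataset term out instead would mix the batch differences into the condition estimate.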

ADD REPLY • link 5.2 years ago ATpoint ★ 5.0k

score 0 · Answer 1 · 2020-09-04


Michael Love 43k

@mikelove

United States

There's not a specific DESeq2 question here, so I don't have a response as the software maintainer.

"Although I set normal samples as the reference level, the sample distance matrix plot of all datasets clusters samples of one dataset together and samples of other datasets together"

Note that the EDA plots such as heatmaps or other plots using distances would not change based on which group is set as reference. This is unsupervised analysis.
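As a rough illustration of why the reference level cannot affect such plots, here is a toy sketch in plain Python with hypothetical sample names and made-up values. In a DESeq2 workflow the analogous step is typically calling `dist()` on transformed counts; the sketch below only mimics the idea that the distance computation sees the expression values and nothing else.

```python
from math import dist

# Hypothetical transformed expression values (one list of genes per sample).
expr = {
    "A_normal_1":   [5.0, 2.0, 8.0],
    "A_leukemic_1": [6.0, 2.5, 7.0],
    "B_leukemic_1": [9.0, 6.0, 3.0],
    "B_leukemic_2": [9.5, 6.2, 2.8],
}

def sample_distances(values):
    """Euclidean distance between every pair of samples.

    The function takes expression values only: no condition labels and no
    reference level are passed in, so they cannot influence the result.
    """
    names = list(values)
    return {(a, b): dist(values[a], values[b]) for a in names for b in names}

def nearest(sample, distances):
    """Name of the closest other sample."""
    others = {b: d for (a, b), d in distances.items() if a == sample and b != sample}
    return min(others, key=others.get)

distances = sample_distances(expr)

# The leukemic sample from dataset A sits closest to the normal sample from
# the same dataset, not to the leukemic samples from dataset B.
print(nearest("A_leukemic_1", distances))
```

With these made-up values the samples group by dataset, which is the pattern described in the question. Changing it would require changing the values that go into the distance calculation, not the factor levels.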

ADD COMMENT • link 5.2 years ago Michael Love 43k


Thanks for your clarification. I have just got a little confused by my plots. If you look at my Cook's distances boxplot, you can see that the datasets I am talking about seem to be detected as outliers. So my problem is actually with these samples. How should I deal with them without removing them? If the EDA plots have nothing to do with the reference level, then what is making them behave like this, and how can I change it?

Thanks,

Cook's Distances Boxplot

ADD REPLY • link 5.2 years ago rezashiralmohammad • 0


If these are log Cook's distances, I don't see a problem here.

ADD REPLY • link 5.2 years ago Michael Love 43k


Yes, they are. Thanks for your help.

ADD REPLY • link 5.2 years ago rezashiralmohammad • 0