Differential expression analysis of multiple RNA-seq datasets using DESeq2
@rezashiralmohammad-21811

Hi, I am working on differential expression analysis of multiple leukemia RNA-seq datasets retrieved from SRA. One of my datasets contains both normal and leukemic samples, whereas the other two contain only leukemic samples. Although I set the normal samples as the reference level, the sample distance matrix plot of all datasets clusters the samples by dataset, regardless of whether they are normal or leukemic. Moreover, the list of significantly differentially expressed genes produced by DESeq2 changes when I use samples from multiple datasets instead of one. If I am right, this problem arises from the different library preparation and sequencing protocols of each dataset (batch effects). I would be grateful if someone could help me fix this issue so that I obtain the correct gene list and plot.

[Figure: Sample Distance Matrix]

A general comment: yes, you are combining completely different experiments, so batch effects are almost certainly dominating any biological differences. I doubt this can be corrected meaningfully: you have only a single dataset with normal samples, so standard batch correction methods do not apply. I'd just focus your DE analysis on that dataset. I realize it is tempting to include more samples to gain power, but in situations like this it does more harm than good. For future non-technical questions (general analysis questions rather than ones that need the developers' expert opinion on how the tools work under the hood), I suggest asking at biostars.org, since it simply has a larger user base; this community is mainly for technical support of the Bioc packages. There are also plenty of threads on batch correction and on the problems that come up when only one study contains both conditions.
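
To illustrate why the leukemia-only datasets add nothing to the normal-vs-leukemia comparison, here is a simplified sketch. It assumes a design of `~ dataset + condition` and uses ordinary least squares on made-up log-scale values for a single hypothetical gene, not DESeq2's negative binomial GLM. It only shows which samples inform the condition coefficient: once each dataset has its own offset, datasets B and C are absorbed entirely by those offsets, and the condition estimate equals the within-dataset-A difference. Dropping the dataset term instead lets dataset differences leak into the condition estimate.

```python
import numpy as np

# Hypothetical toy data: log-scale expression of ONE gene in 10 samples.
# Dataset A has 3 normal + 3 leukemic samples; datasets B and C are
# leukemic only (2 samples each), mirroring the setup in the question.
dataset = np.array(["A"] * 6 + ["B"] * 2 + ["C"] * 2)
leukemic = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
y = np.array([5.0, 5.2, 4.9, 6.1, 6.3, 5.9, 8.0, 8.4, 3.1, 3.5])

# Design for ~ dataset + condition (dataset A and "normal" are the reference levels).
X = np.column_stack([
    np.ones(len(y)),                  # intercept
    (dataset == "B").astype(float),   # dataset B offset
    (dataset == "C").astype(float),   # dataset C offset
    leukemic.astype(float),           # leukemic vs normal
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Naive design ~ condition, ignoring which dataset each sample came from.
X_naive = np.column_stack([np.ones(len(y)), leukemic.astype(float)])
beta_naive, *_ = np.linalg.lstsq(X_naive, y, rcond=None)

# Leukemic minus normal using dataset A alone.
in_a = dataset == "A"
diff_a = y[in_a & (leukemic == 1)].mean() - y[in_a & (leukemic == 0)].mean()

print(f"~ dataset + condition: {beta[3]:.4f}")
print(f"~ condition only:      {beta_naive[1]:.4f}")
print(f"dataset A only:        {diff_a:.4f}")
```

The first and third numbers agree, while the naive model gives a different estimate. This matches the observation that the gene list shifts when the other datasets are pooled in.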

@mikelove
United States

There's not a specific DESeq2 question here, so I don't have a response as the software maintainer.

"Although I set normal samples as the reference level, the sample distance matrix plot of all datasets clusters samples of one dataset together and samples of other datasets together"

Note that EDA plots, such as heatmaps or other plots based on sample distances, do not change depending on which group is set as the reference level. This is unsupervised analysis.
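
As a small illustration of the point above, the sketch below computes Euclidean distances between samples from a matrix of already-transformed expression values. In the DESeq2 vignette this step is done in R on variance-stabilized counts; here the matrix is a hypothetical 5-gene by 4-sample toy example and `sample_distances` is a made-up helper. The function takes no condition or reference-level argument, so relabeling or releveling groups cannot change its output; only the expression values themselves, including any dataset-of-origin differences in them, shape the matrix.

```python
import numpy as np

# Hypothetical toy matrix: 5 genes (rows) x 4 samples (columns) of
# already-transformed (e.g. log-scale) expression values.
expr = np.array([
    [5.0, 5.1, 7.9, 8.2],
    [2.0, 2.2, 2.1, 1.9],
    [9.5, 9.4, 6.0, 6.3],
    [4.4, 4.6, 4.5, 4.3],
    [1.0, 1.1, 3.0, 3.2],
])

def sample_distances(mat):
    """Euclidean distances between samples (columns); no group labels involved."""
    diff = mat[:, :, None] - mat[:, None, :]  # genes x samples x samples
    return np.sqrt((diff ** 2).sum(axis=0))

d = sample_distances(expr)
print(np.round(d, 2))
# Samples 0/1 and 2/3 end up close to each other purely because of their
# values; whichever group is the reference level never enters the calculation.
```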

Thanks for your clarification. I just got a little confused by my plots. If you look at my Cook's distances boxplot, you can see that the samples from the datasets I am talking about seem to be detected as outliers. So my problem is actually with these samples. How should I deal with them without removing them? If the EDA plots have nothing to do with the reference level, then what is causing them to behave like this, and how can I change it?

Thanks,

[Figure: Cook's Distances Boxplot]

If these are log Cook's distances, I don't see a problem here.

Yes, they are. Thanks for your help.
