Very variable samples vs similar samples in the same dataset
@eduardogccm-22161
United Kingdom

Hi! I am not sure whether the following question has already been answered, but I haven't found it. Sorry if it is a repeated question.

I am currently analysing a bulk RNA-seq dataset. It contains 18 samples from 3 different patients (6 conditions per patient: 3 cell types, each with and without treatment). While exploring the data, I can see on PC1 and PC2 that 2 of those conditions are much more similar to each other than to any of the other conditions. As these 2 conditions are the ones we are most interested in, I performed differential gene expression analysis with both edgeR and DESeq2, either including all 18 samples or only the 6 of interest. I got different results between the two approaches (as expected), but I was surprised to see very few differentially expressed genes between the two conditions when all the samples were included to calculate the variance, especially with edgeR. I would imagine that this is due to an increase in the BCV when including the more variable samples. Is this correct?
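To make the intuition above concrete, here is a toy sketch for a single gene. It is not how edgeR or DESeq2 estimate dispersion (both fit negative binomial models and share information across genes); it only shows what happens when one shared variance is estimated across all conditions. The condition names and the log2 values are invented for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical log2 expression for ONE gene, 3 patients per condition.
# cond1 and cond2 are tight; the other four conditions are noisy.
log2_expr = {
    "cond1": [5.0, 5.1, 4.9],
    "cond2": [5.6, 5.7, 5.5],
    "cond3": [7.0, 9.0, 8.0],
    "cond4": [3.0, 5.0, 4.0],
    "cond5": [6.0, 8.0, 7.0],
    "cond6": [2.0, 4.0, 3.0],
}


def pooled_sd(groups):
    """Pooled within-group standard deviation (one shared variance)."""
    sum_sq, df = 0.0, 0
    for values in groups:
        m = mean(values)
        sum_sq += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    return sqrt(sum_sq / df)


def t_like(a, b, sd):
    """Difference in means divided by its standard error under the shared SD."""
    return (mean(b) - mean(a)) / (sd * sqrt(1 / len(a) + 1 / len(b)))


pair = [log2_expr["cond1"], log2_expr["cond2"]]
sd_pair = pooled_sd(pair)                      # only the 2 conditions of interest
sd_all = pooled_sd(list(log2_expr.values()))   # all 6 conditions

t_pair = t_like(pair[0], pair[1], sd_pair)
t_all = t_like(pair[0], pair[1], sd_all)

print(f"shared SD, 2 conditions: {sd_pair:.3f}  statistic: {t_pair:.2f}")
print(f"shared SD, 6 conditions: {sd_all:.3f}  statistic: {t_all:.2f}")
```

In this toy case the same difference in means looks much weaker once the noisy conditions contribute to the shared variance, which is the effect suspected in the question.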

My question is: would it make sense to do the analysis using only the samples of interest for the BCV calculation? And what if I were interested in comparing how 2 cell types change differently before and after treatment? Could I compute the ratio of the counts (or the subtraction of the log2 counts) manually and then calculate the BCV using those values?
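For clarity, the manual calculation described above would look roughly like the sketch below: a per-patient log2 treatment ratio for each cell type, then the difference between cell types (a "difference of differences"). The counts, the names `cellA`/`cellB` and the pseudocount are all hypothetical. This only illustrates the arithmetic; it ignores the count-based variance that edgeR and DESeq2 model, and as far as I understand both packages would normally handle this kind of comparison as an interaction term in the design rather than on hand-computed ratios.

```python
from math import log2, sqrt
from statistics import mean, stdev

# Hypothetical normalized counts for ONE gene:
# counts[cell_type][treatment][patient]
counts = {
    "cellA": {
        "untreated": {"P1": 99, "P2": 49, "P3": 79},
        "treated": {"P1": 199, "P2": 199, "P3": 159},
    },
    "cellB": {
        "untreated": {"P1": 99, "P2": 49, "P3": 79},
        "treated": {"P1": 99, "P2": 99, "P3": 99},
    },
}
patients = ["P1", "P2", "P3"]
PSEUDOCOUNT = 1  # assumed here only to avoid log2(0)


def treatment_lfc(cell_type, patient):
    """log2(treated / untreated) within one patient and one cell type."""
    c = counts[cell_type]
    return (log2(c["treated"][patient] + PSEUDOCOUNT)
            - log2(c["untreated"][patient] + PSEUDOCOUNT))


# How differently does cellB respond to treatment compared with cellA?
diff_of_diffs = [
    treatment_lfc("cellB", p) - treatment_lfc("cellA", p) for p in patients
]

mean_effect = mean(diff_of_diffs)
std_error = stdev(diff_of_diffs) / sqrt(len(patients))

for p, d in zip(patients, diff_of_diffs):
    print(f"{p}: difference of log2 fold changes = {d:.3f}")
print(f"mean = {mean_effect:.3f}, standard error = {std_error:.3f}")
```

Because each ratio is taken within a patient, the patient-to-patient baseline cancels out, which is the appeal of the manual approach; the cost is that the resulting values are no longer counts.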

Thanks for any help!

edger deseq2

I would recommend picking one pipeline for your analysis instead of analyzing with two different methods.


Hi. Yes, I am aware of that. From my understanding, including all the samples in the DESeqDataSet (or the edgeR equivalent) is better and provides more power (and that is what I have seen before when working with microarray data in limma). However, in this case I got basically no differentially expressed genes with this approach, and because of the PCA results I decided to try including only the samples of interest in the DESeqDataSet. Doing this, the number of differentially expressed genes was larger (about 150 in edgeR and 200 in DESeq2).

This is where my question comes from. I feel a bit concerned about dropping samples before building the DESeqDataSet, but because PC1 and PC2 suggest that 2 conditions are much more similar to each other than to any other sample in the dataset (even those of the same condition), I feel that the TCV/BCV calculation may be misleading. Would that make sense?

@mikelove
United States

So, in our FAQ we discuss exactly this point; you can take a look there first. We recommend looking at the PCA and then, if you are interested in a pairwise comparison, just going ahead and using only the samples from those two groups to build the dataset.


OK, I didn't notice it in the FAQ, but I have found it now!

Thanks for pointing me back there and for taking the time to answer.
