Question: Very variable samples vs similar samples in the same dataset
7 weeks ago, eduardogccm wrote:

Hi! I am not sure whether the following question has already been answered, but I couldn't find it. Sorry if it is a repeated question.

I am currently analysing a bulk RNA-seq dataset. It contains 18 samples from 3 different patients (6 conditions per patient: 3 cell types, each with and without treatment). During data exploration, I can see on PC1 and PC2 that 2 of those conditions are much more similar to each other than to any of the other conditions. As these 2 conditions are the ones we are most interested in, I performed differential gene expression analysis with both edgeR and DESeq2, either including all 18 samples or only the 6 of interest. I got different results (as expected), but I was surprised to see very few differentially expressed genes between the two conditions when all the samples were included to calculate the variance, especially with edgeR. I would imagine that this is due to an increase in the BCV when the more variable samples are included. Is this correct?

My question is: would it make sense to do the analysis using only the samples of interest for the BCV calculation? And what if I were interested in comparing how 2 cell types change differently before and after treatment? Could I compute the ratio of the counts (or the difference of the log2 counts) manually and then calculate the BCV using those values?

Thanks for any help!

edger deseq2 • 96 views
modified 7 weeks ago by Michael Love • written 7 weeks ago by eduardogccm

I would recommend picking one pipeline for your analysis instead of analyzing with two different methods.

written 7 weeks ago by Michael Love

Hi. Yes, I am aware of that. From my understanding, including all the samples in the DESeqDataSet (or the edgeR equivalent) is better and provides more power (and that is what I have seen before when working with microarray data in limma). However, in this case I got basically no differentially expressed genes with this approach, and because of the PCA results I decided to try including only the samples of interest in the DESeqDataSet. Doing this, the number of differentially expressed genes was larger (about 150 in edgeR and 200 in DESeq2).

This is where my question comes from. I feel a bit concerned about dropping samples before building the DESeqDataSet, but because PC1 and PC2 suggest that 2 conditions are much more similar to each other than to any other sample in the dataset (even those of the same condition), I feel that the TCV/BCV calculation may be misleading. Would that make sense?
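To make the concern above concrete, here is a toy sketch in plain Python. It is only an analogy: it uses a simple pooled-variance t-like statistic on made-up log2 values for a single gene, whereas edgeR and DESeq2 fit negative binomial models with shrunken dispersion estimates, and it ignores the patient pairing. All numbers and group names (`A` to `D`) are hypothetical. It shows how a variance estimate shared across all groups can swamp a clear difference between two tight groups.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log2 expression values for one gene, 3 patients per condition.
groups = {
    "A": [5.0, 5.2, 5.1],   # condition of interest, tight replicates
    "B": [5.6, 5.8, 5.7],   # condition of interest, tight replicates
    "C": [3.0, 6.0, 9.0],   # other condition, highly variable
    "D": [2.0, 7.0, 12.0],  # other condition, highly variable
}

def pooled_variance(names):
    """Pool within-group variances, weighting each by its degrees of freedom."""
    num = sum((len(groups[n]) - 1) * variance(groups[n]) for n in names)
    den = sum(len(groups[n]) - 1 for n in names)
    return num / den

def t_stat(a, b, names):
    """t-like statistic for b vs a, with the variance pooled over `names`."""
    s2 = pooled_variance(names)
    se = sqrt(s2 * (1 / len(groups[a]) + 1 / len(groups[b])))
    return (mean(groups[b]) - mean(groups[a])) / se

t_subset = t_stat("A", "B", ["A", "B"])            # variance from A and B only
t_all = t_stat("A", "B", ["A", "B", "C", "D"])     # variance from all groups

print(f"A vs B, variance from A and B only: t = {t_subset:.2f}")
print(f"A vs B, variance from all groups:   t = {t_all:.2f}")
```

The difference in means between A and B is identical in both calls; only the variance estimate changes, and with it the strength of the evidence.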

written 7 weeks ago by eduardogccm
Answer: Very variable samples vs similar samples in the same dataset
7 weeks ago, Michael Love (United States) wrote:

We discuss exactly this point in our FAQ, so take a look there first. We recommend looking at the PCA and then, if you are interested in a pairwise comparison, going ahead and using only the samples from those two groups to build the dataset.
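As a rough illustration of the subsetting step (not taken from the FAQ, and written in plain Python rather than R), the sketch below filters a hypothetical sample sheet down to the two groups being compared and recomputes the condition levels so that no empty groups remain. The sample IDs, condition names, and the helper `subset_for_comparison` are all assumptions made for this example. In DESeq2 itself, the equivalent would be column-subsetting the object and dropping unused factor levels before running the analysis.

```python
# Hypothetical sample sheet: 3 patients x 6 conditions = 18 samples.
patients = ["P1", "P2", "P3"]
conditions = [
    f"{cell}_{trt}"
    for cell in ("cellA", "cellB", "cellC")
    for trt in ("untreated", "treated")
]
samples = [
    {"id": f"{p}_{c}", "patient": p, "condition": c}
    for p in patients
    for c in conditions
]

def subset_for_comparison(samples, keep):
    """Keep only samples whose condition is in `keep`.

    Returns the kept samples and the sorted condition levels that are
    still present, so downstream code never sees empty groups.
    """
    kept = [s for s in samples if s["condition"] in keep]
    levels = sorted({s["condition"] for s in kept})
    return kept, levels

# Hypothetical pair of interest; the thread does not say which two it was.
pair = {"cellA_untreated", "cellA_treated"}
kept, levels = subset_for_comparison(samples, pair)

print(len(samples), "samples in total;", len(kept), "kept for the comparison")
print("remaining condition levels:", levels)
```

Keeping the `patient` field in the subset leaves the door open to accounting for the pairing of samples within patients in the model.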

written 7 weeks ago by Michael Love

OK, I hadn't noticed it in the FAQ, but I have found it now!

Thanks for pointing me back there and for taking the time to answer.

written 7 weeks ago by eduardogccm