Comparing DGE between datasets with uneven sample sizes using DESeq2
@rebaduncan-10914

Hello!

I have seen some questions similar to mine, but there are some important differences, so I would like advice that is more specific to my situation. I am working on a paper where I found that when the same sample is sequenced with two different sequencing pipelines (i.e. different sequencing facility, different library prep), the VST-normalized expression values from the two pipelines are only ~85% correlated. I have done this for two different sample types (A & B) and three different sequencing pipelines, and I see the same pattern when I correlate normalized counts between each pair of pipelines for each sample.

I want to see whether this divergence in expression affects differential expression between A and B (i.e. whether the different pipelines identify the same differentially expressed genes). My idea is to run a differential expression analysis between A and B separately for pipeline 1, pipeline 2, and pipeline 3, and then calculate correlation coefficients for the LFC between each pair of pipelines. The problem is that I have uneven sample sizes between A and B, and between the pipelines:

Pipeline 1: A = 3, B = 10
Pipeline 2: A = 3, B = 11
Pipeline 3: A = 3, B = 3

From other questions that people have posted, it seems that the uneven sample sizes between A and B within each DE analysis are not a problem. But I am worried that the different sample sizes for B across the three pipelines might be problematic when comparing the pipelines, since the pipelines where B has more replicates will have more power and so will identify more DE genes than pipeline 3, where B has only 3 replicates.

Does anyone have any advice? Some things I have thought about are: (1) combining the datasets from all pipelines into one big DE analysis and using an interaction term to test whether the A vs. B LFC differs by pipeline, or (2) running DE analyses separately for each pipeline but using a different FDR threshold for each one so that the number of DE genes is roughly equal. Someone suggested using shrinkage, but I don't understand how shrinking the LFC would help in this situation.

Many thanks!

@mikelove

To start, I'd recommend producing shrunken LFCs from each pipeline separately, and then looking at a pairs plot of these.
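As a rough illustration of that suggestion, here is a minimal sketch, not a definitive workflow. It assumes the `apeglm` package is installed, and it uses simulated counts from `makeExampleDESeqDataSet()` split into three column subsets as stand-ins for the real pipelines (with the sample sizes from the question). The helper `shrunken_lfc()` and the column assignments are hypothetical names introduced here. The general idea behind shrinkage is that LFC estimates from noisier fits (few replicates, low counts) are pulled toward zero, so the effect sizes should be less dominated by estimation noise when compared across pipelines with different replicate numbers.

```r
# Sketch only: simulated data stands in for the three real pipelines.
library(DESeq2)

set.seed(1)

# Hypothetical helper: fit one pipeline and return shrunken B-vs-A LFCs.
shrunken_lfc <- function(dds) {
  dds <- DESeq(dds, quiet = TRUE)
  # The coefficient name follows "<variable>_<level>_vs_<reference>";
  # check resultsNames(dds) on real data before relying on it.
  res <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm",
                   quiet = TRUE)
  res$log2FoldChange
}

# 48 simulated samples: columns 1-24 are condition A, 25-48 are condition B.
sim <- makeExampleDESeqDataSet(n = 1000, m = 48, betaSD = 1)

# Assumed column subsets mimicking the sample sizes in the question.
cols <- list(
  pipeline1 = c(1:3, 25:34),   # A = 3, B = 10
  pipeline2 = c(4:6, 35:45),   # A = 3, B = 11
  pipeline3 = c(7:9, 46:48)    # A = 3, B = 3
)

# One column of shrunken LFCs per pipeline, genes in the same row order.
lfc <- sapply(cols, function(idx) shrunken_lfc(sim[, idx]))
rownames(lfc) <- rownames(sim)

# Pairs plot and pairwise correlations of the shrunken LFCs.
pairs(lfc, pch = 16, cex = 0.4)
cor(lfc, use = "pairwise.complete.obs")
```

With real data, each pipeline would be its own `DESeqDataSet` built from that pipeline's count matrix, and the LFC vectors would need to be matched by gene ID before plotting.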

You could also filter to genes that are detectable at a minimal amount in 2 or 3 out of the 3 pipelines.
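One possible way to express that filter in base R is sketched below. The definition of "detectable" used here (a count of at least 10 in at least 3 samples) is only an assumption for illustration, and the helper `detectable()` and the toy matrices `p1`, `p2`, `p3` are hypothetical; the thresholds should be adapted to the real data.

```r
# Hypothetical rule: a gene is detectable in a pipeline if at least
# `min_samples` samples have a count of at least `min_count`.
detectable <- function(counts, min_count = 10, min_samples = 3) {
  rowSums(counts >= min_count) >= min_samples
}

# Toy count matrices (genes x samples), one per pipeline.
p1 <- rbind(g1 = c(50, 60, 70, 80), g2 = c(0, 1, 0, 2),
            g3 = c(20, 30, 25, 40), g4 = c(0, 0, 0, 0))
p2 <- rbind(g1 = c(55, 65, 75, 85), g2 = c(15, 12, 30, 11),
            g3 = c(0, 2, 1, 0),     g4 = c(0, 1, 0, 0))
p3 <- rbind(g1 = c(40, 45, 50, 60), g2 = c(12, 14, 20, 18),
            g3 = c(1, 0, 0, 3),     g4 = c(11, 12, 13, 0))

# Logical matrix: one row per gene, one column per pipeline.
det <- sapply(list(pipeline1 = p1, pipeline2 = p2, pipeline3 = p3),
              detectable)

# Keep genes detectable in at least 2 of the 3 pipelines.
keep <- rownames(det)[rowSums(det) >= 2]
keep
```

In this toy example only `g1` and `g2` pass, since `g3` and `g4` are each detectable in a single pipeline. The resulting gene set could then be used to subset the LFC vectors before comparing pipelines.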
