SVAseq and DESeq2 workflow
1
0
Entering edit mode
@capricygcapricyg-17892
Last seen 21 months ago
United States

Dear Bioconductor Team, 

We sequenced samples A with 4 replicates and B with 4 replicates last year, and C with 3 replicates this year. Since there are no overlapped samples, I can't include batch effect in the DESeq model.  So I tried to run SVAseq, and got 9 SVs.

However, the online workflow https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html, does not calculate the number of SVs, only used 2.

My question is: should I go with 2 or 9 for the subsequent DESeq2 analysis here?

Thanks for any inputs!

C.

sva deseq2 • 6.1k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 22 hours ago
United States

You can use the example code in the workflow with any number of surrogate variables (or FUVs etc).

The number of SVs seems too high here. 9 SVs for 11 samples?

ADD COMMENT
0
Entering edit mode

 

HI, Michael,

Thank you very much for the response! Sounds like you don't feel it is too high? I saw your example with 8 samples from 4 cell lines and 2 dex conditions, but only specified 2 SVc. I have 3 conditions, A, B and C, but SVAseq calculated 9 SVs. So I am very concerned. Also I read some discussion that suggested that most variance could be caught in the first two SVs... What do you think?

C.

 

ADD REPLY
0
Entering edit mode

I think it is too high, but I’ll leave it to the SVA developers to comment.

ADD REPLY
0
Entering edit mode

Hi, Michael,

I actually have a question about your workflow:

you plotted SV values versus cell line (stripchart(svseq$sv[, i] ~ dds$cell, vertical = TRUE, main = paste0("SV", i))).

So what do you expect from these figurse? How do you evaluate if the cell line effects are correctly adjusted?

Some of my SVs look like below (image link: https://imgur.com/8OIR76I). Here, "1" and "2" correspond to the batch of last year and batch of this year, respectively. Batch 2 are generally more spreading. Batch 1  pattern changes more as I plotted along.

Do you  think they are normal? 

 

 

ADD REPLY
0
Entering edit mode

I wouldn’t typically estimate batch if I had the batches annotated. I think we have some sentence about this in the workflow. We “pretend” as if we didn’t have batch annotation and see if we could estimate it. So that plot is particular in that you normally wouldn’t estimate batch if you already had it annotated.

ADD REPLY
0
Entering edit mode

My issue here is that, unlike your example, the two batches contains completely different samples. I can't use the annotated  batch in the DESeq2 design matrix. So I went to follow your section 8, to take the "sequencing year"   as some hidden and unwanted effects. Since there are many SVs, I just wonder if I need to select some of them based on the plots before final DE call is performed...

ADD REPLY
0
Entering edit mode

Oh sorry I missed that. You’re stuck a bit. If you try to control for batch, even by estimating hidden variables, you will potentially remove the biological signal. Meanwhile you won’t be able to know what you see if biological or technical.

ADD REPLY
0
Entering edit mode

I wonder what you would go with such type of data?

ADD REPLY
0
Entering edit mode

Rather than try to estimate the technical effects using factor analysis (which doesn’t work here because of confounding) you can instead try to remove technical artifacts with per sample bias correction. Two methods that work on count matrices are RUVSeq and cqn, both Bioconductor packages. To use these you need to do some work compiling the gene features: GC and average transcript length. I don’t have code for this in the workflow.

If you were starting from reads, I would recommend Salmon with GC bias correction followed by tximport. This pipeline accounts for per sample biases automatically by passing along offsets.

ADD REPLY
0
Entering edit mode

Thank you very much, Michael! I will try these packages and see how the results look like...

ADD REPLY

Login before adding your answer.

Traffic: 721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6