Search
Question: SVAseq and DESeq2 workflow
0
gravatar for capricygcapricyg
26 days ago by
capricygcapricyg0 wrote:

Dear Bioconductor Team, 

We sequenced samples A with 4 replicates and B with 4 replicates last year, and C with 3 replicates this year. Since there are no overlapped samples, I can't include batch effect in the DESeq model.  So I tried to run SVAseq, and got 9 SVs.

However, the online workflow https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html, does not calculate the number of SVs, only used 2.

My question is: should I go with 2 or 9 for the subsequent DESeq2 analysis here?

Thanks for any inputs!

C.

ADD COMMENTlink modified 26 days ago by Michael Love20k • written 26 days ago by capricygcapricyg0
0
gravatar for Michael Love
26 days ago by
Michael Love20k
United States
Michael Love20k wrote:

You can use the example code in the workflow with any number of surrogate variables (or FUVs etc).

The number of SVs seems too high here. 9 SVs for 11 samples?

ADD COMMENTlink modified 25 days ago • written 26 days ago by Michael Love20k

 

HI, Michael,

Thank you very much for the response! Sounds like you don't feel it is too high? I saw your example with 8 samples from 4 cell lines and 2 dex conditions, but only specified 2 SVc. I have 3 conditions, A, B and C, but SVAseq calculated 9 SVs. So I am very concerned. Also I read some discussion that suggested that most variance could be caught in the first two SVs... What do you think?

C.

 

ADD REPLYlink written 25 days ago by capricygcapricyg0

I think it is too high, but I’ll leave it to the SVA developers to comment.

ADD REPLYlink written 25 days ago by Michael Love20k

Hi, Michael,

I actually have a question about your workflow:

you plotted SV values versus cell line (stripchart(svseq$sv[, i] ~ dds$cell, vertical = TRUE, main = paste0("SV", i))).

So what do you expect from these figurse? How do you evaluate if the cell line effects are correctly adjusted?

Some of my SVs look like below (image link: https://imgur.com/8OIR76I). Here, "1" and "2" correspond to the batch of last year and batch of this year, respectively. Batch 2 are generally more spreading. Batch 1  pattern changes more as I plotted along.

Do you  think they are normal? 

 

 

ADD REPLYlink modified 25 days ago • written 25 days ago by capricygcapricyg0

I wouldn’t typically estimate batch if I had the batches annotated. I think we have some sentence about this in the workflow. We “pretend” as if we didn’t have batch annotation and see if we could estimate it. So that plot is particular in that you normally wouldn’t estimate batch if you already had it annotated.

ADD REPLYlink written 25 days ago by Michael Love20k

My issue here is that, unlike your example, the two batches contains completely different samples. I can't use the annotated  batch in the DESeq2 design matrix. So I went to follow your section 8, to take the "sequencing year"   as some hidden and unwanted effects. Since there are many SVs, I just wonder if I need to select some of them based on the plots before final DE call is performed...

ADD REPLYlink written 24 days ago by capricygcapricyg0

Oh sorry I missed that. You’re stuck a bit. If you try to control for batch, even by estimating hidden variables, you will potentially remove the biological signal. Meanwhile you won’t be able to know what you see if biological or technical.

ADD REPLYlink written 24 days ago by Michael Love20k

I wonder what you would go with such type of data?

ADD REPLYlink written 24 days ago by capricygcapricyg0

Rather than try to estimate the technical effects using factor analysis (which doesn’t work here because of confounding) you can instead try to remove technical artifacts with per sample bias correction. Two methods that work on count matrices are RUVSeq and cqn, both Bioconductor packages. To use these you need to do some work compiling the gene features: GC and average transcript length. I don’t have code for this in the workflow.

If you were starting from reads, I would recommend Salmon with GC bias correction followed by tximport. This pipeline accounts for per sample biases automatically by passing along offsets.

ADD REPLYlink written 24 days ago by Michael Love20k

Thank you very much, Michael! I will try these packages and see how the results look like...

ADD REPLYlink modified 24 days ago • written 24 days ago by capricygcapricyg0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 400 users visited in the last hour