Working with a workflow that uses fastp -> Salmon -> DESeq2
Is it generally considered good practice to control for fastp's read duplication rate and/or Salmon's percent mapped (from meta_info.json) when doing a DESeq2 DE analysis? I have noticed a fair amount of variability across a set of samples in the same prep batch and sequencing run (duplication rate, 22-52%; percent mapped, 82-92%). Duplication rate in particular seems relevant, as I didn't expect it to be that variable, and previous workflows I had run with STAR had me remove duplicate reads altogether.
I mean, it would be easy enough to use a design like ~ read_dups + trt or ~ read_dups + map_rate + trt, but are there arguments against doing this (e.g., overfitting or removing true variation)?
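For what it's worth, this is roughly what I had in mind (a sketch only, assuming txi is the tximport object built from the Salmon quants, coldata is my sample table with a trt factor, and read_dups / map_rate are pulled from the fastp report and Salmon's meta_info.json):

```r
library(DESeq2)

## center and scale the continuous QC covariates before putting them in the design
coldata$read_dups <- as.numeric(scale(coldata$read_dups))
coldata$map_rate  <- as.numeric(scale(coldata$map_rate))

dds <- DESeqDataSetFromTximport(txi, colData = coldata,
                                design = ~ read_dups + map_rate + trt)
dds <- DESeq(dds)
res <- results(dds)  # trt is last in the design, so this returns the trt contrast
```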
The vignettes have examples that use 2 SVs. Do you ever use more than 2? Using svaseq and estimating the number of factors with num.sv gets me a very high number. I tried sequentially plotting SV1, SV1+SV2, SV1+SV2+SV3, etc., using cleaned matrices, but I don't know what I am looking for other than the lowest number of SVs at which the batch effect disappears. In the vignette example there are three known batches, so I know a priori to look for 2 SVs.
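For reference, this is roughly the pattern I was following (adapted from the svaseq example in the DESeq2 vignette; dds and trt as above, and the number of SVs is the part I'm unsure about):

```r
library(sva)

dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]            # drop near-zero rows as in the vignette

mod  <- model.matrix(~ trt, colData(dds))  # full model with the variable of interest
mod0 <- model.matrix(~ 1,   colData(dds))  # null model

n.sv  <- num.sv(dat, mod, method = "be")   # this is the estimate that comes back very high
svseq <- svaseq(dat, mod, mod0, n.sv = 2)  # vs. the 2 SVs used in the vignette

## add the SVs to the design ahead of the treatment term
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ SV1 + SV2 + trt
```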
I would reach out to the sva developers for advice on the number of SVs. Maybe make a new post and tag the sva package.