Using duplication rate as a covariate
rbutler (@rbutler-20667) • Last seen 4.1 years ago

I'm working with a workflow that uses Fastp -> Salmon -> DESeq2.

Is it generally considered good practice to control for Fastp's read duplication rate and/or Salmon's percent mapped (from meta_info.json) when doing a DESeq2 DE analysis? I have noticed a fair amount of variability across a set of samples from the same prep batch and sequencing run (duplication rate, 22-52%; percent mapped, 82-92%). Duplication rate in particular seems relevant: I didn't expect it to be that variable, and previous workflows I ran with STAR involved removing duplicate reads altogether.

It would be easy enough to use ~ read_dups + trt or ~ read_dups + map_rate + trt, but are there arguments against doing this (e.g., overfitting or removing true biological variation)?
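For concreteness, what I have in mind is roughly the following (a rough sketch only; read_dups, map_rate, and trt are hypothetical colData columns parsed from the fastp JSON and Salmon's meta_info.json, and txi is the tximport object):

    library(DESeq2)

    # center/scale the continuous technical covariates before putting them in the design
    coldata$read_dups <- as.numeric(scale(coldata$read_dups))
    coldata$map_rate  <- as.numeric(scale(coldata$map_rate))

    dds <- DESeqDataSetFromTximport(txi, colData = coldata,
                                    design = ~ read_dups + map_rate + trt)
    dds <- DESeq(dds)
    res <- results(dds)   # defaults to the last design variable, trt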

Tags: deseq2, salmon, fastp
@mikelove • Last seen 2 hours ago • United States

I don't typically add in things like RIN or TIN or duplication or mapping rates.

My preferred approach to controlling for technical variation is either Salmon's bias terms (GC, positional, etc.) or RUV/SVA, providing those packages with the condition of interest.
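For the SVA route, a minimal sketch along the lines of the vignette code (object and column names are illustrative; it assumes a DESeqDataSet dds whose condition of interest is trt):

    library(DESeq2)
    library(sva)

    dds  <- estimateSizeFactors(dds)
    dat  <- counts(dds, normalized = TRUE)
    dat  <- dat[rowMeans(dat) > 1, ]              # drop very low count genes
    mod  <- model.matrix(~ trt, colData(dds))     # full model: condition of interest
    mod0 <- model.matrix(~ 1,   colData(dds))     # null model: intercept only
    svseq <- svaseq(dat, mod, mod0, n.sv = 2)     # ask for 2 surrogate variables

    dds$SV1 <- svseq$sv[, 1]
    dds$SV2 <- svseq$sv[, 2]
    design(dds) <- ~ SV1 + SV2 + trt
    dds <- DESeq(dds)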


The vignettes have examples that use 2 SVs. Do you ever use more than 2? Using num.sv to estimate the number of factors for svaseq gives me a very high number. I tried sequentially plotting SV1, SV1+SV2, SV1+SV2+SV3, etc. using cleaned matrices, but I don't know what I am looking for other than the lowest number of SVs at which the batch effect disappears.
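The estimation step I mean looks roughly like this (a sketch with illustrative names, reusing dds and trt from the answer above; batch is assumed to be a known batch factor in colData):

    library(sva)

    dat  <- counts(dds, normalized = TRUE)
    dat  <- dat[rowMeans(dat) > 1, ]
    mod  <- model.matrix(~ trt, colData(dds))
    mod0 <- model.matrix(~ 1,   colData(dds))

    n.sv  <- num.sv(dat, mod, method = "be")      # this estimate comes back very high for me
    svseq <- svaseq(dat, mod, mod0, n.sv = n.sv)

    # one panel per SV, stratified by the known batch, to see which SVs track it
    par(mfrow = c(ceiling(n.sv / 2), 2), mar = c(3, 5, 3, 1))
    for (i in seq_len(n.sv)) {
      stripchart(svseq$sv[, i] ~ dds$batch, vertical = TRUE, main = paste0("SV", i))
      abline(h = 0)
    }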


In that example there are three known batches, so I know a priori to look for 2 SVs (a three-level batch factor corresponds to two degrees of freedom).

I would reach out to the SVA developers for advice on choosing the number of SVs; perhaps make a new post and tag the sva package.
