Question

DESeq2 normalization Factor Matrix to correct for uneven targeted sequencing coverage

Thomas_WILLEMS • 0

@thomas_willems-17632

Last seen 7.3 years ago

Hi everyone,

I'm trying to use DESeq2 to analyze some mutagenesis data and identify variants that confer a selective advantage. To do so, I'm comparing the counts of thousands of mutants before and after selection.

At a high level, our experiment examined mutants in 2 different regions of a protein. We used 2 sets of primers to PCR-amplify each region and then sequenced the amplicons by NGS. A mutant's count is then just the number of reads that contain the mutation of interest.

One issue I'm having is that the total number of reads for the two regions isn't equal: I have 5x as many reads for Region A as for Region B. So, due to Poisson sampling, my mutant frequencies for Region A are much more precise.
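
As a rough illustration of that precision gap, here is a minimal sketch in Python. It assumes each mutant count is Poisson distributed, so its relative standard error is about 1/sqrt(expected count). The mutant frequency and read depths are made-up numbers for illustration, not values from the experiment.

```python
import math

def poisson_relative_se(expected_count):
    """Relative standard error (sd / mean) of a Poisson count."""
    return 1.0 / math.sqrt(expected_count)

# Hypothetical numbers, for illustration only.
mutant_frequency = 1e-3
reads_region_b = 100_000
reads_region_a = 5 * reads_region_b  # Region A has 5x the reads

rel_se_a = poisson_relative_se(mutant_frequency * reads_region_a)
rel_se_b = poisson_relative_se(mutant_frequency * reads_region_b)

# The ratio depends only on the depth ratio: sqrt(5), about 2.24.
precision_ratio = rel_se_b / rel_se_a
print(f"relative SE, Region A: {rel_se_a:.4f}")
print(f"relative SE, Region B: {rel_se_b:.4f}")
print(f"Region B / Region A:   {precision_ratio:.2f}")
```

Under this assumption the gap in relative error grows with the square root of the depth ratio, so a 5x difference in reads corresponds to roughly a 2.2x difference in relative precision.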

I want to explicitly account for the difference in sequencing coverage (instead of downsampling Region A's data) using DESeq2's gene-dependent normalization factors. One idea I had was to assign a factor of 1 to all of Region A's mutants and a factor of 0.2 to all of Region B's mutants to account for the 5x difference in regional coverage. So my normalization factor matrix would contain rows of all 0.2s or all 1.0s, depending on which region each mutant came from.

At a high level, this seems like it could work, but the DESeq2 manual says "we recommend providing a matrix with row-wise geometric means of 1", which can be done with: normFactors <- normFactors / exp(rowMeans(log(normFactors)))

In my case, doing so would cancel out exactly the correction I'm trying to apply and convert the matrix to all 1s.
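
The cancellation can be checked with a small sketch. This is plain Python that mirrors the R expression quoted from the manual; it does not call DESeq2, and the 2 x 4 matrix layout and the function name are hypothetical choices for illustration.

```python
import math

def rescale_rows_to_unit_geomean(norm_factors):
    """Divide each row by its geometric mean, mirroring
    normFactors / exp(rowMeans(log(normFactors)))."""
    rescaled = []
    for row in norm_factors:
        geomean = math.exp(sum(math.log(v) for v in row) / len(row))
        rescaled.append([v / geomean for v in row])
    return rescaled

# Hypothetical layout: 2 mutants (rows) x 4 samples (columns).
n_samples = 4
norm_factors = [
    [1.0] * n_samples,  # a Region A mutant
    [0.2] * n_samples,  # a Region B mutant
]

rescaled = rescale_rows_to_unit_geomean(norm_factors)
for row in rescaled:
    # Round for display; values may differ from 1 by floating-point error.
    print([round(v, 6) for v in row])
```

Because every entry in a row is identical, the row's geometric mean equals that entry, so dividing by it leaves 1 (up to floating-point rounding) in both the 0.2 rows and the 1.0 rows.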

So my questions are:

1. Does my normalization factor approach for uneven coverage make sense, or am I way off here?

2. If it makes sense, is it okay that I don't "provide a matrix with row-wise geometric means of 1"?

3. If it makes no sense, what would be a better way of correcting for the uneven sequencing coverage?

Thanks!

Thomas

deseq2 normalization • 1.1k views

ADD COMMENT • link updated 7.3 years ago by Michael Love 43k • written 7.3 years ago by Thomas_WILLEMS • 0

Are you sure it makes sense to run DESeq2 on this data? Isn't this data better suited to fitting a logistic regression model for each mutation?

ADD REPLY • link 7.3 years ago Ryan C. Thompson ★ 7.9k

score 0 · Answer 1 · 2018-10-04

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

I don’t follow the data setup. How many rows and columns? How many replicates? Are the rows to be treated independently (like “genes”)?

ADD COMMENT • link 7.3 years ago Michael Love 43k