DESeq2 normalization factor matrix to correct for uneven targeted sequencing coverage
@thomas_willems-17632

Hi everyone,

I'm trying to use DESeq2 to analyze some mutagenesis data and identify variants that confer a selective advantage. To do so, I'm comparing the counts of thousands of mutants before and after selection. 

At a high level, our experiment examined mutants in 2 different regions of a protein. We used 2 sets of primers and PCR amplification to amplify each region and then performed NGS sequencing. A mutant's count is then just the number of reads that contain the mutation of interest.

One issue I'm having is that the total number of reads for the two regions isn't equal: I have 5x as many reads for Region A as for Region B. Because of Poisson sampling, my mutant frequencies for Region A are therefore much more precise.
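To put a rough number on "more precise": for a Poisson count with mean n, the relative standard deviation is 1/sqrt(n), so a 5x difference in depth corresponds to about a 2.2x difference in relative precision. The counts below are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the actual experiment.

```r
# Hypothetical expected counts for the same mutant frequency at 5x different depth
count_a <- 500  # Region A (assumed value, for illustration only)
count_b <- 100  # Region B (assumed value, 5x fewer reads)

# Relative standard deviation (coefficient of variation) of a Poisson count
cv_a <- 1 / sqrt(count_a)
cv_b <- 1 / sqrt(count_b)

# Region B's counts are sqrt(5) times noisier in relative terms
precision_ratio <- cv_b / cv_a
print(precision_ratio)
```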

I want to explicitly account for the difference in sequencing coverage (instead of downsampling Region A's data) using DESeq2's gene-dependent normalization factors. One idea I had was to assign a factor of 1 to all of Region A's mutants and a factor of 0.2 to all of Region B's mutants to account for the 5x difference in regional coverage. So my normalization factor matrix would contain rows of all 0.2s or all 1.0s, depending on which region each mutant came from.

At a high level, this seems like it could work, but the DESeq2 manual says "we recommend providing a matrix with row-wise geometric means of 1", which can be done with:    normFactors <- normFactors / exp(rowMeans(log(normFactors)))

In my case, doing so would cancel out exactly the correction I'm trying to apply and convert the matrix to all 1s.
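The cancellation can be checked with a minimal base-R sketch. The toy dimensions (4 mutants by 4 samples) and the names `region` and `per_region_factor` are hypothetical; only the rescaling line comes from the DESeq2 manual recommendation quoted in this post.

```r
# Hypothetical toy layout: 4 mutants (rows), 4 samples (columns)
region <- c("A", "A", "B", "B")
per_region_factor <- c(A = 1.0, B = 0.2)

# Each row is constant: all 1.0 for Region A mutants, all 0.2 for Region B mutants
normFactors <- matrix(
  per_region_factor[region],
  nrow = length(region), ncol = 4,
  dimnames = list(paste0("mutant", 1:4), paste0("sample", 1:4))
)

# Rescaling so that every row has a geometric mean of 1
rescaled <- normFactors / exp(rowMeans(log(normFactors)))

# A constant row's geometric mean is that constant, so every entry becomes 1
print(rescaled)
```

Because each row is constant across samples, dividing by the row-wise geometric mean removes the per-region factor entirely, which is the behavior described above.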

 

So my questions are:

1. Does my normalization factor approach for uneven coverage make sense, or am I way off here?

2. If it makes sense, is it okay that I don't "provide a matrix with row-wise geometric means of 1"?

3. If it makes no sense, what would be a better way of correcting for the uneven sequencing coverage?

 

Thanks!

Thomas


Are you sure it makes sense to run DESeq2 on this data? Isn't this data better suited to fitting a logistic regression model for each mutation?

@mikelove

I don’t follow the data setup. How many rows and columns? How many replicates? Are the rows to be treated independently (like “genes”)?
