Question: DESeq2 bias when genes are missing from the annotation?
corend wrote (6 days ago):

As this concerns bioinformatics in general, I also posted here.

I am working on RNA-seq data. I built my count table with kallisto, then used tximport to import it into DESeq2.

My reference is a set of cDNAs that is supposed to cover all the genes of my species, but the annotation is quite poor: when I align to these cDNAs, only about 60% of reads map, compared with 95% against the whole genome.

I have 2 conditions (A and B) with 3 replicates each.

My concern is this: if a gene is over-expressed in A, not expressed in B, and missing from my cDNA list, I expect to get fewer mapped reads in A than in B. Could DESeq2's normalization then introduce a bias?

Example (each number is one read, labeled by the gene it comes from):

A: 1 1 1 1 2 2 2 2 3 3

B: 1 1 1 1 2 3 3 3 3 3

Gene 3 is not annotated, so after normalization by DESeq2:

A: 1 1 1 1 1 2 2 2 2 2

B: 1 1 1 1 1 1 1 1 2 2

Gene 1 now looks over-expressed in B, but that is not true.
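
For reference, the arithmetic in this example is what scaling every sample to the same total number of mapped reads would give. A minimal Python sketch that reproduces it is shown here. The helper name `total_count_scale` and the common depth of 10 reads are assumptions made for illustration only; this is not code from DESeq2, kallisto or tximport.

```python
from collections import Counter

# Reads labeled by gene of origin, copied from the example in the post.
reads_a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
reads_b = [1, 1, 1, 1, 2, 3, 3, 3, 3, 3]


def total_count_scale(reads, annotated, depth=10):
    """Count reads per annotated gene, then rescale to a common depth.

    Hypothetical helper: plain total-count scaling, used only to
    reproduce the arithmetic of the example.
    """
    counts = Counter(r for r in reads if r in annotated)
    total = sum(counts.values())
    return {gene: depth * counts[gene] / total for gene in sorted(annotated)}


annotated = {1, 2}  # gene 3 is missing from the cDNA set
scaled_a = total_count_scale(reads_a, annotated)
scaled_b = total_count_scale(reads_b, annotated)

print(scaled_a)  # {1: 5.0, 2: 5.0}
print(scaled_b)  # {1: 8.0, 2: 2.0}
```

Gene 1 has 4 reads in both conditions, yet after rescaling it goes from 5 in A to 8 in B, which is the artifact described in the post.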

How can I deal with this kind of problem?

Should I add a line in my table with "unmapped reads" to have a better normalization?


Do you expect or observe that the proportion of unmapped reads is different across groups or samples? 

written 6 days ago by Sean Davis

Yes indeed, I map 65% of my reads in condition B and 55% in condition A.

written 6 days ago by corend

And how about at the genomic level? 

written 5 days ago by Sean Davis

90% condition B

93% condition A

 

written 5 days ago by corend
Michael Love wrote (5 days ago):

If I understand your question correctly, you are assuming that DESeq2 uses total count normalization, but it does not. DESeq2 (like every other Bioconductor method I can think of) uses a robust method to estimate the scaling factor for each sample. You can read about this scaling method ("median ratio" normalization) in the DESeq2 paper.
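
To make the difference from total count scaling concrete, here is a simplified Python sketch of the median-ratio idea. It is an illustration under stated assumptions, not DESeq2's actual code: in R the size factors come from `estimateSizeFactors()`. The function name `median_ratio_size_factors` and the toy counts are hypothetical. The toy data has four genes that do not change and one gene that dominates sample B, standing in for a highly expressed gene that differs between conditions.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: list of samples, each a list of gene counts in the same order.
    Simplified sketch; genes with a zero count in any sample are skipped.
    """
    n_genes = len(counts[0])

    # Log geometric mean of each gene across samples (None if any count is 0).
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = exp(median over genes of log(count / geometric mean)).
    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy counts: genes 1-4 unchanged, gene 5 dominates sample B.
sample_a = [100, 200, 300, 400, 50]
sample_b = [100, 200, 300, 400, 5000]

size_factors = median_ratio_size_factors([sample_a, sample_b])
totals = [sum(sample_a), sum(sample_b)]

print(size_factors)  # both factors are 1.0: the four stable genes set the median
print(totals)        # [1050, 6000]: total counts differ almost sixfold
```

Because the median is taken over genes, one gene with a very different count (or one that is absent from the table altogether) does not move the size factor as long as most genes are unchanged. Total count scaling, by contrast, would shrink every gene in sample B by the ratio of the totals.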
