Question

How to deal with contaminating transcripts in TxImport/DESeq2?

bruno.saubamea • 0

@brunosaubamea-13693

Dear all,

I am running a DGE analysis (sample A vs sample B) using Salmon, then tximport (without scaling), then DESeq2. Everything works fine, but sample A is contaminated with red blood cells, so half of its total read counts map to globin transcripts. As a result, the TPM/count values in sample A and the FC values (A vs B) are artificially lowered.

My question: can I manually remove the contaminating transcripts (fewer than 10 transcripts, easy to identify) from the tximport output before loading it into DESeq2? If so, should I use scaledTPM values?

Thank you for your help.

Bruno

deseq2 salmon tximport • 1.4k views

score 0 · Answer 1 · 2017-08-09

The point of size factor normalization in DESeq2 and other RNA-seq software is that, even though the counts for non-globin genes are lower in the blood samples, the median of ratios over all genes balances this out (see the DESeq2 paper for how size factors are estimated). You can look at an MA plot to confirm that the normalization "works": the globin genes should be "up" in blood, and the rest of the distribution should be roughly centered on y=0. I would not recommend removing transcripts. Dealing with this kind of composition effect is what normalization is for.
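To make the median-of-ratios idea concrete, here is a minimal base R sketch. The count matrix is invented for illustration (it is not Bruno's data), and `size_factors` is a hypothetical helper that follows the published median-of-ratios recipe rather than calling DESeq2 itself.

```r
# Toy counts, invented for illustration: rows are genes, columns are samples.
# Half of the reads in sample A come from two "globin" genes.
counts <- rbind(
  globin1 = c(A = 3000, B = 10),
  globin2 = c(A = 2000, B = 10),
  gene1   = c(A = 500,  B = 1000),
  gene2   = c(A = 1000, B = 2000),
  gene3   = c(A = 1500, B = 3000),
  gene4   = c(A = 2000, B = 4000)
)

# Hypothetical helper: median-of-ratios size factors.
# Each sample is compared to the per-gene geometric mean across samples,
# and the median of those ratios is taken as the sample's size factor.
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  apply(m, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

sf <- size_factors(counts)

# Divide every column by its size factor, then compare A to B.
normalized <- sweep(counts, 2, sf, "/")
log2fc <- log2(normalized[, "A"] / normalized[, "B"])
print(round(log2fc, 2))
```

In this toy case the globin rows come out strongly "up" in A while the other genes sit at a log2 fold change of zero, because the median ignores the few extreme ratios. This only holds while the contaminating genes are a minority of the rows.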

score 0 · Answer 2 · 2017-08-09

OK. Just to make sure that I understand correctly, I reanalyzed the data by doing the following:

- remove the 8 transcripts corresponding to globin genes from the Salmon output files (quant.sf)

- rerun tximport with the scaledTPM option

txi <- tximport(files, type="salmon", countsFromAbundance="scaledTPM", tx2gene=tx2gene)

- load the rounded txi$counts into DESeq2

dds <- DESeqDataSetFromMatrix(round(txi$counts), sampleTable, ~condition)

The results:

- TXI count values are almost unchanged in the non-blood sample and almost doubled in the blood sample (which seems OK, since blood genes made up about 50% of the total read counts)

- FC values, however, are almost unchanged (quite unexpected to me!)

Is this related to your comment about the way normalization is made in DESeq2?
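One way to see why the fold changes barely move is the toy base R sketch below. Everything in it is an assumption made for illustration: the counts are invented, `size_factors` and `norm_log2fc` are hypothetical helpers, and the column rescaling is only a rough stand-in for the doubling of counts reported above, not a reimplementation of tximport's scaledTPM.

```r
# Hypothetical helper: median-of-ratios size factors.
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  apply(m, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

# Hypothetical helper: log2 fold change (A vs B) after size factor normalization.
norm_log2fc <- function(m) {
  n <- sweep(m, 2, size_factors(m), "/")
  log2(n[, "A"] / n[, "B"])
}

# Toy counts, invented for illustration; globin takes about half of sample A.
counts <- rbind(
  globin1 = c(A = 3000, B = 10),
  globin2 = c(A = 2000, B = 10),
  gene1   = c(A = 500,  B = 1000),
  gene2   = c(A = 1000, B = 2000),
  gene3   = c(A = 1500, B = 3000),
  gene4   = c(A = 2000, B = 4000),
  gene5   = c(A = 800,  B = 400)
)

# Drop the globin rows, then scale each column back up to its original total,
# so the remaining counts in A roughly double (as observed in the post).
filtered <- counts[!grepl("^globin", rownames(counts)), ]
rescaled <- sweep(filtered, 2, colSums(counts) / colSums(filtered), "*")

fc_before <- norm_log2fc(counts)[rownames(filtered)]
fc_after  <- norm_log2fc(rescaled)

print(round(cbind(before = fc_before, after = fc_after), 3))
```

In this sketch the rescaling multiplies every remaining count in a sample by the same constant, and the size factor for that sample absorbs the constant, so the normalized fold changes for the non-globin genes come out the same with or without the globin rows.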

score 0 · Answer 3 · 2017-08-10

bruno.saubamea • 0

Thank you for these explanations Michael!
