How to deal with contaminating transcripts in TxImport/DESeq2?
@brunosaubamea-13693

Dear all,

I am doing a DGE analysis (sample A vs. sample B) using Salmon, then tximport (without scaling), then DESeq2. Everything works fine, but sample A is contaminated with red blood cells, so about half of its total read counts map to globin transcripts. As a result, TPM/count values in sample A and FC values (A vs. B) are artificially lowered.

My question: can I manually remove the contaminating transcripts (fewer than 10 transcripts, easy to identify) from the tximport output before loading it into DESeq2? If so, should I use scaledTPM values?

Thank you for your help.

Bruno


 

deseq2 salmon tximport • 1.6k views
@mikelove

The point of size factor normalization in DESeq2 and other RNA-seq software is that, even though the counts for non-globin genes are lower in the blood samples, the median of ratios over all genes will balance this out (see the DESeq2 paper for how size factor normalization is accomplished). You can look at an MA plot to confirm that the normalization "works": the globin genes should be "up" in blood, and the rest of the distribution should be roughly centered on y=0. I would not recommend removing transcripts; dealing with this is exactly what normalization is for.
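
To make the median-of-ratios argument concrete, here is a minimal base-R sketch on made-up data. It is not DESeq2's own code (in DESeq2 the size factors come from `estimateSizeFactors`); the toy counts, the eight "globin" rows, and the helper name `median_of_ratios` are all assumptions for illustration. Sample A has the same sequencing depth as B, but half of its reads go to globin.

```r
# Toy two-sample example; every name and number here is hypothetical.
n_other  <- 992
other_B  <- seq(10, 2000, length.out = n_other)  # non-globin genes, sample B
globin_B <- rep(5, 8)                            # globin barely present in B
B   <- c(globin_B, other_B)
lib <- sum(B)                                    # same depth for A and B

# Sample A: globin takes half of the reads, so every other gene is squeezed.
shrink <- (lib / 2) / sum(other_B)               # about 0.5
A <- c(rep(lib / 2 / 8, 8), other_B * shrink)

counts <- cbind(A = A, B = B)
rownames(counts) <- c(paste0("globin", 1:8), paste0("gene", seq_len(n_other)))
is_globin <- grepl("^globin", rownames(counts))

# Median-of-ratios size factors: for each sample, the median over genes of
# count / (geometric mean of that gene across samples).
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  apply(counts, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

sf         <- median_of_ratios(counts)
normalized <- sweep(counts, 2, sf, "/")
lfc_sf     <- log2(normalized[, "A"] / normalized[, "B"])

# For comparison: scaling by total counts leaves non-globin genes "down" in A.
total_scaled <- sweep(counts, 2, colSums(counts) / mean(colSums(counts)), "/")
lfc_total    <- log2(total_scaled[, "A"] / total_scaled[, "B"])

print(sf)
print(summary(lfc_sf[!is_globin]))     # centered on 0
print(summary(lfc_total[!is_globin]))  # shifted to about -1
```

Because the median is taken over all genes and only a handful are globin, the size factor for A follows the bulk of the genes rather than the total read count. In this toy setup the non-globin log2 fold changes sit at 0 after size-factor normalization, while the globin rows stay strongly "up", which is the pattern to look for in the MA plot.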

@brunosaubamea-13693

OK. Just to make sure that I understand correctly, I reanalyzed the data as follows:

- remove the 8 transcripts corresponding to globin genes from the Salmon output files (quant.sf)

- rerun tximport with the scaledTPM option

txi <- tximport(files, type="salmon", countsFromAbundance="scaledTPM", tx2gene=tx2gene)

- load the rounded txi$counts into DESeq2

dds <- DESeqDataSetFromMatrix(round(txi$counts), sampleTable, ~condition)

 

The results:

- txi count values are almost unchanged in the non-blood sample and almost doubled in the blood sample (which seems OK, since globin genes made up about 50% of the total read counts)

- FC values, however, are almost unchanged (quite unexpected to me!)

Is this related to your comment about the way normalization is made in DESeq2?

 

Yes
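
A small base-R sketch of why the fold changes barely move, again on made-up data. The rescaling step below is only a rough stand-in for the reported effect (counts roughly doubled in the blood sample after dropping globin); it is an assumption for illustration, not a description of what tximport does internally, and `median_of_ratios` and `norm_log2fc` are hypothetical helper names.

```r
# Hypothetical toy data: sample A loses half of its reads to 8 globin genes.
n_other <- 992
other_B <- seq(10, 2000, length.out = n_other)
B       <- c(rep(5, 8), other_B)
lib     <- sum(B)
A       <- c(rep(lib / 2 / 8, 8), other_B * (lib / 2) / sum(other_B))
counts  <- cbind(A = A, B = B)
rownames(counts) <- c(paste0("globin", 1:8), paste0("gene", seq_len(n_other)))

# Median-of-ratios size factors (same idea as DESeq2's normalization).
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  apply(counts, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

# log2 fold change A vs. B after dividing each sample by its size factor.
norm_log2fc <- function(counts) {
  normalized <- sweep(counts, 2, median_of_ratios(counts), "/")
  log2(normalized[, "A"] / normalized[, "B"])
}

# Analysis 1: keep everything and let the size factors handle globin.
keep     <- !grepl("^globin", rownames(counts))
lfc_full <- norm_log2fc(counts)[keep]

# Analysis 2: drop globin, then rescale each sample back to its original
# library size (assumed stand-in for the "almost doubled" counts in A).
filtered <- counts[keep, ]
filtered <- sweep(filtered, 2, colSums(counts) / colSums(filtered), "*")
lfc_filtered <- norm_log2fc(filtered)

fold_increase_A <- filtered[, "A"] / counts[keep, "A"]  # counts in A about 2x
max_lfc_diff    <- max(abs(lfc_full - lfc_filtered))    # fold changes agree

print(summary(fold_increase_A))
print(max_lfc_diff)
```

Multiplying every count in a sample by the same constant multiplies that sample's size factor by the same constant, so the normalized values, and therefore the fold changes, come out the same either way.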
@brunosaubamea-13693

Thank you for these explanations, Michael!
