Question: How to deal with contaminating transcripts in TxImport/DESeq2?
21 months ago by
bruno.saubamea0 wrote:

Dear all,

I am doing a DGE analysis (sample A vs sample B) using Salmon, then tximport (without scaling), then DESeq2. Everything works fine, but my problem is that sample A is contaminated with red blood cells, so that half of the total read counts map to globin transcripts. As a result, TPM/count values in sample A and FC values (A vs B) are artificially lowered.

My question: can I manually remove the contaminating transcripts (fewer than 10 transcripts, easy to identify) from the tximport (txi) output before loading it into DESeq2? If so, should I use scaledTPM values?

Bruno

deseq2 salmon tximport • 690 views
Answer: How to deal with contaminating transcripts in TxImport/DESeq2?
21 months ago by
Michael Love wrote:

The point of size factor normalization in DESeq2 and other RNA-seq software is that, even though the counts for non-globin genes are lower in the blood samples, the median of ratios over all genes balances this out (see the DESeq2 paper for how size factors are estimated). You can look at an MA plot to confirm that the normalization "works": the globin genes should be "up" in blood, and the rest of the distribution should be roughly centered on y=0. I would not recommend removing transcripts. Dealing with this is the point of normalization.
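
To make the median-of-ratios argument concrete, here is a minimal base R sketch on made-up numbers (not the poster's data). It assumes two samples with identical underlying expression, where 8 hypothetical "globin" genes absorb half of the reads in sample A. The size factor calculation is written out by hand for illustration and is not DESeq2's own code; the names `base_expr`, `globin`, and `size_factors` are invented for this example.

```r
# Toy data: 1000 genes, identical true expression in both samples.
n_genes <- 1000
base_expr <- seq(10, 1000, length.out = n_genes)
globin <- 1:8                      # hypothetical contaminating genes

B <- base_expr                     # clean sample
A <- 0.5 * base_expr               # non-globin genes get half the reads
A[globin] <- 0.5 * sum(base_expr) / length(globin)  # globin takes the other half
counts <- cbind(A = A, B = B)

# Median-of-ratios size factors: per-sample median of the ratio of each
# gene's count to that gene's geometric mean across samples.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_means)))

# Normalized counts and the two MA-plot coordinates.
norm_counts <- sweep(counts, 2, size_factors, "/")
M <- log2(norm_counts[, "A"] / norm_counts[, "B"])  # log2 fold change, A vs B
A_mean <- rowMeans(log2(norm_counts))               # mean log2 expression

print(size_factors)
print(median(M[-globin]))   # non-globin genes: centered on 0
print(range(M[globin]))     # globin genes: strongly "up" in A
```

Because only 8 of 1000 genes are affected, the median ratio is set by the non-globin genes, so their fold changes are centered on zero after normalization while the globin genes stand out as "up".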

Answer: How to deal with contaminating transcripts in TxImport/DESeq2?
21 months ago by
bruno.saubamea0 wrote:

OK. Just to make sure that I understand correctly, I reanalyzed the data by doing the following:

- remove the 8 transcripts corresponding to globin genes from Salmon output files (quant.sf)

- rerun tximport with the scaledTPM option

txi <- tximport(files, type="salmon", countsFromAbundance="scaledTPM", tx2gene=tx2gene)

- load the rounded txi$counts in DESeq2

dds <- DESeqDataSetFromMatrix(round(txi$counts), sampleTable, ~condition)

The results:

- txi count values are almost unchanged in the non-blood sample and almost doubled in the blood sample (which seems OK, since globin genes made up about 50% of the total read counts)

- FC values are, however, almost unchanged (quite unexpected to me!)
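
One possible way to see why the fold changes barely move: removing the globin rows and rescaling sample A multiplies every remaining gene in A by roughly the same constant, and the size factor absorbs that constant. The base R sketch below uses simulated values, not the data from this thread. `median_ratio_sf` and `normalized_lfc` are hypothetical helpers written for this illustration, and the factor of 2 is only a stand-in for the effect of scaledTPM after removal.

```r
# Hypothetical helper: median-of-ratios size factors, written out in base R.
median_ratio_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(x) exp(median(log(x) - log_geo_means)))
}

# Hypothetical helper: log2 fold change (A vs B) after size factor normalization.
normalized_lfc <- function(counts) {
  norm_counts <- sweep(counts, 2, median_ratio_sf(counts), "/")
  log2(norm_counts[, "A"] / norm_counts[, "B"])
}

# Simulated data: 8 "globin" genes take about half of the reads in sample A.
set.seed(42)
n_genes <- 1000
globin <- 1:8
B <- rgamma(n_genes, shape = 2, scale = 200) + 1
true_fc <- 2^rnorm(n_genes, sd = 0.5)          # genuine A vs B differences
A <- 0.5 * B * true_fc                         # everything else is halved in A
A[globin] <- 0.5 * sum(B) / length(globin)
counts <- cbind(A = A, B = B)

# Analysis 1: keep every gene and let normalization handle the globin reads.
lfc_all <- normalized_lfc(counts)[-globin]

# Analysis 2: drop the globin rows, then rescale A (assumed factor of 2).
kept <- counts[-globin, ]
kept[, "A"] <- 2 * kept[, "A"]
lfc_removed <- normalized_lfc(kept)

# Largest difference in log2 fold change between the two analyses.
print(max(abs(lfc_all - lfc_removed)))
```

In this sketch the two sets of fold changes differ only by a small constant shift, which comes from the 8 globin genes nudging the median ratio slightly when they are left in.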

Yes
Answer: How to deal with contaminating transcripts in TxImport/DESeq2?
21 months ago by
bruno.saubamea0 wrote:

Thank you for these explanations Michael!