Question: How to deal with contaminating transcripts in TxImport/DESeq2?
3 months ago, bruno.saubamea wrote:

Dear all,

I am doing a DGE analysis (sample A vs sample B) using Salmon, then tximport (without scaling), then DESeq2. Everything works fine, but sample A is contaminated with red blood cells, so that half of the total read counts map to globin transcripts. As a result, the TPM/count values in sample A and the FC values (A vs B) are artificially lowered.

My question: can I manually remove the contaminating transcripts (fewer than 10 transcripts, easy to identify) from the tximport output before loading it into DESeq2? If so, should I use scaledTPM values?

Thank you for your help.

Bruno


 

3 months ago, Michael Love (United States) wrote:

Size factor normalization in DESeq2 (and the comparable normalization in other RNA-seq software) exists for exactly this case: even though the counts for non-globin genes are lower in the blood samples, the median of ratios over all genes balances this out (see the DESeq2 paper for how size factors are estimated). You can look at an MA plot to confirm that the normalization "works": the globin genes should be "up" in blood, and the rest of the distribution should be roughly centered on y=0. I would not recommend removing transcripts. Dealing with this is the point of normalization.
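
The median-of-ratios argument can be illustrated with a minimal base-R sketch on simulated counts. Everything in it is an assumption made for illustration: the number of genes, the 50% globin fraction, the use of HBB and HBA1 as placeholder globin labels, and the hand-written size_factors() helper, which follows the median-of-ratios idea rather than calling DESeq2 itself.

```r
set.seed(1)

# Simulated "true" expression, identical in both samples (illustrative values only).
n_genes <- 1000
expr <- round(rgamma(n_genes, shape = 2, scale = 200)) + 10
total <- sum(expr)

# Sample B is clean. In sample A, globin takes about half of the reads,
# so every other gene receives about half the counts it would otherwise get.
counts <- cbind(A = round(expr * 0.5), B = expr)
rownames(counts) <- paste0("gene", seq_len(n_genes))
globin <- rbind(HBB  = c(A = round(0.3 * total), B = 50),
                HBA1 = c(A = round(0.2 * total), B = 30))
counts <- rbind(counts, globin)

# Hypothetical helper: median-of-ratios size factors, written by hand.
# Each sample is compared to the per-gene geometric mean across samples,
# and the median ratio over genes is taken as that sample's size factor.
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  keep <- is.finite(log_geo_mean)  # skip genes with a zero count in any sample
  apply(m, 2, function(col) exp(median((log(col) - log_geo_mean)[keep])))
}

sf <- size_factors(counts)
lib_ratio <- sum(counts[, "A"]) / sum(counts[, "B"])  # total-count ratio A/B
sf_ratio <- unname(sf["A"] / sf["B"])                 # size-factor ratio A/B

# Log2 fold changes (A vs B) after dividing each sample by its size factor.
normalized <- sweep(counts, 2, sf, "/")
lfc <- log2(normalized[, "A"] / normalized[, "B"])

cat("library size ratio A/B:", round(lib_ratio, 2), "\n")
cat("size factor ratio A/B: ", round(sf_ratio, 2), "\n")
cat("median log2 FC, non-globin genes:", round(median(lfc[seq_len(n_genes)]), 2), "\n")
print(round(lfc[c("HBB", "HBA1")], 1))
```

In this simulation the two libraries have nearly the same total size, so scaling by library size alone would leave every non-globin gene looking about 2-fold down in A. The median of ratios instead lands near 0.5, which recenters the non-globin genes around a log2 fold change of 0 and leaves only the globin genes strongly "up".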

3 months ago, bruno.saubamea wrote:

OK. Just to make sure that I understand correctly, I reanalyzed the data by doing the following:

- remove the 8 transcripts corresponding to globin genes from the Salmon output files (quant.sf)

- rerun tximport with the scaledTPM option

library(tximport)
txi <- tximport(files, type="salmon", countsFromAbundance="scaledTPM", tx2gene=tx2gene)

- load the rounded txi$counts into DESeq2

library(DESeq2)
dds <- DESeqDataSetFromMatrix(round(txi$counts), sampleTable, ~condition)

 

The results:

- the tximport count values are almost unchanged in the non-blood sample and almost doubled in the blood sample (which seems OK, since blood genes made up about 50% of the total read counts)

- the FC values, however, are almost unchanged (quite unexpected to me!)

Is this related to your comment about the way normalization is made in DESeq2?

 

3 months ago, Michael Love replied:

Yes
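
The unchanged fold changes can be reproduced with a small base-R simulation. This is only a sketch under stated assumptions: the counts are made up, the globin rows are placeholders, doubling the blood sample's counts stands in for what scaledTPM does once the globin transcripts are gone, and the fold change is a plain ratio of normalized counts rather than DESeq2's model-based estimate.

```r
set.seed(2)

# Simulated expression (illustrative values only); some genes are truly DE.
n_genes <- 500
expr_B <- round(rgamma(n_genes, shape = 2, scale = 200)) + 10
true_fc <- sample(c(0.5, 1, 1, 1, 2), n_genes, replace = TRUE)
expr_A <- expr_B * true_fc

# Hypothetical helper: median-of-ratios size factors, written by hand.
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  keep <- is.finite(log_geo_mean)
  apply(m, 2, function(col) exp(median((log(col) - log_geo_mean)[keep])))
}

# Log2 fold change A vs B after size-factor normalization.
log2_fc <- function(m) {
  norm <- sweep(m, 2, size_factors(m), "/")
  log2(norm[, "A"] / norm[, "B"])
}

# Case 1: globin kept. It takes half of sample A's reads, halving all other genes.
with_globin <- rbind(
  cbind(A = expr_A * 0.5, B = expr_B),
  cbind(A = c(0.3, 0.2) * sum(expr_A), B = c(50, 30))  # two placeholder globin rows
)

# Case 2: globin removed and sample A rescaled, so its counts roughly double.
without_globin <- cbind(A = expr_A, B = expr_B)

sf_with <- size_factors(with_globin)
sf_without <- size_factors(without_globin)

lfc_with <- log2_fc(with_globin)[seq_len(n_genes)]  # non-globin genes only
lfc_without <- log2_fc(without_globin)

cat("size factor ratio A/B, globin kept:   ", round(sf_with["A"] / sf_with["B"], 2), "\n")
cat("size factor ratio A/B, globin removed:", round(sf_without["A"] / sf_without["B"], 2), "\n")
cat("largest difference in log2 FC:", signif(max(abs(lfc_with - lfc_without)), 2), "\n")
```

Removing globin doubles the blood sample's counts, but the size factor for that sample doubles with them, so the normalized counts and the fold changes of the remaining genes come out the same either way.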

3 months ago, bruno.saubamea wrote:

Thank you for these explanations Michael!
