Question

subtracting or normalizing RNAseq reads when samples have contaminating reads


Yuqia • 0

@yuqia-15072

Last seen 3.8 years ago

Switzerland

Hi everyone,

I have RNAseq data from cells sorted out of mixed cultures of 2 cell types, as well as from pure cultures containing only one of the 2 types. Read counts of transcripts expressed specifically in only one of the 2 cell types show that the sorted sample of one type always contains a small amount of contamination from the other type, and vice versa. The amount of contaminating reads varies from one replicate to another. Western blots confirm this contamination.

How do I normalize the read counts of all other genes given this variable contamination? Does DESeq2, edgeR, or any other software have a way to deal with this problem? If so, could you please point me to the correct way of doing it?

Thank you very much!

Normalization RNASeq contamination edgeR DESeq2 • 1.5k views

ADD COMMENT • link 4.2 years ago Yuqia • 0

score 0 · Answer 1 · 2021-04-29


Michael Love 43k

@mikelove

Last seen 14 days ago

United States

In DESeq2, if you want to control sample A_i for amounts in sample B_i, you can use a design with a term for i, e.g. as is done with ~donor + condition.
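As a rough illustration of what "a term for i" contributes, the plain-Python sketch below expands a ~donor + condition style design into a dummy-coded matrix, in which each donor (the pairing index i) gets its own column next to the condition effect. The sample sheet, donor labels and the `design_matrix` helper are hypothetical and only mimic treatment coding with the first level as reference; this is not DESeq2 code.

```python
# Hypothetical paired sample sheet: each donor contributes one control and one treated sample.
samples = [
    ("d1", "control"), ("d1", "treated"),
    ("d2", "control"), ("d2", "treated"),
    ("d3", "control"), ("d3", "treated"),
]

def design_matrix(samples):
    """Dummy-code donor and condition, using the first (sorted) level of each as reference."""
    donors = sorted({donor for donor, _ in samples})
    conditions = sorted({condition for _, condition in samples})
    columns = (
        ["Intercept"]
        + [f"donor_{d}" for d in donors[1:]]
        + [f"condition_{c}" for c in conditions[1:]]
    )
    rows = []
    for donor, condition in samples:
        row = [1]  # intercept
        row += [int(donor == d) for d in donors[1:]]  # one indicator per non-reference donor
        row += [int(condition == c) for c in conditions[1:]]  # condition effect
        rows.append(row)
    return columns, rows

columns, rows = design_matrix(samples)
print(columns)
for (donor, condition), row in zip(samples, rows):
    print(donor, condition, row)
```

The donor columns absorb differences shared by both samples of a pair, so the condition column is estimated from within-pair comparisons.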

ADD COMMENT • link 4.2 years ago Michael Love 43k


Hi Michael,

Thank you for your answer. However, I'm not sure what "i" refers to in your example.

To make my question more specific, below is an example of the raw read counts for 2 cell-specific genes in 2 biological replicates of each sample (I have 3 replicates and a time course, but want to simplify the example):

gene alpha: expressed only in cell type A

gene beta: expressed only in cell type B

sample     A.1    A.2     Amix.1    Amix.2    B.1    B.2     Bmix.1    Bmix.2
alpha      2116   145     2091      179       0      0       45        13
beta       1      0       278       12754     1609   38043   1648      80947

In Bmix samples, the alpha counts are contamination from A cells. In Amix samples, the beta counts are contamination from B cells.

Since I have 2 batches (batch1: rep1, batch2: rep2+rep3), I have included batch in the design: ~batch + condition
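To make the replicate-to-replicate variability concrete, here is a small sketch that divides the contaminating marker count in each sorted sample by the same marker's count in the pure culture of the other cell type from the same replicate. It uses only the raw counts from the table above. The `marker_ratio` helper is hypothetical, and the ratio is only a rough index: it assumes comparable library sizes across samples, which raw counts do not guarantee, and it is not a correction method.

```python
# Raw counts copied from the table above (2 replicates per sample).
counts = {
    "alpha": {"A.1": 2116, "A.2": 145, "Amix.1": 2091, "Amix.2": 179,
              "B.1": 0, "B.2": 0, "Bmix.1": 45, "Bmix.2": 13},
    "beta": {"A.1": 1, "A.2": 0, "Amix.1": 278, "Amix.2": 12754,
             "B.1": 1609, "B.2": 38043, "Bmix.1": 1648, "Bmix.2": 80947},
}

def marker_ratio(marker, sorted_sample, pure_sample):
    """Marker count in a sorted sample relative to the pure culture that expresses the marker.

    Assumption: library sizes are comparable, so raw counts can be compared directly.
    """
    return counts[marker][sorted_sample] / counts[marker][pure_sample]

ratios = {}
for rep in ("1", "2"):
    # beta marks B cells, so beta reads in Amix are contamination; compare with pure B.
    ratios[f"Amix.{rep}"] = marker_ratio("beta", f"Amix.{rep}", f"B.{rep}")
    # alpha marks A cells, so alpha reads in Bmix are contamination; compare with pure A.
    ratios[f"Bmix.{rep}"] = marker_ratio("alpha", f"Bmix.{rep}", f"A.{rep}")

for sample, ratio in ratios.items():
    print(f"{sample}: {ratio:.3f}")
```

With these counts the index differs between replicates for both sorted populations, which matches the variable contamination described in the question.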

The conditions of this design are A, Amix, B, and Bmix. I'd like to obtain differentially expressed genes of Amix vs A, Bmix vs. B, and Amix vs Bmix.

1) Should I add two more terms to the design ('alpha' and 'beta') as columns in the colData file? If so, should all mixed samples be "yes" for both alpha and beta, and pure samples "yes" only for the corresponding cell-specific gene? Would the correct design formula then be ~batch + alpha + beta + condition?

2) If I decide not to look at Amix vs. Bmix, is it better to do 2 separate analyses, one for Amix vs. A and one for Bmix vs. B, using 2 separate count tables?

Thanks a lot in advance for your reply!

ADD REPLY • link 4.2 years ago Yuqia • 0


For selecting the design to use with DESeq2, I'd recommend working with a statistician. There are many choices here, and they reflect assumptions about your experiment. Unfortunately, I don't have time right now to help with these decisions on the support site.