subtracting or normalizing RNAseq reads when samples have contaminating reads
Yuqia • 0
@yuqia-15072

Hi everyone,

I have RNAseq data from cells sorted from mixed cultures of 2 cell types, as well as from pure cultures of only one of these 2 types. Read counts of transcripts expressed specifically in only one of the 2 cell types show that a sorted sample of one type always contains a small amount of contamination from the other type, and vice versa. The amount of contaminating reads varies from one replicate to another. Western blots confirm this contamination.

How do I normalize the transcript read counts for all other genes with this variable contamination? Does DESeq2, edgeR or any other software have a way to deal with this problem? If so, could you please point to the correct way of doing it?

Thank you very much!

Normalization RNASeq contamination edgeR DESeq2 • 1.0k views
@mikelove

In DESeq2, if you want to control sample A_i for the amounts in sample B_i, you can use a design with a term for i, as is done for example with ~donor + condition.


Hi Michael,

Thank you for your answer. However, I'm not sure what "i" refers to in your example.

To make my question more specific, below is an example of the raw read counts for 2 cell-specific genes in 2 biological replicates of each sample (I have 3 replicates and a time course, but I want to simplify the example):

  • gene alpha: expressed only in cell type A
  • gene beta: expressed only in cell type B

    sample     A.1    A.2     Amix.1    Amix.2    B.1    B.2     Bmix.1    Bmix.2
    alpha      2116   145     2091      179       0      0       45        13
    beta       1      0       278       12754     1609   38043   1648      80947
    

In Bmix samples, the alpha counts are contamination from cells A. In Amix samples, the beta counts are contamination from cells B.
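
To make the replicate-to-replicate variation concrete, here is a minimal sketch that divides the off-target marker count by the on-target marker count for each sample in the table above. The `marker_ratio` helper is a name made up for this illustration. The ratio is only a crude indicator, not a contamination fraction: alpha and beta are different genes with different expression levels, and the counts are not scaled by library size.

```python
# Raw counts copied from the table above (2 replicates per sample).
counts = {
    "alpha": {"A.1": 2116, "A.2": 145, "Amix.1": 2091, "Amix.2": 179,
              "B.1": 0, "B.2": 0, "Bmix.1": 45, "Bmix.2": 13},
    "beta":  {"A.1": 1, "A.2": 0, "Amix.1": 278, "Amix.2": 12754,
              "B.1": 1609, "B.2": 38043, "Bmix.1": 1648, "Bmix.2": 80947},
}


def marker_ratio(sample):
    """Off-target marker count divided by on-target marker count.

    Hypothetical helper: samples starting with "A" use alpha as their own
    marker and beta as the off-target marker; "B" samples are the reverse.
    """
    own, other = ("alpha", "beta") if sample.startswith("A") else ("beta", "alpha")
    return counts[other][sample] / counts[own][sample]


ratios = {sample: marker_ratio(sample) for sample in counts["alpha"]}
for sample, ratio in ratios.items():
    print(f"{sample:8s} {ratio:.4f}")
```

On these numbers the ratio differs by orders of magnitude between the two replicates of the same mixed condition, which is the variability the question is about.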

Since I have 2 batches (batch1: rep1, batch2: rep2+rep3), I have included batch in the design: ~batch + condition

The conditions of this design are A, Amix, B, and Bmix. I'd like to obtain differentially expressed genes of Amix vs A, Bmix vs. B, and Amix vs Bmix.
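
As an illustration of what ~batch + condition encodes, the sketch below writes out the dummy-coded design matrix by hand for 4 conditions x 3 replicates. It assumes treatment coding with batch1 and condition A as reference levels; the sample names and the `design_row` helper are made up for this example and are not DESeq2 output.

```python
# Hypothetical sample sheet: 4 conditions x 3 replicates.
# rep1 is batch1; rep2 and rep3 are batch2, as described above.
conditions = ["A", "Amix", "B", "Bmix"]
samples = [(f"{cond}.{rep}", "batch1" if rep == 1 else "batch2", cond)
           for cond in conditions for rep in (1, 2, 3)]

columns = ["Intercept", "batch2", "conditionAmix", "conditionB", "conditionBmix"]


def design_row(batch, condition):
    """One row of the design matrix under treatment (dummy) coding.

    Assumed reference levels: batch1 and condition A.
    """
    return [1,
            int(batch == "batch2"),
            int(condition == "Amix"),
            int(condition == "B"),
            int(condition == "Bmix")]


design = {name: design_row(batch, cond) for name, batch, cond in samples}

print("sample   " + " ".join(columns))
for name, row in design.items():
    print(f"{name:8s} " + " ".join(str(v) for v in row))
```

Under this coding, Amix vs A corresponds to the conditionAmix column alone, while Bmix vs B and Amix vs Bmix are differences between two condition columns.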

1) Should I add two more terms to the design ('alpha' and 'beta') as columns in the colData file? If yes, should all mixed samples be "yes" for both alpha and beta, whereas pure samples are "yes" only for the corresponding cell-specific gene? Is the correct design formula then ~batch + alpha + beta + condition?

2) In case I decide not to look at Amix vs. Bmix, is it better to do 2 separate analyses for Amix vs. A, and Bmix vs. B, using 2 separate count tables?
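
One property of the design in question 1 can be checked mechanically. The sketch below builds the yes/no alpha and beta columns exactly as described there and computes the rank of the resulting design matrix with plain Gaussian elimination. The `rank`, `base_row` and `marker_terms` helpers and the 12-sample layout are assumptions for this illustration; the check only concerns linear dependence between columns and says nothing about whether any design is statistically appropriate.

```python
from fractions import Fraction


def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        # Eliminate this column from every other row.
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


conditions = ["A", "Amix", "B", "Bmix"]
# rep1 is batch1; rep2 and rep3 are batch2.
samples = [(cond, "batch1" if rep == 1 else "batch2")
           for cond in conditions for rep in (1, 2, 3)]


def base_row(cond, batch):
    """Columns for ~batch + condition (reference levels: batch1, A)."""
    return [1, int(batch == "batch2"),
            int(cond == "Amix"), int(cond == "B"), int(cond == "Bmix")]


def marker_terms(cond):
    """Yes/no indicators as in question 1: mixed samples get both."""
    alpha = int(cond in ("A", "Amix", "Bmix"))
    beta = int(cond in ("B", "Amix", "Bmix"))
    return [alpha, beta]


base = [base_row(c, b) for c, b in samples]
# Column order differs from the formula, which does not affect the rank.
full = [base_row(c, b) + marker_terms(c) for c, b in samples]

print("~batch + condition:                rank", rank(base), "of", len(base[0]), "columns")
print("~batch + alpha + beta + condition: rank", rank(full), "of", len(full[0]), "columns")
```

With this yes/no coding the alpha and beta columns are fully determined by condition (alpha is "yes" everywhere except B, beta is "yes" everywhere except A), so they add columns without adding rank. DESeq2 reports an error when the model matrix is not full rank, so this particular coding would need to be changed before it could be fitted.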

Thanks a lot in advance for your reply!


For selecting the design to use with DESeq2, I'd recommend working with a statistician. There are many choices here, and they reflect assumptions about your experiment. Unfortunately, I don't have time right now to help with these decisions on the support site.


Thank you Michael for your advice.
