DESeq2 - design for TCGA level 3 data
anand m t ▴ 40
@anand-m-t-4859

Hi all,

I am using TCGA level 3 data (RSEM raw counts) for samples with matched normals to analyse differential expression with DESeq2. I was wondering whether the design I am using is correct.

My coldata looks like this:

DataFrame with 118 rows and 2 columns
                                 condition          pid
                                  <factor>     <factor>
TCGA.BJ.A28R.11A.11R.A16R.07 Normal_Tissue TCGA.BJ.A28R
TCGA.BJ.A28R.01A.11R.A16R.07 Primary_Tumor TCGA.BJ.A28R
TCGA.BJ.A28W.11A.11R.A32Y.07 Normal_Tissue TCGA.BJ.A28W
TCGA.BJ.A28W.01A.11R.A32Y.07 Primary_Tumor TCGA.BJ.A28W
TCGA.BJ.A28X.11A.11R.A22L.07 Normal_Tissue TCGA.BJ.A28X
...                                    ...          ...
TCGA.KS.A41I.01A.11R.A23N.07 Primary_Tumor TCGA.KS.A41I
TCGA.KS.A41J.11A.12R.A23N.07 Normal_Tissue TCGA.KS.A41J
TCGA.KS.A41J.01A.11R.A23N.07 Primary_Tumor TCGA.KS.A41J
TCGA.KS.A41L.11A.11R.A23N.07 Normal_Tissue TCGA.KS.A41L
TCGA.KS.A41L.01A.11R.A23N.07 Primary_Tumor TCGA.KS.A41L

The design I am using is:

design = ~ condition + pid + pid:condition

Since each patient has a matched normal, I have included an interaction term in the design. Is this the right way?

Thanks.

@mikelove

I'd suggest:

~ pid + condition

This will test for the tumor vs normal effect while controlling for the patient effect. The interaction term isn't really of interest here, and additionally, with only one normal and one tumor sample per patient, you don't have replicates to fit the dispersion after including the interaction term.
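
To make the point about replicates concrete, here is a minimal base R sketch (not DESeq2 itself) that counts coefficients under both designs. The toy coldata with patients P1 to P3 is a made-up stand-in for the real 118-sample table; only the formulas come from this thread.

```r
# Toy colData: 3 hypothetical patients, each with a matched normal and tumor.
coldata <- data.frame(
  pid = factor(rep(c("P1", "P2", "P3"), each = 2)),
  condition = factor(rep(c("Normal_Tissue", "Primary_Tumor"), times = 3),
                     levels = c("Normal_Tissue", "Primary_Tumor"))
)

# Paired design: patient effects plus a single tumor-vs-normal effect.
mm_paired <- model.matrix(~ pid + condition, data = coldata)

# Interaction design: additionally fits a patient-specific tumor effect.
mm_inter <- model.matrix(~ condition + pid + pid:condition, data = coldata)

# Residual degrees of freedom = number of samples minus number of coefficients.
df_paired <- nrow(coldata) - ncol(mm_paired)
df_inter  <- nrow(coldata) - ncol(mm_inter)

print(c(paired = df_paired, interaction = df_inter))
```

In this toy case the paired design leaves 2 residual degrees of freedom, while the interaction design uses one coefficient per sample and leaves 0, so nothing remains from which to estimate variability. The same formula, ~ pid + condition, is what would be passed as the design argument when constructing the DESeqDataSet.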


Hi Michael, I'm using an identical design formula with two samples per patient (normal and healthy tissue), and I want to test for the difference between normal and healthy while controlling for the patient effect. However, it's telling me "factor levels were dropped which had no samples". The vignette says that 3 samples per unique combination are needed for controlling for count outliers, so in a situation like the one above, does Cook's distance need to be turned off? Would you expect that warning about factor levels for the above design formula? Thanks.


"factor levels were dropped which had no samples"

This is simply a message telling you that, for the factors in the design, there were levels which had no samples. You can continue. There is not a problem. (Unless you are surprised to find out that levels don't have samples, in which case you should figure out which samples might be missing and why.)

If you have your column data object, x, before constructing the DESeqDataSet, you can see what is happening:

levels(x$sample)
levels(x$condition)

Where you should substitute 'sample' and 'condition' with the names of the appropriate columns in x.
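
As a rough illustration of that check, the sketch below builds a small made-up x in which the pid factor carries a level (P3) that no sample uses, then finds and drops it. The column names, the patient IDs, and the use of a plain data.frame are all assumptions for the example; substitute your own column data.

```r
# Hypothetical column data: the factor declares level "P3", but no row uses it
# (this can happen after subsetting samples from a larger table).
x <- data.frame(
  pid = factor(c("P1", "P1", "P2", "P2"), levels = c("P1", "P2", "P3")),
  condition = factor(c("Normal_Tissue", "Primary_Tumor",
                       "Normal_Tissue", "Primary_Tumor"))
)

# All declared levels, including unused ones.
print(levels(x$pid))

# Samples per level; a count of zero marks a level with no samples.
print(table(x$pid))

# Names of the levels that have no samples.
empty <- names(which(table(x$pid) == 0))
print(empty)

# Optionally drop the unused levels before building the DESeqDataSet.
x_clean <- droplevels(x)
print(levels(x_clean$pid))
```

Dropping the unused levels yourself is optional, since the message indicates they are dropped anyway, but looking at the empty levels is a quick way to confirm that no samples are missing unexpectedly.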