Question

(how) can I make heatmaps/PCA in DESeq2 using normalized counts from cummeRbund?

Jon Bråte ▴ 250

@jon-brate-6263

Norway

Hi,

I like to use DESeq2 for making PCA-plots and heat maps, but on a current dataset we only have count values from cufflinks/cummeRbund (exported using count() in cummeRbund). I know DESeq2 needs raw counts, but can I use these counts only for plotting/visualization? And can I perform the rlog-transformation on cufflinks normalized counts?

Thanks

deseq2 cummerbund cufflinks • 2.8k views

ADD COMMENT • link updated 8.7 years ago by Michael Love 41k • written 8.7 years ago by Jon Bråte ▴ 250

score 1 · Answer 1 · 2015-08-21

'Raw counts' from Tuxedo are not really raw counts; they are "raw pseudo counts". So you won't get the type of data that DESeq2 expects, short of rounding the values you get out of count() in cummeRbund.

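Assuming the output of count() is called foo:

library(DESeq2)
# Round the pseudo counts up to whole numbers; DESeq2 requires integers
counts_in <- ceiling(foo)
# Placeholder sample table with a single "names" column and no real design
dds       <- DESeqDataSetFromMatrix(counts_in, 
                                    colData = data.frame(names=1:ncol(counts_in)), 
                                    design=~1)
rld       <- rlog(dds)
# plotPCA() groups by "condition" by default, so point it at the placeholder column
plotPCA(rld, intgroup="names")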

CummeRbund offers a method to perform a PCA of FPKM values. However, if you want to use the DESeq2 methods, I'd recommend following the DESeq2 workflow (htseq-count from alignments -> DESeq2) rather than trying to manipulate the output of cummeRbund.
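
For the heat map half of the question, here is a minimal sketch of a sample-to-sample distance heat map built from rlog-transformed values. It uses simulated data from makeExampleDESeqDataSet() as a stand-in (an assumption for illustration; your own dds built from rounded counts would take its place), and base R heatmap() rather than any particular plotting package.

```r
library(DESeq2)

# Simulated stand-in data (assumption): 1000 genes, 6 samples.
# In practice `dds` would be built from your own rounded count matrix.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# rlog transformation, blind to the experimental design
rld <- rlog(dds, blind = TRUE)

# Euclidean distances between samples on the rlog scale
sample_dist <- as.matrix(dist(t(assay(rld))))

# Base R heat map of the sample-to-sample distances
heatmap(sample_dist, symm = TRUE, margins = c(8, 8))
```

Whether the resulting clustering is meaningful still depends on the input being estimated counts rather than library-size-normalized values.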

score 1 · Answer 2 · 2015-08-21

Michael Love 41k

@mikelove

United States

We require integers as input to protect against users accidentally inputting FPKM or normalized counts (counts corrected for library size). In both of these cases, the precision has been altered from what the statistical model expects, so this really breaks the assumptions of our software. I will say that I've used the EDA and DE routines of DESeq2 on rounded estimated counts before, but only when I made sure that each value is an estimate of the count of fragments assigned to a gene (not a transcript) and that it has not been divided by a library size correction. One concern with this approach, though: if you use software that distributes fragments which could be assigned to many homologous genes, then DE in one of those genes could be attributed to all of them.
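
A small base R sketch of the rounding step described above. The matrix est_counts and its values are made up for illustration; it stands for estimated gene-level fragment counts that have not been divided by a library size factor.

```r
# Hypothetical estimated gene-level fragment counts (made-up values),
# not divided by any library size factor
est_counts <- matrix(c(10.4, 0.0, 250.7,
                       12.9, 3.2, 198.1),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("sample1", "sample2")))

# TRUE only if every value is already a whole number
is_whole <- all(est_counts == round(est_counts))

# Round to the nearest integer and store in integer mode
counts_int <- round(est_counts)
storage.mode(counts_int) <- "integer"
```

Note that rounding only satisfies the integer check. It does not undo a library size correction, so it cannot turn FPKM or normalized counts into suitable input.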

ADD COMMENT • link 8.7 years ago Michael Love 41k

Hello Michael:

Why is it not ok to round the counts assigned to transcripts?

Thanks,

Nik

ADD REPLY • link 8.4 years ago Nik Tuzov ▴ 80

I think rounded estimated *gene* counts are fine for DESeq2, but estimated transcript counts are negatively correlated within a gene -- there is a lot of additional variance from estimation uncertainty. DESeq2 is not built for transcript level analysis.
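
To illustrate the gene versus transcript point, here is a toy base R sketch with made-up numbers. The objects tx_counts and tx2gene are hypothetical. The two transcripts of geneA trade estimated fragments between samples, while their gene-level sum stays stable, which is the quantity that could then be rounded.

```r
# Hypothetical estimated transcript-level counts (made-up values)
tx_counts <- matrix(c(30.5, 69.5, 12.0,
                      55.2, 44.8, 15.0),
                    nrow = 3,
                    dimnames = list(c("tx1", "tx2", "tx3"),
                                    c("sample1", "sample2")))

# Hypothetical transcript-to-gene mapping: tx1 and tx2 belong to geneA
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB")

# Sum the transcript estimates within each gene, then round
gene_counts <- round(rowsum(tx_counts, group = tx2gene[rownames(tx_counts)]))
storage.mode(gene_counts) <- "integer"
```

Here tx1 and tx2 differ a lot between the two samples, yet geneA sums to 100 in both, so the gene-level values carry less of the estimation uncertainty.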