Question

Spearman correlation for RNA SEQ data

biom.andressa • 0

@biomandressa-23774

Brazil

Hello,

I am analyzing TCGA RNA-seq data from tumor tissues. I would like to perform a correlation analysis between the expression of different genes (for example, to see whether the expression of Gene 1 correlates with that of Gene 2 across samples) and between gene expression and some clinical data (such as alpha-fetoprotein levels, age, bilirubin levels...).

My question is: should I use FPKM data or the normalized counts generated by DESeq2? Or something else?

Thank you!

deseq2 rnaseq statistic • 2.1k views

updated 3.8 years ago by Robert Castelo ★ 3.3k • written 3.8 years ago by biom.andressa • 0

score 0 · Answer 1 · 2020-07-06

Michael Love 41k

@mikelove

United States

If you want to use Spearman correlation, the SAMseq function in the samr package implements this with resp.type="Quantitative":

https://www.rdocumentation.org/packages/samr/versions/3.0/topics/SAMseq

Alternatively, in DESeq2 you could add numeric covariates to the design formula; this assumes that each unit change in a covariate corresponds to a constant fold change in the counts.
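To make that assumption concrete, here is a sketch of what a single numeric covariate means in the DESeq2 model, following the generalized linear model described in the DESeq2 paper. The notation is chosen for this illustration: K for the raw count of gene i in sample j, s for the size factor, α for the dispersion, x for the covariate and β for the coefficients.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \beta_{i0} + \beta_{i1} x_j
\quad\Longrightarrow\quad
\frac{q_i(x + 1)}{q_i(x)} = 2^{\beta_{i1}}
```

In other words, each one-unit increase in the covariate multiplies the expected scaled count of gene i by the same factor, 2^{β_i1}, wherever on the covariate scale that increase occurs.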

Michael Love 41k • 3.8 years ago

OK, thanks! But my main concern is what type of data I should use as input for the analysis in this case.

biom.andressa • 0 • 3.8 years ago

The input to both DESeq2 and SAMseq is the original raw counts, not scaled counts ("normalized counts") and not FPKM.

Michael Love 41k • 3.8 years ago

score 0 · Answer 2 · 2020-07-07

You may also use the edgeR package, whose starting point is also raw integer counts. Once you have built a DGEList object and calculated normalization factors with calcNormFactors(), the function cpm(), called with log=TRUE, gives you continuous log-CPM units of expression that are suitable for clustering and other gene-correlation purposes (see subsection 2.16 of the edgeR User's Guide).

As a side note, you might also want to look at this preprint, which investigates proper ways of calculating correlations between genes in RNA-seq data and provides an R package called spqn that implements the approach. Note, however, that the package is not yet in Bioconductor and the preprint has not yet gone through peer review, so you would have to contact the authors directly to get support in using their method, e.g., by opening an issue in the GitHub repo.
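To illustrate why the choice of input matters for a rank correlation, here is a minimal, language-agnostic sketch in plain Python. It is not the edgeR implementation: the log_cpm() helper below uses only a simple library-size scaling with a prior count (no TMM normalization factors), and the gene names, counts and AFP values are made-up toy data. It computes the Spearman correlation between one gene and a clinical variable, once from raw counts and once from log-CPM values.

```python
import math

def log_cpm(counts, prior_count=0.5):
    """Simplified log2 counts-per-million (assumption: plain library-size
    scaling, NOT edgeR's exact formula and without TMM factors).
    counts: dict mapping gene -> list of raw counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2((row[j] + prior_count) / lib_sizes[j] * 1e6)
               for j in range(n_samples)]
        for gene, row in counts.items()
    }

def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: 2 genes x 4 samples with very different library sizes.
counts = {
    "GENE1": [10, 40, 60, 80],
    "GENE2": [990, 7960, 3940, 1920],
}
afp = [5.0, 3.0, 9.0, 20.0]  # made-up clinical values, one per sample

logcpm = log_cpm(counts)
rho_raw = spearman(counts["GENE1"], afp)
rho_logcpm = spearman(logcpm["GENE1"], afp)
print(f"raw counts: {rho_raw:.2f}")     # prints "raw counts: 0.80"
print(f"log-CPM:    {rho_logcpm:.2f}")  # prints "log-CPM:    1.00"
```

Because Spearman correlation only uses ranks, the log transform itself does not change the result for a given gene. What changes the ranks across samples is the per-sample scaling by library size, which is why unscaled raw counts are a poor input for correlating expression with a clinical variable.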