Get read counts for TCGA samples via recount3
@mariokeller1988-13652

Hi,

I created a RangedSummarizedExperiment object with the create_rse() function of the recount3 package (project == "LIHC" & file_source == "tcga"). The next step is to extract the read counts for all available genes so that I can normalize and transform them with the estimateSizeFactors() and vst() functions of DESeq2.

If I understand correctly, after create_rse() I only have raw base-pair coverage counts stored in the "raw_counts" assay, which is not what I need.

The quick start guide says:

Using transform_counts() you can scale the counts and assign them to the “counts” assays slot to use them in downstream packages such as DESeq2 and limma.

So it looks like transform_counts() is the function that gives me the gene read counts I need for my further analyses. But what about the compute_read_counts() function? What is the difference between transform_counts() and compute_read_counts()?

Mario

Hi,

Thank you for using recount3.

compute_read_counts() (http://research.libd.org/recount3/reference/compute_read_counts.html) gives you actual read counts, similar to those from other RNA-seq processing pipelines you might be familiar with. transform_counts() (http://research.libd.org/recount3/reference/transform_counts.html) gives you scaled counts, such that the counts are based on a total library size of 40 million mapped reads (by default). You likely just need compute_read_counts() for your use case.
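To make the distinction concrete, here is a toy sketch in plain Python (not R, so it runs without recount3 installed). It is not the recount3 implementation. It assumes that read counts are obtained by dividing summed base-pair coverage by the average mapped read length, and that scaled counts are obtained by dividing coverage by the sample's total coverage (AUC) and multiplying by the target library size. The function names, parameter names, and all numbers are hypothetical.

```python
# Toy illustration only: this is NOT recount3 code, and every number is made up.

def read_counts(coverage, avg_mapped_read_length):
    """Approximate read counts from summed base-pair coverage.

    Assumption: each read contributes about `avg_mapped_read_length`
    bases of coverage, so dividing by it recovers a read count.
    """
    return [round(c / avg_mapped_read_length) for c in coverage]


def scaled_counts(coverage, auc, target_size=40_000_000):
    """Coverage rescaled to a common target library size.

    Assumption: scaling is coverage / AUC * target_size, where AUC is
    the sample's total base-pair coverage.
    """
    return [round(c * target_size / auc) for c in coverage]


# One hypothetical sample: summed base-pair coverage for three genes.
coverage = [150_000, 30_000, 600_000]
avg_len = 150          # hypothetical average mapped read length
auc = 3_000_000_000    # hypothetical total coverage of the sample

reads = read_counts(coverage, avg_len)
scaled = scaled_counts(coverage, auc)

print(reads)   # read-count-like values
print(scaled)  # values on the 40-million-read scale
```

Under these assumptions both outputs are the same coverage values multiplied by a single per-sample constant; only the constant differs.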

compute_read_counts() is equivalent to recount::read_counts() (http://leekgroup.github.io/recount/reference/read_counts.html), whereas transform_counts() is similar to recount::scale_counts() (http://leekgroup.github.io/recount/reference/scale_counts.html). See this older related thread: "recount counts in the example experiment".

Best, Leo

Hi Leo,

Thanks for the clarification. Since I continued working with the counts generated by transform_counts(), I was wondering whether I need to repeat all analyses with the counts generated by compute_read_counts(), or whether the results would end up the same.

If I understand correctly, the counts generated by transform_counts() are already normalized for sequencing depth, while the counts from compute_read_counts() are not. So if I normalize both the scaled (transform_counts) and the unscaled (compute_read_counts) counts with the DESeq2 estimateSizeFactors() function, the size factors for the scaled counts should not have to account for differences in sequencing depth, while the size factors for the unscaled counts should. Shouldn't I then expect the normalized counts from the two approaches to be roughly the same?
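The question above can be checked on a toy example. The sketch below is plain Python, not DESeq2. It implements the median-of-ratios idea behind size factors on a made-up 4 genes x 3 samples matrix, then repeats the normalization after multiplying each sample by an arbitrary constant. The matrix, the constants, and all function names are hypothetical, and rounding of counts is deliberately left out.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix.

    `counts` is a list of rows (one row per gene). Genes with a zero in
    any sample are skipped because their geometric mean is zero.
    """
    log_geo_means = [
        sum(math.log(v) for v in row) / len(row) if all(v > 0 for v in row) else None
        for row in counts
    ]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - lg
            for row, lg in zip(counts, log_geo_means)
            if lg is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every sample (column) by its size factor."""
    sf = size_factors(counts)
    return [[v / s for v, s in zip(row, sf)] for row in counts]


# Hypothetical unscaled counts: 4 genes x 3 samples.
unscaled = [
    [100, 220, 290],
    [50, 90, 160],
    [400, 800, 1150],
    [10, 25, 33],
]

# Hypothetical per-sample constants standing in for a depth-based scaling.
c = [4.0, 2.0, 1.5]
scaled = [[v * cj for v, cj in zip(row, c)] for row in unscaled]

norm_unscaled = normalize(unscaled)
norm_scaled = normalize(scaled)

# Entry-by-entry ratio between the two normalized matrices.
ratios = [b / a for ra, rb in zip(norm_unscaled, norm_scaled) for a, b in zip(ra, rb)]

# Geometric mean of the per-sample constants.
global_constant = math.exp(sum(math.log(cj) for cj in c) / len(c))

print(ratios)
print(global_constant)
```

In this sketch every ratio equals the geometric mean of the per-sample constants, so the two normalized matrices differ only by one global factor. With real data the agreement would be approximate rather than exact, because both kinds of counts are rounded to integers before normalization.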

I hope you get my idea.

My overall aim is to have DESeq2 normalized and vst-stabilized expression estimates for correlation analyses.

Best, Mario