I have imported an RNA-Seq dataset on a specific type of cancer from the R package curatedCRCData. After I contacted the maintainers of the package, they kindly informed me that the data are IlluminaHiSeq_RNASeqV2 Level 3 data, quantified using RSEM (https://wiki.nci.nih.gov/display/tcga/rnaseq+version+2). When I load the dataset from the package:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20502 features, 195 samples
element names: exprs
sampleNames: TCGA.AA.3662 TCGA.A6.4105 ... TCGA.A6.6652 (195 total)
varLabels: unique_patient_ID alt_sample_name ... uncurated_author_metadata (59 total)
featureNames: ? A1BG ... ZZZ3 (20502 total)
fvarLabels: probeset gene
experimentData: use 'experimentData(object)'
head(exprs(TCGA.RNASeqV2_eset)) # a small output
TCGA.AA.3662 TCGA.A6.4105 TCGA.F4.6463 TCGA.F4.6806 TCGA.A6.6650 TCGA.AZ.6600
? 9.282712 9.933779 10.0443941 9.910121 10.088809 9.875006 9.893715
A1BG 6.027692 4.707000 3.6559242 5.592373 3.253914 5.622860 3.563683
A1CF 8.273718 7.445153 7.0927571 7.118704 7.855290 8.577953 7.774680
 0.00000 20.34961
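To make concrete what I am looking at in the values above, this is roughly the kind of check I mean. It is only a sketch on a made-up toy matrix `m` (a hypothetical stand-in for exprs(TCGA.RNASeqV2_eset)), not the real data:

```r
# Toy matrix standing in for exprs(TCGA.RNASeqV2_eset); all values are made up.
m <- matrix(c(9.28, 6.03, 8.27,
              9.93, 4.71, 7.45,
              0.00, 3.66, 20.35),
            nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

value_range <- range(m)            # smallest and largest value in the matrix
all_integer <- all(m == round(m))  # TRUE only if every value is a whole number
col_totals  <- colSums(m)          # per-sample totals (library-size-like sums)

print(value_range)
print(all_integer)
print(col_totals)
```

A narrow range with non-integer values, as in the real output, is what makes me suspect that the matrix is not on the raw count scale.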
My main questions are the following:
From a search of related posts and papers, I understand that RSEM does not provide raw counts as such, but estimated counts, which are not rounded to integers. Thus:
- Is it possible to use these data for downstream RNA-Seq analysis with a package like edgeR? Or, because the counts have to be integers, are other methodologies/packages more suitable for a simple differential expression analysis (for instance, a two-group comparison)?
- Secondly, judging from the values above, the counts also seem to be normalized or transformed in some way (perhaps they are the output of "rsem.genes.normalized_results" described on the NCI wiki page linked above). Unfortunately, I could not find further information on how the Level 3 RSEM counts are normalized. Would a proper normalization method (like TMM) still be necessary for any downstream analysis?
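For concreteness, the two-group comparison I have in mind would look roughly like the sketch below. It runs on made-up, non-integer toy values (`toy_counts` and the groups "A"/"B" are hypothetical), not on the curated data, and whether this workflow is appropriate for RSEM estimates that may already be normalized or transformed is exactly what I am unsure about:

```r
library(edgeR)

set.seed(1)
n_genes <- 200

# Made-up, non-integer "estimated counts" for 6 samples (3 per group).
toy_counts <- matrix(rnbinom(n_genes * 6, mu = 100, size = 5) + runif(n_genes * 6),
                     nrow = n_genes,
                     dimnames = list(paste0("gene", seq_len(n_genes)),
                                     paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = toy_counts, group = group)
y <- calcNormFactors(y, method = "TMM")  # TMM normalization factors
design <- model.matrix(~ group)
y <- estimateDisp(y, design)             # estimate dispersions
fit <- glmQLFit(y, design)               # quasi-likelihood GLM fit
res <- glmQLFTest(fit, coef = 2)         # test group B vs. group A
top <- topTags(res, n = 5)
print(top)
```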
Please excuse any naive questions; I'm relatively new to RNA-Seq, and any suggestions or opinions would be greatly appreciated!