dydrkq777
Hi, I'm Yonggab.
I downloaded the TCGA ovarian cancer data using the TCGAbiolinks package.
(Transcriptome Profiling / Gene Expression Quantification / HTSeq - Counts)
However, I found that about 5% of the data had been filtered out (lost).
Can you tell me why it was filtered?
And is there a way to see which genes were removed in R? (I have the whole gene set.)
Hello,
To create a SummarizedExperiment object in TCGAbiolinks, we have to map genes, probes, etc. to genomic coordinates. We decided to map them to the most recent Ensembl patch release. For human, those releases are GRCh38.p12 (hg38) and GRCh37.p13 (hg19). In the release we are using, some of the genes have been retired (https://www.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000234536, https://www.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000230939, https://www.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000069712), so they cannot be mapped and are dropped from the object.
You can get an unmodified version by setting the argument summarizedExperiment to FALSE in GDCprepare. There is an example at http://rpubs.com/tiagochst/five_5_percent_doubt, with a table of the genes we could not map to the most recent release.
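As a rough sketch of how one might list the dropped genes in R: prepare the same query twice, once with summarizedExperiment = TRUE and once with FALSE, then compare the gene IDs. The helper names below (strip_version, dropped_genes, compare_ov) are hypothetical and not part of TCGAbiolinks. The sketch assumes that the unmodified table keeps the Ensembl gene IDs in its first column, and that the "HTSeq - Counts" workflow is still offered by the GDC release you are querying; neither is guaranteed.

```r
# Strip the Ensembl version suffix, e.g. "ENSG00000234536.1" -> "ENSG00000234536".
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Return the gene IDs present in the raw table but absent from the
# SummarizedExperiment (hypothetical helper, not a TCGAbiolinks function).
dropped_genes <- function(raw_ids, se_ids) {
  raw <- strip_version(raw_ids)
  # Keep only Ensembl gene IDs; HTSeq summary rows such as "__no_feature" are not genes.
  raw <- raw[grepl("^ENSG", raw)]
  setdiff(raw, strip_version(se_ids))
}

# Not called automatically: needs TCGAbiolinks installed and network access.
compare_ov <- function() {
  query <- TCGAbiolinks::GDCquery(
    project       = "TCGA-OV",
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = "HTSeq - Counts"  # assumption: available in your GDC release
  )
  TCGAbiolinks::GDCdownload(query)
  se  <- TCGAbiolinks::GDCprepare(query, summarizedExperiment = TRUE)
  raw <- TCGAbiolinks::GDCprepare(query, summarizedExperiment = FALSE)
  # Assumption: the first column of the raw table holds the gene IDs.
  dropped_genes(raw[[1]], rownames(se))
}
```

Calling compare_ov() should then return the Ensembl IDs that were in the downloaded files but could not be mapped, which you can look up in the Ensembl ID history pages linked above.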