I'm trying to work with TCGAbiolinks, but I get an error with TCGAanalyze_Normalization. I would appreciate any help troubleshooting this.
Commands:
>SKCM_query<-GDCquery("TCGA-SKCM",data.category="Transcriptome Profiling",data.type="Gene Expression Quantification",experimental.strategy="RNA-seq",access="open",sample.type=c("Primary solid Tumor","Metastatic"),workflow.type="HTSeq - Counts")
>GDCdownload(SKCM_query,directory="SKCM",method="client")
>SKCM_prepared<-GDCprepare(SKCM_query,save=TRUE,save.filename="SKCM.rda",directory="SKCM",summarizedExperiment=TRUE,remove.files.prepared=TRUE)
> SKCM_prepared
class: RangedSummarizedExperiment
dim: 57251 470
metadata(0):
assays(1): HTSeq - Counts
rownames(57251): ENSG00000000003 ENSG00000000005 ... ENSG00000281912 ENSG00000281920
rowData names(2): ensembl_gene_id external_gene_name
colnames(470): TCGA-EE-A2GO-06A-11R-A18S-07 TCGA-D9-A1X3-06A-11R-A18S-07 ... TCGA-EE-A2GN-06A-11R-A18S-07 TCGA-EE-A2GL-06A-11R-A18S-07
colData names(167): sample patient ... subtype_DIPYRIM.C.T.n.C.T..mut subtype_SHATTERSEEK_Chromothripsis_calls
>SKCM_preprocessed<-TCGAanalyze_Preprocessing(SKCM_prepared,cor.cut=0.6,filename="SKCM.png",datatype="HTSeq - Counts")
>SKCM_normalized<-TCGAanalyze_Normalization(tabDF="SKCM_preprocessed",geneInfo,method="gcContent")
Error in `rownames<-`(`*tmp*`, value = character(0)) :
attempt to set 'rownames' on an object with no dimensions
> traceback()
3: stop("attempt to set 'rownames' on an object with no dimensions")
2: `rownames<-`(`*tmp*`, value = character(0))
1: TCGAanalyze_Normalization(tabDF = "SKCM_preprocessed", geneInfo,
method = "gcContent")
> selectMethod("dimnames<-", c(class(SKCM_preprocessed), "list"))
function (x, value) .Primitive("dimnames<-")
Session Info:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] SummarizedExperiment_1.2.3 Biobase_2.32.0 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 IRanges_2.6.1 S4Vectors_0.10.3
[7] BiocGenerics_0.18.0 TCGAbiolinks_2.0.13
I made a mistake in the command: the value passed to tabDF should be the object itself, not its name in quotation marks. With that fixed I get a new error. I suspect the gene IDs in geneInfo do not match those in the data set.
> SKCM_normalized<-TCGAanalyze_Normalization(tabDF=SKCM_preprocessed,geneInfo,method="gcContent")
I Need about 326 seconds for this Complete Normalization Upper Quantile [Processing 80k elements /s]
Step 1 of 4: newSeqExpressionSet ...
Step 2 of 4: withinLaneNormalization ...
Error in names(y) <- 1:length(y) :
'names' attribute [2] must be the same length as the vector [0]
That was indeed the problem. I downloaded gene length and GC content for the ENSEMBL gene IDs in this data set and used that table as geneInfo. The normalization now runs without error.
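For anyone hitting the same two errors, here is a minimal base-R sketch of a check that can be run before normalizing. It assumes that the count table and the gene annotation are matched by row names, which I have not verified against the TCGAbiolinks source. The helper check_normalization_inputs, the toy matrices, and the column names geneLength and gcContent are all made up for illustration and are not part of TCGAbiolinks.

```r
# Hypothetical helper (not part of TCGAbiolinks): sanity-check the inputs
# before calling a normalization function.
check_normalization_inputs <- function(tabDF, geneInfo) {
  # A quoted name such as "SKCM_preprocessed" is a character vector with no
  # dim attribute, so setting row names on it cannot work.
  if (is.null(dim(tabDF))) {
    stop("tabDF has no dimensions; pass the object itself, not its name in quotes")
  }
  # Assumption: genes are matched between the two tables by row name.
  shared <- intersect(rownames(tabDF), rownames(geneInfo))
  list(
    n_genes  = nrow(tabDF),
    n_shared = length(shared),
    missing  = head(setdiff(rownames(tabDF), rownames(geneInfo)))
  )
}

# Toy count matrix with ENSEMBL gene IDs as row names (stand-in for real data)
counts <- matrix(
  1:6, nrow = 3,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000281912"),
    c("s1", "s2")
  )
)

# Toy annotation keyed by gene symbols: no IDs in common with the counts
gene_info_symbols <- matrix(
  c(1000, 2000, 0.4, 0.5), nrow = 2,
  dimnames = list(c("TSPAN6", "TNMD"), c("geneLength", "gcContent"))
)

# Same annotation re-keyed by ENSEMBL gene IDs
gene_info_ensembl <- gene_info_symbols
rownames(gene_info_ensembl) <- c("ENSG00000000003", "ENSG00000000005")

res_symbols <- check_normalization_inputs(counts, gene_info_symbols)
res_ensembl <- check_normalization_inputs(counts, gene_info_ensembl)

print(res_symbols$n_shared)
print(res_ensembl$n_shared)
```

If n_shared comes back as zero, or far below n_genes, the annotation is probably keyed by a different kind of identifier than the count table, which is what happened in my case.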