Hi,
I am trying to follow the TCGAbiolinks Vignette in the "Analyze the data" section about differential expression and enrichment analysis, but I am running into a problem. When I load the dataset dataBRCA which is included in the TCGAbiolinks library, the enrichment analysis from the vignette works; When I download and use my own data using the functionality provided by TCGAbiolinks, it does not work (it says ResBP=NA, ResMF=NA, ResCC=NA, ResPat=NA). In the vignette, I noticed that the dataBRCA is normalized using the TCGAanalyze_Normalization function with the geneInfo = geneInfo option, while my data is HTSeq - counts, and must be normalized with the geneInfo = geneInfoHT option. What is the difference between the two options, and could this be the problem?
Here is my R code:
CancerProject <- "TCGA-LUSC"
query <- GDCquery(project = CancerProject,
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts")
samplesDown <- getResults(query,cols=c("cases"))
dataSmTPNT<-TCGAquery_MatchedCoupledSampleTypes(barcode = samplesDown,c("NT","TP"))
queryDown <- GDCquery(project = CancerProject,
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts",
barcode = dataSmTPNT)
dataPrep <- GDCprepare(query = queryDown,
save = TRUE,
save.filename="dataPrep.rda")
dataPrep2 <- TCGAanalyze_Preprocessing(object = dataPrep,
cor.cut = 0.6,
datatype = "HTSeq - Counts")
dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep2,
geneInfo = geneInfoHT,
method = "geneLength")
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut = 0.9)
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,1:(length(dataSmTPNT)/2)],
mat2 = dataFilt[,(1+length(dataSmTPNT)/2):length(dataSmTPNT)],
Cond1type = "Normal",
Cond2type = "Tumor",
fdr.cut = 0.01 ,
logFC.cut = 1,
method = "glmLRT")
S <- TCGAquery_SampleTypes(colnames(dataFilt),"NT")
S2 <- TCGAquery_SampleTypes(colnames(dataFilt),"TP")
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,S],
mat2 = dataFilt[,S2],
Cond1type = "Normal",
Cond2type = "Tumor",
fdr.cut = 0.01 ,
logFC.cut = 1,
method = "glmLRT")
# DEGs table with expression values in normal and tumor samples
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGs,"Normal","Tumor",dataFilt[,S],dataFilt[,S2])
dataDEGsFiltLevel
# BLEOW DOESNT WORK
Genelist <- rownames(dataDEGsFiltLevel)
system.time(ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist))
# Enrichment Analysis EA (TCGAVisualize)
# Gene Ontology (GO) and Pathway enrichment barPlot
TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
GOBPTab = ansEA$ResBP,
GOCCTab = ansEA$ResCC,
GOMFTab = ansEA$ResMF,
PathTab = ansEA$ResPat,
nRGTab = Genelist,
nBar = 10)
Thanks David