Problem with TCGAbiolinks Enrichment analysis
2
1
Entering edit mode
david.tieri ▴ 10
@davidtieri-13174
Last seen 6.9 years ago

Hi,

I am trying to follow the TCGAbiolinks Vignette in the "Analyze the data" section about differential expression and enrichment analysis, but I am running into a problem. When I load the dataset dataBRCA which is included in the TCGAbiolinks library, the enrichment analysis from the vignette works; When I download and use my own data using the functionality provided by TCGAbiolinks, it does not work (it says ResBP=NA, ResMF=NA, ResCC=NA, ResPat=NA). In the vignette, I noticed that the dataBRCA is normalized using the TCGAanalyze_Normalization function with the geneInfo = geneInfo option, while my data is HTSeq - counts, and must be normalized with the geneInfo = geneInfoHT option. What is the difference between the two options, and could this be the problem?

 

Here is my R code:

CancerProject <- "TCGA-LUSC"

query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

samplesDown <- getResults(query,cols=c("cases"))

dataSmTPNT<-TCGAquery_MatchedCoupledSampleTypes(barcode = samplesDown,c("NT","TP"))

queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTPNT)
        

dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       save.filename="dataPrep.rda")

dataPrep2 <- TCGAanalyze_Preprocessing(object = dataPrep,
                                      cor.cut = 0.6,
                                      datatype = "HTSeq - Counts")                      

dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep2,
                                      geneInfo = geneInfoHT,
                                      method = "geneLength")

dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut =  0.9) 

dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,1:(length(dataSmTPNT)/2)],
                            mat2 = dataFilt[,(1+length(dataSmTPNT)/2):length(dataSmTPNT)],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")  

S <- TCGAquery_SampleTypes(colnames(dataFilt),"NT")
S2 <- TCGAquery_SampleTypes(colnames(dataFilt),"TP")


dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,S],
                            mat2 = dataFilt[,S2],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")


# DEGs table with expression values in normal and tumor samples
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGs,"Normal","Tumor",dataFilt[,S],dataFilt[,S2])
dataDEGsFiltLevel

# BLEOW DOESNT WORK

Genelist <- rownames(dataDEGsFiltLevel)

system.time(ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist))

# Enrichment Analysis EA (TCGAVisualize)
# Gene Ontology (GO) and Pathway enrichment barPlot

TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
                        GOBPTab = ansEA$ResBP,
                        GOCCTab = ansEA$ResCC,
                        GOMFTab = ansEA$ResMF,
                        PathTab = ansEA$ResPat,
                        nRGTab = Genelist,
                        nBar = 10)

 

Thanks David

tcgabiolinks • 1.4k views
ADD COMMENT
0
Entering edit mode
david.tieri ▴ 10
@davidtieri-13174
Last seen 6.9 years ago

I think I figured it out. I changed my Genelist from:

Genelist <- rownames(dataDEGsFiltLevel)

to:

Genelist <- dataDEGsFiltLevel$external_gene_name

And it worked. The first way supplies the TCGAanalyze_EAcomplete with a list of ensemble id's, and the second way supplies it with a list of external gene names.

 

 

ADD COMMENT
0
Entering edit mode
conico • 0
@conico-13319
Last seen 6.8 years ago

Thank you very much David, it was very useful

 

ADD COMMENT

Login before adding your answer.

Traffic: 665 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6