Search
Question: Problem with TCGAbiolinks Enrichment analysis
1
gravatar for david.tieri
5 months ago by
david.tieri10
david.tieri10 wrote:

Hi,

I am trying to follow the TCGAbiolinks Vignette in the "Analyze the data" section about differential expression and enrichment analysis, but I am running into a problem. When I load the dataset dataBRCA which is included in the TCGAbiolinks library, the enrichment analysis from the vignette works; When I download and use my own data using the functionality provided by TCGAbiolinks, it does not work (it says ResBP=NA, ResMF=NA, ResCC=NA, ResPat=NA). In the vignette, I noticed that the dataBRCA is normalized using the TCGAanalyze_Normalization function with the geneInfo = geneInfo option, while my data is HTSeq - counts, and must be normalized with the geneInfo = geneInfoHT option. What is the difference between the two options, and could this be the problem?

 

Here is my R code:

CancerProject <- "TCGA-LUSC"

query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

samplesDown <- getResults(query,cols=c("cases"))

dataSmTPNT<-TCGAquery_MatchedCoupledSampleTypes(barcode = samplesDown,c("NT","TP"))

queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTPNT)
        

dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       save.filename="dataPrep.rda")

dataPrep2 <- TCGAanalyze_Preprocessing(object = dataPrep,
                                      cor.cut = 0.6,
                                      datatype = "HTSeq - Counts")                      

dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep2,
                                      geneInfo = geneInfoHT,
                                      method = "geneLength")

dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut =  0.9) 

dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,1:(length(dataSmTPNT)/2)],
                            mat2 = dataFilt[,(1+length(dataSmTPNT)/2):length(dataSmTPNT)],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")  

S <- TCGAquery_SampleTypes(colnames(dataFilt),"NT")
S2 <- TCGAquery_SampleTypes(colnames(dataFilt),"TP")


dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,S],
                            mat2 = dataFilt[,S2],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")


# DEGs table with expression values in normal and tumor samples
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGs,"Normal","Tumor",dataFilt[,S],dataFilt[,S2])
dataDEGsFiltLevel

# BLEOW DOESNT WORK

Genelist <- rownames(dataDEGsFiltLevel)

system.time(ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist))

# Enrichment Analysis EA (TCGAVisualize)
# Gene Ontology (GO) and Pathway enrichment barPlot

TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
                        GOBPTab = ansEA$ResBP,
                        GOCCTab = ansEA$ResCC,
                        GOMFTab = ansEA$ResMF,
                        PathTab = ansEA$ResPat,
                        nRGTab = Genelist,
                        nBar = 10)

 

Thanks David

ADD COMMENTlink modified 5 months ago by conico0 • written 5 months ago by david.tieri10
0
gravatar for david.tieri
5 months ago by
david.tieri10
david.tieri10 wrote:

I think I figured it out. I changed my Genelist from:

Genelist <- rownames(dataDEGsFiltLevel)

to:

Genelist <- dataDEGsFiltLevel$external_gene_name

And it worked. The first way supplies the TCGAanalyze_EAcomplete with a list of ensemble id's, and the second way supplies it with a list of external gene names.

 

 

ADD COMMENTlink written 5 months ago by david.tieri10
0
gravatar for conico
5 months ago by
conico0
conico0 wrote:

Thank you very much David, it was very useful

 

ADD COMMENTlink written 5 months ago by conico0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 129 users visited in the last hour