Question

Gene expression differential analysis using TCGA dataset

hudacdif • 0 • United Kingdom

Hi there, I am having difficulties integrating the TCGA dataset into Bioconductor, particularly in order to use the limma package for differential analysis of gene expression. Can anyone help me, please?

microarray limma • 4.2k views

Written 9.7 years ago by hudacdif • 0 • updated 9.7 years ago by jaro.slamecka ▴ 140

Can you be a bit more specific about what problems you are having? What have you tried? What errors did you encounter?

Sean Davis 21k • 9.7 years ago

Hi Sean, thanks for your response. I have already created a minimal ExpressionSet for the expression data. My problem is: how can I create the phenotypic data to integrate with my expression data? I have included some level 3 gene expression data from TCGA (3 patient samples out of 489), along with the METADATA and annotations, in the attachment. I was hoping you could take a look at it and help me.

hudacdif • 0 • 9.7 years ago

score 2 · Accepted Answer · 2015-03-17

jaro.slamecka ▴ 140 • Mitchell Cancer Institute, Mobile AL, U…

"My problem is, how can I create the phenotypic data to integrate with my expression data?"

I'm new to microarray data analysis, but to address this particular issue I created the data frame containing the phenotype data in Excel. See the attached screenshot.

The number of rows has to match the number of columns (samples) in your ExpressionSet. Only sampleID is mandatory; the rest of the pheno data you can fill in as you like. The sample IDs (read.AnnotatedDataFrame uses the first column as row names by default) should match the sample names of your ExpressionSet, in the same order. Once you have that typed into Excel, save it as a tab-delimited .txt file. Then you can read the pheno data with read.AnnotatedDataFrame and attach it to your ExpressionSet. The last line checks that the pheno data is there.

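library(Biobase)  # provides read.AnnotatedDataFrame, phenoData<- and pData
# Tab-delimited file with a header; the first column (sampleID) becomes the row names
pd = read.AnnotatedDataFrame(filename="YourPhenoData.txt", stringsAsFactors = TRUE)
phenoData(YourExpressionSet) = pd   # row names of pd must match sampleNames(YourExpressionSet)
pData(YourExpressionSet)            # check that the pheno data is attached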
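
If you would rather build the phenotype table in R than in Excel, here is a minimal base-R sketch. The expression matrix, the sample IDs and the `group` column are all hypothetical placeholders (not TCGA data); the point is the layout: one row per sample, sample IDs as row names in the same order as the expression columns, and a tab-delimited file with the IDs in the first column, which is the layout read.AnnotatedDataFrame expects by default.

```r
# Hypothetical expression matrix: 4 genes x 3 samples (placeholder values, not TCGA data)
set.seed(1)
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("gene", 1:4),
                               c("sample1", "sample2", "sample3")))

# Phenotype table: one row per sample, with the sample IDs as row names.
# The 'group' column is a hypothetical example covariate.
pheno <- data.frame(group = factor(c("tumour", "tumour", "normal")),
                    row.names = colnames(expr))

# The rows of the phenotype table must match the expression columns, in the same order
stopifnot(identical(rownames(pheno), colnames(expr)))

# Write it as a tab-delimited file with the sample IDs in the first column,
# the same layout as a table typed into Excel and saved as tab-delimited text
pheno_file <- file.path(tempdir(), "YourPhenoData.txt")
write.table(data.frame(sampleID = rownames(pheno), pheno),
            file = pheno_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The file written to `pheno_file` could then be passed as `filename` to read.AnnotatedDataFrame as in the answer above.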

jaro.slamecka ▴ 140 • 9.7 years ago

Thanks for your answer. I now have the phenotypic data, but I have another problem. My TCGA gene expression dataset (GBM) is level 3, which means it has already been normalized. The data consist of gene names and expression values for each patient. Attached is an example of my data. As I understand the limma package, it normalizes the data first. How can I feed already-normalized data like this into limma to get the differential expression analysis?

hudacdif • 0 • 9.7 years ago

I don't know what format you attached, but if you can manage to load the expression values into R as a matrix (named here as evals), then you can do something like:

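library(limma)
# evals: genes x samples matrix of (already normalized) log-expression values
# design: design matrix from the phenotypic data, one row per column of evals
fit <- lmFit(evals, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)  # trend: intensity-dependent prior variance; robust: guard against outlier variances
de <- topTable(fit, coef=coef, number=Inf)   # coef: column number or name of design for the comparison of interest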

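where design is the design matrix (constructed from your phenotypic data, with one row per sample in the same order as the columns of evals), and coef is the coefficient corresponding to your DE comparison of interest. The limma steps are modular, so for data that are already normalized you can simply skip the normalization step.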
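
As a sketch of where `design` and `coef` might come from, assuming a hypothetical two-group phenotype table (the column name `group` and the levels "normal"/"tumour" are made up for illustration; with an ExpressionSet the `data` argument would be `pData(YourExpressionSet)`), base R's model.matrix builds the design, and its column names tell you which coefficient to pass to topTable:

```r
# Hypothetical phenotype table for six samples in two groups.
# With an ExpressionSet this would be pData(YourExpressionSet).
pheno <- data.frame(group = factor(c("normal", "normal", "normal",
                                     "tumour", "tumour", "tumour"),
                                   levels = c("normal", "tumour")))

# One row per sample (same order as the expression columns), one column per coefficient.
# With ~ group, column 1 is the intercept (baseline 'normal' level) and
# column 2 ("grouptumour") is the tumour-minus-normal difference.
design <- model.matrix(~ group, data = pheno)
colnames(design)
# "(Intercept)" "grouptumour"

# The comparison of interest is therefore coefficient 2, i.e. coef = "grouptumour"
coef <- "grouptumour"
```

With this parametrization no contrast matrix is needed: topTable accepts either the column number (2) or the column name.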

Aaron Lun ★ 28k • 9.7 years ago

Thanks for the answer. I am a bit confused about the coefficient. How do I choose the coefficient?

hudacdif • 0 • 9.7 years ago

Well, that depends on your design matrix, and what comparisons you want to make. Read Chapter 9 of the limma user's guide (accessible by running limmaUsersGuide() once you've loaded limma).
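
To see why the choice depends on the design, here is a base-R sketch on a single made-up gene (the values and group labels are hypothetical). It uses lm(), whose least-squares coefficients should match what lmFit() estimates gene by gene before the eBayes moderation. With an intercept design the difference is itself a coefficient; with a one-mean-per-group design the difference has to be formed with a contrast, which in limma would be done for all genes with makeContrasts() and contrasts.fit() as described in Chapter 9.

```r
# One hypothetical gene: log-expression values for 3 normal and 3 tumour samples
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)
group <- factor(rep(c("normal", "tumour"), each = 3), levels = c("normal", "tumour"))

# Parametrization 1: intercept + difference. The 'grouptumour' coefficient is the
# tumour-minus-normal difference, so no contrast matrix is needed.
b1 <- coef(lm(y ~ group))

# Parametrization 2: one mean per group, no intercept. The difference must be
# formed explicitly with a contrast (tumour - normal).
b2 <- coef(lm(y ~ 0 + group))
contrast <- c(groupnormal = -1, grouptumour = 1)
diff2 <- sum(contrast * b2[names(contrast)])

# Both give the same estimate: mean(tumour) - mean(normal)
c(b1["grouptumour"], diff2)
```

Either parametrization answers the same question; the intercept form lets you pass `coef = 2` directly, while the group-means form is often easier to extend to several pairwise comparisons.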

Aaron Lun ★ 28k • 9.7 years ago

I have read Chapter 9 of the limma user's guide but am still somewhat confused. Do I have to use a contrast matrix? My design matrix looks like this, and my data in the ExpressionSet look like this.

hudacdif • 0 • 9.7 years ago

We cannot see your design matrix. While we can be helpful here, I would suggest trying to find someone locally to help you. The process will likely go a bit quicker for you.

Sean Davis 21k • 9.7 years ago