Hi there, I am having difficulty integrating the TCGA dataset into Bioconductor, in particular so that I can use the limma package for differential gene expression analysis. Can anyone help me, please?
"My problem is, how can I create the phenotypic data to integrate with my expression data?"
I'm new to microarray data analysis but to address this particular issue I've created the data frame containing phenotype data in Excel. See the attached screenshot.
The number of rows in the pheno table has to match the number of columns (samples) in your ExpressionSet. Only sampleID is mandatory; the rest of the pheno data you can fill in to your liking. Once you have that typed into Excel, save it as a tab-delimited txt file. Then you can plug the pheno data into your ExpressionSet using the function read.AnnotatedDataFrame. The last line of the code just prints the pheno data so you can check that it is there.
    pd = read.AnnotatedDataFrame(filename="YourPhenoData.txt", stringsAsFactors = TRUE)
    phenoData(YourExpressionSet) = pd
    pData(YourExpressionSet)
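If you would rather skip Excel, the same tab-delimited file can be built directly in R. This is only a minimal sketch in base R: the expression matrix, the sample IDs, and the `group` column are made-up placeholders, not anything from the TCGA download. It also does the row/column check described above before writing the file.

```r
# Hypothetical expression matrix: 4 genes x 3 samples (made-up values).
evals <- matrix(rnorm(12), nrow = 4,
                dimnames = list(paste0("gene", 1:4),
                                c("S1", "S2", "S3")))

# Hypothetical phenotype table: one row per sample, sampleID in column 1.
pheno <- data.frame(sampleID = c("S1", "S2", "S3"),
                    group    = c("Normal", "Tumor", "Tumor"),
                    stringsAsFactors = FALSE)

# Sanity checks: one pheno row per expression column, in the same order.
stopifnot(nrow(pheno) == ncol(evals))
stopifnot(identical(as.character(pheno$sampleID), colnames(evals)))

# Write a tab-delimited text file with a header row.
pheno_file <- file.path(tempdir(), "YourPhenoData.txt")
write.table(pheno, pheno_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with the first column as row names, to confirm the layout.
back <- read.delim(pheno_file, row.names = 1, stringsAsFactors = FALSE)
```

The file at `pheno_file` can then be passed to read.AnnotatedDataFrame as in the code above. As far as I know, that function treats the first column as row names by default, which is why sampleID goes first.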
I don't know what format you attached, but if you can manage to load the expression values into R as a matrix (named here as evals), then you can do something like:
    require(limma)
    fit <- lmFit(evals, design)
    fit <- eBayes(fit, trend=TRUE, robust=TRUE)
    de <- topTable(fit, coef=coef, number=Inf)
where design is the design matrix (constructed from your phenotypic data), and coef is the coefficient corresponding to your DE comparison of interest. Everything's modular, so you can just skip the normalization step.
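To make design and coef a bit more concrete, here is a rough sketch of a two-group comparison. The sample IDs, the `group` column, and the simulated expression values are all hypothetical; with real data the design would come from your own pheno table. The limma part only runs if the package is installed, and it uses plain eBayes defaults rather than the trend/robust options shown above.

```r
# Hypothetical phenotype table: 3 normal and 3 tumor samples.
pheno <- data.frame(
  sampleID = paste0("S", 1:6),
  group    = factor(c("Normal", "Normal", "Normal",
                      "Tumor", "Tumor", "Tumor"),
                    levels = c("Normal", "Tumor")),
  stringsAsFactors = FALSE
)

# Design matrix with an intercept (Normal baseline) and a Tumor effect.
design <- model.matrix(~ group, data = pheno)
rownames(design) <- pheno$sampleID
colnames(design)  # "(Intercept)" "groupTumor"

# The coefficient of interest is the Tumor vs Normal difference.
coef <- "groupTumor"

# Optional: fit on simulated data if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  evals <- matrix(rnorm(100 * 6), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), pheno$sampleID))
  fit <- limma::lmFit(evals, design)
  fit <- limma::eBayes(fit)
  de  <- limma::topTable(fit, coef = coef, number = Inf)
}
```

Setting the factor levels explicitly makes "Normal" the reference group, so the groupTumor coefficient reads as tumor minus normal.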
Can you be a bit more specific about what problems you are having? What have you tried? What errors did you encounter?