Question

GOexpress: use DESeq2 data object as input


martin.hoelzer ▴ 20

@martinhoelzer-8847

Last seen 9.9 years ago

Germany

Hello,

I am trying to use the GOexpress package described here:

http://www.bioconductor.org/packages/release/bioc/vignettes/GOexpress/inst/doc/GOexpress-UsersGuide.pdf

I followed the manual at first, and everything works fine with the test data set (microarray data, I assume). Now I want to apply the package to my RNA-Seq data, which I have already analyzed using DESeq2.

I think the main problem is how to come up with an

ExpressionSet data object

from a DESeq2 data object like produced by

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = "", design= ~ condition)

The manual here describes an ExpressionSet object:

https://bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

but so far I have not been able to figure out how to build such an object from my RNA-Seq data for use with GOexpress, as is done here with the test data from the manual:

AlvMac_results <- GO_analyse(eSet = AlvMac, f = "Treatment")

Thanks for your help, best regards,
Martin

GOexpress deseq2 expressionset • 4.9k views

ADD COMMENT • link 10.3 years ago martin.hoelzer ▴ 20


kevin.rue ▴ 350

@kevinrue-6757

Last seen 21 months ago

University of Oxford

Dear Martin,

I am more of an edgeR user than a DESeq2 user, so please excuse me if my answer is slightly incorrect.

From a quick read of the DESeq2 manual (https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf), I see that the output of the DESeqDataSetFromHTSeqCount command is a "subclass of SummarizedExperiment", which is different from an ExpressionSet.

However, I am not quite certain that I understood properly the exact type of data that you have at the minute: is it *expression* or *differential expression* data? You mention having "already analyzed using DESeq2", which confuses me a bit.

Supporting SummarizedExperiment objects is on my to-do list, but I simply don't have the time to do this properly at the moment.

Kind regards

Kevin

ADD COMMENT • link 10.3 years ago kevin.rue ▴ 350


kevin.rue ▴ 350

@kevinrue-6757

Last seen 21 months ago

University of Oxford

As per separate email, the data necessary to build a minimal ExpressionSet compatible with GOexpress is:

assayData: a matrix (features by row, samples by column, with the value in each cell being the expression level of each feature in each sample)
phenoData: an AnnotatedDataFrame (samples by row, phenotypic covariates by column, with the value in each cell being the phenotype of each sample for each covariate)

Typically, I use the cpm() function of edgeR to generate the assayData matrix, and I prepare the phenoData as an Excel/CSV file defining the experimental design.
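To make the counts-per-million idea concrete, here is a minimal base-R sketch. The matrix values and the gene and sample names are hypothetical, and the code only reproduces the basic scaling (each count divided by its sample's library size, times one million). edgeR's cpm() offers more than this (for example normalisation factors and log transformation), so treat it as an illustration, not a replacement.

```r
# Toy raw counts (hypothetical values): 3 genes by 2 samples.
raw_counts <- matrix(
  c(10, 0, 90,     # sample1
    40, 20, 140),  # sample2
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Library size = total number of counts per sample (column).
lib_sizes <- colSums(raw_counts)

# Counts per million: divide each column by its library size, then scale by 1e6.
cpm_manual <- sweep(raw_counts, 2, lib_sizes, "/") * 1e6
```

The result keeps the features-by-row, samples-by-column layout described above, so it could serve directly as the assayData matrix.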

Given this and appropriate annotation, GOexpress::GO_analyse() will then be able to compare the expression data between sample groups defined by a given covariate.

I hope that helps; otherwise, please let me know how I can better answer your question.

Sincerely,

Kevin

ADD COMMENT • link 10.3 years ago kevin.rue ▴ 350


Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 4 months ago

Germany

Hi,

A very good introduction to preparing an ExpressionSet can be found here: ExpressionSetIntroduction.pdf

There you can find everything you need to create one.

Assa

ADD COMMENT • link 10.3 years ago Assa Yeroslaviz ★ 1.5k

score 2 · Accepted Answer · 2015-10-14

Dear Kevin,

thanks for your fast reply, and sorry for the possibly confusing description.

Let's assume I have the following raw read counts of six mouse RNA-Seq samples in two conditions (let's say 'control' and 'treated', 3 replicates each), so a table like this:

	control_rep1	control_rep2	control_rep3	treated_rep1	treated_rep2	treated_rep3
ENSMUSG00000000001	6000	5754	6116	5560	5083	4952
ENSMUSG00000000003	0	0	0	0	0	0
ENSMUSG00000000028	868	881	844	952	840	818

that I can feed (together with information about the conditions, ...) into a DESeqDataSet which is a subclass of SummarizedExperiment, as you already mentioned.

On this DESeqDataSet I can run the DESeq() function, which again returns a DESeqDataSet with all of my input features by row and my samples by column, from which I can obtain the (normalized) expression values of each feature in each sample. If I understood you correctly, I should use this data matrix in the exprs slot of the ExpressionSet object.
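For readers wondering what "normalized" means here: DESeq2 scales each sample by a size factor estimated with the median-of-ratios method. The base-R sketch below illustrates that idea on the three genes from the example table above. It is a simplified illustration and not DESeq2's actual code, and with only three genes the resulting factors are not meaningful; in a real analysis the values would come from counts(dds, normalized=TRUE).

```r
# Raw counts copied from the example table above (three genes, six samples).
counts <- matrix(
  c(6000, 5754, 6116, 5560, 5083, 4952,
       0,    0,    0,    0,    0,    0,
     868,  881,  844,  952,  840,  818),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000028"),
    c("control_rep1", "control_rep2", "control_rep3",
      "treated_rep1", "treated_rep2", "treated_rep3")
  )
)

# Log geometric mean per gene; a gene with any zero count gives -Inf and is skipped.
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)

# Size factor per sample: median ratio of its counts to the gene-wise geometric means.
size_factors <- apply(counts, 2, function(sample_counts) {
  exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
})

# Normalised counts: divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")
```

The all-zero gene stays at zero after scaling; whether to filter such genes before building the ExpressionSet is a separate decision not covered here.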

Then I think I also need to build an AnnotatedDataFrame holding each of my six samples by row and containing additional phenotypic information (for the phenoData slot of the ExpressionSet). So according to the manual, it would be enough to define two columns, 'Treatment' and 'Timepoint', maybe like this:

	Treatment	Timepoint
control_rep1	control	0
control_rep2	control	0
control_rep3	control	0
treated_rep1	treated	0
treated_rep2	treated	0
treated_rep3	treated	0
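A table like this is often kept as a CSV file defining the experimental design. The sketch below is one possible way to load it and align it with the expression matrix; the inline CSV text, the "Sample" column name, and the variable names are assumptions made for illustration. The rows are deliberately given in a different order than the matrix columns to show the reordering step, since the sample names of the two objects have to match when the ExpressionSet is built.

```r
# Hypothetical design file, given inline so the example is self-contained.
design_csv <- "Sample,Treatment,Timepoint
treated_rep1,treated,0
treated_rep2,treated,0
treated_rep3,treated,0
control_rep1,control,0
control_rep2,control,0
control_rep3,control,0
"

# Read the design; strings become factors, sample names become row names.
pheno <- read.csv(text = design_csv, row.names = "Sample",
                  stringsAsFactors = TRUE)

# Column names of the expression matrix (assumed to be as in the count table).
expr_colnames <- c("control_rep1", "control_rep2", "control_rep3",
                   "treated_rep1", "treated_rep2", "treated_rep3")

# Make sure both tables describe the same samples, then reorder the rows
# of the phenotype table to follow the matrix columns.
stopifnot(setequal(rownames(pheno), expr_colnames))
pheno <- pheno[expr_colnames, , drop = FALSE]
```

After the reordering, rownames(pheno) and the matrix column names are identical, so the data frame can be wrapped in an AnnotatedDataFrame.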

So, if I have understood this correctly up to this point, I should be able to create an ExpressionSet object from this data and then pass it to

GO_results <- GO_analyse(eSet = MyCustomizedSet, f = "Treatment")

Thanks for your help,
Martin

EDIT: since our last posts somehow overlapped: yes, thanks for the information up to this point. I think I now understand what kind of data objects I have to prepare, and I will give it a try.

score 2 · Accepted Answer · 2015-10-14

Dear Martin,

As far as I can see, you're right on all aspects in your latest post here.

Yes, I do recommend putting a matrix of pre-normalised values into the assayData slot of the ExpressionSet for GOexpress. My aim in developing GOexpress was not to compete with existing normalisation methods: it does not transform the input data in any way, but merely uses it for subsequent analyses such as machine learning and clustering.

Please do post a quick word here if this solves your problem! Otherwise, I will be happy to further help you, if necessary.

Regards

Kevin

score 2 · Accepted Answer · 2015-10-14

Hi,

I have tested it now and it seems to work, thanks! Following the example table above, this is all I did to create a simple ExpressionSet from my DESeqDataSet:

Start from:

dds <- DESeq(ddsHTSeq)

Followed by:

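library(DESeq2)   # counts()
library(Biobase)  # ExpressionSet(), AnnotatedDataFrame

# col_labels is not defined in this snippet; it is assumed to be a character vector
# holding the six sample names in column order
exprs <- counts(dds, normalized=TRUE) # get normalized counts for each feature per sample
dimnames(exprs) <- list(rownames(exprs), col_labels)  # set the sample names correctly

# one row per sample; Treatment is stored as a factor (the data.frame() default before R 4.0)
pData <- data.frame(Treatment=factor(c('control','control','control','treated','treated','treated')), Timepoint=c(0,0,0,0,0,0))

row.names(pData) <- colnames(exprs)  # sample names must match the matrix columns

phenoData <- new("AnnotatedDataFrame", data=pData)

minimalSet <- ExpressionSet(assayData=exprs, phenoData=phenoData)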

And now the minimalSet can be used with:

Go_results <- GO_analyse(eSet = minimalSet, f = "Treatment") #works now
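Before handing such an object to GO_analyse(), a few quick checks with Biobase accessors can confirm that it has the expected shape. The sketch below builds a toy ExpressionSet with hypothetical values so that it runs on its own; with real data, the same checks would be applied to minimalSet. Checking that the grouping column is a factor reflects my understanding of what GOexpress expects, so please verify it against the GOexpress documentation.

```r
library(Biobase)

sample_names <- c("control_rep1", "control_rep2", "control_rep3",
                  "treated_rep1", "treated_rep2", "treated_rep3")

# Toy stand-in for the normalised count matrix (hypothetical values).
expr_mat <- matrix(
  as.numeric(seq_len(12)), nrow = 2,
  dimnames = list(c("geneA", "geneB"), sample_names)
)

# Phenotype table with one row per sample.
pheno <- data.frame(
  Treatment = factor(rep(c("control", "treated"), each = 3)),
  Timepoint = 0,
  row.names = sample_names
)

eset <- ExpressionSet(assayData = expr_mat,
                      phenoData = AnnotatedDataFrame(pheno))

# Structural checks: the object is valid, samples line up, grouping is a factor.
validObject(eset)
stopifnot(identical(sampleNames(eset), rownames(pData(eset))))
stopifnot(is.factor(pData(eset)$Treatment))

# Quick look at the group sizes and matrix dimensions.
table(pData(eset)$Treatment)
dim(exprs(eset))
```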

Thanks for all your help, best,
Martin