GOexpress: use DESeq2 data object as input
@martinhoelzer-8847
Last seen 5.5 years ago
Germany

Hello,

I am trying to use the GOexpress package described here:

http://www.bioconductor.org/packages/release/bioc/vignettes/GOexpress/inst/doc/GOexpress-UsersGuide.pdf

I followed the manual at first, and everything works fine with the test data set (microarray data, I assume). Now I want to apply the package to my RNA-Seq data, which I have already analyzed using DESeq2.

I think the main problem is how to come up with an

ExpressionSet data object

from a DESeq2 data object like produced by

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = "", design= ~ condition)

The manual here describes an ExpressionSet object:

https://bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

but so far I have not been able to figure out how to build such an object from my RNA-Seq data for use with GOexpress, as is done here with the test data used in the manual:

AlvMac_results <- GO_analyse(eSet = AlvMac, f = "Treatment")

Thanks for your help, best regards,
Martin

GOexpress deseq2 expressionset • 2.2k views
@martinhoelzer-8847
Last seen 5.5 years ago
Germany

Dear Kevin,

thanks for your fast reply, and sorry for the possibly confusing description.

Let's assume I have the following raw read counts of six mouse RNA-Seq samples in two conditions (let's say 'control' and 'treated', 3 replicates each), so a table like this:

                   control_rep1 control_rep2 control_rep3 treated_rep1 treated_rep2 treated_rep3
ENSMUSG00000000001         6000         5754         6116         5560         5083         4952
ENSMUSG00000000003            0            0            0            0            0            0
ENSMUSG00000000028          868          881          844          952          840          818

that I can feed (together with information about the conditions, ...) into a DESeqDataSet which is a subclass of SummarizedExperiment, as you already mentioned.

Running the DESeq() function on this DESeqDataSet again returns a DESeqDataSet with all of my input features by row and my samples by column, from which the (normalized) expression values of each feature in each sample can be obtained. If I understood you correctly, I should use this data matrix in the exprs slot of the ExpressionSet object.

Then I think I also need to build an AnnotatedDataFrame holding each of my six samples by row and containing additional phenotypic information (for the phenoData slot of the ExpressionSet). According to the manual, it would be enough to define two columns, 'Treatment' and 'Timepoint', maybe like this:

             Treatment Timepoint
control_rep1 control   0
control_rep2 control   0
control_rep3 control   0
treated_rep1 treated   0
treated_rep2 treated   0
treated_rep3 treated   0

So, if I have understood this correctly up to this point, I should be able to create an ExpressionSet object from this data and then pass it to

GO_results <- GO_analyse(eSet = MyCustomizedSet, f = "Treatment")
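
The layout described above can be checked with a small base R sketch before any Bioconductor object is built. The counts and sample names are taken from the example tables in this post; the variable names `expr_mat` and `pheno` are only illustrative, and the raw counts stand in for the normalized values that would actually be used.

```r
# Sample names shared by both tables (from the example above)
samples <- c("control_rep1", "control_rep2", "control_rep3",
             "treated_rep1", "treated_rep2", "treated_rep3")

# Expression matrix: features by row, samples by column
# (raw counts used here only as a stand-in for normalized values)
expr_mat <- matrix(
  c(6000, 5754, 6116, 5560, 5083, 4952,
       0,    0,    0,    0,    0,    0,
     868,  881,  844,  952,  840,  818),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ENSMUSG00000000001", "ENSMUSG00000000003",
                    "ENSMUSG00000000028"), samples)
)

# Phenotype table: samples by row, covariates by column
pheno <- data.frame(
  Treatment = factor(rep(c("control", "treated"), each = 3)),
  Timepoint = rep(0, 6),
  row.names = samples
)

# The two tables must describe the same samples in the same order
stopifnot(identical(colnames(expr_mat), rownames(pheno)))
```

If the check fails because the rows are merely in a different order, reordering with `pheno[colnames(expr_mat), ]` is one way to align the phenotype table to the matrix.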

Martin

EDIT: since our last posts were somehow overlapping: yes, thanks for the information up to this point, I think I understand now what kind of data objects I have to prepare and will give it a try

kevin.rue ▴ 300
@kevinrue-6757
Last seen 4 months ago
University of Oxford

Dear Martin,

As far as I can see, you're right on all aspects in your latest post here.

Yes, I do recommend supplying a matrix of pre-normalised values in the assayData slot of the ExpressionSet for GOexpress. (My aim in developing GOexpress was not to compete with existing normalisation methods: it does not transform the input data in any way, but merely uses it for subsequent analyses such as machine learning and clustering.)

Please do post a quick word here if this solves your problem! Otherwise, I will be happy to further help you, if necessary.

Regards

Kevin

@martinhoelzer-8847
Last seen 5.5 years ago
Germany

Hi,

I have tested it now and it seems to be working, thanks! Here is what I did to create a simple ExpressionSet from my DESeqDataSet, following the example table above:

Start from:

dds <- DESeq(ddsHTSeq)

Followed by:

exprs <- counts(dds, normalized=TRUE) # get normalized counts for each feature per sample
dimnames(exprs) <- list(rownames(exprs), col_labels)  # set the sample names correctly; col_labels is assumed to be a character vector of the six sample names, defined beforehand

pData <- data.frame(Treatment=c('control','control','control','treated','treated','treated'), Timepoint=c(0,0,0,0,0,0))

row.names(pData) <- colnames(exprs)

phenoData <- new("AnnotatedDataFrame", data=pData)

minimalSet <- ExpressionSet(assayData=exprs, phenoData=phenoData)

And now the minimalSet can be used with:

Go_results <- GO_analyse(eSet = minimalSet, f = "Treatment") #works now

Thanks for all your help, best,
Martin


Could you please close this discussion by "accepting" the answer? Just to show that the issue was resolved, in case future users come across it.

As I understand, there should be a button available for you to click, as you are the person who created this discussion.

Thanks!

kevin.rue ▴ 300
@kevinrue-6757
Last seen 4 months ago
University of Oxford

Dear Martin,

I am more a user of edgeR than DESeq2, so please excuse me if my answer is slightly incorrect.

From a quick read of the DESeq2 manual (https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf), I see that the output of the DESeqDataSetFromHTSeqCount command is a "subclass of SummarizedExperiment", which is different from an ExpressionSet.

However, I am not quite certain that I properly understood the exact type of data that you have at the moment: is it *expression* or *differential expression* data? You mention having "already analyzed using DESeq2", which confuses me a bit.

Supporting SummarizedExperiment objects is on my to-do list, but I simply don't have the time to do this properly at the moment.

Kind regards

Kevin

kevin.rue ▴ 300
@kevinrue-6757
Last seen 4 months ago
University of Oxford

As per separate email, the data necessary to build a minimal ExpressionSet compatible with GOexpress is:

• assayData: a matrix (features by row, samples by column, with the value in each cell being the expression level of each feature in each sample)
• phenoData: an AnnotatedDataFrame (samples by row, phenotypic covariates by column, with the value in each cell being the phenotype of each sample for each covariate)

Typically, I use the cpm() function of edgeR to generate the assayData matrix, while the phenoData is prepared as an Excel/csv file defining the experimental design.

Given this, and appropriate annotation, GOexpress::GO_analyse() will then be able to compare the expression data between sample groups defined by a given covariate.
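
As a rough illustration of the counts-per-million idea behind cpm(), the sketch below computes plain CPM in base R for the three example genes from Martin's table. It is only a stand-in under stated assumptions: the column sums of these three genes are used as library sizes (real library sizes would come from all genes), no normalisation factors are applied, and the names `counts_mat`, `lib_sizes` and `cpm_mat` are hypothetical.

```r
# Raw counts from the example table: features by row, samples by column
counts_mat <- matrix(
  c(6000, 5754, 6116, 5560, 5083, 4952,
       0,    0,    0,    0,    0,    0,
     868,  881,  844,  952,  840,  818),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000028"),
    c("control_rep1", "control_rep2", "control_rep3",
      "treated_rep1", "treated_rep2", "treated_rep3")
  )
)

# Library size per sample (assumption: only these three genes exist)
lib_sizes <- colSums(counts_mat)

# Divide every column by its library size, then scale to one million
cpm_mat <- sweep(counts_mat, 2, lib_sizes, "/") * 1e6
```

The resulting `cpm_mat` keeps the features-by-row, samples-by-column layout, so it has the shape expected for the assayData matrix.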

Sincerely,

Kevin

Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 3 months ago
Munich, Germany

Hi,

a very good introduction to preparing an ExpressionSet can be found here - ExpressionSetIntroduction.pdf

There you can find everything you need to create one.

Assa