Question

Merge different datasets regarding a common number of DE genes in R


svlachavas ▴ 840

@svlachavas-7225


Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

I have recently analyzed two different microarray datasets using the same pipeline in R. Both datasets have the same variables, and the comparison was cancer vs. normal samples, in order to find DE genes. Both datasets are from the Affymetrix platform, but on different GeneChips: hgu133a and hgu133plus2. After annotating the results from both datasets, I found 278 genes in common with the same probe IDs (from topTable in limma). My question is whether it is possible, and appropriate, to combine both datasets and the expression values of these specific genes in order to infer common patterns or similar expression (for instance, with a heatmap). I'm concerned that merging different datasets involves many pitfalls or serious batch effects, but my goal here is to test whether any important information can be excluded from both of these datasets regarding colon cancer.

bioconductor expresiondataset affymetrixchip differential gene expression • 3.5k views

Posted by svlachavas ▴ 840 · 9.7 years ago


I'm not sure what you mean by excluding important information. What are you trying to achieve by combining the two datasets? One obvious application would be to find genes that are significant in both datasets, and thus, more likely to be genuinely DE.

Reply by Aaron Lun ★ 28k · 9.7 years ago


Yes, I would like to find genes, or groups of genes, that "behave similarly" and share common expression patterns across both datasets. That's why I posted the question: to hear every possible idea.

Reply by svlachavas ▴ 840 · 9.7 years ago

score 1 · Answer 1 · 2015-03-16

Directly combining the two datasets into a single limma analysis would be unwise, due to batch effects and the fact that two different chips are involved. Instead, I'd suggest doing some sort of meta-analysis. As I mentioned before, the obvious approach would be to identify genes that are DE in both experiments. This can be done informally by intersecting the two DE lists, or more rigorously by using an intersection-union test.
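
A minimal sketch of the intersection-union test (IUT) follows. It assumes each experiment's results have been reduced to a mapping from probe ID to raw (unadjusted) p-value, for example exported from topTable. Under the IUT, a probe is called DE only if it is DE in both experiments, so its combined p-value is the larger of its two p-values. Those combined values are then BH-adjusted. In R this is roughly `p.adjust(pmax(pA, pB), method = "BH")`. The Python below only illustrates that logic, and the function names and probe IDs are hypothetical:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the adjusted
    # values monotone (cumulative minimum of p * n / rank, capped at 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def iut_common_de(pvals_a, pvals_b, fdr=0.05):
    """Intersection-union test across two experiments (hypothetical helper).

    pvals_a, pvals_b: dict mapping probe ID -> raw p-value in each experiment.
    Only probes present in both dicts are tested. The IUT p-value is the
    larger of the two raw p-values; these are BH-adjusted, and probes with
    an adjusted value <= fdr are returned as {probe: adjusted p-value}.
    """
    shared = sorted(set(pvals_a) & set(pvals_b))
    iut_p = [max(pvals_a[p], pvals_b[p]) for p in shared]
    adjusted = bh_adjust(iut_p)
    return {p: q for p, q in zip(shared, adjusted) if q <= fdr}


# Hypothetical example: probe_4 and probe_5 are present in only one experiment.
pvals_a = {"probe_1": 1e-6, "probe_2": 0.2, "probe_3": 1e-4, "probe_4": 0.03}
pvals_b = {"probe_1": 1e-5, "probe_2": 1e-8, "probe_3": 0.001, "probe_5": 1e-9}
print(iut_common_de(pvals_a, pvals_b))
```

The max-p rule is conservative, which is the price of requiring evidence in both datasets. It also does not check that the logFC signs agree, so that has to be checked separately.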

Through this approach, you can identify genes that are consistently detected in both experiments. If the two experiments involve the same cancer type, then the intersected subset is unlikely to provide more biological information than either DE list on its own; however, the genes in this subset are more likely to be genuinely DE. If the two experiments involve different cancers, then the subset might be biologically interesting, e.g., for finding genes that are dysregulated across different cancers.

Another strategy might be to use the DE list in one experiment to define a gene set. You can then use ROAST to test for DE for those genes in the other experiment. This will tell you whether the DE pattern is broadly similar between the two experiments.
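
ROAST itself (limma's `roast`) uses residual rotations to account for correlation between genes, and it is not reproduced here. As a much cruder stand-in, the sketch below takes the DE genes of experiment A together with their direction. It then applies an exact one-sided binomial sign test to ask whether the moderated t-statistics of the same genes in experiment B point the same way more often than chance. The sign test treats genes as independent, so its p-value will usually be too small compared with ROAST. In limma, I believe the directions can be passed to `roast` through its `gene.weights` argument (see `?roast`). The function names and example values below are hypothetical:

```python
from math import comb


def signed_gene_set(logfc_a, de_genes_a):
    """Direction (+1 up, -1 down) of each DE gene in experiment A."""
    return {g: (1 if logfc_a[g] > 0 else -1) for g in de_genes_a}


def sign_concordance_test(gene_set, t_b):
    """One-sided exact binomial sign test (NOT ROAST; hypothetical helper).

    gene_set: dict gene -> +1/-1 direction from experiment A.
    t_b: dict gene -> moderated t-statistic in experiment B.
    Genes absent from B, or with t == 0, are dropped. Returns (k, n, p):
    k concordant genes out of n tested, and P(X >= k) for X ~ Binomial(n, 0.5),
    which assumes genes are independent (ROAST does not).
    """
    genes = [g for g in gene_set if g in t_b and t_b[g] != 0]
    n = len(genes)
    k = sum(1 for g in genes if (t_b[g] > 0) == (gene_set[g] > 0))
    p = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return k, n, p


logfc_a = {"g1": 2.0, "g2": -1.5, "g3": 1.2, "g4": -0.8}
gene_set = signed_gene_set(logfc_a, ["g1", "g2", "g3", "g4"])
t_b = {"g1": 3.1, "g2": -2.0, "g3": -0.5, "g4": -1.1}
print(sign_concordance_test(gene_set, t_b))  # (3, 4, 0.3125)
```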

score 1 · Answer 2 · 2015-03-16


James W. MacDonald 67k

@james-w-macdonald-5106


United States

Another alternative is to use the GeneMeta package, which is intended for this sort of thing.
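
GeneMeta works on per-study effect sizes rather than on merged expression values, which sidesteps the cross-platform normalization problem. The sketch below illustrates that general idea and is not GeneMeta's code. For each study it computes a bias-corrected standardized mean difference (Hedges' g). It then combines the per-study values with a fixed-effect inverse-variance model and reports Cochran's Q for heterogeneity. GeneMeta also offers a random-effects model, which is not shown. Function names and numbers are hypothetical:

```python
from math import erfc, sqrt
from statistics import mean, variance


def hedges_g(cancer, normal):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n1, n2 = len(cancer), len(normal)
    pooled_var = ((n1 - 1) * variance(cancer) + (n2 - 1) * variance(normal)) / (n1 + n2 - 2)
    d = (mean(cancer) - mean(normal)) / sqrt(pooled_var)
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    g = j * d
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect combination of per-study effects."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mu = sum(w * e for w, e in zip(weights, effects)) / total
    se = sqrt(1.0 / total)
    z = mu / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    q = sum(w * (e - mu) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    return mu, se, z, p, q


# Hypothetical log-expression values for one gene in two studies.
study_a = hedges_g([8.1, 8.4, 7.9, 8.6], [6.9, 7.2, 7.0, 6.8])
study_b = hedges_g([9.0, 9.3, 8.8], [7.9, 8.1, 8.2])
print(fixed_effect_meta([study_a[0], study_b[0]], [study_a[1], study_b[1]]))
```

Because each effect size is standardized within its own study, the different intensity scales of hgu133a and hgu133plus2 do not need to be reconciled.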

Posted by James W. MacDonald 67k · 9.7 years ago

score 0 · Answer 3 · 2015-03-16


svlachavas ▴ 840

@svlachavas-7225


Germany/Heidelberg/German Cancer Resear…

Thank you both for your answers and suggestions. Dear Aaron, both datasets refer to colon cancer, but if I obtain information about the patients, there may be different subtypes of colon cancer, which could make the common genes more biologically interesting (as well as more likely to be genuinely DE). I used Microsoft Access because I didn't know how to intersect in R. Between the two data frames of DE genes from the two datasets (and the two platforms), I found 281 common DE genes: one dataset had 1248 DE genes (hgu133a) and the other had 1149 DE genes (hgu133plus2). Moreover, the majority of these common genes showed the same behaviour in terms of logFC (upregulation or downregulation), so I guess this subset of genes is more likely to be genuinely DE. Dear Mr MacDonald, I will definitely check the package above, as from the vignette it looks very interesting.

Posted by svlachavas ▴ 840 · 9.7 years ago


Check out the intersect function.
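
Base R's `intersect(x, y)` returns the elements common to two vectors, so the probe IDs of the two DE tables can be intersected directly, without Access. The Python sketch below does the same with a set intersection. It then splits the shared probes by whether their logFC has the same sign in both datasets, which is the consistency check described above. It assumes each DE list is a mapping from probe ID to logFC; the helper name and probe IDs are hypothetical placeholders:

```python
def common_de_with_direction(de_a, de_b):
    """Split probes DE in both datasets by logFC sign agreement.

    de_a, de_b: dict probe ID -> logFC for the DE probes of each dataset
    (hypothetical input format, e.g. built from two topTable results).
    Returns (shared, concordant, discordant) as sorted lists; a probe is
    concordant when its logFC has the same, non-zero sign in both datasets.
    """
    shared = sorted(set(de_a) & set(de_b))  # same idea as R's intersect()
    concordant = [p for p in shared if de_a[p] * de_b[p] > 0]
    discordant = [p for p in shared if de_a[p] * de_b[p] <= 0]
    return shared, concordant, discordant


de_a = {"probe_1": 1.8, "probe_2": -2.1, "probe_3": 0.9, "probe_4": -1.0}
de_b = {"probe_1": 1.2, "probe_2": -1.7, "probe_3": -0.6, "probe_5": 2.4}
shared, concordant, discordant = common_de_with_direction(de_a, de_b)
print(len(shared), len(concordant), len(discordant))  # 3 2 1
```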

Reply by Aaron Lun ★ 28k · 9.7 years ago


Dear Aaron,

Please excuse me for writing after 8 weeks. Regarding the methodology above: I am currently also trying to learn and test other methods for comparing the two DEG lists, to strengthen my results. Since I am not experienced in R, could you tell me how I could use roast for the idea you suggested above? I have used mroast in the past, with help from a vignette, but that was for testing differentially expressed KEGG pathways with pathview.

Reply by svlachavas ▴ 840 · 9.5 years ago


You might be better off posting this as a separate question, to get some more activity.

Reply by Aaron Lun ★ 28k · 9.5 years ago


OK, I understand. I first tried posting it here for simplicity, because you proposed the idea and I had already posted the specific question. I will also post it as a separate question.

Reply by svlachavas ▴ 840 · 9.5 years ago