Merge different datasets with respect to a common set of DE genes in R
svlachavas
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

I have recently analyzed two different microarray datasets in R using the same pipeline. Both datasets have the same variables, and in each the comparison was cancer vs. normal samples, in order to find DE genes. Both datasets come from the Affymetrix platform, but from different GeneChips: hgu133a and hgu133plus2. After annotating the results from both datasets, I found 278 genes in common with the same probe IDs (from topTable in limma). My question is whether it is possible, and appropriate, to combine both datasets and the expression values of these specific genes in order to infer common patterns or similar expression behavior (for instance with a heatmap). I'm concerned that merging different datasets involves many pitfalls or serious batch effects, but my goal here is to test whether any important information about colon cancer can be excluded from both of these datasets.

bioconductor expresiondataset affymetrixchip differential gene expression

I'm not sure what you mean by excluding important information. What are you trying to achieve by combining the two datasets? One obvious application would be to find genes that are significant in both datasets, and thus, more likely to be genuinely DE.


Yes, I would like to find common genes, or groups of genes, that "behave similarly" and have common expression patterns in both datasets. That's why I posted the question: to hear every possible idea.

Aaron Lun
@alun
The city by the bay

Directly combining the two datasets into a single limma analysis would be unwise, due to batch effects and the fact that two different chips are involved. Instead, I'd suggest doing some sort of meta-analysis. As I mentioned before, the obvious approach would be to identify genes that are DE in both experiments. This can be done informally by intersecting the two DE lists, or more rigorously by using an intersection-union test.
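To make the intersection-union idea concrete, here is a minimal sketch. It is written in Python purely for illustration; the thread itself works in R/limma. It assumes you already have, for each gene or probe present on both chips, a p-value from each experiment (for example the P.Value column of each topTable). The function names `iut_pvalues` and `bh_adjust` and the toy values are hypothetical. Under the intersection-union test, a gene counts as DE in both experiments only if it is DE in each one, so its combined p-value is the larger of its two p-values. The combined p-values are then Benjamini-Hochberg adjusted.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def iut_pvalues(p_a, p_b):
    """Intersection-union test across two experiments.

    p_a, p_b: dicts mapping gene/probe ID -> p-value in experiment A and B.
    Only IDs present in both experiments are tested. The IUT p-value is
    max(p_A, p_B); these are then BH-adjusted across the shared IDs.
    """
    shared = sorted(set(p_a) & set(p_b))
    combined = [max(p_a[g], p_b[g]) for g in shared]
    return dict(zip(shared, bh_adjust(combined)))


if __name__ == "__main__":
    # Hypothetical toy p-values, not taken from the datasets in this thread.
    p_a = {"g1": 0.001, "g2": 0.20, "g3": 0.004, "g4": 0.03}
    p_b = {"g1": 0.002, "g2": 0.001, "g3": 0.010, "g5": 0.01}
    fdr = iut_pvalues(p_a, p_b)
    print(fdr)
    print([g for g, q in fdr.items() if q < 0.05])
```

In the toy example, g2 is strongly DE in B but not in A, so the intersection-union test does not report it. Only genes that are convincing in both experiments survive the FDR cutoff.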

Through this approach, you can identify genes that are consistently detected in both experiments. If the two experiments involve the same cancer type, then the intersected subset is unlikely to provide more biological information than either DE list on its own; however, the genes in this subset are more likely to be genuinely DE. If the two experiments involve different cancers, then the subset might be biologically interesting, e.g., for finding genes that are dysregulated across different cancers.

Another strategy might be to use the DE list in one experiment to define a gene set. You can then use ROAST to test for DE for those genes in the other experiment. This will tell you whether the DE pattern is broadly similar between the two experiments.
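ROAST is a rotation-based, self-contained gene set test in limma and is best run there. The sketch below is a much cruder stand-in for the same question: do genes that are DE in experiment A move in the same direction in experiment B? It is a swapped-in technique (an exact one-sided binomial sign test), not ROAST, and it is written in Python only for illustration. The function name and inputs are hypothetical. It assumes a dict of logFC values for the genes called DE in A and a dict of t-statistics (or logFCs) for all genes in B. Unlike ROAST, it treats genes as independent and ignores inter-gene correlation, so its p-value may be anti-conservative.

```python
from math import comb


def direction_agreement_test(logfc_a, stat_b):
    """Sign test: do genes DE in experiment A change in the same direction in B?

    logfc_a: dict ID -> logFC for genes called DE in experiment A (the "gene set").
    stat_b:  dict ID -> t-statistic (or logFC) in experiment B for all genes.
    Returns (n_agree, n_tested, p_value), where p_value is the exact one-sided
    binomial P(X >= n_agree) for X ~ Binomial(n_tested, 0.5).
    Genes missing from B, or with a value of exactly zero, are skipped.
    NOTE: assumes genes are independent, which ROAST does not.
    """
    pairs = [(logfc_a[g], stat_b[g]) for g in logfc_a if g in stat_b]
    pairs = [(a, b) for a, b in pairs if a != 0 and b != 0]
    n = len(pairs)
    k = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    p_value = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return k, n, p_value
```

For example, if all four genes in a hypothetical set agree in direction, the function returns `(4, 4, 0.0625)`. With so few genes, even perfect agreement is not significant, which is one reason the gene set should be reasonably large.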

@james-w-macdonald-5106
United States

Another alternative is to use the GeneMeta package, which is intended for this sort of thing.

svlachavas
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Thank you both for your answers and suggestions. Dear Aaron, both datasets refer to colon cancer, but if I obtain information about the patients, there may be different subtypes of colon cancer, which could make the results more biologically interesting (although the common genes would also be more likely to be genuinely DE). I used Microsoft Access because I didn't know how to intersect in R. Between the two data frames of DE genes from the two datasets (from the different platforms), I found 281 common DE genes; one dataset (hgu133a) has 1248 DE genes and the other (hgu133plus2) has 1149. Moreover, the majority of these common genes showed the same direction of change in terms of logFC (upregulation or downregulation). So I guess this subset of genes is more likely to be genuinely DE. Dear Mr. MacDonald, I will definitely check the above package, as from the vignette it looks very interesting.
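The intersection and the direction check described above can also be done in code instead of Microsoft Access. Here is a minimal sketch, written in Python purely for illustration (in R the analogous step is base R's intersect on the probe IDs). It assumes each DE table has been reduced to a mapping from probe ID to logFC; the function name and toy values are hypothetical.

```python
def compare_de_lists(de_a, de_b):
    """Intersect two DE tables by probe ID and tabulate logFC direction.

    de_a, de_b: dicts mapping probe ID -> logFC for the DE genes of each dataset.
    A logFC > 0 is treated as "up", anything else as "down" (DE genes should
    not have a logFC of exactly zero).
    """
    shared = sorted(set(de_a) & set(de_b))
    concordant = [p for p in shared if (de_a[p] > 0) == (de_b[p] > 0)]
    discordant = [p for p in shared if (de_a[p] > 0) != (de_b[p] > 0)]
    return {
        "shared": shared,
        "n_shared": len(shared),
        "n_concordant": len(concordant),
        "n_discordant": len(discordant),
        "discordant_ids": discordant,
    }


if __name__ == "__main__":
    # Hypothetical toy tables, not taken from the datasets in this thread.
    de_a = {"p1": 2.1, "p2": -1.4, "p3": 1.8, "p4": 0.9}
    de_b = {"p1": 1.7, "p2": -0.9, "p3": -1.1, "p5": 2.0}
    print(compare_de_lists(de_a, de_b))
```

Listing the discordant IDs separately makes it easy to inspect the minority of shared genes whose direction disagrees between the two platforms.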


Check out the intersect function.


Dear Aaron,

Please excuse me for writing after 8 weeks. Regarding the methodology above: I am currently trying to learn and test other methods for comparing the two DEG lists to strengthen my results, so I would like to ask you (since I'm not experienced in R) how I could use roast for your idea above. I have used mroast in the past, with help from a vignette, but that was for testing differentially expressed KEGG pathways with pathview.


You might be better off posting this as a separate question, to get some more activity.


OK, I understand. I first tried to post it here for simplicity, because you proposed the idea and I had already asked this specific question. I will also post it as a separate question.

