Research Project - Expanding search functionality on the GEO database
gallantk5 • 0

Hi Bioconductor community,

I am working on a summer research project on GEO, outlined below.

Currently, anyone can look at the results of any previous study and ask if their mRNA under study changed in the cells for that study. The current limitation, however, is that one can only search for studies with a particular mRNA. This would produce a large list which one would have to search through individually. There is no functionality to search all studies and obtain statistics on how much change has occurred.

The goal of the current proposal is to extend the database with functionality to first identify which studies showed that the mRNA of interest changed (say, by 2-fold), then extract those studies, and then summarize how much the mRNA changed in those experiments.
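To make the three steps above concrete, here is a minimal sketch in plain Python. It assumes each study has already been reduced to per-gene lists of treated and control expression values on a linear (non-log) scale, which is the hard part in practice. The names `summarize_gene`, `log2_fold_change`, and the study IDs are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, median


def log2_fold_change(treated, control):
    """Return log2(mean treated / mean control) for linear-scale values."""
    return math.log2(mean(treated) / mean(control))


def summarize_gene(studies, gene, min_fold=2.0):
    """Find studies where `gene` changed by at least `min_fold` and summarize.

    `studies` maps a study ID to {gene: (treated_values, control_values)}.
    This layout is an assumption made for the sketch, not a GEO format.
    """
    threshold = math.log2(min_fold)
    hits = {}
    for study_id, table in studies.items():
        if gene not in table:
            continue  # gene not measured in this study
        treated, control = table[gene]
        lfc = log2_fold_change(treated, control)
        # Keep the study if the change is at least min_fold in either direction.
        if abs(lfc) >= threshold:
            hits[study_id] = lfc
    summary = {"n_studies": len(hits), "hits": hits}
    if hits:
        summary["median_abs_log2fc"] = median(abs(v) for v in hits.values())
    return summary


# Made-up example data: three studies measuring a gene called X.
example_studies = {
    "study_A": {"X": ([8.0, 8.0], [2.0, 2.0])},  # 4-fold up
    "study_B": {"X": ([3.0, 3.0], [3.0, 3.0])},  # unchanged
    "study_C": {"X": ([1.0, 1.0], [4.0, 4.0])},  # 4-fold down
}
print(summarize_gene(example_studies, "X"))
```

A plain ratio of means is used here only to keep the sketch short. A real analysis would need a proper statistical test and a defined treated/control grouping for every study.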

The outcome of such an ability would be that scientists could discover, at an accelerated rate, the possible roles of a gene of interest in many cell functions and diseases. For example, say one queried a gene called X that one knows little about, and the new search shows that X commonly changes in experiments where cells were treated with UV. This might implicate X in UV-associated diseases such as skin cancer.

Does anyone know how I can search for gene names within samples on GEO?

Do I have to annotate the GEO sample files first?

Could anyone give me any tips or steps I would have to take in order to achieve this?

Thanks!

GEOquery GEO • 189 views

Note that GEOquery is inherently limited to array data; are you aware of that? Generally, the problem with such plans is that they require submitted data to have uniform metadata, so that an automated analysis knows which samples belong to which group and which contrasts make sense for differential analysis. The same goes for other potentially important covariates, such as batch. I don't want to discourage you; it's just quite a complex endeavour due to the lack of standardization in metadata submission (if any metadata are submitted at all).


I have no biology background. I am a computer science student, and I am currently concentrating on the series data and analyzing whether there is an n-fold increase between samples in a series. I am working on normalizing and annotating .CEL files at the moment. Is there a way to automate annotating all Affymetrix .CEL files?


What do you mean by 'annotate'? Identify the array type, or map probeset IDs to the gene that is meant to be measured? Something else?


Hi James,

By annotate I mean map the probeset IDs to the gene names. Is there a way to automate that for all Affymetrix .CEL files? I would like to have an Excel file with the gene expression levels across all samples in a series, along with the probeset IDs and gene names.
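To illustrate the table I am after, here is a small sketch in plain Python. It assumes the normalized expression values and a probeset-to-gene mapping are already available as dictionaries. The function names, probe IDs, gene names, and sample names are all made up for the example. The output is CSV, which Excel can open.

```python
import csv
import io


def annotate_rows(expression, probe_to_symbol, samples):
    """Build table rows: probe ID, gene symbol, then one column per sample.

    `expression` maps probe ID -> {sample name: value}. `probe_to_symbol`
    maps probe ID -> gene symbol. Both layouts are assumptions for this sketch.
    """
    rows = [["probe_id", "gene_symbol"] + list(samples)]
    for probe_id in sorted(expression):
        # Probes without a known gene are kept and marked "NA".
        symbol = probe_to_symbol.get(probe_id, "NA")
        values = [expression[probe_id][sample] for sample in samples]
        rows.append([probe_id, symbol] + values)
    return rows


def write_table(rows, handle):
    """Write the rows as CSV to any open text handle."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


# Made-up example: two probes, two samples, one probe without annotation.
example_expression = {
    "probe_1": {"sample_1": 7.2, "sample_2": 9.4},
    "probe_2": {"sample_1": 5.0, "sample_2": 5.1},
}
example_mapping = {"probe_1": "GENE_X"}

buffer = io.StringIO()
write_table(
    annotate_rows(example_expression, example_mapping, ["sample_1", "sample_2"]),
    buffer,
)
print(buffer.getvalue())
```

Writing to a file instead only requires passing a handle opened with `open(path, "w", newline="")`.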


You can use the Bioconductor annotation packages for the Affymetrix chips of interest, together with the AnnotationDbi package.


Thank you, I will take a look at those!


gallantk5, I think it might be challenging to walk through your project in detail on this support forum. If you and your advisor would like to meet, feel free to reach out to me directly via the Bioconductor Slack or via email.


I will send you an email and talk to my advisor about setting up a meeting. Thank you!


Totally agree with your comments, ATpoint. One small detail: GEOquery can handle any data in GEO, including RNA-seq data. There has been a trend toward submitting incomplete records (without processed data), but if such data are available, they can be processed by GEOquery.
