Question

Get gene expression values across samples for ONE gene given by gene symbol

0

bi_Scholar • 0

@bi_scholar-11572

Last seen 7.4 years ago

Hello,

Is there an easy way to query an annotated ExpressionSet by gene symbol (or a similar identifier) to extract the expression values for a single gene?

I tried to set the row names of the expression matrix to the gene symbols obtained from the annotation file, but some gene symbols are duplicated, which causes problems.

Thanks in advance!

expressionset bioconductor • 1.5k views

ADD COMMENT • link updated 7.6 years ago by polemiraza ▴ 70 • written 7.6 years ago by bi_Scholar • 0

score 1 · Answer 1 · 2016-10-01

1

polemiraza ▴ 70

@polemiraza-11428

Last seen 2.4 years ago

Poland

Hello bi_Scholar,

I think that collapsing (aggregating) the several probe measurements that correspond to a single gene would be the best solution.

I recommend the collapseRows function in the WGCNA package (it accepts a matrix or a data frame as input). It offers a number of useful options for aggregating your data: for example, collapseRows can pick the probe with the highest mean value or the maximum variance across the samples. It can also take the average expression value of the probes corresponding to a gene.
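To make the two aggregation ideas above concrete, here is a minimal sketch of the logic in plain Python. It is not the WGCNA implementation or its API; the probe IDs, gene symbols, values, and function names are all hypothetical and only illustrate "keep the probe with the highest mean" versus "average the probes".

```python
from statistics import mean

# Hypothetical toy data: probe ID -> expression values across 4 samples.
expr = {
    "probe_a": [5.0, 6.0, 7.0, 6.0],
    "probe_b": [2.0, 3.0, 2.0, 3.0],
    "probe_c": [8.0, 9.0, 8.0, 9.0],
}
# Hypothetical probe -> gene symbol annotation; GENE1 is hit by two probes.
symbol = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}


def group_probes(symbol):
    """Return gene symbol -> list of probe IDs annotated to that symbol."""
    groups = {}
    for probe, gene in symbol.items():
        groups.setdefault(gene, []).append(probe)
    return groups


def collapse_max_mean(expr, symbol):
    """Keep, per gene, the single probe with the highest mean expression."""
    return {
        gene: expr[max(probes, key=lambda p: mean(expr[p]))]
        for gene, probes in group_probes(symbol).items()
    }


def collapse_average(expr, symbol):
    """Average, per gene and per sample, over all probes of that gene."""
    return {
        gene: [mean(column) for column in zip(*(expr[p] for p in probes))]
        for gene, probes in group_probes(symbol).items()
    }


by_max_mean = collapse_max_mean(expr, symbol)
by_average = collapse_average(expr, symbol)
print(by_max_mean["GENE1"])  # values of the probe chosen for GENE1
print(by_average["GENE1"])   # per-sample average over both GENE1 probes
```

Either way, the result has exactly one row per gene symbol, so a lookup by symbol is unambiguous.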

Cheers,

Pawel

ADD COMMENT • link 7.6 years ago polemiraza ▴ 70

0

Hey Pawel,

thanks a lot for the answer. I looked into this function and it seems to be exactly what I was looking for.
However, I have two questions regarding the function:

1. The function states that collapsing methods such as "maxMean", "Average", etc. are unreliable for 5 or fewer samples. Why is that?
2. The number of rows in the function output is almost half that of the original expression matrix. Why are there so many _at probe sets mapping to the same gene? Is there any advantage to this?

ADD REPLY • link 7.5 years ago bi_Scholar • 0

1

Hey bi_Scholar,

Regarding question 2: usually, two or more probe sets are homologous to different regions of the same gene transcript.

To help you more efficiently, though, I would like to know: what is your array platform, and how many samples and sample groups (phenotypes) do you have?

Cheers,

Pawel

ADD REPLY • link 7.5 years ago polemiraza ▴ 70

0

Hey Pawel,

thanks for taking the time to answer my questions, I really appreciate it!
I'm working with an Affymetrix dataset on the GPL96 platform (HG-U133A), downloaded from GEO.

The samples are grouped by disease state (healthy/infected), with 5 samples each.
I want to analyze the data for gene co-expression within a given group and therefore need to be able to query the expression values by gene symbol.
So far, your proposed method has worked perfectly; I was just curious about the points above.
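For illustration, the workflow described here (look up genes by symbol, restrict to one group, measure co-expression) could be sketched as follows once the matrix has one row per gene symbol. This is plain Python with made-up data, not Bioconductor code; the sample names, group labels, gene symbols, and helper functions are assumptions, and Pearson correlation is just one possible co-expression measure.

```python
from math import sqrt

# Hypothetical collapsed matrix: one row per gene symbol, 5 + 5 samples.
samples = ["h1", "h2", "h3", "h4", "h5", "i1", "i2", "i3", "i4", "i5"]
group = {s: ("healthy" if s.startswith("h") else "infected") for s in samples}
expr = {
    "GENE1": [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE2": [2.0, 4.0, 6.0, 8.0, 10.0, 1.0, 2.0, 3.0, 4.0, 5.0],
}


def values_for(gene, wanted_group):
    """Expression values of one gene, restricted to one sample group."""
    return [v for s, v in zip(samples, expr[gene]) if group[s] == wanted_group]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


r_healthy = pearson(values_for("GENE1", "healthy"), values_for("GENE2", "healthy"))
r_infected = pearson(values_for("GENE1", "infected"), values_for("GENE2", "infected"))
print(r_healthy, r_infected)
```

With only 5 samples per group, any such correlation estimate is very noisy, so it is worth treating the numbers with caution.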

Cheers!

ADD REPLY • link 7.5 years ago bi_Scholar • 0

1

bi_Scholar,

If the collapseRows function has some kind of restriction for small sample groups, I can propose an even better solution.

You need to normalize your data (from the raw .cel files) using a custom annotation package [I recommend the Brainarray package]. Here is an explanation of why:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-48

Link to the custom CDFs (use the latest version, v.20):

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

In most cases, after normalisation and annotation of probe sets to gene symbols, you will get a matrix (or data frame) without duplicated gene names [there might, however, be a few duplicates; I used to remove them by hand].

All of the above will remove the duplicates (as you wished) and give more reliable expression values than the collapseRows function.
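Before removing the few remaining duplicates by hand, it helps to list them first. A minimal sketch of that check in plain Python, assuming the row labels are available as a simple list (the symbols below are hypothetical placeholders):

```python
from collections import Counter

# Hypothetical row labels of the annotated matrix, in row order.
row_symbols = ["GENE1", "GENE2", "GENE3", "GENE2"]

# Symbols that occur on more than one row.
duplicated = sorted(s for s, n in Counter(row_symbols).items() if n > 1)

# Row positions (0-based) of every duplicated symbol, for manual inspection.
duplicate_rows = {
    s: [i for i, name in enumerate(row_symbols) if name == s]
    for s in duplicated
}
print(duplicate_rows)
```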

Cheers,

Pawel

ADD REPLY • link 7.5 years ago polemiraza ▴ 70