ExpressionSet expression values are negative in NCBI GSE data set
bharata1803 ▴ 60
@bharata1803-7698
Last seen 5.7 years ago
Japan

Hello,

I used the GEOquery library to get expression data from an NCBI GSE. I succeeded in getting the ExpressionSet data but noticed something strange: it contains negative values. Taking log2 of a negative value is impossible, so I suspect the data were processed beforehand. Does this mean the values are already on the log2 scale?

I'm not really familiar with NCBI SOFT or MINiML because I learned to process microarray data from raw files (CEL files). Is this already processed data (log-transformed, normalized, or processed in some other way), or is it just the raw data extracted from the CEL files? From the boxplot, it seems the data have been processed. This is the link to the boxplot image:

https://www.dropbox.com/s/602xf27jaincld2/geo2r.png?dl=0

Thank you for your answer.

 

Edit by moderator: this question relates to GEO series GSE51791.

microarray differential gene expression • 2.5k views
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

I don't know what quantity is being displayed in the boxplots, but it certainly does not look like either expression or log2 expression. The description of the data analysis on GEO is not very helpful. If I had to guess, I would say that they have computed log-intensities in some form, then standardized to have mean 0 and variance 1 for each array.
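
To illustrate why that kind of standardization would produce negative values, here is a minimal sketch on simulated data. The intensities are made up and have nothing to do with GSE51791, and the per-array standardization is only the guess above, not a confirmed description of what the submitters did.

```r
# Simulated single-channel intensities: 1000 probes on 4 arrays (made-up data).
set.seed(1)
raw <- matrix(2^rnorm(4000, mean = 10, sd = 1.5), nrow = 1000, ncol = 4)

# Plain log2 intensities are positive whenever the raw intensity exceeds 1.
log_expr <- log2(raw)

# Standardize each array (column) to mean 0 and variance 1.
z <- scale(log_expr)

range(log_expr)    # all positive for these simulated values
range(z)           # spans negative and positive values
colMeans(z)        # approximately 0 for every array
apply(z, 2, sd)    # 1 for every array
```

Any centring of log values (for example, subtracting a per-array or per-probe median) would likewise put roughly half the values below zero, so negative values alone do not tell you which processing was applied.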

Anyway, the processed values look suspect to me. Personally, I would re-analyze the data from the raw Feature Extraction files that are provided on GEO as supplementary files. Just go to the GEO page

  http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51791

and click on the link to supplementary files at the bottom of the page.


The boxplot shows the data I downloaded using GEOquery. The description says the data shown is normalized log intensity. So, I think it is already normalized and in the form of log2 values.
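
For anyone wanting to check the same thing, a rough sketch of how the value range and the submitter's processing notes can be read from the downloaded object. The helper name `inspect_gse` is hypothetical, the code needs GEOquery, Biobase and network access, and the `data_processing` column is what GEOquery usually creates from the series matrix but may be absent for some series.

```r
# Sketch only: needs the GEOquery and Biobase packages and network access.
inspect_gse <- function(accession = "GSE51791") {
  gse  <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  eset <- gse[[1]]                      # first platform in the series
  vals <- Biobase::exprs(eset)
  list(
    # A minimum below zero rules out raw intensities and plain log2 intensities.
    value_range = range(vals, na.rm = TRUE),
    # Free-text processing notes supplied by the submitter, if present.
    processing  = unique(as.character(Biobase::pData(eset)$data_processing))
  )
}

# Usage (downloads the series matrix): inspect_gse("GSE51791")
```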


In some way, yes, but they do not give any details on *how*, except that they used GeneSpring to normalize it.

Hence Gordon's recommendation to download the raw data via getGEOSuppFiles() and apply your normalization of choice to it so you know for sure what you're looking at...
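
A minimal sketch of what that could look like, assuming the supplementary archive for the series is a tar file of single-channel Agilent Feature Extraction text files. The helper name `reanalyse_raw`, the directory layout and the file-name pattern are illustrative assumptions, and normexp background correction followed by quantile normalization is just one common limma choice, not the only reasonable one.

```r
# Sketch only: needs GEOquery and limma, network access, and assumes the
# supplementary archive unpacks to Agilent Feature Extraction text files.
reanalyse_raw <- function(accession = "GSE51791", dir = tempdir()) {
  # Download the supplementary files; row names are the local file paths.
  supp <- GEOquery::getGEOSuppFiles(accession, baseDir = dir)
  tars <- grep("\\.tar$", rownames(supp), value = TRUE)

  # Unpack every tar archive into one directory.
  raw_dir <- file.path(dir, accession, "raw")
  for (f in tars) utils::untar(f, exdir = raw_dir)

  # Read the green (single) channel, background correct, then normalize.
  files <- list.files(raw_dir, pattern = "\\.txt(\\.gz)?$", full.names = TRUE)
  x <- limma::read.maimages(files, source = "agilent", green.only = TRUE)
  y <- limma::backgroundCorrect(x, method = "normexp")
  limma::normalizeBetweenArrays(y, method = "quantile")
}

# Usage (downloads the raw data): y <- reanalyse_raw("GSE51791")
```

The result is a limma `EList` whose `E` component holds log2 expression values that you have normalized yourself, so there is no ambiguity about the scale.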


Thank you. I didn't know we could get the raw files. I will try to use them.

Axel Klenk ★ 1.0k
@axel-klenk-3224
Last seen 23 minutes ago
UPF, Barcelona, Spain

Dear bharata1803,

It's hard to know for sure since you're not giving us the GSE number.

If I had to guess: M-values, i.e. log2(ch2/ch1), from a two-colour array?

Cheers,

 - axel


Clicking on the plot shows that it is GSE51791. It appears to be a single-channel Agilent array.


True, and I missed it :-( Of course, the most reasonable thing is to follow your advice and start from the raw data.
