Combining data from multiple Illumina microarray experiments
chris86 ▴ 420
@chris86-8408
Last seen 4.4 years ago
UCL, United Kingdom

Hi

We have two experiments run on Illumina arrays by the same company, and I want to combine the data. I normally use limma's functions (read.ilmn and neqc) to handle normalisation. If I combine these two experiments, what is the best way to go about it? I am aware that the data need background correction and quantile normalisation using the negative and positive control probes on the arrays.

Edit: the platform is the same, but I need to compare the healthy group in one experiment with the diseased group in the other. I did not have any say in why it was done like this, and I can't go into that. The company we use is highly efficient and everything is standardised, so batch should not be much of a problem.

Best,

Chris

normalization limma • 1.7k views
@gordon-smyth
Last seen 10 minutes ago
WEHI, Melbourne, Australia

You can simply combine the datasets before normalization like this:

x1 <- read.ilmn( files for run1 )
x2 <- read.ilmn( files for run2 )
x <- cbind(x1,x2)
y <- neqc(x)

etc. This assumes that the probes for both runs are identical and in the same order.
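
The assumption about probe identity and order can be checked before calling cbind(). The following is only a minimal sketch in base R: the two matrices and their probe IDs are made-up stand-ins for the expression data of each run, and where exactly the probe IDs live in the objects returned by read.ilmn is something to verify on your own data.

```r
# Toy stand-ins for the expression matrices of two runs (rows = probes).
# The probe IDs and values are invented purely for illustration.
run1 <- matrix(c(100, 200, 300, 110, 210, 310), nrow = 3,
               dimnames = list(c("ILMN_1", "ILMN_2", "ILMN_3"), c("s1", "s2")))
run2 <- matrix(c(320, 120, 220, 330, 130, 230), nrow = 3,
               dimnames = list(c("ILMN_3", "ILMN_1", "ILMN_2"), c("s3", "s4")))

# Do both runs contain the same set of probes?
same_set <- setequal(rownames(run1), rownames(run2))

# Are they also in the same order?
same_order <- identical(rownames(run1), rownames(run2))

# If the sets match but the order differs, reorder run2 to follow run1.
run2_aligned <- run2[match(rownames(run1), rownames(run2)), , drop = FALSE]

# Only now is it safe to bind the columns together.
combined <- cbind(run1, run2_aligned)
```

If same_set is FALSE, the runs do not share the same probes and simply binding columns would misalign the rows, so that case needs to be resolved first.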

Thanks, that is what I wanted.

My results from this look way off (too much DE). I deleted probes in the control files that did not match between the two control probe exports. Could this have had that effect? Or is this a batch issue?

You said above "My results from this look way off (too much DE)". That sounds quite arbitrary. What do you mean by "too much DE"? By which cutoffs or comparisons? What is your biological question of interest? Did you also create any diagnostic plots, such as an MDS plot, to see whether your samples separate according to your biological question, which would suggest that there are indeed differences in expression? Or whether a batch effect is present? (This could also be investigated with a hierarchical clustering of your samples, a PCA plot, etc.) A brief look at the code you used would help us give further suggestions.

svlachavas ▴ 830
@svlachavas-7225
Last seen 6 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Chris,

Actually, the task of combining two or more datasets is always challenging, with many parameters and assumptions to take into account. Firstly, are the Illumina arrays on the same platform or on different ones? Secondly, is the phenotype information or experimental design the same? For instance, was the same cell line used or a different one? Then there is the possible presence of a batch effect, which you will have to take into account. Thus, in my personal opinion, the safest way is to analyze the two datasets separately, and then:

1) Compare your final DE lists, to identify common genuine DE genes among your "similar comparisons";

2) Identify common pathways or ontologies "enriched" in both datasets;

3) Also perform other exploratory plots of the DE statistics, such as a scatter plot of Log-FCs.
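
Steps 1 and 3 above can be sketched roughly as follows (step 2, pathway enrichment, is not covered). This is an illustration only: the two result tables are made up, they merely reuse the logFC and adj.P.Val column names that limma's topTable() returns, and the 5% cutoff is an arbitrary choice.

```r
# Made-up results for two separately analysed datasets (rows = probes).
probes <- c("ILMN_1", "ILMN_2", "ILMN_3", "ILMN_4", "ILMN_5")
res1 <- data.frame(logFC     = c(2.1, -1.8, 0.2, 1.5, -0.1),
                   adj.P.Val = c(0.001, 0.01, 0.80, 0.03, 0.90),
                   row.names = probes)
res2 <- data.frame(logFC     = c(1.7, -2.2, 0.4, 0.3, -0.2),
                   adj.P.Val = c(0.004, 0.002, 0.60, 0.40, 0.95),
                   row.names = probes)

# 1) Probes called DE in both datasets at an illustrative 5% FDR cutoff.
de1 <- rownames(res1)[res1$adj.P.Val < 0.05]
de2 <- rownames(res2)[res2$adj.P.Val < 0.05]
common_de <- intersect(de1, de2)

# 3) Agreement of the log fold changes across all shared probes.
shared  <- intersect(rownames(res1), rownames(res2))
lfc_cor <- cor(res1[shared, "logFC"], res2[shared, "logFC"])
plot(res1[shared, "logFC"], res2[shared, "logFC"],
     xlab = "logFC, dataset 1", ylab = "logFC, dataset 2")
abline(0, 1, lty = 2)  # reference line: identical fold changes
```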

Then, it might be interesting for other reasons (e.g. to gain, in theory, more statistical power) to combine both datasets. But there are still a lot of issues that you will have to take into account for the combination to be valid. I hope that helps.

Best,

Efstathios

Sorry, I should have said the platform is identical. We have so far not seen batch effects from this service, although between these two entirely different experiments we cannot establish this, because one group has arthritis and the other is healthy. We need to do a direct comparison between healthy and arthritis. The reasons why the experiment was done like this are complex, and I suppose we would have to ignore any possible batch effects and do it anyway.

Well, you mean that one dataset consists only of patient samples, while the other consists of healthy individuals? Ignoring a batch effect would generally be a mistake, but in any case you should create some EDA plots, such as an MDS plot or a hierarchical clustering of your samples, to inspect how your groups cluster.
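
As a rough sketch of such EDA plots, the following uses only base R on simulated log2 expression values. The data, sample names and group labels are invented for illustration; with real data you would use the normalized expression matrix, and limma's plotMDS() is an alternative to the cmdscale() call used here.

```r
set.seed(1)

# Simulated log2 expression: 200 probes x 6 samples (made-up data).
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(NULL, paste0("s", 1:6)))
group <- rep(c("healthy", "arthritis"), each = 3)

# Shift the first 100 probes in the last three samples so the groups differ.
expr[1:100, 4:6] <- expr[1:100, 4:6] + 3

# Euclidean distances between samples (dist() works on rows, hence t()).
d <- dist(t(expr))

# Classical multidimensional scaling into two dimensions.
mds <- cmdscale(d, k = 2)
plot(mds, col = ifelse(group == "healthy", "blue", "red"), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
text(mds, labels = colnames(expr), pos = 3)

# Hierarchical clustering of the same distances.
hc <- hclust(d, method = "average")
plot(hc)

# Cut the tree into two clusters to see which samples group together.
clusters <- cutree(hc, k = 2)
```

When batch and group coincide, as in this thread, such plots show that the samples separate but cannot say whether the separation comes from biology or from the run.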

Yes, that is right. As far as I understand, I will not be able to tell the difference between a batch effect and healthy/disease status anyway, because they are confounded? I am trying to integrate the files from GenomeStudio at the moment, not doing clustering, PCA, etc. Thanks.
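
The confounding can be made concrete with a small design matrix. This is a sketch with a hypothetical sample sheet of three samples per group; the point is only that when every healthy sample is in one run and every arthritis sample is in the other, the status and batch columns of the design are identical.

```r
# Hypothetical sample sheet: all healthy samples in run 1,
# all arthritis samples in run 2.
targets <- data.frame(
  status = factor(rep(c("healthy", "arthritis"), each = 3),
                  levels = c("healthy", "arthritis")),
  batch  = factor(rep(c("run1", "run2"), each = 3))
)

# Cross-tabulation: the off-diagonal cells are empty.
tab <- table(targets$status, targets$batch)

# Design matrix with an intercept, a status term and a batch term.
design <- model.matrix(~ status + batch, data = targets)

# The status and batch columns coincide, so the matrix is rank deficient
# and the two effects cannot be estimated separately.
rank_design   <- qr(design)$rank
is_confounded <- rank_design < ncol(design)
```

Here design has three columns but rank two, so any model that includes both terms cannot attribute the difference to status rather than batch.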

Well, it is odd to run the healthy and the diseased individuals in different experiments. Anyway, a first, naive solution is to merge both datasets on the common probe IDs, after normalization and given that you have the same platform, and then make some EDA plots to examine the possible batch effect. It is not great, but it might be something.

We were not intending to compare them originally. There is a long story behind it. It is all a bit annoying for me!