Hi,
We recently received a dataset of 48 rat samples on 6 Agilent G3 GEx Rat V2 chips. I have no experience with Agilent arrays so I’m really struggling with this one.
A couple of questions I was hoping someone could help me with:
1) This dataset contains samples from 3 different tissues that were randomized across all chips. I was wondering whether it is ok to read in all the data in at once and then normalize or whether it would be better to normalize per tissue? If that is even ok to do so..
2) I had some issues reading in the datasets that were provided to me so I used the command below to read in the files but I’m not sure if this is ok:
targets <- readTargets("targets.txt", row.names="FileName")
x <- read.maimages(targets$FileName, source="agilent", columns=list(R="F635 Median", Rb="B635 Median"), annotation=c("Row","Column","ID","Name"))
3) The chip contains ERCC spike in probes but I’m not sure how to handle these in the analysis. I tried to analyze the data using commands below
bc<-backgroundCorrect.matrix(x, method="normexp")
E <- normalizeBetweenArrays(bc, method="quantile")
ct <- factor(targets$Type)
design <- model.matrix(~0+ct)
colnames(design) <- levels(ct)
fit <- lmFit (E,design)
contrasts <- makeContrasts (treatment-control, levels=design)
contrasts.fit <- eBayes(contrasts.fit(fit, contrasts))
summary(decideTests(contrasts.fit, method="global"))
a<-topTable(contrasts.fit, coef=1)
write.table(a, file="treatment-control.txt", sep="\t")
But I guess that is not the correct way to do so because the spike ins turned up being differentially expressed.. Can anyone help me on this? Thanks so much..
Best, Annelies
Thank you so much for your reply. Greatly appreciated.
To clarify, we have sham, treated and untreated samples and we always investigate the influence of treatment in different tissues. I'm just not sure why they were now all randomized over all the chips. I would probably put sham, treated and untreated together per tissue. Do you think it is ok to normalize them all together? Different tissues have different RNA contents..
When I try your suggested read in command, I get following error:
Error in readGenericHeader(fullname, columns = columns, sep = sep) : Specified column headings not found in file
I'm not sure how the header specifically should look like for that command to work.
It also doesn't want to read in .gpr files. The only thing that has worked so far is:
x <- read.maimages(targets$FileName, source="agilent", columns=list(R="F635 Median", Rb="B635 Median"), annotation=c("Row","Column","ID","Name"))
Thanks for your advice on the spike-ins.
Kind regards, Annelies
What do you mean by 'randomized over all the chips'? If these are Agilent single color arrays, then there is one sample per array, so I can't interpret what you mean by that.
Anyway, you shouldn't be reading the gpr files anyway, you should be reading the txt files. There should be something like 9 garbage lines at the top and then the data. I have some Agilent single color arrays, and they look like this:
And the arguments for
read.maimages
should pick out the gMedianSignal and the gBMedianSignal, which are the green foreground and background columns, respectively, for an Agilent array.As for putting them all together, the assumption for the quantile normalization (as with most microarray normalization methods) is that the bulk of the genes aren't changing between samples. There is always a tradeoff that you have to make - more samples are probably better, particularly for the model fitting step, but maybe the tissues are so different that you can't really normalize well - and it's not usually possible to know, really, what the underlying truth is. So as an analyst you have to make decisions without knowing what the underlying truth is, and have to be able to credibly defend what you have done.
Yes, these are Agilent single color array, one sample per array. One chip contains 8 arrays and we used a total of 6 chips (48 samples). If it turns out that there is too much variation between the tissues, can I normalize per tissue although some sample might be on one chip and others on the the other chips or do I need to take variation between the chips into account? Sorry might be a stupid question but I'm quite new at this and want to make sure I'm doing this right.
Thanks a lot for header example, helps a lot!
I would look at a PCA plot and a plot of the densities (
plotDensities
from limma) prior to normalizing in order to say if it's reasonable to normalize together. You can also use PCA plots to see if there is an array-specific signal, which will probably be in the higher principal components, so you should look at PC3 vs PC4, etc to check that out. A super easy way to do that is to use theGlimma
package, and you can even add in group identifiers so you can change the plotting colors on the fly. I normally useglMDSplot
, which is the original function for that sort of thing (note that using gene.selection = "common" is how you get a PCA plot). There is also a new function calledglimmaMDS
that might be better - I'm all old school and stuff so haven't even tried it yet - but either way it's nice to be able to have an interactive plot to play around with.Anyway, if there does seem to be a plate-specific signal, then you will probably want to account for that in your model. If the samples were randomized well, the plate should be orthogonal to the treatment and you could hypothetically just add in a fixed factor for plate. But from what you've said that sounds unlikely, so you might need to use
duplicateCorrelation
instead, to fit a linear mixed model.Given that you are new to this, you should read the limma User's Guide like five times or so. It's too long and complex to get in one shot, so I would read it several times and do your best to understand. It's pretty clear and comprehensive, but IMO it's hard for someone as smart/experienced as Gordon Smyth to write things in a completely approachable manner for new people. So flounder through it until it starts to stick.
I'll definitely do so.. Again, thanks so much for your help..
All the best, Annelies