Hello,
I am using the limma package to detect differentially expressed probesets between three groups of samples (knockdown, rescue, and control). When I run topTable(), it returns several probesets that share the same gene symbol and have (near) identical fold changes and p-values. I would like to remove these duplicates so that each gene is represented by only one probeset. I am unsure how to proceed with the analysis: these probesets do not have different accession numbers, and keeping multiples does not seem informative. Can anyone suggest a way to remove these "extra" probesets, or point me to a reference that would help me solve this issue? The code I am using is below, followed by an example of the topTable results. Thanks for any help you can provide.
- Matt
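library(limma)
# f is the factor of group labels (control, morphant, rescue); its definition is not shown
design <- model.matrix(~ 0 + f)
colnames(design) <- c("control", "morphant", "rescue")
contrast.matrix <- makeContrasts(morphant - control, rescue - morphant, rescue - control, levels = design)
# data.fit is the fitted model object from an earlier step that is not shown (presumably lmFit)
data.fit.con <- contrasts.fit(data.fit, contrast.matrix)
data.fit.eb <- eBayes(data.fit.con, trend = TRUE)
# coef = 1 selects the first contrast (morphant - control)
MOtabWT <- topTable(data.fit.eb, coef = 1, number = Inf, adjust.method = "BH", p.value = 0.01, lfc = 0.5)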
PROBEID | ID | SYMBOL | GENENAME | ENTREZID | logFC | AveExpr | t | P.Value | adj.P.Val | B |
13217667 | BC067708 | aamp | angio-associated, migratory cell protein | 405874 | 0.81886 | 6.472058 | 5.203286 | 8.44E-05 | 0.003206 | 1.561555 |
13063164 | NM_001044310 | aars | alanyl-tRNA synthetase | 324940 | 1.755493 | 8.443648 | 8.862132 | 1.33E-07 | 5.31E-05 | 7.861228 |
13282364 | BC074030 | abca4a | ATP-binding cassette, sub-family A (ABC1), member 4a | 798993 | -1.12315 | 5.229148 | -6.05818 | 1.59E-05 | 0.001102 | 3.2049 |
13284182 | BC074030 | abca4a | ATP-binding cassette, sub-family A (ABC1), member 4a | 798993 | -1.12315 | 5.229148 | -6.05818 | 1.59E-05 | 0.001102 | 3.2049 |
13079785 | XM_678031 | abca4b | ATP-binding cassette, sub-family A (ABC1), member 4b | 555506 | -1.30577 | 5.217277 | -5.98837 | 1.82E-05 | 0.001187 | 3.074387 |
13156949 | NM_001172647 | abcc8 | ATP-binding cassette, sub-family C (CFTR/MRP), member 8 | 553281 | -0.88072 | 6.273975 | -4.98922 | 0.00013 | 0.004351 | 1.135853 |
13075730 | BC068351 | abcf1 | ATP-binding cassette, sub-family F (GCN20), member 1 | 406467 | 1.968939 | 7.719813 | 4.911665 | 0.000152 | 0.004842 | 0.980408 |
13018254 | BC139542 | abcg1 | ATP-binding cassette, sub-family G (WHITE), member 1 | 556979 | -0.74389 | 4.904658 | -4.57459 | 0.000304 | 0.007725 | 0.298189 |
13161486 | BC124444 | abhd2b | abhydrolase domain containing 2b | 559290 | -0.87335 | 5.137598 | -4.58277 | 0.000299 | 0.007636 | 0.314847 |
13281306 | ENSDART00000143986 | ABI3BP (2 of 2) | ABI family, member 3 (NESH) binding protein | #N/A | -1.2779 | 5.590848 | -5.43329 | 5.34E-05 | 0.002326 | 2.013004 |
13276814 | ENSDART00000133367 | ablim1b | actin binding LIM protein 1b | 541550 | 0.892601 | 8.408559 | 6.322132 | 9.69E-06 | 0.000794 | 3.692016 |
13276806 | ENSDART00000133367 | ablim1b | actin binding LIM protein 1b | 541550 | 0.839638 | 8.040964 | 4.526412 | 0.000336 | 0.008245 | 0.199901 |
Can you please explain what microarray platform this is and how it has been processed? For most microarray platforms it is virtually impossible to get identical results for two different probes, as you seem to have here, even if they relate to the same gene.
Also I note that the table of DE results you show cannot be the output from the topTable() call immediately above it, because the table is sorted alphabetically by symbol instead of by significance.
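In the meantime, here is a rough sketch of how one might check for exactly identical rows and, if collapsing is still wanted afterwards, keep one probeset per symbol. It uses base R only, on a toy data frame copied from a few rows of the table above. The names tab, stat.cols, exact.dup, tab.sorted and tab.unique are made up for illustration, and keeping the probeset with the smallest P.Value is just one possible rule, not a recommendation from the limma documentation.

```r
# Toy table: five rows copied from the topTable output shown above
tab <- data.frame(
  PROBEID = c("13282364", "13284182", "13079785", "13276814", "13276806"),
  SYMBOL  = c("abca4a", "abca4a", "abca4b", "ablim1b", "ablim1b"),
  logFC   = c(-1.12315, -1.12315, -1.30577, 0.892601, 0.839638),
  P.Value = c(1.59e-05, 1.59e-05, 1.82e-05, 9.69e-06, 0.000336),
  stringsAsFactors = FALSE
)

# 1. Flag probesets whose symbol and statistics exactly match another row.
#    Exact matches suggest the same data entered the analysis twice.
stat.cols <- c("SYMBOL", "logFC", "P.Value")
exact.dup <- duplicated(tab[, stat.cols]) |
  duplicated(tab[, stat.cols], fromLast = TRUE)
print(tab[exact.dup, ])

# 2. Keep one probeset per symbol (assumed rule: smallest P.Value wins).
#    Sorting first means duplicated() retains the most significant row.
tab.sorted <- tab[order(tab$P.Value), ]
tab.unique <- tab.sorted[!duplicated(tab.sorted$SYMBOL), ]
print(tab.unique)
```

On a real result the same two steps would be applied to the data frame returned by topTable(), assuming it carries SYMBOL, logFC and P.Value columns as in the table above. Rows flagged in step 1 are worth investigating before any are dropped.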