Dear Bioconductor Community,
I have 10 CEL files from the PrimeView PM-only Affymetrix array, which I have pre-processed and normalized with RMA. I would like to perform Present/Absent calls for non-specific intensity filtering, to remove probesets that are consistently not expressed (i.e. "absent" in more than 5 of the 10 samples). As I cannot install xps because it is not available for my current R version (I also tried the 32-bit version on my Windows 8 machine, but that didn't work either), I found the function paCalls in the oligo package, which has the options "DABG" and "PSDABG".
Thus, my main question is: should I use one of the above options with paCalls, or are they inappropriate because DABG was designed for exon arrays in the first place?
On the other hand, if I would like to perform a more general filtering on mean log-expression, could I use the genefilter package and try something like this: filter <- pOverA(0.5, log2(100))? Or is that too strict?
Just to mention: after RMA normalization I found a lot of probesets with very low log2 intensity (0 to 2) and some with negative values.
Any suggestion or proposal would be greatly appreciated!
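To make the filtering rule concrete, here is a minimal base R sketch of the "absent in more than 5 of the 10 samples" criterion. It assumes a probeset-by-sample matrix of detection p-values is already available from some P/A method; the simulated matrix, the name pvals and the 0.05 cutoff are illustrative assumptions, not output from paCalls.

```r
# Hypothetical detection p-values: 200 probesets x 10 samples (simulated).
set.seed(42)
pvals <- rbind(
  matrix(runif(100 * 10, min = 0, max = 0.04), nrow = 100),  # clearly detected
  matrix(runif(100 * 10), nrow = 100)                        # mostly noise
)
rownames(pvals) <- paste0("probeset_", seq_len(nrow(pvals)))

alpha <- 0.05                 # assumed cutoff for a "present" call
absent <- pvals >= alpha      # TRUE where a probeset is called absent
n_absent <- rowSums(absent)   # number of absent calls per probeset

# Remove probesets that are absent in more than 5 of the 10 samples.
keep <- n_absent <= 5
table(keep)
```

The logical vector keep can then be used to subset the normalized expression matrix, provided its rows are in the same probeset order.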
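For reference, a filter of this kind keeps a probeset when at least a proportion p of its samples exceed a threshold A, so it is not a filter on the mean. The sketch below writes that rule in base R on simulated log2 values, so it runs without genefilter; the matrix expr and its distribution are assumptions for illustration only.

```r
# Simulated log2 expression matrix: 500 probesets x 10 samples (hypothetical).
set.seed(1)
expr <- matrix(rnorm(500 * 10, mean = 7, sd = 2), nrow = 500)

p <- 0.5         # minimum proportion of samples that must exceed A
A <- log2(100)   # intensity threshold on the log2 scale (about 6.64)

# Keep a probeset if at least half of its samples exceed A.
keep <- rowMeans(expr > A) >= p
filtered <- expr[keep, , drop = FALSE]

# With genefilter installed, the following is expected to give the same
# logical vector (an assumption, not checked here):
#   genefilter(expr, filterfun(pOverA(p, A)))
dim(filtered)
```

Whether log2(100) is too strict depends on the intensity distribution of the data; comparing sum(keep) across a few values of A is a quick way to see how sensitive the filter is.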
Dear Stephen,
Thank you for your answer! I didn't know about this package, so I would like to ask whether I can use it after normalization at the probeset level, or whether it requires other inputs?
The simplest way would be to use UPC for normalization. You can input .CEL files, and it will output a UPC value for each probeset. Alternatively, you could apply UPC (using the "generic" functions in SCAN.UPC) to prenormalized data. But since you are working with Affy data, it seems simpler to start with the .CEL files.
Please excuse me for one more question, but I had a first look at the vignette: in order to use the UPC function, do I have to use only the normalization implemented by UPC? Moreover, if I'm not mistaken, is it a normalization process applied to each CEL file (array) separately?
Yes. UPC processes each CEL file separately. But you can specify wildcards, so it is easy to process multiple CEL files at a time. Even though it processes one sample at a time, we believe it performs as well as or better than methods such as RMA that combine data across samples (see our papers).
Yes, I found the paper on Elsevier, so it would be very interesting to learn more about your approach and, more importantly, your methodology. If I have more questions, especially on the UPC methodology, I will come back to you.
One main question regarding the vignette: as you mentioned earlier, I have a directory with the 10 CEL files. When I naively tried SCAN with celFilePattern=datadir (where datadir is the directory of the CEL files), I got the following error and warning:
Error in { : task 1 failed - "These are directories:
C:/Users/Efstathios/Desktop/totalcel"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered
Thus, how could I specify the wildcards you mentioned earlier? Thank you again for your time!
Assuming I understand your question correctly, you would run it something like this:
UPC("C:/Users/Efstathios/Desktop/totalcel/*.CEL", outFilePath="myoutput.txt")
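To check which files a wildcard pattern would pick up before running anything, a rough base R sketch is shown below. It uses Sys.glob on a temporary demo directory with empty placeholder files; the directory and file names are hypothetical, and it is only an assumption that SCAN.UPC expands the pattern in a comparable way.

```r
# Build a throwaway directory with placeholder files (not real CEL data).
cel_dir <- tempfile("totalcel_demo")
dir.create(cel_dir)
file.create(file.path(cel_dir, c("sample1.CEL", "sample2.CEL", "notes.txt")))

# A directory path alone is not a file pattern; append a wildcard instead.
pattern <- file.path(cel_dir, "*.CEL")

# List the files that the wildcard matches.
matched <- Sys.glob(pattern)
basename(matched)
```

If the list is empty for a real directory, the usual suspects are the file extension case (.CEL versus .cel) and the path separators.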