Dear BioC community,
I have a list of probe identifiers extracted from the file Clariom_D_Human-hg38-main-probes-tab.zip (field probe_id), downloaded from thermofisher (Affy microarrays).
I want to exclude that list of probes before normalizing the HTAFeatureSet object with rma function, otherwise there's no other way to remove those probes in the downstream analysis:
raw_data <- read.celfiles(filenames = celfiles) raw_data <- raw_data[!featureNames(raw_data) %in% probes2exclude] norm_data <- rma(raw_data)
However, when normalizing with rma, I get the following error:
Error in exprs(object)[pmi, , drop = FALSE] : subscript out of bounds
Does anybody know how to exclude such probes and normalize the HTAFeatureSet without errors?
Thanks in advance
I loaded the
pmSequence
object from pmSequence.rda file located in thepd.clarium.d.human
package directory (subdir data).Next I filtered out all probes from my list from the
pmSequence
object and replace thepmSequence.rda
file with the new set of probes.Then, I reload the
pd.clariom.d.human
package in R and get the info about the probes after reading my CEL files. However, it does still contain all probes I excluded in the pmSequence.rda file.What did I do wrong? What is the pmfeature table in the pd.clariom.d.human package?
Thanks a lot
The pm.clariom.d.human package has a SQLite database in it that is used to say where all the probes for each probeset are. The particular table that does that is called the pmfeature table.
You can query the database by opening a connection to it using
and then using functions in the DBI package like
dbGetQuery
, you can then send SQL commands to the database to drop certain rows from the pmfeature table.Where you went wrong was thinking that pmSequence is the same thing as pmfeature. Those words aren't even remotely similar, so I don't know why you thought I meant one when I said the other. That's a problem in and of itself - I said something exact, and you used fuzzy matching to decide what you thought I meant.
I'll repeat what I said in my original post - you want to do high-level things, going into the underlying database and making changes. That is a non-trivial endeavor, and you will have to figure out how to craft SQL statements to do what you want, and then make sure that the end result matches what you think it should be.
How to do that is far beyond the scope of this forum, so I'll also repeat something else I said - you are on your own in this project, and it's up to you to figure out how to do what you want to do.
Hi James, thanks for the answer. This is what I've done:
I guess I have filtered out all probes from my list
ep
(whenever fid matches the probe_id field in the Clariom_D_Human.hg38.main.probes file). Then, I replace the pmfeature table in the pd.clariom.d.human library/database. As a last step, I would like to rename the pd.clariom.d.human library so that I have one with and another one w/o the probes I need to filter out. Is it possible to do it with the DBI package? I haven't found so far a way.Wouldn't it be easier to exclude the probes in my list from the CDF (Clariom_D_Human.r1.clf) and probe sequence information (Clariom_D_Human.hg38.main.probes) files and then to build a PDInfo package with pdInfoBuilder?
Thanks
I downloaded the Clariom D Array, human Analysis Files, r1 and filter out the probes from the clf and pgf files. Next, I create the pd.clariom.d.human package starting from those (modified) files. Then, I read the CEL files (R automatically loads the pd.clariom.d.human package, i.e. the package I made) and normalize the raw data using rma (at transcript cluster level).
At this point, I would not expect to find any expression for a transcript cluster whose probes (all of them) have been excluded from the pgf and clf files. This happens though, but it makes no sense for me. Such transcript cluster was not excluded from the annotation files, but I can't figure out how there's some expression for somithing that is not present.
Indeed, I compared the
chunk
andchunk2
variables from the previous comment for the new pd.clariom.d.human package I created, and they were identical, so the probes I filtered out are actually not present in the library.Could anyone give me any hint please?
Thanks
And what about setting the intensity values of the probes I want to remove to the minimum value for the respective chip prior to normalization?