Question: Remove a list of specific probes from an HTAFeatureSet
0
2.3 years ago by
jacorvar40
European Union
jacorvar40 wrote:

Dear BioC community,

I have a list of probe identifiers extracted from the file Clariom_D_Human-hg38-main-probes-tab.zip (field probe_id), downloaded from thermofisher (Affy microarrays).

I want to exclude that list of probes before normalizing the HTAFeatureSet object with rma function, otherwise there's no other way to remove those probes in the downstream analysis:

raw_data <- read.celfiles(filenames = celfiles)

raw_data <- raw_data[!featureNames(raw_data) %in% probes2exclude]

norm_data <- rma(raw_data)

However, when normalizing with rma, I get the following error:

Error in exprs(object)[pmi, , drop = FALSE] :

subscript out of bounds

Does anybody know how to exclude such probes and normalize the HTAFeatureSet without errors?

probe affy pd.clariom.d.human • 529 views
modified 2.3 years ago by James W. MacDonald50k • written 2.3 years ago by jacorvar40
Answer: Remove a list of specific probes from an HTAFeatureSet
0
2.3 years ago by
United States
James W. MacDonald50k wrote:

There is no easy way to do that. There is a hard way to do so - just drop those probes from the pmfeature table in the pd.clariom.d.human package. That should do it, but you might need to play around with it a bit.

But do note that if you want to do something like that, you are on your own. That's the peril and the promise of Open Source software - you can do whatever you want, so long as you are willing to educate yourself enough that you don't botch the job.

I loaded the pmSequence object from pmSequence.rda file located in the pd.clarium.d.human package directory (subdir data).

Next I filtered out all probes from my list from the pmSequence object and replace the pmSequence.rda file with the new set of probes.

Then, I reload the pd.clariom.d.human package in R and get the info about the probes after reading my CEL files. However, it does still contain all probes I excluded in the pmSequence.rda file.

What did I do wrong? What is the pmfeature table in the pd.clariom.d.human package?

Thanks a lot

1

The pm.clariom.d.human package has a SQLite database in it that is used to say where all the probes for each probeset are. The particular table that does that is called the pmfeature table.

You can query the database by opening a connection to it using

con <- db(pd.clariom.d.human)

and then using functions in the DBI package like dbGetQuery, you can then send SQL commands to the database to drop certain rows from the pmfeature table.

Where you went wrong was thinking that pmSequence is the same thing as pmfeature. Those words aren't even remotely similar, so I don't know why you thought I meant one when I said the other. That's a problem in and of itself - I said something exact, and you used fuzzy matching to decide what you thought I meant.

I'll repeat what I said in my original post - you want to do high-level things, going into the underlying database and making changes. That is a non-trivial endeavor, and you will have to figure out how to craft SQL statements to do what you want, and then make sure that the end result matches what you think it should be.

How to do that is far beyond the scope of this forum, so I'll also repeat something else I said - you are on your own in this project, and it's up to you to figure out how to do what you want to do.

Hi James, thanks for the answer. This is what I've done:

con <- db(pd.clariom.d.human)
sql <- "select * from pmfeature"
rs <- dbSendQuery(con, sql)
chunk <- fetch(rs)
dbClearResult(rs)
ep <- unique(sort(as.integer(ep)))
chunk2 <- chunk[!chunk\$fid %in% ep,]
dbRemoveTable(con, 'pmfeature')
dbWriteTable(con, 'pmfeature', chunk2)

I guess I have filtered out all probes  from my list ep (whenever fid matches the probe_id field in the Clariom_D_Human.hg38.main.probes file). Then, I replace the pmfeature table in the pd.clariom.d.human library/database. As a last step, I would like to rename the pd.clariom.d.human library so that I have one with and another one w/o the probes I need to filter out. Is it possible to do it with the DBI package? I haven't found so far a way.

Wouldn't it be easier to exclude the probes in my list from the CDF (Clariom_D_Human.r1.clf) and probe sequence information (Clariom_D_Human.hg38.main.probes) files and then to build a PDInfo package with pdInfoBuilder?

Thanks

I downloaded the Clariom D Array, human Analysis Files, r1 and filter out the probes from the clf and pgf files. Next, I create the pd.clariom.d.human package starting from those (modified) files. Then, I read the CEL files (R automatically loads the pd.clariom.d.human package, i.e. the package I made) and normalize the raw data using rma (at transcript cluster level).

At this point, I would not expect to find any expression for a transcript cluster whose probes (all of them) have been excluded from the pgf and clf files. This happens though, but it makes no sense for me. Such transcript cluster was not excluded from the annotation files, but I can't figure out how there's some expression for somithing that is not present.

Indeed, I compared the chunk and chunk2 variables from the previous comment for the new pd.clariom.d.human package I created, and they were identical, so the probes I filtered out are actually not present in the library.

Could anyone give me any hint please?

Thanks