Question: sample ids not matching when creating SeqVarData object
Asked 4 weeks ago by Stephanie M. Gogarten (University of Washington):

Question submitted by email:

I’m trying to use the regression function in SeqVarTools, but I keep getting an error about sample ids not matching when creating SeqVarData, e.g.

# Get the sample ids and force them to match
sampleIds <- data.frame( sample.id=seqGetData(gds, 'sample.id') )
phenos <- merge(sampleIds, phenos, sort=F)
seqSetFilter(gds, sample.id=phenos$sample.id)
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)

It looks like SeqVarData is ignoring the seqSetFilter command when the object is created. Is there a way to get around this?

Answer: sample ids not matching when creating SeqVarData object
Answered 4 weeks ago by Stephanie M. Gogarten (University of Washington):

You can't use the seqSetFilter function to change the order of samples read from the GDS file, or to assign sampleData to only a subset of samples in the GDS file. The AnnotatedDataFrame you use to create a SeqVarData object must contain the same samples, in the same order, as the GDS file.

The problem you're having is that merge does not guarantee row order in the resulting data.frame, even when sort=FALSE (the order is then unspecified), and by default it drops rows that don't match. You can get around this by using the all.x=TRUE argument to merge, and then explicitly reordering afterwards:

sample.id <- seqGetData(gds, 'sample.id')
sampleIds <- data.frame( sample.id, stringsAsFactors=FALSE )
phenos <- merge(sampleIds, phenos, sort=FALSE, all.x=TRUE)
phenos <- phenos[match(sample.id, phenos$sample.id), ]
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)
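
The merge-then-match step can be checked without a GDS file. The following is a minimal base-R sketch, assuming toy sample ids and a made-up "outcome" column (both hypothetical, standing in for the output of seqGetData and the real phenotype table):

```r
# Hypothetical stand-ins: ids in GDS order, and a phenotype table that is
# in a different order and is missing one sample (S4).
sample.id <- c("S3", "S1", "S4", "S2")
phenos <- data.frame(sample.id = c("S1", "S2", "S3"),
                     outcome = c(0.5, 1.2, -0.3),
                     stringsAsFactors = FALSE)

sampleIds <- data.frame(sample.id, stringsAsFactors = FALSE)

# all.x=TRUE keeps every GDS sample; samples without phenotypes get NA.
merged <- merge(sampleIds, phenos, sort = FALSE, all.x = TRUE)

# merge() does not guarantee row order, so realign explicitly to GDS order.
aligned <- merged[match(sample.id, merged$sample.id), ]
rownames(aligned) <- NULL

# Sanity check to run before building the AnnotatedDataFrame.
stopifnot(identical(aligned$sample.id, sample.id))
```

Running the same stopifnot check on the real phenos table, right before calling AnnotatedDataFrame, should catch an ordering problem early.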

Alternatively, you can use dplyr::left_join, which never re-orders its first argument:

phenos <- dplyr::left_join(sampleIds, phenos, by="sample.id")

Thanks for the response; however, I guess I wasn't clear about the problem. First, I have verified that the merge isn't the issue: with sort=F it doesn't reorder the rows in the data.frame, and it works as I expected. To clarify, the issue is the opposite of what you're suggesting, i.e. there are extra sample ids in the GDS that I do not want, and I think the seqSetFilter call that removes these ids is ignored by SeqVarData (similar to some snpgds functions). So the proposed solution doesn't work. The reason I think the filter is ignored is that if I do a seqExport after the filter and then reload the new GDS, my code in the parent post above works fine.

So my workaround, which works fine, is to seqSetFilter, seqExport, and then seqOpen the new file. I was hoping there was a better way, though, because the export is incredibly slow.

Thanks again for any help.

(Reply by Larry N. Singh, 4 weeks ago.)

Sorry, I think I was misunderstanding your question. You can use seqSetFilter to select a subset of samples for any analysis; you just have to do it after creating the SeqVarData object. To run regression on a subset of samples, you would do the following:

idsToKeep <- phenos$sample.id
sampleIds <- data.frame( sample.id=seqGetData(gds, 'sample.id'), stringsAsFactors=FALSE )
phenos <- dplyr::left_join(sampleIds, phenos, by="sample.id")
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)
seqSetFilter(assocData, sample.id=idsToKeep)
myReg <- regression(assocData, "outcome") # only uses idsToKeep

Any function that works on a SeqVarGDSClass object (what you get from seqOpen) will also work on a SeqVarData object. This is why the AnnotatedDataFrame has to match the GDS exactly (even if it requires filling in missing values when you create it, as you do here). Otherwise it would be possible to subsequently set a filter including samples that were not in your original data.frame.
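
To illustrate the "pad first, filter afterwards" idea without opening a GDS file, here is a minimal base-R sketch. The ids and the "outcome" values are hypothetical, and the logical mask only stands in for the sample filter; it does not call seqSetFilter itself:

```r
# Hypothetical toy data: four samples in the GDS, phenotypes for only two.
gdsIds <- c("S3", "S1", "S4", "S2")
phenos <- data.frame(sample.id = c("S2", "S3"),
                     outcome = c(1.2, -0.3),
                     stringsAsFactors = FALSE)

# Record the wanted samples BEFORE padding the table.
idsToKeep <- phenos$sample.id

# Pad to one row per GDS sample, in GDS order; missing phenotypes become NA.
padded <- phenos[match(gdsIds, phenos$sample.id), ]
padded$sample.id <- gdsIds
rownames(padded) <- NULL

# Stand-in for the sample filter: a logical mask over the GDS samples.
keep <- gdsIds %in% idsToKeep
analysisSet <- padded[keep, ]
```

The padded rows exist only so that the annotation lines up with the GDS; once the filter is applied, only the samples with real phenotype data remain in the analysis set.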

(Reply by Stephanie M. Gogarten, 4 weeks ago.)

Ah I see, so I fill in place-holders and then remove them basically.

Thanks a bunch, this is most helpful!

(Reply by Larry N. Singh, 4 weeks ago.)