Question: sample ids not matching when creating SeqVarData object
1
gravatar for Stephanie M. Gogarten
4 months ago by
University of Washington
Stephanie M. Gogarten740 wrote:

Question submitted by email:

I’m trying to use the regression function in SeqVarTools but I keep getting error about sample id’s not matching when creating SeqVarData, e.g.

# Get the sample ids and force them to match
sampleIds <- data.frame( sample.id=seqGetData(gds, 'sample.id') )
phenos <- merge(sampleIds, phenos, sort=F)
seqSetFilter(gds, sample.id=phenos$sample.id)
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)

It looks like the SeqVarData is ignoring the seqSetFilter command when creating SeqVarData. Is there a way to get around this?

seqvartools • 135 views
ADD COMMENTlink modified 4 months ago • written 4 months ago by Stephanie M. Gogarten740
Answer: sample ids not matching when creating SeqVarData object
0
gravatar for Stephanie M. Gogarten
4 months ago by
University of Washington
Stephanie M. Gogarten740 wrote:

You can't use the seqSetFilter function to change the order of samples read from the GDS file, or to assign sampleData to only a subset of samples in the GDS file. The AnnotatedDataFrame you use to create a SeqVarData object must contain the same samples, in the same order, as the GDS file.

The problem you're having is that merge reorders rows in a data.frame even when sort=FALSE, and drops rows that don't match by default. You can get around this by using the all.x=TRUE argument to merge, and then explicitly reordering afterwards:

sample.id <- seqGetData(gds, 'sample.id')
sampleIds <- data.frame( sample.id, stringsAsFactors=FALSE )
phenos <- merge(sampleIds, phenos, sort=FALSE, all.x=TRUE)
phenos <- phenos[matchsample.id, phenos$sample.id),]
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)

Alternatively, you can use dplyr::left_join, which never re-orders its first argument:

phenos <- dplyr::left_join(sampleIds, phenos)
ADD COMMENTlink modified 4 months ago • written 4 months ago by Stephanie M. Gogarten740

Thanks for the response, however, I guess I wasn't clear about the problem. First, I have verified that the merge isn't the issue, and the sort doesn't reorder the rows in the data.frame and works as I expected. To clarify, the issue is the opposite of what you're suggesting, i.e. there are extra sample ids in the gds that I do not want, and I think the seqFilter to remove these ids is ignored by SeqVarData. So, this proposed solution doesn't work. The reason I think that the seqFilter to remove sample id's is ignored by SeqVarData (similar to some snpgds functions), is because if I do a seqExport after the filter and then reload the new gds, then my code in the parent posting above works fine.

So, my workaround that works fine, is to seqFilter, seqExport, and then seqOpen the new file. I was hoping there was a better way though, because the exporting is incredibly slow.

Thanks again for any help.

ADD REPLYlink modified 4 months ago • written 4 months ago by Larry N. Singh0
1

Sorry, I think I was misunderstanding your question. You can use seqSetFilter to select a subset of samples for any analysis; you just have to do it after creating the SeqVarData object. To run regression on a subset of samples, you would do the following:

idsToKeep <- phenos$sample.id
sampleIds <- data.frame( sample.id=seqGetData(gds, 'sample.id') )
phenos <- dplyr::left_join(sampleIds, phenos)
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)
seqSetFilter(assocData, sample.id=idsToKeep)
myReg <- regression(assocData, "outcome") # only uses idsToKeep

Any function that works on a SeqVarGDSClass object (what you get from seqOpen) will also work on a SeqVarData object. This is why the AnnotatedDataFrame has to match the GDS exactly (even it requires filling in missing values when you create it, as you do here). Otherwise it would be possible to subsequently set a filter including samples that were not in your original data.frame.

ADD REPLYlink modified 4 months ago • written 4 months ago by Stephanie M. Gogarten740

Ah I see, so I fill in place-holders and then remove them basically.

Thanks a bunch, this is most helpful!

ADD REPLYlink written 4 months ago by Larry N. Singh0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 335 users visited in the last hour