Question submitted by email:
I’m trying to use the regression function in SeqVarTools, but I keep getting an error about sample IDs not matching when creating a SeqVarData object, e.g.
# Get the sample ids and force them to match
sampleIds <- data.frame( sample.id=seqGetData(gds, 'sample.id') )
phenos <- merge(sampleIds, phenos, sort=F)
seqSetFilter(gds, sample.id=phenos$sample.id)
phenosDF <- AnnotatedDataFrame(phenos)
assocData <- SeqVarData(gds, phenosDF)
It looks like the SeqVarData is ignoring the seqSetFilter command when creating SeqVarData. Is there a way to get around this?
Thanks for the response; however, I guess I wasn't clear about the problem. First, I have verified that the merge isn't the issue: the sort argument doesn't reorder the rows in the data.frame, and it works as I expected. To clarify, the issue is the opposite of what you're suggesting, i.e. there are extra sample IDs in the GDS that I do not want, and I think the seqSetFilter call that removes these IDs is ignored by SeqVarData. So this proposed solution doesn't work. The reason I think the filter is ignored by SeqVarData (similar to some snpgds functions) is that if I do a seqExport after the filter and then reload the new GDS, my code in the parent posting above works fine.
So my workaround, which works fine, is to call seqSetFilter, then seqExport, and then seqOpen on the new file. I was hoping there was a better way, though, because the export is incredibly slow.
Thanks again for any help.
Sorry, I think I was misunderstanding your question. You can use seqSetFilter to select a subset of samples for any analysis; you just have to do it after creating the SeqVarData object. To run regression on a subset of samples, you would create the SeqVarData object for all samples in the GDS first, and then set the filter before running the regression.
Any function that works on a SeqVarGDSClass object (what you get from seqOpen) will also work on a SeqVarData object. This is why the AnnotatedDataFrame has to match the GDS exactly (even if it requires filling in missing values when you create it, as you do here). Otherwise it would be possible to subsequently set a filter including samples that were not in your original data.frame.
Ah I see, so I fill in placeholders and then remove them, basically.
Thanks a bunch, this is most helpful!
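For anyone finding this thread later, here is a minimal sketch of the approach described in the answer above: pad the phenotype table with NA for every sample in the GDS, create the SeqVarData object, and only then filter. It is an illustration rather than the original poster's code. The example GDS file from SeqArray, the simulated phenos table, the phen outcome column, and the names all.ids, keep, and phenosAll are all assumptions made for the sake of a runnable example.

```r
library(SeqArray)
library(SeqVarTools)
library(Biobase)

# Assumption: the example GDS file shipped with SeqArray stands in for real data
gds <- seqOpen(seqExampleFileName("gds"))
all.ids <- seqGetData(gds, "sample.id")

# Hypothetical phenotype table covering only some of the samples in the GDS
set.seed(1)
keep <- all.ids[1:50]
phenos <- data.frame(sample.id=keep, phen=rnorm(length(keep)),
                     stringsAsFactors=FALSE)

# Left join onto the full sample list so samples without phenotypes get NA
sampleIds <- data.frame(sample.id=all.ids, stringsAsFactors=FALSE)
phenosAll <- merge(sampleIds, phenos, by="sample.id", all.x=TRUE, sort=FALSE)

# merge() does not guarantee row order, so restore the GDS order explicitly
phenosAll <- phenosAll[match(all.ids, phenosAll$sample.id), ]

# The annotation now matches the GDS exactly, so the object can be created
assocData <- SeqVarData(gds, AnnotatedDataFrame(phenosAll))

# Filter AFTER creating the SeqVarData object, then run the regression
seqSetFilter(assocData, sample.id=keep)
res <- regression(assocData, outcome="phen", model.type="linear")
head(res)

seqClose(gds)
```

The key difference from the code in the question is the all.x=TRUE argument to merge, which keeps the unwanted samples as NA placeholders instead of dropping them, and the position of seqSetFilter after SeqVarData rather than before it. No seqExport step is needed.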