I’m wondering if there is a way to subset the results of the analyzeVariants() function by category of splicing event, such as retained introns, alternative first exons, alternative last exons, skipped exons, etc.
SGVariants objects have a metadata column called variantType that encodes this information; see the manual page for annotateSGVariants() for an explanation of the possible types. If you want to subset analyzeVariants() results to, e.g., variants that describe skipped exon (SE) events, you can select the variants whose variantType contains "SE".
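A minimal sketch of that selection on an SGVariants-like object. It assumes the GenomicRanges package is installed and builds a small toy GRangesList in place of real analyzeVariants() output; the names sgv, v1 to v3 and the type labels are hypothetical. With a real SGVariants object you would use variantType(sgv) instead of mcols(sgv)$variantType.

```r
library(GenomicRanges)  # attaches IRanges (CharacterList) and S4Vectors (mcols)

# Toy stand-in for an SGVariants object: a GRangesList with a variantType
# metadata column (ranges and type labels are hypothetical)
sgv <- GRangesList(
  v1 = GRanges("chr1", IRanges(100, 200)),
  v2 = GRanges("chr1", IRanges(300, 400)),
  v3 = GRanges("chr1", IRanges(500, 600))
)
mcols(sgv)$variantType <- CharacterList(list("SE", "RI", c("MXE", "SE")))

# One TRUE/FALSE per variant: does any of its types contain "SE"?
keep <- sapply(mcols(sgv)$variantType, function(x) any(grepl("SE", x)))

# Subsetting a GRangesList with a logical vector keeps the variants marked TRUE
sgv_SE <- sgv[keep]
names(sgv_SE)
```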
In general, you can subset objects in R with integer vectors (the indices of the entries you want to keep) or logical vectors (TRUE for the entries you want to keep).
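The two indexing styles can be illustrated in base R alone. This is a toy sketch: the named character vector below is a hypothetical stand-in for a set of variants, not an SGSeq object.

```r
# Hypothetical toy vector of type labels, one per variant
x <- c(v1 = "SE", v2 = "RI", v3 = "SE", v4 = "MXE")

# Integer vector: the positions of the entries to keep
by_index <- x[c(1, 3)]

# Logical vector: TRUE for the entries to keep
keep <- x == "SE"
by_logical <- x[keep]

# which() turns a logical vector into the matching integer indices
which(keep)
```

Both approaches return the same two entries; the logical form is usually more convenient because it can be computed directly from a condition on the data.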
SGVariants is a special case of a GRangesList object. GRangesList objects store metadata columns, accessible with mcols(); these can hold any type of metadata associated with the ranges. For SGVariants, the structure of the mcols is fixed, and among other information it includes the column "variantType". You can access it with mcols(sgv)$variantType or, for convenience, with the accessor function variantType(), which returns the same information.
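A short sketch of inspecting the metadata columns, which is also a quick way to see which categories are present before subsetting. It again assumes GenomicRanges and uses a hypothetical toy GRangesList; on a real SGVariants object the last step would be table(unlist(variantType(sgv))).

```r
library(GenomicRanges)

# Toy stand-in for an SGVariants object (hypothetical ranges and types)
sgv <- GRangesList(
  v1 = GRanges("chr1", IRanges(100, 200)),
  v2 = GRanges("chr1", IRanges(300, 400)),
  v3 = GRanges("chr1", IRanges(500, 600))
)
mcols(sgv)$variantType <- CharacterList(list("SE", "RI", c("MXE", "SE")))

# mcols() returns a DataFrame with one row per variant
mcols(sgv)

# Tabulate the type labels to see which categories occur;
# a variant with several types is counted once per type
type_counts <- table(unlist(mcols(sgv)$variantType))
type_counts
```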
SGVariantCounts is a special case of a SummarizedExperiment object. SummarizedExperiment objects store counts or expression values as "assays", and sample and row information as "colData" and "rowData", respectively. For an SGVariantCounts object, "rowData" has to be an SGVariants object. The accessor function variantType() also works on SGVariantCounts objects.
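A toy sketch of how rows, columns and row metadata fit together in a SummarizedExperiment. It assumes the SummarizedExperiment package is installed; the object sgvc and all ranges, type labels, counts and sample names are hypothetical and only mimic the layout of an SGVariantCounts object (variants as rows, samples as columns).

```r
library(SummarizedExperiment)  # also attaches GenomicRanges

# Toy stand-in for an SGVariantCounts object: 3 variants x 2 samples
variants <- GRangesList(
  v1 = GRanges("chr1", IRanges(100, 200)),
  v2 = GRanges("chr1", IRanges(300, 400)),
  v3 = GRanges("chr1", IRanges(500, 600))
)
mcols(variants)$variantType <- CharacterList(list("SE", "RI", c("MXE", "SE")))
counts <- matrix(c(10L, 3L, 7L, 12L, 5L, 9L), nrow = 3,
                 dimnames = list(names(variants), c("sample1", "sample2")))
sgvc <- SummarizedExperiment(assays = list(counts = counts),
                             rowRanges = variants)

# Row-level metadata (including variantType) is reachable via rowData()
rowData(sgvc)$variantType

# Rows are variants and columns are samples, so subsetting takes [i, j];
# leaving j empty keeps all samples
sgvc_13 <- sgvc[c(1, 3), ]
assay(sgvc_13, "counts")
```

This is why the SGVariantCounts subset in the steps below is written with a trailing comma, sgvc_pred[i, ].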
Finally, variantType is a "CharacterList" because a variant can be part of more than one canonical event. For example, Fig. 6I in our paper shows a case where variant 1 can be considered a mutually exclusive exon with respect to variant 2, or a cassette exon that can be skipped (variant 3). Because variantType is a CharacterList, subsetting is a bit more involved: each variant has a vector of types rather than a single string, so you first need to reduce each vector to a single TRUE/FALSE value. Here are more detailed step-by-step instructions:
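A small sketch of why the per-variant reduction is needed. It assumes only the IRanges package (for CharacterList) and uses hypothetical type labels; it compares a vectorized exact match with the sapply() + grepl() pattern match used in the steps that follow.

```r
library(IRanges)  # provides CharacterList

# Hypothetical types for three variants; the third has two types
vt <- CharacterList(list("SE", "RI", c("MXE", "SE")))
lengths(vt)  # 1 1 2

# Comparing a CharacterList to a string works element-wise and returns a
# LogicalList with one TRUE/FALSE per type, not one per variant ...
vt == "SE"

# ... so any() is needed to collapse it to one value per variant
keep_exact <- any(vt == "SE")

# The sapply() + grepl() form gives the same result here; note that grepl()
# matches substrings, so it would also pick up longer labels containing "SE"
keep_pattern <- sapply(vt, function(x) any(grepl("SE", x)))
```

The exact match is stricter, while the pattern match is more forgiving if the type codes carry suffixes; which one fits depends on the codes listed in the annotateSGVariants() manual page.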
> ## load SGSeq, which provides variantType()
> library(SGSeq)
>
> ## extract the variant type information
> vt <- variantType(sgvc_pred)
>
> ## select variants for which at least one entry matches "SE"
> i <- sapply(vt, function(x) { any(grepl("SE", x)) })
>
> ## subset the SGVariantCounts object
> sgvc_SE <- sgvc_pred[i, ]
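To get one subset per category at once, the same steps can be wrapped in a small function and applied to several patterns. This is a sketch, not part of SGSeq: subset_by_type() is a hypothetical helper, it assumes the SummarizedExperiment package, and it reads the types via rowData(x)$variantType so it works on any SummarizedExperiment with such a column (with SGSeq loaded, variantType(x) would be the equivalent). The patterns in the commented example are assumptions; check ?annotateSGVariants for the actual type codes.

```r
library(SummarizedExperiment)

# Hypothetical helper: keep the rows of an SGVariantCounts-like object whose
# variantType has at least one entry matching `pattern`
subset_by_type <- function(x, pattern) {
  vt <- as.list(rowData(x)$variantType)  # plain list, one element per row
  keep <- vapply(vt, function(v) any(grepl(pattern, v)), logical(1))
  x[keep, ]
}

# Example use with the object from above (assumes sgvc_pred exists):
# patterns <- c(SE = "SE", RI = "RI")
# by_type <- lapply(patterns, function(p) subset_by_type(sgvc_pred, p))
# sapply(by_type, nrow)
```

Using vapply() with logical(1) guarantees a plain logical vector even when no rows are present, which sapply() does not.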
I hope this helps. Let me know if you have more questions.
This is more of an R question: I have an SGVariantCounts object, and I’d like to subset it by “variantType”. I can see the “variantType” column using mcols(sgvc), but I’m not sure how to subset an SGVariantCounts object the way you did in your example with a variants object. Sorry for such a basic question…