question about subsetting SGSeq results by different categories of splice events
@amandajoyprice-13308
Last seen 3.6 years ago

I’m wondering if there is a way to subset the results from the analyzeVariants() function by category of splicing event, such as retained introns, alternative first exons, alternative last exons, skipped exons, etc.

 

Thanks for the help,

 

Amanda

@leonard-goldstein-6845
Last seen 12 months ago
United States

Hi Amanda, 

SGVariants objects have a column called variantType that encodes this information. You can check the manual page for annotateSGVariants for an explanation of the possible types. If you want to subset analyzeVariants() results to, for example, variants that describe skipped exon (SE) events, you can do something like:

sgvc_pred[any(grepl("SE", variantType(sgvc_pred))), ]

Let me know if you run into any problems. 

Leonard

@amandajoyprice-13308
Last seen 3.6 years ago

This is more of an R question: I have an SGVariantCounts object, and I’d like to subset it by “variantType”. I can see the “variantType” column using mcols(sgvc), but I’m not sure how to subset an SGVariantCounts object the way you do in your example with a variants object. Sorry for such a basic question…

Thanks,

Amanda

@leonard-goldstein-6845
Last seen 12 months ago
United States

In general you can subset objects in R by using integer vectors (with indices for the entries you want to keep) or logical vectors (TRUE for entries you want to keep). 
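As a minimal sketch of those two styles of subsetting, here is a base R example on a toy character vector. The vector and its names are made up for illustration and are not SGSeq output; the same indexing rules carry over to Bioconductor objects.

```r
# Toy named vector standing in for any subsettable R object (illustrative values)
types <- c(a = "SE:S", b = "RI:R", c = "AFE", d = "SE:I")

# Integer vector: positions of the entries to keep
by_index <- types[c(1, 4)]

# Logical vector: TRUE marks the entries to keep
keep <- grepl("^SE", types)
by_logical <- types[keep]

# Both approaches select the same entries; which() converts logical to integer
same_result <- identical(by_index, by_logical)
positions <- which(keep)
```

A logical vector is usually the more convenient choice when the selection comes from a test on the data, as it does when filtering by variant type.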

SGVariants is a special case of a GRangesList object. GRangesLists have metadata columns stored as "mcols". This can be any type of metadata associated with the ranges. In the case of SGVariants it is specified what "mcols" must look like. Among other information it includes the column "variantType". You can access it by typing mcols(sgv)$variantType or for convenience there is an accessor function variantType() that returns the same information. 

SGVariantCounts is a special case of a SummarizedExperiment object. SummarizedExperiment objects have counts or expression values stored as "assays" and sample and row information stored as "colData" and "rowData" respectively. For an SGVariantCounts object "rowData" has to be an SGVariants object. The accessor function variantType() also works on SGVariantCounts objects. 

Finally, variantType is a "CharacterList" because a variant can be part of more than one canonical event. For example, Fig. 6I in our paper shows a case where variant 1 can be considered a mutually exclusive exon with respect to variant 2, or it can be considered a cassette exon that can be skipped (variant 3). Because variantType is a CharacterList, the subsetting is more complicated: each variant has to be tested for whether any of its types matches. Here are more detailed step-by-step instructions:

> ## extract the variant type information
> vt <- variantType(sgvc_pred)
> 
> ## select variants for which at least one entry matches "SE"
> i <- sapply(vt, function(x) { any(grepl("SE", x)) })
> 
> ## subset the SGVariantCounts object
> sgvc_SE <- sgvc_pred[i, ]
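The per-variant test can also be tried without any SGSeq objects. The sketch below uses a plain R list as a stand-in for the CharacterList; the variant names, the type strings, and the helper has_type() are made up for illustration and are not part of SGSeq. It assumes type labels of the form "SE:I" or "RI:R", so check your own variantType() output before relying on the patterns.

```r
# Plain list standing in for variantType(sgvc_pred); values are illustrative
vt <- list(
  v1 = c("SE:I", "MXE"),  # a variant that belongs to two canonical events
  v2 = "MXE",
  v3 = "SE:S",
  v4 = "RI:R",
  v5 = character(0)       # a variant with no annotated type
)

# Hypothetical helper: TRUE if at least one type of a variant matches the pattern
has_type <- function(vt, pattern) {
  vapply(vt, function(x) any(grepl(pattern, x)), logical(1))
}

# Skipped exon events only
i_se <- has_type(vt, "^SE:")

# Skipped exon or retained intron events
i_se_or_ri <- has_type(vt, "^(SE|RI):")

# Names of the selected variants
se_variants <- names(vt)[i_se]
se_or_ri_variants <- names(vt)[i_se_or_ri]
```

vapply() is used here instead of sapply() so that the result is always a logical vector, even when the input is empty. The resulting vector can be used to subset the SGVariantCounts object in the same way as i above.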

I hope this helps. Let me know if you have more questions.

Leonard

 
