Adding SummarizedExperiment function
rbronste (@rbronste-12189)

Kind of a basic question: what's the easiest way to add one of these that represents a column in a custom GRanges object? Thanks.


What does 'add one of these' mean in this context?


I just mean a GRanges object that has columns such as seqnames, start, and end, and can be queried with, for instance:

stuff <- stuff.DB[seqnames(stuff.DB) == 'chrY']

I'm trying to figure out how to do the same for other columns, such as FDR, that are not in SummarizedExperiments.

@james-w-macdonald-5106

You can add anything you want to the mcols of the GRanges object and query on it at will. As a test, let's use the example from the SummarizedExperiment help page:

> library(SummarizedExperiment)
> example("SummarizedExperiment")
> rse
class: RangedSummarizedExperiment
dim: 200 6
metadata(0):
assays(1): counts
rownames: NULL
rowData names(1): feature_id
colnames(6): A B ... E F
colData names(1): Treatment
> rowRanges(rse)
GRanges object with 200 ranges and 1 metadata column:
        seqnames           ranges strand |  feature_id
           <Rle>        <IRanges>  <Rle> | <character>
    [1]     chr1 [556101, 556200]      - |       ID001
    [2]     chr1 [792975, 793074]      - |       ID002
    [3]     chr1 [263755, 263854]      - |       ID003
    [4]     chr1 [714331, 714430]      + |       ID004
    [5]     chr1 [900677, 900776]      - |       ID005
    ...      ...              ...    ... .         ...
  [196]     chr2 [495890, 495989]      - |       ID196
  [197]     chr2 [222582, 222681]      - |       ID197
  [198]     chr2 [666857, 666956]      + |       ID198
  [199]     chr2 [404246, 404345]      - |       ID199
  [200]     chr2 [540493, 540592]      - |       ID200
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> z <- rse[mcols(rse)$feature_id %in% paste0("ID", sprintf("%03d", 1:5)),]
> rowRanges(z)
GRanges object with 5 ranges and 1 metadata column:
      seqnames           ranges strand |  feature_id
         <Rle>        <IRanges>  <Rle> | <character>
  [1]     chr1 [556101, 556200]      - |       ID001
  [2]     chr1 [792975, 793074]      - |       ID002
  [3]     chr1 [263755, 263854]      - |       ID003
  [4]     chr1 [714331, 714430]      + |       ID004
  [5]     chr1 [900677, 900776]      - |       ID005
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> assays(z)[[1]]
            A        B        C        D        E        F
[1,] 9.390704 9.088845 9.726846 9.569678 9.744423 9.664979
[2,] 9.823552 7.222012 5.752299 9.486667 9.746595 8.313257
[3,] 9.496478 7.672814 9.604351 8.800272 8.292126 9.857548
[4,] 8.580828 9.613288 9.681698 9.270826 8.690414 9.233475
[5,] 9.596227 8.729721 9.739728 8.628168 8.309004 6.797500
> colData(z)
DataFrame with 6 rows and 1 column
    Treatment
  <character>
A        ChIP
B       Input
C        ChIP
D       Input
E        ChIP
F       Input
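The same kind of query works on a plain GRanges, which is what the original question was about. Here is a minimal sketch, assuming the GenomicRanges package is installed; the object `gr` and its `FDR` values are made up for illustration. Extra named arguments to `GRanges()` become metadata columns, and `gr$FDR` is shorthand for `mcols(gr)$FDR`.

```r
# Sketch: query a GRanges on a metadata column just like on seqnames.
# Assumes the Bioconductor package GenomicRanges is installed.
# The FDR values are invented for illustration.
library(GenomicRanges)

gr <- GRanges(
  seqnames = c("chr1", "chr1", "chrY", "chr2"),
  ranges   = IRanges(start = c(100, 500, 900, 1300), width = 100),
  FDR      = c(0.001, 0.20, 0.03, 0.75)  # named extra argument -> mcols column
)

# Subsetting on a core column (seqnames) ...
on_chrY <- gr[seqnames(gr) == "chrY"]

# ... and on a metadata column uses the same bracket syntax.
significant <- gr[mcols(gr)$FDR < 0.05]

# gr$FDR is shorthand for mcols(gr)$FDR
same_thing <- gr[gr$FDR < 0.05]
```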

And you can have as many columns in the mcols slot as you like, and add them whenever you want:

> mcols(rse)$whatevs <- rnorm(nrow(rse))
> mcols(rse)$addonemore <- rnorm(nrow(rse))
> rowRanges(rse)
GRanges object with 200 ranges and 3 metadata columns:
        seqnames           ranges strand |  feature_id     whatevs  addonemore
           <Rle>        <IRanges>  <Rle> | <character>   <numeric>   <numeric>
    [1]     chr1 [556101, 556200]      - |       ID001 -0.05584487  -0.6773722
    [2]     chr1 [792975, 793074]      - |       ID002  1.01721394  -0.8628047
    [3]     chr1 [263755, 263854]      - |       ID003  0.67180836   0.4902122
    [4]     chr1 [714331, 714430]      + |       ID004  0.03497479  -2.5660873
    [5]     chr1 [900677, 900776]      - |       ID005 -1.58957034   1.3208983
    ...      ...              ...    ... .         ...         ...         ...
  [196]     chr2 [495890, 495989]      - |       ID196 -0.06389269 -2.75149592
  [197]     chr2 [222582, 222681]      - |       ID197 -1.55996247  1.27020433
  [198]     chr2 [666857, 666956]      + |       ID198  0.36173020  0.49610959
  [199]     chr2 [404246, 404345]      - |       ID199 -1.24144376 -0.31007126
  [200]     chr2 [540493, 540592]      - |       ID200 -0.60194563  0.02290882
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

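Since each metadata column is just a vector, filtering on several of them at once and sorting both come down to ordinary logical indexing and `order()`. A minimal sketch on a small RangedSummarizedExperiment built from scratch, assuming the SummarizedExperiment package is installed; the `FDR`, `Fold`, and `label` columns and their values are hypothetical:

```r
# Sketch: filter on two mcols columns at once, then sort by one of them.
# Assumes the Bioconductor package SummarizedExperiment is installed;
# all column names and values are hypothetical.
library(SummarizedExperiment)

counts <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("A", "B", "C")))
rr <- GRanges("chr1", IRanges(start = c(10, 200, 400, 800), width = 50),
              FDR  = c(0.04, 0.01, 0.30, 0.002),
              Fold = c(2.5, -1.2, 3.0, 0.4))
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = rr)

# Columns can also be added after the object exists
mcols(se)$label <- paste0("peak", 1:4)

# Keep rows with FDR < 0.05 AND |Fold| > 1
keep <- mcols(se)$FDR < 0.05 & abs(mcols(se)$Fold) > 1
hits <- se[keep, ]

# Sort the remaining rows by increasing FDR
hits <- hits[order(mcols(hits)$FDR), ]
rowRanges(hits)
```

Because the assay rows travel with `rowRanges`, `assay(hits)` is filtered and reordered in the same way.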
 

rbronste (@rbronste-12189)

I guess I am still a little confused. I am using a DiffBind output that has a number of columns according to the sampleSheet. The basic thing I want to do is to be able to filter and sort by any specific column, or by multiple columns simultaneously, such as FDR and fold change.

 


If you want to make a comment, please use the ADD COMMENT button rather than the 'Add your answer' box, which is intended for answers, not comments.

Your questions are needlessly mysterious. If you have an example of what you are trying to do, then maybe we can give some pointers.

But so far you are asking generalized questions like 'I want to filter and sort', which are just basic R manipulations. If you are having problems with basic R, you should read 'An Introduction to R', and note that a SummarizedExperiment is intended to act as if it were a data.frame, so anything you can do with a data.frame will work pretty much the same way.
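To illustrate that point, here is the filter-and-sort step on an ordinary data.frame, as a sketch in base R. The column names `Fold` and `FDR` are an assumption meant to resemble a DiffBind report, and the values are invented. `as.data.frame()` on a GRanges gives a table of roughly this shape; on the GRanges itself the equivalent filter would be `gr[mcols(gr)$FDR < 0.05 & abs(mcols(gr)$Fold) > 1]`.

```r
# Sketch in base R: a hypothetical results table with DiffBind-like
# column names (Fold, FDR); all values are invented.
res <- data.frame(
  seqnames = c("chr1", "chr2", "chrY", "chr1"),
  start    = c(100L, 500L, 900L, 1300L),
  end      = c(199L, 599L, 999L, 1399L),
  Fold     = c(2.1, -0.3, -1.8, 1.5),
  FDR      = c(0.02, 0.40, 0.001, 0.09),
  stringsAsFactors = FALSE
)

# Filter on two columns at once: FDR below 0.05 and |fold change| above 1
sig <- res[res$FDR < 0.05 & abs(res$Fold) > 1, ]

# Sort by FDR, breaking ties by decreasing absolute fold change
sig <- sig[order(sig$FDR, -abs(sig$Fold)), ]
sig
```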
