Question

colData: use only certain values of a variable

Bine (UK)

Dear all,

I have tried many things, but I think there must be an easier way to solve this fairly simple problem.

My colData has a column Sample.Site with the values Heart, Liver, ...

For my analysis I just want to compare Sample.Site = Heart between males and females. How can I filter the colData so that I only keep the samples with Sample.Site = Heart?

Thank you so much for any ideas, Bine

DESeq2 • 1.8k views

Posted by Bine, 2.9 years ago

score 1 · Answer 1 · 2021-05-26

James W. MacDonald (United States)

The SummarizedExperiment class works just like a data.frame or matrix when you use the [ operator, so you would do exactly what you would expect. Using the example object from ?SummarizedExperiment:

> colData(se)
DataFrame with 6 rows and 1 column
    Treatment
  <character>
A        ChIP
B       Input
C        ChIP
D       Input
E        ChIP
F       Input

> se_sub <- se[,colData(se)$Treatment == "Input"]
> se_sub
class: RangedSummarizedExperiment 
dim: 200 3 
metadata(0):
assays(1): counts
rownames: NULL
rowData names(1): feature_id
colnames(3): B D F
colData names(1): Treatment
> colData(se_sub)
DataFrame with 3 rows and 1 column
    Treatment
  <character>
B       Input
D       Input
F       Input

You could also make different assumptions, such as assuming (or, better, checking) that the within-group variability is fairly consistent across groups, and then fit the model so that you can make the between-sex comparison for heart using all of your data. Alternatively, you can specify a design that stratifies the model fit internally using the / operator, something like

design <- model.matrix(~Sample.Site/sex + othercovariates - 1, colData(se))

So an explicit stratification of your data is not always necessary or desirable.
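
To see what the / operator does to the design, here is a minimal base-R sketch on a made-up sample table (two sites, two sexes, two replicates each; the other covariates from the formula above are left out). The - 1 gives each site its own intercept, and sex then enters only as an effect within each site:

```r
# Hypothetical sample table: 2 sites x 2 sexes x 2 replicates
samples <- data.frame(
  Sample.Site = factor(rep(c("Heart", "Liver"), each = 4)),
  sex         = factor(rep(c("F", "M"), times = 4))
)

# Nested design: one intercept per site, plus a sex effect within each site
design <- model.matrix(~ Sample.Site/sex - 1, data = samples)

colnames(design)
# "Sample.SiteHeart" "Sample.SiteLiver" "Sample.SiteHeart:sexM" "Sample.SiteLiver:sexM"

# The heart-specific sex column is 1 only for the male heart samples
design[, "Sample.SiteHeart:sexM"]
```

The Sample.SiteHeart:sexM coefficient is the male-versus-female difference within heart, while the variability is estimated from all of the samples. That is only sensible under the assumption above that within-group variability is similar across sites.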

Posted by James W. MacDonald, 2.9 years ago

Ah, I thought there was a shortcut... You don't need to extract the colData to subset:

> se_sub <- se[,se$Treatment == "Input"]

This does the same thing.

Reply by James W. MacDonald, 2.9 years ago

Thank you. But how would I then remove the samples that are not from heart from the count data?

Reply by Bine, 2.9 years ago

Please re-read my answer more carefully. It makes no sense to have a function that subsets one part of the SummarizedExperiment object and leaves another part unchanged: subsetting the columns of the object subsets the assays (including your counts) and the colData together. The whole idea behind encapsulating the data in a SummarizedExperiment object is to let end users easily subset the object without having to worry whether the colData still match up with the columns of the assays, or whether the rowRanges still match up with the rows of the assays.
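
To make this concrete, a minimal sketch assuming the SummarizedExperiment package is installed; the sample names, counts and Sample.Site/sex values are invented for illustration. A single [ call removes the non-heart samples from the count matrix and from the colData at the same time:

```r
# Sketch only: sample names, counts, Sample.Site and sex values are invented
library(SummarizedExperiment)

# 4 genes x 6 samples of fake counts
counts <- matrix(1:24, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("S", 1:6)))

# Sample annotation; row names match the count-matrix column names
info <- S4Vectors::DataFrame(
  Sample.Site = c("Heart", "Liver", "Heart", "Liver", "Heart", "Liver"),
  sex         = c("F", "F", "M", "M", "F", "M"),
  row.names   = paste0("S", 1:6)
)

se <- SummarizedExperiment(assays = list(counts = counts), colData = info)

# One [ call subsets the count matrix and the colData together
heart <- se[, se$Sample.Site == "Heart"]

dim(heart)               # 4 genes, 3 samples
colnames(heart)          # "S1" "S3" "S5"
assay(heart, "counts")   # only the heart columns of the counts
heart$sex                # "F" "M" "F"
```

A DESeqDataSet extends RangedSummarizedExperiment, so subsetting a dds the same way behaves identically. If Sample.Site is a factor, the DESeq2 vignette shows applying droplevels() to such a variable after subsetting so that unused levels do not linger.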

Reply by James W. MacDonald, 2.9 years ago

OK, so you are saying that this way my count data are already taken care of and I do not need to manipulate them separately. Sorry, I am still very new to SummarizedExperiment objects.

Reply by Bine, 2.9 years ago

There is a vignette for the SummarizedExperiment package. I often make the point that Open Source software is free in the sense that you don't have to pay money, but there is a cost in your time and effort to understand how the software works. If you plan on using R/Bioconductor to any extent, then you will need to get accustomed to seeking out and reading the information that is made available to you, and the vignette is the very first thing you should read. Do note that the point I made above is in the second paragraph of the introduction, so you wouldn't have to read far to learn this.

Reply by James W. MacDonald, 2.9 years ago

Thank you, got it. Just confirming for anyone else who reads this: my last comment is correct.

Reply by Bine, 2.9 years ago

One more question: since you said it works like a data.frame, I assume I can do this if I want to use both heart and lung:

dds0_sub <- dds0[,dds0$Sample.Site == "heart"]
dds0_sub_1 <- dds0[,dds0$Sample.Site == "lung"]

dds0_sub <- cbind(dds0_sub, dds0_sub_1)

Reply by Bine, 2.9 years ago

Or, more simply:

ddsub <- dds0[, dds0$Sample.Site %in% c("heart", "lung")]
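
A small base-R sketch, using a made-up vector of Sample.Site values, of why a single %in% index is usually preferable to subsetting twice and calling cbind(): it keeps the samples in their original column order, and it never puts NA into the index.

```r
# Hypothetical Sample.Site values for six samples, in column order
site <- c("heart", "liver", "lung", "heart", "kidney", "lung")

# %in% gives one logical index for "heart OR lung", in the original order
keep <- site %in% c("heart", "lung")
which(keep)                                        # 1 3 4 6

# Subsetting twice and cbind()-ing puts all heart samples first, then lung
c(which(site == "heart"), which(site == "lung"))   # 1 4 3 6

# %in% treats NA as "no match" (FALSE), whereas == propagates NA
c("heart", NA) %in% "heart"                        # TRUE FALSE
c("heart", NA) == "heart"                          # TRUE NA
```

With a real object the same logical index goes straight into the column position, as in the one-liner above.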

Reply by Steve Lianoglou, 2.9 years ago

thank you :)

Reply by Bine, 2.9 years ago

I was today years old when I learned that / did something in R's model.matrix formula.

I guess I shouldn't be surprised since * is a thing, but man ... still wet behind the ears, I guess ...

Thanks for the tutelage, Jim!

Reply by Steve Lianoglou, 2.9 years ago

Of late I have been working with Epidemiologists, and if there is something they like more than stratifying I have yet to find it. Oh, wait. Power calculations. So like I was saying, other than power calculations nothing pleases an Epidemiologist more than stratifying. And nothing pleases me less than cutting data up into ever smaller chunks, so...

Notably, the / operator isn't even mentioned in ?formula. I know about it from Modern Applied Statistics with S, because V&R (Venables and Ripley) are old school legit.
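
Since / is easy to overlook, one way to check what it expands to is to ask terms() for the term labels. A minimal sketch using the placeholder names y, a and b (terms() on a plain formula needs no data):

```r
# Crossing: a * b expands to both main effects plus the interaction
attr(terms(y ~ a * b), "term.labels")    # "a" "b" "a:b"

# Nesting: a / b drops the main effect of b, so b only appears within a
attr(terms(y ~ a / b), "term.labels")    # "a" "a:b"

# Hence a / b is shorthand for a + a:b
identical(attr(terms(y ~ a / b), "term.labels"),
          attr(terms(y ~ a + a:b), "term.labels"))   # TRUE
```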

Reply by James W. MacDonald, 2.9 years ago