likely bug in cbind() for DataFrame


Kasper Daniel Hansen ★ 6.5k


> df1 = DataFrame(A = c(1,2))
> df2 = DataFrame(B = c(1,2))
> rownames(df1) = c("a", "b")
> df1
DataFrame with 2 rows and 1 column
          A
  <numeric>
a         1
b         2
> cbind(df1, df2)
DataFrame with 2 rows and 2 columns
          A         B
  <numeric> <numeric>
1         1         1
2         2         2

rownames are removed. This does not happen for data.frame's.

Kasper
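For comparison, here is a minimal base R sketch of the data.frame behaviour the post refers to. It uses only base R (no Bioconductor packages), and the variable name `res` is just for illustration.

```r
# Base R only: cbind() on data.frames keeps the row names of the first argument.
df1 <- data.frame(A = c(1, 2))
df2 <- data.frame(B = c(1, 2))
rownames(df1) <- c("a", "b")

res <- cbind(df1, df2)

# The row names "a" and "b" survive the cbind().
print(rownames(res))
print(res)
```

This is the behaviour the DataFrame example above is being compared against.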


ADD COMMENT • link updated 10.7 years ago by Michael Lawrence ★ 11k • written 10.7 years ago by Kasper Daniel Hansen ★ 6.5k


Michael Lawrence ★ 11k


In general, DataFrame does less with rownames than data.frame. This was for simplicity and performance. So we could add rowname handling for the sake of consistency; it was just ignored at the beginning.

On Thu, Aug 1, 2013 at 12:35 PM, Kasper Daniel Hansen wrote:
> rownames are removed. This does not happen for data.frame's.

ADD COMMENT • link 10.7 years ago Michael Lawrence ★ 11k


Well, I was bitten by this in some custom code for manipulating the colData slot of a SummarizedExperiment. For this class, the sampleNames are stored exactly as the rownames of @colData, so they are pretty important to keep. Now that I know, it is easy to work around in the code I had: I just save the rownames.

Still, I was very surprised by this, and I think we should keep the rownames.

Best,
Kasper

On Thu, Aug 1, 2013 at 3:58 PM, Michael Lawrence wrote:
> In general, DataFrame does less with rownames compared to data.frame. This was for simplicity and performance. So we could add that for the sake of consistency; it was just ignored at the beginning.
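The "save the rownames" workaround could look roughly like the sketch below. The helper name `cbind_keep_rownames` is hypothetical and not part of any package. The demo runs on a plain data.frame so that it needs only base R; the assumption is that the same wrapper works on a DataFrame once the relevant Bioconductor package is loaded, since it only relies on `rownames()`, `rownames<-` and `cbind()`.

```r
# Hypothetical helper: remember the row names of the first object,
# cbind everything, then put the saved row names back.
cbind_keep_rownames <- function(x, ...) {
  rn <- rownames(x)                  # save before combining
  out <- cbind(x, ...)               # may or may not drop row names
  if (!is.null(rn)) {
    rownames(out) <- rn              # restore the saved row names
  }
  out
}

# Demo on base data.frames (here the restore step is a no-op,
# because data.frame already keeps the row names).
df1 <- data.frame(A = c(1, 2), row.names = c("a", "b"))
df2 <- data.frame(B = c(1, 2))
res <- cbind_keep_rownames(df1, df2)
print(res)
```

For the colData of a SummarizedExperiment, the point of such a wrapper would be that the sample names are never lost, whichever cbind() behaviour the installed version has.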

ADD REPLY • link 10.7 years ago Kasper Daniel Hansen ★ 6.5k


Please try 1.19.20. This is a pretty big change, so hopefully it plays well with everyone's code. There is some chance that duplicate row names are now introduced (and throw an error), whereas before they were simply dropped. This is because we take the names from any vector argument as potential rownames, just like data.frame.

On Thu, Aug 1, 2013 at 1:07 PM, Kasper Daniel Hansen wrote:
> Still, I was very surprised by this, and I think we should keep the rownames.
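To illustrate the data.frame behaviour being imitated here, a small base R sketch follows. It shows only base R: names on a vector argument become row names, and explicitly supplied duplicate row names are rejected with an error. It does not exercise DataFrame itself, so whether a given DataFrame call errors in the same way is an assumption to check against the installed version.

```r
# Base R only: names on a vector argument are picked up as row names.
x <- c(a = 1, b = 2)
df <- data.frame(A = x)
print(rownames(df))

# Base R refuses duplicate row names when they are supplied explicitly.
dup <- tryCatch(
  data.frame(A = c(1, 2), row.names = c("a", "a")),
  error = function(e) e
)
print(inherits(dup, "error"))
```

Code that passes named vectors into cbind() may therefore want to check for duplicated names with `anyDuplicated()` first.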

ADD REPLY • link 10.7 years ago Michael Lawrence ★ 11k
