Question

How to create an aldex.clr object from a "selex" data table

I'm new to ALDEx2, so I have a very simple question about how to create the aldex.clr object described here https://www.bioconductor.org/packages/devel/bioc/manuals/ALDEx2/man/ALDEx2.pdf from a data frame. The documentation provides this example:

library(ALDEx2)
# The 'reads' data.frame or
# RangedSummarizedExperiment object should
# have row and column names that are unique,
# and looks like the following:
#
#              T1a T1b  T2  T3  N1  N2  Nx
#   Gene_00001   0   0   2   0   0   1   0
#   Gene_00002  20   8  12   5  19  26  14
#   Gene_00003   3   0   2   0   0   0   1
#   Gene_00004  75  84 241 149 271 257 188
#   Gene_00005  10  16   4   0   4  10  10
#   Gene_00006 129 126 451 223 243 149 209
#       ... many more rows ...
data(selex)
# subset for efficiency
selex <- selex[1201:1600,]
conds <- c(rep("NS", 7), rep("S", 7))
x <- aldex.clr(selex, conds, mc.samples=2, denom="all", verbose=FALSE)
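To make that expected layout concrete, here is a base-R sketch (no ALDEx2 needed) of a data frame in the same shape: feature IDs as row names and only integer count columns. The counts are the first three rows and columns of the documentation comment above. looks_like_aldex_input() is a hypothetical helper, not an ALDEx2 function, and its checks are only an assumption about what aldex.clr() needs, based on that comment.

```r
# Toy counts in the layout the documentation comment describes:
# feature IDs as row names, one integer column per sample.
reads <- data.frame(
  T1a = c(0L, 20L, 3L),
  T1b = c(0L, 8L, 0L),
  T2  = c(2L, 12L, 2L),
  row.names = c("Gene_00001", "Gene_00002", "Gene_00003")
)

# Hypothetical helper (not part of ALDEx2): TRUE when every column is
# numeric and the row and column names are unique.
looks_like_aldex_input <- function(df) {
  all(vapply(df, is.numeric, logical(1))) &&
    !anyDuplicated(rownames(df)) &&
    !anyDuplicated(colnames(df))
}

looks_like_aldex_input(reads)                              # TRUE
looks_like_aldex_input(data.frame(X = "a", n = 1L))        # FALSE: X is character
```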

In my case, I need to load the selex data like this:

reads_df <- read.table(file="~/selex.txt", header=TRUE, sep="\t", dec=".", as.is=FALSE)

This gives the following reads_df:

head(reads_df)
        X X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS
1 S:D:A:D    524    355    443    489    465    509    754     0     0     0
2 S:D:A:E    588    383    564    462    559    564    961     5     5    11
3 S:E:A:D    596    318    542    443    605    459   1022    77    44     8
4 S:E:A:E    535    352    549    514    555    465   1476   718   168    76
5 S:D:C:D    218    104    192    193    177    190    709     0     0     0
6 S:D:C:E    269    180    151    234    281    269    467     1     0     0
  X1_DS X2_AS X2_CS X2_DS
1    13   675     1     4
2   437    10     4     1
3    12     4     2    89
4   459    10    31     5
5     0     1     0     0
6     4     0     0     0

Here are the column types:

sapply(reads_df, class)
 "factor" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"

The first column, X, is the “feature” column. It holds character strings (stored as a factor because of as.is=FALSE) rather than integer counts, so creating the aldex.clr object throws an error:

conds <- c(rep("NS", 7), rep("S", 7))
aldex_clr_df <- aldex.clr(reads_df, conds=conds, mc.samples=128, denom="all")

Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
Calls: aldex.clr -> aldex.clr -> aldex.clr.function -> apply
Execution halted
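The traceback points at apply() inside aldex.clr.function. Assuming the data frame reaches apply() as-is, the error can be reproduced in plain R: apply() first converts a data frame with as.matrix(), and a single factor or character column turns the whole matrix into text, so any numeric computation on a column fails. The two rows below are copied from head(reads_df); sum() merely stands in for whatever aldex.clr computes per column.

```r
# Two rows from head(reads_df): a factor column plus integer counts,
# as produced by read.table(..., as.is = FALSE).
reads_df <- data.frame(
  X      = factor(c("S:D:A:D", "S:D:A:E")),
  X1_ANS = c(524L, 588L),
  X1_BNS = c(355L, 383L)
)

# One non-numeric column makes the whole matrix character
m <- as.matrix(reads_df)
typeof(m)   # "character"

# apply() does the same coercion internally, so a numeric function fails with
# "invalid 'type' (character) of argument", as in the question
try(apply(reads_df, 2, sum))
```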

I'm sure I'm missing something simple here, but I'm not sure what. I'd really appreciate any help with this.

Thanks!

Greg • 3.0 years ago

I have no experience with ALDEx2, but based on the expected input you show, and on your input and error message, I would say that the row names of your data (a data.frame?) should be the contents of the first column (X), not 1, 2, 3, etc. Then you should remove column X from your data.

Something like:

rownames(reads_df) <- reads_df$X
reads_df <- reads_df[, -1]
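A minimal sketch of that suggestion on a two-row stand-in for reads_df (values taken from the head() output in the question). The drop = FALSE argument is an addition: it keeps the result a data frame even if only one count column were left.

```r
# Stand-in for reads_df as read by the original read.table() call:
# feature IDs in column X, counts in the other columns.
reads_df <- data.frame(
  X      = factor(c("S:D:A:D", "S:D:A:E")),
  X1_ANS = c(524L, 588L),
  X1_BNS = c(355L, 383L)
)

rownames(reads_df) <- reads_df$X            # feature IDs become row names
reads_df <- reads_df[, -1, drop = FALSE]    # drop X; only counts remain

rownames(reads_df)          # "S:D:A:D" "S:D:A:E"
sapply(reads_df, class)     # both columns "integer"
```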

Also, please be aware that names containing colons (:) are not syntactically valid R names, so you may want to replace the colons; see, for example, ?make.names.
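For reference, make.names() shows what R considers the syntactically valid version of these IDs: each colon becomes a dot. Row names, however, are ordinary strings, so (as the read.table() output later in this thread also suggests) the IDs can be stored unchanged as row names. A quick base-R check:

```r
ids <- c("S:D:A:D", "S:D:A:E")

# Syntactically valid versions of the IDs
make.names(ids)        # "S.D.A.D" "S.D.A.E"

# Row names keep the colons as-is
df <- data.frame(n = 1:2, row.names = ids)
rownames(df)           # "S:D:A:D" "S:D:A:E"
```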

Guido Hooiveld • 3.0 years ago

Thanks for your response on this. Yes, reads_df is a data frame, created with this code:

reads_df <- read.table(file="~/selex.txt", header=TRUE, sep="\t", dec=".", as.is=FALSE)

Here are a few lines of the selex.txt file, with the first line being the header:

        1_ANS   1_BNS   1_CNS   1_DNS   2_ANS   2_CNS   2_DNS   1_AS    1_BS    1_CS    1_DS    2_AS    2_CS    2_DS
A:D:A:D 347     271     396     317     391     260     620     8       1       1       2       0       1       3
A:D:A:E 436     361     461     241     410     387     788     83      6       8       12      0       3       2
A:E:A:D 476     288     378     215     412     430     591     997     3747    662     421     37      75      6353

The values in the first column are character strings; from the aldex.clr perspective, I believe these are the features. The colons are simply part of those strings.

The row numbers 1, 2, 3, ... are just the default row names that appear when the data frame is printed. The first column of the data frame is actually the X column containing the strings. The call to aldex.clr() fails because it expects integer counts, so it can't handle the character column.

I've tried removing the first column:

# Extract the numeric (count) columns; note that this also drops
# column X, which holds the feature IDs.
numeric_cols <- unlist(lapply(reads_df, is.numeric))
numeric_reads_df <- reads_df[ , numeric_cols]
aldex_clr_df <- aldex.clr(numeric_reads_df, mc.samples=opt$num_mc_samples, denom=opt$denom)

The aldex.clr object is built:

typeof(aldex_clr_df):  S4 
class(aldex_clr_df):  aldex.clr

But the features are now incorrect:

features <- getFeatures(aldex_clr_df)
cat("features: ", features, "\n")

features:  1.156718 1.536168 1.722722 1.629469 0.4310803 0.4348671 0.3411319 0.5214256 0.3898375 0.4238301 0.5456295 0.6670449 0.3933128 0.5519186 0.8365554 1.004358 -0.5114089 0.09291021 0.6510461 0.375262
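Two base-R effects likely explain this output; the sketch below shows both on a two-row stand-in for reads_df (values from head(reads_df) in the question). First, keeping only the numeric columns also discards X, so the data frame is left with its default row names "1", "2", .... Second, cat() prints only the values of a named vector, never its names. Judging from the getFeatures() output further down this thread, getFeatures() returns numeric values named by feature, whereas getFeatureNames() returns the names themselves.

```r
reads_df <- data.frame(
  X      = factor(c("S:D:A:D", "S:D:A:E")),
  X1_ANS = c(524L, 588L),
  X1_BNS = c(355L, 383L)
)

# Same filtering as in the question: keep only numeric columns
numeric_cols <- unlist(lapply(reads_df, is.numeric))
numeric_reads_df <- reads_df[, numeric_cols]

rownames(numeric_reads_df)   # "1" "2": the feature IDs are gone

# cat() drops names, so a named vector prints as bare numbers
v <- c("S:D:A:D" = 1.5, "S:D:A:E" = -0.5)
cat("values:", v, "\n")      # values: 1.5 -0.5
print(v)                     # print() shows the names
```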

So I'm wondering how to load the data frame so that its contents match what the documentation example gets from:

data(selex)

Greg • 3.0 years ago

I am not sure I fully understood your question...

After saving the table from your second post in a text file named selex.txt, I obtain the expected results by using the contents of the first column as the row names: just add the argument row.names=1 when calling read.table.

> library(ALDEx2)
> 
> reads_df <- read.table(file="selex.txt", row.names=1, header=TRUE, sep="\t", dec=".", as.is=FALSE)
> dim(reads_df)
[1]  3 14
> head(reads_df)
        X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS
A:D:A:D    347    271    396    317    391    260    620     8     1     1     2
A:D:A:E    436    361    461    241    410    387    788    83     6     8    12
A:E:A:D    476    288    378    215    412    430    591   997  3747   662   421
        X2_AS X2_CS X2_DS
A:D:A:D     0     1     3
A:D:A:E     0     3     2
A:E:A:D    37    75  6353
> 
> numeric_cols <- unlist(lapply(reads_df, is.numeric))
> numeric_reads_df <- reads_df[ , numeric_cols]
> dim(numeric_reads_df)
[1]  3 14
> head(numeric_reads_df)
        X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS
A:D:A:D    347    271    396    317    391    260    620     8     1     1     2
A:D:A:E    436    361    461    241    410    387    788    83     6     8    12
A:E:A:D    476    288    378    215    412    430    591   997  3747   662   421
        X2_AS X2_CS X2_DS
A:D:A:D     0     1     3
A:D:A:E     0     3     2
A:E:A:D    37    75  6353
> 
> aldex_clr_df <- aldex.clr(numeric_reads_df)
no conditions provided: forcing denom = 'all'
no conditions provided: forcing conds = 'NA'
operating in serial mode
computing center with all features
> 
> typeof(aldex_clr_df)
[1] "S4"
> class(aldex_clr_df)
[1] "aldex.clr"
attr(,"package")
[1] "ALDEx2"
> 
> getFeatures(aldex_clr_df)
   A:D:A:D    A:D:A:E    A:E:A:D 
-0.2738907  0.1349089  0.1389818 
>
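The same row.names=1 fix can be checked without a file or ALDEx2 by passing the table to read.table() through its text argument. This sketch keeps only three of the 14 sample columns, and it assumes the header line starts with an empty tab-separated field (which would explain the default column name X in the question's original output); adjust the text if your file differs.

```r
# Three data rows from selex.txt (columns 1_ANS, 1_BNS, 1_AS only),
# tab-separated, with an empty first header field (an assumption).
txt <- paste(
  "\t1_ANS\t1_BNS\t1_AS",
  "A:D:A:D\t347\t271\t8",
  "A:D:A:E\t436\t361\t83",
  "A:E:A:D\t476\t288\t997",
  sep = "\n"
)

# row.names = 1 turns the first column into row names, not a data column
reads_df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)

dim(reads_df)          # 3 3
rownames(reads_df)     # "A:D:A:D" "A:D:A:E" "A:E:A:D"
colnames(reads_df)     # "X1_ANS" "X1_BNS" "X1_AS" (check.names adds the X)
```

With the feature IDs in the row names, every remaining column is an integer count, which is the shape the aldex.clr() example expects.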

Guido Hooiveld • 3.0 years ago

Thanks so much for this simple fix. The row.names argument is what was missing from my read.table() call. This code now works as expected:

conditions_vector <- c(rep("NS", 7), rep("S", 7))
reads_df <- read.table(file=opt$reads, header=TRUE, sep="\t", row.names=1, dec=".", as.is=FALSE)
aldex_clr_obj <- aldex.clr(reads_df, conds=conditions_vector, mc.samples=128, denom="all")
...
cat("typeof(aldex_clr_obj): ", typeof(aldex_clr_obj), "\n")
cat("class(aldex_clr_obj): ", class(aldex_clr_obj), "\n")
cat("length(getMonteCarloInstances(aldex_clr_obj)): ", length(getMonteCarloInstances(aldex_clr_obj)), "\n")
cat("getSampleIDs(aldex_clr_obj): ", getSampleIDs(aldex_clr_obj), "\n")
cat("getFeatureNames(aldex_clr_obj): ", getFeatureNames(aldex_clr_obj), "\n")

typeof(aldex_clr_obj):  S4 
class(aldex_clr_obj):  aldex.clr 
length(getMonteCarloInstances(aldex_clr_obj)):  14 
getSampleIDs(aldex_clr_obj):  X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS X2_AS X2_CS X2_DS 
getFeatureNames(aldex_clr_obj):  A:D:A:D A:D:A:E A:E:A:D A:E:A:E A:D:C:D A:D:C:E A:E:C:D A:E:C:E A:D:D:D...
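Because conds is matched to the count columns by position, it may be safer to derive the labels from the column names than to hard-code rep("NS", 7), rep("S", 7). A base-R sketch, assuming (as in this data set) that every non-selected sample name ends in "NS" and every selected one ends in "S" without "NS"; in a real script sample_ids would be colnames(reads_df).

```r
# Sample IDs as reported by getSampleIDs() above
sample_ids <- c("X1_ANS", "X1_BNS", "X1_CNS", "X1_DNS", "X2_ANS", "X2_CNS", "X2_DNS",
                "X1_AS", "X1_BS", "X1_CS", "X1_DS", "X2_AS", "X2_CS", "X2_DS")

# Label each column by its suffix instead of relying on column order
conds <- ifelse(grepl("NS$", sample_ids), "NS", "S")

# Sanity check before calling aldex.clr(reads_df, conds, ...)
stopifnot(length(conds) == length(sample_ids))
table(conds)   # 7 "NS" and 7 "S"
```

For this file the result is identical to c(rep("NS", 7), rep("S", 7)); the suffix-based version simply keeps working if the columns are ever reordered.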

Greg • 3.0 years ago