How to create an aldex.clr object from a "selex" data table
Greg • 0

I'm new to ALDEx2 and have a simple question about how to create the aldex.clr object described here https://www.bioconductor.org/packages/devel/bioc/manuals/ALDEx2/man/ALDEx2.pdf from a data frame. The documentation provides this example:

        # The 'reads' data.frame or
        # RangedSummarizedExperiment object should
        # have row and column names that are unique,
        # and looks like the following:
        #
        #              T1a T1b  T2  T3  N1  N2  Nx
        #   Gene_00001   0   0   2   0   0   1   0
        #   Gene_00002  20   8  12   5  19  26  14
        #   Gene_00003   3   0   2   0   0   0   1
        #   Gene_00004  75  84 241 149 271 257 188
        #   Gene_00005  10  16   4   0   4  10  10
        #   Gene_00006 129 126 451 223 243 149 209
        #       ... many more rows ...
        data(selex)
        #subset for efficiency
        selex <- selex[1201:1600,]
        conds <- c(rep("NS", 7), rep("S", 7))
        x <- aldex.clr(selex, conds, mc.samples=2, denom="all", verbose=FALSE)

In my case, I need to load the selex data like this:

reads_df <- read.table(file="~/selex.txt", header=TRUE, sep="\t", dec=".", as.is=FALSE);

We now have the reads_df:

head(reads_df):
        X X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS
1 S:D:A:D    524    355    443    489    465    509    754     0     0     0
2 S:D:A:E    588    383    564    462    559    564    961     5     5    11
3 S:E:A:D    596    318    542    443    605    459   1022    77    44     8
4 S:E:A:E    535    352    549    514    555    465   1476   718   168    76
5 S:D:C:D    218    104    192    193    177    190    709     0     0     0
6 S:D:C:E    269    180    151    234    281    269    467     1     0     0
  X1_DS X2_AS X2_CS X2_DS
1    13   675     1     4
2   437    10     4     1
3    12     4     2    89
4   459    10    31     5
5     0     1     0     0
6     4     0     0     0

Here are the column types:

sapply(reads_df, class)
 "factor" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"

The first column is the “feature” column, which is a factor rather than integer (it ends up as character once the data frame is converted to a matrix), and so raises an error when attempting to create the aldex.clr object:

conds <- c(rep("NS", 7), rep("S", 7));
aldex_clr_df <- aldex.clr(reads_df, conds=conds, mc.samples=128, denom="all");

Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
Calls: aldex.clr -> aldex.clr -> aldex.clr.function -> apply
Execution halted
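
For context, the error is consistent with how base R coerces mixed-type data frames: apply() converts a data frame with as.matrix() before doing anything else, and a data frame holding any non-numeric column becomes a character matrix. A minimal sketch with made-up counts (the toy_df name and its values are illustrative, not taken from the real file):

```r
# Toy data frame mimicking reads_df: one label column plus integer counts.
# stringsAsFactors = TRUE reproduces the "factor" class reported above.
toy_df <- data.frame(
  X      = c("S:D:A:D", "S:D:A:E"),
  X1_ANS = c(524L, 588L),
  X1_AS  = c(0L, 5L),
  stringsAsFactors = TRUE
)

# as.matrix() must pick a single type for the whole matrix, so the
# presence of the label column turns every cell into a string.
mixed_type <- typeof(as.matrix(toy_df))

# Dropping the label column leaves a purely integer matrix.
counts_type <- typeof(as.matrix(toy_df[, -1]))

print(mixed_type)
print(counts_type)
```

If this is what happens inside aldex.clr(), any arithmetic on the character matrix would fail in the way the traceback shows.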

I'm sure that I must be missing something simple here, but I'm not quite sure what. I so appreciate any help with this.

Thanks!


I have no experience with ALDEx2, but based on what you show regarding expected input, and your input + error, I would say that the rownames of your data (a data.frame?) should be the content of the 1st column (X) (and not 1, 2, 3 etc.). Next you should remove this column X from your data.

Something like:

rownames(reads_df) <- reads_df$X
reads_df <- reads_df[, -1]

Also, please note that colons (:) are not syntactically valid in R names, so you may want to replace them. See for example ?make.names.
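
As an illustration of what ?make.names does with labels like these, here is a small base R sketch. The feature_labels and toy_df names are made up for the example. Whether ALDEx2 itself requires syntactic names is not established here; base R data frames do accept colons in row names.

```r
# Feature labels as they appear in the first column of the data.
feature_labels <- c("S:D:A:D", "S:D:A:E", "S:E:A:D")

# make.names() replaces characters that are invalid in syntactic names with ".".
syntactic <- make.names(feature_labels)
print(syntactic)

# Row names are stored as plain strings, so base R keeps the colons as-is.
toy_df <- data.frame(count = c(524L, 588L, 596L))
rownames(toy_df) <- feature_labels
print(rownames(toy_df))
```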


Thanks for your response on this. Yes, reads_df is a data frame, created with this code:

reads_df <- read.table(file="~/selex.txt", header=TRUE, sep="\t", dec=".", as.is=FALSE);

Here are a few lines of the selex.txt file, with the first line being the header:

        1_ANS   1_BNS   1_CNS   1_DNS   2_ANS   2_CNS   2_DNS   1_AS    1_BS    1_CS    1_DS    2_AS    2_CS    2_DS
A:D:A:D 347     271     396     317     391     260     620     8       1       1       2       0       1       3
A:D:A:E 436     361     461     241     410     387     788     83      6       8       12      0       3       2
A:E:A:D 476     288     378     215     412     430     591     997     3747    662     421     37      75      6353

The values in the first column are character strings - from the aldex.clr perspective, I believe these are the features. So the colons here are simply part of the string.

The row numbers are a result of printing the data frame to show what it contains. The first column in the data frame is actually the X column containing the strings. The call to the function aldex.clr() is failing because it expects integers, and so doesn't handle the character column.

I've tried removing the first column:

# Extract numeric columns.
numeric_cols <- unlist(lapply(reads_df, is.numeric));
numeric_reads_df <- reads_df[ , numeric_cols];
aldex_clr_df <- aldex.clr(numeric_reads_df, mc.samples=opt$num_mc_samples, denom=opt$denom);

The aldex.clr object is built:

typeof(aldex_clr_df):  S4 
class(aldex_clr_df):  aldex.clr

But the features are now incorrect:

features <- getFeatures(aldex_clr_df);
cat("features: ", features, "\n");
features:  1.156718 1.536168 1.722722 1.629469 0.4310803 0.4348671 0.3411319 0.5214256 0.3898375 0.4238301 0.5456295 0.6670449 0.3933128 0.5519186 0.8365554 1.004358 -0.5114089 0.09291021 0.6510461 0.375262

So I'm wondering how to load the data frame in such a way that the contents mimic what is being done in the documentation example:

data(selex)
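
One thing that may be contributing to the output above: cat() prints only the values of a vector and silently drops its names, so the numbers alone do not show whether labels are attached. A base R sketch with invented values (feature_values is a hypothetical stand-in, not real getFeatures() output):

```r
# Hypothetical named numeric vector standing in for per-feature values.
feature_values <- c("A:D:A:D" = -0.27, "A:D:A:E" = 0.13, "A:E:A:D" = 0.14)

# cat() writes the values only; the names never appear in its output.
cat_output <- capture.output(cat("features: ", feature_values, "\n"))
print(cat_output)

# names() retrieves the labels; print() shows both labels and values.
value_names <- names(feature_values)
print(value_names)
print(feature_values)
```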

I am not sure if I fully got your question...

After saving the 4x14 table from your 2nd post in a text file named selex.txt, I am able to obtain the expected results by using the content of the first column as the row names: just add the argument row.names=1 when calling read.table.

> library(ALDEx2)
> 
> reads_df <- read.table(file="selex.txt", row.names=1, header=TRUE, sep="\t", dec=".", as.is=FALSE)
> dim(reads_df)
[1]  3 14
> head(reads_df)
        X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS
A:D:A:D    347    271    396    317    391    260    620     8     1     1     2
A:D:A:E    436    361    461    241    410    387    788    83     6     8    12
A:E:A:D    476    288    378    215    412    430    591   997  3747   662   421
        X2_AS X2_CS X2_DS
A:D:A:D     0     1     3
A:D:A:E     0     3     2
A:E:A:D    37    75  6353
> 
> numeric_cols <- unlist(lapply(reads_df, is.numeric))
> numeric_reads_df <- reads_df[ , numeric_cols]
> dim(reads_df)
[1]  3 14
> head(reads_df)
        X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS
A:D:A:D    347    271    396    317    391    260    620     8     1     1     2
A:D:A:E    436    361    461    241    410    387    788    83     6     8    12
A:E:A:D    476    288    378    215    412    430    591   997  3747   662   421
        X2_AS X2_CS X2_DS
A:D:A:D     0     1     3
A:D:A:E     0     3     2
A:E:A:D    37    75  6353
> 
> aldex_clr_df <- aldex.clr(numeric_reads_df)
no conditions provided: forcing denom = 'all'
no conditions provided: forcing conds = 'NA'
operating in serial mode
computing center with all features
> 
> typeof(aldex_clr_df)
[1] "S4"
> class(aldex_clr_df)
[1] "aldex.clr"
attr(,"package")
[1] "ALDEx2"
> 
> getFeatures(aldex_clr_df)
   A:D:A:D    A:D:A:E    A:E:A:D 
-0.2738907  0.1349089  0.1389818 
> 
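
For readers without the file at hand, the same idea can be tried on an inline string. This is a sketch under assumptions: selex_text is a made-up two-row excerpt, and it assumes the real header starts with an empty first field, which would explain the column named X in the original post. It also shows where the X prefix on the sample names comes from.

```r
# Inline stand-in for selex.txt: the header begins with an empty field,
# and the sample names start with a digit.
selex_text <- "\t1_ANS\t1_AS\nA:D:A:D\t347\t8\nA:D:A:E\t436\t83\n"

# row.names = 1 uses the first column as row names instead of as data.
reads_df <- read.table(text = selex_text, header = TRUE, sep = "\t", row.names = 1)
print(reads_df)

# check.names = TRUE (the default) passes column names through make.names(),
# which adds the "X" prefix; check.names = FALSE keeps the names from the file.
raw_names_df <- read.table(text = selex_text, header = TRUE, sep = "\t",
                           row.names = 1, check.names = FALSE)
print(colnames(raw_names_df))
```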

Thanks so much for this simple fix. The row.names argument is what was missing from my read.table() call. This code now works as expected.

conditions_vector <- c(rep("NS", 7), rep("S", 7));
reads_df <- read.table(file=opt$reads, header=TRUE, sep="\t", row.names=1, dec=".", as.is=FALSE);
aldex_clr_obj <- aldex.clr(reads_df, conds=conditions_vector, mc.samples=128, denom="all");
...
cat("typeof(aldex_clr_obj): ", typeof(aldex_clr_obj), "\n");
cat("class(aldex_clr_obj): ", class(aldex_clr_obj), "\n");
cat("length(getMonteCarloInstances(aldex_clr_obj)): ", length(getMonteCarloInstances(aldex_clr_obj)), "\n");
cat("getSampleIDs(aldex_clr_obj): ", getSampleIDs(aldex_clr_obj), "\n");
cat("getFeatureNames(aldex_clr_obj): ", getFeatureNames(aldex_clr_obj), "\n");
typeof(aldex_clr_obj):  S4 
class(aldex_clr_obj):  aldex.clr 
length(getMonteCarloInstances(aldex_clr_obj)):  14 
getSampleIDs(aldex_clr_obj):  X1_ANS X1_BNS X1_CNS X1_DNS X2_ANS X2_CNS X2_DNS X1_AS X1_BS X1_CS X1_DS X2_AS X2_CS X2_DS 
getFeatureNames(aldex_clr_obj):  A:D:A:D A:D:A:E A:E:A:D A:E:A:E A:D:C:D A:D:C:E A:E:C:D A:E:C:E A:D:D:D...
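
As a possible alternative to hard-coding the conditions vector, it could be derived from the sample IDs. This is only a sketch: it assumes that IDs ending in "NS" belong to one condition and all remaining IDs to the other, which matches the column names shown above but is not confirmed anywhere in this thread. The derived_conds name is made up for the example.

```r
# Sample IDs as reported by getSampleIDs() above.
sample_ids <- c("X1_ANS", "X1_BNS", "X1_CNS", "X1_DNS", "X2_ANS", "X2_CNS", "X2_DNS",
                "X1_AS", "X1_BS", "X1_CS", "X1_DS", "X2_AS", "X2_CS", "X2_DS")

# Assumption: an ID ending in "NS" marks condition "NS"; everything else is "S".
derived_conds <- ifelse(grepl("NS$", sample_ids), "NS", "S")

# The hard-coded vector from the post, for comparison.
conditions_vector <- c(rep("NS", 7), rep("S", 7))
print(identical(derived_conds, conditions_vector))
```

Deriving the vector this way keeps the conditions aligned with the columns even if the column order in the file changes.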