Definitions

The arkhe package provides a set of S4 classes for archaeological data matrices. Each class represents a special type of matrix:

  • Numeric matrix:
    • CountMatrix represents absolute frequency data,
    • AbundanceMatrix represents relative frequency data,
    • OccurrenceMatrix represents a co-occurrence matrix,
    • SimilarityMatrix represents a (dis)similarity matrix,
  • Logical matrix:
    • IncidenceMatrix represents presence/absence data,
    • StratigraphicMatrix represents stratigraphic relationships.

arkhe assumes that you keep your data tidy: each variable (taxon/type) must be saved in its own column and each observation (assemblage/sample) in its own row. Missing values are not allowed.
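The tidy-layout and no-missing-values requirements can be checked before any matrix is built. The function below, `check_count_data()`, is a hypothetical helper (not part of arkhe); it is a minimal sketch assuming that count data must be complete, non-negative whole numbers laid out with one row per assemblage and one column per type.

```r
# Hypothetical helper (not part of arkhe): validate a samples x types table
# of counts before converting it to a matrix.
check_count_data <- function(x) {
  x <- as.matrix(x)
  if (!is.numeric(x)) stop("counts must be numeric")
  if (anyNA(x)) stop("missing values are not allowed")
  if (any(x < 0)) stop("counts must be non-negative")
  if (any(x != round(x))) stop("counts must be whole numbers")
  invisible(TRUE)
}

# Tidy layout: one row per assemblage, one column per type
counts <- data.frame(typeA = c(2, 0, 5), typeB = c(1, 3, 0),
                     row.names = c("site1", "site2", "site3"))
check_count_data(counts)

# A table with a missing value is rejected
res <- tryCatch(check_count_data(data.frame(a = c(1, NA))),
                error = function(e) conditionMessage(e))
```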

The internal structure of the S4 classes implemented in arkhe is depicted in the following UML class diagram.

UML class diagram of the S4 classes structure.

Numeric matrix

Absolute frequency matrix (CountMatrix)

We denote the \(m \times p\) count matrix by \(A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} && a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} && a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} && \forall a_{ij} \in \mathbb{N} \end{align}\]
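In base R, the row totals \(a_{i \cdot}\), column totals \(a_{\cdot j}\) and grand total \(a_{\cdot \cdot}\) map directly onto `rowSums()`, `colSums()` and `sum()`. A minimal sketch on a small count matrix whose values are made up for illustration:

```r
# A 3 x 4 count matrix (m = 3 assemblages, p = 4 types); made-up values
A <- matrix(c(2L, 0L, 5L, 1L,
              3L, 4L, 0L, 2L,
              1L, 1L, 1L, 7L),
            nrow = 3, byrow = TRUE)

row_totals <- rowSums(A)   # a_{i.}: total count of each assemblage
col_totals <- colSums(A)   # a_{.j}: total count of each type
grand_total <- sum(A)      # a_{..}: total count over the whole matrix

# Both margins add up to the grand total
stopifnot(sum(row_totals) == grand_total, sum(col_totals) == grand_total)
```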

Relative frequency matrix (AbundanceMatrix)

A frequency matrix represents relative abundances.

We denote the \(m \times p\) frequency matrix by \(B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 && b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} && b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} && \forall b_{ij} \in \left[ 0,1 \right] \end{align}\]
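Assuming relative frequencies are computed row-wise from a count matrix, each element is \(b_{ij} = a_{ij} / a_{i \cdot}\), which is why every row of \(B\) sums to 1. A base R sketch of that division, with made-up counts:

```r
A <- matrix(c(2, 0, 5, 1,
              3, 4, 0, 2),
            nrow = 2, byrow = TRUE)

# Divide each row by its total: R recycles the length-m vector down the
# columns, so element (i, j) is divided by rowSums(A)[i].
# Note: a row whose total is zero would give NaN values.
B <- A / rowSums(A)

rowSums(B)  # every row sums to 1
```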

Co-occurrence matrix (OccurrenceMatrix)

A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal that counts, for each pair of taxa, the number of samples in which both taxa occur together.

The \(p \times p\) co-occurrence matrix \(D = \left[ d_{i,j} \right] ~\forall i,j \in \left[ 1,p \right]\) is defined over an \(m \times p\) count matrix \(A = \left[ a_{x,y} \right] ~\forall x \in \left[ 1,m \right], y \in \left[ 1,p \right]\) as:

\[ d_{i,j} = \sum_{x = 1}^{m} \mathbb{1}\left( a_{x,i} > 0 \wedge a_{x,j} > 0 \right) ~\forall i \neq j, \qquad d_{i,i} = 0 \]

where \(\mathbb{1}(\cdot)\) equals 1 when its condition holds and 0 otherwise,

with row and column sums:

\[\begin{align} d_{i \cdot} = \sum_{j \geqslant i}^{p} d_{ij} && d_{\cdot j} = \sum_{i \leqslant j}^{p} d_{ij} && d_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} d_{ij} && \forall d_{ij} \in \mathbb{N} \end{align}\]
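Counting the samples in which two taxa are both present amounts to the cross-product of the 0/1 presence matrix with itself, \(C^\top C\), whose diagonal (the number of samples containing each taxon) is then set to zero. The sketch below is a base R illustration of the definition, not arkhe's own implementation of `as_occurrence()`; it assumes a taxon is present in a sample when its count is greater than zero.

```r
A <- matrix(c(2, 0, 5, 1,
              3, 4, 0, 2,
              0, 1, 1, 7),
            nrow = 3, byrow = TRUE)

# Presence/absence coded as 0/1 numbers
C <- (A > 0) * 1

# d_ij = number of samples in which taxa i and j are both present
D <- crossprod(C)  # t(C) %*% C, a p x p symmetric matrix
diag(D) <- 0       # no self co-occurrence

# Totals over the upper triangle (j >= i), as in the definition above
d_row <- rowSums(D * upper.tri(D, diag = TRUE))
d_total <- sum(D[upper.tri(D, diag = TRUE)])
```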

Logical matrix

Incidence matrix (IncidenceMatrix)

We denote the \(m \times p\) incidence matrix by \(C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} && c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} && c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} && \forall c_{ij} \in \lbrace 0,1 \rbrace \end{align}\]
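Converting counts to presence/absence keeps only whether \(a_{ij} > 0\). On the resulting logical matrix, \(c_{i \cdot}\) is the number of taxa present in sample \(i\) and \(c_{\cdot j}\) the number of samples in which taxon \(j\) occurs; `rowSums()` and `colSums()` treat `TRUE` as 1. A base R sketch with made-up counts:

```r
A <- matrix(c(2, 0, 5, 1,
              3, 4, 0, 2,
              0, 1, 1, 7),
            nrow = 3, byrow = TRUE)

C <- A > 0                 # logical presence/absence matrix
richness <- rowSums(C)     # c_{i.}: number of taxa present in each sample
occurrences <- colSums(C)  # c_{.j}: number of samples containing each taxon
total <- sum(C)            # c_{..}: number of non-zero cells
```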

Usage

Many familiar methods and group generic functions are available for all *Matrix classes (e.g. length, dim, rowSums, rowMeans, sum, any, all). In addition, any function that first calls as.matrix on its main argument (e.g. apply) should work.
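This kind of inheritance is what an S4 class gets when it extends R's basic `matrix` type: the object carries an ordinary matrix as its data part. The toy class below, `ToyCountMatrix`, is hypothetical and far simpler than arkhe's own definitions (which are not reproduced here); it is a sketch, assuming the *Matrix classes extend `matrix`, that shows a class declared with `contains = "matrix"` supports `dim()`, `rowSums()` and `apply()` without any extra methods.

```r
library(methods)

# Hypothetical toy class (not arkhe's definition): a plain matrix subclass
setClass("ToyCountMatrix", contains = "matrix")

x <- new("ToyCountMatrix", matrix(1:6, nrow = 2))

is(x, "matrix")   # TRUE: the object extends the basic matrix type
dim(x)            # inherited from the matrix data part
rowSums(x)        # works on the underlying data
apply(x, 2, max)  # apply() coerces its argument with as.matrix() first
```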

# Load packages
library(arkhe)

Create

Objects of these classes are created the same way as a base matrix:

set.seed(12345)
## Create a count data matrix
## Data are rounded to zero decimal places, then coerced with as.integer
CountMatrix(data = sample(0:10, 100, TRUE),
            nrow = 10, ncol = 10)
#> <CountMatrix: 10 x 10>
#>       col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
#> row1     2    6    2    3    9    7    9    3    3     6
#> row2     9    9    8    7    9   10    6    8    9     0
#> row3     7    0    3   10    2    3    6   10    3     2
#> row4     9    7    9    5    2    1    4    0    8     1
#> row5    10    6    6    8    2    2    6    2    1     4
#> row6     7    5    1    4    0    5    9    9    7     9
#> row7     1    0    3    2    9    2    7    6    9     5
#> row8     5    3   10    0    7    6    2    9    0     6
#> row9    10    7    8    0   10    9    4    9    8     8
#> row10    5    9    8    4    8    6   10    6    5     9

## Create an incidence (presence/absence) matrix
## Data are coerced to logical as by as.logical
IncidenceMatrix(data = sample(0:1, 100, TRUE),
                nrow = 10, ncol = 10)
#> <IncidenceMatrix: 10 x 10>
#>        col1  col2  col3  col4  col5  col6  col7  col8  col9 col10
#> row1   TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
#> row2   TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
#> row3   TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
#> row4   TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
#> row5  FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
#> row6   TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
#> row7   TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
#> row8  FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE
#> row9  FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
#> row10  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Note that an AbundanceMatrix can only be created by coercion (see below).

Coerce

arkhe relies on coercion methods (with validation) for data type conversions:

## Create a base matrix of counts
A0 <- matrix(data = sample(0:10, 100, TRUE), nrow = 10, ncol = 10)

## Coerce to absolute frequencies
A1 <- as_count(A0)

## Coerce to relative frequencies
B <- as_abundance(A1)

## Row sums are stored internally before coercing to a frequency matrix
## (use get_totals() to retrieve these values).
## This allows the source data to be restored:
A2 <- as_count(B)
all(A1 == A2)
#> [1] TRUE

## Coerce to presence/absence
C <- as_incidence(A1)

## Coerce to a co-occurrence matrix
D <- as_occurrence(A1)
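Restoring counts from relative frequencies only works because the row totals are kept alongside the proportions. The base R sketch below mimics that idea by keeping the totals in a separate vector; it illustrates the principle and is not how arkhe stores the values internally (use `get_totals()` for that).

```r
A1 <- matrix(c(2L, 0L, 5L, 1L,
               3L, 4L, 0L, 2L,
               0L, 1L, 1L, 7L),
             nrow = 3, byrow = TRUE)

# Forward: relative frequencies, keeping the row totals aside
totals <- rowSums(A1)
B <- A1 / totals

# Backward: multiply each row by its total, round, and store as integers
A2 <- round(B * totals)
storage.mode(A2) <- "integer"

all(A1 == A2)
```

A row whose total is zero cannot be converted this way: its relative frequencies are undefined (`NaN`), so the original counts could not be recovered from them.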