The arkhe package provides a set of S4 classes for archaeological data matrices that extend the basic matrix data type. These new classes represent different special types of matrix.

  • Numeric matrix:
    • CountMatrix represents absolute frequency data,
    • AbundanceMatrix represents relative frequency data,
    • OccurrenceMatrix represents a co-occurrence matrix,
    • SimilarityMatrix represents a (dis)similarity matrix,
  • Logical matrix:
    • IncidenceMatrix represents presence/absence data,
    • StratigraphicMatrix represents stratigraphic relationships.

It assumes that you keep your data tidy: each variable (taxon/type) must be saved in its own column and each observation (assemblage/sample) must be saved in its own row. Note that missing values are not allowed.

The internal structure of S4 classes implemented in arkhe is depicted in the UML class diagram in the following figure.

UML class diagram of the S4 classes structure.

UML class diagram of the S4 classes structure.

Numeric matrix

Absolute frequency matrix (CountMatrix)

We denote the \(m \times p\) count matrix by \(A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} && a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} && a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} && \forall a_{ij} \in \mathbb{N} \end{align}\]

Relative frequency matrix (AbundanceMatrix)

A frequency matrix represents relative abundances.

We denote the \(m \times p\) frequency matrix by \(B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 && b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} && b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} && \forall b_{ij} \in \left[ 0,1 \right] \end{align}\]

Co-occurrence matrix (OccurrenceMatrix)

A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal, which works out how many times (expressed in percent) each pairs of taxa occur together in at least one sample.

The \(p \times p\) co-occurrence matrix \(D = \left[ d_{i,j} \right] ~\forall i,j \in \left[ 1,p \right]\) is defined over an \(m \times p\) abundance matrix \(A = \left[ a_{x,y} \right] ~\forall x \in \left[ 1,m \right], y \in \left[ 1,p \right]\) as:

\[ d_{i,j} = \sum_{x = 1}^{m} \bigcap_{y = i}^{j} a_{xy} \]

with row and column sums:

\[\begin{align} d_{i \cdot} = \sum_{j \geqslant i}^{p} d_{ij} && d_{\cdot j} = \sum_{i \leqslant j}^{p} d_{ij} && d_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} d_{ij} && \forall d_{ij} \in \mathbb{N} \end{align}\]

Logical matrix

Incidence matrix (IncidenceMatrix)

We denote the \(m \times p\) incidence matrix by \(C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} && c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} && c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} && \forall c_{ij} \in \lbrace 0,1 \rbrace \end{align}\]



These new classes are of simple use, on the same way as the base matrix:

Note that an AbundanceMatrix can only be created by coercion (see below).