damuta.utils module

alp_B(data, B)

Compute the sum of multinomial log-likelihoods for a dataset and probability matrix.

Parameters:
  • data (np.ndarray) – Observed counts, shape (n_samples, n_categories).

  • B (np.ndarray) – Probability matrix, shape (n_samples, n_categories).

Returns:

Total log-likelihood for the dataset.

Return type:

float
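The following is a minimal sketch of the quantity alp_B describes. It assumes each row of B sums to 1 and that the per-sample term is the full multinomial log-probability, including the multinomial coefficient. The name `alp_B_sketch` is hypothetical, and this is not the module's own code.

```python
import numpy as np
from scipy.stats import multinomial


def alp_B_sketch(data: np.ndarray, B: np.ndarray) -> float:
    """Hypothetical re-implementation: total multinomial log-likelihood."""
    data = np.asarray(data)
    n = data.sum(axis=1)  # total count in each sample
    # logpmf broadcasts over rows, giving one log-likelihood per sample
    per_sample = multinomial.logpmf(data, n, B)
    return float(per_sample.sum())


data = np.array([[1, 1], [2, 0]])
B = np.array([[0.5, 0.5], [0.5, 0.5]])
print(alp_B_sketch(data, B))  # log(0.5) + log(0.25) = log(0.125)
```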

alr(x, e=1e-12)

Compute the additive log-ratio (ALR) transformation for compositional data.

Parameters:
  • x (np.ndarray) – Input array, shape (n, d).

  • e (float, optional) – Small value added for numerical stability (default: 1e-12).

Returns:

ALR-transformed array, shape (n, d-1).

Return type:

np.ndarray
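Here is a sketch of the ALR transform. It assumes the last component is the reference part, a choice this reference does not state. The ratios are unchanged by rescaling a row, so raw counts and proportions give the same result up to the effect of e. `alr_sketch` is a hypothetical name.

```python
import numpy as np


def alr_sketch(x: np.ndarray, e: float = 1e-12) -> np.ndarray:
    """ALR transform using the last column as the reference (assumed)."""
    x = np.asarray(x, dtype=float) + e  # e keeps log() finite for zero components
    # log(x_i / x_D) for i = 1..D-1, so the output has one fewer column
    return np.log(x[:, :-1] / x[:, -1:])


x = np.array([[0.2, 0.3, 0.5]])
y = alr_sketch(x)  # approximately [[log(0.4), log(0.6)]]
```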

alr_inv(y)

Inverse additive log-ratio (ALR) transformation.

Parameters:

y (np.ndarray) – ALR-transformed array, shape (n, d-1).

Returns:

Reconstructed compositional data, shape (n, d).

Return type:

np.ndarray
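The inverse can be sketched under the same assumption that the last component was the reference. It appends a zero log-ratio for that component, exponentiates, and renormalizes each row onto the simplex. The function name is hypothetical. Subtracting the row maximum before exponentiating avoids overflow and does not change the result, because the normalization cancels it.

```python
import numpy as np


def alr_inv_sketch(y: np.ndarray) -> np.ndarray:
    """Inverse ALR, assuming the last component was the reference."""
    y = np.asarray(y, dtype=float)
    # Re-attach the reference coordinate, whose log-ratio to itself is 0
    z = np.hstack([y, np.zeros((y.shape[0], 1))])
    z = np.exp(z - z.max(axis=1, keepdims=True))  # stabilized exponent
    return z / z.sum(axis=1, keepdims=True)  # close rows back onto the simplex


x = np.array([[0.2, 0.3, 0.5]])
y = np.log(x[:, :-1] / x[:, -1:])  # forward ALR of x
x_rec = alr_inv_sketch(y)  # recovers x because its rows already sum to 1
```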

dirichlet(node_name, a, shape, scale=1, testval=None, observed=None)

Create a reparameterized Dirichlet distribution using Gamma variables for use in PyMC3 models.

Parameters:
  • node_name (str) – Name for the node in the model.

  • a (array-like) – Concentration parameters for the Dirichlet distribution.

  • shape (tuple) – Shape of the resulting variable.

  • scale (float, optional) – Scale parameter for the Gamma distribution (default: 1).

  • testval (array-like, optional) – Test value for the variable.

  • observed (array-like, optional) – Observed values for the variable.

Returns:

A deterministic node representing the Dirichlet variable.

Return type:

pm.Deterministic
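The reparameterization rests on a standard identity: independent Gamma(a_i, scale) draws divided by their sum are Dirichlet(a) distributed. The plain NumPy illustration below shows that identity. It is not the PyMC3 node itself; in a model, the Gamma variables would be random variables and the normalization would be the pm.Deterministic noted above. Because the common scale cancels in the ratio, the scale argument does not change the distribution of the normalized result. `dirichlet_via_gamma` is a hypothetical name.

```python
import numpy as np


def dirichlet_via_gamma(a, size, scale=1.0, rng=None):
    """Draw Dirichlet(a) samples by normalizing independent Gamma(a_i, scale) draws."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    # One Gamma draw per (sample, component); the shape array broadcasts over rows
    g = rng.gamma(shape=a, scale=scale, size=(size, a.size))
    # The shared scale cancels here, so the result is Dirichlet(a) for any scale > 0
    return g / g.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
a = np.array([2.0, 3.0, 5.0])
draws = dirichlet_via_gamma(a, size=100_000, scale=3.0, rng=rng)
print(draws.mean(axis=0))  # close to a / a.sum() = [0.2, 0.3, 0.5]
```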

flatten_eta(eta)

Flatten a 3D eta array of shape (p, k, m) into a 2D array of shape (k, p * m) for compatibility.

Parameters:

eta (np.ndarray) – Misrepair signature array, shape (p, k, m).

Returns:

Flattened array, shape (k, 6).

Return type:

np.ndarray
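Below is a sketch of the reshape. It assumes p = 2 and m = 3, and that each output row lists the m values for p = 0 followed by those for p = 1. The exact column order used by the module is an assumption here, and `flatten_eta_sketch` is a hypothetical name. The transpose is what keeps each signature's values together. A plain reshape of the (p, k, m) array would mix entries from different signatures.

```python
import numpy as np


def flatten_eta_sketch(eta: np.ndarray) -> np.ndarray:
    """Reshape eta from (p, k, m) to (k, p * m), p-major within each row (assumed)."""
    eta = np.asarray(eta)
    p, k, m = eta.shape
    # Move the signature axis k to the front, then merge (p, m) into one axis
    return eta.transpose(1, 0, 2).reshape(k, p * m)


eta = np.arange(2 * 4 * 3).reshape(2, 4, 3)  # p=2, k=4, m=3
flat = flatten_eta_sketch(eta)
print(flat[0])  # eta[0, 0, :] followed by eta[1, 0, :]
```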

get_tau(phi, eta)

Compute the full 96-channel signature matrix from damage (phi) and misrepair (eta) signatures.

Parameters:
  • phi (np.ndarray) – Damage signatures, shape (n_damage, 32).

  • eta (np.ndarray) – Misrepair signatures, shape (n_misrepair, 2, 3).

Returns:

Combined signature matrix, shape (n_damage * n_misrepair, 96).

Return type:

np.ndarray
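The combination can be sketched as an outer product over shared reference bases. The sketch rests on several assumptions that this reference does not confirm:

- The 32 phi columns are 16 C-centred contexts followed by 16 T-centred contexts.
- eta[k, r, :] holds the three substitution probabilities for reference base r.
- The 96 output columns run (reference base, substitution, context), similar to the COSMIC layout.
- Output rows are ordered damage-major, so row j * n_misrepair + k.

`get_tau_sketch` is a hypothetical name.

```python
import numpy as np


def get_tau_sketch(phi: np.ndarray, eta: np.ndarray) -> np.ndarray:
    """Combine damage (J, 32) and misrepair (K, 2, 3) signatures into (J * K, 96)."""
    J, K = phi.shape[0], eta.shape[0]
    phi_ctx = phi.reshape(J, 2, 16)  # (damage sig, ref base C/T, context): assumed layout
    # tau[j, k, r, s, c] = phi[j, r, c] * eta[k, r, s]
    tau = np.einsum("jrc,krs->jkrsc", phi_ctx, eta)
    return tau.reshape(J * K, 96)


rng = np.random.default_rng(1)
phi = rng.dirichlet(np.ones(32), size=3)  # 3 damage signatures
eta = rng.dirichlet(np.ones(3), size=(4, 2))  # 4 misrepair signatures, shape (4, 2, 3)
tau = get_tau_sketch(phi, eta)  # shape (12, 96); each row sums to 1
```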

kmeans_alr(data, nsig, rng=Generator(PCG64))

Perform k-means clustering in ALR space and return cluster centers in the original space.

Parameters:
  • data (np.ndarray) – Input data, shape (n_samples, n_features).

  • nsig (int) – Number of clusters.

  • rng (np.random.Generator, optional) – Random number generator (default: a single np.random.default_rng() instance created when the function is defined and shared across calls).

Returns:

Cluster centers in the original space, shape (nsig, n_features).

Return type:

np.ndarray
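Here is a self-contained sketch of the approach. It maps rows to ALR space, runs plain Lloyd's k-means there, and maps the centers back with the inverse ALR. It assumes the last component is the ALR reference and uses a hand-written Lloyd loop rather than whatever clustering routine the module actually calls. All helper names are hypothetical. The rng=None default gives each call a fresh generator unless one is passed in. The documented signature instead shares one default generator across calls.

```python
import numpy as np


def _alr(x, e=1e-12):
    x = np.asarray(x, dtype=float) + e
    return np.log(x[:, :-1] / x[:, -1:])  # last column as reference (assumed)


def _alr_inv(y):
    z = np.hstack([y, np.zeros((y.shape[0], 1))])
    z = np.exp(z - z.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def kmeans_alr_sketch(data, nsig, rng=None, n_iter=100):
    """Lloyd's k-means in ALR space; returns centers on the simplex."""
    rng = np.random.default_rng() if rng is None else rng
    y = _alr(data)  # ALR ratios are unchanged by rescaling a row
    centers = y[rng.choice(len(y), size=nsig, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance)
        d = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster empties out
        new = np.array([y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(nsig)])
        if np.allclose(new, centers):
            break
        centers = new
    return _alr_inv(centers)


rng = np.random.default_rng(0)
a = rng.dirichlet([50, 5, 5], size=20)  # cluster dominated by component 0
b = rng.dirichlet([5, 5, 50], size=20)  # cluster dominated by component 2
centers = kmeans_alr_sketch(np.vstack([a, b]), 2, rng=rng)
```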

lap_B(data, Bs)

Compute the log average posterior (LAP) for a dataset and a set of probability matrices.

Parameters:
  • data (np.ndarray) – Observed counts, shape (n_samples, n_categories).

  • Bs (np.ndarray) – Array of probability matrices, shape (n_draws, n_samples, n_categories).

Returns:

Log average posterior value.

Return type:

float
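One plausible reading of this quantity is a log-mean-exp over draws. The sketch computes the dataset's total multinomial log-likelihood under each draw, then takes the log of the average likelihood across draws. It does this stably with logsumexp. Whether the module averages at the dataset level like this, or per sample, is an assumption. `lap_B_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.special import gammaln, logsumexp, xlogy


def lap_B_sketch(data, Bs):
    """log( mean over draws of exp(total multinomial log-likelihood) ) (assumed definition)."""
    data = np.asarray(data, dtype=float)
    n = data.sum(axis=1)
    log_coef = gammaln(n + 1) - gammaln(data + 1).sum(axis=1)  # (n_samples,)
    # xlogy broadcasts data (n_samples, k) against Bs (n_draws, n_samples, k)
    ll = (log_coef + xlogy(data, Bs).sum(axis=-1)).sum(axis=-1)  # (n_draws,)
    # log(1/D * sum_d exp(ll_d)), computed without overflow
    return float(logsumexp(ll) - np.log(len(ll)))


data = np.array([[1, 1]])
Bs = np.array([[[0.5, 0.5]], [[0.9, 0.1]]])  # 2 draws, 1 sample, 2 categories
lap = lap_B_sketch(data, Bs)  # log((0.5 + 0.18) / 2)
```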

load_checkpoint(fn)

load_dataset(dataset_sel, counts_fp=None, annotation_fp=None, annotation_subset=None, seed=None, data_seed=None, sig_defs_fp=None, sim_S=None, sim_N=None, sim_I=None, sim_tau_hyperprior=None, sim_J=None, sim_K=None, sim_alpha_bias=None, sim_psi_bias=None, sim_gamma_bias=None, sim_beta_bias=None)

load_datasets(dataset_args)

marginalize_for_eta(sigs, normalize=True)

Compute misrepair signatures (eta) by marginalizing over trinucleotide context classes.

Parameters:
  • sigs (np.ndarray) – Signature matrix, shape (n_signatures, 96).

  • normalize (bool, optional) – Whether to normalize the output so each row sums to 1 (default: True).

Returns:

Misrepair signatures, shape (n_signatures, 6).

Return type:

np.ndarray
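Here is a sketch of the marginalization. It assumes the 96 columns follow the COSMIC layout of six substitution blocks (C>A, C>G, C>T, T>A, T>C, T>G), each covering 16 trinucleotide contexts. Summing within each block then yields the 6 misrepair classes. The column order and the name `marginalize_for_eta_sketch` are assumptions.

```python
import numpy as np


def marginalize_for_eta_sketch(sigs, normalize=True):
    """Sum each 16-context block of a (n, 96) matrix into 6 substitution classes."""
    sigs = np.asarray(sigs, dtype=float)
    eta = sigs.reshape(sigs.shape[0], 6, 16).sum(axis=2)  # (n, 6)
    if normalize:
        eta = eta / eta.sum(axis=1, keepdims=True)  # each row sums to 1
    return eta


sigs = np.full((2, 96), 1 / 96)
eta = marginalize_for_eta_sketch(sigs)  # every entry is 1/6
point = np.zeros((1, 96))
point[0, 20] = 2.0  # column 20 falls in the second block (C>G under the assumed order)
```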

marginalize_for_phi(sigs)

Compute damage signatures (phi) by marginalizing over misrepair classes.

Parameters:

sigs (np.ndarray) – Signature matrix, shape (n_signatures, 96).

Returns:

Damage signatures, shape (n_signatures, 32).

Return type:

np.ndarray
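The complementary marginalization can be sketched under the same assumed COSMIC column order. The sketch sums over the three substitutions that share a reference base. This leaves 16 C-centred contexts followed by 16 T-centred contexts. The output order and the name `marginalize_for_phi_sketch` are assumptions.

```python
import numpy as np


def marginalize_for_phi_sketch(sigs):
    """Sum over substitution types to get (n, 32) context signatures."""
    sigs = np.asarray(sigs, dtype=float)
    n = sigs.shape[0]
    # (n, ref base C/T, substitution, context) -> sum out the substitution axis
    return sigs.reshape(n, 2, 3, 16).sum(axis=2).reshape(n, 32)


sigs = np.full((2, 96), 1 / 96)
phi = marginalize_for_phi_sketch(sigs)  # every entry is 3/96 = 1/32
point = np.zeros((1, 96))
point[0, 20] = 2.0  # C reference, second substitution, context 4
```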

mult_ll(x, p)

Compute the multinomial log-likelihood for observed counts and probabilities.

Parameters:
  • x (np.ndarray) – Observed counts, shape (n_samples, n_categories).

  • p (np.ndarray) – Probabilities, shape (n_samples, n_categories).

Returns:

Log-likelihood value for each sample, shape (n_samples,).

Return type:

np.ndarray
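Below is a per-sample sketch using the standard multinomial log-probability, log n! - Σ log x_i! + Σ x_i log p_i. Whether the module keeps the multinomial coefficient is an assumption. The coefficient does not depend on p, so it shifts absolute values but not comparisons between probability matrices. scipy.special.xlogy returns 0 when x_i = 0, so zero counts in zero-probability categories contribute nothing. `mult_ll_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.special import gammaln, xlogy


def mult_ll_sketch(x, p):
    """Per-sample multinomial log-likelihood, shape (n_samples,)."""
    x = np.asarray(x, dtype=float)
    n = x.sum(axis=1)
    # log n! - sum_i log x_i! + sum_i x_i log p_i
    return gammaln(n + 1) - gammaln(x + 1).sum(axis=1) + xlogy(x, p).sum(axis=1)


x = np.array([[1, 1], [2, 0]])
p = np.array([[0.5, 0.5], [1.0, 0.0]])
ll = mult_ll_sketch(x, p)  # [log(0.5), 0.0]
```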

save_checkpoint(fp, model, trace, dataset_args, model_args, pymc3_args, run_id)