damuta.utils module
- alp_B(data, B)
Compute the sum of multinomial log-likelihoods for a dataset and probability matrix.
- Parameters:
data (np.ndarray) – Observed counts, shape (n_samples, n_categories).
B (np.ndarray) – Probability matrix, shape (n_samples, n_categories).
- Returns:
Total log-likelihood for the dataset.
- Return type:
float
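As a worked form of the quantity described above, and assuming the standard multinomial probability mass function with its coefficient term included (whether the library keeps that term is an assumption here), the total over samples i and categories c can be written as:

```latex
\mathrm{alp\_B}(X, B) = \sum_{i=1}^{n} \left[ \log\Gamma(N_i + 1) - \sum_{c} \log\Gamma(x_{ic} + 1) + \sum_{c} x_{ic} \log B_{ic} \right],
\qquad N_i = \sum_{c} x_{ic}
```

Here x_ic is an entry of data and B_ic the matching entry of B. The symbols N_i, x_ic and B_ic are notation introduced for this note only.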
- alr(x, e=1e-12)
Compute the additive log-ratio (ALR) transformation for compositional data.
- Parameters:
x (np.ndarray) – Input array, shape (n, d).
e (float, optional) – Small constant added to the input so that the logarithm is never taken of zero (default: 1e-12).
- Returns:
ALR-transformed array, shape (n, d-1).
- Return type:
np.ndarray
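A minimal NumPy sketch of the transformation, not the library's own code. It assumes the last column is the reference component and that e is added to every entry before the logarithm; `alr_sketch` is a hypothetical name.

```python
import numpy as np

def alr_sketch(x, e=1e-12):
    """Hypothetical ALR: log-ratio of each component to the last one."""
    # Assumption: e is added to every entry before taking logs.
    log_x = np.log(np.asarray(x, dtype=float) + e)
    # Assumption: the last column is the reference component.
    return log_x[:, :-1] - log_x[:, -1:]

x = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])
y = alr_sketch(x)  # shape (2, 2): one column fewer than the input
```

Each output row holds d-1 unconstrained real values, which is why the result has shape (n, d-1).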
- alr_inv(y)
Inverse additive log-ratio (ALR) transformation.
- Parameters:
y (np.ndarray) – ALR-transformed array, shape (n, d-1).
- Returns:
Reconstructed compositional data, shape (n, d).
- Return type:
np.ndarray
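The inverse can be sketched as a softmax over the ALR coordinates with a zero appended for the reference component. This is an illustration under the assumption that the reference is the last column; `alr_inv_sketch` is a hypothetical name.

```python
import numpy as np

def alr_inv_sketch(y):
    """Hypothetical inverse ALR: append a zero log-ratio, then normalize."""
    y = np.asarray(y, dtype=float)
    # The reference component has log-ratio 0 with itself (assumed last).
    z = np.hstack([y, np.zeros((y.shape[0], 1))])
    # Subtract the row maximum before exponentiating, for stability.
    expz = np.exp(z - z.max(axis=1, keepdims=True))
    return expz / expz.sum(axis=1, keepdims=True)

x = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])
y = np.log(x[:, :-1]) - np.log(x[:, -1:])  # forward ALR, computed inline
x_back = alr_inv_sketch(y)                 # recovers x up to rounding
```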
- dirichlet(node_name, a, shape, scale=1, testval=None, observed=None)
Create a reparameterized Dirichlet distribution using Gamma variables for use in PyMC3 models.
- Parameters:
node_name (str) – Name for the node in the model.
a (array-like) – Concentration parameters for the Dirichlet distribution.
shape (tuple) – Shape of the resulting variable.
scale (float, optional) – Scale parameter for the Gamma distribution (default: 1).
testval (array-like, optional) – Test value for the variable.
observed (array-like, optional) – Observed values for the variable.
- Returns:
A deterministic node representing the Dirichlet variable.
- Return type:
pm.Deterministic
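The Gamma-based construction mentioned above rests on a standard fact: independent Gamma(a_i, scale) draws, divided by their sum, follow a Dirichlet(a) distribution. The NumPy sketch below illustrates that fact only. It does not use PyMC3 and is not the function's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([2.0, 3.0, 5.0])  # illustrative concentration parameters

# Independent Gamma draws, one column per concentration parameter.
g = rng.gamma(shape=a, scale=1.0, size=(20000, 3))

# Normalizing each row gives points on the simplex, distributed Dirichlet(a).
d = g / g.sum(axis=1, keepdims=True)

empirical_mean = d.mean(axis=0)   # close to a / a.sum()
expected_mean = a / a.sum()
```

The common scale cancels in the normalization, so it changes the Gamma variables but not the distribution of the normalized result.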
- flatten_eta(eta)
Flatten a 3D eta array of shape (p, k, m) into a 2D array of shape (k, c), where c = p * m.
- Parameters:
eta (np.ndarray) – Misrepair signature array, shape (p, k, m).
- Returns:
Flattened array, shape (k, 6).
- Return type:
np.ndarray
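One way to get from (p, k, m) to (k, p * m) is to move the k axis to the front and merge the remaining two axes. The sketch below assumes that column ordering (all m values for the first p, then for the second); `flatten_eta_sketch` is a hypothetical name.

```python
import numpy as np

def flatten_eta_sketch(eta):
    """Hypothetical flatten: (p, k, m) -> (k, p * m)."""
    eta = np.asarray(eta)
    p, k, m = eta.shape
    # Bring the signature axis k to the front, then merge p and m.
    return np.moveaxis(eta, 0, 1).reshape(k, p * m)

eta = np.arange(2 * 4 * 3).reshape(2, 4, 3)  # p=2, k=4, m=3
flat = flatten_eta_sketch(eta)               # shape (4, 6)
```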
- get_tau(phi, eta)
Compute the full 96-channel signature matrix from damage (phi) and misrepair (eta) signatures.
- Parameters:
phi (np.ndarray) – Damage signatures, shape (n_damage, 32).
eta (np.ndarray) – Misrepair signatures, shape (n_misrepair, 2, 3).
- Returns:
Combined signature matrix, shape (n_damage * n_misrepair, 96).
- Return type:
np.ndarray
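A sketch of how damage and misrepair signatures could combine into 96 channels, using the shapes documented above. The assumptions are: phi splits into 2 pyrimidine classes of 16 contexts each, eta sums to 1 over the last axis, and the 96 channels are ordered as (class, substitution, context). None of these is confirmed by this page, and `get_tau_sketch` is a hypothetical name.

```python
import numpy as np

def get_tau_sketch(phi, eta):
    """Hypothetical outer-product combination of phi and eta."""
    phi = np.asarray(phi, dtype=float).reshape(-1, 2, 16)  # (j, p, c)
    eta = np.asarray(eta, dtype=float)                     # (k, p, m)
    # tau[j, k, p, m, c] = phi[j, p, c] * eta[k, p, m]
    tau = np.einsum("jpc,kpm->jkpmc", phi, eta)
    return tau.reshape(-1, 96)  # assumed channel order: (p, m, c)

rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(32), size=3)      # 3 damage signatures
eta = rng.dirichlet(np.ones(3), size=(4, 2))  # 4 misrepair signatures
tau = get_tau_sketch(phi, eta)                # shape (12, 96)
```

Under these assumptions every row of the result is itself a probability vector, because each eta slice sums to 1 over the substitution axis.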
- kmeans_alr(data, nsig, rng=np.random.default_rng())
Perform k-means clustering in ALR space and return cluster centers in the original space.
- Parameters:
data (np.ndarray) – Input data, shape (n_samples, n_features).
nsig (int) – Number of clusters.
rng (np.random.Generator, optional) – Random number generator (default: np.random.default_rng()).
- Returns:
Cluster centers in the original space, shape (nsig, n_features).
- Return type:
np.ndarray
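The procedure can be sketched in three steps: transform to ALR space, cluster, and map the centers back. The version below swaps in a plain Lloyd's-algorithm loop written in NumPy, since this page does not say which k-means implementation the library uses. The name `kmeans_alr_sketch` and the `n_iter` and `e` parameters are hypothetical.

```python
import numpy as np

def kmeans_alr_sketch(data, nsig, rng=None, n_iter=50, e=1e-12):
    """Hypothetical k-means in ALR space using Lloyd's algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(data, dtype=float) + e
    # Step 1: ALR transform, assuming the last column is the reference.
    y = np.log(x[:, :-1]) - np.log(x[:, -1:])
    # Step 2: Lloyd's algorithm, initialized from random data points.
    centers = y[rng.choice(len(y), size=nsig, replace=False)]
    for _ in range(n_iter):
        dist = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(nsig)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Step 3: inverse ALR maps the centers back onto the simplex.
    z = np.hstack([centers, np.zeros((nsig, 1))])
    expz = np.exp(z - z.max(axis=1, keepdims=True))
    return expz / expz.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
data = rng.dirichlet(np.ones(5), size=40)        # 40 compositions, 5 parts
centers = kmeans_alr_sketch(data, nsig=3, rng=rng)
```

Passing a seeded generator, as in the usage lines, makes the initialization reproducible.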
- lap_B(data, Bs)
Compute the log average posterior (LAP) for a dataset and a set of probability matrices.
- Parameters:
data (np.ndarray) – Observed counts, shape (n_samples, n_categories).
Bs (np.ndarray) – Array of probability matrices, shape (n_draws, n_samples, n_categories).
- Returns:
Log average posterior value.
- Return type:
float
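One plausible reading of this quantity is the log of the mean, over draws, of the dataset likelihood under each probability matrix. The sketch below implements that reading with a log-sum-exp step for stability. The definition itself is an assumption, as is the name `lap_B_sketch`, and all probabilities are assumed to be strictly positive.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)  # elementwise log-gamma without SciPy

def lap_B_sketch(data, Bs):
    """Hypothetical log of the draw-averaged multinomial likelihood."""
    data = np.asarray(data, dtype=float)
    Bs = np.asarray(Bs, dtype=float)  # (n_draws, n_samples, n_categories)
    # The multinomial coefficient does not depend on the draw.
    log_coef = (_lgamma(data.sum(axis=1) + 1)
                - _lgamma(data + 1).sum(axis=1)).sum()
    # Total log-likelihood of the dataset under each draw.
    totals = log_coef + (data[None, :, :] * np.log(Bs)).sum(axis=(1, 2))
    # Log-mean-exp, shifted by the maximum to avoid underflow.
    peak = totals.max()
    return float(peak + np.log(np.mean(np.exp(totals - peak))))

rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(4, 6))
Bs = rng.dirichlet(np.ones(6), size=(3, 4))  # 3 draws, 4 samples, 6 categories
lap = lap_B_sketch(data, Bs)
```

Averaging in probability space before taking the log gives a value at least as large as the plain average of the per-draw log-likelihoods, by Jensen's inequality.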
- load_checkpoint(fn)
- load_dataset(dataset_sel, counts_fp=None, annotation_fp=None, annotation_subset=None, seed=None, data_seed=None, sig_defs_fp=None, sim_S=None, sim_N=None, sim_I=None, sim_tau_hyperprior=None, sim_J=None, sim_K=None, sim_alpha_bias=None, sim_psi_bias=None, sim_gamma_bias=None, sim_beta_bias=None)
- load_datasets(dataset_args)
- marginalize_for_eta(sigs, normalize=True)
Compute misrepair signatures (eta) by marginalizing over trinucleotide context classes.
- Parameters:
sigs (np.ndarray) – Signature matrix, shape (n_signatures, 96).
normalize (bool, optional) – Whether to normalize the output so each row sums to 1 (default: True).
- Returns:
Misrepair signatures, shape (n_signatures, 6).
- Return type:
np.ndarray
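The marginalization can be sketched as a reshape followed by a sum over the context axis. The channel layout used here (2 pyrimidine classes, then 3 substitutions, then 16 contexts) is an assumption made for illustration, and the library may order channels differently. `marginalize_for_eta_sketch` is a hypothetical name.

```python
import numpy as np

def marginalize_for_eta_sketch(sigs, normalize=True):
    """Hypothetical marginalization of 96 channels down to 6 classes."""
    sigs = np.asarray(sigs, dtype=float)
    # Assumed layout: (signature, class p, substitution m, context c).
    eta = sigs.reshape(-1, 2, 3, 16).sum(axis=3).reshape(-1, 6)
    if normalize:
        # Rescale so each row sums to 1, as the normalize flag describes.
        eta = eta / eta.sum(axis=1, keepdims=True)
    return eta

rng = np.random.default_rng(0)
sigs = rng.integers(1, 20, size=(5, 96)).astype(float)  # unnormalized weights
eta = marginalize_for_eta_sketch(sigs)
eta_raw = marginalize_for_eta_sketch(sigs, normalize=False)
```

With normalize=False the row totals of the input are preserved, which is a quick sanity check on the channel grouping.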
- marginalize_for_phi(sigs)
Compute damage signatures (phi) by marginalizing over misrepair classes.
- Parameters:
sigs (np.ndarray) – Signature matrix, shape (n_signatures, 96).
- Returns:
Damage signatures, shape (n_signatures, 32).
- Return type:
np.ndarray
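Summing over the substitution axis instead of the context axis gives the 32-channel damage view. As a consistency check, the sketch builds a 96-channel matrix from known damage signatures and recovers them. The (class, substitution, context) channel order is an assumption, and `marginalize_for_phi_sketch` is a hypothetical name.

```python
import numpy as np

def marginalize_for_phi_sketch(sigs):
    """Hypothetical marginalization of 96 channels down to 32."""
    sigs = np.asarray(sigs, dtype=float)
    # Assumed layout: (signature, class p, substitution m, context c).
    return sigs.reshape(-1, 2, 3, 16).sum(axis=2).reshape(-1, 32)

rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(32), size=2)  # 2 damage signatures
eta = rng.dirichlet(np.ones(3), size=2)   # one misrepair signature, (p, m)

# Build 96-channel signatures as phi[j, p, c] * eta[p, m].
tau = np.einsum("jpc,pm->jpmc", phi.reshape(-1, 2, 16), eta).reshape(-1, 96)

phi_back = marginalize_for_phi_sketch(tau)  # recovers phi up to rounding
```

The recovery works because each row of the illustrative eta sums to 1 over substitutions, so the sum over that axis leaves phi unchanged.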
- mult_ll(x, p)
Compute the multinomial log-likelihood for observed counts and probabilities.
- Parameters:
x (np.ndarray) – Observed counts, shape (n_samples, n_categories).
p (np.ndarray) – Probabilities, shape (n_samples, n_categories).
- Returns:
Log-likelihood values for each sample.
- Return type:
np.ndarray
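A per-sample sketch using the standard multinomial formula, with the coefficient term included. Whether the library includes that term, and how it treats zero probabilities, is not stated on this page. `mult_ll_sketch` is a hypothetical name, and the sketch assumes every probability is strictly positive.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)  # elementwise log-gamma without SciPy

def mult_ll_sketch(x, p):
    """Hypothetical per-row multinomial log-likelihood."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)  # assumed strictly positive
    log_coef = _lgamma(x.sum(axis=1) + 1) - _lgamma(x + 1).sum(axis=1)
    return log_coef + (x * np.log(p)).sum(axis=1)

x = np.array([[1, 1],
              [2, 0]])
p = np.array([[0.5, 0.5],
              [0.5, 0.5]])
ll = mult_ll_sketch(x, p)  # equals log(0.5) and log(0.25)
total = ll.sum()           # dataset-level sum of the per-sample values
```

The first row has probability 2 * 0.5 * 0.5 = 0.5 and the second has 0.5 ** 2 = 0.25, which gives the two values noted in the comment.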
- save_checkpoint(fp, model, trace, dataset_args, model_args, pymc3_args, run_id)