damuta.models module
- class HierarchicalTandemLda(dataset: DataSet, type_col: str, n_damage_sigs=None, n_misrepair_sigs=None, alpha_bias=0.1, psi_bias=0.01, beta_bias=0.1, phi_obs=None, etaC_obs=None, etaT_obs=None, opt_method='ADVI', init_strategy='kmeans', init_signatures=None, seed=2021)
Bases: TandemLda
Bayesian inference of mutational signatures and their activities using a Hierarchical Tandem LDA model.
This class fits COSMIC-style mutational signatures using a Hierarchical Tandem LDA model, where damage signatures and misrepair signatures have separate sets of activities. A tissue-type hierarchical prior is fitted over damage-misrepair signature associations to improve the interpretability of misrepair activity specificities.
- Parameters:
dataset (DataSet) – Dataset containing mutation counts and sample annotations for fitting.
n_damage_sigs (int) – Number of damage signatures to fit.
n_misrepair_sigs (int) – Number of misrepair signatures to fit.
type_col (str) – Name of the annotation column containing the tissue type of each sample.
alpha_bias (float or numpy.ndarray of shape (32,)) – Dirichlet concentration parameter for damage signature trinucleotide context priors.
psi_bias (float or numpy.ndarray of shape (n_damage_sigs,)) – Dirichlet concentration parameter for damage signature activity priors.
beta_bias (float or numpy.ndarray of shape (6,)) – Dirichlet concentration parameter for misrepair signature substitution type priors.
gamma_bias (float or numpy.ndarray of shape (n_misrepair_sigs,)) – Dirichlet concentration parameter for misrepair signature activity priors.
opt_method (str) – Optimization method: “ADVI” for mean-field inference or “FullRankADVI” for full-rank inference.
seed (int) – Random seed for reproducibility.
- model
PyMC3 model instance.
- Type:
pymc3.Model
- model_kwargs
Dictionary of parameters used for constructing the model.
- Type:
dict
- approx
PyMC3 approximation object created via self.fit().
- Type:
pymc3.approximations.Approximation
- run_id
Unique identifier for the current run, used for saving checkpoint files.
- Type:
str
- get_estimated_W(n_draws=1)
Extract signature activity tensor from model posterior.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw.
- Returns:
4D array of signature activities with shape (n_draws, n_samples, n_damage_sigs, n_misrepair_sigs).
- Return type:
np.ndarray
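As a rough illustration of how the returned tensor might be summarised by tissue type, the sketch below averages a synthetic W over draws and over the samples of each type, then normalises each damage row to get a misrepair distribution per damage signature. The tensor is simulated with NumPy rather than produced by the library, reading W as joint damage-misrepair weights is an assumption, and the names `tissue_types` and `per_type_association` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2021)
n_draws, n_samples, n_damage, n_misrepair = 5, 6, 3, 2

# Synthetic stand-in for get_estimated_W(n_draws=5):
# each (draw, sample) slice is non-negative and sums to 1 (an assumption).
W = rng.dirichlet(np.ones(n_damage * n_misrepair), size=(n_draws, n_samples))
W = W.reshape(n_draws, n_samples, n_damage, n_misrepair)

# Hypothetical tissue labels, standing in for the annotation column named by type_col.
tissue_types = np.array(["Breast", "Breast", "Lung", "Lung", "Lung", "Skin"])

per_type_association = {}
for tissue in np.unique(tissue_types):
    mask = tissue_types == tissue
    # Average over draws (axis 0) and over the samples of this tissue type (axis 1).
    mean_w = W[:, mask].mean(axis=(0, 1))
    # Normalise each damage row: share of each misrepair signature per damage signature.
    per_type_association[str(tissue)] = mean_w / mean_w.sum(axis=1, keepdims=True)
```

Each entry of `per_type_association` is an array of shape (n_damage_sigs, n_misrepair_sigs) whose rows sum to one, which makes tissue types directly comparable.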
- class Lda(dataset: DataSet, n_sigs=None, alpha_bias=0.1, psi_bias=0.01, tau_obs=None, opt_method='ADVI', init_strategy='kmeans', init_signatures=None, seed=2021)
Bases: Model
Bayesian inference of mutational signatures and their activities.
Fit COSMIC-style mutational signatures using a Latent Dirichlet Allocation (LDA) model.
- Parameters:
dataset (DataSet) – Data object containing mutation counts for fitting.
n_sigs (int) – Number of signatures to infer.
alpha_bias (float or numpy.ndarray of shape (96,), default=0.1) – Dirichlet concentration parameter for signature prior. Controls the sparsity of inferred signatures.
psi_bias (float or numpy.ndarray of shape (n_sigs,), default=0.01) – Dirichlet concentration parameter for signature activity prior. Controls the sparsity of signature activities.
tau_obs (numpy.ndarray, optional) – Observed signatures to include in the model.
opt_method ({'ADVI', 'FullRankADVI'}, default='ADVI') – Optimization method for variational inference. ‘ADVI’ for mean-field, ‘FullRankADVI’ for full-rank.
init_strategy ({'uniform', 'kmeans', 'from_sigs'}, default='kmeans') – Strategy for initializing signatures.
init_signatures (SignatureSet, optional) – Pre-defined signatures for initialization when init_strategy is ‘from_sigs’.
seed (int, default=2021) – Random seed for reproducibility.
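To make the sparsity effect of a Dirichlet concentration parameter concrete, the following sketch draws signatures from a symmetric Dirichlet prior at two concentrations and compares how peaked they are. It uses only NumPy and does not call the library; the value 10.0 and the sample count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2021)
n_channels = 96  # mutation types in a COSMIC-style signature

# Draw 200 signatures from a symmetric Dirichlet prior at two concentrations.
sparse_sigs = rng.dirichlet(np.full(n_channels, 0.1), size=200)
dense_sigs = rng.dirichlet(np.full(n_channels, 10.0), size=200)

# A small concentration piles mass onto a few channels; a large one spreads it out.
sparse_peak = sparse_sigs.max(axis=1).mean()
dense_peak = dense_sigs.max(axis=1).mean()
print(f"mean largest channel: bias=0.1 -> {sparse_peak:.3f}, bias=10 -> {dense_peak:.3f}")
```

The same reasoning applies to psi_bias: a small value favours samples in which only a few signatures are active.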
- model
PyMC3 model instance.
- Type:
pymc3.Model
- model_kwargs
Dictionary of parameters used in model construction.
- Type:
dict
- approx
Variational approximation object. Created after calling self.fit().
- Type:
pymc3.approximations.Approximation
- run_id
Unique identifier for the current run. Used for checkpoint files and wandb logging.
- Type:
str
- n_sigs
Number of signatures being fit.
- Type:
int
- fit(n_iter=30000, \*\*kwargs)
Fit the model to the data using variational inference.
- sample_posterior(n_samples=1000)
Sample from the fitted posterior distribution.
- get_signatures()
Extract inferred signatures from the fitted model.
- get_activities()
Extract inferred signature activities from the fitted model.
- get_estimated_SignatureSet(n_draws=1)
Construct a SignatureSet from posterior samples.
- get_estimated_activities_DataFrame(n_draws=1)
Extract signature activities as a DataFrame.
- get_estimated_signatures(n_draws=1)
Extract signatures from the model posterior.
- class TandemLda(dataset: DataSet, n_damage_sigs=None, n_misrepair_sigs=None, alpha_bias=0.1, psi_bias=0.01, beta_bias=0.1, gamma_bias=0.01, phi_obs=None, etaC_obs=None, etaT_obs=None, opt_method='ADVI', init_strategy='kmeans', init_signatures=None, seed=2021)
Bases: Model
Bayesian inference of mutational signatures and their activities using a Tandem LDA model.
This class fits COSMIC-style mutational signatures using a Tandem Latent Dirichlet Allocation (LDA) model, where damage signatures and misrepair signatures each have their own set of activities.
- Parameters:
dataset (DataSet) – Data for fitting the model.
n_damage_sigs (int) – Number of damage signatures to fit.
n_misrepair_sigs (int) – Number of misrepair signatures to fit.
alpha_bias (float or numpy.ndarray of shape (32,)) – Dirichlet concentration parameter on (0, inf) for damage signatures. Determines the prior probability of trinucleotide context types appearing in inferred damage signatures.
psi_bias (float or numpy.ndarray of shape (n_damage_sigs,)) – Dirichlet concentration parameter on (0, inf) for damage signature activities. Determines the prior probability of each damage signature activity.
beta_bias (float or numpy.ndarray of shape (6,)) – Dirichlet concentration parameter on (0, inf) for misrepair signatures. Determines the prior probability of substitution types appearing in inferred misrepair signatures.
gamma_bias (float or numpy.ndarray of shape (n_misrepair_sigs,)) – Dirichlet concentration parameter on (0, inf) for misrepair signature activities. Determines the prior probability of each misrepair signature activity.
opt_method (str) – Optimization method for variational inference. Either “ADVI” for mean-field inference or “FullRankADVI” for full-rank inference.
seed (int) – Random seed for reproducibility.
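Each bias parameter accepts either a float or an array of a fixed shape. The sketch below shows one way such a value could be expanded to a full concentration vector and checked against the (0, inf) constraint. `as_concentration` is a hypothetical helper written for illustration and is not part of the library.

```python
import numpy as np

def as_concentration(bias, size):
    """Hypothetical helper: expand a scalar or vector bias to a (size,) concentration vector."""
    # broadcast_to raises ValueError if a vector bias has the wrong length.
    conc = np.broadcast_to(np.asarray(bias, dtype=float), (size,)).copy()
    if not np.all(np.isfinite(conc) & (conc > 0)):
        raise ValueError("Dirichlet concentrations must lie in (0, inf)")
    return conc

# Scalar bias shared by all 32 trinucleotide contexts.
alpha = as_concentration(0.1, 32)

# Per-type bias over the 6 substitution types (values are illustrative).
beta = as_concentration([0.5, 0.5, 0.5, 0.1, 0.1, 0.1], 6)
```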
- model
PyMC3 model instance.
- Type:
pymc3.Model
- model_kwargs
Dictionary of parameters passed when constructing the model (e.g., hyperprior values).
- Type:
dict
- approx
PyMC3 approximation object created via self.fit().
- Type:
pymc3.approximations.Approximation
- run_id
Unique identifier for the current run, used for saving checkpoint files and in wandb if enabled.
- Type:
str
- get_estimated_SignatureSet(n_draws=1)
Construct a SignatureSet object from model posterior samples.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw. If >1, samples are averaged in the signature space.
- Returns:
SignatureSet constructed from damage and misrepair signatures.
- Return type:
SignatureSet
- get_estimated_W(n_draws=1)
Extract signature activity tensor from model posterior.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw.
- Returns:
4D array of signature activities with shape (n_draws, n_samples, n_damage_sigs, n_misrepair_sigs).
- Return type:
np.ndarray
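The 4D shape can be easier to reason about with a small example. The sketch below builds a synthetic tensor of the documented shape, averages it over draws, and sums out one signature axis at a time. The tensor is simulated, and treating each sample's slice as joint weights that sum to one is an assumption rather than something this reference states.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_samples, n_damage, n_misrepair = 4, 3, 5, 2

# Synthetic stand-in for get_estimated_W(n_draws=4).
W = rng.dirichlet(np.ones(n_damage * n_misrepair), size=(n_draws, n_samples))
W = W.reshape(n_draws, n_samples, n_damage, n_misrepair)

# Collapse the draw axis: (n_samples, n_damage_sigs, n_misrepair_sigs).
W_mean = W.mean(axis=0)

# Sum out the misrepair axis to get per-sample damage weights, and vice versa.
damage_activity = W_mean.sum(axis=2)     # (n_samples, n_damage_sigs)
misrepair_activity = W_mean.sum(axis=1)  # (n_samples, n_misrepair_sigs)
```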
- get_estimated_activities(n_draws=1)
- get_estimated_activities_DataFrame(n_draws=1)
Extract damage and misrepair signature activities as DataFrames.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw. If >1, samples are averaged in the activity space.
- Returns:
Tuple of (theta, gamma) DataFrames where theta contains damage signature activities and gamma contains misrepair signature activities.
- Return type:
tuple
- get_estimated_connections_DataFrame(n_draws=1)
Extract damage-misrepair signature connections as DataFrame.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw. If >1, samples are averaged in the connection space.
- Returns:
DataFrame with damage-misrepair signature combinations as columns, with column names like ‘D1_M1’, ‘D1_M2’, etc.
- Return type:
pd.DataFrame
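The 'D1_M1'-style column naming can be reproduced by flattening a per-sample damage-by-misrepair array. The sketch below does this with synthetic values; the one-row-per-sample layout and the sample labels are assumptions, since only the column names are documented here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_samples, n_damage, n_misrepair = 3, 2, 3

# Synthetic connection weights: one (n_damage, n_misrepair) matrix per sample.
connections = rng.dirichlet(np.ones(n_misrepair), size=(n_samples, n_damage))

# Misrepair index varies fastest, matching a C-order reshape of the last two axes.
columns = [f"D{d}_M{m}"
           for d in range(1, n_damage + 1)
           for m in range(1, n_misrepair + 1)]

df = pd.DataFrame(
    connections.reshape(n_samples, -1),
    columns=columns,
    index=[f"sample_{i}" for i in range(1, n_samples + 1)],  # hypothetical sample ids
)
```

With this layout, `df["D2_M1"]` holds the weight linking damage signature 2 to misrepair signature 1 in every sample.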
- get_estimated_signatures(n_draws=1)
Extract damage and misrepair signatures from model posterior.
- Parameters:
n_draws (int, default=1) – Number of posterior samples to draw.
- Returns:
Tuple (phi, eta), where phi is the array of damage signatures and eta is the array of misrepair signatures, reshaped to (n_draws, n_sigs, 6).
- Return type:
tuple
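One way to see how a 32-context damage signature and a 6-type misrepair signature relate to a 96-channel COSMIC-style signature is to multiply them out. The sketch below does so for a single synthetic pair. The channel ordering is an assumption made for illustration (contexts 0 to 15 are C-centred and 16 to 31 are T-centred; the first three substitution types are C>N and the last three are T>N) and may not match the library's ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic damage signature over 32 trinucleotide contexts (sums to 1).
phi = rng.dirichlet(np.ones(32))

# Synthetic misrepair signature: a distribution over 3 outcomes for each of C and T.
eta = np.concatenate([rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))])

# Assumed ordering: first 16 contexts are C-centred, last 16 are T-centred.
phi_c, phi_t = phi[:16], phi[16:]
eta_c, eta_t = eta[:3], eta[3:]

# Each context's mass is split across its 3 possible substitutions.
sig96 = np.concatenate([
    np.outer(phi_c, eta_c).ravel(),  # 48 C>N channels
    np.outer(phi_t, eta_t).ravel(),  # 48 T>N channels
])
```

Because each half of `eta` sums to one, `sig96` is itself a probability distribution over the 96 channels.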