damuta.sim module

encode_counts(counts)

Encode mutation counts into a format suitable for topic modeling.

This function takes a DataFrame of mutation counts and encodes it as two parallel lists of indices: one giving the context (32 possibilities) and one giving the base change (6 possibilities) of each mutation.

Parameters:

counts (pandas.DataFrame) – A DataFrame where each row represents a sample and each column represents a mutation type (96 mutation types in total).

Returns:

A tuple containing two elements:

  1. list of lists: One inner list per sample, holding the context index (0-31) of each mutation in that sample.

  2. list of lists: One inner list per sample, holding the base-change index (0-5) of each mutation in that sample.

Return type:

tuple

Notes

Each of the 96 mutation types is decomposed into one of 32 contexts and one of 6 base changes. This encoding is useful for topic modeling approaches in mutation signature analysis.
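The decomposition described above can be illustrated with a minimal, dependency-free sketch. It is not damuta's implementation: the COSMIC-style labels (e.g. "A[C>T]G"), the ordering of CONTEXTS and SUBS, the helper name encode_sample, and the use of a plain dict in place of a DataFrame row are all assumptions made for illustration.

```python
# Hypothetical label scheme and index ordering; damuta's actual column
# order is not specified in this reference.
BASES = "ACGT"
SUBS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]  # 6 base changes
# 32 contexts: 5' base, pyrimidine reference base, 3' base
CONTEXTS = [f"{left}{ref}{right}" for ref in "CT" for left in BASES for right in BASES]


def encode_sample(sample_counts):
    """Expand one sample's counts into per-mutation (context, base-change) indices."""
    context_idx, sub_idx = [], []
    for label, n in sample_counts.items():
        # "A[C>T]G" -> left="A", sub="C>T", right="G"
        left, sub, right = label[0], label[2:5], label[6]
        context = f"{left}{sub[0]}{right}"
        # Repeat the index pair once per observed mutation
        context_idx.extend([CONTEXTS.index(context)] * n)
        sub_idx.extend([SUBS.index(sub)] * n)
    return context_idx, sub_idx


# One sample with three mutations across two mutation types
sample = {"A[C>T]G": 2, "T[T>A]C": 1}
ctx, sub = encode_sample(sample)
```

Applying encode_sample to every row of a counts table would yield the two lists of lists described under Returns.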

sim_from_sigs(tau, tau_hyperprior, S, N, I=None, seed=None)

Simulate mutation data from predefined signatures.

This function generates simulated mutation data from the given signatures and hyperparameters. It draws sample-specific activities from a Dirichlet distribution and then generates mutation counts for each sample.

Parameters:

tau (pandas.DataFrame) – Predefined signatures in COSMIC format.

tau_hyperprior (float) – Concentration parameter for the Dirichlet prior on signature activities.

S (int) – Number of samples to simulate.

N (int) – Number of mutations per sample.

I (int, optional) – Number of signatures to use. If None, all signatures in tau are used.

seed (int, optional) – Random seed for reproducibility.

Returns:

data (pandas.DataFrame) – Simulated mutation data. Each row represents a sample, and each column represents a mutation type.

sim_params (dict) – Dictionary containing simulation parameters:

  - ‘tau’: The signatures used for simulation.

  - ‘tau_activities’: The generated sample-specific activities.
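The generative steps described for sim_from_sigs can be sketched with NumPy alone. This is a rough illustration under stated assumptions, not the library's code: the function name sketch_sim_from_sigs is hypothetical, signatures are passed as a plain (I, 96) array rather than a COSMIC-format DataFrame, and counts are assumed to be drawn from a multinomial over each sample's mixed signature profile.

```python
import numpy as np


def sketch_sim_from_sigs(tau, tau_hyperprior, S, N, seed=None):
    """Hypothetical stand-in: tau is an (I, 96) array whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    n_sigs = tau.shape[0]
    # One Dirichlet draw of signature activities per sample: shape (S, I)
    activities = rng.dirichlet(np.full(n_sigs, tau_hyperprior), size=S)
    # Mix the signatures by activity to get per-sample mutation-type probabilities
    probs = activities @ tau
    probs /= probs.sum(axis=1, keepdims=True)  # guard against float drift
    # Assumed sampling step: N mutations per sample from a multinomial
    counts = np.stack([rng.multinomial(N, p) for p in probs])
    return counts, activities


# Random stand-in signatures (not real COSMIC signatures)
tau = np.random.default_rng(0).dirichlet(np.ones(96), size=3)
counts, activities = sketch_sim_from_sigs(tau, tau_hyperprior=1.0, S=5, N=100, seed=42)
```

Smaller values of tau_hyperprior concentrate each sample's activity on fewer signatures; larger values spread it more evenly.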

sim_parametric(n_damage_sigs, n_misrepair_sigs, S, N, alpha_bias=0.9, psi_bias=0.1, gamma_bias=0.1, beta_bias=0.9, seed=1333)

Simulate data using a parametric model with damage and misrepair signatures.

This function generates simulated mutation data based on a model with separate damage and misrepair processes. It creates random distributions for damage signatures (phi), sample-specific activities (theta), misrepair signatures (eta), and their interactions (A).

Parameters:

n_damage_sigs (int) – Number of damage signatures to simulate.

n_misrepair_sigs (int) – Number of misrepair signatures to simulate.

S (int) – Number of samples to simulate.

N (int) – Number of mutations per sample.

alpha_bias (float, optional) – Concentration parameter for the Dirichlet prior on damage signatures (default: 0.9).

psi_bias (float, optional) – Concentration parameter for the Dirichlet prior on sample-specific activities (default: 0.1).

gamma_bias (float, optional) – Concentration parameter for the Dirichlet prior on misrepair signature activities (default: 0.1).

beta_bias (float, optional) – Concentration parameter for the Dirichlet prior on misrepair signatures (default: 0.9).

seed (int, optional) – Random seed for reproducibility (default: 1333).

Returns:

tuple

A tuple containing two elements:

  1. pandas.DataFrame: Simulated mutation counts for each sample and mutation type.

  2. dict: Dictionary containing the generated model parameters and intermediate results.
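One plausible reading of the parametric model is sketched below with NumPy. Everything about the layout is an assumption made for illustration: the function name sketch_sim_parametric, the shapes chosen for phi, theta, A and eta, the convention that contexts 0-15 have reference base C and 16-31 reference base T, the multinomial sampling step, and the keys of the returned dict. The actual function may differ on any of these.

```python
import numpy as np


def sketch_sim_parametric(n_damage_sigs, n_misrepair_sigs, S, N,
                          alpha_bias=0.9, psi_bias=0.1,
                          gamma_bias=0.1, beta_bias=0.9, seed=1333):
    rng = np.random.default_rng(seed)
    J, K = n_damage_sigs, n_misrepair_sigs
    # Damage signatures: distributions over 32 contexts, shape (J, 32)
    phi = rng.dirichlet(np.full(32, alpha_bias), size=J)
    # Sample-specific damage activities, shape (S, J)
    theta = rng.dirichlet(np.full(J, psi_bias), size=S)
    # Misrepair activities per sample and damage signature, shape (S, J, K)
    A = rng.dirichlet(np.full(K, gamma_bias), size=(S, J))
    # Misrepair signatures: for each reference base (C or T), a distribution
    # over its 3 possible substitutions, shape (K, 2, 3)
    eta = rng.dirichlet(np.full(3, beta_bias), size=(K, 2))
    # Assumed context layout: first 16 contexts have reference C, last 16 have T
    ref = np.repeat([0, 1], 16)
    eta_ctx = eta[:, ref, :]  # (K, 32, 3)
    # P(context c, substitution m | sample s)
    #   = sum_j theta[s,j] * phi[j,c] * sum_k A[s,j,k] * eta[k, ref(c), m]
    probs = np.einsum("sj,jc,sjk,kcm->scm", theta, phi, A, eta_ctx).reshape(S, 96)
    probs /= probs.sum(axis=1, keepdims=True)  # guard against float drift
    counts = np.stack([rng.multinomial(N, p) for p in probs])
    return counts, {"phi": phi, "theta": theta, "A": A, "eta": eta}


counts, params = sketch_sim_parametric(n_damage_sigs=3, n_misrepair_sigs=2, S=4, N=200)
```

Under this reading, each context admits only 3 substitutions, which is why 32 contexts yield 96 mutation types.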