gmmd package

Submodules

gmmd.GMM_Demux module

gmmd.GMM_Demux.main()

gmmd.GMM_Demux.warn(*args, **kwargs)

gmmd.classifier module

gmmd.classifier.classify_drops(base_bv_array, high_array, low_array, data)

Calculate confidence of all GEM cells.

Parameters

base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
high_array (list) – Posterior probabilities of the higher-mean Gaussian distribution, generated by gmmd.classifier.obtain_arrays().
low_array (list) – Posterior probabilities of the lower-mean Gaussian distribution, generated by gmmd.classifier.obtain_arrays().
data (pandas.DataFrame) – CLR-transformed HTO matrix.

Returns

Classified results and confidence, all class names.

Return type

pandas.DataFrame, list
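
As an illustration only, the sketch below shows one plausible way a per-class confidence could be derived from the two posterior arrays: multiply the high posterior for every HTO present in a class and the low posterior for every HTO absent from it. The independence-based product, the plain-string bit patterns, and the helper names class_confidence and best_class are assumptions made for this example; this reference does not state the formula gmmd actually uses.

```python
from math import prod

def class_confidence(pattern, high, low):
    """Confidence that one droplet matches `pattern` (e.g. "110").

    high[i] / low[i] are the droplet's posteriors for HTO i being
    present / absent. Treating HTOs as independent is an assumption.
    """
    return prod(h if bit == "1" else l
                for bit, h, l in zip(pattern, high, low))

def best_class(patterns, high, low):
    """Return (index, confidence) of the highest-scoring pattern."""
    scores = [class_confidence(p, high, low) for p in patterns]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

# Two HTO samples, one droplet (values are made up for illustration).
patterns = ["00", "10", "01", "11"]
high = [0.9, 0.2]   # posterior that HTO_1 / HTO_2 is present
low = [0.1, 0.8]    # posterior that HTO_1 / HTO_2 is absent
print(best_class(patterns, high, low))
```

Here the droplet scores highest for the pattern with only the first HTO present, so it would be assigned to that class.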

gmmd.classifier.count_bad_droplets(data, confidence_threshold)

Count empty droplets and droplets with confidence under the threshold.

Parameters

data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
confidence_threshold (float) – Confidence threshold.

Returns

Counts of negative droplets, counts of unclear droplets.

Return type

list, list
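
A minimal sketch of the counting logic described above, using a list of (cluster_id, confidence) pairs as a stand-in for the classification DataFrame. The row layout, the helper name count_bad, the assumption that cluster id 0 is the negative (empty) class, and the choice to count negatives separately from unclear droplets are all hypothetical.

```python
NEGATIVE_ID = 0  # assumed id of the all-zero (empty) class

def count_bad(rows, confidence_threshold):
    """Count negative droplets and unclear (low-confidence) droplets.

    `rows` is a list of (cluster_id, confidence) pairs; this layout is
    a hypothetical stand-in for the real classification result.
    """
    negative = sum(1 for cid, _ in rows if cid == NEGATIVE_ID)
    unclear = sum(1 for cid, conf in rows
                  if cid != NEGATIVE_ID and conf < confidence_threshold)
    return negative, unclear

rows = [(0, 0.99), (1, 0.95), (2, 0.60), (4, 0.85), (0, 0.40)]
print(count_bad(rows, 0.8))
```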

gmmd.classifier.count_by_class(data, base_bv_array)

Obtain the list of counts by class.

Parameters

data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().

Returns

List of counts by class.

Return type

list
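
The idea of a per-class count list can be sketched with the standard library. The helper name counts_per_class and the use of a plain list of cluster ids in place of the purified DataFrame are assumptions for illustration.

```python
from collections import Counter

def counts_per_class(cluster_ids, class_num):
    """Return a list where entry i is the number of droplets in class i.

    Classes with no droplets get an explicit 0 so the list always has
    `class_num` entries (2 ** sample_num when every combination exists).
    """
    counts = Counter(cluster_ids)
    return [counts.get(i, 0) for i in range(class_num)]

# Three HTO samples give 2 ** 3 = 8 classes.
print(counts_per_class([1, 1, 2, 4, 4, 4], 8))
```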

gmmd.classifier.get_SSD_count_ary(data, SSD_idx, sample_num)

Obtain SSD count list.

Parameters

data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
SSD_idx (list) – SSD list, generated by gmmd.classifier.obtain_SSD_list().
sample_num (int) – Number of samples.

Returns

List of SSD counts.

Return type

list

gmmd.classifier.obtain_MSM_list(data, sample_num, idx_list=None)

Return GEM barcodes that are MSMs.

Parameters

data (pandas.DataFrame) – Simplified classification result, generated by gmmd.classifier.store_simplified_classify_result().
sample_num (int) – Number of HTO samples.
idx_list (list, Default = None) – List of indices to extract.

Return type

list

gmmd.classifier.obtain_SSD_list(data, sample_num, class_id_ary=None)

Return GEM barcodes that are SSDs.

Parameters

data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
sample_num (int) – Number of HTO samples.
class_id_ary (list, Default = None) – To extract specific classes, provide an array of class ids.

Return type

list

Example

>>> obtain_SSD_list(data, 4, [5, 11])

Returns SSDs that contain cluster id 5 or 11.

For example, if the HTO samples are HTO_1, HTO_2, HTO_3, and HTO_4, then cluster id 5 is HTO_1+HTO_2 and cluster id 11 is HTO_1+HTO_2+HTO_3.
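
The cluster ids in this example are consistent with enumerating sample combinations by size (empty set first, then singlets, then pairs, and so on). The sketch below reproduces that mapping; the helper name class_name and the "negative" label for id 0 are hypothetical, and the ordering is inferred from the examples in this reference rather than from the gmmd source.

```python
from itertools import combinations

def class_name(cluster_id, sample_names):
    """Map a cluster id to a '+'-joined class name.

    Combinations are listed by increasing size, each size in
    lexicographic order of sample position (assumed ordering).
    """
    n = len(sample_names)
    ordered = [combo for size in range(n + 1)
               for combo in combinations(sample_names, size)]
    # The empty combination (id 0) has no samples; label it "negative".
    return "+".join(ordered[cluster_id]) or "negative"

names = ["HTO_1", "HTO_2", "HTO_3", "HTO_4"]
print(class_name(5, names))
print(class_name(11, names))
```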

gmmd.classifier.obtain_arrays(data, path=None)

Obtain posterior probabilities of the high and low Gaussian distributions.

Parameters

data (pandas.DataFrame) – CLR-transformed HTO matrix.

Returns

Posterior probabilities of the higher mean, posterior probabilities of the lower mean.

Return type

list, list
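
To make the two returned quantities concrete, the sketch below computes the posterior probability that a single value belongs to the higher-mean or lower-mean component of a two-component Gaussian mixture whose parameters are already known. The helper names and the mixture parameters are made up for illustration; how gmmd fits the mixture is not described in this reference.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def posteriors(x, low, high):
    """Return (p_high, p_low) for value x.

    `low` and `high` are (weight, mean, std) tuples describing the
    lower-mean and higher-mean components (Bayes' rule on the mixture).
    """
    w_low = low[0] * normal_pdf(x, low[1], low[2])
    w_high = high[0] * normal_pdf(x, high[1], high[2])
    total = w_low + w_high
    return w_high / total, w_low / total

# Illustrative parameters only: equal weights, means 0 and 2, unit std.
low, high = (0.5, 0.0, 1.0), (0.5, 2.0, 1.0)
for value in (0.0, 1.0, 3.0):
    print(value, posteriors(value, low, high))
```

The two posteriors always sum to 1, so a value far above the lower mean gets a high posterior close to 1 and a low posterior close to 0.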

gmmd.classifier.purify_droplets(data, confidence_threshold)

Remove empty droplets and droplets with confidence under the threshold.

Parameters

data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
confidence_threshold (float) – Confidence threshold.

Returns

Purified classification result.

Return type

pandas.DataFrame
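
A sketch of the filtering step, using (barcode, cluster_id, confidence) tuples in place of DataFrame rows. The tuple layout, the helper name purify, the barcodes, and the assumption that cluster id 0 marks an empty droplet are hypothetical.

```python
def purify(rows, confidence_threshold, negative_id=0):
    """Keep droplets that are non-empty and meet the threshold.

    `rows` holds (barcode, cluster_id, confidence) tuples; droplets
    whose confidence equals the threshold are kept, since only those
    under the threshold are described as removed.
    """
    return [(barcode, cid, conf) for barcode, cid, conf in rows
            if cid != negative_id and conf >= confidence_threshold]

rows = [("AAAC", 0, 0.99), ("AAAG", 1, 0.95),
        ("AACT", 2, 0.60), ("AAGT", 5, 0.80)]
print(purify(rows, 0.8))
```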

gmmd.classifier.read_full_classify_result(path)

Read classification result from path.

Parameters

path (String) – Path to GMM_full.csv and GMM_full.config.

Returns

Classification result, number of samples, class names, sample names.

Return type

pandas.DataFrame, list, list, list

gmmd.classifier.store_full_classify_result(data, class_name_array, confidence_threshold, path)

Store the full classification result in {path}/GMM_full.csv. The result will contain all cluster ids, and the corresponding names can be found in {path}/GMM_full.config.

Parameters

data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
class_name_array (list) – All class names, generated by gmmd.classifier.classify_drops().
confidence_threshold (float) – Confidence threshold.
path (String) – File path to store the result.

gmmd.classifier.store_simplified_classify_result(data, class_name_array, path, sample_num, confidence_threshold)

Store the simplified classification result in {path}/GMM_simplified.csv. The result will combine MSM classifications and mark those with confidence under the threshold as unclear. The corresponding cluster names can be found in {path}/GMM_simplified.config.

Parameters

data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
class_name_array (list) – All class names, generated by gmmd.classifier.classify_drops().
path (String) – File path to store the result.
sample_num (int) – Number of the HTO samples.
confidence_threshold (float) – Confidence threshold.

Returns

Simplified classification result.

Return type

pandas.DataFrame
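
The simplification described above (merging all MSM classes and flagging low-confidence droplets) can be sketched as a labelling function. The label strings, the helper name simplified_label, the precedence of "negative" over "unclear", and the assumption that ids 1 to sample_num are singlets are hypothetical.

```python
def simplified_label(cluster_id, confidence, sample_num, threshold):
    """Collapse a full cluster id into a simplified label.

    Assumes id 0 is the empty class, ids 1..sample_num are singlets,
    and every larger id is a multi-sample multiplet (MSM).
    """
    if cluster_id == 0:
        return "negative"
    if confidence < threshold:
        return "unclear"
    if cluster_id <= sample_num:
        return f"singlet_{cluster_id}"
    return "MSM"

# Four HTO samples, threshold 0.8.
for cid, conf in [(2, 0.9), (5, 0.9), (11, 0.5), (0, 0.9)]:
    print(cid, conf, simplified_label(cid, conf, 4, 0.8))
```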

gmmd.compute module

gmmd.compute.check_set_bit(bv, bit_pos)

Return True if the bit_pos-th bit of bv is 1.

Parameters

bv (BitVector.BitVector) – Bit vector to be checked.
bit_pos (int) – The position of the bit.

Return type

bool

gmmd.compute.compute_scaler(params)

Compute scaler for parameters.

Parameters

params (list) – List of parameters.

Returns

List of scalers.

Return type

list

gmmd.compute.experiment_params_wrapper(params, HTO_GEM_ary, sample_num, scaler, base_bv_array, operator)

gmmd.compute.gather_multiplet_rates(venn_values, SSM_rate_ary, sample_num)

Compute MSM/SSM/Singlet rates.

Parameters

venn_values (list) – List of count by class, generated by gmmd.classifier.count_by_class().
SSM_rate_ary (list) – List of SSM rates, generated from gmmd.estimator.compute_SSM_rate_with_cell_num().
sample_num (int) – Number of samples.

Returns

MSM_rate, SSM_rate, singlet_rate.

Return type

float, float, float

gmmd.compute.init_mask(sample_num)

Return an empty BitVector.BitVector object of sample_num size.

Parameters

sample_num (int) – Bit size.

Return type

BitVector.BitVector

gmmd.compute.obtain_HTO_GEM_num(data, base_bv_array)

Find HTO numbers from the given data.

Parameters

data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().

Returns

A list of HTO numbers for each cell type.

Return type

list

gmmd.compute.obtain_HTO_cell_n_drop_num(data_df, base_bv_array, sample_num, estimated_total_cell_num, confidence_threshold)

gmmd.compute.obtain_base_bv_array(sample_num)

Returns the binary array representing all combinations of cells.

Parameters

sample_num (int) – Number of HTO samples.

Returns

List containing all combinations, each a BitVector.BitVector element.

Return type

list

Example

>>> obtain_base_bv_array(3)
[000, 100, 010, 001, 110, 101, 011, 111]

A 1 in the i-th position of an element means the i-th HTO sample is present, and a 0 means it is absent.
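
The ordering shown in the example output can be reproduced by listing sample combinations by increasing size. The sketch below does this with plain bit strings instead of BitVector.BitVector objects; the helper name base_patterns is hypothetical, and the ordering rule is inferred from the example output rather than taken from the gmmd source.

```python
from itertools import combinations

def base_patterns(sample_num):
    """List every presence/absence pattern as a bit string.

    Patterns are grouped by how many samples are present (0, 1, 2, ...),
    and position i of each string corresponds to the i-th HTO sample.
    """
    patterns = []
    for size in range(sample_num + 1):
        for present in combinations(range(sample_num), size):
            patterns.append("".join("1" if i in present else "0"
                                    for i in range(sample_num)))
    return patterns

print(base_patterns(3))
```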

gmmd.compute.obtain_experiment_params(base_bv_array, HTO_GEM_ary, sample_num, estimated_total_cell_num, params0=None)

Get parameters for estimation.

Parameters

base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
HTO_GEM_ary (list) – HTO numbers array, generated by gmmd.compute.obtain_HTO_GEM_num().
sample_num (int) – Number of samples.
estimated_total_cell_num (int) – Estimated total cell number.
params0 (list, Default = [{drop_num=}80000, {capture_rate=}0.5, {cell_num_ary=}[{estimated_total_cell_num / sample_num}] * {sample_num}]) – List of parameters.

Returns

List of parameters.

Return type

list
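
The documented default for params0 can be written out as follows. The values 80000 and 0.5 and the even split of cells across samples come from the parameter description above; the helper name default_params0 and the flat list layout are assumptions, since this reference does not say whether cell_num_ary is nested or flattened.

```python
def default_params0(estimated_total_cell_num, sample_num):
    """Build the documented default starting parameters.

    Layout (assumed flat): [drop_num, capture_rate, cell_num_1, ...].
    """
    drop_num = 80000
    capture_rate = 0.5
    # Cells are split evenly across samples as a starting guess.
    cell_num_ary = [estimated_total_cell_num / sample_num] * sample_num
    return [drop_num, capture_rate, *cell_num_ary]

print(default_params0(20000, 4))
```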

gmmd.compute.param_scaling(params, scaler, operator)

Scale the parameters with the given scaler and operator.

Parameters

params (list) – List of parameters.
scaler (list) – List of scalers.
operator (func) – A function to operate on the parameters.

Returns

List of scaled parameters.

Return type

list
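
One way to read the operator argument is as a binary function applied element-wise to each parameter and its scaler, for example division to normalise the parameters and multiplication to restore them. The sketch below illustrates that reading with the standard operator module; the helper name scale and the scaler values are hypothetical, and gmmd may apply the operator differently.

```python
import operator

def scale(params, scaler, op):
    """Apply `op(param, scale)` to each parameter/scaler pair."""
    return [op(p, s) for p, s in zip(params, scaler)]

params = [80000, 0.5, 5000.0]
scaler = [10000, 1, 1000]  # made-up scale factors for illustration

scaled = scale(params, scaler, operator.truediv)   # shrink to similar magnitudes
restored = scale(scaled, scaler, operator.mul)     # undo the scaling
print(scaled)
print(restored)
```

Bringing parameters of very different magnitudes onto a similar scale is a common preparation step before numerical optimisation, which is presumably why a scaler is computed here.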

gmmd.compute.set_bit(bv, bit_pos)

Set the bit_pos-th bit of bv to 1.

Parameters

bv (BitVector.BitVector) – Bit vector to be set.
bit_pos (int) – The position of the bit.

Return type

BitVector.BitVector
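
The three bit helpers in this module (init_mask, set_bit, check_set_bit) can be illustrated together. The sketch below mirrors their signatures but uses a plain list of 0/1 values as a stand-in for BitVector.BitVector, and assumes positions are 0-based and counted from the left; neither choice is confirmed by this reference.

```python
def init_mask(sample_num):
    """Return an all-zero mask with one bit per HTO sample."""
    return [0] * sample_num

def set_bit(bv, bit_pos):
    """Set the bit at bit_pos to 1 and return the mask."""
    bv[bit_pos] = 1
    return bv

def check_set_bit(bv, bit_pos):
    """Return True if the bit at bit_pos is 1."""
    return bv[bit_pos] == 1

# Build the mask for a droplet containing the 1st and 3rd samples.
mask = init_mask(4)
mask = set_bit(mask, 0)
mask = set_bit(mask, 2)
print(mask, check_set_bit(mask, 2), check_set_bit(mask, 1))
```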

gmmd.estimator module

gmmd.estimator.cell_num_estimator(a_num, captured_drop_num, capture_rate)

gmmd.estimator.compute_GEM_prob(drop_num, cell_num)

gmmd.estimator.compute_SSD_num(drop_num, subject_cell_num, total_cell_num, ambiguous_rate=0)

Compute SSD number with cell numbers.

Parameters

drop_num (int) – Number of droplets.
subject_cell_num (int) – Number of subject cells.
total_cell_num (int) – Number of all cells.
ambiguous_rate (float, Default = 0) – Ambiguous rate.

Returns

SSD number.

Return type

int

gmmd.estimator.compute_SSM_rate_with_cell_num(cell_num, drop_num)

Compute SSM rate with cell numbers.

Parameters

cell_num (float) – Number of cells.
drop_num (int) – Number of droplets.

Returns

Rate of SSM.

Return type

float
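
This reference does not give the formula behind the SSM rate. As background only, the sketch below uses the standard Poisson loading model, in which cells of one sample land in droplets independently, to estimate the fraction of droplets that hold two or more cells of that sample. The helper name multi_cell_rate and the choice of model are assumptions and may differ from what gmmd computes.

```python
from math import exp

def multi_cell_rate(cell_num, drop_num):
    """Fraction of droplets holding >= 2 cells under Poisson loading.

    With mean occupancy lam = cell_num / drop_num:
    P(0 cells) = exp(-lam), P(1 cell) = lam * exp(-lam),
    so P(>= 2 cells) = 1 - exp(-lam) * (1 + lam).
    """
    lam = cell_num / drop_num
    return 1.0 - exp(-lam) * (1.0 + lam)

# Illustrative numbers: 8000 cells of one sample in 80000 droplets.
print(multi_cell_rate(8000, 80000))
```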

gmmd.estimator.compute_mix_rate(drop_num, cell_num)

gmmd.estimator.compute_multiplet_rates_asymp(cell_num, sample_num, drop_num)

gmmd.estimator.compute_observation_probability(drop_num, capture_rate, cell_num_ary, HTO_GEM_ary, base_bv_array, sample_num)

gmmd.estimator.compute_relative_SSM_rate(SSM_rate, singlet_rate)

Compute relative SSM rate over singlet rate.

Parameters

SSM_rate (float) – SSM rate.
singlet_rate (float) – Singlet rate.

Returns

Relative SSM rate.

Return type

float

gmmd.estimator.compute_relative_SSM_rate_asymp(cell_num, drop_num)

gmmd.estimator.compute_shared_num(drop_num, A_num, B_num)

gmmd.estimator.debug_compute_doublet_num(drop_num, type_a_num, type_b_num)

gmmd.estimator.debug_get_cell_num(drop_num, GEM_num, capture_rate)

gmmd.estimator.debug_pure_cluster_MSM_rate(drop_num, tau_cell_num, sample_num_ary, capture_rate, ambiguous_rate=0)

gmmd.estimator.drop_num_estimator(a_num, b_num, shared_num)

gmmd.estimator.estimator(GMM_full_df, purified_df, sample_num, base_bv_array, confidence_threshold, estimated_total_cell_num, SSD_idx, sample_names, examine_cell_path=None, ambiguous_rate=0.05, class_name_ary=None)

gmmd.estimator.examine_cluster_type(ambiguous_rate, sample_num, drop_num, capture_rate, rounded_cell_num_ary, purified_df, class_name_ary, confidence_threshold, cell_list_path)

gmmd.estimator.get_min_hto_num(cell_num, drop_num, SSM_threshold, sample_num=1)

gmmd.estimator.get_tau_cell_num(drop_num, total_cell_num, cluster_GEM_num, ambiguous_rate=0.0)

Get tau cell number.

Parameters

drop_num (int) – Number of droplets.
total_cell_num (int) – Number of all cells.
cluster_GEM_num (int) – Number of GEMs.
ambiguous_rate (float, Default = 0) – Ambiguous rate.

Returns

Tau cell number.

Return type

int

gmmd.estimator.phony_cluster_MSM_rate(cell_num_ary, cell_type_num=2)

Estimate phony cluster MSM rate with cell numbers.

Parameters

cell_num_ary (list) – List of cell numbers.
cell_type_num (int, Default = 2) – Number of MSMs (≥2).

Returns

MSM rate.

Return type

float

gmmd.estimator.pure_cluster_MSM_rate(drop_num, cluster_GEM_num, cell_num_ary, capture_rate, ambiguous_rate=0)

Estimate pure cluster MSM rate.

Parameters

drop_num (int) – Number of droplets.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.
ambiguous_rate (float, Default = 0) – Ambiguous rate.

Returns

MSM rate.

Return type

float

gmmd.estimator.store_summary_result(path, full_report_df, sample_df, examine_result)

gmmd.estimator.test_phony_hypothesis(cluster_MSM_num, cluster_GEM_num, cell_num_ary, capture_rate)

Test phony-type hypothesis.

Parameters

cluster_MSM_num (int) – Number of MSMs.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.

Returns

P-value.

Return type

float

gmmd.estimator.test_pure_hypothesis(cluster_MSM_num, drop_num, cluster_GEM_num, cell_num_ary, capture_rate, ambiguous_rate=0)

Test pure-type hypothesis.

Parameters

cluster_MSM_num (int) – Number of MSMs.
drop_num (int) – Number of droplets.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.
ambiguous_rate (float, Default = 0) – Ambiguous rate.

Returns

P-value.

Return type

float

gmmd.io module

gmmd.io.clr_norm(data_df)

gmmd.io.read_cellranger(path, hto_array)

gmmd.io.read_csv(path, hto_array)

gmmd.io.store_cellranger(data_df, SSD_idx, path)

gmmd.multi module

gmmd.multi.compute_confidence(high_array, low_array, high_ary_idx, all_ary_idx)

gmmd.multi.get_HTO_cell_idx(high_array, threshold)

gmmd.multi.get_HTO_cell_num(high_array, threshold)

gmmd.multi.get_shared_cell_idx(high_array, low_array, high_ary_idx, all_ary_idx, threshold)

gmmd.multi.get_shared_cell_num(high_array, low_array, high_ary_idx, all_ary_idx, threshold)
