gmmd package
Submodules
gmmd.GMM_Demux module
- gmmd.GMM_Demux.main()
- gmmd.GMM_Demux.warn(*args, **kwargs)
gmmd.classifier module
- gmmd.classifier.classify_drops(base_bv_array, high_array, low_array, data)
Calculate confidence of all GEM cells.
- Parameters
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
high_array (list) – Posterior probabilities of the higher-mean Gaussian distribution, generated by gmmd.classifier.obtain_arrays().
low_array (list) – Posterior probabilities of the lower-mean Gaussian distribution, generated by gmmd.classifier.obtain_arrays().
data (pandas.DataFrame) – CLR-transformed HTO matrix.
- Returns
Classified results and confidence, all class names.
- Return type
pandas.DataFrame, list
- gmmd.classifier.count_bad_droplets(data, confidence_threshold)
Count empty droplets and droplets with confidence under the threshold.
- Parameters
data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
confidence_threshold (float) – Confidence threshold.
- Returns
Counts of negative droplets, counts of unclear droplets.
- Return type
list, list
- gmmd.classifier.count_by_class(data, base_bv_array)
Obtain list of count by class.
- Parameters
data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
- Returns
List of count by class.
- Return type
list
- gmmd.classifier.get_SSD_count_ary(data, SSD_idx, sample_num)
Obtain SSD count list.
- Parameters
data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
SSD_idx (list) – SSD list, generated by gmmd.classifier.obtain_SSD_list().
sample_num (int) – Number of samples.
- Returns
List of SSD count.
- Return type
list
- gmmd.classifier.obtain_MSM_list(data, sample_num, idx_list=None)
Return GEM barcodes that are MSMs.
- Parameters
data (pandas.DataFrame) – Simplified classification result, generated by gmmd.classifier.store_simplified_classify_result().
sample_num (int) – Number of HTO samples.
idx_list (list, Default = None) – List of indices to extract.
- Return type
list
- gmmd.classifier.obtain_SSD_list(data, sample_num, class_id_ary=None)
Return GEM barcodes that are SSDs.
- Parameters
data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
sample_num (int) – Number of HTO samples.
class_id_ary (list, Default = None) – To extract specific classes, provide an id array.
- Return type
list
- Example
>>> obtain_SSD_list(data, 4, [5, 11])
Returns SSDs that contain cluster id 5 or 11.
E.g., if the HTO samples are HTO_1, HTO_2, HTO_3, HTO_4, then cluster id 5 will be HTO_1+HTO_2, and 11 will be HTO_1+HTO_2+HTO_3.
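The id-to-name mapping in the example above can be reproduced with a short sketch. It assumes that cluster ids follow the ordering shown for gmmd.compute.obtain_base_bv_array() (the empty combination first, then singlets, then larger combinations). The helper name cluster_names and the label "negative" are hypothetical and are not part of gmmd.

```python
from itertools import combinations

def cluster_names(sample_names):
    """Hypothetical helper: list cluster names indexed by cluster id.

    Assumes id 0 is the empty combination and that ids grow with the
    number of HTO samples in the combination.
    """
    names = ["negative"]  # assumed label for the empty combination
    for size in range(1, len(sample_names) + 1):
        for combo in combinations(sample_names, size):
            names.append("+".join(combo))
    return names

names = cluster_names(["HTO_1", "HTO_2", "HTO_3", "HTO_4"])
print(names[5])   # HTO_1+HTO_2
print(names[11])  # HTO_1+HTO_2+HTO_3
```

Under this assumed ordering, four HTO samples give 16 cluster ids in total (0 through 15).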
- gmmd.classifier.obtain_arrays(data, path=None)
Obtain posterior probabilities of the high and low Gaussian distributions.
- Parameters
data (pandas.DataFrame) – CLR-transformed HTO matrix.
- Returns
Posterior probabilities of the higher mean, posterior probabilities of the lower mean.
- Return type
list, list
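To illustrate what the two posterior arrays represent, the sketch below evaluates a two-component one-dimensional Gaussian mixture at a single value. It is not the gmmd implementation: the mixture parameters are made-up numbers rather than a fitted model, and the names gaussian_pdf and posteriors are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, low, high):
    """Return (p_high, p_low) for a value x.

    low and high are (weight, mean, std) tuples describing the
    lower-mean and higher-mean components of the mixture.
    """
    w_lo, m_lo, s_lo = low
    w_hi, m_hi, s_hi = high
    lo = w_lo * gaussian_pdf(x, m_lo, s_lo)
    hi = w_hi * gaussian_pdf(x, m_hi, s_hi)
    total = lo + hi
    return hi / total, lo / total

# Illustrative (not fitted) parameters for one HTO column.
low = (0.7, 0.5, 0.3)
high = (0.3, 2.5, 0.5)
p_high, p_low = posteriors(2.4, low, high)
print(p_high > p_low)  # True: 2.4 lies close to the higher mean
```

The two posteriors always sum to 1, so a droplet with a high value for an HTO gets a p_high near 1 and a p_low near 0 for that HTO.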
- gmmd.classifier.purify_droplets(data, confidence_threshold)
Remove empty droplets and droplets with confidence under the threshold.
- Parameters
data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
confidence_threshold (float) – Confidence threshold.
- Returns
Purified classification result.
- Return type
pandas.DataFrame
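The filtering rule shared by count_bad_droplets() and purify_droplets() can be sketched on plain Python records. This is a simplified illustration, not the library code: the keys "cluster_id" and "confidence", the use of cluster id 0 for empty droplets, and the function name split_droplets are all assumptions, and the counts are returned as single integers rather than lists.

```python
def split_droplets(rows, confidence_threshold, negative_id=0):
    """Hypothetical helper: separate kept droplets from bad ones.

    Returns (kept_rows, negative_count, unclear_count).
    """
    kept, negative, unclear = [], 0, 0
    for row in rows:
        if row["cluster_id"] == negative_id:
            negative += 1            # empty droplet (assumed id 0)
        elif row["confidence"] < confidence_threshold:
            unclear += 1             # classified, but below the threshold
        else:
            kept.append(row)
    return kept, negative, unclear

rows = [
    {"cluster_id": 0, "confidence": 0.90},
    {"cluster_id": 1, "confidence": 0.95},
    {"cluster_id": 5, "confidence": 0.40},
    {"cluster_id": 2, "confidence": 0.80},
]
kept, negative, unclear = split_droplets(rows, confidence_threshold=0.8)
print(len(kept), negative, unclear)  # 2 1 1
```

In this sketch a droplet whose confidence equals the threshold is kept; whether gmmd treats the boundary the same way is not stated in this reference.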
- gmmd.classifier.read_full_classify_result(path)
Read classification result from path.
- Parameters
path (String) – Path to GMM_full.csv and GMM_full.config.
- Returns
Classification result, number of samples, class names, sample names.
- Return type
pandas.DataFrame, list, list, list
- gmmd.classifier.store_full_classify_result(data, class_name_array, confidence_threshold, path)
Store the full classification result in {path}/GMM_full.csv. The result will contain all cluster ids, and the corresponding names can be found in {path}/GMM_full.config.
- Parameters
data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
class_name_array (list) – All class names, generated by gmmd.classifier.classify_drops().
path (String) – File path to store the result.
- gmmd.classifier.store_simplified_classify_result(data, class_name_array, path, sample_num, confidence_threshold)
Store the simplified classification result in {path}/GMM_simplified.csv. The result will combine MSM classifications and mark those with confidence under the threshold as unclear. The corresponding cluster names can be found in {path}/GMM_simplified.config.
- Parameters
data (pandas.DataFrame) – Classification result, generated by gmmd.classifier.classify_drops().
class_name_array (list) – All class names, generated by gmmd.classifier.classify_drops().
path (String) – File path to store the result.
sample_num (int) – Number of HTO samples.
confidence_threshold (float) – Confidence threshold.
- Returns
Simplified classification result.
- Return type
pandas.DataFrame
gmmd.compute module
- gmmd.compute.check_set_bit(bv, bit_pos)
Return True if the bit_pos-th bit of bv is 1.
- Parameters
bv (BitVector.BitVector) – Bit vector to be checked.
bit_pos (int) – The position of the bit.
- Return type
bool
- gmmd.compute.compute_scaler(params)
Compute scaler for parameters.
- Parameters
params (list) – List of parameters.
- Returns
List of scaler.
- Return type
list
- gmmd.compute.experiment_params_wrapper(params, HTO_GEM_ary, sample_num, scaler, base_bv_array, operator)
- gmmd.compute.gather_multiplet_rates(venn_values, SSM_rate_ary, sample_num)
Compute MSM/SSM/Singlet rates.
- Parameters
venn_values (list) – List of count by class, generated by gmmd.classifier.count_by_class().
SSM_rate_ary (list) – List of SSM rates, generated by gmmd.estimator.compute_SSM_rate_with_cell_num().
sample_num (int) – Number of samples.
- Returns
MSM_rate, SSM_rate, singlet_rate.
- Return type
float, float, float
- gmmd.compute.init_mask(sample_num)
Return an empty BitVector.BitVector object of sample_num size.
- Parameters
sample_num (int) – Bit size.
- Return type
BitVector.BitVector
- gmmd.compute.obtain_HTO_GEM_num(data, base_bv_array)
Find HTO numbers from the given data.
- Parameters
data (pandas.DataFrame) – Purified classification result, generated by gmmd.classifier.purify_droplets().
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
- Returns
A list of HTO numbers for each cell type.
- Return type
list
- gmmd.compute.obtain_HTO_cell_n_drop_num(data_df, base_bv_array, sample_num, estimated_total_cell_num, confidence_threshold)
- gmmd.compute.obtain_base_bv_array(sample_num)
Returns the binary array representing all combinations of cells.
- Parameters
sample_num (int) – Number of HTO samples.
- Returns
List containing all combinations, each a BitVector.BitVector element.
- Return type
list
- Example
>>> obtain_base_bv_array(3)
[000, 100, 010, 001, 110, 101, 011, 111]
A 1 in the i-th position of the element means presence of the i-th HTO sample, and 0 means absence.
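The ordering in the example above (grouped by the number of set bits, then by position) can be reproduced with itertools.combinations. The sketch below uses plain bit strings instead of BitVector.BitVector objects, and the function name base_bit_strings is hypothetical; it only illustrates the enumeration order and is not the library's implementation.

```python
from itertools import combinations

def base_bit_strings(sample_num):
    """Hypothetical helper: enumerate all HTO combinations as bit strings."""
    patterns = []
    for size in range(sample_num + 1):
        # Every way to choose `size` positions out of `sample_num`.
        for combo in combinations(range(sample_num), size):
            bits = ["0"] * sample_num
            for pos in combo:
                bits[pos] = "1"
            patterns.append("".join(bits))
    return patterns

print(base_bit_strings(3))
# ['000', '100', '010', '001', '110', '101', '011', '111']
```

For sample_num samples the list has 2 ** sample_num entries, so its length doubles with each added HTO sample.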
- gmmd.compute.obtain_experiment_params(base_bv_array, HTO_GEM_ary, sample_num, estimated_total_cell_num, params0=None)
Get parameters for estimation.
- Parameters
base_bv_array (list) – Binary array with all combinations of cells, generated by gmmd.compute.obtain_base_bv_array().
HTO_GEM_ary (list) – HTO numbers array, generated by gmmd.compute.obtain_HTO_GEM_num().
sample_num (int) – Number of samples.
estimated_total_cell_num (int) – Estimated total cell number.
params0 (list, Default = [drop_num=80000, capture_rate=0.5, cell_num_ary=[estimated_total_cell_num / sample_num] * sample_num]) – List of parameters.
- Returns
List of parameters.
- Return type
list
- gmmd.compute.param_scaling(params, scaler, operator)
Scale parameters with the given scaler and operator.
- Parameters
params (list) – List of parameters.
scaler (list) – List of scaler.
operator (func) – A function to operate on the parameters.
- Returns
List of scaled parameters.
- Return type
list
- gmmd.compute.set_bit(bv, bit_pos)
Set the bit_pos-th bit of bv to 1.
- Parameters
bv (BitVector.BitVector) – Bit vector to be set.
bit_pos (int) – The position of the bit.
- Return type
BitVector.BitVector
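The behaviour documented for init_mask(), set_bit() and check_set_bit() can be illustrated with list-based stand-ins. These are not the gmmd functions: they use a plain Python list in place of BitVector.BitVector, and they assume bit positions are counted from 0 starting at the left.

```python
def init_mask(sample_num):
    """Stand-in: an all-zero mask with one slot per HTO sample."""
    return [0] * sample_num

def set_bit(bv, bit_pos):
    """Stand-in: mark the sample at bit_pos as present and return the mask."""
    bv[bit_pos] = 1
    return bv

def check_set_bit(bv, bit_pos):
    """Stand-in: True if the sample at bit_pos is marked as present."""
    return bv[bit_pos] == 1

mask = set_bit(init_mask(4), 1)
print(mask)                    # [0, 1, 0, 0]
print(check_set_bit(mask, 1))  # True
print(check_set_bit(mask, 0))  # False
```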
gmmd.estimator module
- gmmd.estimator.cell_num_estimator(a_num, captured_drop_num, capture_rate)
- gmmd.estimator.compute_GEM_prob(drop_num, cell_num)
- gmmd.estimator.compute_SSD_num(drop_num, subject_cell_num, total_cell_num, ambiguous_rate=0)
Compute SSD number with cell numbers.
- Parameters
drop_num (int) – Number of droplets.
subject_cell_num (int) – Number of subject cells.
total_cell_num (int) – Number of all cells.
ambiguous_rate (float, Default = 0) – Ambiguous rate.
- Returns
SSD number.
- Return type
int
- gmmd.estimator.compute_SSM_rate_with_cell_num(cell_num, drop_num)
Compute SSM rate with cell numbers.
- Parameters
cell_num (float) – Number of cells.
drop_num (int) – Number of droplets.
- Returns
Rate of SSM.
- Return type
float
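One way to see how a same-sample multiplet rate can depend on cell_num and drop_num is a uniform-loading model, sketched below. This is an assumption made for illustration, not necessarily the formula gmmd uses: each cell is taken to land independently in one of drop_num droplets with equal probability, and the rate is defined as the share of non-empty droplets that hold two or more cells. The function name ssm_rate_sketch is hypothetical.

```python
def ssm_rate_sketch(cell_num, drop_num):
    """Hypothetical model: fraction of non-empty droplets with >= 2 cells.

    Assumes every cell falls into a uniformly random droplet.
    """
    p = 1.0 / drop_num                                   # chance a cell hits a given droplet
    p_empty = (1.0 - p) ** cell_num                      # droplet receives no cell
    p_single = cell_num * p * (1.0 - p) ** (cell_num - 1)  # exactly one cell
    p_nonempty = 1.0 - p_empty
    return (p_nonempty - p_single) / p_nonempty

low_load = ssm_rate_sketch(10000, 80000)
high_load = ssm_rate_sketch(20000, 80000)
print(low_load < high_load)  # True: more cells per droplet means more multiplets
```

Under this model the rate rises as more cells are loaded into the same number of droplets.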
- gmmd.estimator.compute_mix_rate(drop_num, cell_num)
- gmmd.estimator.compute_multiplet_rates_asymp(cell_num, sample_num, drop_num)
- gmmd.estimator.compute_observation_probability(drop_num, capture_rate, cell_num_ary, HTO_GEM_ary, base_bv_array, sample_num)
- gmmd.estimator.compute_relative_SSM_rate(SSM_rate, singlet_rate)
Compute relative SSM rate over singlet rate.
- Parameters
SSM_rate (float) – SSM rate.
singlet_rate (float) – Singlet rate.
- Returns
Relative SSM rate.
- Return type
float
- gmmd.estimator.compute_relative_SSM_rate_asymp(cell_num, drop_num)
- gmmd.estimator.debug_compute_doublet_num(drop_num, type_a_num, type_b_num)
- gmmd.estimator.debug_get_cell_num(drop_num, GEM_num, capture_rate)
- gmmd.estimator.debug_pure_cluster_MSM_rate(drop_num, tau_cell_num, sample_num_ary, capture_rate, ambiguous_rate=0)
- gmmd.estimator.drop_num_estimator(a_num, b_num, shared_num)
- gmmd.estimator.estimator(GMM_full_df, purified_df, sample_num, base_bv_array, confidence_threshold, estimated_total_cell_num, SSD_idx, sample_names, examine_cell_path=None, ambiguous_rate=0.05, class_name_ary=None)
- gmmd.estimator.examine_cluster_type(ambiguous_rate, sample_num, drop_num, capture_rate, rounded_cell_num_ary, purified_df, class_name_ary, confidence_threshold, cell_list_path)
- gmmd.estimator.get_min_hto_num(cell_num, drop_num, SSM_threshold, sample_num=1)
- gmmd.estimator.get_tau_cell_num(drop_num, total_cell_num, cluster_GEM_num, ambiguous_rate=0.0)
Get tau cell number.
- Parameters
drop_num (int) – Number of droplets.
total_cell_num (int) – Number of all cells.
cluster_GEM_num (int) – Number of GEMs.
ambiguous_rate (float, Default = 0) – Ambiguous rate.
- Returns
Tau cell number.
- Return type
int
- gmmd.estimator.phony_cluster_MSM_rate(cell_num_ary, cell_type_num=2)
Estimate phony cluster MSM rate with cell numbers.
- Parameters
cell_num_ary (list) – List of cell numbers.
cell_type_num (int, Default = 2) – Number of MSMs (≥2).
- Returns
MSM rate.
- Return type
float
- gmmd.estimator.pure_cluster_MSM_rate(drop_num, cluster_GEM_num, cell_num_ary, capture_rate, ambiguous_rate=0)
Estimate pure cluster MSM rate.
- Parameters
drop_num (int) – Number of droplets.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.
ambiguous_rate (float, Default = 0) – Ambiguous rate.
- Returns
MSM rate.
- Return type
float
- gmmd.estimator.store_summary_result(path, full_report_df, sample_df, examine_result)
- gmmd.estimator.test_phony_hypothesis(cluster_MSM_num, cluster_GEM_num, cell_num_ary, capture_rate)
Test phony-type hypothesis.
- Parameters
cluster_MSM_num (int) – Number of MSMs.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.
- Returns
P-value.
- Return type
float
- gmmd.estimator.test_pure_hypothesis(cluster_MSM_num, drop_num, cluster_GEM_num, cell_num_ary, capture_rate, ambiguous_rate=0)
Test pure-type hypothesis.
- Parameters
cluster_MSM_num (int) – Number of MSMs.
drop_num (int) – Number of droplets.
cluster_GEM_num (int) – Number of GEMs.
cell_num_ary (list) – List of cell numbers.
capture_rate (float) – Capture rate.
ambiguous_rate (float, Default = 0) – Ambiguous rate.
- Returns
P-value.
- Return type
float
gmmd.io module
- gmmd.io.clr_norm(data_df)
- gmmd.io.read_cellranger(path, hto_array)
- gmmd.io.read_csv(path, hto_array)
- gmmd.io.store_cellranger(data_df, SSD_idx, path)
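Several classifier functions take a CLR-transformed HTO matrix, which gmmd.io.clr_norm() presumably produces. The sketch below shows a generic centered log-ratio transform on one row of counts. It is an assumption-laden illustration and not the gmmd implementation: the pseudocount of 1, the choice to normalize across a single row, and the function name clr_row are all hypothetical.

```python
import math

def clr_row(counts, pseudocount=1.0):
    """Hypothetical helper: centered log-ratio transform of one count vector.

    Each value becomes its log minus the mean log of the vector, so the
    transformed values always sum to zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

values = clr_row([0, 9, 99])
print(values[2] > values[1] > values[0])  # True: the ordering of counts is preserved
```

Because the transform is relative to the vector's own mean, values above zero mark counts that are high compared with the rest of that vector.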
gmmd.multi module
- gmmd.multi.compute_confidence(high_array, low_array, high_ary_idx, all_ary_idx)
- gmmd.multi.get_HTO_cell_idx(high_array, threshold)
- gmmd.multi.get_HTO_cell_num(high_array, threshold)