genipe.formats package
Module contents
Submodules
genipe.formats.impute2 module
genipe.formats.impute2.additive_from_probs(a1, a2, probs)
Computes the additive format from a probability matrix.
- Parameters
- Returns
the additive format computed from the probabilities, the minor allele and the major allele
- Return type
The encoding is as follows: 0 for homozygous major allele, 1 for heterozygous and 2 for homozygous minor allele.
The minor and major alleles are inferred from the MAF. By default, a2 is assumed to be the minor allele; the alleles are flipped if the MAF shows otherwise.
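The encoding and the allele flip described above can be sketched as follows. This is an illustration only, not genipe's implementation: the function name `additive_sketch`, the column order (a1/a1, a1/a2, a2/a2) and the use of hard calls to estimate the allele frequency are all assumptions.

```python
import numpy as np


def additive_sketch(a1, a2, probs):
    """Illustrative only: not genipe's actual implementation."""
    # Assumed column order: P(a1/a1), P(a1/a2), P(a2/a2), so the index of
    # the most probable genotype is the number of copies of a2.
    calls = np.argmax(probs, axis=1)

    # Frequency of a2 among the called genotypes (two alleles per sample).
    a2_freq = calls.sum() / (2 * len(calls))

    if a2_freq > 0.5:
        # a2 is in fact the major allele: flip so that a1 is counted.
        return 2 - calls, a1, a2
    return calls, a2, a1


# Made-up probabilities for four samples.
probs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
    [0.0, 0.0, 1.0],
])
additive, minor, major = additive_sketch("A", "G", probs)
```

In this made-up data, G is the more frequent allele, so the sketch flips the encoding and reports A as the minor allele.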
genipe.formats.impute2.dosage_from_probs(homo_probs, hetero_probs, scale=2)
Computes dosage from probability matrix (for the minor allele).
- Parameters
homo_probs (numpy.array) – the probabilities for the homozygous genotype
hetero_probs (numpy.array) – the probabilities for the heterozygous genotype
scale (int) – the scale value
- Returns
the dosage computed from the probabilities
- Return type
numpy.array
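One plausible reading of these parameters is that the dosage is the expected fraction of minor alleles multiplied by `scale`. The formula below is an assumption based on the parameter names, not something stated above, and `dosage_sketch` is a hypothetical name.

```python
import numpy as np


def dosage_sketch(homo_probs, hetero_probs, scale=2):
    # Assumed formula: a homozygous genotype contributes fully and a
    # heterozygous genotype contributes half, then the result is rescaled.
    return (homo_probs + hetero_probs / 2) * scale


# Made-up probabilities for three samples.
homo = np.array([0.0, 0.1, 0.9])
hetero = np.array([0.1, 0.8, 0.1])
dosage = dosage_sketch(homo, hetero)
```

With the default scale of 2, the result would lie between 0 and 2 and can be read as the expected number of minor alleles per sample.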
genipe.formats.impute2.get_good_probs(prob_matrix, min_prob=0.9)
Gathers good imputed genotypes (>= probability threshold).
- Parameters
prob_matrix (numpy.array) – the probability matrix
min_prob (float) – the probability threshold
- Returns
a mask array marking the positions where the probabilities are greater than or equal to the threshold
- Return type
numpy.array
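A minimal sketch of such a mask, assuming that a genotype counts as "good" when the highest of its three probabilities reaches the threshold. The name `good_probs_sketch` is hypothetical.

```python
import numpy as np


def good_probs_sketch(prob_matrix, min_prob=0.9):
    # Assumption: keep a sample when its best genotype probability
    # is greater than or equal to the threshold.
    return np.amax(prob_matrix, axis=1) >= min_prob


# Made-up probabilities for three samples.
probs = np.array([
    [0.95, 0.05, 0.0],
    [0.5, 0.4, 0.1],
    [0.0, 0.1, 0.9],
])
mask = good_probs_sketch(probs)
good_rows = probs[mask]
```

The boolean mask can be used directly to index the probability matrix, as in the last line.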
genipe.formats.impute2.hard_calls_from_probs(a1, a2, probs)
Computes hard calls from probability matrix.
genipe.formats.impute2.maf_dosage_from_probs(prob_matrix, a1, a2, scale=2, gender=None, site_name=None)
Computes MAF and dosage vector from probs matrix.
- Parameters
- Returns
a tuple containing four values: the dosage vector, the minor allele frequency, the minor allele and the major allele
- Return type
tuple
When gender is not None, the MAF is computed for chromosome X: males count as one allele and females as two. An exception is raised if any male is heterozygous.
genipe.formats.impute2.maf_from_probs(prob_matrix, a1, a2, gender=None, site_name=None)
Computes MAF from a probability matrix (and gender if chromosome X).
- Parameters
- Returns
a tuple containing three values: the minor allele frequency, the minor allele and the major allele
- Return type
tuple
When gender is not None, the MAF is computed for chromosome X: males count as one allele and females as two. An exception is raised if any male is heterozygous.
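The chromosome X counting rule can be sketched as follows. Everything here is an assumption made for illustration: the name `maf_sketch`, the gender coding (1 for male, 2 for female), the column order (a1/a1, a1/a2, a2/a2), the use of hard calls rather than dosages, and the choice of `ValueError` as the exception type.

```python
import numpy as np


def maf_sketch(prob_matrix, a1, a2, gender=None):
    """Illustrative only: not genipe's actual implementation."""
    # Copies of a2 per sample, taken from the most probable genotype.
    calls = np.argmax(prob_matrix, axis=1)

    if gender is None:
        # Autosome: every sample carries two alleles.
        a2_count = calls.sum()
        total = 2 * len(calls)
    else:
        gender = np.asarray(gender)
        males = gender == 1    # assumed coding: 1 = male
        females = gender == 2  # assumed coding: 2 = female

        if np.any(calls[males] == 1):
            raise ValueError("heterozygous male on chromosome X")

        # A homozygous male call (2 copies) counts as a single allele.
        a2_count = calls[females].sum() + (calls[males] // 2).sum()
        total = 2 * females.sum() + males.sum()

    freq = a2_count / total
    if freq > 0.5:
        # a2 is the major allele, so a1 is reported as the minor one.
        return 1 - freq, a1, a2
    return freq, a2, a1


# Made-up data: two males followed by two females.
probs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.1, 0.8, 0.1],
    [1.0, 0.0, 0.0],
])
maf, minor, major = maf_sketch(probs, "A", "G", gender=[1, 1, 2, 2])
```

In this example the two males contribute one allele each and the two females two each, so the denominator is 6 alleles rather than 8.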
genipe.formats.impute2.matrix_from_line(impute2_line)
Generates the probability matrix from an IMPUTE2 line.
- Parameters
impute2_line (list) – a single line from IMPUTE2’s result (split by space)
- Returns
a tuple containing the marker’s information (the first five values of the line) and the probability matrix (numpy array, float)
- Return type
tuple
The shape of the matrix is n x 3 where n is the number of samples. The columns represent the probability for AA, AB and BB.
Note: the impute2_line variable is a list of str, corresponding to a line from IMPUTE2’s results, split by space.
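The parsing described above can be sketched as follows. The name `matrix_from_line_sketch` is hypothetical and the example line is made up; only the layout (five marker fields, then three probabilities per sample) comes from the description.

```python
import numpy as np


def matrix_from_line_sketch(impute2_line):
    # The first five fields describe the marker.
    marker_info = impute2_line[:5]

    # The remaining fields are three probabilities (AA, AB, BB) per sample,
    # so they are reshaped into an n x 3 matrix of floats.
    probs = np.array(impute2_line[5:], dtype=float).reshape(-1, 3)
    return marker_info, probs


# A made-up line with two samples, already split by space.
line = "1 rs123 10583 A G 0.9 0.1 0 0 0.2 0.8".split(" ")
info, matrix = matrix_from_line_sketch(line)
```

Here `matrix` has one row per sample and `info` keeps the marker fields as strings.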
genipe.formats.index module
genipe.formats.index.get_index(fn, cols, names, sep)
Restores the index for a given file.
- Parameters
- Returns
the index
- Return type
If the index doesn’t exist for the file, it is first created.