# genipe.formats package

## genipe.formats.impute2 module

genipe.formats.impute2.matrix_from_line(impute2_line)

Generates the probability matrix from an IMPUTE2 line.

Parameters:

- impute2_line (list) – a single line from IMPUTE2’s result (split by space)

Returns: a tuple containing the marker’s information (first five values of the line) and the probability matrix (numpy array, float)

Return type: tuple

The shape of the matrix is n x 3 where n is the number of samples. The columns represent the probability for AA, AB and BB.

Note

The impute2_line variable is a list of str, corresponding to a line from IMPUTE2’s result, split by space.
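
As a rough illustration of the layout described above, the following sketch splits a made-up IMPUTE2 line into the five marker fields and an n x 3 probability matrix. It does not call genipe: `parse_impute2_line` and the sample values are hypothetical, and the real `matrix_from_line` may differ in its details.

```python
import numpy as np


def parse_impute2_line(fields):
    """Hypothetical stand-in illustrating the documented return layout."""
    # The first five values describe the marker.
    marker_info = tuple(fields[:5])
    # The remaining values are genotype probabilities, three per sample
    # (AA, AB, BB), so they fold into an n x 3 matrix.
    probs = np.array(fields[5:], dtype=float).reshape(-1, 3)
    return marker_info, probs


# A made-up line with three samples, already split by space.
line = "1 rs123 1000 A G 1 0 0 0.1 0.8 0.1 0 0.05 0.95".split(" ")
info, matrix = parse_impute2_line(line)
```

Here `matrix[i]` holds the AA, AB and BB probabilities of sample `i`.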

genipe.formats.impute2.get_good_probs(prob_matrix, min_prob=0.9)

Gathers good imputed genotypes (>= probability threshold).

Parameters:

- prob_matrix (numpy.array) – the probability matrix
- min_prob (float) – the probability threshold

Returns: a mask array containing the positions where the probabilities are equal to or higher than the threshold

Return type: numpy.array
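
One plausible reading of this mask is sketched below, assuming a sample counts as "good" when its most likely genotype reaches the threshold. The helper `good_calls_mask` is hypothetical and is not taken from genipe's source.

```python
import numpy as np


def good_calls_mask(prob_matrix, min_prob=0.9):
    """Hypothetical sketch: flag samples whose best call is confident."""
    # Assumption: the highest of the three genotype probabilities is the
    # value compared against the threshold.
    return np.amax(prob_matrix, axis=1) >= min_prob


probs = np.array([
    [0.95, 0.05, 0.00],
    [0.50, 0.40, 0.10],
    [0.00, 0.10, 0.90],
])
mask = good_calls_mask(probs)
kept = probs[mask]  # rows for the confidently imputed samples only
```
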

genipe.formats.impute2.maf_from_probs(prob_matrix, a1, a2, gender=None, site_name=None)

Computes MAF from a probability matrix (and gender if chromosome X).

Parameters:

- prob_matrix (numpy.array) – the probability matrix
- a1 (str) – the first allele
- a2 (str) – the second allele
- gender (numpy.array) – the gender of the samples
- site_name (str) – the name for this site

Returns: a tuple containing three values: the minor allele frequency, the minor allele and the major allele

Return type: tuple

When ‘gender’ is not None, we assume that the MAF on chromosome X is required (hence, males count as 1 allele and females as 2 alleles). An exception is raised if any male is heterozygous.
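
The allele counting described above can be sketched as follows. This is an illustration under stated assumptions, not genipe's code: `maf_sketch` is a hypothetical name, gender is assumed to be coded 1 for males and 2 for females, genotypes are taken as the most likely call per sample, and `ValueError` stands in for whatever exception the library actually raises.

```python
import numpy as np


def maf_sketch(prob_matrix, a1, a2, gender=None):
    """Hypothetical sketch of MAF computation, with optional chrX rules."""
    # Most likely genotype per sample: 0 = a1/a1, 1 = a1/a2, 2 = a2/a2,
    # which is also the number of a2 copies on an autosome.
    geno = np.argmax(prob_matrix, axis=1)

    if gender is None:
        a2_count = geno.sum()
        total = 2 * len(geno)
    else:
        males = gender == 1  # assumed coding: 1 = male, 2 = female
        if np.any(geno[males] == 1):
            raise ValueError("heterozygous male on chromosome X")
        # Females contribute two alleles, males only one.
        a2_count = geno[~males].sum() + (geno[males] == 2).sum()
        total = 2 * (~males).sum() + males.sum()

    freq = a2_count / total
    if freq > 0.5:
        return 1 - freq, a1, a2  # a1 is the minor allele
    return freq, a2, a1


probs = np.array([
    [0.9, 0.1, 0.0],   # a1/a1
    [0.1, 0.8, 0.1],   # a1/a2
    [0.0, 0.1, 0.9],   # a2/a2
    [1.0, 0.0, 0.0],   # a1/a1
])
autosomal = maf_sketch(probs, "A", "G")
chrom_x = maf_sketch(probs, "A", "G", gender=np.array([1, 2, 2, 1]))
```

In the chromosome X call, the two males are both homozygous for `a1`, so 3 of the 6 counted alleles are `a2`.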

genipe.formats.impute2.dosage_from_probs(homo_probs, hetero_probs, scale=2)

Computes dosage from probability matrix (for the minor allele).

Parameters:

- homo_probs (numpy.array) – the probabilities for the homozygous genotype
- hetero_probs (numpy.array) – the probabilities for the heterozygous genotype
- scale (int) – the scale value

Returns: the dosage computed from the probabilities

Return type: numpy.array
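
A common way to turn genotype probabilities into a dosage is the expected allele count, rescaled. The sketch below assumes that formula (homozygous calls weigh 1, heterozygous calls weigh 1/2, then everything is multiplied by `scale`). `dosage_sketch` is a hypothetical name and the library's exact computation is not confirmed here.

```python
import numpy as np


def dosage_sketch(homo_probs, hetero_probs, scale=2):
    """Hypothetical sketch: expected minor allele dosage on a 0..scale range."""
    # Assumption: with scale=2 this equals 2 * P(homozygous) + P(heterozygous).
    return (homo_probs + hetero_probs / 2) * scale


homo = np.array([1.0, 0.0, 0.2])    # P(homozygous for the minor allele)
hetero = np.array([0.0, 1.0, 0.5])  # P(heterozygous)
dosage = dosage_sketch(homo, hetero)
```
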

genipe.formats.impute2.hard_calls_from_probs(a1, a2, probs)

Computes hard calls from probability matrix.

Parameters:

- a1 (str) – the first allele
- a2 (str) – the second allele
- probs (numpy.array) – the probability matrix

Returns: the hard calls computed from the probabilities

Return type: numpy.array
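
Hard calling usually means keeping the most likely genotype for each sample. The sketch below assumes that rule and a space-separated genotype string. Both the helper name `hard_calls_sketch` and the output format are illustrative assumptions.

```python
import numpy as np


def hard_calls_sketch(a1, a2, probs):
    """Hypothetical sketch: pick the most probable genotype per sample."""
    # Column order follows the probability matrix: AA, AB, BB.
    possible = np.array([f"{a1} {a1}", f"{a1} {a2}", f"{a2} {a2}"])
    return possible[np.argmax(probs, axis=1)]


probs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])
calls = hard_calls_sketch("A", "G", probs)
```
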

genipe.formats.impute2.maf_dosage_from_probs(prob_matrix, a1, a2, scale=2, gender=None, site_name=None)

Computes the MAF and the dosage vector from a probability matrix.

Parameters:

- prob_matrix (numpy.array) – the probability matrix
- a1 (str) – the first allele
- a2 (str) – the second allele
- scale (int) – the scale value
- gender (numpy.array) – the gender of the samples
- site_name (str) – the name for this site

Returns: a tuple containing four values: the dosage vector, the minor allele frequency, the minor allele and the major allele

Return type: tuple

When ‘gender’ is not None, we assume that the MAF on chromosome X is required (hence, males count as 1 allele and females as 2 alleles). An exception is raised if any male is heterozygous.

genipe.formats.impute2.additive_from_probs(a1, a2, probs)

Computes the additive format from a probability matrix.

Parameters:

- a1 (str) – the a1 allele
- a2 (str) – the a2 allele
- probs (numpy.array) – the probability matrix

Returns: the additive format computed from the probabilities, the minor allele and the major allele

Return type: tuple

The encoding is as follows: 0 when homozygous for the major allele, 1 when heterozygous and 2 when homozygous for the minor allele.

The minor and major alleles are inferred from the MAF. By default, a2 is assumed to be the minor allele; the alleles are flipped if required.
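
The encoding and the allele flip can be sketched as below, assuming genotypes are taken as the most likely call per sample. `additive_sketch` is a hypothetical helper written for this illustration, not the library function.

```python
import numpy as np


def additive_sketch(a1, a2, probs):
    """Hypothetical sketch of the 0/1/2 additive encoding with allele flip."""
    # Index of the best call equals the number of a2 copies (AA, AB, BB).
    calls = np.argmax(probs, axis=1)
    minor, major = a2, a1

    # If a2 turns out to be the common allele, count a1 copies instead.
    freq_a2 = calls.sum() / (2 * len(calls))
    if freq_a2 > 0.5:
        calls = 2 - calls
        minor, major = a1, a2
    return calls, minor, major


probs = np.array([
    [0.00, 0.10, 0.90],  # a2/a2
    [0.05, 0.05, 0.90],  # a2/a2
    [0.10, 0.80, 0.10],  # a1/a2
    [0.90, 0.10, 0.00],  # a1/a1
])
additive, minor, major = additive_sketch("A", "G", probs)
```

In this made-up data, `G` (a2) has a frequency of 5/8, so the alleles are flipped and the encoding counts copies of `A`.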

## genipe.formats.index module

genipe.formats.index.get_index(fn, cols, names, sep)

Restores the index for a given file.

Parameters:

- fn (str) – the name of the file
- cols (list) – a list containing the columns to keep (as int)
- names (list) – the names corresponding to the columns to keep (as str)
- sep (str) – the field separator

Returns: the index

Return type: pandas.DataFrame

If the index doesn’t exist for the file, it is first created.
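
The "restore, or create first if missing" behaviour can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the helper `load_or_build_index`, the `.idx.json` suffix, the `seek` key and the list-of-dicts layout are hypothetical, whereas the real function returns a pandas.DataFrame and its on-disk format is not described on this page.

```python
import json
import os
import tempfile


def load_or_build_index(fn, cols, names, sep):
    """Hypothetical sketch: reuse a saved index, or build and save one."""
    idx_fn = fn + ".idx.json"  # assumed location of the saved index
    if os.path.isfile(idx_fn):
        with open(idx_fn) as f:
            return json.load(f)

    index = []
    with open(fn, "rb") as f:
        offset = f.tell()
        line = f.readline()
        while line:
            fields = line.decode().rstrip("\r\n").split(sep)
            # Keep only the requested columns, under the requested names.
            row = {name: fields[col] for col, name in zip(cols, names)}
            row["seek"] = offset  # byte offset of the start of the line
            index.append(row)
            offset = f.tell()
            line = f.readline()

    with open(idx_fn, "w") as f:
        json.dump(index, f)
    return index


# Small made-up file to exercise both code paths.
workdir = tempfile.mkdtemp()
demo_fn = os.path.join(workdir, "demo.impute2")
with open(demo_fn, "wb") as f:
    f.write(b"1 rs1 100 A G\n1 rs2 200 C T\n")

first = load_or_build_index(demo_fn, cols=[1, 2], names=["name", "pos"], sep=" ")
second = load_or_build_index(demo_fn, cols=[1, 2], names=["name", "pos"], sep=" ")
```

The first call builds and saves the index; the second one only reads it back. Storing byte offsets lets a caller `seek` straight to a marker's line instead of scanning the file.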

genipe.formats.index.get_open_func(fn, return_fmt=False)

Gets the opening function.

Parameters:

- fn (str) – the name of the file
- return_fmt (bool) – if the file format needs to be returned

Returns: either a tuple containing two elements: a boolean telling if the format is bgzip, and the opening function.

Return type: tuple
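
One way to pick an opening function is to inspect the first bytes of the file. BGZF (bgzip) files start with a gzip header whose extra field carries the `BC` subfield identifier. The sketch below relies on that header check only. `pick_open_func` is a hypothetical helper, the use of `gzip.open` for bgzip files is an assumption (the library may use a dedicated BGZF reader), and returning the bare function when `return_fmt` is False is also assumed.

```python
import gzip
import os
import tempfile


def pick_open_func(fn, return_fmt=False):
    """Hypothetical sketch: choose an opener from the file's magic bytes."""
    with open(fn, "rb") as f:
        header = f.read(14)

    # gzip magic (1f 8b), deflate (08), FEXTRA flag (04), then the
    # 'BC' subfield identifier at bytes 12-13 marks a BGZF block.
    is_bgzip = header[:4] == b"\x1f\x8b\x08\x04" and header[12:14] == b"BC"
    open_func = gzip.open if is_bgzip else open

    if return_fmt:
        return is_bgzip, open_func
    return open_func  # assumed behaviour when the format is not requested


# Two made-up files: plain text, and one carrying only a BGZF-like header.
workdir = tempfile.mkdtemp()
plain_fn = os.path.join(workdir, "plain.txt")
with open(plain_fn, "wb") as f:
    f.write(b"1 rs1 100 A G\n")

fake_bgzf_fn = os.path.join(workdir, "fake.gz")
with open(fake_bgzf_fn, "wb") as f:
    f.write(b"\x1f\x8b\x08\x04" + b"\x00" * 8 + b"BC" + b"\x02\x00")

plain_result = pick_open_func(plain_fn, return_fmt=True)
bgzf_result = pick_open_func(fake_bgzf_fn, return_fmt=True)
```

The second file contains a header only, so it is suitable for testing the detection but not for actually reading data.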