Result Class

class pyqms.Results(lookup=None, params=None, fixed_labels=None, metabolic_labels=None, aa_compositions=None, isotopic_distributions=None, charges=None, verbose=False)

pyQms results class.

Holds all matching information and lookups. Can be accessed as a dictionary. Several lookups allow the mapping of molecular formulas to molecules (e.g. peptides) and/or trivial names (e.g. protein names).

Structure

key (named tuple)

  • file_name
  • formula
  • charge
  • label_percentiles

value (named tuple)

  • spec_id
  • rt
  • score
  • scaling_factor
  • peaks
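
A minimal sketch of the key/value layout described above, using a plain dict as a stand-in for a Results instance. The tuple class names `Key` and `Value` are hypothetical, and storing the matches of one key in a plain list is an assumption; only the field names are taken from this documentation.

```python
from collections import namedtuple

# Hypothetical tuple names; only the field names come from the documentation.
Key = namedtuple("Key", ["file_name", "formula", "charge", "label_percentiles"])
Value = namedtuple("Value", ["spec_id", "rt", "score", "scaling_factor", "peaks"])

# Plain dict standing in for a Results instance (assumption).
results = {}

key = Key(
    file_name="BSA1.mzML",
    formula="C(33)H(59)14N(1)N(8)O(9)S(1)",
    charge=2,
    label_percentiles=(("N", 0.0),),  # ((element, enrichment in %),)
)
value = Value(spec_id=1337, rt=28.5, score=0.92, scaling_factor=1.0e5, peaks=())

# Assumption: each key collects its matches in a list, one per matched spectrum.
results.setdefault(key, []).append(value)

for k, matches in results.items():
    for m in matches:
        print(k.formula, k.charge, m.spec_id, m.rt, m.score)
```
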
add(key, value)

Adds match to the result container.

Parameters:
  • key (named tuple) – ( file_name, formula, charge, label_percentiles )
  • value (named tuple) – (spec_id, rt, score, scaling_factor, peaks)
Returns
formatted_key (named tuple) : ( file_name, formula, charge, label_percentiles )

Structure

key (named tuple)

  • file_name
  • formula
  • charge
  • label_percentiles

value (named tuple)

  • spec_id
  • rt
  • score
  • scaling_factor
  • peaks
calc_amounts_from_rt_info_file(rt_info_file=None, rt_border_tolerance=None, calc_amount_function=None, evidence_score_field='PEP', buffer_only=False, buffered_csv_dicts=None)

Function to calculate molecule/peptide amounts based on the quant summary/rt info file generated by write_rt_info_file(). See e.g. the example script generate_quant_summary_file.py. A function to calculate the final molecule amounts can be defined; otherwise, the default maximum intensity function is used.

Parameters:
  • rt_info_file (str) – output file name of the quant summary/rt info csv file, must be a complete or relative path
  • rt_border_tolerance (int) – retention time border tolerance in minutes
  • calc_amount_function (obj) – python function to calculate final amounts based on a simple dictionary structure

The function that calculates the amount of the molecules (calc_amount_function) must be able to process the dictionary structure shown below (obj_for_calc_amount). The default function returns the maximum amount in the retention time window or in the complete profile. The function should return the determined amount, the (approximate) retention time and the score. If a function determines the amount over more than one spectrum, the retention time reported for this amount should be, e.g., that of the maximum intensity of the profile. Scores could be, e.g., averaged, or the score at the maximum amount could be used.

Example for obj_for_calc_amount:

{
    'rt'       : [rt1,rt2,...],
    'i'        : [in1,in2,...],
    'scores'   : [sc1,sc2,...],
    'spec_ids' : [id1,id2,...],
}

Example key names (default):

  • ‘max I in window’
  • ‘max I in window (rt)’
  • ‘max I in window (score)’
  • ‘auc in window’ (area under curve)
  • ‘sum I in window’ (summed up intensities)
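
As an illustration of a custom calc_amount_function, the following sketch computes the area under the elution profile with the trapezoidal rule. The function name `calc_auc_in_window` is hypothetical, the profile is assumed to be non-empty and sorted by retention time, and which combination of result keys the library accepts from a custom function is an assumption; the key names themselves are taken from the list above.

```python
def calc_auc_in_window(obj_for_calc_amount):
    """Trapezoidal area under the profile, plus rt and score at the apex."""
    rts = obj_for_calc_amount["rt"]
    intensities = obj_for_calc_amount["i"]
    scores = obj_for_calc_amount["scores"]

    # Sum the trapezoids between neighbouring spectra.
    auc = 0.0
    for j in range(1, len(rts)):
        auc += (rts[j] - rts[j - 1]) * (intensities[j] + intensities[j - 1]) / 2.0

    # Report rt and score at the most intense spectrum of the profile.
    apex = max(range(len(intensities)), key=lambda j: intensities[j])
    return {
        "auc in window": auc,
        "max I in window (rt)": rts[apex],
        "max I in window (score)": scores[apex],
    }


profile = {
    "rt": [1.0, 2.0, 3.0],
    "i": [0.0, 10.0, 0.0],
    "scores": [0.7, 0.9, 0.8],
    "spec_ids": [1, 2, 3],
}
amount = calc_auc_in_window(profile)
```
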
curate_rt_windows(evidence_dict, rt_tolerance)

Internal function to curate RT windows

determine_max_itensity(obj_for_calc_amount)

Function to determine the maximum intensity in a given elution profile. The structure of the object passed to the function is shown below. This function can be used as a template for writing your own function to determine the amount of a molecule, e.g. area under curve or summed up intensities. All self-written functions must return the amount, the rt at or around the amount and the mScore at or around the amount in a dictionary with appropriate key names.

Example key names (default):

  • ‘max I in window’
  • ‘max I in window (rt)’
  • ‘max I in window (score)’

Note

This is the default function to determine the peptide amount when write_amount_csv() is called.

Examples:

{
   'rt'       : [rt1,rt2,...],
   'i'        : [in1,in2,...],
   'scores'   : [sc1,sc2,...],
   'spec_ids' : [id1,id2,...],
}
Returns: dictionary with the keys shown above
Return type: dict
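
The behaviour described for this function can be approximated as follows. This is a sketch under the assumption of a non-empty profile, not the library's implementation; the name `max_intensity_sketch` is hypothetical.

```python
def max_intensity_sketch(obj_for_calc_amount):
    """Return the maximum intensity with the rt and score of that spectrum."""
    intensities = obj_for_calc_amount["i"]
    # Index of the most intense spectrum in the elution profile.
    apex = max(range(len(intensities)), key=lambda j: intensities[j])
    return {
        "max I in window": intensities[apex],
        "max I in window (rt)": obj_for_calc_amount["rt"][apex],
        "max I in window (score)": obj_for_calc_amount["scores"][apex],
    }


example_profile = {
    "rt": [10.1, 10.2, 10.3],
    "i": [2000.0, 8000.0, 3000.0],
    "scores": [0.75, 0.95, 0.8],
    "spec_ids": [101, 102, 103],
}
max_result = max_intensity_sketch(example_profile)
```
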
extract_results(molecules=None, charges=None, file_names=None, label_percentiles=None, formulas=None, score_threshold=None)

Extract selected results.

Extracts all matches from the results instance that meet given filter criteria.

Parameters:
  • molecules (list of str, optional) – considered molecules. Those will be translated using self._translate_molecules_to_formulas()
  • charges (list of int, optional) – considered charge states.
  • file_names (list of str, optional) – list of file names to be considered.
  • label_percentiles (list of tuple, optional) – list of label percentile tuples to be considered.
  • formulas (list of str) – list of chemical formulas
Yields:

key, i, entry (tuple) – result class key, index of entry and entry

Structure

key (named tuple)

  • file_name
  • formula
  • charge
  • label_percentiles

value (named tuple)

  • spec_id
  • rt
  • score
  • scaling_factor
  • peaks
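
The filtering described above can be sketched on a plain dict that maps keys to lists of matches. The names `Key`, `Value` and `extract_matches` are hypothetical, the molecules filter is left out because it needs the formula lookup, and treating score_threshold as a lower bound is an assumption.

```python
from collections import namedtuple

# Hypothetical tuple names; field names follow the structure above.
Key = namedtuple("Key", ["file_name", "formula", "charge", "label_percentiles"])
Value = namedtuple("Value", ["spec_id", "rt", "score", "scaling_factor", "peaks"])


def extract_matches(store, charges=None, file_names=None, formulas=None,
                    label_percentiles=None, score_threshold=None):
    """Yield (key, index, entry) for every match that passes all given filters."""
    for key, entries in store.items():
        if charges is not None and key.charge not in charges:
            continue
        if file_names is not None and key.file_name not in file_names:
            continue
        if formulas is not None and key.formula not in formulas:
            continue
        if (label_percentiles is not None
                and key.label_percentiles not in label_percentiles):
            continue
        for i, entry in enumerate(entries):
            # Assumption: matches scoring below the threshold are dropped.
            if score_threshold is not None and entry.score < score_threshold:
                continue
            yield key, i, entry


key_a = Key("a.mzML", "C(1)", 2, (("N", 0.0),))
key_b = Key("b.mzML", "C(1)", 3, (("N", 0.0),))
store = {
    key_a: [Value(1, 1.0, 0.5, 1.0, ()), Value(2, 1.1, 0.9, 1.0, ())],
    key_b: [Value(7, 2.0, 0.8, 1.0, ())],
}
hits = list(extract_matches(store, charges=[2], score_threshold=0.7))
```
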
max_score(molecules=None, charges=None, file_names=None, label_percentiles=None, formulas=None)

Find max score for a given set of parameters.

Parameters:
  • molecules (list of str, optional) – considered molecules. Those will be translated using self._translate_molecules_to_formulas()
  • charges (list of int, optional) – considered charge states.
  • file_names (list of str, optional) – list of file names to be considered.
  • label_percentiles (list of tuple, optional) – list of label percentile tuples to be considered.
  • formulas (list of str) – list of chemical formulas
Returns:

best_score, key, index – key is the appropriate key in the results dict

Return type:

tuple
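
A sketch of how the best score could be picked from (key, index, entry) triples such as those yielded by extract_results(). The names `Entry` and `best_of` are hypothetical, and returning three None values for an empty selection is an assumption.

```python
from collections import namedtuple

# Hypothetical tuple name; field names follow the value structure.
Entry = namedtuple("Entry", ["spec_id", "rt", "score", "scaling_factor", "peaks"])


def best_of(triples):
    """Return (best_score, key, index) for the highest scoring entry."""
    best = max(triples, key=lambda triple: triple[2].score, default=None)
    if best is None:
        # Assumption: nothing matched the filters.
        return None, None, None
    key, index, entry = best
    return entry.score, key, index


demo_key = ("a.mzML", "C(1)", 2, (("N", 0.0),))
triples = [
    (demo_key, 0, Entry(1, 1.0, 0.5, 1.0, ())),
    (demo_key, 1, Entry(2, 1.1, 0.9, 1.0, ())),
]
best_score, best_key, best_index = best_of(triples)
```
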

plot_MIC_3D(key, file_name=None, rt_window=None, i_transform=None)

Plot MIC from results using rpy2 in 3D.

plot_MICs_2D(key_list, file_name=None, rt_window=None, i_transform=None, xlimits=None, additional_legends=None, title=None, zlimits=None, ablines=None, graphics=None)
Parameters: additional_legends (dict) – keys point to lists of strings that are plotted as well.
write_result_csv(output_file_name=None)

Write raw results into a .csv file

Parameters: output_file_name (str) – output file name of the csv containing all raw results, should be a complete path

Warning

Depending on data size, the resulting csv can become very large. Some csv viewers cannot handle files with a large number of lines.

Keys in csv:

  • Formula : molecular formula of the molecule (str)
  • Molecule : molecule or trivial name (str)
  • Charge : charge of the molecule (int)
  • ScanID : ScanID of the quantified spectrum (int)
  • Label Percentiles : Labeling percentile ( (element, enrichment in %), )
  • Amount : the determined amount of the molecule
  • Retention Time : retention time of the ScanID
  • mScore : score of the isotopologue match
  • Filename : filename of spectrum input files
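
To illustrate the column layout, the following sketch writes one row with the keys listed above using the standard library's csv.DictWriter. The column order, the helper name `rows_to_csv` and the example values are assumptions; the method itself writes to the file given as output_file_name.

```python
import csv
import io

# Column names as listed above; the order is an assumption.
FIELDNAMES = [
    "Formula", "Molecule", "Charge", "ScanID", "Label Percentiles",
    "Amount", "Retention Time", "mScore", "Filename",
]


def rows_to_csv(rows):
    """Serialize a list of row dicts into csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {
    "Formula": "C(33)H(59)14N(1)N(8)O(9)S(1)",
    "Molecule": "BSA",
    "Charge": 2,
    "ScanID": 1337,
    "Label Percentiles": str((("N", 0.0),)),
    "Amount": 1.0e5,
    "Retention Time": 28.5,
    "mScore": 0.92,
    "Filename": "BSA1.mzML",
}
csv_text = rows_to_csv([row])
```
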
write_result_mztab(output_file_name=None, rt_border_tolerance=None)

Write minimal peptide quantification results into a .mztab file. It is necessary to specify the ‘formula to evidences’ dict in the lookup of the results class to write results!

Note:

This basic mzTab writer is still in beta stage. Use and evaluate with care.

The PRIDE CV based quantification unit and value are fixed to:

  • PRIDE:0000393, Relative quantification unit
  • PRIDE:0000425, MS1 intensity based label-free quantification method
Parameters: output_file_name (str) – output file name of the mztab containing all raw results, should be a complete path

Note:

Additional information has to be passed to the result class for a more complete mztab output.

Keys in mztab:

  • sequence
  • accession
  • unique
  • database
  • database_version
  • search_engine
  • best_search_engine_score[1-n]
  • modifications
  • retention_time
  • retention_time_window
  • charge
  • mass_to_charge
  • peptide_abundance_study_variable[1-n]
  • peptide_abundance_stdev_study_variable[1-n]
  • peptide_abundance_std_error_study_variable[1-n]
  • search_engine_score[1-n]_ms_run[1-n]
  • peptide_abundance_assay[1-n]
  • spectra_ref
  • opt_{identifier}_*
  • reliability
  • uri

Additional information can be added to the mzTab file by adding a dict like the one shown below to the results.lookup dict under the key ‘mztab_meta_info’:

mztab_meta_info = {
    'protein_search_engine_score'   : [],
    'psm_search_engine_score'       : ['[MS,MS:1001475,OMSSA:evalue, ]'],
    'fixed_mod'                     : ['[UNIMOD, UNIMOD:4, Carbamidomethyl, ]'],
    'variable_mod'                  : ['[UNIMOD, UNIMOD:35, Oxidation, ]'],
    'study_variable-description'    : ['Standard BSA measurement'],
    'ms_run-location'               : ['BSA1.mzML'],
}
write_rt_info_file(output_file=None, list_of_csvdicts=None, trivial_name_lookup=None, rt_border_tolerance=None, update=True, buffer_only=False)

Function to write a default quant summary/rt info file. See e.g. example script generate_quant_summary_file.py.

Parameters:
  • output_file (str) – output file name of the csv, should be a complete path
  • list_of_csvdicts (list) – list of dictionaries passed to the DictWriter class, default fieldnames can be found below
  • trivial_name_lookup (dict) – self defined trivial_name_lookup, see format below.
  • rt_border_tolerance (int) – retention time border tolerance in minutes
  • update (bool) – if True, dictionaries that were read in or passed via list_of_csvdicts are updated with default evidence and trivial name information

The quant summary file can be updated manually (e.g. the start and stop RT information). If an evidence lookup is present in the result class (it can be passed to the isotopologue library or set later in the result class), this information is used to define the retention time borders (e.g. peptide identification information from peptide spectrum matches).

Default fieldnames:

  • file_name : filename of spectrum input file
  • formula : molecular formula of the molecule
  • molecule : molecule or trivial name
  • trivial_name(s) : protein or trivial names
  • label_percentiles : labeling percentile ( (element, enrichment in %), )
  • charge : charge of the molecule
  • start (min) : start of retention time window
  • stop (min) : stop of retention time window
  • max I in window : maximum intensity in retention time window
  • max I in window (rt) : retention time @ maximum intensity in retention time window
  • max I in window (score) : score @ maximum intensity in retention time window
  • auc in window : area under curve in retention time window
  • sum I in window : summed up intensities in retention time window
  • evidences (min) : all evidences/identifications (score@rt;…)

Trivial name lookup example:

{
    'C(33)H(59)14N(1)N(8)O(9)S(1)' : ['BSA','Bovine serum albumine']
}
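
A sketch of how such a lookup could be used to fill the trivial_name(s) field of the csv dictionaries. The helper name `add_trivial_names` and the ';' separator are assumptions and not necessarily what write_rt_info_file() does internally.

```python
trivial_name_lookup = {
    "C(33)H(59)14N(1)N(8)O(9)S(1)": ["BSA", "Bovine serum albumine"],
}


def add_trivial_names(csv_dicts, lookup, separator=";"):
    """Fill the 'trivial_name(s)' field of each row from the formula lookup."""
    for row in csv_dicts:
        # Formulas without an entry get an empty field (assumption).
        names = lookup.get(row["formula"], [])
        row["trivial_name(s)"] = separator.join(names)
    return csv_dicts


csv_dicts = [
    {"file_name": "BSA1.mzML", "formula": "C(33)H(59)14N(1)N(8)O(9)S(1)", "charge": 2},
    {"file_name": "BSA1.mzML", "formula": "C(1)", "charge": 1},
]
updated = add_trivial_names(csv_dicts, trivial_name_lookup)
```
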