Quick start

Download and installation

Please download and install pyQms by following the Installation instructions. Consider using a virtual environment (e.g. with the excellent virtualenvwrapper) when using and developing pyQms.

Matching a peak list

Let’s start with a very simple example: matching a single peptide against a predefined peak list. Start a Python (3.4+) console and start quantifying in four steps:

First import pyQms:

import pyqms

Second, initialize an isotopologue library (pyqms.IsotopologueLibrary) using ‘DDSPDLPK’ as the example peptide (from the BSA example file) and charge state 2:

lib = pyqms.IsotopologueLibrary(
    molecules  = [ 'DDSPDLPK' ],
    charges    = [ 2 ],
)

Third, provide the peak list the library will be matched against. You can find a peak list here, which will produce a match with this peptide. Copy and paste the peak list into the Python console; the call in the next step expects it under the name peak_list.
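If the linked peak list is not at hand, the expected shape of the input can be sketched with the four measured peaks that appear in the results further down this page. This assumes that mz_i_list accepts a list of (m/z, intensity) tuples sorted by ascending m/z. The real peak list contains many more peaks, so this reduced list is for illustration only and is not guaranteed to reproduce the score shown on this page.

```python
# Illustrative stand-in for the linked peak list.
# Assumption: a list of (m/z, intensity) tuples sorted by ascending m/z.
# The values are the measured peaks reported in the results on this page.
peak_list = [
    (443.7112735313511, 2517650.0),
    (444.21248374593875, 1156173.75),
    (444.71384916266277, 336326.96875),
    (445.21533524843596, 58547.0703125),
]

# Basic sanity checks before handing the list to the library
assert all(len(peak) == 2 for peak in peak_list)
assert peak_list == sorted(peak_list)
```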

Fourth, use the pyqms.IsotopologueLibrary.match_all() function to quantify the peptide using the peak list:

results = lib.match_all(
    mz_i_list = peak_list,
    file_name = 'test',
    spec_id   = 1165,
    spec_rt   = 29.10,
    results   = None
)

Done! The peptide has been quantified in the given peak list. Please continue with the next section to learn how to access and process the results.

Note

The keyword arguments file_name, spec_id and spec_rt are hardcoded in this example. In the advanced examples this information (as well as the peak list) is parsed directly from the mzML file.

Access and interpret the results

The results object is an instance of the pyqms.Results class and is organized as a dictionary:

results.keys()

This gives the following output:

dict_keys(
    [
        m_key(
            file_name='test',
            formula='C(37)H(59)N(9)O(16)',
            charge=2,
            label_percentiles=(('N', '0.000'),)
        )
    ]
)

The keys of the pyqms.Results class are namedtuple() instances with the following field names:

  • file_name
  • formula
  • charge
  • label_percentiles

file_name refers to the original file name of the LC-MS/MS run, formula is the molecular formula of the input molecule/peptide, charge is the charge state of the matched isotope envelope, and label_percentiles indicates the labeling of the molecule. The default behaviour is to use the natural abundance of the element isotopes. In that case the field holds 0% artificial enrichment of nitrogen, i.e. ('N', '0.000'), stored in a tuple of possibly multiple labeling percentiles, i.e. (('N', '0.000'),).
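To get a feel for these keys without running a match, the following sketch rebuilds the example key with a plain namedtuple. The m_key type defined here is a stand-in that only mirrors the field names listed above; it is not imported from pyQms.

```python
from collections import namedtuple

# Stand-in for the key type; field names are taken from the list above.
m_key = namedtuple(
    "m_key", ["file_name", "formula", "charge", "label_percentiles"]
)

key = m_key(
    file_name="test",
    formula="C(37)H(59)N(9)O(16)",
    charge=2,
    label_percentiles=(("N", "0.000"),),
)

# Fields are accessible by name ...
print(key.formula, key.charge)

# ... and label_percentiles converts to a dict of element -> enrichment string
labels = dict(key.label_percentiles)
print(labels["N"])
```

Because namedtuples are hashable, such a key can be used directly for dictionary lookups, e.g. results[key].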

Note

Every input molecule (e.g. peptide 'DDSPDLPK') will be converted to its molecular formula ('C(37)H(59)N(9)O(16)') in Hill notation by pyQms. To map between the peptide and the formula, please use the integrated lookups, i.e. results.lookup['formula to molecule'] or results.lookup['molecule to formula']. Please keep in mind that multiple molecules can have the same formula; therefore e.g. results.lookup['formula to molecule']['C(37)H(59)N(9)O(16)'] is by default a list.
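As a worked example of what the formula encodes, the sketch below parses the Hill notation string and computes the m/z of the monoisotopic peak at charge 2. The helper names parse_hill and monoisotopic_mz are hypothetical, the mass constants are standard monoisotopic values typed in by hand, and this is not how pyQms computes its isotopologues internally. The result can be compared with the calculated m/z of the first isotope peak reported in the results on this page (443.7112648946701).

```python
import re

# Monoisotopic masses in u (standard values, not read from pyQms)
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.007276466812


def parse_hill(formula):
    """Turn 'C(37)H(59)' into {'C': 37, 'H': 59}."""
    pairs = re.findall(r"([A-Z][a-z]?)\((\d+)\)", formula)
    return {element: int(count) for element, count in pairs}


def monoisotopic_mz(formula, charge):
    """m/z of the monoisotopic peak of the protonated molecule."""
    composition = parse_hill(formula)
    mass = sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())
    return (mass + charge * PROTON_MASS) / charge


composition = parse_hill("C(37)H(59)N(9)O(16)")
mz = monoisotopic_mz("C(37)H(59)N(9)O(16)", charge=2)
print(composition)
print(mz)
```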

For each of the keys one will get the following dict:

{
    'data': [
        match(
            spec_id=1165,
            rt=29.1,
            score=0.9606609710868856,
            scaling_factor=40.75802642055527,
            peaks=(
                (443.7112735313511, 2517650.0, 1.0, 443.7112648946701, 62091),
                (444.21248374593875, 1156173.75, 0.4459422196277157, 444.2127374486285, 27689),
                (444.71384916266277, 336326.96875, 0.12958327918547244, 444.7142840859656, 8046),
                (445.21533524843596, 58547.0703125, 0.02805309805863953, 445.21582563050043, 1742)
            )
        )
    ],
    'max_score': 0.9606609710868856,
    'len_data': 1,
    'max_score_index': 0
}

The keys on the top level of this dictionary are:

  • data
  • max_score
  • len_data
  • max_score_index

While len_data indicates how many spectra were matched for the formula in the respective key, max_score and max_score_index provide the maximum score obtained during matching and the index of this match in the data list, respectively. The data list contains the matches for all single spectra as namedtuple() instances. Each match contains the following field names:

  • spec_id
  • rt
  • score
  • scaling_factor
  • peaks

Besides the input information on the spectrum, i.e. the spectrum ID (spec_id) and the retention time (rt, passed to match_all() as spec_rt), the mScore of the match is provided (score) as well as the determined amount/intensity of the molecule in the spectrum (scaling_factor). Furthermore, detailed match information is given in peaks. For each peak of the isotopologue, this tuple contains the following information in this order:

  • The measured (and matched) m/z value of the isotope peak in the spectrum
  • The measured intensity of the isotope peak in the spectrum
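  • The relative intensity of the isotopologue peak with respect to the monoisotopic peak (in the example above this equals the ratio of the calculated intensities, e.g. 27689 / 62091 ≈ 0.446)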
  • The calculated m/z value of the isotope peak of the input molecule
  • The calculated intensity of the isotope peak of the input molecule

Besides the mScore, this information can be processed to further analyze the quality of the match.
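One way to use the peaks tuple is sketched below: it computes the mass error of each measured peak in ppm and compares the measured intensity ratios with the relative intensities stored in the match. The peak values are copied from the example output on this page; the helper name ppm_errors is hypothetical and not part of pyQms.

```python
# Peaks of the example match, in the order:
# (measured m/z, measured intensity, relative intensity,
#  calculated m/z, calculated intensity)
peaks = (
    (443.7112735313511, 2517650.0, 1.0, 443.7112648946701, 62091),
    (444.21248374593875, 1156173.75, 0.4459422196277157, 444.2127374486285, 27689),
    (444.71384916266277, 336326.96875, 0.12958327918547244, 444.7142840859656, 8046),
    (445.21533524843596, 58547.0703125, 0.02805309805863953, 445.21582563050043, 1742),
)


def ppm_errors(peaks):
    """Mass error per peak in ppm; None where no peak was measured."""
    errors = []
    for measured_mz, _, _, calculated_mz, _ in peaks:
        if measured_mz is None:
            # The measured m/z can be None if the peak was not found
            errors.append(None)
        else:
            errors.append((measured_mz - calculated_mz) / calculated_mz * 1e6)
    return errors


errors = ppm_errors(peaks)
for (measured_mz, *_), error in zip(peaks, errors):
    print(measured_mz, None if error is None else round(error, 2))

# Measured intensity ratio next to the relative intensity stored in the match
# (the monoisotopic peak was found in this example, so the division is safe)
base_intensity = peaks[0][1]
for _, measured_intensity, relative_intensity, _, _ in peaks:
    print(round(measured_intensity / base_intensity, 3), round(relative_intensity, 3))
```

Small ppm errors and measured ratios close to the stored relative intensities both point to a well-matched isotope envelope.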

Note

Please note that the measured m/z entry in peaks can be None if this peak was not found in the input data.

We have now seen how peptides/molecules can be quantified and how the results can be accessed.

Note

The pyqms.Results class offers several functions to access, process and visualize the data. For example, pyqms.Results.extract_results() provides an iterator yielding key, i, entry. The key is the namedtuple() containing the molecule's information, i is the position of entry in results[key]['data'] and entry is the match namedtuple().
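The iteration pattern described in this note can be tried out on a plain dictionary that mirrors the structure shown on this page. Everything in the sketch is a stand-in: m_key, match and iter_matches are defined locally and only imitate the documented layout; they are not the pyQms implementation.

```python
from collections import namedtuple

# Stand-ins that mirror the structures shown on this page
m_key = namedtuple("m_key", ["file_name", "formula", "charge", "label_percentiles"])
match = namedtuple("match", ["spec_id", "rt", "score", "scaling_factor", "peaks"])

key = m_key("test", "C(37)H(59)N(9)O(16)", 2, (("N", "0.000"),))
results = {
    key: {
        # peaks left empty for brevity
        "data": [match(1165, 29.1, 0.9606609710868856, 40.75802642055527, ())],
        "max_score": 0.9606609710868856,
        "len_data": 1,
        "max_score_index": 0,
    }
}


def iter_matches(results):
    """Yield (key, i, entry) for every match of every key."""
    for key, value in results.items():
        for i, entry in enumerate(value["data"]):
            yield key, i, entry


for key, i, entry in iter_matches(results):
    print(key.formula, key.charge, i, entry.spec_id, entry.score)

# The best match of a key can also be read directly via max_score_index
best = results[key]["data"][results[key]["max_score_index"]]
print(best.score)
```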

Quantify peptides in a whole LC-MS run

This part will describe how to process a whole LC-MS/MS run and quantify multiple peptides in one batch. This example assumes you have started your Python console in the pyqms base folder.

For this example we will use pymzML, which is used to parse mzML files and retrieve the spectra and meta data used for quantification. pymzML will be installed as a requirement (See: Installation).

We start again by importing pyQms and initializing an isotopologue library (pyqms.IsotopologueLibrary):

import pyqms
lib = pyqms.IsotopologueLibrary(
    molecules        = [
        'HLVDEPQNLIK',
        'YICDNQDTISSK',
        'DLGEEHFK'
    ],
    charges          = [2, 3, 4, 5],
)

We need to import pymzML and initialize the run. Note that the path to the BSA1 mzML file (‘data/BSA1.mzML’) may have to be adjusted. This file can be downloaded using the example script get_example_BSA_file (See: Get the BSA example mzML file) and can then be found in the ‘data’ folder in the pyqms base folder.

import pymzml
run = pymzml.run.Reader( 'data/BSA1.mzML' )

We now iterate over the spectra in the mzML file and quantify all peptides in all MS1 spectra. Before starting the loop we set the results variable to None. Please note that the results variable is passed to pyqms.IsotopologueLibrary.match_all() in every iteration and reassigned to its return value. This leads to a single results object, which combines the quantifications for all peptides in all spectra. See also the description above (see: access results) or refer directly to the pyqms.Results class:

results = None
for spectrum in run:
    scan_time = spectrum['scan time']
    spec_id   = spectrum['id']
    if spectrum['ms level'] == 1:
        results = lib.match_all(
            mz_i_list = spectrum.centroidedPeaks,
            file_name = 'BSA1',
            spec_id   = spec_id,
            spec_rt   = scan_time,
            results   = results
        )

Note

When spectrum.centroidedPeaks is accessed, pymzML centroids the spectrum if it is not already centroided.
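Once the loop has finished, each key holds one match per MS1 spectrum in which the molecule was matched. As a sketch of what can be done with such a list, the example below picks the match with the highest scaling_factor among those above a score threshold. The match values are made up for illustration (they are not taken from BSA1.mzML), and both the helper apex and its min_score parameter are hypothetical.

```python
from collections import namedtuple

# Stand-in for the match namedtuple described on this page
match = namedtuple("match", ["spec_id", "rt", "score", "scaling_factor", "peaks"])

# Made-up matches for a single key (peaks left empty for brevity)
data = [
    match(100, 20.1, 0.71, 12.0, ()),
    match(101, 20.2, 0.93, 55.5, ()),
    match(102, 20.3, 0.88, 31.2, ()),
]


def apex(matches, min_score=0.7):
    """Return the match with the highest scaling_factor above min_score."""
    candidates = [m for m in matches if m.score >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.scaling_factor)


best = apex(data)
print(best.spec_id, best.rt, best.scaling_factor)
```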

The results can now be accessed as described above (see: access results). Furthermore the pyqms.Results class can be pickled:

import pickle
# The with block closes the file handle once the dump has finished
with open('data/BSA1_pyQms_results.pkl', 'wb') as handle:
    pickle.dump(results, handle)
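Loading the results again works the same way in reverse. The round trip below uses a plain dictionary as a stand-in for the results object and a temporary directory instead of the data folder, so that it runs without pyQms. To unpickle a real pyqms.Results object, pyqms has to be importable in the loading session, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

# Plain-dict stand-in for the results object
results = {"BSA1": {"len_data": 1, "max_score": 0.96}}

# Temporary location used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "BSA1_pyQms_results.pkl")

# Write the results to disk
with open(path, "wb") as handle:
    pickle.dump(results, handle)

# Read them back in a later session
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == results)
```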

For further examples and how to use the adaptor functions, please refer to the next section.

Use the adaptors, Luke

The adaptor functions are useful for parsing a set of identified peptides (e.g. from Ursgal result files; Ursgal documentation) including retention time information, in order to determine the maximum intensity of every (identified) peptide in the LC-MS/MS measurement. Furthermore, adaptors can be added to e.g. read the results of other analysis pipelines and tools.

The current adaptor to read Ursgal results can be used as follows for the shipped identification result file of the database search engine OMSSA. Please note that if the adaptors are used, one needs to define fixed modifications like carbamidomethylation as presented. This modification and the molecules will then be correctly formatted as input for pyQms:

import pyqms
import pyqms.adaptors
input_fixed_labels = {
    'C' : [
        {
            'element_composition' : {
                'O'   : 1,
                'H'   : 3,
                '14N' : 1,
                'C'   : 2
            },
            'evidence_mod_name': 'Carbamidomethyl'
        },
    ]
}
formatted_fixed_labels, evidence_lookup, molecules = pyqms.adaptors.parse_evidence(
    fixed_labels   = input_fixed_labels,
    evidence_files = [ 'data/BSA1_omssa_2_1_9_unified.csv' ],
)

The returned objects can be used as direct input for the pyqms.IsotopologueLibrary. The advantage of parsing evidence files is that MS2 identification information is added to the results and can e.g. be used for defining RT windows for a correct quantification of every peptide:

lib = pyqms.IsotopologueLibrary(
    molecules    = molecules,
    charges      = [1, 2, 3, 4, 5],
    fixed_labels = formatted_fixed_labels,
    evidences    = evidence_lookup
)
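The element_composition given for Carbamidomethyl above describes the atoms the modification adds to each cysteine. As a quick plausibility check, the sketch below sums the monoisotopic masses of those atoms; the mass constants are standard values typed in by hand and the calculation is independent of pyQms.

```python
# Monoisotopic masses in u (standard values, not read from pyQms)
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "14N": 14.0030740048,
    "O": 15.99491461956,
}

# Element composition as passed in input_fixed_labels
carbamidomethyl = {"O": 1, "H": 3, "14N": 1, "C": 2}

# Mass added to each modified cysteine
mass_shift = sum(
    MONOISOTOPIC_MASS[element] * count
    for element, count in carbamidomethyl.items()
)
print(round(mass_shift, 4))  # 57.0215
```

The value agrees with the commonly quoted monoisotopic mass shift of carbamidomethylation of about 57.0215 u.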

Further examples and more advanced usage

Please refer to the Example Scripts section for more usage examples and ready-to-go Python scripts for quantification, data analysis and visualization.