Reading Compressed Result Files


from pyXLMS import __version__
 
print(f"Installed pyXLMS version: {__version__}")



    Installed pyXLMS version: 1.8.7


from pyXLMS import parser
from pyXLMS import transform

All functionality to parse results is available via the parser submodule. We also import the transform submodule to show some summary statistics of the read files.

Using `**kwargs` and Reading Compressed Files via `parser.read()`


parser_result = parser.read(
    "../../data/_test/annotate_string_scores/Nucleus_Rep1_Crosslinks.txt.xz",
    engine="MS Annika",
    crosslinker="DSBSO",
    format="txt",
    unsafe=True,
    verbose=0,
    compression="xz",
)



    Reading MS Annika crosslinks...: 100%|██████████████████████████████████████████████████████████████████████| 81066/81066 [00:04<00:00, 16415.63it/s]

We can read any result file with the parser.read() method by specifying the search engine and the crosslinker that were used. Additional keyword arguments (**kwargs) are passed on to the search-engine-specific parser or to pandas.read*. In this case, two of them, format and unsafe, are passed to parser.read_msannika(), and one, compression, is passed to pandas.read_csv() to signal that our result file is xz-compressed. For more information, see the documentation of the parser.read() method here: docs.

Important

Please note that you might need to specify format explicitly when passing parameters to pandas.read*, because the parser might not be able to infer the format if the file is compressed!
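To make the note above more concrete, the following minimal sketch uses only the Python standard library (not pyXLMS) to write and read back a tiny xz-compressed, tab-separated table. The file name and table contents are hypothetical stand-ins. It shows that the last extension of such a file is .xz and that its raw bytes are not readable text, which is one plausible reason why a format cannot always be inferred from a compressed file; how pyXLMS actually infers the format is not shown here.

```python
import csv
import lzma
import os
import tempfile

# Hypothetical miniature table standing in for a search engine result file.
rows = [["Sequence A", "Sequence B", "Score"], ["GGAKR", "KGGGK", "40.09"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_crosslinks.txt.xz")

    # Write the tab-separated table through an xz-compressing text stream.
    with lzma.open(path, "wt", encoding="utf-8", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)

    # The raw file starts with the xz magic bytes, not with readable text.
    with open(path, "rb") as handle:
        magic = handle.read(6)

    # Decompressing on the fly yields the original table again.
    with lzma.open(path, "rt", encoding="utf-8", newline="") as handle:
        restored = list(csv.reader(handle, delimiter="\t"))

    # Only the last extension is visible to a naive extension check.
    extension = os.path.splitext(path)[1]

print(magic == b"\xfd7zXZ\x00")  # True
print(extension)                 # .xz
print(restored == rows)          # True
```

Stating both the format and the compression explicitly, as in the parser.read() call above, avoids relying on any such inference.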


transform.display(parser_result)



    Data Type:                            parser_result
    Completeness:                         partial
    Identifying Search Engine:            MS Annika
    Number of Crosslink-Spectrum-Matches: None
    Number of Crosslinks:                 81066

The parser.read() method returns a dictionary with a set of specified keys and their values. We refer to this dictionary as a parser_result object. All parser.read* methods return such a parser_result object. You can read more about that here: docs, and here: data types specification.

As the parser_result shows, the compressed result file contained crosslinks. We can access them via parser_result["crosslinks"], which we will do a bit further down.


_ = transform.summary(parser_result)



    Number of crosslinks: 81066.0
    Number of unique crosslinks by peptide: 81066.0
    Number of unique crosslinks by protein: nan
    Number of intra crosslinks: 7791.0
    Number of inter crosslinks: 73275.0
    Number of target-target crosslinks: 32007.0
    Number of target-decoy crosslinks: 0.0
    Number of decoy-decoy crosslinks: 49059.0
    Minimum crosslink score: 1.0
    Maximum crosslink score: 1159.12

With the transform.summary() method we can also print some summary statistics about the crosslinks we read. You can read more about the method here: docs.
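As a quick plausibility check, the counts reported above can be compared against each other. The sketch below copies the numbers from the printed summary by hand into a plain dictionary (the key names are made up for this example and are not the keys used by transform.summary()) and verifies that the intra/inter split and the target/decoy split each add up to the total number of crosslinks.

```python
# Counts copied by hand from the summary output above.
# The dictionary keys are hypothetical names chosen for this sketch.
counts = {
    "total": 81066,
    "intra": 7791,
    "inter": 73275,
    "target_target": 32007,
    "target_decoy": 0,
    "decoy_decoy": 49059,
}

# Every crosslink is either intra or inter.
by_type = counts["intra"] + counts["inter"]

# Every crosslink is target-target, target-decoy or decoy-decoy.
by_decoy_status = (
    counts["target_target"] + counts["target_decoy"] + counts["decoy_decoy"]
)

print(by_type == counts["total"])          # True
print(by_decoy_status == counts["total"])  # True
```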


sample_xl = parser_result["crosslinks"][0]
transform.display(sample_xl)



    Data Type:                          crosslink
    Completeness:                       full
    Alpha Peptide:                      GGAKR
    Alpha Peptide Crosslink Position:   4
    Alpha Proteins:                     ['Q9UQ26']
    Alpha Proteins Crosslink Positions: [238]
    Alpha Decoy:                        True
    Beta Peptide:                       KGGGK
    Beta Peptide Crosslink Position:    1
    Beta Proteins:                      ['P25490', 'Q8WXX5']
    Beta Proteins Crosslink Positions:  [237, 10]
    Beta Decoy:                         True
    Crosslink Type:                     inter
    Crosslink Score:                    40.09

Using parser_result["crosslinks"][0] we get the first crosslink of the file and can take a closer look at it.

The output above shows this example crosslink. You can learn more about the specific attributes and their values here: docs, and here: data types specification.


type(parser_result["crosslink-spectrum-matches"])



    NoneType

In this example parser_result["crosslink-spectrum-matches"] is None because the file did not contain any crosslink-spectrum-matches (CSMs), so there are no CSMs to display.
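Because a key of the parser_result can hold None instead of a list, downstream code should check for that before iterating or calling len(). The following minimal sketch illustrates one way to do this. The helper count_entries() and the mock dictionary are hypothetical and only mimic the two keys used in this example; a real parser_result contains more keys and real crosslink objects.

```python
from typing import Any, Dict


def count_entries(result: Dict[str, Any], key: str) -> int:
    """Return the number of entries under key, treating None as empty."""
    entries = result.get(key)
    return 0 if entries is None else len(entries)


# Hypothetical stand-in for a parser_result: placeholder strings replace
# the real crosslink objects, and only two keys are mimicked.
mock_result = {
    "crosslinks": ["<crosslink 1>", "<crosslink 2>"],
    "crosslink-spectrum-matches": None,
}

print(count_entries(mock_result, "crosslinks"))                  # 2
print(count_entries(mock_result, "crosslink-spectrum-matches"))  # 0
```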
