Serializing pyXLMS Objects

If you want to save your data to continue your analysis later, or in a different tool or programming language, you need serialization. pyXLMS supports two different ways to serialize and de-serialize crosslink-spectrum-matches, crosslinks, and parser_result objects, as illustrated by the examples below.


from pyXLMS import __version__
 
print(f"Installed pyXLMS version: {__version__}")


    Installed pyXLMS version: 1.5.3


from pyXLMS import parser
from pyXLMS import transform

We import the parser submodule to read crosslink-spectrum-matches and crosslinks, and the transform submodule for serialization and for printing summary statistics.


parser_result = parser.read(
    "../data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
    engine="MS Annika",
    crosslinker="DSS",
)


    Reading MS Annika CSMs...: 100%|███████████████████████████████████████████████████| 826/826 [00:00<00:00, 3615.67it/s]
    Reading MS Annika crosslinks...: 100%|█████████████████████████████████████████████| 300/300 [00:00<00:00, 8051.05it/s]

We read crosslink-spectrum-matches and crosslinks from a single .pdResult file using the generic parser.


csms = parser_result["crosslink-spectrum-matches"]
xls = parser_result["crosslinks"]

For easier access, we assign our crosslink-spectrum-matches to the variable csms and our crosslinks to the variable xls.


_ = transform.summary(csms)


    Number of CSMs: 826.0
    Number of unique CSMs: 826.0
    Number of intra CSMs: 803.0
    Number of inter CSMs: 23.0
    Number of target-target CSMs: 786.0
    Number of target-decoy CSMs: 39.0
    Number of decoy-decoy CSMs: 1.0
    Minimum CSM score: 1.1132827525593785
    Maximum CSM score: 452.9861536355926

With the transform.summary() function we can print summary statistics for the crosslink-spectrum-matches we just read. You can read more about the function here: docs.


_ = transform.summary(xls)


    Number of crosslinks: 300.0
    Number of unique crosslinks by peptide: 300.0
    Number of unique crosslinks by protein: 298.0
    Number of intra crosslinks: 279.0
    Number of inter crosslinks: 21.0
    Number of target-target crosslinks: 265.0
    Number of target-decoy crosslinks: 0.0
    Number of decoy-decoy crosslinks: 35.0
    Minimum crosslink score: 1.1132827525593785
    Maximum crosslink score: 452.9861536355926

Likewise, we print summary statistics for the crosslinks we read.

Serialization to Tables

pyXLMS crosslink-spectrum-matches and crosslinks can be serialized and saved as tables in any format supported by pandas. De-serialization, i.e. reading the created tables back in, is only available for text-based formats (e.g. csv, tsv), Microsoft Excel format, and parquet format.

Warning

Table-based serialization of parser_result objects is not supported!

Serialization to CSV


transform.to_dataframe(csms).to_csv("csms.csv")

Serialization to tables is facilitated via the transform.to_dataframe() function [docs], which returns a pandas DataFrame that can then be saved to disk with one of the pandas writer functions, here DataFrame.to_csv().
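A CSV file stores every cell as plain text, so type information does not survive on disk by itself. This is one reason reading the table back should go through a dedicated reader rather than a generic CSV reader. The stdlib-only sketch below uses a hypothetical, simplified record (not the real pyXLMS column layout) to show what a float score and a missing value look like after a round trip through Python's csv module:

```python
import csv
import io

# Hypothetical, simplified record; real pyXLMS tables have many more columns.
record = {"score": 452.9861536355926, "alpha_proteins": None}

# Write the record to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)

# Read it back with a plain CSV reader: every value comes back as a string,
# and None has become an empty string.
buffer.seek(0)
row = next(csv.DictReader(buffer))
print(row)  # {'score': '452.9861536355926', 'alpha_proteins': ''}
```

The text itself is lossless for the score (float() recovers the exact value), but converting strings back to numbers, lists, and missing values is the reader's job.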


loaded_csms = parser.read_custom("csms.csv")


    Reading CSMs...: 100%|█████████████████████████████████████████████████████████████| 826/826 [00:00<00:00, 3392.79it/s]

Using the function parser.read_custom() [docs, page] we can then read the data back from the file…


_ = transform.summary(loaded_csms)


    Number of CSMs: 826.0
    Number of unique CSMs: 826.0
    Number of intra CSMs: 803.0
    Number of inter CSMs: 23.0
    Number of target-target CSMs: 786.0
    Number of target-decoy CSMs: 39.0
    Number of decoy-decoy CSMs: 1.0
    Minimum CSM score: 1.1132827525593785
    Maximum CSM score: 452.9861536355926

…which will give us back our original crosslink-spectrum-matches.

Serialization to Parquet


transform.to_dataframe(xls).to_parquet("xls.parquet")

Again, transform.to_dataframe() [docs] returns a pandas DataFrame, which we save to disk in parquet format via DataFrame.to_parquet() (pandas needs a parquet engine such as pyarrow or fastparquet installed for this).


loaded_xls = parser.read_custom("xls.parquet")


    Reading crosslinks...: 100%|███████████████████████████████████████████████████████| 300/300 [00:00<00:00, 6181.46it/s]

Using the function parser.read_custom() [docs, page] we can then read the data back from the file…


_ = transform.summary(loaded_xls)


    Number of crosslinks: 300.0
    Number of unique crosslinks by peptide: 300.0
    Number of unique crosslinks by protein: 298.0
    Number of intra crosslinks: 279.0
    Number of inter crosslinks: 21.0
    Number of target-target crosslinks: 265.0
    Number of target-decoy crosslinks: 0.0
    Number of decoy-decoy crosslinks: 35.0
    Minimum crosslink score: 1.1132827525593785
    Maximum crosslink score: 452.9861536355926

…which will give us back our original crosslinks.
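Identical summary statistics are a quick sanity check, but they do not prove that every field survived a round trip. For a field-by-field check, a small recursive comparison helps. The sketch below assumes the objects are plain nested dicts and lists; first_difference is a hypothetical helper, not part of pyXLMS:

```python
from typing import Any, Optional


def first_difference(a: Any, b: Any, path: str = "root") -> Optional[str]:
    """Return a description of the first mismatch between a and b, or None if equal."""
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            # Report keys present in only one of the two dicts.
            differing = sorted(map(str, a.keys() ^ b.keys()))
            return f"{path}: keys differ ({', '.join(differing)})"
        for key in a:
            diff = first_difference(a[key], b[key], f"{path}[{key!r}]")
            if diff is not None:
                return diff
        return None
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return f"{path}: lengths differ ({len(a)} vs {len(b)})"
        for i, (x, y) in enumerate(zip(a, b)):
            diff = first_difference(x, y, f"{path}[{i}]")
            if diff is not None:
                return diff
        return None
    # Leaf values (and mismatched container types) are compared directly.
    if a != b:
        return f"{path}: {a!r} != {b!r}"
    return None


original = [{"score": 1.5, "proteins": ["P1"]}, {"score": 2.0, "proteins": ["P2"]}]
loaded = [{"score": 1.5, "proteins": ["P1"]}, {"score": 2.0, "proteins": ["P3"]}]
print(first_difference(original, original))  # None
print(first_difference(original, loaded))  # root[1]['proteins'][0]: 'P2' != 'P3'
```

In this notebook, a call like first_difference(xls, loaded_xls) would return None if everything was preserved, or point at the first field that differs. Floats are compared exactly, so any rounding introduced by a file format will be reported.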

Serialization to JSON

All pyXLMS objects, including whole parser_result objects, can be serialized to and de-serialized from JavaScript Object Notation (JSON) using Python's standard json library.
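A JSON round trip is exact only for dicts with string keys, lists, strings, numbers, booleans, and None: the json module writes tuples as arrays (which load back as lists) and converts non-string dict keys to strings. If you add your own fields to these objects, a quick check can be worthwhile. A minimal stdlib-only sketch; survives_json_roundtrip is a hypothetical helper and the example records are simplified stand-ins, not actual pyXLMS objects:

```python
import json
from typing import Any


def survives_json_roundtrip(data: Any) -> bool:
    """Return True if data compares equal after a JSON dump/load cycle."""
    return json.loads(json.dumps(data)) == data


# Plain dicts and lists survive unchanged...
print(survives_json_roundtrip({"score": 1.5, "proteins": ["P1", "P2"]}))  # True
# ...but tuples come back as lists, and integer dict keys come back as strings.
print(survives_json_roundtrip({"positions": (3, 7)}))  # False
print(survives_json_roundtrip({1: "a"}))  # False
```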


import json
from typing import Any
 
 
def to_json(data: Any, filename: str) -> None:
    # Write data as pretty-printed UTF-8 JSON, keeping non-ASCII characters as-is.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
 
 
to_json(csms, "csms.json")
to_json(xls, "xls.json")
to_json(parser_result, "pr.json")

We can serialize our crosslink-spectrum-matches, crosslinks, and parser_result using json.dump()…


def from_json(filename: str) -> Any:
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)
 
 
loaded_csms = from_json("csms.json")
loaded_xls = from_json("xls.json")
loaded_parser_result = from_json("pr.json")

…and we can de-serialize our JSON files again using json.load()…


_ = transform.summary(loaded_csms)


    Number of CSMs: 826.0
    Number of unique CSMs: 826.0
    Number of intra CSMs: 803.0
    Number of inter CSMs: 23.0
    Number of target-target CSMs: 786.0
    Number of target-decoy CSMs: 39.0
    Number of decoy-decoy CSMs: 1.0
    Minimum CSM score: 1.1132827525593785
    Maximum CSM score: 452.9861536355926

…which will give us back our original crosslink-spectrum-matches,…


_ = transform.summary(loaded_xls)


    Number of crosslinks: 300.0
    Number of unique crosslinks by peptide: 300.0
    Number of unique crosslinks by protein: 298.0
    Number of intra crosslinks: 279.0
    Number of inter crosslinks: 21.0
    Number of target-target crosslinks: 265.0
    Number of target-decoy crosslinks: 0.0
    Number of decoy-decoy crosslinks: 35.0
    Minimum crosslink score: 1.1132827525593785
    Maximum crosslink score: 452.9861536355926

…crosslinks,…


_ = transform.summary(loaded_parser_result)


    Number of CSMs: 826.0
    Number of unique CSMs: 826.0
    Number of intra CSMs: 803.0
    Number of inter CSMs: 23.0
    Number of target-target CSMs: 786.0
    Number of target-decoy CSMs: 39.0
    Number of decoy-decoy CSMs: 1.0
    Minimum CSM score: 1.1132827525593785
    Maximum CSM score: 452.9861536355926
    Number of crosslinks: 300.0
    Number of unique crosslinks by peptide: 300.0
    Number of unique crosslinks by protein: 298.0
    Number of intra crosslinks: 279.0
    Number of inter crosslinks: 21.0
    Number of target-target crosslinks: 265.0
    Number of target-decoy crosslinks: 0.0
    Number of decoy-decoy crosslinks: 35.0
    Minimum crosslink score: 1.1132827525593785
    Maximum crosslink score: 452.9861536355926

…and parser_result.
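For large result files, the JSON text can get big, especially with indent=4 pretty-printing. If a compressed .json.gz file is acceptable, the same json.dump()/json.load() calls can be wrapped in Python's standard gzip module. A minimal sketch; to_json_gz and from_json_gz are hypothetical helpers, not part of pyXLMS, and the example data is a simplified stand-in:

```python
import gzip
import json
import os
import tempfile
from typing import Any


def to_json_gz(data: Any, filename: str) -> None:
    # Like to_json(), but writes through a gzip-compressed text stream
    # (without indentation, since the file is not meant to be read by eye).
    with gzip.open(filename, "wt", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)


def from_json_gz(filename: str) -> Any:
    # Counterpart to to_json_gz(): decompress and parse in one step.
    with gzip.open(filename, "rt", encoding="utf-8") as f:
        return json.load(f)


example = {"crosslinks": [{"score": 452.9861536355926, "proteins": ["P1"]}]}
path = os.path.join(tempfile.mkdtemp(), "example.json.gz")
to_json_gz(example, path)
print(from_json_gz(path) == example)  # True
```

In this notebook you would call, for example, to_json_gz(parser_result, "pr.json.gz"). Since gzip is supported by virtually every language, the compressed file stays usable from other tools.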