Tutorial example

Once CONGA has been installed in your conda-environment, CONGA can be called as a module using python3 -m CONGA. On this page we walk through a simple example of running conga for the three different search programs, Tide, Comet and MSFragger.

Crux-Tide

You can download the search files to test the commands below: Tide-search files.

To run CONGA using Tide results with default parameters, use the following command in the terminal.

python3 -m CONGA path/to/narrow_example1.tide-search.txt path/to/open_example1.tide-search.txt

CONGA allows for a variety of delta-masses (possibly unaccounted PTMs) and variable modifications to be reported for the same discovered sequence. It does this by clustering together PSMs by considering precursors with similar masses (in m/z units) that are matched to the same discovered “stem” sequence. It then takes the maximal scoring PSM in each cluster and for each modified variant of the “stem” sequence. To cluster the PSMs accurately, it is beneficial to supply a pair of values to --isolation-window, which should be the lower and upper isolation window offsets in m/z units. By default, CONGA uses --isolation-window 2,2. E.g.

python3 -m CONGA --isolation-window 0.75,0.75 path/to/narrow_example1.tide-search.txt path/to/open_example1.tide-search.txt

CONGA’s target-decoy competition is conducted over (unmodified) “stem” sequences. In other words, it reports the highest scoring PSM among all PSMs with the same stem sequence. If you want CONGA to report peptides that distinguish between sequences with different modifications, you can turn this feature off by setting --account_mods F. Note that the assumptions of theoretical FDR control do not strictly hold in this case, because target peptides with variable modifications may be scored highly simply because they share similar b- and y-ions in common with the same target peptide but with possibly a different ensemble of variable modifications. In any case, if you wish to proceed, you will need to provide the tab-delimited target-decoy peptide pairs produced by tide-index.

python3 -m CONGA --account_mods F --isolation-window 0.75,0.75 path/to/narrow_example1.tide-search.txt path/to/open_example1.tide-search.txt path/to/tide-index_example1.peptides.txt

The default score uses Tailor scores. If you prefer to use Xcorr Scores, you can change the score option --score xcorr_score:

python3 -m CONGA --score xcorr_score --isolation-window 0.75,0.75 path/to/narrow_example1.tide-search.txt path/to/open_example1.tide-search.txt

The software will automatically detect if the PSMs in the search files have been searched against a concatenated target-decoy database or separately against each database. If the latter, then you will need to have the name target appear in the search file names. In that case, the software will search for the respective decoy-search files by replacing target with decoy in the search file names in the same directory.

python3 -m CONGA path/to/narrow_example2.tide-search.target.txt path/to/open_example2.tide-search.target.txt

Comet

You can download the search files to test the commands below: Comet-search files.

You can use either standalone Comet or the implementation of Comet in Crux. The CONGA software will automatically standardize the difference in terminology between the standalone version and the one in Crux. To run CONGA using Comet-search results with default parameters, use the following command in the terminal.

python3 -m CONGA --score e-value path/to/narrow_example1.comet.txt path/to/open_example1.comet.txt

There is no need to provide a list of target-decoy pairs, because Comet always reverses the target peptides to yield decoy peptides. Comet can use either Xcorr scores or E-values. You will need to specify the score, because the default (Tailor score) is specific to Tide.

All other discussion related to CONGA’s handle on varaible modifications in Crux-Tide equally applies to Comet search files.

MSFragger

You can download the search files to test the commands below: MSFragger-search files.

Using MSFragger search files is more challenging. MSFragger nor any of the related softwares in Fragpipe produce decoy peptides that pair with the target peptides, which is essential for CONGA. Unfortunately not even reversing the entire protein sequence and digesting the protein to produce decoy peptides yields the same result as reversing the target peptides. Instead the user must always provide a tab-delimited target-decoy pairing in the same format as what Tide-index produces. You can do this by creating your own script, with two columns one labelled target and the other decoy, or more simply by implementing Tide-index. The latter is reasonably achievable by matching the parameters in Tide-index with MSFragger. More specifically we have the following equivalencies between the two softwares.

Parameters

Tide-index

MSFragger

–enzyme

search_enzyme_name_1

–missed_cleavages

allowed_missed_cleavage_1

–max_mods

max_variable_mods_combinations

–max_length

digest_max_length

–min_length

digest_min_length

–min_mass, –max_mass

digest_mass_range

In addition to the above parameters, one should also make sure that the number of decimal places reported by variable-modifications in tide-index match with those in MSFragger. This is only important for target-decoy pairing when variable modifications are considered.

To run CONGA using MSFragger-search results with default parameters, use the following command:

python3 -m CONGA --score hyperscore path/to/narrow.tsv path/to/open.tsv path/to/tide-index.peptides.txt

Like Comet, the score needs to be specified, because the default is Tailor score. All other discussion related to CONGA’s handle on varaible modifications Crux-Tide equally applies to MSFragger-search files.

Interpreting the log file

Here we explain what a typical log file looks like from CONGA:

INFO: CPU: macOS-10.16-x86_64-i386-64bit (1)
INFO: Version: 1.0.1 (2)
INFO: 2023-02-09 16:06:16.072313 (3)
INFO: Command used: /Volumes/Expansion/Documents_from_local_computer/CONGA/CONGA/__main__.py --overwrite T --competition_window 4 data/tide/narrow_example1.tide-search.txt data/tide/open_example1.tide-search.txt (4)
INFO: Successfully read in arguments
INFO: Reading in search files.
INFO: Successfully read in search files.
INFO: tailor_score successfully found in search files. (5)

INFO: Tide search files detected. (6)
INFO: Concatenated search files detected. (7)
INFO: Filtering for neighbours. (8)
INFO: Doing dynamic level competition. (9)
INFO: Constructing groups adaptively. (10)
INFO:                                    decoys  targets     ratio (11)
group names:
narrow                               1706    19495  0.087510
top 1 PSMs & top (4, 6] mass bins      69      117  0.589744
top 1 PSMs & top 1 mass bin            16      714  0.022409
top 1 PSMs & top 3 mass bin            18      176  0.102273
top 2 or more PSMs                    114      590  0.193220
left over group                      2226     2497  0.891470
INFO: Applying group walk. (12)
INFO: Group walk complete.
INFO: 19196 peptides discovered at the 1% FDR level. (13)
INFO: 20076 peptides discovered at the 5% FDR level.
INFO: Scan multiplicities among the discovered peptides at 1% FDR level: (14)
INFO:                     Count
Scan multiplicity:
1                   18426
2                     385
INFO: Scan multiplicities among the discovered peptides at 5% FDR level:
INFO:                     Count
Scan multiplicity:
1                   19284
2                     396
INFO: Writing peptides at user-specified FDR level to directory.
INFO: Elapsed time: 49.64 s
  1. CPU information.

  2. Version number.

  3. Date and time.

  4. Command used.

  5. CONGA checks whether the score specified is found in the search files provided.

  6. CONGA detects what type of search engine was used based off the column names of the search files provided.

  7. CONGA detects whether a separate search or a concantenated search was conducted based off whether it is able to find any decoys in the search files.

  8. CONGA notifies the user that it is about to initiate the first phase of filtering for neighbouring peptides.

  9. CONGA notifies the user that it is about to initiate the second phase of undergoing dynamic competition.

  10. CONGA notifies the user that it is about to initiate the third phase of partitioning the peptides into groups.

  11. CONGA reports the names of the groups it constructed, the number of decoys and targets in each group, and the ratio of decoys to targets in each group.

  12. CONGA notifies the user that it is about to initiate the fourth phase of applying the group-walk algorithm.

  13. The number of peptides discovered at the 1% FDR level and at the 5% FDR level.

  14. A count of the number of times each scan is responsbile for discovering peptide at the 1% and 5% FDR level. As an example, there are 18426 scans each responsible for discovering 1 peptide, and 385 scans responsible for discovering 2 peptides at the 1% level. Hence the total number of discoveries are 18426 + 2*385 = 19196