AntibodyMap Documentation

If you would like to contact us about anything related to the AntibodyMap resource, please email konrad@proteincontact.org. Documentation and how-tos are available for the following services:

  • (SAAB) Structurally annotating the Ig-seq datasets here.
  • (OAS) Observed Antibody Space Database here.

Structural Annotation of Antibodies (SAAB)

Our algorithm performs structural annotation of large volumes of sequences such as those resulting from Ig-seq experiments.

We make our algorithm available in two forms: a web version to annotate single sequences and a standalone version to annotate entire datasets.

The standalone version is available and documented in our repository here.

The web version is designed to demonstrate structural annotation for a single variable domain sequence. Users can paste their variable domain sequence and press 'annotate'. The annotation screen shows the optimal templates for the framework and for each CDR region separately. If the template antibody is in complex with an antigen in its PDB entry, the name of the antigen is also reported.

Observed Antibody Space (OAS) Database

The OAS database collates >600B redundant antibody sequences from 53 studies. We make the data available for bulk download and for filtering with respect to certain metadata parameters. For bulk download or to use our online filter, please go here. The data is released under a CC-BY license.

The OAS database can be filtered using metadata entries such as organism and isotype. The fields are non-exclusive, meaning that the user can choose a combination of fields that does not exist in our database (for instance, specifying both an isotype and the light chain, which is impossible).
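As an illustration of why non-exclusive fields can produce an empty result, here is a minimal Python sketch of metadata filtering. It is not the code behind the online filter. The helper names (matches, select_units) and the toy metadata values are assumptions made for this example; only the field names Species, Chain and Isotype come from the data-unit metadata.

```python
def matches(metadata, constraints):
    """Return True if every requested field has exactly the requested value."""
    return all(metadata.get(field) == wanted for field, wanted in constraints.items())


def select_units(units, constraints):
    """Keep only the data-unit metadata records that satisfy all constraints."""
    return [unit for unit in units if matches(unit, constraints)]


# Toy metadata records (hypothetical values, for illustration only).
units = [
    {"Species": "human", "Chain": "Heavy", "Isotype": "Bulk"},
    {"Species": "human", "Chain": "Heavy", "Isotype": "IGHG"},
    {"Species": "human", "Chain": "Light"},
]

# A combination that exists in the toy data.
heavy_bulk = select_units(units, {"Chain": "Heavy", "Isotype": "Bulk"})

# A combination that cannot exist: a heavy-chain isotype on a light chain.
impossible = select_units(units, {"Chain": "Light", "Isotype": "IGHG"})

print(len(heavy_bulk), len(impossible))
```

Because each constraint is checked independently, nothing stops a user from combining values that never occur together. The filter then simply returns no data-units.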

After filtering, there is an option to bulk download all data-units (see below) that match your constraints. The download takes the form of a shell script containing consecutive wget commands. You should be able to simply download it and run:

chmod u+rx bulk_download.sh
./bulk_download.sh

Data in the OAS database is organized into studies, which are in turn sub-divided into data-units. A single data-unit is a set of sequences uniquely identified by its metadata. The meta-parameters are:

Each data-unit file is divided into a first line that holds the metadata on the unit in JSON format, followed by JSON-formatted sequence data, one sequence per line. Each sequence holds the following information:

The contents of each data-unit file therefore look like the example below (shown with line breaks added for readability). You can read more about how to parse them here:

{
	"Longitudinal": "no",
	"Chain": "Heavy",
	"Author": "Bashford et al., (2013)",
	"Isotype": "Bulk",
	"Age": "81",
	"Disease": "CLL",
	"Link": "https://doi.org/10.1101/gr.154815.113",
	"Vaccine": "None",
	"BType": "Unsorted-B-Cells",
	"Subject": "subject-10",
	"Species": "human",
	"BSource": "PBMC",
	"Size": 5
}
{
	"redundancy": 1,
	"name": 1,
	"seq": "SVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARVIPDDIVVVPAAIYYYGYGRLGGQGTTVTVSS",
	"cdr3": "ARVIPDDIVVVPAAIYYYGYGRL",
	"j": "IGHJ6*02",
	"v": "IGHV1-69D*01",
	"data": "{\"fwh1\": {\"24\": \"K\", ..."
}
{
	"redundancy": 1,
	"name": 2,
	"seq": "SVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSDDTAVYYCARVIPDDIVVVPAAIYYYGYGRRGGQGTTVTVSS",
	"cdr3": "ARVIPDDIVVVPAAIYYYGYGRR",
	"j": "IGHJ6*02",
	"v": "IGHV1-69D*01",
	"data": "{\"fwh1\": {\"24\": \"K\", \"25\": \"A\",..."
}

Additionally, for each data-unit file we have prepared the corresponding file with the original nucleotide sequences.
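The data-unit layout described above (one metadata line, then one JSON record per sequence) can be read with a few lines of Python. The sketch below is not the official parsing code. The function name read_data_unit and the shortened toy records are assumptions made for this example. It also assumes, based on the example above, that the "data" field is a JSON-encoded string that needs a second decoding step.

```python
import io
import json


def read_data_unit(handle):
    """Return (metadata, records) from an open data-unit text stream."""
    # The first line holds the metadata for the whole data-unit.
    metadata = json.loads(handle.readline())
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "data" is assumed to be a JSON string nested inside the record.
        if isinstance(record.get("data"), str):
            record["data"] = json.loads(record["data"])
        records.append(record)
    return metadata, records


# Toy in-memory file (shortened, hypothetical records for illustration).
sample_meta = {"Species": "human", "Chain": "Heavy", "Size": 2}
sample_records = [
    {"redundancy": 1, "name": 1, "cdr3": "ARVIPDDIVVVPAAIYYYGYGRL",
     "v": "IGHV1-69D*01", "j": "IGHJ6*02",
     "data": json.dumps({"fwh1": {"24": "K"}})},
    {"redundancy": 1, "name": 2, "cdr3": "ARVIPDDIVVVPAAIYYYGYGRR",
     "v": "IGHV1-69D*01", "j": "IGHJ6*02",
     "data": json.dumps({"fwh1": {"24": "K", "25": "A"}})},
]
sample_text = "\n".join(json.dumps(obj) for obj in [sample_meta] + sample_records) + "\n"

metadata, records = read_data_unit(io.StringIO(sample_text))

# Sum the "redundancy" values across all records in the unit.
total_redundancy = sum(record["redundancy"] for record in records)
print(metadata["Chain"], len(records), total_redundancy)
```

For a file on disk, the same function can be given the result of open(path) in text mode. If the downloaded files are compressed, which is not stated here, gzip.open(path, "rt") from the standard library would serve the same purpose.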