AntibodyMap

Observed Antibody Space (OAS) Database

The OAS database collates more than one billion redundant antibody sequences from 60 studies. The data are available for bulk download and can be filtered by selected metadata parameters. For bulk download or to use our online filter, please go here. The data are released under a CC-BY license.

The OAS database can be filtered using metadata entries such as organism or isotype. The filter fields are independent of one another, so it is possible to choose a combination of values that matches nothing in the database (for instance, specifying both an isotype and the light chain, which is impossible). Such a query returns no data-units.
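To illustrate how independent filter fields behave, here is a minimal Python sketch. It is not the code behind the online filter; the `filter_units` helper, the metadata records, and the isotype value `IGHG` are hypothetical and only mirror the key names used in the sample data-unit on this page.

```python
from typing import Dict, List


def filter_units(units: List[Dict[str, object]], **constraints: object) -> List[Dict[str, object]]:
    """Return the metadata records that match every given constraint."""
    return [
        unit
        for unit in units
        if all(unit.get(key) == value for key, value in constraints.items())
    ]


# Hypothetical metadata records (not real OAS entries).
units = [
    {"Chain": "Heavy", "Isotype": "Bulk", "Species": "human"},
    {"Chain": "Light", "Isotype": "Bulk", "Species": "human"},
]

heavy_human = filter_units(units, Chain="Heavy", Species="human")

# A combination that no data-unit satisfies simply yields an empty list.
impossible = filter_units(units, Chain="Light", Isotype="IGHG")
```

Because each constraint is checked separately, nothing stops a caller from combining values that never occur together; the result is just empty.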

After filtering, you can bulk download all data-units (see below) that match your constraints. The download is provided as a shell script containing one wget command per data-unit. You should be able to simply download the script and run:

chmod u+rx bulk_download.sh
./bulk_download.sh

Data in the OAS database are organized into studies, which are in turn subdivided into data-units. A single data-unit is a set of sequences uniquely identified by its metadata. The metadata parameters are:

Chain Heavy/light chain annotation.
Isotype Identified or deposited isotype information.
Age Information on age of the human B-cell donors.
Disease Indicates whether the donor was sick at the time of B-cell extraction.
Vaccine Indicates whether the B-cell donor was purposely immunized prior to B-cell extraction.
B-cell subset Indicates whether a particular B-cell subset was sorted for Ig-seq.
Species Organism of the B-cell donor.
Author First author and date of publication.
Link Link to the publication with the study.
Size Number of non-redundant sequences.
B-cell source Which organ/tissue the B-cells were extracted from.
Subject Indicates whether the B-cells can be tracked back to a particular individual.
Longitudinal If the study is conducted over a period of time, indicates the particular timepoint when B-cells were sourced.

Each data-unit file has two parts: the first line holds the metadata for the unit in JSON format, and every line after it holds one JSON-formatted sequence record. Each sequence record holds the following information:

Gene Annotation IMGT gene annotations for the sequence. These are split into V and J gene annotations.
IMGT-numbered sequence Amino acid sequence annotated according to the IMGT numbering scheme.
Amino-acid sequence Un-numbered sequence of amino acids.
Redundancy How often this amino acid sequence was seen in the given data-unit.
CDR-3 IMGT-defined sequence of the CDR-3.
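As a small illustration of how the redundancy field can be used, the sketch below counts V gene calls weighted by redundancy. It assumes records have already been loaded as Python dictionaries with the key names from the sample data-unit on this page (`v`, `j`, `cdr3`, `redundancy`); the `v_gene_usage` helper and the record values are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def v_gene_usage(records: Iterable[Dict]) -> Counter:
    """Count V gene calls, weighting each sequence by its redundancy."""
    usage = Counter()
    for record in records:
        # Treat a missing redundancy value as a single observation.
        usage[record["v"]] += record.get("redundancy", 1)
    return usage


# Made-up records in the shape of the sequence entries described above.
records = [
    {"v": "IGHV1-69D*01", "j": "IGHJ6*02", "cdr3": "ARVIPDDIVVVPAAIYYYGYGRL", "redundancy": 1},
    {"v": "IGHV1-69D*01", "j": "IGHJ6*02", "cdr3": "ARVIPDDIVVVPAAIYYYGYGRR", "redundancy": 3},
    {"v": "IGHV3-23*01", "j": "IGHJ6*02", "cdr3": "AR", "redundancy": 2},
]

usage = v_gene_usage(records)
total_observations = sum(usage.values())
```

Counting without the redundancy weight would instead give the number of distinct amino acid sequences per V gene.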

The contents of a data-unit file therefore look like the example below (shown here with line breaks added for readability). You can read more about how to parse these files here:

{
	"Longitudinal": "no",
	"Chain": "Heavy",
	"Author": "Bashford et al., (2013)",
	"Isotype": "Bulk",
	"Age": "81",
	"Disease": "CLL",
	"Link": "https://doi.org/10.1101/gr.154815.113",
	"Vaccine": "None",
	"BType": "Unsorted-B-Cells",
	"Subject": "subject-10",
	"Species": "human",
	"BSource": "PBMC",
	"Size": 5
} {
	"redundancy": 1,
	"name": 1,
	"seq": "SVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSEDTAVYYCARVIPDDIVVVPAAIYYYGYGRLGGQGTTVTVSS",
	"cdr3": "ARVIPDDIVVVPAAIYYYGYGRL",
	"j": "IGHJ6*02",
	"v": "IGHV1-69D*01",
	"data": "{\"fwh1\": {\"24\": \"K\", ..."
} {
	"redundancy": 1,
	"name": 2,
	"seq": "SVKVSCKASGGTFSSYAISWVRQAPGQGLEWMGGIIPIFGTANYAQKFQGRVTITADESTSTAYMELSSLRSDDTAVYYCARVIPDDIVVVPAAIYYYGYGRRGGQGTTVTVSS",
	"cdr3": "ARVIPDDIVVVPAAIYYYGYGRR",
	"j": "IGHJ6*02",
	"v": "IGHV1-69D*01",
	"data": "{\"fwh1\": {\"24\": \"K\", \"25\": \"A\",..."
}
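A minimal parsing sketch in Python, assuming the layout described above: one JSON object per line, with the metadata on the first line and sequence records after it. The `parse_data_unit` helper is hypothetical, not an official OAS tool. It also assumes that the `data` field is a JSON document stored as a string, as the escaped quotes in the sample suggest, so it is decoded a second time.

```python
import json
from typing import Dict, Iterable, List, Tuple


def parse_data_unit(lines: Iterable[str]) -> Tuple[Dict, List[Dict]]:
    """Split a data-unit into (metadata, list of sequence records)."""
    rows = (line.strip() for line in lines)
    rows = (row for row in rows if row)  # skip blank lines
    metadata = json.loads(next(rows))    # first line: unit metadata
    records = []
    for row in rows:
        record = json.loads(row)
        # "data" holds the numbered sequence as a nested JSON string.
        if isinstance(record.get("data"), str):
            record["data"] = json.loads(record["data"])
        records.append(record)
    return metadata, records


# Tiny made-up data-unit in the layout described above.
example = [
    '{"Chain": "Heavy", "Species": "human", "Size": 1}',
    '{"redundancy": 2, "name": 1, "seq": "SVKV", "cdr3": "AR", '
    '"v": "IGHV1-69D*01", "j": "IGHJ6*02", '
    '"data": "{\\"fwh1\\": {\\"24\\": \\"K\\"}}"}',
]

metadata, records = parse_data_unit(example)
```

For a downloaded file, an open text file handle can be passed in place of the list, since the function only needs an iterable of lines.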

Additionally, for each data-unit file we provide a corresponding file containing the original nucleotide sequences.
