API
Overview
BioGUID.org is primarily intended as a web service (rather than a web site). The website includes basic search functionality, and may continue to expand in the future.
However, the main development focus is on web services, available through the APIs described below. As BioGUID.org expands in content and the web services mature, updates
will be reported on this page. If you have any questions, or wish to report a bug or request new features, please contact Richard Pyle by email at deepreef [at] bishopmuseum.org.
Controlled Vocabularies
The following lists of terms are used by several BioGUID services.
Identifier Classes
- Generic Identifier
- URL
- LSID
- DOI
- UUID
- Handle
- ARK
Object Classes
- Agent
- Collection
- Dataset
- DereferenceService
- Event
- Evidence
- GeologicalContext
- Identification
- IdentifierDomain
- Language
- Location
- LocationType
- MaterialSample
- Media
- Occurrence
- Organism
- Reference
- ReferenceType
- Taxon
- Transaction
Relationship Types
- Congruent
- Includes
- IncludedIn
- Overlaps
Search Services
Search Identifier
URL: http://bioguid.org/searchIdentifier
Description: This service allows searching for identifiers within the BioGUID index. Searches are conducted
against both the raw identifier (e.g., a UUID, DOI, integer, or other text representing an identifier sensu stricto)
and against the identifier in full context (with dereference service prefixes and suffixes). For example, a search on
‘8BDC0735-FEA4-4298-83FA-D04F67C3FBEC’ yields the same results as a search on
‘urn:lsid:zoobank.org:act:8BDC0735-FEA4-4298-83FA-D04F67C3FBEC’ or
‘http://zoobank.org/8BDC0735-FEA4-4298-83FA-D04F67C3FBEC’.
Example Use Case 1: “I have an identifier, and I want to find out if any other identifiers exist that have been assigned to the same object.”
Example Use Case 2: “I have a non-self-resolving identifier, and I want to see what web services are available to dereference the identifier.”
Parameters:
- q= [any text string representing an identifier]
- format= [json | html] (default=json, if not provided)
Output Columns:
- ObjectClass [The class of object that is identified]
- IdentifierDomainUUID [The BioGUID identifier for the Identifier Domain within which this identifier exists]
- IdentifierClass [The Class of identifier (UUID, Handle, etc.)]
- Abbreviation [A standard abbreviation for the Identifier Domain]
- IdentifierDomain [The name of the Identifier Domain]
- IdentifierDomainDescription [A brief textual description of the Identifier Domain]
- IdentifierDomainLogo [A link to an image file representing the logo for the Identifier Domain]
- AgentUUID [A GNUB Agent UUID associated with this Identifier Domain]
- PreferredDereferenceServiceUUID [The BioGUID identifier for the preferred Dereference Service associated with the Identifier Domain (and, by extension, the identifier)]
- PreferredDereferenceServiceProtocol [The Internet Protocol used by the preferred Dereference Service]
- PreferredDereferenceService [The name of the preferred Dereference Service]
- PreferredDereferenceServiceDescription [A brief textual description of the preferred Dereference Service]
- PreferredDereferencePrefix [Prefix text that is prepended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
- Identifier [The identifier (sensu stricto) itself]
- PreferredDereferenceSuffix [Suffix text that is appended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
- PreferredDereferenceServiceLogo [A link to an image file representing the logo for the Dereference Service]
- AlternateDereferenceServices [A structured text string containing the elements of additional Dereference Services (other than the Preferred Dereference Service) that may
perform action on the Identifier. Dereference Services are delimited by a pipe (‘|’) character, and elements within a Dereference Service are delimited by a tilde
(‘~’) character. The sequence of elements is: |[DereferenceServiceUUID]~[DereferenceServiceName]~[DereferencePrefix]~[DereferenceSuffix]~[DereferenceServiceLogo]|.]
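The AlternateDereferenceServices string can be unpacked with two splits. The following is a minimal sketch in Python, not part of the BioGUID services themselves. The function name parse_alternate_services and the sample values are hypothetical, and the sketch assumes that no element contains a literal pipe or tilde character.

```python
# Element names follow the sequence given in the column description above.
ALT_SERVICE_FIELDS = [
    "DereferenceServiceUUID",
    "DereferenceServiceName",
    "DereferencePrefix",
    "DereferenceSuffix",
    "DereferenceServiceLogo",
]


def parse_alternate_services(raw):
    """Split an AlternateDereferenceServices string into a list of dicts.

    Assumption: elements never contain a literal '|' or '~'.
    """
    services = []
    for chunk in raw.split("|"):
        if not chunk:
            # Leading and trailing pipes produce empty chunks; skip them.
            continue
        parts = chunk.split("~")
        # Pad with empty strings if trailing elements are missing.
        parts += [""] * (len(ALT_SERVICE_FIELDS) - len(parts))
        services.append(dict(zip(ALT_SERVICE_FIELDS, parts)))
    return services


# Hypothetical sample value, for illustration only.
sample = (
    "|uuid-1~Service A~http://a.example/~~http://a.example/logo.png"
    "|uuid-2~Service B~http://b.example/?id=~&fmt=x~|"
)
alternates = parse_alternate_services(sample)
```

Each entry in the resulting list is a dict keyed by the five element names, so the prefix and suffix of an alternate service can be read by name rather than by position.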
Examples:
- http://bioguid.org/searchIdentifier?q=234 [Search for identifiers with “234” as part of the identifier string]
- http://bioguid.org/searchIdentifier?q=234&format=html [Same as previous example, but formatted in HTML]
- http://bioguid.org/searchIdentifier?q=cf0ada98-2dc0-4402-8ada-b192335d2d7e [Search for identifiers with “cf0ada98-2dc0-4402-8ada-b192335d2d7e” as part of the identifier string]
- http://bioguid.org/searchIdentifier?q=cf0ada98-2dc0-4402-8ada-b192335d2d7e&format=html [Same as previous example, but formatted in HTML]
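Request URLs like those above can also be built programmatically. The following Python sketch uses only the q and format parameters documented here. The function names are hypothetical, and because the structure of the JSON response is not described on this page, the fetch helper simply returns whatever the parsed response contains.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_IDENTIFIER_URL = "http://bioguid.org/searchIdentifier"


def search_identifier_url(q, fmt=None):
    """Build a searchIdentifier request URL from the documented parameters."""
    params = {"q": q}
    if fmt is not None:
        # Omitting 'format' leaves the service on its documented default (json).
        params["format"] = fmt
    return SEARCH_IDENTIFIER_URL + "?" + urlencode(params)


def search_identifier(q):
    """Fetch and parse the JSON response (requires network access).

    The shape of the parsed result is not specified on this page, so
    callers should inspect it before relying on particular keys.
    """
    with urlopen(search_identifier_url(q), timeout=30) as response:
        return json.load(response)
```

Using urlencode keeps identifiers that contain reserved characters, such as the colons in an LSID, safely escaped in the query string.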
Notes: This service uses a form of full-text indexing that matches only complete “words”. Matches are not limited to full-identifier exact matches, but the searched text
must be delimited by standard word-breakers. For example, a search for “234” will not include within the results identifiers such as “23456” or “1234”, but it will
include identifiers such as “10.2108/zsj.27.234” and “10.3897/zookeys.234.3417”. All horizontal tab characters (ASCII-09), Line-Feed (LF) characters (ASCII-10), and
Carriage Return (CR) characters (ASCII-13) are stripped from search terms, stored identifier values and indexed values, and all Unicode text is converted to non-Unicode text (most identifiers do not include
Unicode characters anyway, and they certainly should not include embedded tab, LF and CR characters). Also, hyphens (-) and single-quote characters (') are stripped from both the entered search term
and from the index that is searched. This was done to allow searching of UUIDs and ISSNs both with and without hyphens embedded (among others), and to prevent other problems that single-quote
characters can introduce. Hyphens and single-quote characters are retained within the stored identifier values, however.
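The matching behaviour described in these notes can be approximated locally, for example to predict whether a search term will find a given identifier. This Python sketch is an illustration only and is not the actual BioGUID index. It assumes that every non-alphanumeric character acts as a word-breaker and that matching ignores case, neither of which is stated on this page, and it does not model the Unicode conversion.

```python
import re


def normalize(text):
    """Strip tab, LF, CR, hyphen and single-quote characters, as described above."""
    for ch in ("\t", "\n", "\r", "-", "'"):
        text = text.replace(ch, "")
    return text


def words(text):
    """Split normalized text into complete 'words'.

    Assumptions: any non-alphanumeric character is a word-breaker,
    and comparison is case-insensitive.
    """
    return [w for w in re.split(r"[^0-9a-z]+", normalize(text).lower()) if w]


def matches(term, identifier):
    """Return True if the term's words appear as a consecutive run in the identifier."""
    t = words(term)
    w = words(identifier)
    n = len(t)
    return n > 0 and any(w[i:i + n] == t for i in range(len(w) - n + 1))
```

Under these assumptions, matches("234", "10.2108/zsj.27.234") is true while matches("234", "1234") is false, and a UUID matches whether or not the search term includes its hyphens.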
Search Identifier Domain
URL: http://bioguid.org/searchIdentifierDomain
Description: This service allows searching for Identifier Domains registered on BioGUID.org. Searches are conducted
against the standard Abbreviation, the full name, and the text description of the IdentifierDomain.
Example Use Case 1: “Has my Identifier Domain been registered within BioGUID.org?”
Example Use Case 2: “Show me a list of Registered Identifier Domains matching a search term.”
Parameters:
- q= [any text string representing part of an abbreviation, name or descriptive text for an Identifier Domain]
- format= [json | html] (default=json, if not provided)
Output Columns:
- IdentifierDomainUUID [The BioGUID identifier for the Identifier Domain]
- IdentifierClass [The Class of identifier (UUID, Handle, etc.) within this Identifier Domain]
- Abbreviation [A standard abbreviation for the Identifier Domain]
- IdentifierDomain [The name of the Identifier Domain]
- IdentifierDomainDescription [A brief textual description of the Identifier Domain]
- IdentifierDomainLogo [A link to an image file representing the logo for the Identifier Domain]
- AgentUUID [A GNUB Agent UUID associated with this Identifier Domain]
- PreferredDereferenceServiceUUID [The BioGUID identifier for the preferred Dereference Service associated with the Identifier Domain]
- PreferredDereferenceServiceProtocol [The Internet Protocol used by the preferred Dereference Service]
- PreferredDereferenceService [The name of the preferred Dereference Service]
- PreferredDereferenceServiceDescription [A brief textual description of the preferred Dereference Service]
- PreferredDereferencePrefix [Prefix text that is prepended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
- PreferredDereferenceSuffix [Suffix text that is appended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
- PreferredDereferenceServiceLogo [A link to an image file representing the logo for the Dereference Service]
Examples:
- http://bioguid.org/searchIdentifierDomain?q=ZooBank [Search for ZooBank Identifier Domains]
- http://bioguid.org/searchIdentifierDomain?q=ZooBank&format=html [Same as previous example, but formatted in HTML]
- http://bioguid.org/searchIdentifierDomain?q=doi [Search for the DOI Identifier Domain]
- http://bioguid.org/searchIdentifierDomain?q=doi&format=html [Same as previous example, but formatted in HTML]
- http://bioguid.org/searchIdentifierDomain?q=USNM [Search for Identifier Domains with the standard abbreviation for the U.S. National Museum (Smithsonian Institution)]
- http://bioguid.org/searchIdentifierDomain?q=USNM&format=html [Same as previous example, but formatted in HTML]
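The PreferredDereferencePrefix and PreferredDereferenceSuffix columns are meant to be wrapped around an identifier to make it actionable. The following Python sketch shows that composition. It assumes the JSON records use the column names listed above as keys, and the function name dereference_url is hypothetical. The prefix in the example is taken from the full-context form shown in the Search Identifier description; whether it is the registered preferred prefix for ZooBank is not stated on this page.

```python
def dereference_url(domain, identifier):
    """Wrap an identifier in a domain's preferred dereference prefix and suffix.

    'domain' is a dict keyed by the output column names (an assumption
    about the JSON layout). Missing or null values are treated as empty.
    """
    prefix = domain.get("PreferredDereferencePrefix") or ""
    suffix = domain.get("PreferredDereferenceSuffix") or ""
    return f"{prefix}{identifier}{suffix}"


# Illustrative record; only the two fields used here are filled in.
example_domain = {
    "PreferredDereferencePrefix": "http://zoobank.org/",
    "PreferredDereferenceSuffix": None,
}
actionable = dereference_url(example_domain, "8BDC0735-FEA4-4298-83FA-D04F67C3FBEC")
```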
Export Data
Data can be downloaded and extracted from BioGUID.org through both dynamic and static datasets. These datasets all share the same data structure, as follows:
Output Columns:
- BatchType [A Label representing the type of records included in the dataset. For example, ‘WithinDomainMultipleIdentifiers’; see below]
- ObjectClass [The Class of Object (Occurrence, Reference, Taxon, Agent, etc.) represented by the identifier]
- IdentifierDomainUUID [The BioGUID identifier for the Identifier Domain]
- IdentifierDomain [The name of the Identifier Domain]
- DereferencePrefix [Prefix text that is prepended to the Identifier to make it actionable]
- Identifier [The identifier for the base record]
- DereferenceSuffix [Suffix text that is appended to the Identifier to make it actionable]
- RelatedObjectClass [The Class of Object (Occurrence, Reference, Taxon, Agent, etc.) represented by the related identifier]
- RelatedIdentifierDomainUUID [The BioGUID identifier for the Related Identifier Domain]
- RelatedIdentifierDomain [The name of the Related Identifier Domain]
- RelatedDereferencePrefix [Prefix text that is prepended to the RelatedIdentifier to make it actionable]
- RelatedIdentifier [The identifier for the related record]
- RelatedDereferenceSuffix [Suffix text that is appended to the RelatedIdentifier to make it actionable]
- RelationshipType [The nature of the relationship between the Identifier and the RelatedIdentifier. See the DarwinCore term relationshipOfResource]
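Because every export shares this structure, a single reader can handle all of them. The following Python sketch parses delimited export text into one dict per row. It assumes that the first line of an export is a header row carrying the column names above, which this page does not confirm, and the function name read_export and the sample row are hypothetical.

```python
import csv
import io

EXPORT_COLUMNS = [
    "BatchType", "ObjectClass", "IdentifierDomainUUID", "IdentifierDomain",
    "DereferencePrefix", "Identifier", "DereferenceSuffix",
    "RelatedObjectClass", "RelatedIdentifierDomainUUID", "RelatedIdentifierDomain",
    "RelatedDereferencePrefix", "RelatedIdentifier", "RelatedDereferenceSuffix",
    "RelationshipType",
]


def read_export(text, delimiter="\t"):
    """Parse export text into a list of dicts keyed by column name.

    Assumption: the first line is a header row with the documented column names.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in EXPORT_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("export is missing columns: " + ", ".join(missing))
    return list(reader)


# Hypothetical one-row sample in tab-delimited form.
sample_values = ["ExampleBatch", "Reference", "domain-uuid", "Example Domain",
                 "http://example.org/", "123", "", "", "", "", "", "", "", ""]
sample_text = "\t".join(EXPORT_COLUMNS) + "\n" + "\t".join(sample_values) + "\n"
rows = read_export(sample_text)
```

Pass delimiter="," to read the comma-delimited form instead of the tab-delimited one.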
Dynamic Data Download
URL: http://bioguid.org/identifierDomainIndex
Description: This service allows you to submit the UUID for an Identifier Domain, and get back a full index of all identifiers that BioGUID has indexed within that Identifier Domain. Optionally,
all linked identifiers can also be included in the data export.
Example Use Case 1: “Show me a list of all DOIs in BioGUID.org.”
Example Use Case 2: “Show me a list of all ISSNs with their corresponding linked identifiers.”
Parameters:
- d= [UUID of an Identifier Domain]
- ir= [0=simple index of all identifiers; 1=include all linked identifiers]
- format= [csv | html | tab] (default=tab, if not provided)
Examples:
- http://bioguid.org/identifierDomainIndex?d=5ccc425c-8d84-427a-9dbe-40abc4ff5118&ir=1&format=tab [All DOIs in BioGUID.org, in tab-delimited form]
- http://bioguid.org/identifierDomainIndex?d=3aae0b26-2d17-4ec0-8d7f-55f46e158476&ir=1&format=html [All ISBN identifiers with all associated cross-linked identifiers, as an html table]
- http://bioguid.org/identifierDomainIndex?d=7d5ac2f3-5b2a-46b0-91c8-a6285aa757aa&ir=1&format=csv [All ISSN identifiers with all associated cross-linked identifiers, in comma-delimited form]
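The example URLs above follow a fixed pattern that can be generated from the three documented parameters. This Python sketch builds such a URL; the endpoint is taken from the examples, and the function name domain_index_url is hypothetical. It only constructs the request and does not download anything.

```python
from urllib.parse import urlencode

DOMAIN_INDEX_URL = "http://bioguid.org/identifierDomainIndex"
ALLOWED_FORMATS = ("csv", "html", "tab")


def domain_index_url(domain_uuid, include_related=False, fmt="tab"):
    """Build an identifierDomainIndex URL from the d, ir and format parameters."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("format must be one of: " + ", ".join(ALLOWED_FORMATS))
    params = {
        "d": domain_uuid,
        # ir=0 requests a simple index; ir=1 includes all linked identifiers.
        "ir": 1 if include_related else 0,
        "format": fmt,
    }
    return DOMAIN_INDEX_URL + "?" + urlencode(params)
```

For instance, domain_index_url("5ccc425c-8d84-427a-9dbe-40abc4ff5118", include_related=True) reproduces the first example URL above.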
Certain kinds of pre-processed datasets will be provided as static datasets in downloadable Zip files. These take too long to generate dynamically, so they are periodically updated and archived on the BioGUID.org website. More of these static datasets will be developed as requests are submitted to us by email to deepreef [at] bishopmuseum.org.
Downloadable Static Datasets:
- WithinDomainMultipleIdentifiers: Each row represents an example where the same Identifier Domain has issued more than one
identifier for the same data object. The ‘Identifier’ is arbitrarily selected from among the two or more competing identifiers (usually the smallest number or alphabetically first of the alternate identifiers); and
the ‘RelatedIdentifier’ represents the alternate identifier. In cases where more than two identifiers have been issued by the same Identifier Domain for the same data object,
the same ‘Identifier’ is repeated for each additional Identifier. For example, if the same Identifier Domain issued four identifiers for the same data object, then there
will be three rows in the dataset, with the first of the four identifiers repeated in the ‘Identifier’ column, and each of the three additional identifiers represented in
the ‘RelatedIdentifier’ column for each of the three rows. For this type of dataset, ObjectClass, IdentifierDomain, IdentifierDomainUUID, DereferencePrefix, and DereferenceSuffix are always
identical for both the primary columns and the corresponding related columns; and the RelationshipType is always ‘sameAs’.
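The row layout described above can be illustrated with a short Python sketch. It is not the code used to build the static dataset. It picks the alphabetically first identifier as the ‘Identifier’, which matches only the "alphabetically first" case mentioned above (numeric ordering is not handled), and the function name within_domain_rows is hypothetical.

```python
def within_domain_rows(identifiers):
    """Pair the first identifier (by string sort) with each alternate.

    N identifiers for one object yield N - 1 rows, each repeating the
    same 'Identifier' and carrying one alternate as 'RelatedIdentifier'.
    """
    ordered = sorted(identifiers)
    primary, alternates = ordered[0], ordered[1:]
    return [
        {
            "Identifier": primary,
            "RelatedIdentifier": alternate,
            "RelationshipType": "sameAs",
        }
        for alternate in alternates
    ]


# Four identifiers issued for the same object produce three rows.
example_rows = within_domain_rows(["D", "B", "A", "C"])
```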
Write Services
Create Identifier Domain
Currently, new Identifier Domains can only be created through the website, here. Before entering a new Identifier Domain, please use the search feature to see if it already exists.
Once you're sure it's not already in the system, go ahead and add it using the form on the Domains page. The fields for generating a new Identifier Domain record are as follows:
- Domain Name (IdentifierDomain) [The name of the Identifier Domain]
- Abbreviation (Abbreviation) [A standard abbreviation for the Identifier Domain]
- Identifier Class (IdentifierClass) [The Class of identifier (UUID, Handle, etc.) within this Identifier Domain; use Controlled Vocabulary]
- Description (IdentifierDomainDescription) [A brief textual description of the Identifier Domain]
- Logo (IdentifierDomainLogo) [A link to an image file representing the logo for the Identifier Domain]
- Dereference Service Protocol (PreferredDereferenceServiceProtocol) [The Internet Protocol used by the preferred Dereference Service; use Controlled Vocabulary]
- Preferred Dereference Service (PreferredDereferenceService) [The name of the preferred Dereference Service]
- Preferred Dereference Prefix (PreferredDereferencePrefix) [Prefix text that is prepended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
- Preferred Dereference Suffix (PreferredDereferenceSuffix) [Suffix text that is appended to an identifier to make it actionable within the indicated protocol, via the preferred Dereference Service]
Note: AgentUUID and DereferenceServiceDescription are not yet supported in the upload web page. These fields will be supported at a later time. DereferenceServiceLogo is assumed to be the same as IdentifierDomainLogo, but this will also be improved to allow different logos at a later time.
Once inserted, the new Identifier Domain will be generated and IdentifierDomainUUID will be assigned. Use this IdentifierDomainUUID when submitting bulk uploads, as described below. Also, a DereferenceServiceUUID will be generated for any new Dereference Service created as part of this import.
Bulk Upload Identifiers
Anyone can now upload bulk content to BioGUID.org! For the moment, this can only be done through the website, but we plan to add a service
that will allow batch files to be uploaded programmatically. To upload a batch of identifiers, create a CSV file encoded as UTF-8, with all values enclosed in double-quote characters ("), using the
following column headers (and corresponding values):
- ObjectClass* [The Class of Object (Occurrence, Reference, Taxon, Agent, etc.) represented by the identifier; use Controlled Vocabulary]
- IdentifierDomainUUID* [The BioGUID identifier for the Identifier Domain]
- DereferencePrefix [Prefix text that is prepended to the Identifier to make it actionable]
- Identifier*
- DereferenceSuffix [Suffix text that is appended to the Identifier to make it actionable]
- RelatedObjectClass [The Class of Object (Occurrence, Reference, Taxon, Agent, etc.) represented by the related identifier; use Controlled Vocabulary]
- RelatedIdentifierDomainUUID [The BioGUID identifier for the Related Identifier Domain]
- RelatedDereferencePrefix [Prefix text that is prepended to the RelatedIdentifier to make it actionable]
- RelatedIdentifier [The identifier for the related record]
- RelatedDereferenceSuffix [Suffix text that is appended to the RelatedIdentifier to make it actionable]
- RelationshipType
*Items with an asterisk are required; all others are optional.
There should be NO BLANK LINES anywhere in the file (including the last line)!
Download a sample file here.
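A file meeting these requirements can be generated with Python's csv module. The sketch below quotes every value, encodes the result as UTF-8, and omits the final line terminator so that the file does not end in an empty line. That last step is one interpretation of the "no blank lines" rule, and the "\n" line terminator is also an assumption. The function name build_upload_csv is hypothetical, and the sample record simply combines values that appear elsewhere on this page for illustration.

```python
import csv
import io

UPLOAD_COLUMNS = [
    "ObjectClass", "IdentifierDomainUUID", "DereferencePrefix", "Identifier",
    "DereferenceSuffix", "RelatedObjectClass", "RelatedIdentifierDomainUUID",
    "RelatedDereferencePrefix", "RelatedIdentifier", "RelatedDereferenceSuffix",
    "RelationshipType",
]
REQUIRED_COLUMNS = ("ObjectClass", "IdentifierDomainUUID", "Identifier")


def build_upload_csv(records):
    """Return UTF-8 bytes for a bulk-upload CSV with all values double-quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(UPLOAD_COLUMNS)
    for record in records:
        for name in REQUIRED_COLUMNS:
            if not record.get(name):
                raise ValueError("missing required value: " + name)
        # Optional columns that are absent are written as empty quoted values.
        writer.writerow([record.get(column, "") for column in UPLOAD_COLUMNS])
    # Drop the final line terminator so the file has no trailing empty line.
    return buffer.getvalue().rstrip("\n").encode("utf-8")


# Illustrative record only.
payload = build_upload_csv([
    {
        "ObjectClass": "Reference",
        "IdentifierDomainUUID": "5ccc425c-8d84-427a-9dbe-40abc4ff5118",
        "Identifier": "10.2108/zsj.27.234",
    }
])
```

Write the returned bytes to a file opened in binary mode, so that no newline translation or extra terminator is added on the way to disk.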
Note: We have not yet tested this service with very large CSV files. It should be fine for a file containing a few hundred thousand rows. If you have a dataset containing millions of identifiers, by all means give it a try, but be aware that batches of that size are untested. Also, this service tracks details about each identifier in terms of whether it has already been imported, or whether there are problems with the records as submitted. In the near future, these reports will be made available as soon as the batch of submitted records is processed.
The time it takes to process depends on the number of records, and the percentage of records that represent new objects (as opposed to identifiers that can be mapped to existing objects). It can range from a few seconds to a few hours. As we gather more data on uploaded datasets, we'll be able to make more accurate predictions about how long each batch import will take to fully process. You don't need to remain on the page for the entire process time; you only need to wait until the zip file has been transferred to the BioGUID server. This depends on your internet connection speed and the size of the zip file, but it shouldn't require more than a minute or two, unless the file is very large.
Logos
Small Logo
Large Logo
All content within the BioGUID.org site is available under the Creative Commons Zero license (Public Domain).