API Reference

BibTeX Validation and Enrichment Script

This script validates BibTeX entries by:

  1. Checking DOI information via the Crossref API

  2. Checking arXiv information via the arXiv API

  3. Searching Google Scholar for missing information (optional)

  4. Comparing and updating fields

  5. Generating a validation report

class validate_bibtex.BibEntry(entry_type: str, citekey: str, fields: Dict[str, str])

Bases: object

citekey: str
entry_type: str
fields: Dict[str, str]
class validate_bibtex.BibTeXValidator(bib_file: str, output_file: str | None = None, update_bib: bool = False, delay: float = 1.0)

Bases: object

Validates and enriches BibTeX entries

ARXIV_DOI_PATTERN = re.compile('10\\.48550/ARXIV\\.(\\d{4}\\.\\d{4,5})', re.IGNORECASE)
ARXIV_NOTE_PATTERN = re.compile('(?i)arxiv:\\s*(\\d{4}\\.\\d{4,5}(?:v\\d+)?)', re.IGNORECASE)
FIELD_SCHEMA = {'common': {'core': ['author', 'editor', 'title', 'year', 'month', 'note', 'key', 'crossref'], 'extended': ['doi', 'url', 'urldate', 'eprint', 'archiveprefix', 'primaryclass', 'isbn', 'issn', 'language', 'keywords', 'file']}, 'strongly_recommended': {'article': ['volume', 'pages'], 'inbook': ['chapter', 'pages'], 'incollection': ['pages', 'chapter'], 'inproceedings': ['pages'], 'techreport': ['number']}, 'types': {'article': {'extended': ['doi', 'url', 'urldate', 'issn'], 'optional': ['volume', 'number', 'pages', 'month', 'note'], 'required': ['author', 'title', 'journal', 'year']}, 'book': {'extended': ['doi', 'url', 'urldate', 'isbn'], 'optional': ['volume', 'number', 'series', 'address', 'edition', 'month', 'note'], 'required': ['title', 'publisher', 'year'], 'required_any': ['author', 'editor']}, 'booklet': {'extended': ['doi', 'url', 'urldate'], 'optional': ['author', 'howpublished', 'address', 'month', 'year', 'note'], 'required': ['title']}, 'inbook': {'extended': ['doi', 'url', 'urldate', 'isbn'], 'optional': ['volume', 'number', 'series', 'address', 'edition', 'month', 'note'], 'required': ['title', 'publisher', 'year'], 'required_any': ['author', 'editor'], 'required_any_2': ['chapter', 'pages']}, 'incollection': {'extended': ['doi', 'url', 'urldate', 'isbn'], 'optional': ['editor', 'volume', 'number', 'series', 'type', 'chapter', 'pages', 'address', 'edition', 'month', 'note'], 'required': ['author', 'title', 'booktitle', 'publisher', 'year']}, 'inproceedings': {'extended': ['doi', 'url', 'urldate', 'isbn'], 'optional': ['editor', 'volume', 'number', 'series', 'pages', 'publisher', 'organization', 'address', 'month', 'note'], 'required': ['author', 'title', 'booktitle', 'year']}, 'manual': {'extended': ['doi', 'url', 'urldate'], 'optional': ['author', 'organization', 'address', 'edition', 'month', 'year', 'note'], 'required': ['title']}, 'mastersthesis': {'extended': ['doi', 'url', 'urldate'], 'optional': ['type', 'address', 'month', 
'note'], 'required': ['author', 'title', 'school', 'year']}, 'misc': {'extended': ['doi', 'url', 'urldate', 'eprint', 'archiveprefix', 'primaryclass'], 'optional': ['author', 'title', 'howpublished', 'month', 'year', 'note'], 'required': []}, 'phdthesis': {'extended': ['doi', 'url', 'urldate'], 'optional': ['type', 'address', 'month', 'note'], 'required': ['author', 'title', 'school', 'year']}, 'proceedings': {'extended': ['doi', 'url', 'urldate', 'isbn'], 'optional': ['editor', 'volume', 'number', 'series', 'publisher', 'organization', 'address', 'month', 'note'], 'required': ['title', 'year']}, 'techreport': {'extended': ['doi', 'url', 'urldate'], 'optional': ['type', 'number', 'address', 'month', 'note'], 'required': ['author', 'title', 'institution', 'year']}, 'unpublished': {'extended': ['doi', 'url', 'urldate', 'eprint', 'archiveprefix', 'primaryclass'], 'optional': ['month', 'year'], 'required': ['author', 'title', 'note']}}}
compare_fields(bib_entry: Dict, api_data: Dict, source: str = 'crossref') → Dict

Compare a BibTeX entry with API data and classify each field as updated, conflicting, identical, or different

Returns:

Dictionary with ‘updated’, ‘conflicts’, ‘identical’, ‘different’, ‘sources’ keys

extract_arxiv_id(entry: Dict) → str | None

Extract arXiv ID from BibTeX entry

Checks:

  1. note field: “arXiv: YYYY.NNNNN” or “arXiv: YYYY.NNNNNvN”

  2. doi field: “10.48550/ARXIV.YYYY.NNNNN”

  3. eprint field: “YYYY.NNNNN”

Returns:

Normalized arXiv ID (YYYY.NNNNN format, version suffix removed) or None
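
The three checks can be sketched as follows. This is a minimal illustration rather than the script's actual implementation: the first two patterns are copied from the class attributes ARXIV_DOI_PATTERN and ARXIV_NOTE_PATTERN, while the function name `extract_arxiv_id_sketch`, the `EPRINT_PATTERN` expression, and the order in which the fields are tried are assumptions.

```python
import re
from typing import Dict, Optional

# Patterns copied from the BibTeXValidator class attributes.
ARXIV_DOI_PATTERN = re.compile(r"10\.48550/ARXIV\.(\d{4}\.\d{4,5})", re.IGNORECASE)
ARXIV_NOTE_PATTERN = re.compile(r"(?i)arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
# Assumed pattern for a bare eprint value (not taken from the source).
EPRINT_PATTERN = re.compile(r"^(\d{4}\.\d{4,5})(?:v\d+)?$")


def extract_arxiv_id_sketch(entry: Dict[str, str]) -> Optional[str]:
    """Return a version-free arXiv ID found in note, doi, or eprint, else None."""
    match = ARXIV_NOTE_PATTERN.search(entry.get("note", ""))
    if match:
        # Drop a trailing version suffix such as "v2".
        return re.sub(r"v\d+$", "", match.group(1))
    match = ARXIV_DOI_PATTERN.search(entry.get("doi", ""))
    if match:
        return match.group(1)
    match = EPRINT_PATTERN.match(entry.get("eprint", "").strip())
    if match:
        return match.group(1)
    return None
```

Under these assumptions, an entry whose note reads "arXiv: 2101.00001v3" yields "2101.00001", and an entry with none of the three fields yields None.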

extract_string_from_api_value(api_value) → str

Extract string from API value (handles list format)
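
Some APIs wrap single values in a list; Crossref, for example, returns `title` as an array. A minimal sketch of how such values might be collapsed to a string is shown below. The name `extract_string_sketch` is hypothetical, and taking the first list element is an assumption about the intended behavior.

```python
from typing import Any


def extract_string_sketch(api_value: Any) -> str:
    """Collapse an API value (string, list, or None) to a plain string."""
    if api_value is None:
        return ""
    if isinstance(api_value, (list, tuple)):
        # Assumption: the first element is the preferred value.
        return str(api_value[0]) if api_value else ""
    return str(api_value)
```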

fetch_arxiv_data(arxiv_id: str) → Dict | None

Fetch metadata from the arXiv API.

Respects strict rate limiting: 1 request per 3 seconds.
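
One way to enforce a minimum interval between requests, including when several worker threads share the same client, is a small lock-protected limiter. The sketch below is an assumption about how such limiting could be done, not the script's own code; the class name `MinIntervalLimiter` and its `wait` method are hypothetical.

```python
import threading
import time
from typing import Optional


class MinIntervalLimiter:
    """Block callers so consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float = 3.0) -> None:
        self.interval = interval
        self._lock = threading.Lock()
        self._last_call: Optional[float] = None  # monotonic time of the previous call

    def wait(self) -> None:
        # Holding the lock while sleeping serializes all callers.
        with self._lock:
            if self._last_call is not None:
                remaining = self.interval - (time.monotonic() - self._last_call)
                if remaining > 0:
                    time.sleep(remaining)
            self._last_call = time.monotonic()
```

A caller would invoke `wait()` immediately before each HTTP request; the first call returns at once and later calls sleep only for the time still outstanding.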

fetch_crossref_data(doi: str) → Dict | None

Fetch metadata from Crossref API

Parameters:

doi – DOI string

Returns:

Dictionary with metadata or None if not found

fetch_datacite_data(doi: str) → Dict | None

Fetch metadata from DataCite API

Parameters:

doi – DOI string

Returns:

Dictionary with metadata or None if not found

fetch_dblp_data(title: str, author: str | None = None) → Dict | None

Fetch metadata from DBLP API

Parameters:
  • title – Paper title

  • author – First author name (optional)

Returns:

Dictionary with metadata or None if not found

fetch_openalex_data(doi: str | None = None, title: str | None = None) → Dict | None

Fetch metadata from OpenAlex API

Parameters:
  • doi – DOI string

  • title – Title string

Returns:

Dictionary with metadata or None if not found

fetch_pubmed_data(pmid: str) → Dict | None

Fetch metadata from PubMed API via Entrez

Parameters:

pmid – PubMed ID

Returns:

Dictionary with metadata or None if not found

fetch_semantic_scholar_data(title: str, author: str | None = None) → Dict | None

Fetch metadata from Semantic Scholar API

Parameters:
  • title – Paper title

  • author – First author name (optional)

Returns:

Dictionary with metadata or None if not found

fetch_zenodo_data(doi: str) → Dict | None

Fetch metadata from Zenodo API

Parameters:

doi – DOI string

Returns:

Dictionary with metadata or None if not found

filter_entry_fields(entry: Dict) → Dict

Filter entry fields to keep only allowed fields for the entry type

format_author_list(authors: List[str]) → str

Convert author list to BibTeX format

format_crossref_author_list(authors: List[Dict]) → str

Convert Crossref author list to BibTeX format
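
Crossref describes each author as an object with `given` and `family` keys, or a single `name` key for organizations, whereas BibTeX expects names joined by " and ". The conversion might look like the sketch below. The function name `format_crossref_authors_sketch`, the "Family, Given" output style, and the bracing of organization names are assumptions.

```python
from typing import Dict, List


def format_crossref_authors_sketch(authors: List[Dict[str, str]]) -> str:
    """Join Crossref author objects into a BibTeX author string."""
    names = []
    for author in authors:
        family = author.get("family", "").strip()
        given = author.get("given", "").strip()
        if family and given:
            names.append(f"{family}, {given}")
        elif family or given:
            names.append(family or given)
        elif author.get("name"):
            # Organization names are braced so BibTeX treats them as one unit.
            names.append("{" + author["name"].strip() + "}")
    return " and ".join(names)
```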

format_date(date_parts: List[List[int]]) → str | None

Extract year from date-parts
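
In Crossref metadata, `date-parts` is a nested list such as [[2020, 5, 17]], where the first inner element is the year. A minimal sketch of the extraction follows; the function name `year_from_date_parts` is hypothetical, and returning None for empty or null parts is an assumption based on the signature.

```python
from typing import List, Optional


def year_from_date_parts(date_parts: List[List[int]]) -> Optional[str]:
    """Return the year of the first date as a string, or None if unavailable."""
    if date_parts and date_parts[0] and date_parts[0][0] is not None:
        return str(date_parts[0][0])
    return None
```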

generate_report(output_file: str | None = None) → str

Generate a validation report

map_api_type_to_bibtex(api_type: str, source: str = 'crossref') → str

Map API entry type to BibTeX entry type

normalize_doi(doi: str) → str

Normalize DOI format
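
The reference does not state which normalizations are applied. A plausible sketch, assuming the goal is a bare, lowercase DOI without resolver or `doi:` prefixes, is shown below; the function name `normalize_doi_sketch` and every rule in it are assumptions.

```python
import re

# Assumed prefixes to strip: doi.org / dx.doi.org resolver URLs and "doi:".
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi_sketch(doi: str) -> str:
    """Strip whitespace and common prefixes, then lowercase the DOI."""
    # DOIs are case-insensitive, so lowercasing does not change their meaning.
    return _DOI_PREFIX.sub("", doi.strip()).lower()
```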

normalize_entry(entry: BibEntry) → BibEntry

Normalize entry based on BibTeX mode policies:

  • Map BibLaTeX fields to BibTeX

  • Normalize aliases (conference -> inproceedings)

  • Normalize DOI and URL

  • Apply type promotion rules (arXiv -> inproceedings/article)
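
The first two policies can be illustrated with a small sketch. Only the `conference` to `inproceedings` alias comes from the description above; the field mapping (BibLaTeX `journaltitle`, `location`, and `date` to BibTeX `journal`, `address`, and `year`), the rule that existing BibTeX fields take precedence, and the name `normalize_entry_sketch` are assumptions.

```python
from typing import Dict, Tuple

TYPE_ALIASES = {"conference": "inproceedings"}
# Assumed BibLaTeX -> BibTeX field renames.
BIBLATEX_TO_BIBTEX = {"journaltitle": "journal", "location": "address"}


def normalize_entry_sketch(entry_type: str, fields: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return the normalized entry type and a new, normalized field dict."""
    lowered = {name.lower(): value for name, value in fields.items()}
    normalized = {k: v for k, v in lowered.items() if k not in BIBLATEX_TO_BIBTEX}
    for old, new in BIBLATEX_TO_BIBTEX.items():
        if old in lowered:
            # An existing BibTeX field wins over its BibLaTeX counterpart.
            normalized.setdefault(new, lowered[old])
    date = normalized.pop("date", "")
    if len(date) >= 4 and date[:4].isdigit():
        normalized.setdefault("year", date[:4])
    elif date:
        normalized["date"] = date  # keep dates that cannot be parsed
    normalized_type = entry_type.lower()
    return TYPE_ALIASES.get(normalized_type, normalized_type), normalized
```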

normalize_string_for_comparison(s: str, field_name: str = '') → str

Normalize string for comparison according to BibTeX conventions

Normalizations:

  • Remove LaTeX braces { }

  • Remove leading/trailing whitespace

  • Decode HTML entities (&amp; -> &)

  • For title: lowercase for comparison

  • For ISSN: remove hyphens and take the first if multiple (0378-7788, 1476-4687 -> 03787788)

  • For DOI: lowercase for comparison
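
The listed normalizations translate almost directly into code. The sketch below follows the list item by item; the function name `normalize_for_comparison_sketch`, the order of the steps, and the assumption that multiple ISSNs are comma-separated (as in the example above) are not confirmed by the source.

```python
import html


def normalize_for_comparison_sketch(s: str, field_name: str = "") -> str:
    """Apply the documented comparison normalizations to one field value."""
    text = html.unescape(s)                         # "&amp;" becomes "&"
    text = text.replace("{", "").replace("}", "")   # remove LaTeX braces
    text = text.strip()
    field = field_name.lower()
    if field in ("title", "doi"):
        text = text.lower()
    elif field == "issn":
        # Keep only the first ISSN, then drop its hyphen.
        text = text.split(",")[0].strip().replace("-", "")
    return text
```

Two values can then be treated as identical when their normalized forms are equal, even if the raw strings differ in bracing or case.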

reorder_fields()

Sort fields in all entries according to PREFERRED_FIELD_ORDER

save_updated_bib(force=False)

Save updated BibTeX file

search_google_scholar(query: str) → Dict | None

Search Google Scholar for publication information

Parameters:

query – Search query (title + first author)

Returns:

Dictionary with metadata or None

validate_all(show_progress: bool = True, max_workers: int = 30) → List[ValidationResult]

Validate all entries in the BibTeX database

Parameters:
  • show_progress – If True, show progress indicators

  • max_workers – Number of threads for parallel execution
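
Thread-based parallel validation of this kind is commonly built on `concurrent.futures.ThreadPoolExecutor`. The sketch below shows one way to fan entries out to worker threads while keeping results in input order. It is an assumption about the approach, not the script's code: `validate_all_sketch` and the `validate_one` callback are hypothetical, and the 1-based `index` is a guess modeled on the validate_entry(entry, index, total) signature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def validate_all_sketch(
    entries: List[Dict[str, Any]],
    validate_one: Callable[[Dict[str, Any], int, int], Any],
    max_workers: int = 30,
) -> List[Any]:
    """Validate entries in parallel and return results in input order."""
    total = len(entries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(validate_one, entry, index, total)
            for index, entry in enumerate(entries, start=1)
        ]
        # Reading futures in submission order preserves the entry order.
        return [future.result() for future in futures]
```

Threads suit this workload because each validation spends most of its time waiting on network responses rather than computing.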

validate_entry(entry: Dict, index: int = 0, total: int = 0) → ValidationResult

Validate a single BibTeX entry

validate_entry_schema(entry: BibEntry) → List[LintMessage]

Validate entry against schema rules.
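
A schema check of this kind could walk the `required` and `required_any` lists in FIELD_SCHEMA. The sketch below uses a two-type excerpt of that schema and a local stand-in for LintMessage with the same four fields. The function name `lint_required_fields`, the level "error", and the message codes are hypothetical; the real rules and codes may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LintMessage:
    """Local stand-in mirroring the documented LintMessage fields."""
    level: str
    code: str
    message: str
    field: Optional[str] = None


# Two entry types excerpted from FIELD_SCHEMA['types'].
SCHEMA_EXCERPT = {
    "article": {"required": ["author", "title", "journal", "year"]},
    "book": {"required": ["title", "publisher", "year"], "required_any": ["author", "editor"]},
}


def lint_required_fields(entry_type: str, fields: Dict[str, str]) -> List[LintMessage]:
    """Report missing required fields for one entry."""
    rules = SCHEMA_EXCERPT.get(entry_type, {})
    messages = []
    for name in rules.get("required", []):
        if not fields.get(name, "").strip():
            messages.append(
                LintMessage("error", "missing-required", f"Missing required field '{name}'", name)
            )
    alternatives = rules.get("required_any", [])
    if alternatives and not any(fields.get(n, "").strip() for n in alternatives):
        # At least one of the alternatives must be present.
        messages.append(
            LintMessage("error", "missing-required-any", "Need at least one of: " + ", ".join(alternatives))
        )
    return messages
```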

class validate_bibtex.LintMessage(level: str, code: str, message: str, field: str | None = None)

Bases: object

code: str
field: str | None = None
level: str
message: str
class validate_bibtex.ValidationResult(entry_key: str, entry_type: str = 'misc', has_doi: bool = False, doi_valid: bool = False, has_arxiv: bool = False, arxiv_valid: bool = False, arxiv_id: str | None = None, normalized_entry: BibEntry | None = None, lint_messages: List[LintMessage] = <factory>, fields_missing: List[str] = <factory>, fields_updated: Dict[str, Tuple[str, str]] = <factory>, fields_conflict: Dict[str, Tuple[str, str]] = <factory>, fields_identical: Dict[str, str] = <factory>, fields_different: Dict[str, Tuple[str, str]] = <factory>, field_sources: Dict[str, str] = <factory>, all_sources_data: Dict[str, Dict] = <factory>, field_source_options: Dict[str, List[str]] = <factory>, original_values: Dict[str, str] = <factory>, errors: List[str] = <factory>, warnings: List[str] = <factory>)

Bases: object

Stores validation results for a single entry

all_sources_data: Dict[str, Dict]
arxiv_id: str | None = None
arxiv_valid: bool = False
doi_valid: bool = False
entry_key: str
entry_type: str = 'misc'
errors: List[str]
field_source_options: Dict[str, List[str]]
field_sources: Dict[str, str]
fields_conflict: Dict[str, Tuple[str, str]]
fields_different: Dict[str, Tuple[str, str]]
fields_identical: Dict[str, str]
fields_missing: List[str]
fields_updated: Dict[str, Tuple[str, str]]
has_arxiv: bool = False
has_doi: bool = False
lint_messages: List[LintMessage]
normalized_entry: BibEntry | None = None
original_values: Dict[str, str]
warnings: List[str]
validate_bibtex.create_gui_app(validator: BibTeXValidator, results: List[ValidationResult]) → FastAPI

Create FastAPI application for BibTeX validator GUI

Parameters:
  • validator – BibTeXValidator instance

  • results – List of ValidationResult objects

Returns:

FastAPI app instance

validate_bibtex.gui_app_factory()

Factory function for uvicorn reload

validate_bibtex.main()

Main function