A TypeScript package that provides a convenient wrapper around the arXiv API, enabling easy querying and parsing of arXiv papers.
npm install arxiv-api-wrapper
import { getArxivEntries, getArxivEntriesById } from 'arxiv-api-wrapper';
// Search for papers
const result = await getArxivEntries({
  search: {
    title: ['quantum computing'],
    author: ['John Doe'],
  },
  maxResults: 10,
  sortBy: 'submittedDate',
  sortOrder: 'descending',
});
console.log(`Found ${result.feed.totalResults} papers`);
result.entries.forEach(entry => {
  console.log(`${entry.arxivId}: ${entry.title}`);
});
// Or fetch specific papers by ID
const papers = await getArxivEntriesById(['2101.01234', '2101.05678']);
The package also supports the arXiv OAI-PMH endpoint (https://oaipmh.arxiv.org/oai), which is useful for metadata harvesting and bulk access. See the arXiv OAI help and the OAI-PMH v2.0 protocol for details.
import {
  oaiIdentify,
  oaiListRecords,
  oaiListRecordsAsyncIterator,
  oaiGetRecord,
  oaiListSets,
  oaiListIdentifiers,
  oaiListMetadataFormats,
} from 'arxiv-api-wrapper';
// Repository info
const identify = await oaiIdentify();
console.log(identify.repositoryName, identify.protocolVersion);
// One page of records (e.g. Dublin Core)
const result = await oaiListRecords('oai_dc', {
  from: '2024-01-01',
  until: '2024-01-31',
  set: 'math:math:LO', // optional: restrict to a set
  rateLimit: { tokensPerInterval: 1, intervalMs: 1000 },
});
result.records.forEach((rec) => {
  console.log(rec.header.identifier, rec.metadata);
});
if (result.resumptionToken) {
  // Fetch next page with result.resumptionToken.value
}
// Single record by identifier (full or short form)
const record = await oaiGetRecord('cs/0112017', 'oai_dc');
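The "fetch next page" step in the example above can be pictured as a loop that follows resumption tokens until none is returned. The following is a minimal sketch, not this package's API: `Page` and the `fetchPage` callback are hypothetical stand-ins declared here for illustration, and how the real `oaiListRecords` accepts a token is not shown in this README.

```typescript
// Hypothetical page shape: mirrors the `records` / `resumptionToken.value`
// fields used above, but is declared locally for illustration.
interface Page<T> {
  records: T[];
  resumptionToken?: { value: string };
}

// Follows resumption tokens until the server stops returning one.
// `fetchPage` is an assumed callback that wraps the real request.
async function fetchAllPages<T>(
  fetchPage: (token?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    all.push(...page.records);
    // In OAI-PMH, an absent or empty token marks the final page.
    token = page.resumptionToken?.value || undefined;
  } while (token !== undefined);
  return all;
}
```

The async iterators and *All helpers offered by the package cover the same ground, so a hand-written loop like this is mainly useful when you need control over each individual page.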
For a middle ground between manual page-by-page pagination and the *All helpers (oaiListRecordsAll and its siblings), use the async iterators:
for await (const rec of oaiListRecordsAsyncIterator('oai_dc', {
  from: '2024-01-01',
  until: '2024-01-02',
  maxRecords: 50,
})) {
  console.log(rec.header.identifier);
}
If you omit maxRecords (or maxHeaders / maxSets on the corresponding iterators), iteration continues until the server has no more results to return.
The oaiListRecordsAll / oaiListIdentifiersAll / oaiListSetsAll helpers are convenience wrappers that collect from the corresponding async iterators.
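To illustrate what "collecting from an async iterator" amounts to, here is a minimal sketch of a generic collector. The `collect` name and its `max` parameter are assumptions made for this example; the package's own *All helpers may be implemented differently.

```typescript
// Drains an async iterable into an array, stopping early at `max` items.
async function collect<T>(
  source: AsyncIterable<T>,
  max: number = Infinity,
): Promise<T[]> {
  const out: T[] = [];
  if (max <= 0) return out;
  for await (const item of source) {
    out.push(item);
    // Breaking out of for-await also closes the underlying iterator.
    if (out.length >= max) break;
  }
  return out;
}
```

Because everything ends up in one array, this style suits bounded queries; for large harvests, consuming the iterator directly keeps memory use flat.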
Async iterators keep continuation token metadata in memory while paging. If a token includes an expirationDate and that time has passed, iterators fail fast locally with OaiError (code: 'badResumptionToken') before attempting another request.
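The local expiry check described above might look roughly like the following sketch. `TokenSketch` and `OaiErrorSketch` are stand-ins declared here for illustration; the package exports its own OaiResumptionToken and OaiError, whose exact shapes this README does not spell out.

```typescript
// Local stand-in for a resumption token with an optional expiry.
interface TokenSketch {
  value: string;
  expirationDate?: string; // UTC timestamp, as OAI-PMH specifies
}

// Local stand-in for the package's OaiError.
class OaiErrorSketch extends Error {
  readonly code: string;
  constructor(code: string, messageText: string) {
    super(messageText);
    this.name = 'OaiErrorSketch';
    this.code = code;
  }
}

// Throws before any request is made if the token has already expired.
function assertTokenFresh(token: TokenSketch, now: Date = new Date()): void {
  if (token.expirationDate === undefined) return; // no expiry advertised
  const expiresAt = Date.parse(token.expirationDate);
  if (!Number.isNaN(expiresAt) && expiresAt <= now.getTime()) {
    throw new OaiErrorSketch('badResumptionToken', 'resumption token expired');
  }
}
```

Failing locally saves a round trip that the server would reject anyway, and it surfaces the same error code a caller would already have to handle.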
All OAI functions accept the optional timeoutMs, retries, userAgent, and rateLimit settings, with the same meaning as in the Atom API functions. OAI errors (e.g. idDoesNotExist) are thrown as OaiError with a code and messageText. The one exception is noRecordsMatch, which is treated as “no results”: the wrapper returns an empty list (empty records or headers) instead of throwing, so oaiListRecords and oaiListIdentifiers always return their normal result shape.
Differences from OAI-PMH: The underlying arXiv OAI server returns an error response when a list request matches no records. This wrapper normalises that to an empty list so callers can assume a consistent result type without handling noRecordsMatch as an exception.
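The normalisation described above can be sketched as a small mapping step. `RawListResponse` is a hypothetical, simplified view of a parsed server response, introduced only for this example; it is not a type exported by the package.

```typescript
// Hypothetical, simplified view of a parsed OAI-PMH list response:
// either an error element or a list of records.
type RawListResponse<T> =
  | { kind: 'error'; code: string; messageText: string }
  | { kind: 'ok'; records: T[] };

// Maps noRecordsMatch to an empty result; rethrows every other error code.
function normaliseListResponse<T>(raw: RawListResponse<T>): { records: T[] } {
  if (raw.kind === 'ok') return { records: raw.records };
  if (raw.code === 'noRecordsMatch') return { records: [] };
  throw new Error(`${raw.code}: ${raw.messageText}`);
}
```

With this shape, calling code can loop over `records` unconditionally and reserve try/catch for genuine failures such as badArgument.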
For complete API documentation with detailed type information and examples, see the generated API documentation.
getArxivEntriesById(ids: string[], options?): Promise<ArxivQueryResult>
Simpler function to fetch arXiv papers by their IDs using the id_list API mode.
Parameters:
ids: string[] - Array of arXiv paper IDs (e.g., ['2101.01234', '2101.05678'])
options?: object - Optional request configuration
  rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
  retries?: number - Number of retry attempts (default: 3)
  timeoutMs?: number - Request timeout in milliseconds (default: 10000)
  userAgent?: string - Custom User-Agent header
Returns: Same as getArxivEntries - see return type below.
getArxivEntries(options: ArxivQueryOptions): Promise<ArxivQueryResult>
Main function to query the arXiv API with search filters or ID lists.
Options:
idList?: string[] - List of arXiv IDs to fetch (e.g., ['2101.01234', '2101.05678'])
search?: ArxivSearchFilters - Search filters (when used with idList, filters the entries from idList to only return those matching the search query)
start?: number - Pagination offset (0-based)
maxResults?: number - Maximum number of results (≤ 300)
sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate' - Sort field
sortOrder?: 'ascending' | 'descending' - Sort direction
timeoutMs?: number - Request timeout in milliseconds (default: 10000)
retries?: number - Number of retry attempts (default: 3)
rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
userAgent?: string - Custom User-Agent header
Search Filters:
title?: string[] - Search in titles
author?: string[] - Search by author names
abstract?: string[] - Search in abstracts
category?: string[] - Filter by arXiv categories
submittedDateRange?: { from: string, to: string } - Date range filter (YYYYMMDDTTTT format)
or?: ArxivSearchFilters[] - OR group of filters
andNot?: ArxivSearchFilters - Negated filter (ANDNOT)
Returns:
{
  feed: {
    id: string;
    updated: string;
    title: string;
    link: string;
    totalResults: number;
    startIndex: number;
    itemsPerPage: number;
  };
  entries: Array<{
    id: string;
    arxivId: string;
    title: string;
    summary: string;
    published: string;
    updated: string;
    authors: Array<{ name: string; affiliation?: string }>;
    categories: string[];
    primaryCategory?: string;
    links: Array<{ href: string; rel?: string; type?: string; title?: string }>;
    doi?: string;
    journalRef?: string;
    comment?: string;
  }>;
}
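For readers unfamiliar with arXiv's query language, the sketch below shows how search filters of this kind could map onto an arXiv search_query string, using the field prefixes (ti, au, abs, cat) and boolean operators (AND, OR, ANDNOT) from the arXiv API. It is an illustration under assumptions, not the package's implementation: `FiltersSketch` and `toSearchQuery` are hypothetical names, date ranges are left out, and combining multiple terms with AND is a guess about the intended semantics.

```typescript
// Local, simplified filter shape for illustration (a subset of the
// filters documented above; not the package's own type).
interface FiltersSketch {
  title?: string[];
  author?: string[];
  abstract?: string[];
  category?: string[];
  or?: FiltersSketch[];
  andNot?: FiltersSketch;
}

// arXiv field prefixes: ti (title), au (author), abs (abstract), cat (category).
const PREFIX = { title: 'ti', author: 'au', abstract: 'abs', category: 'cat' } as const;

function toSearchQuery(f: FiltersSketch): string {
  const parts: string[] = [];
  for (const key of Object.keys(PREFIX) as Array<keyof typeof PREFIX>) {
    for (const term of f[key] ?? []) {
      // Quote multi-word terms so they are matched as a phrase.
      const value = /\s/.test(term) ? `"${term}"` : term;
      parts.push(`${PREFIX[key]}:${value}`);
    }
  }
  if (f.or && f.or.length > 0) {
    // Parenthesise compound alternatives to keep operator grouping explicit.
    const alternatives = f.or
      .map(toSearchQuery)
      .map((q) => (q.includes(' ') ? `(${q})` : q));
    parts.push(`(${alternatives.join(' OR ')})`);
  }
  const query = parts.join(' AND ');
  if (f.andNot) {
    if (query === '') {
      throw new Error('andNot needs at least one positive filter');
    }
    return `${query} ANDNOT (${toSearchQuery(f.andNot)})`;
  }
  return query;
}
```

For example, `toSearchQuery({ title: ['quantum computing'], author: ['John Doe'] })` produces `ti:"quantum computing" AND au:"John Doe"`.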
// Search by title and author
const result = await getArxivEntries({
  search: {
    title: ['machine learning'],
    author: ['Geoffrey Hinton'],
  },
  maxResults: 5,
});
Using the simpler getArxivEntriesById function:
const result = await getArxivEntriesById(['2101.01234', '2101.05678']);
Or using getArxivEntries:
const result = await getArxivEntries({
  idList: ['2101.01234', '2101.05678'],
});
// Match "quantum" in the title OR the abstract, within a submission date range
const result = await getArxivEntries({
  search: {
    or: [
      { title: ['quantum'] },
      { abstract: ['quantum'] },
    ],
    submittedDateRange: {
      from: '202301010600',
      to: '202401010600',
    },
  },
  sortBy: 'submittedDate',
  sortOrder: 'descending',
});
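The date strings in the example above use arXiv's YYYYMMDDTTTT format, where TTTT is the 24-hour time to the minute in GMT. A small helper can produce them from a Date; `toArxivTimestamp` is a hypothetical name for this sketch and is not exported by the package.

```typescript
// Formats a Date as YYYYMMDDTTTT (TTTT = 24-hour HHMM), using UTC fields.
function toArxivTimestamp(date: Date): string {
  const pad = (n: number, width: number = 2): string =>
    String(n).padStart(width, '0');
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is 0-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes())
  );
}
```

For instance, `toArxivTimestamp(new Date(Date.UTC(2023, 0, 1, 6, 0)))` returns `'202301010600'`, the `from` value used above.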
// Fetch by ID with rate limiting and a longer timeout
const result = await getArxivEntriesById(
  ['2101.01234', '2101.05678'],
  {
    rateLimit: {
      tokensPerInterval: 1,
      intervalMs: 3000, // 1 request per 3 seconds
    },
    timeoutMs: 15000,
  }
);
// Search with rate limiting
const result = await getArxivEntries({
  search: { title: ['neural networks'] },
  rateLimit: {
    tokensPerInterval: 1,
    intervalMs: 3000, // 1 request per 3 seconds
  },
});
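One way to read a `{ tokensPerInterval, intervalMs }` setting is as a budget of requests that refills at the start of each interval. The sketch below implements that reading as a fixed-interval variant of a token bucket, with an injectable clock so it can be exercised without waiting. It is an assumption about the semantics, not the package's actual limiter; `RateLimitSketch` and `IntervalLimiter` are hypothetical names.

```typescript
// Local copy of the config shape shown above, for illustration.
interface RateLimitSketch {
  tokensPerInterval: number;
  intervalMs: number;
}

// Allows up to `tokensPerInterval` requests per `intervalMs` window.
class IntervalLimiter {
  private tokens: number;
  private windowStart: number;
  private readonly config: RateLimitSketch;
  private readonly now: () => number;

  constructor(config: RateLimitSketch, now: () => number = Date.now) {
    this.config = config;
    this.now = now;
    this.tokens = config.tokensPerInterval;
    this.windowStart = now();
  }

  // Returns 0 if a request may go out now, else the milliseconds to wait.
  tryAcquire(): number {
    const t = this.now();
    if (t - this.windowStart >= this.config.intervalMs) {
      // A new interval has started: refill the budget.
      this.tokens = this.config.tokensPerInterval;
      this.windowStart = t;
    }
    if (this.tokens > 0) {
      this.tokens -= 1;
      return 0;
    }
    return this.config.intervalMs - (t - this.windowStart);
  }
}
```

With `{ tokensPerInterval: 1, intervalMs: 3000 }`, the first call is allowed immediately and a second call made right afterwards is told to wait the full 3000 ms, which matches the "1 request per 3 seconds" comment in the examples above.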
To generate browsable API documentation from the source code:
npm run docs:generate
This will create HTML documentation in the docs/ directory. You can then view it locally:
npm run docs:serve
The generated documentation includes:
All exported functions and types include JSDoc comments for enhanced IDE IntelliSense support. Hover over any exported symbol in your IDE to see inline documentation.
All types are exported from the package:
import type {
  ArxivQueryOptions,
  ArxivQueryResult,
  ArxivSearchFilters,
  ArxivEntry,
  ArxivFeedMeta,
  ArxivAuthor,
  ArxivLink,
  ArxivSortBy,
  ArxivSortOrder,
  ArxivRateLimitConfig,
  ArxivDateRange,
  // OAI-PMH types
  OaiIdentifyResponse,
  OaiRecord,
  OaiHeader,
  OaiSet,
  OaiMetadataFormat,
  OaiResumptionToken,
  OaiListRecordsResult,
  OaiListIdentifiersResult,
  OaiListSetsResult,
  OaiRequestOptions,
  OaiListOptions,
  OaiErrorCode,
  OaiError,
} from 'arxiv-api-wrapper';
ISC
Vilhelm Agdur