arxiv-api-wrapper - v2.1.2

    arxiv-api-wrapper

    A TypeScript package that provides a convenient wrapper around the arXiv API, enabling easy querying and parsing of arXiv papers.

    npm install arxiv-api-wrapper
    
    import { getArxivEntries, getArxivEntriesById } from 'arxiv-api-wrapper';

    // Search for papers
    const result = await getArxivEntries({
      search: {
        title: ['quantum computing'],
        author: ['John Doe'],
      },
      maxResults: 10,
      sortBy: 'submittedDate',
      sortOrder: 'descending',
    });

    console.log(`Found ${result.feed.totalResults} papers`);
    result.entries.forEach(entry => {
      console.log(`${entry.arxivId}: ${entry.title}`);
    });

    // Or fetch specific papers by ID
    const papers = await getArxivEntriesById(['2101.01234', '2101.05678']);

    Features:

    • Type-safe: Full TypeScript support with comprehensive type definitions
    • Flexible Search: Support for complex queries with multiple filters, OR groups, and negation
    • Rate Limiting: Built-in token bucket rate limiter to respect arXiv API guidelines
    • Retry Logic: Automatic retries with exponential backoff for transient failures
    • Pagination: Support for paginated results with configurable page size
    • Sorting: Multiple sort options (relevance, submission date, last updated)
    • OAI-PMH: Support for the arXiv Open Archives Initiative interface (Identify, ListSets, GetRecord, ListRecords, ListIdentifiers, ListMetadataFormats)
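
    To make the token-bucket idea concrete, here is a minimal standalone sketch of what a rateLimit: { tokensPerInterval, intervalMs } setting describes. It is not the package's implementation: the class name TokenBucket, the continuous refill strategy and the injectable clock are assumptions chosen for illustration.

```typescript
// Minimal token-bucket sketch (hypothetical; not arxiv-api-wrapper's internals).
// The bucket holds at most tokensPerInterval tokens and refills continuously
// at tokensPerInterval tokens per intervalMs milliseconds.
interface RateLimitConfig {
  tokensPerInterval: number;
  intervalMs: number;
}

class TokenBucket {
  private readonly config: RateLimitConfig;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  // The clock is injectable so the bucket can be tested without real waiting.
  constructor(config: RateLimitConfig, now: () => number = () => Date.now()) {
    this.config = config;
    this.now = now;
    this.tokens = config.tokensPerInterval; // start with a full bucket
    this.lastRefill = now();
  }

  // Credit tokens for the time elapsed since the last refill, capped at capacity.
  private refill(): void {
    const t = this.now();
    const { tokensPerInterval, intervalMs } = this.config;
    const earned = ((t - this.lastRefill) * tokensPerInterval) / intervalMs;
    this.tokens = Math.min(tokensPerInterval, this.tokens + earned);
    this.lastRefill = t;
  }

  // Take one token and return 0 if one is available; otherwise return the
  // number of milliseconds until a whole token will have accumulated.
  tryTake(): number {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    const { tokensPerInterval, intervalMs } = this.config;
    return Math.ceil(((1 - this.tokens) * intervalMs) / tokensPerInterval);
  }

  // Wait (via setTimeout) until a token is available, then consume it.
  async take(): Promise<void> {
    for (;;) {
      const waitMs = this.tryTake();
      if (waitMs === 0) return;
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

    In this sketch, take() would be awaited before each HTTP request. With { tokensPerInterval: 1, intervalMs: 3000 } it allows at most one request every three seconds, the pace arXiv asks API clients to keep.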

    The package also supports the arXiv OAI-PMH endpoint (https://oaipmh.arxiv.org/oai), which is useful for metadata harvesting and bulk access. See the arXiv OAI help and the OAI-PMH v2.0 protocol for details.

    import {
      oaiIdentify,
      oaiListRecords,
      oaiListRecordsAsyncIterator,
      oaiGetRecord,
      oaiListSets,
      oaiListIdentifiers,
      oaiListMetadataFormats,
    } from 'arxiv-api-wrapper';

    // Repository info
    const identify = await oaiIdentify();
    console.log(identify.repositoryName, identify.protocolVersion);

    // One page of records (e.g. Dublin Core)
    const result = await oaiListRecords('oai_dc', {
      from: '2024-01-01',
      until: '2024-01-31',
      set: 'math:math:LO', // optional: restrict to a set
      rateLimit: { tokensPerInterval: 1, intervalMs: 1000 },
    });
    result.records.forEach((rec) => {
      console.log(rec.header.identifier, rec.metadata);
    });
    if (result.resumptionToken) {
      // Fetch next page with result.resumptionToken.value
    }

    // Single record by identifier (full or short form)
    const record = await oaiGetRecord('cs/0112017', 'oai_dc');
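
    Since oaiGetRecord accepts either form of identifier, it helps to know how they relate: arXiv's full OAI-PMH identifiers are the short arXiv ID prefixed with oai:arXiv.org:, as in oai:arXiv.org:cs/0112017. The hypothetical helpers below convert between the two forms; they are an illustration, not functions exported by the package.

```typescript
// Hypothetical helpers (not part of arxiv-api-wrapper's documented API):
// convert between short arXiv IDs and full OAI-PMH identifiers.
const OAI_PREFIX = "oai:arXiv.org:";

function toOaiIdentifier(id: string): string {
  // Already in full form: leave it unchanged.
  return id.startsWith(OAI_PREFIX) ? id : OAI_PREFIX + id;
}

function toShortArxivId(identifier: string): string {
  return identifier.startsWith(OAI_PREFIX)
    ? identifier.slice(OAI_PREFIX.length)
    : identifier;
}

console.log(toOaiIdentifier("cs/0112017")); // oai:arXiv.org:cs/0112017
console.log(toShortArxivId("oai:arXiv.org:2101.01234")); // 2101.01234
```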

    As a middle ground between manual page-by-page pagination and the collect-everything helpers (oaiListRecordsAll, oaiListIdentifiersAll, oaiListSetsAll), use the async iterators:

    for await (const rec of oaiListRecordsAsyncIterator('oai_dc', {
      from: '2024-01-01',
      until: '2024-01-02',
      maxRecords: 50,
    })) {
      console.log(rec.header.identifier);
    }

    If you omit maxRecords (or maxHeaders / maxSets on the corresponding iterators), iteration continues until the server has returned the complete list.

    The oaiListRecordsAll / oaiListIdentifiersAll / oaiListSetsAll helpers are convenience wrappers that collect from the corresponding async iterators.
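
    Conceptually, such an iterator requests one page, yields its items, and follows the resumption token to the next page until the server stops returning one (OAI-PMH marks the last page of a list with an empty resumptionToken) or the optional cap is reached; a collect-all helper then simply drains the iterator into an array. The sketch below illustrates that pattern with an injected page-fetching function. The Page type, the fetchPage parameter and the names iteratePages and collectAll are assumptions, not the package's real signatures.

```typescript
// Hypothetical shapes for illustration; the real package types differ.
interface Page<T> {
  items: T[];
  resumptionToken?: string; // missing or empty on the last page
}

type FetchPage<T> = (token?: string) => Promise<Page<T>>;

// Yield items across pages, following resumption tokens until the server
// stops returning a non-empty one or `max` items have been produced.
async function* iteratePages<T>(
  fetchPage: FetchPage<T>,
  max = Infinity,
): AsyncGenerator<T> {
  let token: string | undefined;
  let produced = 0;
  do {
    const page = await fetchPage(token);
    for (const item of page.items) {
      if (produced >= max) return; // cap reached mid-page
      yield item;
      produced++;
    }
    token = page.resumptionToken;
  } while (token && produced < max); // empty string ends the list
}

// Drain any async iterable into an array, like a collect-all helper.
async function collectAll<T>(iter: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of iter) out.push(item);
  return out;
}
```

    Because the cap is checked before each fetch, a capped iteration stops requesting pages as soon as it has enough items, which keeps request counts (and rate-limit waits) down.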

    While paging, the async iterators keep the current resumption token's metadata in memory. If that token carries an expirationDate that has already passed, the iterator fails fast locally with an OaiError (code: 'badResumptionToken') instead of sending another request.
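
    A local expiry check of this kind can be sketched as follows. The token shape (a value plus an optional ISO 8601 expirationDate) follows the OAI-PMH resumptionToken element; SketchOaiError is a stand-in for the package's OaiError, and assertTokenUsable is a hypothetical name.

```typescript
// Stand-in error type for this sketch; the package exports its own OaiError.
class SketchOaiError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "SketchOaiError";
    this.code = code;
  }
}

interface ResumptionToken {
  value: string;
  expirationDate?: string; // ISO 8601 UTC timestamp, e.g. "2024-01-01T12:00:00Z"
}

// Throw before making a request if the token is known to have expired.
function assertTokenUsable(token: ResumptionToken, now: Date = new Date()): void {
  if (token.expirationDate === undefined) return; // no expiry advertised
  const expiresAt = Date.parse(token.expirationDate);
  // Unparseable dates are ignored here; the server remains the final judge.
  if (!Number.isNaN(expiresAt) && expiresAt <= now.getTime()) {
    throw new SketchOaiError(
      "badResumptionToken",
      `Resumption token expired at ${token.expirationDate}`,
    );
  }
}
```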

    All OAI functions accept optional timeoutMs, retries, userAgent, and rateLimit (same as the Atom API). Other OAI errors (e.g. idDoesNotExist) are thrown as OaiError with a code and messageText. noRecordsMatch is treated as “no results”: the wrapper returns an empty list (empty records or headers) instead of throwing, so you always get a normal result shape from oaiListRecords and oaiListIdentifiers.

    Differences from OAI-PMH: The underlying arXiv OAI server returns an error response when a list request matches no records. This wrapper normalises that to an empty list so callers can assume a consistent result type without handling noRecordsMatch as an exception.
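
    The normalisation boils down to catching one specific error code and substituting an empty result. The sketch below assumes only that errors carry a string code field, as the OaiError description above suggests; the helper name emptyOnNoRecordsMatch and the ListResult shape are illustrative, not the package's API.

```typescript
// Minimal error shape assumed for this sketch (the package's OaiError
// also carries a code; its exact fields may differ).
interface CodedError {
  code: string;
}

function isCodedError(err: unknown): err is CodedError {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { code?: unknown }).code === "string"
  );
}

interface ListResult<T> {
  items: T[];
  resumptionToken?: string;
}

// Run a list request; map the OAI-PMH noRecordsMatch error to an empty
// list and rethrow every other error unchanged.
async function emptyOnNoRecordsMatch<T>(
  request: () => Promise<ListResult<T>>,
): Promise<ListResult<T>> {
  try {
    return await request();
  } catch (err) {
    if (isCodedError(err) && err.code === "noRecordsMatch") {
      return { items: [] };
    }
    throw err;
  }
}
```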

    For complete API documentation with detailed type information and examples, see the generated API documentation.

    getArxivEntriesById(ids, options?): a simpler function for fetching arXiv papers by ID, using the arXiv API's id_list mode.

    Parameters:

    • ids: string[] - Array of arXiv paper IDs (e.g., ['2101.01234', '2101.05678'])
    • options?: object - Optional request configuration
      • rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
      • retries?: number - Number of retry attempts (default: 3)
      • timeoutMs?: number - Request timeout in milliseconds (default: 10000)
      • userAgent?: string - Custom User-Agent header
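
    The retries and timeoutMs options pair naturally: each attempt gets a deadline, and failed attempts are retried after an exponentially growing delay. The sketch below shows that pattern around an arbitrary async operation. The helper names withRetries and withTimeout, the baseDelayMs parameter and the doubling schedule are assumptions for illustration; the package's actual retry policy (which errors count as transient, what delays it uses) may differ.

```typescript
// Hypothetical helpers: retry an async operation with exponential backoff,
// giving each attempt its own timeout. Not arxiv-api-wrapper's internals.
interface RetryOptions {
  retries: number; // extra attempts after the first one
  timeoutMs: number; // per-attempt timeout
  baseDelayMs: number; // delay before the first retry, doubled each time
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reject if the promise does not settle within ms milliseconds.
// This only stops waiting; it does not cancel the underlying work.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function withRetries<T>(
  attempt: (attemptNumber: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let i = 0; i <= opts.retries; i++) {
    try {
      return await withTimeout(attempt(i), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (i < opts.retries) {
        // Exponential backoff: base, 2 * base, 4 * base, ...
        await sleep(opts.baseDelayMs * 2 ** i);
      }
    }
  }
  throw lastError;
}
```

    For fetch-based requests, an AbortController would be the usual way to actually abort a timed-out attempt rather than merely ignoring it.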

    Returns: Same as getArxivEntries - see return type below.

    getArxivEntries(options): the main function for querying the arXiv API with search filters or ID lists.

    Options:

    • idList?: string[] - List of arXiv IDs to fetch (e.g., ['2101.01234', '2101.05678'])
    • search?: ArxivSearchFilters - Search filters (when combined with idList, only the listed entries that also match the search query are returned)
    • start?: number - Pagination offset (0-based)
    • maxResults?: number - Maximum number of results (≤ 300)
    • sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate' - Sort field
    • sortOrder?: 'ascending' | 'descending' - Sort direction
    • timeoutMs?: number - Request timeout in milliseconds (default: 10000)
    • retries?: number - Number of retry attempts (default: 3)
    • rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
    • userAgent?: string - Custom User-Agent header
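
    The start and maxResults options, combined with feed.totalResults from a first response, are enough to page through a larger result set. The sketch below only computes the sequence of request windows; the helper name pageWindows is hypothetical, and pageSize should stay within the documented maxResults limit of 300.

```typescript
// Hypothetical helper: split a result set of `total` items into
// start/maxResults windows of at most `pageSize` items each.
interface PageWindow {
  start: number; // 0-based offset, as in the start option
  maxResults: number; // page size for this request
}

function pageWindows(total: number, pageSize: number): PageWindow[] {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const windows: PageWindow[] = [];
  for (let start = 0; start < total; start += pageSize) {
    // The last window may be smaller than pageSize.
    windows.push({ start, maxResults: Math.min(pageSize, total - start) });
  }
  return windows;
}

// 250 results in pages of 100: windows start at 0, 100 and 200.
const windows = pageWindows(250, 100);
```

    Each window would then be passed to getArxivEntries as { start, maxResults } alongside the same search, ideally with a rateLimit so consecutive pages are spaced out.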

    Search Filters:

    • title?: string[] - Search in titles
    • author?: string[] - Search by author names
    • abstract?: string[] - Search in abstracts
    • category?: string[] - Filter by arXiv categories
    • submittedDateRange?: { from: string, to: string } - Submission date range, with both bounds in YYYYMMDDTTTT format (TTTT is the 24-hour time to the minute, in GMT)
    • or?: ArxivSearchFilters[] - OR group of filters
    • andNot?: ArxivSearchFilters - Negated filter (ANDNOT)
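
    To make the filter semantics concrete: the arXiv API expresses searches as a search_query string built from field prefixes (ti, au, abs, cat), Boolean operators (AND, OR, ANDNOT) and a submittedDate:[from TO to] range. The sketch below shows one plausible mapping from an ArxivSearchFilters-like object to such a string. The SearchFilters type is a local re-declaration, and quoting multi-word terms as phrases and ANDing all top-level clauses are assumptions; the package's actual query construction may differ.

```typescript
// Local re-declaration for the sketch; mirrors the documented filter fields.
interface SearchFilters {
  title?: string[];
  author?: string[];
  abstract?: string[];
  category?: string[];
  submittedDateRange?: { from: string; to: string }; // YYYYMMDDTTTT (GMT)
  or?: SearchFilters[];
  andNot?: SearchFilters;
}

type TextField = "title" | "author" | "abstract" | "category";

// arXiv search_query field prefixes for each text filter.
const FIELD_PREFIXES: Array<[TextField, string]> = [
  ["title", "ti"],
  ["author", "au"],
  ["abstract", "abs"],
  ["category", "cat"],
];

// Quote multi-word terms so they are matched as phrases.
const term = (prefix: string, value: string): string =>
  value.includes(" ") ? `${prefix}:"${value}"` : `${prefix}:${value}`;

// Hypothetical mapping to an (unencoded) arXiv search_query string.
function buildSearchQuery(f: SearchFilters): string {
  const clauses: string[] = [];
  for (const [key, prefix] of FIELD_PREFIXES) {
    for (const v of f[key] ?? []) clauses.push(term(prefix, v));
  }
  if (f.submittedDateRange) {
    const { from, to } = f.submittedDateRange;
    clauses.push(`submittedDate:[${from} TO ${to}]`);
  }
  if (f.or && f.or.length > 0) {
    // Each alternative is parenthesised, then the whole group is grouped.
    const alternatives = f.or.map((g) => `(${buildSearchQuery(g)})`);
    clauses.push(`(${alternatives.join(" OR ")})`);
  }
  let query = clauses.join(" AND ");
  if (f.andNot) query += ` ANDNOT (${buildSearchQuery(f.andNot)})`;
  return query;
}
```

    The resulting string would still need to be URL-encoded before being sent as the search_query parameter.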

    Returns:

    {
      feed: {
        id: string;
        updated: string;
        title: string;
        link: string;
        totalResults: number;
        startIndex: number;
        itemsPerPage: number;
      };
      entries: Array<{
        id: string;
        arxivId: string;
        title: string;
        summary: string;
        published: string;
        updated: string;
        authors: Array<{ name: string; affiliation?: string }>;
        categories: string[];
        primaryCategory?: string;
        links: Array<{ href: string; rel?: string; type?: string; title?: string }>;
        doi?: string;
        journalRef?: string;
        comment?: string;
      }>;
    }
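
    Each entry's links array carries the links reported in arXiv's Atom feed, where the PDF link is typically the one titled "pdf" with type application/pdf. A helper such as the hypothetical pdfUrl below can pick it out; the Link and Entry types are trimmed local copies of the shape documented above, and the fallback to any application/pdf link is an assumption.

```typescript
// Trimmed local copies of the documented entry shape.
interface Link {
  href: string;
  rel?: string;
  type?: string;
  title?: string;
}

interface Entry {
  arxivId: string;
  title: string;
  links: Link[];
}

// Hypothetical helper: prefer the link titled "pdf", fall back to any
// link served as application/pdf, otherwise return undefined.
function pdfUrl(entry: Entry): string | undefined {
  const byTitle = entry.links.find((l) => l.title === "pdf");
  const byType = entry.links.find((l) => l.type === "application/pdf");
  return (byTitle ?? byType)?.href;
}
```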

    Searching by title and author:

    const result = await getArxivEntries({
      search: {
        title: ['machine learning'],
        author: ['Geoffrey Hinton'],
      },
      maxResults: 5,
    });

    Fetching specific papers by ID, using the simpler getArxivEntriesById function:

    const result = await getArxivEntriesById(['2101.01234', '2101.05678']);
    

    Or using getArxivEntries:

    const result = await getArxivEntries({
      idList: ['2101.01234', '2101.05678'],
    });

    Combining an OR group with a submission date range, newest first:

    const result = await getArxivEntries({
      search: {
        or: [
          { title: ['quantum'] },
          { abstract: ['quantum'] },
        ],
        submittedDateRange: {
          from: '202301010600',
          to: '202401010600',
        },
      },
      sortBy: 'submittedDate',
      sortOrder: 'descending',
    });
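
    The submittedDateRange bounds use arXiv's YYYYMMDDTTTT format, where TTTT is a 24-hour time to the minute in GMT. If you start from JavaScript Date objects, a small formatter such as the hypothetical toArxivDate below produces those strings; it reads the UTC fields so the result does not depend on the local time zone.

```typescript
// Hypothetical helper: format a Date as YYYYMMDDHHMM in UTC (GMT),
// the form expected for submittedDateRange bounds.
function toArxivDate(d: Date): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return (
    pad(d.getUTCFullYear(), 4) +
    pad(d.getUTCMonth() + 1) + // getUTCMonth is 0-based
    pad(d.getUTCDate()) +
    pad(d.getUTCHours()) +
    pad(d.getUTCMinutes())
  );
}

// Reproduces the literal bounds used in the example above.
const range = {
  from: toArxivDate(new Date(Date.UTC(2023, 0, 1, 6, 0))), // "202301010600"
  to: toArxivDate(new Date(Date.UTC(2024, 0, 1, 6, 0))), // "202401010600"
};
```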

    Custom rate limiting with getArxivEntriesById:

    const result = await getArxivEntriesById(
      ['2101.01234', '2101.05678'],
      {
        rateLimit: {
          tokensPerInterval: 1,
          intervalMs: 3000, // 1 request per 3 seconds
        },
        timeoutMs: 15000,
      }
    );

    The same rate limit with getArxivEntries:

    const result = await getArxivEntries({
      search: { title: ['neural networks'] },
      rateLimit: {
        tokensPerInterval: 1,
        intervalMs: 3000, // 1 request per 3 seconds
      },
    });

    To generate browsable API documentation from the source code:

    npm run docs:generate
    

    This will create HTML documentation in the docs/ directory. You can then view it locally:

    npm run docs:serve
    

    The generated documentation includes:

    • Complete API reference for all exported functions and types
    • Detailed parameter descriptions and examples
    • Type information and relationships
    • Search functionality

    All exported functions and types include JSDoc comments for enhanced IDE IntelliSense support. Hover over any exported symbol in your IDE to see inline documentation.

    All types are exported from the package:

    import type {
      ArxivQueryOptions,
      ArxivQueryResult,
      ArxivSearchFilters,
      ArxivEntry,
      ArxivFeedMeta,
      ArxivAuthor,
      ArxivLink,
      ArxivSortBy,
      ArxivSortOrder,
      ArxivRateLimitConfig,
      ArxivDateRange,
      // OAI-PMH types
      OaiIdentifyResponse,
      OaiRecord,
      OaiHeader,
      OaiSet,
      OaiMetadataFormat,
      OaiResumptionToken,
      OaiListRecordsResult,
      OaiListIdentifiersResult,
      OaiListSetsResult,
      OaiRequestOptions,
      OaiListOptions,
      OaiErrorCode,
      OaiError,
    } from 'arxiv-api-wrapper';

    License: ISC

    Author: Vilhelm Agdur

    Repository: https://github.com/vagdur/arxiv-api-wrapper