The Architecture Reference

Api rest · APIs & Communication · Intermediate

Pagination and Filtering

Design collections to evolve: wrap arrays in an object, paginate with an opaque next link, filter with a standard expression language, and never smuggle SQL in a query parameter.

📚 Analogy

A good collection endpoint is a library catalogue, not a forklift dumping every book at your feet. You ask for a shelf at a time (pagination), narrow by subject (filtering), and the catalogue hands you a card pointing to the next drawer (the next link) — without you needing to know how the stacks are organised behind the desk.

Design collections to evolve

The single most important collection decision is made on day one. A raw array returned from GET /attendees cannot evolve: the moment you need pagination you must convert the array into an object to add a next link, and that is a breaking change. So nest the array inside an object from the start:

{
  "value": [ { "id": 1, "displayName": "Jim" } ],
  "@nextLink": "/attendees?page=2"
}

Pagination returns a partial result plus instructions for fetching the next set. Keep the @nextLink opaque — consumers follow it, they do not construct it — so you can change the underlying paging mechanism freely.
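
One way to keep the link opaque is to pack the paging state into a token that only the server decodes. The sketch below is illustrative only: the cursor query parameter, the helper names, and the offset stored inside the token are all assumptions, not something the source prescribes.

```python
import base64
import json
from typing import Optional


def encode_cursor(state: dict) -> str:
    """Serialise paging state into a URL-safe token that clients treat as opaque."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> dict:
    """Reverse encode_cursor; only the server ever calls this."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def build_page(items: list, token: Optional[str] = None, page_size: int = 2) -> dict:
    """Return one page wrapped in an object, adding @nextLink only if more remain."""
    # Assumption: the token carries a simple offset. Because clients never
    # inspect it, it could later carry a key-based position instead.
    start = decode_cursor(token)["offset"] if token else 0
    end = start + page_size
    body = {"value": items[start:end]}
    if end < len(items):
        body["@nextLink"] = "/attendees?cursor=" + encode_cursor({"offset": end})
    return body
```

A consumer simply requests whatever @nextLink says until the field is absent, so swapping the token's contents would not break it.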

Standard list navigation

The Cookbook generalises this into a fixed set of navigation actions for any list: List, First, Previous, Next, Last, Select, Exit, Home. Returned pages include only the links that apply — the first page has no previous, and last is often omitted because it is too costly for long lists.

graph LR
List["list / collection"] --> P1["Page 1<br/>first, next, exit"]
P1 --> P2["Page 2<br/>first, previous, next"]
P2 --> Pn["Last page<br/>first, previous, exit"]
P1 -. "select item" .-> Item["/customers/aq1sw2de3"]
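
The rule that a page carries only the links that apply can be sketched as a small function. The link names follow the list above; the function name, the /customers base path, and the page-numbered hrefs are hypothetical, and a real API would normally emit opaque hrefs. The exit and home links are left out for brevity.

```python
def page_links(page: int, has_more: bool, base: str = "/customers") -> dict:
    """Build navigation links for one page, including only those that apply."""
    links = {"first": base}
    if page > 1:
        # The first page has no previous link.
        links["previous"] = f"{base}?page={page - 1}"
    if has_more:
        # The last page has no next link.
        links["next"] = f"{base}?page={page + 1}"
    # "last" is deliberately omitted: finding it means counting the whole list.
    return links
```

With this sketch, page_links(1, True) contains first and next only, while page_links(3, False) contains first and previous.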

Assume thousands of members

Design as if every collection holds thousands of rows: compute pages lazily, return a summary plus a ‘select’ detail link rather than full records, set a default page size (~50) via client preferences, and avoid arbitrary ‘jump to page N’ — it forces expensive offset scans.
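
A minimal sketch of these habits, assuming hypothetical helper names and a hypothetical upper bound of 200 on the page size (the source only suggests a default of about 50): pages are produced lazily from an iterator, and each row is reduced to a summary with a select link.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional

DEFAULT_PAGE_SIZE = 50   # default suggested in the text
MAX_PAGE_SIZE = 200      # assumption: a server-side cap, not from the source


def resolve_page_size(requested: Optional[int]) -> int:
    """Use the client's preference if given, clamped to a sane range."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    return max(1, min(requested, MAX_PAGE_SIZE))


def summarise(record: dict) -> dict:
    """Return a summary plus a 'select' link instead of the full record."""
    return {
        "id": record["id"],
        "displayName": record["displayName"],
        "select": f"/customers/{record['id']}",
    }


def lazy_pages(records: Iterable[dict], page_size: int) -> Iterator[list]:
    """Yield pages one at a time; nothing beyond the current page is read."""
    source = iter(records)
    while True:
        page = [summarise(r) for r in islice(source, page_size)]
        if not page:
            return
        yield page
```

Because lazy_pages pulls from an iterator, a caller that stops after the first page never touches the remaining rows.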

Filtering

Beyond paging, large collections need filtering. Two complementary approaches appear across the books:

  • A standard expression language. The Microsoft REST API Guidelines borrow from the OData standard: GET /attendees?$filter=displayName eq 'Jim'. You need not implement all of it up front, but designing to the standard lets the API grow without breaking consumers.
  • Simple contains-AND via the query string. Implement ?name=value where = means contains/in and & means AND, so ?ID=3e&NAME=Mi returns rows matching both. This is often the only information-retrieval query language (IRQL) an API needs. Richer needs (more operators) call for Lucene or SQL exposed deliberately as a media type, not invented ad hoc.
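
The contains-AND approach is small enough to sketch in a few lines. The function name and the sample rows are hypothetical, and the sketch assumes case-sensitive substring matching and rejection of unknown field names; the source fixes neither choice.

```python
from urllib.parse import parse_qsl


def filter_rows(rows: list, query_string: str, fields: set) -> list:
    """Apply ?name=value pairs where '=' means contains and '&' means AND."""
    criteria = parse_qsl(query_string.lstrip("?"))
    unknown = [name for name, _ in criteria if name not in fields]
    if unknown:
        # Assumption: an unknown property is a client error, not an empty result.
        raise ValueError(f"unknown filter field(s): {unknown}")
    return [
        row for row in rows
        if all(value in str(row[name]) for name, value in criteria)
    ]


rows = [
    {"ID": "3e4r", "NAME": "Mike"},
    {"ID": "3e9x", "NAME": "Jim"},
    {"ID": "aq1s", "NAME": "Mia"},
]
matches = filter_rows(rows, "?ID=3e&NAME=Mi", {"ID", "NAME"})
```

Here only the first row satisfies both conditions, since the second lacks "Mi" in NAME and the third lacks "3e" in ID.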

Status codes for queries

Apply normal HTTP semantics so consumers can write consistent handling:

  • A well-formed filter with no matches → 200 OK with an empty collection.
  • A single-resource lookup whose target does not exist → 404.
  • An invalid client request (e.g. a non-existent property) → 4xx.
  • A valid request the backend cannot fulfil (timeout) → 5xx.

graph TD
R["Query request"] --> V{"Request well-formed?"}
V -->|"no, bad property"| C4["4xx Client Error"]
V -->|"yes"| K{"Kind?"}
K -->|"filter, no matches"| Empty["200 OK, empty collection"]
K -->|"single resource missing"| NF["404 Not Found"]
K -->|"backend timeout"| C5["5xx Server Error"]
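
The decision flow above can be sketched as two handlers that return a status and a body. All names here are hypothetical, and the specific codes chosen within each class (400 for the bad property, 504 for the timeout) are assumptions; the source only specifies 4xx and 5xx.

```python
from http import HTTPStatus
from typing import Callable, Tuple

KNOWN_FIELDS = {"id", "displayName"}  # hypothetical filterable properties


def query_collection(field: str, fetch: Callable[[], list]) -> Tuple[HTTPStatus, dict]:
    """Filter a collection: bad property -> 4xx, timeout -> 5xx, otherwise 200."""
    if field not in KNOWN_FIELDS:
        return HTTPStatus.BAD_REQUEST, {"error": f"unknown property: {field}"}
    try:
        rows = fetch()
    except TimeoutError:
        return HTTPStatus.GATEWAY_TIMEOUT, {"error": "backend timed out"}
    # An empty list is still a successful answer to a well-formed filter.
    return HTTPStatus.OK, {"value": rows}


def get_item(item_id: str, store: dict) -> Tuple[HTTPStatus, dict]:
    """Single-resource lookup: a missing target is 404, not an empty 200."""
    if item_id not in store:
        return HTTPStatus.NOT_FOUND, {"error": f"no such resource: {item_id}"}
    return HTTPStatus.OK, store[item_id]
```

Keeping the empty-collection case on the 200 path means consumers need no special handling for "no results".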

Never smuggle a query into ?q=

Accepting a full query string in ?q= or ?query= is an anti-pattern: it hits URL length limits, mangles reserved characters and encoding, invites SQL injection, and tightly couples consumers to your engine. Expose named queries, a standard filter expression, or a query media type instead — and don’t invent a custom query syntax.
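
Named queries, one of the alternatives mentioned above, can be sketched as a server-side allowlist. The query names, the attendee fields, and the function name are all hypothetical; the point is that the client sends only a name, never query text.

```python
from typing import Callable, Dict

# Hypothetical named queries: each name maps to a predicate the server owns.
NAMED_QUERIES: Dict[str, Callable[[dict], bool]] = {
    "vip": lambda attendee: attendee.get("tier") == "vip",
    "unconfirmed": lambda attendee: not attendee.get("confirmed", False),
}


def run_named_query(name: str, rows: list) -> list:
    """Run a query chosen by name; anything not on the allowlist is rejected."""
    predicate = NAMED_QUERIES.get(name)
    if predicate is None:
        raise ValueError(f"unknown named query: {name!r}")
    return [row for row in rows if predicate(row)]
```

Since no client-supplied text is ever parsed as a query, there is nothing to inject into, and the engine behind each name can change without affecting consumers.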

Turn expensive queries into cacheable resources

For complex or replayed queries, create the query as a resource via PUT, then GET its URL to execute and return results — which makes the result cacheable and avoids long query strings. See caching for how ETags and Cache-Control apply.
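
The query-as-resource pattern might look like the in-memory sketch below. The QueryStore class, the /queries/ path, the contains-style criteria, and the hash-derived ETag are all assumptions made for illustration, not a prescribed design.

```python
import hashlib
import json
from typing import Optional, Tuple


class QueryStore:
    """Stores query definitions (PUT) and executes them on demand (GET)."""

    def __init__(self, rows: list) -> None:
        self._rows = rows
        self._queries: dict = {}

    def put(self, query_id: str, criteria: dict) -> str:
        """Create or replace the query resource and return its URL."""
        self._queries[query_id] = dict(criteria)
        return f"/queries/{query_id}"

    def get(self, query_id: str, if_none_match: Optional[str] = None) -> Tuple[int, str, Optional[str]]:
        """Execute the stored query; return (status, etag, body)."""
        criteria = self._queries[query_id]
        result = [
            row for row in self._rows
            if all(str(v) in str(row.get(k, "")) for k, v in criteria.items())
        ]
        body = json.dumps({"value": result}, sort_keys=True)
        # Assumption: the ETag is derived from a hash of the response body.
        etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'
        if if_none_match == etag:
            return 304, etag, None  # client's cached copy is still valid
        return 200, etag, body
```

A client would PUT the criteria once, then GET the returned URL repeatedly, sending the previous ETag to receive a 304 when nothing has changed. This sketch still re-runs the query on each GET; a real cache would sit in front of it.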

When to use it — and when not

✅ Reach for it when

  • A collection can grow large and must be returned in manageable chunks.
  • Consumers need to narrow results by field values without fetching everything.
  • You want list navigation that survives growth to thousands of members.

⛔ Think twice when

  • The collection is tiny and bounded, and will never need paging or filtering.
  • You are tempted to expose a raw query engine or accept SQL in ?q= (a security and coupling trap).

Check your understanding

1. Why wrap a collection in an object instead of returning a bare array?

Converting a top-level array to an object later breaks consumers; nesting the array in an object from the start (value + @nextLink) keeps it evolvable.

2. What does the @nextLink in a paginated response represent?

Pagination returns a partial result plus an opaque instruction (a next link) for fetching the following set.

3. How should you implement a simple contains-AND filter via the query string?

The Cookbook implements '=' as contains and '&' as AND — often the only IRQL needed; richer needs go to Lucene/SQL exposed as a media type, not smuggled into ?q=.

4. When a well-formed filter matches no records, what status is correct?

A valid filter with no matches is 200 + empty array; a single-resource lookup that does not exist is 404; an invalid request is 4xx.
