Enforce Data Integrity: Master your JSON with JSON Schema
Struggling with inconsistent JSON data? JSON Schema offers a powerful, standardized approach to validate, structure, and streamline your data management.
JSON is everywhere, and the popularity of this flexible, ubiquitous, and somewhat verbose data exchange format shows little sign of slowing. But with simplicity and flexibility comes risk - how do we know that the JSON we’re exchanging has the right “shape”, and how do we communicate and validate that “shape”?
The answer lies with JSON Schema.
What is JSON Schema?
JSON Schema is a specification for defining what JSON can look like: a blueprint describing the shape of your data and, crucially, the relationships within it.
Its simplicity is its key - it is itself “just” JSON, and libraries and tools supporting its use are available in many languages.
A quick glance at some of the implementations available for JSON Schema shows immediately just how portable it is.
And it’s easy to get started. A basic schema might define a user object with a required name and an optional age (an integer with a minimum value of 18).
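A minimal sketch of such a user schema could look like the following. The `$schema` draft (2020-12) and the `title` are assumptions; only the `name` and `age` rules come from the description above.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "User",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer", "minimum": 18 }
  },
  "required": ["name"]
}
```

With this schema, `{ "name": "Ada" }` and `{ "name": "Ada", "age": 36 }` are valid, while `{ "age": 36 }` (no name) and `{ "name": "Ada", "age": 17 }` (age below the minimum) are rejected.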
This simple JSON, defining the shape of our user data, can be used to validate user input, whether in the application or in the infrastructure, for example with AWS API Gateway Models or MongoDB.
It can also be used for creating TypeScript types, generating forms and sharing API documentation dynamically.
By describing our data this way we get improved data consistency, better error handling, faster, cheaper rejections of incorrect data, and quicker, easier implementation, because the docs you share are the docs you use.
An aside: Zod et al
Zod is a very popular, powerful JSON validation library for TypeScript.
JSON Schema, however, offers two main advantages over Zod and similar libraries:
- It is a portable, established standard: it can be used in non-TypeScript environments, and form part of your living documentation.
- It can express much more complex relationships and restrictions on and between data.
However, it doesn’t need to be an either/or situation - in many cases you can of course convert JSON Schema to Zod and vice-versa.
Complex schemas: Limits, Lists and patterns
A slightly more complex schema might include:
- default values for AutoScaleLimit and LogFileRetention
- a regular expression and string length limits for LogFileName.
- a range of numbers for AutoScaleLimit.
- a list of valid entries for LogFileRetention.
- a required property (AutoScaleLimit).
- no non-defined properties allowed (“additionalProperties”: false).
- a “dependentRequired” property - if LogFileName is given, LogFileRetention is required.
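The list above can be sketched as a single schema. The property names come from the list; the specific ranges, pattern, length limits and retention values are illustrative assumptions.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "AutoScaleLimit": {
      "type": "integer",
      "minimum": 1,
      "maximum": 10,
      "default": 2
    },
    "LogFileName": {
      "type": "string",
      "pattern": "^[a-z0-9_-]+\\.log$",
      "minLength": 5,
      "maxLength": 64
    },
    "LogFileRetention": {
      "enum": ["1 day", "7 days", "30 days"],
      "default": "7 days"
    }
  },
  "required": ["AutoScaleLimit"],
  "dependentRequired": {
    "LogFileName": ["LogFileRetention"]
  },
  "additionalProperties": false
}
```

Here `{ "AutoScaleLimit": 3 }` is valid, but `{ "AutoScaleLimit": 3, "LogFileName": "app.log" }` is not, because LogFileRetention becomes required once LogFileName is present. Note that `default` is an annotation: it documents the fallback value but does not by itself change the validation result.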
SubSchemas: dependencies and conditions
As we saw above, an object can have a “required” property, listing which properties in the data are required.
A more powerful feature is schema composition - we can validate data against one or more subschemas using these keywords:
- allOf: Must be valid against all subschemas (AND).
- anyOf: Must be valid against any of the subschemas (OR).
- oneOf: Must be valid against exactly one of the subschemas (XOR).
- not: Must not be valid against the given subschema.
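As a rough illustration of the four composition keywords, the sketch below applies each to a property. The property names (`port`, `id`, `batchSize`) and their limits are hypothetical, chosen only to show the behaviour.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "port": {
      "type": "integer",
      "allOf": [{ "minimum": 1 }, { "maximum": 65535 }],
      "not": { "const": 22 }
    },
    "id": {
      "anyOf": [
        { "type": "string", "minLength": 1 },
        { "type": "integer", "minimum": 1 }
      ]
    },
    "batchSize": {
      "oneOf": [
        { "type": "integer", "multipleOf": 3 },
        { "type": "integer", "multipleOf": 5 }
      ]
    }
  }
}
```

A `port` of 8080 passes, while 22 (excluded by `not`) and 70000 (fails one branch of `allOf`) do not. An `id` may be a non-empty string or a positive integer. A `batchSize` of 9 or 10 passes, but 15 fails because it matches both `oneOf` branches, and 7 fails because it matches neither.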
We can also have conditional subschemas with if/then/else: if the data matches a subschema, then it must match a second one, else (optionally) it must match another.
dependentSchemas builds on the dependentRequired keyword we saw above, to specify an entire subschema that is to be applied if a property is present.
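A small sketch of dependentSchemas, reusing the LogFileName and LogFileRetention properties mentioned earlier; the retention values are assumptions. When LogFileName is present, the nested subschema both requires LogFileRetention and narrows the values it may take.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "LogFileName": { "type": "string" },
    "LogFileRetention": { "enum": ["1 day", "7 days", "30 days"] }
  },
  "dependentSchemas": {
    "LogFileName": {
      "required": ["LogFileRetention"],
      "properties": {
        "LogFileRetention": { "enum": ["7 days", "30 days"] }
      }
    }
  }
}
```

`{ "LogFileRetention": "1 day" }` is valid on its own, but `{ "LogFileName": "app.log", "LogFileRetention": "1 day" }` is rejected, since the dependent subschema only allows the longer retention values once a log file is named.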
For example, “if/then” and “allOf” can be combined to force CPU size types to fit the allowed memory size subschemas and the required fields.
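One way this could be written is sketched below. The property names (`cpu`, `memory`) and the permitted size combinations are hypothetical; the pattern of wrapping several if/then pairs in `allOf` is the part that matters.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "cpu": { "enum": ["small", "large"] },
    "memory": { "type": "integer" }
  },
  "required": ["cpu", "memory"],
  "allOf": [
    {
      "if": {
        "properties": { "cpu": { "const": "small" } },
        "required": ["cpu"]
      },
      "then": {
        "properties": { "memory": { "enum": [512, 1024] } }
      }
    },
    {
      "if": {
        "properties": { "cpu": { "const": "large" } },
        "required": ["cpu"]
      },
      "then": {
        "properties": { "memory": { "enum": [2048, 4096] } }
      }
    }
  ]
}
```

`{ "cpu": "small", "memory": 1024 }` is valid, while `{ "cpu": "small", "memory": 4096 }` is not. Including `required` inside each `if` is a defensive habit: without it, an object with no `cpu` property would match the `if` vacuously and trigger the `then` branch.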
For more on schema composition, refer to the docs.
Referencing other schemas
We like our code to be DRY, and schemas are no exception.
By giving our schemas an $id property, we can reference them elsewhere later.
Using the $defs keyword (in previous versions of the spec, this was definitions), we can define a number of schemas that can be referenced from within the same schema, even recursively.
For example, we can define an “address” schema in $defs, then use it twice in our schema for the user’s home and other addresses.
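A sketch of that idea follows. The `$id` URL and the individual address fields are assumptions made for illustration.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/user",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "homeAddress": { "$ref": "#/$defs/address" },
    "otherAddresses": {
      "type": "array",
      "items": { "$ref": "#/$defs/address" }
    }
  },
  "required": ["name", "homeAddress"],
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "postcode": { "type": "string" }
      },
      "required": ["street", "city"]
    }
  }
}
```

The address rules live in one place, so tightening them (say, adding a postcode pattern) updates both the home address and every entry in the list of other addresses.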
We’re not limited to referencing schemas in the same document, however.
If we treat the $id as the base URI of our schema, we can reference other schemas relative to it (for example, in the directory structure, or as a URL).
However, how these schemas are fetched varies from implementation to implementation and is not part of the specification. It’s important to remember that schema URIs are primarily identifiers, not necessarily locations to download them from.
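To illustrate relative references, the sketch below assumes a hypothetical base URI of `https://example.com/schemas/customer` and a separate address schema identified as `https://example.com/schemas/address`.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "shippingAddress": { "$ref": "address" },
    "billingAddress": { "$ref": "/schemas/address" }
  }
}
```

Both references resolve against the `$id` to `https://example.com/schemas/address`: the first replaces the last path segment of the base URI, the second replaces the whole path. Whether a validator can actually locate that schema depends on how it was loaded or registered with the tool in use.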
Read more about referencing other schemas in the docs.
Note: tools like AJV for TypeScript can also be used to dereference a schema. Dereferencing essentially means replacing all the references with the actual schema. This can sometimes be necessary, as not all tools support references, or the schemas may reference internal URIs that are not publicly accessible. Bear in mind that not all schemas can be dereferenced, as references can be recursive or circular.
Advanced arrays and objects
Arrays can have their lengths constrained, uniqueness enforced, and items defined, all of which we can of course combine with keywords such as oneOf to allow arrays with differing item types.
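A sketch of an array schema combining these constraints; the specific limits and item types are illustrative assumptions.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "minItems": 1,
  "maxItems": 5,
  "uniqueItems": true,
  "items": {
    "oneOf": [
      { "type": "string", "minLength": 1 },
      { "type": "integer", "minimum": 0 }
    ]
  }
}
```

`["a", 1]` is valid, whereas `[]` fails `minItems`, `["a", "a"]` fails `uniqueItems`, and `[1.5]` fails because 1.5 is neither a string nor an integer.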
Another keyword for arrays to be aware of is contains - this allows us to validate an array if at least one item matches a given schema.
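A minimal sketch of contains, using a hypothetical list of role names that must include "admin":

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": { "type": "string" },
  "contains": { "const": "admin" }
}
```

`["user", "admin"]` is valid, while `["user"]` and `[]` are not, since neither holds an item matching the contains subschema. The related minContains and maxContains keywords can adjust how many matching items are expected.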
We’ve also seen basic objects above, and how their shape, required properties and dependencies can be defined.
And we alluded to additional properties when mentioning “additionalProperties”: false to prevent them. The corollary is that, by default, we can add additional properties to objects.
To this end, propertyNames allows us to enforce a pattern on property names, while patternProperties allows us to apply specific schemas to properties whose names match different patterns. minProperties and maxProperties allow us to restrict the total number of properties in an object.
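These keywords might be combined as in the sketch below. The naming convention (`num_` and `is_` prefixes) and the property count limits are assumptions made up for the illustration.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "propertyNames": { "pattern": "^[a-z][a-zA-Z0-9_]*$" },
  "patternProperties": {
    "^num_": { "type": "number" },
    "^is_": { "type": "boolean" }
  },
  "minProperties": 1,
  "maxProperties": 10
}
```

`{ "num_retries": 3, "is_active": true }` is valid. `{ "num_retries": "3" }` fails the `^num_` pattern schema, `{ "Bad-Name": 1 }` fails propertyNames, and `{}` fails minProperties.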
Conclusion
JSON Schema is not just a powerful tool for ensuring data consistency and integrity—it’s a standardized solution that integrates seamlessly with a wide array of technologies.
By adopting JSON Schema, you can enhance error handling, reduce development time, and create robust, self-validating documentation that evolves alongside your codebase. Whether you’re validating API requests, defining database schemas, or generating TypeScript types, JSON Schema provides a versatile and reliable framework for managing your JSON data.
Next time you’re facing the challenge of validating or structuring your JSON data, consider implementing JSON Schema. Its benefits in terms of consistency, efficiency, and maintainability make it an invaluable asset for any developer. Start exploring the numerous libraries and tools available, and see how JSON Schema can transform your approach to data management.
References & Resources:
- AJV - a comprehensive TypeScript library for JSON Schema validation and dereferencing
- json-schema.org - the home of JSON Schema
- A list of implementations in various languages
- The playground for react-jsonschema-form - dynamic forms from JSON Schema, and useful for visualizing your schema
- In-browser validator - validate your schema and data against it
About James Babington
A cloud architect and engineer with a wealth of experience across AWS, web development, and security, James enjoys writing about the technical challenges and solutions he's encountered, but most of all he loves it when a plan comes together and it all just works.