Data Caterer is a metadata-driven data generation and testing tool that helps create production-like data across both batch and event data systems. Run data validations to ensure your systems have ingested the data as expected, then clean it up afterwards.
Simplify your data testing
Take away the pain and complexity of your data landscape and let Data Caterer handle it
Data testing is difficult and fragmented
- Data sent via messages, HTTP requests or files, then stored in databases, file systems, etc.
- Maintaining and updating tests with the latest schemas and business definitions
- Different testing tools for services, jobs or data sources
- Complex relationships between datasets and fields
- Different scenarios, permutations, combinations and edge cases to cover
Current solutions only cover half the story
- Specific testing frameworks that support only one or a limited number of data sources or transport protocols
- Underutilizing metadata from data catalogs or metadata discovery services
- Testing teams having difficulty understanding when failures occur
- Integration tests relying on external teams/services
- Manually generating data, or worse, copying/masking production data into lower environments
- Observability pushes towards being reactive rather than proactive
What you need is a reliable tool that can handle changes to your data landscape
With Data Caterer, you get:
- Ability to connect to any type of data source: files, SQL or NoSQL databases, messaging systems, HTTP
- Discover metadata from your existing infrastructure and services
- Gain confidence that bugs do not propagate to production
- Be proactive in ensuring changes do not affect other data producers or consumers
- Configurability to run the way you want
Tech Summary
Use the Java or Scala API, or YAML files, for setup and customisation; everything runs via a Docker image. Want to get into the details? Check out the setup pages here for code examples and guides that take you through scenarios and data sources.
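For example, a minimal Scala plan class might look like the sketch below. It assumes the documented Scala API (the PlanRun base class and the csv, field, count and execute builders) and the io.github.datacatering package name; exact method and package names can differ between versions, so treat this as an illustrative sketch rather than a definitive example.

```scala
import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.DoubleType

// Sketch of a data generation plan, assuming the documented Scala API
// (PlanRun, csv, field, count, execute); names may differ between versions.
class MyDataGenerationPlan extends PlanRun {

  // CSV data source with a few generated fields
  val accountTask = csv("accounts", "/opt/app/data/accounts", Map("header" -> "true"))
    .fields(
      field.name("account_id").regex("ACC[0-9]{8}"),      // pattern-based IDs
      field.name("name").expression("#{Name.name}"),      // Faker-style expression
      field.name("balance").`type`(DoubleType).min(0).max(10000)
    )
    .count(count.records(1000))

  // Run the plan; the class is picked up and executed via the Docker image
  execute(accountTask)
}
```

The same plan can equally be expressed in Java or YAML and run through the Docker image.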
Main features include:
- Metadata discovery
- Batch and event data generation
- Maintain referential integrity across any dataset (see the sketch after this list)
- Create custom data generation scenarios
- Clean up generated data
- Validate data
- Suggest data validations
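To illustrate how several of these features combine, the sketch below generates two related datasets, keeps them referentially consistent, attaches a validation and cleans up afterwards. The builder names used here (plan.addForeignKeyRelationship, validation.field, configuration.enableDeleteGeneratedRecords) are assumptions based on the project's documented Scala API and may differ between versions.

```scala
import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.DoubleType

// Sketch combining referential integrity, validation and clean-up,
// assuming the documented Scala API; builder names may vary by version.
class AccountsAndTransactionsPlan extends PlanRun {

  val accounts = csv("accounts", "/opt/app/data/accounts", Map("header" -> "true"))
    .fields(field.name("account_id").regex("ACC[0-9]{8}"))
    .count(count.records(200))

  val transactions = csv("transactions", "/opt/app/data/transactions", Map("header" -> "true"))
    .fields(
      field.name("account_id"),
      field.name("amount").`type`(DoubleType).min(1).max(1000)
    )
    .count(count.records(1000))
    .validations(validation.field("amount").greaterThan(0)) // checked after ingestion

  // Foreign key: every transactions.account_id comes from accounts.account_id
  val relationships = plan.addForeignKeyRelationship(
    accounts, List("account_id"),
    List(transactions -> List("account_id"))
  )

  // Remove the generated records once validation has finished
  val config = configuration.enableDeleteGeneratedRecords(true)

  execute(relationships, config, accounts, transactions)
}
```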
Check other run configurations here.
What is it
- Data generation and testing tool: Generate production-like data to be consumed and validated.
- Designed for any data source: We aim to support pushing data to any data source, in any format.
- Low/no code solution: Use the tool via Scala, Java or YAML. Connect to data or metadata sources to generate and validate data.
- Developer productivity tool: Whether you are a new developer or a seasoned veteran, cut down your feedback loop when developing with data.
What it is not
- Metadata storage/platform: You could store and use metadata within the data generation/validation tasks, but this is not the recommended approach. Rather, this metadata should be gathered from existing services that handle metadata on behalf of Data Caterer.
- Data contract: Data Caterer focuses on data generation and testing, which can include details about what the data looks like and how it behaves. It does not encompass the additional metadata that comes with a data contract, such as SLAs, security, etc.
- Metrics from load testing: Although millions of records can be generated, there are limited capabilities for capturing metrics.
Data Catering vs Other tools vs In-house
|  | Data Catering | Other tools | In-house |
|---|---|---|---|
| Data flow | Batch and event generation with validation | Batch generation only or validation only | Depends on architecture and design |
| Time to results | 1 day | 1+ month to integrate, deploy and onboard | 1+ month to build and deploy |
| Solution | Connect with your existing data ecosystem, automatic generation and validation | Manual UI data entry or via SDK | Depends on engineer(s) building it |