Guides
Below is a list of guides you can follow to create data generation for your use case.
For any of the paid tier guides, you can use the trial version of the app to try it out. Details on how to get the trial can be found here.
Scenarios
- First Data Generation - If you are new, this is the place to start
- Multiple Records Per Column Value - Generate multiple records per set of column values
- Foreign Keys Across Data Sources - Generate matching values across generated data sets
- Data Validations - Run data validations after generating data
- Auto Generate From Data Connection - Automatically generate data just by defining data sources
- Delete Generated Data - Delete the generated data whilst leaving other data
- Generate Batch and Event Data - Generate matching batch and event data
Data Sources
- Files (CSV, JSON, ORC, Parquet) - Generate data for popular file formats
- Postgres - JDBC Postgres tables
- Cassandra - Cassandra tables
- Kafka - Kafka topics
- Solace - Solace messages
- Marquez - Generate data based on metadata in Marquez
- OpenMetadata - Generate data based on metadata in OpenMetadata
- HTTP - HTTP requests
- Files (Fixed width) - (Soon to document) A variant of CSV but with no separator
- MySQL - (Soon to document) JDBC MySQL tables
YAML Files
Base Concept
The execution of the data generator is based on the concept of plans and tasks. A plan represents the set of tasks that need to be executed, along with other information that spans across tasks, such as foreign keys between data sources. A task represents the component(s) of a data source and its associated metadata so that it understands what the data should look like and how many steps (sub data sources) there are (i.e. tables in a database, topics in Kafka). Tasks can define one or more steps.
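As a minimal sketch of how a plan ties tasks to data sources (the task names, data source names, and exact key names below are illustrative assumptions, not the definitive schema; see the sample plans for the real structure):

```yaml
# Hypothetical plan: runs two tasks against two data sources.
# Key names here are illustrative only.
name: "customer_create_plan"
description: "Create customer data in Postgres and Cassandra"
tasks:
  - name: "postgres_customer_accounts"   # matches a task definition by name
    dataSourceName: "postgres"
  - name: "cassandra_customer_status"
    dataSourceName: "cassandra"
```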
Plan
Foreign Keys
Define foreign keys across data sources in your plan to ensure generated data can match.

- Link to associated task 1
- Link to associated task 2
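As an illustrative sketch (the option names and the "dataSource.step.column" reference style are assumptions, not confirmed syntax), a foreign key in the plan might link a column in one data source to columns in others so their generated values match:

```yaml
# Hypothetical foreign key definition inside a plan.
# References are illustrative; check the sample plans for exact syntax.
sinkOptions:
  foreignKeys:
    "postgres.accounts.account_id":             # source column
      - "cassandra.account_status.account_id"   # generated values will match the source
```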
Task
| Data Source Type | Data Source | Sample Task | Notes |
|---|---|---|---|
| Database | Postgres | Sample | |
| Database | MySQL | Sample | |
| Database | Cassandra | Sample | |
| File | CSV | Sample | |
| File | JSON | Sample | Contains nested schemas and use of SQL for generated values |
| File | Parquet | Sample | Partition by year column |
| Kafka | Kafka | Sample | Specific base schema to be used, define headers, key, value, etc. |
| JMS | Solace | Sample | JSON formatted message |
| HTTP | PUT | Sample | JSON formatted PUT body |
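To give a feel for the shape of a task (again a hedged sketch; the exact keys come from the sample tasks linked above), a CSV task with a single step might look like:

```yaml
# Hypothetical CSV task with one step; key names are illustrative.
name: "csv_accounts_task"
steps:
  - name: "accounts"          # one step per sub data source (file, table, topic)
    type: "csv"
    options:
      path: "/tmp/generated/accounts.csv"
    schema:
      fields:
        - name: "account_id"
          type: "string"
        - name: "created_date"
          type: "date"
```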
Configuration
Docker-compose
To see how it runs against different data sources, you can run it using docker-compose and set DATA_SOURCE as shown below:
```shell
./gradlew build
cd docker
DATA_SOURCE=postgres docker-compose up -d datacaterer
```
You can set DATA_SOURCE to one of the following:
- postgres
- mysql
- cassandra
- solace
- kafka
- http
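For example, to run against Kafka instead of Postgres: `DATA_SOURCE=kafka docker-compose up -d datacaterer`.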