Roadmap
- Support for other data sources
- AWS, GCP and Azure related data services (cloud storage)
- Deltalake
- RabbitMQ
- ActiveMQ
- MongoDB
- Elasticsearch
- Snowflake
- Databricks
- Pulsar
- Further support for metadata discovery
- HTTP (OpenAPI spec)
- JMS
- Read from samples
- API for developers and testers
- Scala
- Java
- UI Portal for metadata and data generation
- Metadata stored in database
- Store data generation/validation run information in file/database
- Report for data generated and validation rules
- Integration with existing metadata services
- Populate metadata back to metadata services
- OpenLineage metadata (Marquez)
- OpenMetadata
- ODCS (Open Data Contract Standard)
- Amundsen
- Datahub
- Solace Event Portal
- Airflow
- DBT
- Integration with existing data validations
- Suggest data validations
- Data dictionary
- Business definitions of fields that can be referenced for metadata across all data sources
- Verification rules after data generation
- Validation waiting conditions
- Webhook
- File exists
- Data exists via SQL expression
- Pause
- Extend validation types
- Aggregates (sum of amount per account is > 500)
- Ordering (transactions are ordered by date)
- Relationship (at least one account entry in history table per account in accounts table)
- Data profile (how closely the generated data profile matches the expected data profile)
- Extend count
- Cover all possible cases (e.g. a record for each combination of oneOf values, positive/negative values, etc.)
- Similar to edge cases
- Ability to override edge cases
- Alerting
- Slack
- Overriding tasks
- Customise tasks without copying whole schema definitions, making scenarios easier to create
- Gradle plugin
- Metadata improvements
- PII detection (can integrate with Presidio)
- Relationship detection across data sources
- SQL generation
- Ordering information
- Code generation
- Schema generation from Scala/Java class
- Ordering within data sources that support order for insertion
- Further data cleanup
- Clean up data in consumer data sinks
- Clean up data from real-time sources (e.g. DELETE HTTP endpoint, delete events in JMS)
- Trial app to try out all features
- HTTP response data validation