Back to selected work
WellnessTech

Automated ETL Validation & Data Quality

Built a Python-based data quality framework automating validation across Snowflake, BigQuery, and Oracle.

Aspire IT Services — WeightWatchers

3DB Environments Covered
100%Automated ETL Checks
6BI Migrations Validated

The Problem

WeightWatchers ran analytics and reporting across a multi-cloud data stack — Snowflake, BigQuery, and Oracle — feeding business dashboards and user-facing reporting features. ETL pipelines moved data between these environments, but there were no automated checks to verify correctness after each run. Schema drift, duplicate records, and reconciliation mismatches were only discovered reactively — often after bad data had already surfaced in reports. The team was also mid-migration from DSS Reporting to Looker Studio, with no validation framework to confirm the migrated dashboards matched source truth. I was brought in to build the validation infrastructure from scratch.

Data Validation Pipeline Architecture

Snowflake Analytics warehouse BigQuery GCP data platform Oracle DB Operational source ETL Pipeline Extract → Transform → Load Python automated Validation Engine Schema checks Row reconciliation Duplicate detection Python · SQL ✓ Data Valid Pipeline output confirmed ✗ Issues Found Drift / dupes → flagged Looker Studio Migrated from DSS Dashboards validated

Wrote Python-based ETL validation scripts that ran automated checks across Snowflake, BigQuery, and Oracle after every pipeline execution: schema drift detection, row count reconciliation between source and destination, duplicate record identification, and field-level data type checks. For the BI migration from DSS Reporting to Looker Studio, validated that every dashboard metric matched its source dataset — ensuring the migration produced no regressions in reported figures.

Business Impact

  • 3 database environments covered — Snowflake, BigQuery, and Oracle — under a single automated validation framework
  • Shifted ETL quality assurance from reactive (catch issues in reports) to proactive (catch failures at the pipeline boundary)
  • Schema drift, duplicate records, and reconciliation mismatches caught before reaching analytics dashboards or end-user reports
  • DSS → Looker Studio BI migration validated end-to-end — confirmed dashboard metrics matched source data after full migration
Stack: PythonSQLBigQuerySnowflake Oracle DBMongoDBLooker StudioGit