Project Portfolio

End-to-end analytics projects solving real healthcare business problems — from raw data to executive dashboards and measurable outcomes.

$2M+

Revenue Leakage Found

18%

Denial Rate Reduction

$500K+

Fraud Claims Flagged

120+

Providers Benchmarked

Case Study 01

Healthcare Fraud Risk Analytics

Anomaly Detection & Provider Risk Profiling

10,000+$500K+8

The Problem

Healthcare fraud costs the US healthcare system over $100B annually. Our organization needed a scalable approach to identify suspicious billing patterns across 10,000+ claims and flag high-risk providers before payments were issued.

Approach

  • Built an end-to-end fraud detection pipeline analyzing claims across 8 provider specialties and 5 insurance types
  • Developed a composite fraud risk score using statistical anomaly detection (Z-scores, IQR methods) on billing amounts, claim frequency, and service patterns
  • Created provider risk profiles comparing individual billing patterns against specialty benchmarks
  • Performed chi-square and t-test analyses to validate statistically significant fraud indicators

Results

  • $500K+ in suspicious claims flagged for review
  • Identified 3 provider specialties with fraud rates 2x above baseline
  • Reduced false-positive rate by 35% through multi-factor scoring
  • Built executive dashboard enabling real-time fraud monitoring
PythonSQLTableaupandasSciPyJupyter
Case Study 02

Revenue Integrity Analytics

CMS Medicare Charge-to-Payment Variance Analysis

250K+$2M+12

The Problem

Revenue leakage from undercoded procedures, payer underpayments, and charge capture gaps was costing the organization millions annually. Leadership needed visibility into exactly where revenue was being lost and which provider contracts needed renegotiation.

Approach

  • Analyzed 250,000+ CMS Medicare provider-service records to identify charge-to-payment variance patterns
  • Built charge-to-payment ratio models comparing submitted charges against allowed and paid amounts across specialties and geographies
  • Identified high-variance HCPCS codes where reimbursement fell significantly below submitted charges
  • Segmented analysis by provider type, state, and service category to pinpoint systemic underpayment patterns

Results

  • $2M+ in annual revenue leakage identified
  • 12 underpaying payer agreements flagged for renegotiation
  • 40% reduction in charge capture errors through automated flagging
  • Executive dashboard adopted by VP-level stakeholders for quarterly reviews
PythonSQLTableaupandasNumPyJupyter
Case Study 03

Claims Efficiency Analysis

Denial Root-Cause Analysis & Payer Benchmarking

8,500+18%90%

The Problem

Claim denial rates were trending upward, costing the organization in rework hours and delayed revenue. The team lacked visibility into which denial reasons were most prevalent, which payers were underperforming, and what the true cost of rework was.

Approach

  • Analyzed 8,500+ claims across 6 service lines and 4 payer types to build a comprehensive denial analytics framework
  • Performed root-cause analysis on denied claims, categorizing by denial reason (missing documentation, coding errors, authorization issues, timely filing)
  • Built payer benchmarking models comparing approval rates, processing times, and paid-to-billed ratios
  • Calculated rework cost impact including touch counts, processing days, and staff time allocation

Results

  • 18% reduction in claim denial rate through automated pre-submission flagging
  • Identified 'Missing Documentation' as #1 denial driver (38% of all denials)
  • Report generation time cut from 8 hours to 45 minutes via automated pipelines
  • Payer scorecards adopted for annual contract negotiations
PythonSQLTableaupandasmatplotlibJupyter
Case Study 04

Provider Performance Analytics

Quality-Cost Benchmarking & Network Optimization

120+1,800+2

The Problem

The provider network included 120+ providers but lacked standardized performance measurement. Network strategy decisions were based on anecdotal feedback rather than data, leading to inconsistent quality and cost outcomes.

Approach

  • Built provider scorecards tracking quality scores, cost efficiency, utilization metrics, and approval rates across 1,800+ monthly records
  • Developed peer percentile ranking (25th/50th/75th) within each specialty to enable fair benchmarking
  • Analyzed quality-cost correlation to identify providers delivering high quality at lower cost
  • Created network tier analysis comparing Standard vs. Premium tier outcomes

Results

  • 120+ providers benchmarked with standardized scorecards
  • Identified top-decile providers with 95+ quality scores and below-median costs
  • Network strategy decisions now data-driven, informing contract renewals
  • Monthly automated reporting replaced quarterly manual reviews
PythonSQLTableaupandasSciPyJupyter
Case Study 05

End-to-End Healthcare Pipeline

Data Engineering: Ingestion to Executive Reporting

50K+60%0%

The Problem

Healthcare data arrived from multiple sources in inconsistent formats with quality issues — missing values, duplicates, and schema drift. Analysts spent 60%+ of their time cleaning data rather than generating insights.

Approach

  • Designed a complete ETL pipeline: ingestion, validation, transformation, KPI modeling, and reporting
  • Built data quality checks including null detection, duplicate removal, schema validation, and referential integrity
  • Created transformation layer standardizing provider, claims, and patient data into analytics-ready formats
  • Automated KPI calculation and report generation with scheduling capabilities

Results

  • 60% reduction in analyst data prep time
  • Zero manual data entry errors (previously 3-5% error rate)
  • Pipeline processes 50K+ records with full audit logging
  • Reusable framework adopted across 3 additional analytics projects
PythonSQLETLAzurepandasJupyter
Case Study 06

Healthcare Data Warehouse (dbt)

Analytics Engineering with dbt + PostgreSQL

3100%10x

The Problem

Analytics queries were running against raw, unnormalized tables — slow, error-prone, and inconsistent across teams. The organization needed a proper warehouse with staging, intermediate, and mart layers following analytics engineering best practices.

Approach

  • Built a dbt project with PostgreSQL implementing a three-layer warehouse: staging (cleaned sources), intermediate (business logic), and marts (consumption-ready)
  • Defined dbt models with proper materializations (views for staging, tables for marts) and incremental refresh strategies
  • Implemented data tests (unique, not_null, accepted_values, relationships) across all layers
  • Created documentation with dbt docs for full data lineage visibility

Results

  • Query performance improved 10x on mart tables vs. raw sources
  • 100% test coverage across all models with automated CI checks
  • Self-serve analytics enabled — business users query marts directly
  • Full data lineage from source to dashboard via dbt docs
dbtPostgreSQLPythonSQLData Modeling

Want to see more?

Check out my interactive dashboards on Tableau Public or explore the full code on GitHub.