
End-to-End Data Pipeline

Python · .NET · C# · HTML5 · CSS3 · JavaScript · Airflow · Spark · Kafka · Snowflake · PostgreSQL · MySQL · MongoDB · Redis · MinIO · InfluxDB · Elasticsearch · MLflow · Prometheus · Grafana · Swagger · Serilog · Dapper · Docker · Kubernetes · Terraform · GitHub Actions · Helm · Argo CD

A production-ready, fully containerized data platform with batch ingestion, real-time streaming, a star-schema data warehouse, ML experiment tracking, a .NET 8 REST API, and full observability -- all running as 20 Docker services managed by a single Docker Compose stack.

Architecture

```mermaid
graph TB
  subgraph Sources
    MYSQL[(MySQL 8.0<br/>Source DB)]
    KP[Kafka Producer<br/>Sensor Data]
  end
  subgraph Orchestration
    AF[Airflow 2.7.3<br/>3 DAGs]
  end
  subgraph Streaming
    ZK[Zookeeper] --> KAFKA[Kafka 7.5.0]
    KP --> KAFKA
  end
  subgraph Processing
    GE[Great Expectations<br/>Validation]
    SM[Spark Master] --> SW[Spark Worker]
  end
  subgraph Storage
    MINIO[MinIO<br/>S3-Compatible]
    PG[(PostgreSQL 15<br/>Warehouse + Processed)]
    MONGO[(MongoDB 6.0)]
    REDIS[(Redis 7)]
    INFLUX[(InfluxDB 2.7)]
  end
  subgraph Serving
    API[.NET 8 API<br/>Swagger]
    MLFLOW[MLflow v2.9.2]
  end
  subgraph Observability
    PROM[Prometheus] --> GRAF[Grafana 10.2]
    ES[(Elasticsearch 8.11)]
  end
  MYSQL --> AF
  AF --> GE --> MINIO
  AF --> SM
  SM --> PG
  KAFKA --> SM
  PG --> API
  PG --> MLFLOW
  PROM --> AF
  PROM --> SM
```

Pipeline Flows

Batch Pipeline (Daily)

MySQL --> Airflow DAG --> Great Expectations --> MinIO (raw) --> Spark Transform --> PostgreSQL (processed)
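
For orientation, below is a minimal sketch of how such a daily DAG can be wired in Airflow; the task callables and IDs are placeholders, not the repo's actual batch_ingestion_dag code.

```python
# Illustrative sketch only -- task names and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_mysql(**_):
    """Pull new rows from the MySQL source (placeholder)."""
    ...

def validate_with_ge(**_):
    """Run the Great Expectations suite against the extract (placeholder)."""
    ...

def upload_raw_to_minio(**_):
    """Write the validated raw extract to the MinIO raw bucket (placeholder)."""
    ...

def submit_spark_transform(**_):
    """Submit the Spark batch job that loads PostgreSQL (placeholder)."""
    ...

with DAG(
    dag_id="batch_ingestion_dag_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # matches the documented daily cadence
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_mysql", python_callable=extract_mysql)
    validate = PythonOperator(task_id="validate", python_callable=validate_with_ge)
    to_minio = PythonOperator(task_id="upload_raw", python_callable=upload_raw_to_minio)
    transform = PythonOperator(task_id="spark_transform", python_callable=submit_spark_transform)

    extract >> validate >> to_minio >> transform
```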

Streaming Pipeline (Continuous)

Kafka Producer --> Kafka Topic (sensor_readings) --> Spark Streaming --> Anomaly Detection --> PostgreSQL + MinIO
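
A rough sketch of a producer for the sensor_readings topic is shown below, with a simple threshold flag standing in for the real Spark-side anomaly detection; the payload fields and threshold are assumptions, not the repo's producer.py.

```python
# Sketch of a sensor-reading producer for the sensor_readings topic.
# Payload fields and the threshold check are illustrative assumptions.
import json
import random
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

TEMP_ANOMALY_THRESHOLD = 80.0  # assumed threshold for the example

def make_reading(device_id: str) -> dict:
    temp = random.gauss(55.0, 15.0)
    return {
        "device_id": device_id,
        "temperature": round(temp, 2),
        "timestamp": time.time(),
        "anomaly": temp > TEMP_ANOMALY_THRESHOLD,  # downstream Spark does the real check
    }

for i in range(10):
    reading = make_reading(f"device-{i % 3}")
    producer.produce("sensor_readings", value=json.dumps(reading).encode("utf-8"))

producer.flush()
```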

Warehouse ETL (Hourly)

Staging Tables --> Dimension Load (customers, products, dates, devices) --> Fact Load (orders, sensors, pipeline runs) --> Aggregations (daily orders, hourly sensors)
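
The load order matters: dimensions must exist before the facts that reference them, and aggregations are refreshed last. A sketch of that ordering against the PostgreSQL warehouse follows; the staging table, column names, and connection string are assumptions for illustration.

```python
# Sketch of the hourly load order (staging -> dimensions -> facts -> aggregations).
# Staging/column names and the DSN are hypothetical.
import psycopg2

LOAD_DIMENSIONS = """
INSERT INTO dim_customers (customer_id, customer_name)
SELECT DISTINCT customer_id, customer_name FROM stg_orders
ON CONFLICT (customer_id) DO NOTHING;
"""

LOAD_FACTS = """
INSERT INTO fact_orders (order_id, customer_id, product_id, order_date, amount)
SELECT order_id, customer_id, product_id, order_date, amount FROM stg_orders;
"""

REFRESH_AGGREGATIONS = """
INSERT INTO agg_daily_orders (order_date, total_orders, total_revenue)
SELECT order_date, COUNT(*), SUM(amount) FROM fact_orders GROUP BY order_date
ON CONFLICT (order_date) DO UPDATE
  SET total_orders = EXCLUDED.total_orders,
      total_revenue = EXCLUDED.total_revenue;
"""

with psycopg2.connect("postgresql://pipeline_user:pipeline_secret_2024"
                      "@localhost:5432/warehouse") as conn:
    with conn.cursor() as cur:
        # Dimensions are loaded before the facts that reference them.
        for step in (LOAD_DIMENSIONS, LOAD_FACTS, REFRESH_AGGREGATIONS):
            cur.execute(step)
```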

Technology Stack

| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Orchestration | Apache Airflow | 2.7.3 | DAG scheduling and pipeline orchestration |
| Batch Processing | Apache Spark | 3.5.3 | Large-scale ETL and transformations |
| Stream Processing | Apache Kafka | 7.5.0 (Confluent) | Event streaming and real-time ingestion |
| Data Quality | Great Expectations | latest | Schema validation and data quality checks |
| Source Database | MySQL | 8.0 | Transactional source system |
| Data Warehouse | Snowflake / PostgreSQL | - / 15 | Star-schema warehouse (Snowflake primary, PG fallback) |
| Object Storage | MinIO | latest | S3-compatible data lake |
| Cache | Redis | 7-alpine | Caching and session storage |
| Document Store | MongoDB | 6.0.13 | NoSQL storage for semi-structured data |
| Time Series | InfluxDB | 2.7 | IoT and time-series metrics |
| Search | Elasticsearch | 8.11.3 | Full-text search and log indexing |
| REST API | .NET 8 | 8.0 | Backend API with Swagger documentation |
| ML Tracking | MLflow | 2.9.2 | Experiment tracking and model registry |
| Metrics | Prometheus | 2.48.1 | Metrics collection and alerting |
| Dashboards | Grafana | 10.2.3 | Visualization and monitoring dashboards |
| Governance | Apache Atlas (stub) | -- | Data lineage registration |
| IaC | Terraform + Kubernetes | -- | Cloud deployment manifests |

Prerequisites

  • Docker and Docker Compose v2+
  • Python 3.10+ (for running tests locally)
  • Make (GNU Make)
  • 16 GB RAM recommended for full stack (or 8 GB with make up-lite)
  • Ports available: 3000, 3306, 5000, 5001, 5432, 6379, 7077, 8080, 8081, 8086, 9000, 9001, 9090, 9092, 9200, 27017
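
A quick way to confirm those ports are free before starting the stack (an illustrative helper, not part of the repo):

```python
# Illustrative helper: check that the required ports are not already bound
# before running `make up`.
import socket

REQUIRED_PORTS = [3000, 3306, 5000, 5001, 5432, 6379, 7077, 8080, 8081,
                  8086, 9000, 9001, 9090, 9092, 9200, 27017]

for port in REQUIRED_PORTS:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if sock.connect_ex(("127.0.0.1", port)) == 0:   # 0 means something answered
            print(f"Port {port} is already in use -- free it before starting the stack")
```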

Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/hoangsonww/End-to-End-Data-Pipeline.git
cd End-to-End-Data-Pipeline

# 2. Create environment file
cp .env.example .env

# 3. Build and start all 20 services
make build
make up

# 4. Verify services are running
make health
make urls

# 5. Trigger the batch pipeline
make trigger-batch

# 6. Trigger the warehouse ETL
make trigger-warehouse

# 7. Run Spark jobs directly
make spark-batch
make spark-stream
```

Key Make Commands

| Command | Description |
|---|---|
| make up | Start all 20 services (~18GB RAM) |
| make up-lite | Start core services only (~8GB RAM) |
| make down | Stop all services |
| make build | Build all Docker images |
| make rebuild | Full rebuild from scratch (no cache) |
| make test | Run 35 Python tests |
| make lint | Lint Python code with flake8 |
| make health | Show health status of all containers |
| make status | Show running container status |
| make urls | Print all service URLs |
| make spark-batch | Submit Spark batch ETL job |
| make spark-stream | Submit Spark streaming job |
| make trigger-batch | Trigger batch_ingestion_dag in Airflow |
| make trigger-warehouse | Trigger warehouse_transform_dag in Airflow |
| make list-dags | List all Airflow DAGs |
| make kafka-topics | List Kafka topics |
| make logs-kafka | Tail Kafka logs |
| make clean | Stop services and remove all volumes |
| make format | Format all code (Python, C#, HTML/CSS/JS) |
| make format-check | Check formatting without modifying |
| make deploy-local | Deploy via Docker Compose (full stack) |
| make deploy-lite | Deploy via Docker Compose (lite, 8GB) |
| make deploy-k8s | Deploy to any Kubernetes cluster via Helm |
| make deploy-aws | Deploy to AWS EKS (Terraform + Helm) |
| make deploy-gcp | Deploy to GCP GKE via Helm |
| make deploy-azure | Deploy to Azure AKS via Helm |
| make deploy-onprem | Deploy to on-prem K8s (k3s, kubeadm) |
| make deploy-teardown | Remove deployment from any target |

Deployment

The pipeline can be deployed to any environment using a single command:

```mermaid
graph LR
  subgraph "Local / On-Prem"
    DC[Docker Compose<br/>make deploy-local]
    DL[Docker Compose Lite<br/>make deploy-lite]
    OP[On-Prem K8s<br/>make deploy-onprem]
  end
  subgraph "Cloud Providers"
    AWS[AWS EKS<br/>make deploy-aws]
    GCP[GCP GKE<br/>make deploy-gcp]
    AZ[Azure AKS<br/>make deploy-azure]
  end
  subgraph "Any Kubernetes"
    K8S[Helm Chart<br/>make deploy-k8s]
  end
  DC --> |20 services| Pipeline
  DL --> |16 services| Pipeline
  OP --> |Helm| Pipeline
  AWS --> |Terraform + Helm| Pipeline
  GCP --> |Helm| Pipeline
  AZ --> |Helm| Pipeline
  K8S --> |Helm| Pipeline
```

| Target | Command | Requirements | Resources |
|---|---|---|---|
| Local (full) | make deploy-local | Docker | 16GB RAM, 14 CPU |
| Local (lite) | make deploy-lite | Docker | 8GB RAM, 7 CPU |
| Any K8s | make deploy-k8s | kubectl, Helm | K8s cluster |
| AWS | make deploy-aws | Terraform, AWS CLI | EKS cluster |
| GCP | make deploy-gcp | gcloud, Helm | GKE cluster |
| Azure | make deploy-azure | az CLI, Helm | AKS cluster |
| On-prem | make deploy-onprem | kubectl, Helm | k3s / kubeadm / Rancher |

Helm Chart

The helm/e2e-pipeline/ chart deploys the full pipeline to any Kubernetes cluster:

```bash
# Add repos and install
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Deploy with provider-specific values
# (use values-gcp.yaml, values-azure.yaml, or values-onprem.yaml for other providers)
helm install e2e-pipeline ./helm/e2e-pipeline \
  -f helm/e2e-pipeline/values-aws.yaml \
  --set postgresql.auth.password=YOUR_PASSWORD \
  --set minio.auth.rootPassword=YOUR_PASSWORD \
  --namespace pipeline --create-namespace
```

Terraform (AWS)

Full AWS infrastructure (VPC, EKS, RDS, S3) in terraform/:

```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your settings
terraform init && terraform plan && terraform apply
```

Includes: VPC with public/private subnets, NAT Gateway, EKS with autoscaling nodes, RDS PostgreSQL (encrypted, multi-AZ), S3 data lake (versioned, encrypted, lifecycle policies), 3 security groups.

Service URLs

| Service | URL | Credentials |
|---|---|---|
| Airflow UI | http://localhost:8080 | admin / airflow_admin_2024 |
| Grafana | http://localhost:3000 | admin / admin_secret_2024 |
| MinIO Console | http://localhost:9001 | minio / minio_secret_2024 |
| MLflow UI | http://localhost:5001 | -- |
| Spark Master UI | http://localhost:8081 | -- |
| Swagger (.NET API) | http://localhost:5000/swagger | -- |
| Prometheus | http://localhost:9090 | -- |
| Elasticsearch | http://localhost:9200 | -- |
| Kafka | localhost:9092 | -- |
| PostgreSQL | localhost:5432 | pipeline_user / pipeline_secret_2024 |
| MySQL | localhost:3306 | pipeline_user / pipeline_secret_2024 |
| MongoDB | localhost:27017 | -- |
| Redis | localhost:6379 | -- |
| InfluxDB | http://localhost:8086 | -- |

API Documentation (.NET 8 Backend)

The .NET 8 API runs on port 5000 with interactive Swagger documentation at /swagger. It is built with ASP.NET Core, Serilog structured logging, Polly retry policies, and the Dapper micro-ORM.

```mermaid
graph LR
  C[Client] --> MW[Middleware<br/>X-Request-ID + Serilog]
  MW --> BC[BatchController]
  MW --> SC[StreamingController]
  MW --> WC[WarehouseController]
  MW --> MC[MLController]
  MW --> GC[GovernanceController]
  MW --> CC[CIController]
  MW --> HC[MonitoringController]
  BC --> MySQL & MinIO & Airflow
  SC --> Kafka & Airflow
  WC --> PostgreSQL & Snowflake
  MC --> MLflow
  GC --> Atlas
  CC --> GitHub
```

Endpoints (16 routes)

| Method | Endpoint | Controller | Description |
|---|---|---|---|
| POST | /api/batch/ingest | BatchController | Extract MySQL → validate → upload MinIO → trigger Airflow |
| POST | /api/stream/produce | StreamingController | Produce message to Kafka topic |
| POST | /api/stream/run | StreamingController | Trigger streaming monitoring DAG |
| POST | /api/warehouse/transform | WarehouseController | Trigger Snowflake warehouse ETL |
| GET | /api/warehouse/health | WarehouseController | Check warehouse + Snowflake connectivity |
| GET | /api/warehouse/snowflake/status | WarehouseController | Snowflake config status + schema info |
| GET | /api/warehouse/aggregations/daily-orders | WarehouseController | Daily order aggregations |
| GET | /api/warehouse/pipeline-runs | WarehouseController | Pipeline run history |
| POST | /api/governance/lineage | GovernanceController | Register data lineage in Atlas |
| POST | /api/ml/run | MLController | Create MLflow experiment run |
| POST | /api/ci/trigger | CIController | Dispatch GitHub Actions workflow |
| GET | /api/monitor/health | MonitoringController | Aggregated health of all services |
| GET | /health | Built-in | Full health check (6 dependency checks) |
| GET | /health/ready | Built-in | Readiness probe (critical deps only) |
| GET | /health/live | Built-in | Liveness probe (always 200) |
| GET | /swagger | Swashbuckle | Interactive API documentation |
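
A short example of exercising these routes from Python; the request body for /api/batch/ingest is a hypothetical placeholder, since the actual DTO fields live in Models/ and are not listed in this README.

```python
# Calling the documented routes from Python. The JSON body for
# /api/batch/ingest is a hypothetical placeholder.
import requests

BASE = "http://localhost:5000"

print(requests.get(f"{BASE}/health/live", timeout=5).status_code)      # liveness: always 200
print(requests.get(f"{BASE}/api/monitor/health", timeout=10).json())   # aggregated service health

resp = requests.post(f"{BASE}/api/batch/ingest", json={"table": "orders"}, timeout=30)
resp.raise_for_status()
print(resp.json())
```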

Backend Architecture

| Layer | Components | Technology |
|---|---|---|
| Controllers | 7 controllers (Batch, Streaming, Warehouse, ML, Governance, CI, Monitoring) | ASP.NET Core |
| Services | 10 services with interfaces (Db, Kafka, Minio, Batch, Streaming, Atlas, MLflow, GE, CI, Monitoring) | Dapper, Confluent.Kafka, AWS SDK |
| Health Checks | 6 checks (MySQL, PostgreSQL, Kafka, MinIO, Airflow, MLflow) | ASP.NET Health Checks |
| Options | 8 validated config classes with ValidateOnStart() | Options Pattern |
| Resilience | Polly retry (3x exponential backoff), request timeouts | Polly |
| Logging | Serilog with console + file sinks, request ID correlation | Serilog |

Data Warehouse Schema

The warehouse uses a star schema in Snowflake (with PostgreSQL fallback for local development). When SNOWFLAKE_ACCOUNT is set, data flows to Snowflake via staging tables. Otherwise, PostgreSQL serves as the warehouse.
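
A sketch of that selection rule as code follows; the host names, defaults, and the POSTGRES_HOST variable are illustrative assumptions.

```python
# Sketch of the documented selection rule: Snowflake when SNOWFLAKE_ACCOUNT is
# set, otherwise PostgreSQL. Host names and defaults are illustrative.
import os

def warehouse_target() -> dict:
    if os.environ.get("SNOWFLAKE_ACCOUNT"):
        return {
            "kind": "snowflake",
            "account": os.environ["SNOWFLAKE_ACCOUNT"],
            "user": os.environ.get("SNOWFLAKE_USER", ""),
            "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", ""),
            "database": os.environ.get("SNOWFLAKE_DATABASE", ""),
        }
    return {
        "kind": "postgresql",
        "host": os.environ.get("POSTGRES_HOST", "localhost"),  # assumed variable/default
        "port": 5432,
        "dbname": os.environ.get("POSTGRES_DB", "warehouse"),  # assumed default
    }

print(warehouse_target())
```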

Dimensions

| Table | Description |
|---|---|
| dim_customers | Customer master data |
| dim_products | Product catalog |
| dim_date | Date dimension (calendar attributes) |
| dim_devices | IoT device registry |

Facts

| Table | Description |
|---|---|
| fact_orders | Transactional order data linked to customer, product, date |
| fact_sensor_readings | IoT sensor measurements linked to device, date |
| fact_pipeline_runs | Pipeline execution metadata and status tracking |

Aggregations

| Table | Description |
|---|---|
| agg_daily_orders | Daily order totals and revenue summaries |
| agg_hourly_sensors | Hourly sensor reading averages and counts |
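
A typical read-side star-schema query over these tables might look like the following; the join keys and non-key columns (customer_key, date_key, amount) are assumptions for illustration.

```python
# Example star-schema query against the documented tables; join keys and
# non-key column names are assumptions.
import psycopg2

QUERY = """
SELECT d.date_key,
       c.customer_name,
       SUM(f.amount) AS revenue
FROM fact_orders AS f
JOIN dim_customers AS c ON f.customer_key = c.customer_key
JOIN dim_date      AS d ON f.date_key = d.date_key
GROUP BY d.date_key, c.customer_name
ORDER BY d.date_key;
"""

with psycopg2.connect(host="localhost", port=5432, user="pipeline_user",
                      password="pipeline_secret_2024", dbname="warehouse") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchmany(10):
            print(row)
```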

Airflow DAGs

| DAG | Schedule | Description |
|---|---|---|
| batch_ingestion_dag | Daily | Extract from MySQL, validate with Great Expectations, upload raw data to MinIO, Spark transform, load to PostgreSQL |
| streaming_monitoring_dag | Every 15 min | Monitor Kafka broker health, check consumer lag, alert on anomalies |
| warehouse_transform_dag | Hourly | Stage data in Snowflake, load dimensions/facts, refresh aggregations (PG fallback) |

Testing

The test suite contains 35 tests across six files.

```bash
make test
```

| Test File | Scope |
|---|---|
| tests/test_pipeline_config.py | Environment variables, connection strings, service configuration |
| tests/test_kafka_producer.py | Kafka producer logic, message serialization, topic configuration |
| tests/test_data_validation.py | Great Expectations suite, schema validation, data quality rules |
| tests/test_warehouse_sql.py | Warehouse DDL, star-schema integrity, aggregation queries |
| tests/test_snowflake.py | Snowflake SQL schema, connector module, DAG/BI/API integration |
| tests/test_docker_infrastructure.py | Docker Compose structure, service definitions, port mappings |
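
For flavor, here is a hypothetical test in the style of tests/test_pipeline_config.py; it is not one of the repo's 35 tests, just an illustration of config-level checks.

```python
# Hypothetical config-level test, illustrative only.
import os

import pytest

REQUIRED_VARS = ["POSTGRES_USER", "POSTGRES_PASSWORD", "KAFKA_BROKER", "KAFKA_TOPIC"]

@pytest.mark.parametrize("var", REQUIRED_VARS)
def test_required_env_var_present(var, monkeypatch):
    # Simulate a populated .env so the test is self-contained
    monkeypatch.setenv(var, "dummy-value")
    assert os.environ.get(var), f"{var} must be set in .env"
```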

CI/CD Pipeline

The GitHub Actions workflow (.github/workflows/cicd-pipeline.yml) runs on every push and PR to master/main.

| Stage | Job | Details |
|---|---|---|
| Lint | lint | flake8, black (formatting), isort (imports) |
| Test | python-tests | 35 unit tests with pytest |
| Build | docker-build | Build matrix: airflow, spark, kafka-producer, dotnet-api |
| Validate | docker-compose-validation | Validate docker-compose.yaml syntax |
| Integration | integration-test | Start core services, verify health (Kafka, PostgreSQL, MySQL, Redis) |
| Gate | pipeline-complete | Aggregates all job results for branch protection |

Project Structure

```
├── airflow/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── dags/
│       ├── batch_ingestion_dag.py        # Daily batch ETL
│       ├── streaming_monitoring_dag.py   # Kafka health monitoring
│       └── warehouse_transform_dag.py    # Hourly warehouse ETL
├── spark/
│   ├── Dockerfile
│   ├── spark_batch_job.py                # Batch ETL (MinIO → transform → PostgreSQL)
│   └── spark_streaming_job.py            # Real-time Kafka consumer + anomaly detection
├── kafka/
│   ├── Dockerfile
│   └── producer.py                       # Sensor data generator
├── storage/
│   ├── aws_s3_influxdb.py                # S3 + InfluxDB integration
│   ├── hadoop_batch_processing.py        # Hadoop batch processing
│   ├── mongodb_streaming.py              # MongoDB streaming integration
│   └── redis_integration.py              # Redis caching layer
├── great_expectations/
│   ├── great_expectations.yaml
│   └── expectations/
│       └── raw_data_validation.py        # Data quality suite
├── governance/
│   └── atlas_stub.py                     # Apache Atlas lineage registration
├── ml/
│   ├── mlflow_tracking.py                # MLflow experiment tracking
│   └── feature_store_stub.py             # Feature store integration
├── monitoring/
│   ├── monitoring.py                     # Prometheus + Grafana setup
│   ├── prometheus.yml                    # Prometheus scrape config
│   └── grafana-deployment-dashboards.json
├── bi_dashboards/
│   └── bi_dashboard.py                   # BI dashboard utilities
├── sample_dotnet_backend/
│   ├── Dockerfile                        # Multi-stage .NET 8 build
│   ├── appsettings.json                  # Full config (DB, Kafka, MinIO, Airflow, Snowflake)
│   ├── appsettings.Production.json       # Production overrides (Serilog level)
│   └── src/DataPipelineApi/
│       ├── DataPipelineApi.csproj        # .NET 8, Dapper, Confluent.Kafka, Serilog, Polly
│       ├── Program.cs                    # ASP.NET Core setup, middleware, health endpoints
│       ├── Controllers/                  # 7 controllers
│       │   ├── BatchController.cs        #   POST /api/batch/ingest
│       │   ├── StreamingController.cs    #   POST /api/stream/produce, /run
│       │   ├── WarehouseController.cs    #   POST /api/warehouse/transform, GET /health, /snowflake/status
│       │   ├── MLController.cs           #   POST /api/ml/run
│       │   ├── GovernanceController.cs   #   POST /api/governance/lineage
│       │   ├── CIController.cs           #   POST /api/ci/trigger
│       │   └── MonitoringController.cs   #   GET /api/monitor/health
│       ├── Services/                     # 10 services (Db, Kafka, MinIO, Batch, Streaming, Atlas, MLflow, GE, CI, Monitoring)
│       ├── Models/                       # BatchRequest, StreamingRequest DTOs
│       ├── Options/                      # 8 config classes (Database, Kafka, MinIO, Airflow, MLflow, Atlas, GE, GitHub)
│       └── HealthChecks/                 # 6 checks (MySQL, PostgreSQL, Kafka, MinIO, Airflow, MLflow)
├── scripts/
│   ├── init_db.sql                       # MySQL schema + seed data
│   ├── init_warehouse.sql                # PostgreSQL warehouse DDL
│   ├── deploy.sh                         # Universal deploy script (local/K8s/AWS/GCP/Azure)
│   ├── deploy-blue-green.sh              # Blue/green deployment orchestration
│   ├── deploy-canary.sh                  # Canary deployment with metrics
│   └── setup-advanced-deployments.sh     # Argo Rollouts + monitoring setup
├── snowflake/
│   ├── snowflake_connector.py            # Snowflake connection + loading utilities
│   └── init_warehouse.sql                # Snowflake warehouse DDL (star schema + tasks + grants)
├── helm/e2e-pipeline/                    # Helm chart (any K8s provider)
│   ├── Chart.yaml                        # Chart metadata + sub-chart deps
│   ├── values.yaml                       # Default values (all providers)
│   ├── values-aws.yaml                   # AWS EKS overrides (gp3, ALB, ECR)
│   ├── values-gcp.yaml                   # GCP GKE overrides (pd-ssd, GCE, GCR)
│   ├── values-azure.yaml                 # Azure AKS overrides (managed-premium, ACR)
│   ├── values-onprem.yaml                # On-prem overrides (local-path, reduced resources)
│   └── templates/                        # 8 templates (airflow, spark, dotnet-api, kafka-producer, configmap, secrets, namespace)
├── kubernetes/                           # Raw K8s manifests (Argo Rollouts, ingress, service monitors)
├── terraform/                            # AWS IaC (VPC, EKS, RDS, S3, security groups, IAM)
├── tests/
│   ├── test_pipeline_config.py
│   ├── test_kafka_producer.py
│   ├── test_data_validation.py
│   ├── test_warehouse_sql.py
│   ├── test_snowflake.py
│   └── test_docker_infrastructure.py
├── packages/                             # Frontend assets
├── .github/workflows/cicd-pipeline.yml   # CI/CD pipeline
├── docker-compose.yaml                   # 20 services
├── docker-compose.ci.yaml                # CI-specific compose
├── .env.example                          # Environment template
├── Makefile                              # Build and operations commands
├── requirements.txt                      # Python dependencies
├── index.html                            # Landing page
├── ARCHITECTURE.md                       # Detailed architecture docs
├── QUICK_START.md                        # Quick start guide
└── DEPLOYMENT_STRATEGIES.md              # Deployment strategies
```

Configuration

All configuration is driven by environment variables defined in .env.example.

| Section | Key Variables |
|---|---|
| PostgreSQL | POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD |
| MySQL | MYSQL_DATABASE, MYSQL_USER, MYSQL_PASSWORD, MYSQL_ROOT_PASSWORD |
| Kafka | KAFKA_BROKER, KAFKA_TOPIC, KAFKA_ACKS_MODE |
| Spark | SPARK_MASTER_URL, SPARK_DRIVER_MEMORY, SPARK_EXECUTOR_MEMORY |
| Airflow | AIRFLOW__CORE__EXECUTOR, AIRFLOW_ADMIN_USER, AIRFLOW_ADMIN_PASSWORD |
| MinIO | MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, MINIO_BUCKET_RAW, MINIO_BUCKET_PROCESSED |
| Grafana | GRAFANA_ADMIN_USER, GRAFANA_ADMIN_PASS |
| MLflow | MLFLOW_TRACKING_URI |
| Redis | REDIS_HOST, REDIS_PORT |
| MongoDB | MONGODB_URI, MONGODB_DB |
| InfluxDB | INFLUXDB_URL, INFLUXDB_TOKEN, INFLUXDB_ORG, INFLUXDB_BUCKET |
| Snowflake | SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE |
| Governance | ATLAS_API_URL, ATLAS_USERNAME, ATLAS_PASSWORD |
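
A sketch of how a component might read this configuration at startup; the variable names come from the table above, while the fallback defaults and host names are illustrative, not copied from .env.example.

```python
# Sketch of the env-driven configuration pattern used across the stack.
# Defaults and host names below are illustrative assumptions.
import os

KAFKA_BROKER = os.environ.get("KAFKA_BROKER", "kafka:9092")
KAFKA_TOPIC = os.environ.get("KAFKA_TOPIC", "sensor_readings")
SPARK_MASTER_URL = os.environ.get("SPARK_MASTER_URL", "spark://spark-master:7077")

POSTGRES_DSN = (
    f"postgresql://{os.environ.get('POSTGRES_USER', 'pipeline_user')}"
    f":{os.environ.get('POSTGRES_PASSWORD', '')}"
    f"@postgres:5432/{os.environ.get('POSTGRES_DB', 'warehouse')}"
)

MLFLOW_TRACKING_URI = os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5001")

if __name__ == "__main__":
    print(KAFKA_BROKER, KAFKA_TOPIC, SPARK_MASTER_URL, MLFLOW_TRACKING_URI)
```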

Snowflake Setup

To enable the Snowflake data warehouse (optional -- PostgreSQL is used as fallback):

```bash
# 1. Set Snowflake credentials in .env
SNOWFLAKE_ACCOUNT=your_account.us-east-1
SNOWFLAKE_USER=your_user
SNOWFLAKE_PASSWORD=your_password

# 2. Initialize the Snowflake warehouse schema
snowsql -a $SNOWFLAKE_ACCOUNT -u $SNOWFLAKE_USER -f snowflake/init_warehouse.sql

# 3. warehouse_transform_dag will automatically use Snowflake when configured
```
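
For programmatic access, a minimal sketch using the snowflake-connector-python package is shown below; the repo's snowflake/snowflake_connector.py may organize this differently.

```python
# Minimal Snowflake connection sketch; the repo's own connector module may differ.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE"),
    database=os.environ.get("SNOWFLAKE_DATABASE"),
)
try:
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM fact_orders")
    print("fact_orders rows:", cur.fetchone()[0])
finally:
    conn.close()
```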

To customize, copy the example and edit:

```bash
cp .env.example .env
# Edit .env with your values
```

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add your feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License.


For questions or feedback, reach out on GitHub.

