Kashish Lakhara

kashish@devops ~

~/kashish $ whoami

Kashish Lakhara

AWS SAA Certified DevOps Engineer, managing multiple production Kubernetes clusters across AWS, GCP, Azure, and on-premise environments, building resilient infrastructure with GitOps, observability pipelines, and cloud-native tooling.

Passionate about distributed systems, infrastructure reliability, and the open-source cloud-native ecosystem.

~/kashish $

About

AWS SAA Certified DevOps Engineer managing 15+ production Kubernetes clusters across AWS EKS, GCP GKE, Azure AKS, and on-premise environments, including air-gapped clusters bootstrapped with kubeadm. Strong focus on infrastructure automation with Terraform and Ansible, GitOps workflows with FluxCD, and observability pipelines using Prometheus, Grafana, and OpenSearch.

Comfortable working deep in the stack, from bare-metal cluster bootstrapping and etcd operations to Kafka, VerneMQ MQTT, and Longhorn distributed storage. Currently learning Go with a focus on contributing to the OpenTelemetry collector-contrib project.

about.yaml

name: Kashish Lakhara

role: DevOps Engineer @ Cloud Solitaire Technologies

location: Ahmedabad, Gujarat, India

aws_cert: Solutions Architect Associate

experience:

- Multi-cloud Kubernetes

- Prometheus · Grafana · AlertManager

- Longhorn · CrateDB · Patroni

- FluxCD · Terraform · Ansible

- HAProxy · MetalLB · Proxmox VE


Experience

DevOps Engineer

Cloud Solitaire Technologies · Ahmedabad, Gujarat

April 2025 – Present

› Automated end-to-end bare-metal Kubernetes cluster setup with Ansible and kubeadm: control-plane bootstrap, etcd init, worker join, CNI install, and kube-reserved memory and eviction thresholds for production-grade on-premise deployments.
› Resolved etcd disk exhaustion in production via defragmentation, then rolled out automated compaction across all on-prem clusters, eliminating the issue before it could cause an outage.
› Deployed HAProxy + Keepalived for bare-metal HA with virtual IPs across master nodes; validated failover end-to-end on Proxmox.
› Diagnosed KubeAPIErrorBudgetBurn alerts traced to etcd I/O errors via smartctl; identified a failing 8-year-old HDD, coordinated live SSD replacement, and migrated all workloads without downtime.
› Created a Kubernetes CronJob to auto-renew API server certificates before expiry, preventing cluster authentication failures in on-premise environments where managed cert rotation isn't available.
› Set up Azure infrastructure from scratch with Terraform (AKS, Application Gateway for Containers, Azure Front Door) and wrote Azure DevOps pipelines deploying to AKS via self-hosted runners.
› Migrated production infrastructure from on-premise to AWS EKS, implemented Karpenter for node autoscaling, and analysed EC2 usage patterns to evaluate Savings Plans post-migration.
› Built centralised logging on GKE with OpenSearch and Fluent Bit, using namespace-isolated indexes for Kubernetes events, application logs, and kube-system logs.
› Configured AlertManager routing to Microsoft Teams with alert severity segregation across 15+ clusters, giving clients a single pane of glass for infrastructure health.
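The etcd disk-exhaustion fix above depends on knowing when a defragmentation is worth running. A minimal sketch of that decision, assuming the two sizes come from etcd's `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes` metrics; the class name, function names, and threshold values are hypothetical illustrations, not the values used on these clusters:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EtcdDbStats:
    """Backend database sizes reported by one etcd member, in bytes."""

    total_bytes: int   # on-disk size of the database file
    in_use_bytes: int  # portion still holding live data after compaction


def fragmentation_ratio(stats: EtcdDbStats) -> float:
    """Return the fraction of the database file that is reclaimable."""
    if stats.total_bytes <= 0:
        return 0.0
    reclaimable = max(stats.total_bytes - stats.in_use_bytes, 0)
    return reclaimable / stats.total_bytes


def should_defragment(
    stats: EtcdDbStats,
    threshold: float = 0.5,                     # assumed: act at 50% free space
    min_total_bytes: int = 100 * 1024 * 1024,   # assumed: ignore small databases
) -> bool:
    """Decide whether defragmenting this member is likely worth the pause."""
    if stats.total_bytes < min_total_bytes:
        return False
    return fragmentation_ratio(stats) >= threshold


if __name__ == "__main__":
    # Example: a 2 GiB file with only 512 MiB of live data.
    member = EtcdDbStats(total_bytes=2 * 1024**3, in_use_bytes=512 * 1024**2)
    print(f"fragmentation: {fragmentation_ratio(member):.0%}")
    print(f"defragment: {should_defragment(member)}")
```

Compaction only marks old revisions as free inside the database file; defragmentation is what returns that space to the filesystem, so the check compares the two sizes rather than looking at total size alone. Defragmentation blocks the member while it runs, which is why a sketch like this would be applied to one member at a time.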

Stack

Kubernetes & Containers

Kubernetes · Docker · kubeadm · Karpenter · Helm

Cloud

AWS EKS · GCP GKE · Azure AKS · Terraform · Ansible

Observability

Prometheus · Grafana · AlertManager · Fluent Bit · OpenSearch · OpenObserve

GitOps & CI/CD

FluxCD · GitHub Actions · Azure DevOps · Jenkins

Storage & Data

Longhorn · MinIO · CrateDB · Patroni · PostgreSQL

Networking

HAProxy · MetalLB · Ingress-NGINX · Gateway API · CoreDNS · VerneMQ

Infrastructure

Proxmox VE · Linux · Bash · Python · Kafka

Writing

kubernetes · etcd · devops

When a 7-Year-Old Disk Takes Down Your Control Plane

How a 7-year-old SSD drove etcd WAL fsync latency above 300 ms and burned the Kubernetes API error budget, and how we diagnosed it layer by layer.

May 30, 2026

All posts →

Get In Touch

Open to collaborating on interesting projects and technical challenges.

Location

India

kashishlakhara04@gmail.com

Blog

techwithkashish.com/blogs