Senior DevOps Engineer · United States view

Anirudh Vaka

Production infrastructure on AWS + on-prem Kubernetes.
Founder of PrepAtlas + HumanifyCV.

I build and operate production infrastructure, and I ship products on top of it. By day, I lead DevOps for an enterprise SaaS platform and operate an on-prem Kubernetes data center I built from bare metal. Outside work, I run two paid SaaS products: PrepAtlas (AI-grounded exam prep) and HumanifyCV (AI text humanization). Currently exploring Senior DevOps / Platform / SRE roles in the United States (open to H1B sponsorship) or fully remote.

  • Uptime: 99.9% (on-prem K8s, 2 years)
  • Pipelines: 200+ (GitHub Actions + Azure DevOps)
  • Faster releases: 60% (via containerization)
  • Cloud cost cut: 25% (AWS + Azure right-sizing)
  • Paid SaaS products: 2 (PrepAtlas, HumanifyCV)
  • Production DevOps: 3+ yrs (since Jan 2023)

Selected Projects

Two products I run end-to-end with paying users, and two production platforms I architect and operate for work.

Product · Live

PrepAtlas

AI-grounded exam prep platform for Indian students

The problem. Indian exam prep platforms surface confident-but-unsourced answers: students can't verify what they're memorising, and hallucinated facts get propagated as truth.

The approach. Retrieval-Augmented Generation grounded in a curated corpus. Every answer cites the source passage(s) it was generated from. Queries are embedded, matched against pgvector in Supabase Postgres, and the top-K passages are passed as context to Claude. Answers that can't be grounded are refused rather than hallucinated.
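
A minimal sketch of the retrieve-then-refuse step described above, not the production code. It assumes embeddings are already computed, uses an in-memory cosine-similarity search as a stand-in for the pgvector query, and leaves out the Claude call entirely. The `Passage` shape, `retrieveGrounding`, and the `minSimilarity` cut-off are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical shapes: the real schema is not described in the text.
interface Passage {
  id: string;
  text: string;
  embedding: number[];
}

interface Grounding {
  grounded: boolean;   // false means the answer should be refused
  passages: Passage[]; // top-K passages to cite and pass as context
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank the corpus against the query, keep the top-K passages above the
// similarity cut-off, and report "not grounded" when nothing qualifies.
function retrieveGrounding(
  queryEmbedding: number[],
  corpus: Passage[],
  topK = 3,
  minSimilarity = 0.75, // assumed threshold, not taken from the source
): Grounding {
  const passages = corpus
    .map((passage) => ({
      passage,
      score: cosineSimilarity(queryEmbedding, passage.embedding),
    }))
    .filter((hit) => hit.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((hit) => hit.passage);

  return { grounded: passages.length > 0, passages };
}
```

When `grounded` is false, the caller would return a refusal instead of prompting the model; when it is true, the returned passages double as the citations shown with the answer.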

The outcome. 20+ paying users in beta on a $35/month AWS stack. Sub-200KB JS on critical paths. Wrapped as an Android TWA via Bubblewrap so the same Next.js bundle ships with a native feel on the Play Store.

  • Vector DB: pgvector inside Supabase over Pinecone. One fewer service, RLS on the same Postgres, and quotas covered by the existing free tier.
  • Mobile shipping: Bubblewrap TWA over React Native. Same Next.js bundle, no duplicate codebase, Play Store install in under a week.
  • Hosting: $35/mo AWS EC2 + nginx + pm2. Predictable cost, no surprise bills, easy to step up to ECS if traffic warrants it.
  • Performance budget: Sub-200KB JS on critical paths. Measurable, enforceable, falls straight out of Next.js bundle analysis.

20+ paying users in beta · $35/mo hosting cost · <200KB critical-path JS
Next.js 15 · React 19 · TypeScript · Tailwind · shadcn/ui · Supabase (Postgres + Auth + RLS + Storage) · pgvector · Anthropic Claude API · AWS EC2 · nginx · pm2 · Bubblewrap TWA

Product · Live

HumanifyCV

AI text humanization + resume optimization SaaS with production-grade auth and payments

The problem. Resume-optimisation tools spit out generic AI-flavoured prose and leak credentials through casual OAuth flows. Trust matters: you're handing it your career history.

The approach. Production-grade auth as a feature, not an afterthought. NextAuth v5 with email verification, TOTP 2FA backed by AES-256-GCM-encrypted secrets, and WebAuthn passkeys for passwordless sign-in. Razorpay payments modelled as a discriminated union so refunds, captures, and disputes can't silently miscompile. Anthropic Claude Sonnet runs the actual humanisation.
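
As an illustration of the discriminated-union idea, here is a small sketch under stated assumptions. The `PaymentEvent` variants and their fields are hypothetical, simplified shapes; they are not Razorpay's actual webhook payloads, and `describeEvent` is an invented helper.

```typescript
// Hypothetical, simplified event shapes. The `kind` field is the discriminant.
type PaymentEvent =
  | { kind: "captured"; paymentId: string; amountPaise: number }
  | { kind: "refunded"; paymentId: string; refundId: string; amountPaise: number }
  | { kind: "disputed"; paymentId: string; disputeId: string; reason: string };

// Exhaustiveness guard: if a new variant is added to PaymentEvent and not
// handled below, passing it here becomes a compile-time type error.
function assertNever(value: never): never {
  throw new Error(`Unhandled payment event: ${JSON.stringify(value)}`);
}

// Inside each case, TypeScript narrows `event` to the matching variant, so
// `refundId` is only reachable for refunds and `disputeId` only for disputes.
function describeEvent(event: PaymentEvent): string {
  switch (event.kind) {
    case "captured":
      return `captured ${event.amountPaise} paise for ${event.paymentId}`;
    case "refunded":
      return `refund ${event.refundId} of ${event.amountPaise} paise for ${event.paymentId}`;
    case "disputed":
      return `dispute ${event.disputeId} on ${event.paymentId}: ${event.reason}`;
    default:
      return assertNever(event);
  }
}
```

The point of the pattern is that reading `event.refundId` in the `"captured"` branch, or forgetting a branch altogether, fails type-checking rather than surfacing at runtime.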

The outcome. 30–40 paying users on AWS ECS. Sentry for runtime, AWS SES for transactional email, 31 Jest / Testing Library tests on the auth + payment paths specifically.

Auth Layer: production-grade
NextAuth v5 · Email verification · TOTP 2FA · AES-256-GCM secrets · WebAuthn passkeys
App Layer: AWS ECS
Next.js 16 · React 19 · TypeScript strict · Prisma 7 · Postgres
Integration Layer: typed boundaries
Claude Sonnet 4.6 · Razorpay (discriminated union) · AWS SES (Nodemailer)
Observability & Quality: never-silent failures
Sentry · Jest + Testing Library · 31 tests

  • Payment model: Razorpay events as a TypeScript discriminated union. Captured / refunded / disputed can't be confused at the type level.
  • 2FA storage: TOTP secrets AES-256-GCM-encrypted at rest with a key from a secret manager. Plaintext never touches Postgres.
  • Passkeys: WebAuthn FIDO2 over passwords. Phishing-resistant, no shared secret, signs in with the device biometric.
  • Test priorities: 31 tests concentrated on the auth + payment paths. The most damaging failure modes are the regressions caught here first.
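
The 2FA storage bullet above can be sketched with Node's built-in `crypto` module. This is an illustrative sketch, not the HumanifyCV implementation: the function names and the `iv.tag.ciphertext` storage format are assumptions, and the 32-byte key is assumed to be fetched from the secret manager by the caller.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed storage format: base64(iv) "." base64(authTag) "." base64(ciphertext).
// Standard base64 never contains ".", so the separator is unambiguous.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const authTag = cipher.getAuthTag(); // 16-byte integrity tag
  return [iv, authTag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Throws if the key is wrong or the stored value was modified, because
// GCM verifies the auth tag in decipher.final().
function decryptSecret(stored: string, key: Buffer): string {
  const [iv, authTag, ciphertext] = stored
    .split(".")
    .map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Only the string returned by `encryptSecret` would be written to the database column; the TOTP secret is decrypted in memory at verification time.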

30–40 paying users · 31 tests on auth + payments · 0 plaintext secrets at rest
Next.js 16 · React 19 · TypeScript · Postgres + Prisma 7 · NextAuth v5 · WebAuthn passkeys · TOTP 2FA · Anthropic Claude Sonnet 4.6 · Razorpay · AWS ECS · AWS SES · Sentry · Jest + Testing Library
Visit humanifycv.com →

Client engagement · Ongoing

AICPA & CIMA Enterprise Platform

Multi-region AWS deployment + label-driven GitOps for a London-based enterprise SaaS client.

The problem. Sequential branch model, manual cherry-picking between release and main, manual SNOW change tickets, ECS deploys done by hand. Release errors were frequent and slow to attribute.

The approach. Replaced the sequential model with a label-driven GitOps topology. Labels on PRs drive the pipeline: adding fbe spawns a full-stack ephemeral environment (ECS + RDS + S3 + SQS + SNS) provisioned via Terraform; adding staging deploys to ECS staging; merging to main auto-opens a ServiceNow Change Request, authenticates to AWS via OIDC keyless, deploys, and closes the CR.
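
The label-to-action mapping can be sketched as a pure function that compares a PR's labels before and after a change. This is a hypothetical illustration of the decision logic only: the real pipeline lives in GitHub Actions workflows, and `planActions` and the action names are invented for the example. The merge-to-main flow (Change Request, OIDC, deploy) is deliberately left out.

```typescript
// Invented action names for illustration; the real workflow jobs may differ.
type PipelineAction =
  | "provision-ephemeral-env"
  | "destroy-ephemeral-env"
  | "deploy-staging";

// Decide what the pipeline should do when a PR's label set changes.
function planActions(previousLabels: string[], currentLabels: string[]): PipelineAction[] {
  const before = new Set(previousLabels);
  const after = new Set(currentLabels);
  const actions: PipelineAction[] = [];

  // Adding `fbe` spawns the full-stack ephemeral environment.
  if (after.has("fbe") && !before.has("fbe")) actions.push("provision-ephemeral-env");
  // Removing `fbe` tears that environment down again.
  if (before.has("fbe") && !after.has("fbe")) actions.push("destroy-ephemeral-env");
  // Adding `staging` deploys the PR to ECS staging.
  if (after.has("staging") && !before.has("staging")) actions.push("deploy-staging");

  return actions;
}
```

Keeping the decision a function of label sets, rather than of branches, is what removes branch drift: the PR's labels are the single source of truth for which environments exist.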

The outcome. ~70% reduction in release errors. PR environment setup time cut from days to minutes. A scheduled Friday-night cleanup auto-destroys all FBEs to save weekend spend.

  • ServiceNow ITSM: CR lifecycle automated end-to-end via REST
  • Teams ChatOps: Deploy + incident notifications
  • Trivy + Grype: Shift-left container CVE scanning

  • Trigger model: PR labels (fbe, staging, prod) over branch-per-environment. No drift between branches, label removal cleans up.
  • AWS auth: OIDC keyless from GitHub Actions to AWS. No long-lived access keys to rotate, no leaked-secret blast radius.
  • Region strategy: Multi-region with Route 53 latency routing + active-active failover. Verified failover via game day.
  • Change management: ServiceNow CR opened + closed by the pipeline. No manual ticket toil, deploy + CR are atomic.

~70% release errors reduced · days → minutes PR env setup · 200+ CI/CD pipelines
GitHub Actions · AWS ECS · AWS RDS · Lerna · Terraform · ServiceNow · OIDC · Trivy / Grype · Route 53

Bare-metal infra · Production

TimeChamp On-Prem Infrastructure

Production Kubernetes data center built from bare metal at CtrlS Hyderabad. Two years at 99.9% uptime.

The problem. TimeChamp's existing platform was a Windows monolith on IIS, deployed by hand, with no observability and unpredictable cloud spend. Customer growth was capped by release velocity and on-call burden.

The approach. Architected and built a full on-prem Kubernetes data center from bare metal: racked Dell PowerEdge servers at CtrlS Hyderabad, designed VLAN segmentation, configured FortiGate 200F + IPSec VPN with dual-ISP failover. Migrated the monolith to Docker + Kubernetes (~15 nodes, 100+ containers on Hyper-V). Deployed Prometheus + Grafana + Loki for observability, 200+ Azure DevOps pipelines for CI/CD, Cloudflare WAF + CDN at the edge.

The outcome. 99.9% uptime over two years. Release cycle cut ~60% with zero-downtime deployments. $0 cloud spend for on-prem workloads; only burst traffic goes to AWS / Azure.

Physical Layer: CtrlS Hyderabad
Dell PowerEdge servers · FortiGate 200F · IPSec VPN + dual-ISP · Cisco VLAN segmentation
Orchestration Layer: ~15 nodes, 100+ containers
Kubernetes · Hyper-V · Docker · Helm
Observability Layer: metrics, logs, alerts
Prometheus · Grafana · Loki
CI/CD + Edge: automation, security
Azure DevOps (200+ pipelines) · Cloudflare WAF + CDN · DR Validator (C# + AWS S3)

  • On-prem over cloud-native: Datacenter colo at CtrlS. Predictable cost at scale, full network + storage control, regulatory comfort for customer data.
  • Hyper-V under K8s: Hyper-V for the hypervisor. Leverages the team's existing Microsoft expertise and licensing while K8s does the orchestration.
  • Network HA: FortiGate 200F + IPSec VPN + dual-ISP failover, verified by pulling the live ISP cable. Zero customer impact.
  • DR Validator: Custom tool in C# / .NET. Restores every MS SQL backup to a throwaway instance daily, alerts on silent corruption.

99.9% uptime over 2 years · 100+ containers on 15+ nodes · $0 cloud spend for on-prem workloads
Kubernetes · Docker · Hyper-V · FortiGate · Cisco · Prometheus · Grafana · Loki · Azure DevOps · Cloudflare · C# / .NET

Tech Stack

Technologies I use daily to build, ship, and operate production infrastructure and side products.

Cloud & Infrastructure

  • AWS (ECS, RDS, Lambda, S3, Route 53, IAM, multi-region)
  • Azure (VMSS, Functions, App Services)
  • On-prem (Hyper-V, bare metal)
  • GCP (fundamentals)

Containers & Orchestration

  • Kubernetes (production, on-prem + cloud)
  • Docker
  • Helm
  • Horizontal Pod Autoscaling
  • Ingress

IaC & CI/CD

  • Terraform
  • GitHub Actions
  • Azure DevOps
  • Ansible
  • Jenkins

Web & Caching

  • Nginx
  • Redis
  • AWS API Gateway
  • CloudFront

Networking & Security

  • Linux
  • Cisco switches
  • FortiGate (firewall, VPN, SD-WAN)
  • VLAN segmentation
  • Cloudflare WAF
  • OIDC / IAM
  • TLS
  • Trivy / Grype

Databases

  • PostgreSQL
  • Supabase
  • MS SQL Server
  • MySQL
  • AWS RDS

Observability

  • Prometheus
  • Grafana
  • CloudWatch
  • Sentry
  • Loki
  • OpenSearch
  • ELK

Languages

  • TypeScript
  • Python
  • C# (.NET)
  • Bash
  • PowerShell
  • SQL

Engineering Writeups

Longer-form posts on the technical decisions behind my products and infrastructure work. Useful pre-reading for an interview.

● Live

PrepAtlas engineering deep-dive

Grounded RAG with citations, pgvector over Pinecone, TWA over React Native, sub-200KB performance budget, and a $35/mo hosting story.

prepatlas.in/engineering

● Coming soon

HumanifyCV engineering deep-dive

Production-grade auth (passkeys, 2FA, AES-256-GCM), Razorpay events as a discriminated union, the AWS ECS layout, and which 31 tests I wrote first.

humanifycv.com/engineering

○ Planned

How I built a production on-prem K8s data center from bare metal

Racking Dell PowerEdge at CtrlS Hyderabad, VLAN segmentation, FortiGate failover, choosing Hyper-V under Kubernetes, and what 99.9% uptime for two years actually cost.

anirudhvaka.dev

○ Planned

Architecture evolution: three lessons from migrating live systems

Lessons from IIS-to-Kubernetes, TeamCity-to-Azure-DevOps, and sequential-branch to label-driven GitOps. What broke, what didn't, what I'd undo.

anirudhvaka.dev

Let's build together

Open to Senior DevOps / Platform / SRE roles in the United States (open to H1B sponsorship) or fully remote. Let's talk.
