At face value, uptime is the easiest monitor to set up. But when it matters, it is one of the hardest SLOs to set. As long as the targets are low, it's simple:
Uptime % = Up / (Up + Down) × 100
But when it starts crossing the 99.9x% mark is where this gets interesting. If something is checked only every x seconds, was it always up, or only up during the checks? And who monitors the uptime of the uptime monitor?
As always, there is no silver bullet, just trade-offs. The talk is a tale of these trade-offs.
Scars, Battle scars, and Expensive scars.
More often than not, software engineers don't know the operational challenges of their code: what can fail in production.
Manjot Pahwa, Rishu Mehrotra, Kalyan Somasundaram, and I discuss:
- Google, LinkedIn, and startup worldviews of SRE
- An often-neglected area of SRE work: cost
- Cost of SRE and reliability
- Incident management
- Authority and roles
- Day-to-day and incident management
- Operational load and toil
Pulumi or Terraform?
How much, and what exactly, to automate?
Automated vs automatic?
Ansible or K8s?
Serverless or needless?
Choices, choices, choices - and very expensive these choices are.
An attempt to separate facts from popular tech fiction and lay out the trade-offs associated with these choices.
Obsolete software == stable software.
Stability or release velocity?
Treat operations as product.
Reliability != uptime.
Reliability != buffet lunch. Pay for what you order.
Treat SLOs as gears.
Focus on SRE when \$ spend < \$ lost.
SRE tenets: minimize downtime, find where else it is happening, prevent future failures.
SREs should be able to withstand boredom.
SRE maturity model.
Systems fail, but the real failures are the ones we learn nothing from.
This talk is a tale of a few such failures that went right under our noses and what we did to prevent them. The failures covered range from heterogeneous systems, unordered events, and missing correlations to plain human error.
Software is opaque. To see what it's doing, you inject observation capability into it. This goes beyond logs and stepping through a debugger, for you have to observe the live system, not your sandbox. How does control theory use observability to build systems that thrive on feedback and improve? Slides ⧉
Every product either dies a hero or lives long enough to hit reliability issues. While you go about fixing this, what is the cost, both in effort and in business lost, of failure, and how much does each nine of reliability cost? The talk takes a simple, straightforward product and evaluates the depth of each failure point. We take one fault at a time and introduce incremental changes to the architecture, the product, and the support structure, such as monitoring and logging, to detect and overcome those failures. Slides ⧉
TLS has come a long way and is probably one of the least-discussed topics in public talks. The talk walks through understanding certs and how a server-to-server TLS exchange happens. What is a CRL, and how do you check revocation lists? What is the problem with CRLs? What is OCSP, and how does it solve the problems of CRLs? What is the problem with OCSP? What is OCSP stapling? Why do language runtimes not address the problem of identifying revoked and expired certs? How do we bring this all together into a genuinely trustworthy server-to-server exchange? Slides ⧉
You know the Single Responsibility Principle, the Dependency Inversion Principle, and the rest of the SOLID principles. You know DRY, loose coupling, and the CAP theorem. You have probably also heard of the benefits of functional programming. Can one apply these learnings and programming principles to scale an organization as well? Slides ⧉
What are containers, and how is Docker made? A container is a bunch of namespaces and cgroups put together to build the process isolation that we see. What are namespaces, and how do they operate? The talk invokes one Linux namespace at a time, as system calls from Go code, building up to a full-fledged container. Slides ⧉
Cgroups and namespaces are the shoes and shorts of the container race, in no particular order. They have been around for a while, but not many see the usage and power they offer. The talk is a collection of cookbooks where these were used to solve infrastructure problems I have encountered. Slides ⧉
The talk is an anatomy of data processing systems: their building blocks, methods, and purpose. We split the system into layers, defining the relevance, need, and behavior of each. We study common frameworks and tools and the layers they fit into, later showcasing typical architectures and deployments. Slides ⧉
How much does your application weigh when you build it using HTTP constructs? Can you achieve the same availability and reliability using other alternatives? Slides ⧉
Microservices have rapidly evolved over the years as a popular way of developing applications. But they bring their own set of challenges: which design patterns to use, monitoring, logging, error detection, scaling, and service discovery.
The talk explores the common characteristics and design patterns to consider when dealing with service-oriented architectures. It also covers signal-slots, RPC architectures, monitoring, log and error handling, function-point scaling, and the common Unix philosophies that help you design scalable distributed systems.
Diving into code samples, demos, and production deployments, I showcase Gilmour, a cross-language library we authored for effective microservices that exchange data over non-HTTP transports. Slides ⧉
We all build software, and we find ourselves using OOP in some manner or another. Inheritance is one of the core properties of OOP. What are its common variants? Single, multiple, and mixin-based inheritance.
All of these suffer from conceptual and practical limitations. Irrespective of the choice of language, our design ends up the same way: a mesh of interconnected types. As the project grows and we introduce more types, the complexity and cost of testing the system keep increasing. The Internet is full of memes about it. We go about identifying and illustrating these problems.
We then talk about traits. A trait is essentially a group of pure methods that compose classes and is a primitive unit of code reuse.
In this model, classes are composed from a set of traits by specifying glue code that connects the traits together and accesses the necessary state. We demonstrate how traits overcome these problems, and help you build simpler and reusable code.
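The paper's examples are in Smalltalk; Go has no traits, but as a rough sketch of the same idea, a bundle of pure methods can be written against an interface describing the state it needs, and "glue code" in the composing type wires the two together (all names here are my own illustration):

```go
package main

import "fmt"

// The state a trait requires, declared as an interface.
type magnitude interface{ Value() int }

// CompareTrait is a reusable bundle of pure methods: it holds no state
// of its own, only a reference to whatever provides Value().
type CompareTrait struct{ magnitude }

func (c CompareTrait) Less(o magnitude) bool  { return c.Value() < o.Value() }
func (c CompareTrait) Equal(o magnitude) bool { return c.Value() == o.Value() }

// Celsius composes the trait; the glue is the Value method plus the
// wiring in the constructor.
type Celsius struct {
	CompareTrait
	deg int
}

func (c Celsius) Value() int { return c.deg }

func NewCelsius(deg int) Celsius {
	c := Celsius{deg: deg}
	c.CompareTrait = CompareTrait{c} // connect trait to state
	return c
}

func main() {
	a, b := NewCelsius(20), NewCelsius(25)
	fmt.Println(a.Less(b), a.Equal(b)) // true false
}
```

Another type, say a Weight, gets the same comparison behavior by supplying its own `Value()`, with no inheritance hierarchy involved.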
This talk is based on the research paper "Traits: Composable Units of Behaviour".
CDNlysis syncs Amazon CloudFront log entries from an S3 bucket and streams them to multiple database backends. You can later query to:
- Understand how bandwidth is being used.
- Find the most popular and most downloaded content.
- Generate trends for your most popular videos, audio, slides, etc.
- Understand the geographical behaviour of requests.
- Measure the bytes transferred to and from your CloudFront distributions.
- Find the most profitable referrers from which your content is accessed.
Gottp was designed with backend servers in mind and offered:
- Background workers
- Call aggregation using non-blocking or blocking pipes
- Optional listening on a Unix domain socket
- Built-in error traceback emails
- Optional data compression using zlib/gzip
- Automatic listing of all exposed URLs
We called it "Party": a persistent queue processor responsible for handling all the non-real-time work, from video encoding, indexing search documents, updating caches, and tracking participant progress, to sending emails, processing payments, and releasing Node.js sockets. Details ⧉