Misadventures of an Early Engineer

February 12, 2022

The red text beneath the login form read "Unexpected error." I was looking over the shoulder of a support technician named Eric who had just tried to log in to our product for the first time. His cubicle looked identical to the other eighty or so scattered around the room except for the custom-built mechanical keyboard with unmarked keycaps and a neon backlight sitting on his desk. We'd been getting along great, but now he was looking back at me with an expression of betrayal on his face that told me exactly what he was thinking: “Please tell me you haven’t wasted my entire Saturday.”

I turned around and felt my throat tighten as I saw the same error message appearing all around the room and his colleagues starting to look up in confusion.

All I could think was, "Oh, shit."

It was a sweltering Saturday morning in Houston in 2018, and I was in the network operations center of an IT managed service provider (MSP). MSPs manage IT as a service for other businesses: everything from domains, websites, email, and cloud services to security operations and network infrastructure.

This company was one of our first-ever customers and by far the largest. We'd organized a weekend training session for their support team to learn how to use our startup's web app to rapidly locate information about the systems they managed so that they could respond more efficiently to support requests. But now, only a handful of them could even log in, and as I walked between the cubicles seeing the same error pop up on every screen I passed, I could feel frustration starting to buzz around the room.

Our CEO waved me over. "Hey man, it looks like we have a problem." He was projecting calm as always, but his smile was forced. Even though he said “we” he meant “you” because it was definitely my problem. Despite having joined the team just a few months earlier, I was one of only two software engineers in the entire eight-person company at the time and the only one in the room that day.

That's how I found myself desperately hunched over my laptop, hands shaking, searching through a codebase that still felt foreign to me, and racking my brain. This exact build had worked fine when we were testing it during the past week. What was different about today? Why couldn't anybody log in?

Every second felt like an eternity as I tried to keep my rising panic at bay. “Focus, you can do this,” I told myself. The error message was generic and unhelpful in narrowing down the problem, but I knew the issue must have something to do with the login functionality because that's where people were getting stuck.

Then it hit me. Today was the first time we'd ever had a large group of people accessing the product simultaneously. I was still learning the codebase, but I knew we had some half-baked security feature set up to prevent too many bad login attempts. Sure enough, when I logged into the database, I could see that we were incorrectly tracking the total number of login attempts. Instead of counting per user, it was counting each attempt across the whole system. Gigantic facepalm.
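For the technically curious, the bug boils down to something like the sketch below. This is an illustration in Python, not the actual code: the class names, the threshold of five, and the in-memory counters are all assumptions made for the example (the real counts lived in a database table).

```python
from collections import Counter

MAX_ATTEMPTS = 5  # hypothetical lockout threshold


class GlobalAttemptTracker:
    """The bug: one counter shared by every user in the system."""

    def __init__(self):
        self.attempts = 0

    def record_attempt(self, username):
        self.attempts += 1  # username is ignored entirely

    def is_locked_out(self, username):
        return self.attempts >= MAX_ATTEMPTS


class PerUserAttemptTracker:
    """The intended behavior: each user gets their own counter."""

    def __init__(self):
        self.attempts = Counter()

    def record_attempt(self, username):
        self.attempts[username] += 1

    def is_locked_out(self, username):
        return self.attempts[username] >= MAX_ATTEMPTS


def count_locked_out(tracker, usernames):
    """Have every user attempt one login, then count who is locked out."""
    for name in usernames:
        tracker.record_attempt(name)
    return sum(tracker.is_locked_out(name) for name in usernames)


# A room of eighty technicians each trying to log in once.
room = [f"tech{i}" for i in range(80)]
locked_global = count_locked_out(GlobalAttemptTracker(), room)
locked_per_user = count_locked_out(PerUserAttemptTracker(), room)
print(locked_global, locked_per_user)
```

With a shared counter, a single attempt from each person in a crowded room is enough to cross the threshold for everybody, which is why the problem never showed up while only a few of us were testing.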

I called the other engineer on the team to confirm what I was seeing and that what I was about to do wasn't completely crazy, and then fired off the command to wipe the table that tracked these attempts. The table emptied. I asked Eric to try to log in again and held my breath as I listened to his clacking keyboard. A second later, the dashboard of our product appeared on his screen. "I'm in!" he exclaimed.

Relief flooded me as I heard someone behind me say, "I just got in too. It’s working now."

For the rest of the morning, I walked around the room answering questions, and occasionally crouching down behind a cubicle to discreetly re-wipe both the database table and the sweat off my forehead, grinning to myself at the ridiculousness of the situation. Despite our near-disaster, the training ended up being a success.

It's been three and a half years since that day, and it feels like that time passed in the blink of an eye.

In-person Saturday training sessions have been replaced by monthly feature update webinars, a Slack community, bi-weekly customer onboarding calls, and an outstanding documentation site to support more than 1500 MSPs and tens of thousands of users globally. The product that began as a bowl of digital spaghetti has transformed into a collection of well-tested microservices that can scale to collecting tens of terabytes of configuration data. Our backlog used to be so short it fit on a whiteboard. Now our product management team maintains a year-long roadmap and meets with users daily to collect usability feedback. And our engineering team has grown into a delivery machine with a mature software development lifecycle.

In other words, if you were looking at the team and the company today, you would be forgiven for believing that we knew what we were doing all along. You'd also definitely have trouble imagining just how fragile and humble things were in the early days. Sometimes I even have trouble remembering it objectively myself, and I was there! That's why I love this story. It helps me remember more clearly despite the rosy haze of nostalgia.

The reality is that the early days were hard.

They were hard because everything was so uncertain. When I joined, there were just five of us: the two founders, a sales guy, an engineer who had been building out a prototype of the core product over the past six months or so, and me. We didn't have any funding, the product was barely functional, we didn't have any paying customers yet, and we didn't know what we needed to do to get them.

This uncertainty created tension. The sales guy wanted to build more data collectors, the founders wanted to focus on platform integrations, and we engineers wanted to overengineer a graph database for storing IT configurations. We were working as fast as possible and creating a graveyard of technical debt in our wake. Everybody cared, everybody wanted to make it work, and everybody was smart and headstrong, which meant that we each would start tugging in the direction we thought things needed to go. I assume this happens a lot in early-stage startups, and the only ones that survive this have founders who have a strong vision and can sell that vision to the early team. It's a cliche for a reason: if you can't all start rowing in the same direction, you won't make any forward progress.

Fortunately, the founders did have a powerful vision for the product they wanted to build. Unfortunately, we couldn't create it all overnight, so it felt like the engineering priorities shifted almost daily. We had many features that were only half-implemented, brittle, or buggy. We were continually hammering on different parts of the product, slowly beating it into shape, while the founders were actively meeting with prospects and trying to close our first deal. We were trying to meet in the middle: get the product "useful enough" that somebody would be willing to pay for it.

I remember being dumbstruck when we closed our first deal because I knew just how buggy and brittle the product was at that time. But I realized that most of those early customers weren’t buying because the product worked so much as they were buying into the vision for what we were trying to build. Our earliest customers were like an extension of the team in this way. A drip of validation was addicting, and it sustained us for a little bit longer. So we kept listening to our customers, and kept iterating on the product. The product got better, and the drip of new customers became a trickle.

In those days, there was tremendous pressure on engineering. Nobody knew for sure which piece of the roadmap we should bite off next and how polished it needed to be until somebody would buy it, and we had about 2% of the bandwidth that we needed to finish all the tasks in our queue anyway.

In the beginning, I used to tell myself, "as soon as we solve this, we'll be able to take our foot off the gas just a little bit." This was a fallacy. The challenge currently facing us was simply obscuring the endless future challenges queued up behind it. Once we got our first couple of customers, we quickly shifted from pure "execution-and-delivery" mode to "keep-doing-that-but-also-fix-all-these-damn-bugs-or-we-might-lose-this-customer" mode.

In fairness, this pressure was self-imposed rather than the product of a toxic work environment. We didn’t want to let our teammates down, and this made it feel like the world was on our shoulders and led to more than my fair share of anxiety attacks when the product broke. One time a new customer accidentally imported a CSV of 50,000 users into his instance, burning through an AWS service limit and completely hanging the instance up for the afternoon. A few months later, there was a four-day stretch where I was woken up by our monitoring alarm at 3 AM and had to crawl out of bed to get the app back online and then obsess the rest of the day over finding the root cause. These were challenging but invaluable learning experiences. Nothing burns the importance of observability into you more than a week of sleep deprivation and stress.

Even though it was a pressure cooker, there was nothing more fun than being in the trenches with the early team. Everybody was working so hard. The vision was ambitious and exciting, and it felt like we could see a future that hardly anybody else seemed to be able to see. It was exhilarating to finally have some customers that were truly getting value out of the product.

After we found some traction and raised some money, we started growing the team. The first few people we hired back then always came from our personal networks. They were people we had encountered at the co-working space, people who were already consulting with us, or old friends and acquaintances.

Once we got a little bigger, we began hiring outside our networks. That was a lot harder and more time-consuming (especially for engineering hiring) because we needed to write job requisitions, advertise positions, get familiar with an applicant tracking system, and formalize our interview loops. It took us six months to land our first outside-of-network engineering hire. Before that, we had two candidates get to the offer stage only to realize that they weren't that serious about joining our team and had been going through the hiring pipeline for practice and to use our offer as leverage. It was a good learning experience, but it was demoralizing to be on the receiving end of that. Don’t do this, especially to small startups where time and energy are limited. Anyway, hiring got easier after we built out our applicant funnel and the profile of the company started to grow. Once we hired a few solid folks, our collective network grew and we started getting referrals again.

As the team grew, all of our processes, both internal and external, started evolving constantly. More people on the team meant we needed more efficient ways for information to travel through and out of the company. There were many knobs we experimented with: changing how teams and departments were organized, making sure all meetings had a clear purpose and were scheduled at the right cadence, making sure everybody knew which communication channel was for what (when to Slack versus email versus make a Confluence page), and tweaking how we used our task tracking tools to better align with the way we were working together. There's an art to it because changing a process causes mental friction as the team adjusts, so you find that you have to balance the necessity and frequency of changes with the severity of the communication bottleneck you are trying to address. The key is just to reflect with the team regularly (retrospectives are the perfect time for this) and build consensus on the team for small, experimental iterations. "Let's try this change out for the next two weeks, and if it's not working, we'll try something else."

We started slowly shifting from every engineer on the team knowing everything about all the code, to a minority of the team having that legacy knowledge, to nobody on the team knowing everything. It was around this point that some engineering decisions and code from the early days started to look completely insane. This was a transcendent experience to have because these were often problems of my own making. Years ago, when considering a design tradeoff, I told myself, "if we get to that point, it will be a good problem to have." And sure enough, we finally got to that point!

It made me realize just how evolutionary the development process is. As we built, we were slowly refining our understanding of how the product delivered value to our customers. Thus, we were guaranteed to have stuff in there we didn’t need anymore because priorities shifted, or because we overengineered something when we weren’t sure what direction the feature would take. We knew that we should probably back out that thing that only 0.5% of the customer base was using, or add tests and refactor that old service that had become difficult to work with, but our product team wanted us to stay focused on delivering roadmap promises. There was not enough bandwidth to execute perfectly in both directions, so we always had to work out a compromise. Don’t get frustrated by this! Remember: everybody is tugging in the direction they think is right for the company. The product team (and all departments outside of engineering really) wants the same overarching thing you want in the end – happy customers and a successful outcome – so it’s important to make them aware of land mines they can’t see and at the same time listen carefully to make sure you haven’t lost sight of the larger picture. It's so much easier to build consensus when people feel heard.

At some point, I looked up and realized I was now in a “real company”. We’d raised multiple rounds of funding, there were 100+ people on the team, and we had thousands of customers. I had a weekly management meeting with other department heads where I gave an update on my team’s KPIs, and a monthly meeting with the CFO and our VP of Engineering to review the budget and hiring plan. I didn’t have as much headspace to think about code and architecture anymore because I was doing lots of one-on-ones, interviews with candidates, and meeting with the product management team to talk about shifting roadmap priorities. In other words, I had become a much smaller cog in a much more well-oiled machine, and just like everybody else in the company at this point, I wasn’t mission-critical anymore.

This was such a bittersweet feeling. On the one hand, it was wonderful because it meant we had been successful in building processes and a product together that could stand on their own, which was always the goal. On the other hand, I felt a sense of loss. I missed the days of hacking for 10 hours straight, when the backlog was on a whiteboard and the whole team could fit around one table. I missed personally knowing every single person in the company and recognizing the name of each company using our product. I even missed how broken and messy everything was in the early days when it seemed like a stiff gust of wind could cause an uptime alarm to go off, or you needed to truncate a misbehaving database table to prevent an in-person training session from devolving into a disaster.

Early in my career, I wanted nothing more than to be part of a high-growth startup company. I used to think that in order to be successful a startup had to get the vast majority of strategic decisions correct, as if building a great company was like tiptoeing your way across a tightrope.

Now that I’ve actually been a part of one, I’m convinced that it’s much less about getting everything right than it is about focusing on what you can control: patience and persistence. The journey is long and you are going to mess up a lot along the way. It’s an endless cycle of showing up every day, making mistakes, recognizing them, and then applying what you’ve learned to do better the next day. If you want to maximize your chances of success, find founders (or be a founder!) who have the humility and fortitude to commit to this endless cycle of getting incrementally better. Then, just show up every day with a smile on your face, ready to grind. When you look back after a few years you’ll be amazed at how far you’ve come.

By the way...

The company I'm talking about in this post is called Liongard. They are hiring!

Join the discussion on Hacker News.