Recently, the excellent folks over at Talentbuddy asked me to give an interview about my experiences building with Node.js.
Can you please tell us a little bit about what you do now and the path you took to get here?
I’m currently leading the efforts around Netflix UI’s transition to Node.js, which puts me in the fortunate position of working with a group of superlative engineers and helping shape the direction of the Node platform. I also maintain the popular open source Node framework restify.
We have moved Netflix.com from a Struts-based Java application to one that runs on Node.js, using technologies such as React, Restify, and Bunyan. We’re now in the process of creating a Node-based platform to enable broader adoption of Node, not just in UI engineering but across the entire Netflix organization.
I graduated with a degree in Computer Engineering from the University of Waterloo. I started my career at AWS, where I helped launch the AWS IAM service and scaled out AWS’s metering architecture on what was then a bleeding-edge technology, Hadoop. From there, I went to work at Joyent, discovered Node.js, and was part of the core team that launched the Manta object store and compute service, built entirely on Node.
To be clear, we haven’t migrated “everything” to JS and Node. Our backend infrastructure is still Java-based, although this is slowly changing as more and more teams ramp up on Node and other newer frameworks. The legacy website stack was a monolithic Java application maintained mostly by UI engineers. As you can imagine, context switching between the Java backend and the JS frontend was often expensive.
As Node and its ecosystem grew in popularity and robustness, the UI teams wanted to work in a language that was already native to the presentation tier. It made sense for engineers already experienced in JS to make the transition to JS on the server.
We have improved performance and reduced infrastructure costs with Node. We now run a quarter as many EC2 instances on Node as we did on the legacy Java stack, while serving the same number of subscribers at lower latencies.
Node has been a boon to engineers’ productivity. We’re able to iterate much more quickly thanks to the dynamically typed nature of JS, which lets us rapidly develop, test, build, and deploy presentation-layer and server-side code, including templates and client-side assets. All of this means we can innovate on our customers’ user experience more quickly.
This wasn’t without its challenges. We were the first non-Java production service at Netflix. This meant we had to rethink how we integrate with all of our Java dependencies and libraries — such as the excellent Hystrix fault tolerance library that is the foundation of all services at Netflix. This has spurred innovation and resulted in Prana — an open source Java-based proxy service born at Netflix, which allows integration with all of Netflix’s Java-based libraries and services. As a result, Netflix now supports a polyglot ecosystem. Check out our talk for more information.
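A core idea in Hystrix is the circuit breaker: after repeated failures, stop calling a struggling dependency for a while and return a fallback immediately instead of piling up slow requests. The TypeScript sketch below illustrates that pattern only. It is not the Hystrix or Prana API; the `CircuitBreaker` class, its thresholds, and the injectable clock are hypothetical choices made for this example.

```typescript
// Minimal circuit-breaker sketch (hypothetical; not the Hystrix or Prana API).
// CLOSED: calls pass through. OPEN: calls fail fast with the fallback.
// HALF_OPEN: after the cool-down, trial calls go through; a failure reopens.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number, // consecutive failures before opening
    private readonly coolDownMs: number, // how long to stay open
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.coolDownMs) {
        return fallback(); // fail fast without touching the dependency
      }
      this.state = "HALF_OPEN"; // cool-down elapsed: let a trial call through
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "CLOSED";
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

Injecting the clock keeps the state transitions testable without real timers; a production breaker would also need request timeouts and a limit on concurrent trial calls while half-open.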
What are the biggest challenges a company should expect from developing and running a large scale Node app? How did you overcome them?
We’re right at the front door when it comes to our customers. Any degradation in our app means customers are unable to enjoy the experience they’ve come to expect. Since our performance directly impacts customers, it’s imperative that we can quickly diagnose, pinpoint and fix issues that affect our stack. These can be performance bottlenecks, errors, memory leaks, or even Byzantine faults.
It’s hard enough trying to debug these issues in a development environment, when you have many tools available and are not affecting customers. They’re much more troublesome in production, when customers are directly impacted, tooling is limited, and there are orders of magnitude more requests to sift through.
We use several techniques to solve this problem. First, our application has been designed and built with visibility and observability in mind; they are not afterthoughts haphazardly tacked on. This is why we chose Restify and Bunyan: they give us the ability to observe key metrics of our application at runtime. For more details, check out our talk on building observable applications. These metrics usually let us quickly diagnose and fix application errors.
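Much of this observability comes from structured logging: Bunyan, for instance, writes each log record as a JSON object rather than free-form text. The sketch below illustrates the idea in TypeScript with hypothetical helpers (`createLogger`, `timed`) and field names chosen for this example; it is not Bunyan’s or Restify’s API. Each request produces one JSON record with its route, status, and latency, which downstream tooling can aggregate into metrics.

```typescript
// Minimal structured-logging sketch (hypothetical helpers, not Bunyan's API).
// Each record is one JSON line, so fields such as route, status, and latency
// can be filtered and aggregated by tools rather than parsed out of free text.
type LogFn = (fields: Record<string, unknown>, msg: string) => void;

function createLogger(name: string, write: (line: string) => void): LogFn {
  return (fields, msg) => {
    write(JSON.stringify({ ...fields, name, time: new Date().toISOString(), msg }));
  };
}

// Wrap a handler so every request emits exactly one record with its outcome
// and latency. Synchronous for brevity; a real Node handler would be async.
function timed<T>(log: LogFn, route: string, handler: () => T, now: () => number = Date.now): T {
  const start = now();
  let status = 200;
  try {
    return handler();
  } catch (err) {
    status = 500; // record the failure, then let the caller see the error
    throw err;
  } finally {
    log({ route, status, latencyMs: now() - start }, "request handled");
  }
}
```

Because every record shares the same fields, questions like "what is the 99th-percentile latency of this route?" become queries over the logs instead of guesswork.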
Second, we leverage post-mortem analysis for problems we can’t diagnose quickly, or for issues such as memory leaks. We can take a core dump of a running process in production at any time. The dump captures the full state of the process, so we can then restart it, temporarily mitigating the problem while we investigate the root cause asynchronously. Later, when customers are no longer impacted, we can use tools such as mdb to walk all of the JS objects in the heap or examine the stack. Node core dump analysis is extremely valuable and should be in the toolkit of every production Node engineer.
Last, for performance issues, CPU profiling via flamegraphs gives us a way to easily visualize and improve performance. You may recall a blog post on our tech blog where we used flamegraphs to help debug a performance issue. With this technique we can pinpoint the specific sections of code that spend the most time on CPU and focus on optimizing those hot code paths.
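Flamegraphs are built from many sampled call stacks. A common intermediate step, used by Brendan Gregg’s FlameGraph tools, is to “fold” the samples into one line per unique stack: frames joined root-first with `;`, followed by the number of samples. The sketch below shows only that folding step, plus a helper for a frame’s inclusive share of samples. Representing each sample as an array of frame names is an assumption for illustration, not the output format of any particular profiler.

```typescript
// Fold sampled call stacks into the "folded" text format used by flamegraph
// tooling: one line per unique stack, frames joined root-first with ';',
// then a space and the number of samples that hit exactly that stack.
type StackSample = string[]; // frames ordered root first, leaf last

function foldStacks(samples: StackSample[]): string[] {
  const counts = new Map<string, number>();
  for (const frames of samples) {
    const key = frames.join(";");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Sort for deterministic output; the rendered graph does not depend on line order.
  return Array.from(counts.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([stack, n]) => `${stack} ${n}`);
}

// Fraction of samples in which a frame appears anywhere on the stack,
// i.e. its inclusive width in the flamegraph.
function inclusiveShare(samples: StackSample[], frame: string): number {
  if (samples.length === 0) return 0;
  const hits = samples.filter((frames) => frames.includes(frame)).length;
  return hits / samples.length;
}
```

A frame with a large inclusive share shows up as a wide tower in the flamegraph, which is usually the first place to look when optimizing a hot path.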
Did you get any important benefits from Node.js that you didn’t consider initially?
With Node, we can profile CPU utilization with flamegraphs by sampling stack frames using Node’s --perf-basic-prof option and Linux perf events. We did not have this option on the legacy Java stack, because the JVM did not support it. Java is only now catching up, thanks to Netflix’s open source work championed by our in-house performance expert Brendan Gregg.
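Node’s `--perf-basic-prof` flag makes V8 write a perf map file (typically `/tmp/perf-<pid>.map`) that lets Linux perf turn the addresses of JIT-compiled JS functions into readable names. Each line has the form `START SIZE symbolname`, with the first two fields in hex. Assuming that format, the sketch below shows how such a map can be parsed and used to symbolize a sampled address; it is an illustration of the idea, not perf’s implementation.

```typescript
// Sketch: symbolize sampled addresses using a perf map written by V8.
// Each line is "START SIZE symbolname" (START and SIZE in hex); an address
// belongs to a symbol when START <= addr < START + SIZE.
// Assumes addresses fit exactly in a JS number (true for typical 48-bit
// x86-64 user-space addresses).
interface PerfMapEntry {
  start: number;
  size: number;
  symbol: string;
}

function parsePerfMap(text: string): PerfMapEntry[] {
  const entries: PerfMapEntry[] = [];
  for (const line of text.split("\n")) {
    const m = /^([0-9a-fA-F]+) ([0-9a-fA-F]+) (.+)$/.exec(line.trim());
    if (!m) continue; // skip blank or malformed lines
    entries.push({ start: parseInt(m[1], 16), size: parseInt(m[2], 16), symbol: m[3] });
  }
  return entries.sort((a, b) => a.start - b.start);
}

function symbolize(entries: PerfMapEntry[], addr: number): string {
  // Linear scan for clarity; a real tool would binary-search the sorted list.
  for (const e of entries) {
    if (addr >= e.start && addr < e.start + e.size) return e.symbol;
  }
  return "[unknown]";
}
```

Without such a map, frames from JIT-compiled JS show up in perf output as bare hex addresses, which is why the flag matters for Node flamegraphs.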
Netflix is a tech company and an entertainment company. We see a convergence on web technologies, but it’s hard to predict where the industry will go as a whole. For Netflix, updatability is an important part of our strategy — it allows us to rapidly innovate, rolling out new features and A/B tests without having to rev the platform. That’s easy to achieve on the website or even mobile apps but harder on TV-connected devices. Some companies will choose to go the route we have and others will build native apps for their devices. It just depends on what trade-offs are important to their business.
The UI engineering org was the perfect confluence of people and technology for Node. It let engineers keep using their preferred language on the server. With the success of our Node application, and with other orgs showing tremendous interest, we’re starting to take the next steps at Netflix to make Node a first-class citizen in the Netflix ecosystem.
That said, there are still many differences between client-side and server-side JS. The biggest and perhaps most impactful is that the server is a multi-tenant environment. Engineers must be mindful of this, especially when coming from the browser. Errors, performance bottlenecks, and memory leaks affect every customer served by that process, not just a single isolated browser page.
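One common way a leak spreads across a multi-tenant server is module-level state that grows on every request: it lives as long as the process and is shared by all requests. The sketch below bounds such a cache with least-recently-used eviction, relying on a `Map` iterating its keys in insertion order. `BoundedCache` and its capacity parameter are hypothetical names for this example; in practice a maintained LRU library or an external cache may be a better fit.

```typescript
// Module-level state in a server outlives any single request and is shared by
// all of them, so an unbounded cache here grows until the process runs out of
// memory. This sketch caps the size and evicts the least recently used entry,
// relying on Map iterating keys in insertion order.
class BoundedCache<K, V> {
  private readonly entries = new Map<K, V>();

  constructor(private readonly capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The cap turns an unbounded leak into a fixed memory cost per process, at the price of occasional cache misses.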
This is why we build our app with visibility and observability in mind so we can fix these issues. We’ve also compiled best practices around building Node applications and hold regular workshops and discussions with engineers.
How would you describe the culture in the engineering teams at Netflix?
We have a high-performance engineering culture with a strong peer group. Our culture deck is famous for its ethos of “freedom and responsibility”. We’re driven to continuously improve our product experience for our customers, and engineers are empowered to innovate on all aspects of our technology to achieve those ends. We were able to spin up our new stack on Node.js quickly as a direct result of our culture.
What are the main drivers in your organization that help developers grow?
We don’t have a lot of levels at Netflix. We’re pretty flat. We’re not title driven. We focus on people’s impact. You can help shape your role and path at Netflix by getting involved in the aspects you’re passionate about. Identify gaps and help fill them. Be creative and innovative.
We are looking for engineers who are passionate and curious, have solid CS fundamentals, and are excited about working on challenging problems. We look for these attributes because they are shared by engineers across Netflix. Of course, enthusiasm for our products and culture is also important.
We have a culture of innovation that permeates the company. We’re not afraid to challenge the status quo and experiment. We’re shaping the future of global entertainment because we believe there is a better way to watch.
We’re operating at a scale that can offer exciting new engineering challenges to developers. With more than 60 million members in 50+ countries watching over 10 billion hours of content a month, our client applications are among the most successful and widely used across a range of devices, including phones, tablets, game consoles, TVs, and desktop/laptops — all of which use JS as their presentation foundation.
Our high-performance engineering culture makes for an amazing peer group. We have a lot of fun working together but we also challenge each other to push outside of our comfort zones. We create ground-breaking technologies and better products through our collaboration.
We’re very active and plugged into the JS community, with engineers on both the Ecma TC39 committee and Node Customer Advisory Board. We also create and contribute to many open source frameworks. For example, we’re getting ready to open source Falcor, an asynchronous, multiplexed network transport layer built in pure JS on top of HTTP.
What were the most important things you did as a beginner engineer that had the highest impact on your career?
I came out of university knowing only that I wanted to work on distributed systems software and was fortunate enough to be surrounded by a group of great mentors at AWS. The people and projects you work on really mold your career early on. These in turn shape and influence the kind of engineer you become.
From my early experiences, I learned about the Unix philosophy, which is one of the most important principles of software engineering. It turns out they got it right back in the ’70s. Some advice I would give to engineers starting their journey:
- Find great mentors.
- Don’t be afraid to get your hands dirty and do as much as possible.
- Don’t be the smartest/wisest person in the room. If you are, find a different room.
- Embrace the Unix philosophy.
How do you learn new things? Do you attend conferences, learn through building products at your job or work on your own projects outside your job?
I do a combination of all of these. Conferences are great for exposure to new and different ideas. However, to deeply understand the internals of a particular technology, I find it more fun to get my hands dirty. I also read and follow many inspiring thought leaders in the engineering community to stay current on new ideas.
The challenging aspect of our industry is that the technology never stands still. So we as technologists must stay vigilant and motivated to keep up — or risk falling behind and becoming obsolete.
The most important piece of advice I would give is not to tie yourself to the platform or language du jour; instead, try to embrace overarching and timeless philosophies such as Unix. (I realize this is rather ironic given this interview…) In all seriousness, years from now we may have moved on from Node, but the enduring Unix philosophy will still be relevant. Instead of preparing yourself to be a JS engineer, try to become an adaptable software engineer.
It helps to become proficient at the fundamentals of computer science such as data structures, algorithms, and operating systems. Armed with the fundamentals of how computers work, you will find it much easier to work up and down the stack and across platforms.
Remember, nothing is magic, and everything’s just code. If it’s unclear how something works under the hood, go spelunking through the source and try to figure it out. You’ll learn so much more in the process. Which brings me to guessing: just don’t do it. Never guess when trying to debug an issue; always use the scientific method and derive your findings from concrete data. If you don’t have data, then go get it first!
Avoid premature optimization. Try to keep your code as simple as possible. Don’t engineer the space shuttle to go down to the corner store for milk.
Last but not least, don’t stay complacent or comfortable. If you’re getting cozy with frontend development, try your hand at backend and systems software. Remember, our industry is constantly evolving; it’s on you to keep up with it.