By Ashish Pathak
Until 2018–19, we were a small, tightly knit team of developers working on our Android driver app. We were shipping fast, build times were quick, and it was easy not to step on each other’s toes.
Of late, we have reduced our clean build times by 33% and the time to test a change on a developer machine by 80%. We can now track crashes, ANRs, test coverage, and binary size per feature, and code reviewers are assigned automatically. This gives us a rich set of health metrics and lets us make more data-driven decisions about where to spend our precious, limited engineering bandwidth.
Seems super organised? This wasn’t the case in the beginning. Growth hit us like a truck! Our product requirements grew rapidly, and it became evident that a small team of developers would no longer be enough to keep up with the fast-changing needs of the product. We also experimented heavily, so we often had two implementations of the same feature, toggled based on experiment results. All of this grew our codebase significantly, and the team grew from just a few developers to 3 clusters with at least 2 pods each. Yet all of our code still lived inside a single monolithic app module.
With all this growth, we started noticing slow build times. According to our Gradle Enterprise data, builds were taking about 30 minutes on CI and 10–15 minutes on local machines. Even incremental builds took 5–10 minutes.
On top of that, merge requests would from time to time come with a lot of merge conflicts. We had also introduced clean architecture in our app, which helped significantly, but people who were not fully comfortable with it would still make mistakes such as not separating APIs from their implementations.
Meanwhile, we were also planning to solve more administrative and quality-related problems. For example, we were trying to define and enforce budgets and policies for:
- Binary size added per feature
- Crashes/ANRs per feature
- QA automation (where should we increase it?)
- Feature ownership
- More safety nets, monitoring, and alerting
When teams analyzed the problems above, they quickly found that having all of our code in a single app module was a huge obstacle, especially for feedback loops such as the red -> green -> refactor cycle of TDD, which our build times had kept us from following for some time. To address this, it became evident that we needed:
- Faster builds
- A faster feedback loop in the IDE
- Enforced separation of concerns by design
- Team-wise productivity metrics
Fast builds?
We were already using Gradle Enterprise and had also integrated its remote build cache. A quick look at the build scans revealed that most of the build time was spent on kapt and compilation. We wanted every code change to have a small impact area so that less code had to be recompiled, but with constantly changing code in a single module, that was difficult to achieve. It was clear that modularizing the codebase would give each change a small impact area. We would also need a mechanism to keep a change from rippling through the codebase and slowing down compilation. Hence, modularization was key to addressing these concerns.
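The ripple effect depends largely on what each module exposes to its consumers. In Gradle, a dependency declared with `api` also lands on the compile classpath of the module's consumers, while `implementation` keeps it private. The sketch below is a simplified Python model of that rule, assuming an ABI change in one module recompiles exactly the modules that can see it at compile time (it ignores, for instance, a change that also alters an intermediate module's own API). The module names are hypothetical, not our actual graph.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical module graph: consumer -> {dependency: configuration}.
# "api" re-exports the dependency to the consumer's own consumers;
# "implementation" keeps it private to the consumer.
Graph = Dict[str, Dict[str, str]]


def compile_classpath(graph: Graph, module: str) -> Set[str]:
    """Modules visible when compiling `module` (simplified Gradle model)."""
    visible: Set[str] = set()
    queue = deque(graph.get(module, {}))  # direct dependencies are always visible
    while queue:
        dep = queue.popleft()
        if dep in visible:
            continue
        visible.add(dep)
        # Only what `dep` exposes through `api` leaks further up the graph.
        for transitive, config in graph.get(dep, {}).items():
            if config == "api":
                queue.append(transitive)
    return visible


def recompiled_by_abi_change(graph: Graph, changed: str) -> Set[str]:
    """Consumers that must recompile when `changed` alters its public API."""
    return {m for m in graph if changed in compile_classpath(graph, m)}


with_api: Graph = {
    "app": {"booking": "implementation"},
    "booking": {"network": "api"},
    "network": {},
}
with_impl: Graph = {
    "app": {"booking": "implementation"},
    "booking": {"network": "implementation"},
    "network": {},
}

print(sorted(recompiled_by_abi_change(with_api, "network")))   # ['app', 'booking']
print(sorted(recompiled_by_abi_change(with_impl, "network")))  # ['booking']
```

Keeping dependencies on `implementation` by default, and reaching for `api` only when a type genuinely appears in a module's public surface, is what limits how far a change can spread.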
Enforced separation of concerns by design?
Two properties drive separation of concerns: coupling and cohesion. We want low coupling between components and high cohesion within each of them.
At the time, we were mostly dealing with tight coupling between components: even after shifting to clean architecture, we struggled to enforce it everywhere. Part of the reason was that there was no clearly defined code ownership. With all the code in the app module, entangled as it was, modularization would organically drive the codebase toward high cohesion. It also seemed viable because we could incrementally change small parts of the codebase until all of the code was cleaned up. We reasoned that if every module had clearly defined interfaces to the outside world, we could change its implementation however we saw fit. Different modules could also experiment with different architecture patterns without worrying about the rest of the codebase.
Team-wise productivity metrics?
We also wanted to track team-wise productivity metrics that answer questions such as: Which teams are adding the most to the app size? Which team has the highest rate of crashes and ANRs? Whom should we tag for code review when creating a PR? (See the bullet points above.) These administrative capabilities would make managing both teams and code easier. Modularization seemed a viable option here because we could ask teams to own specific modules. Then, whatever happens in those modules, ownership is clearly defined, and we can measure things at the team or feature level.
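Once each module has an owner, picking reviewers for a PR becomes a lookup from changed file paths to teams. A minimal sketch in the spirit of a CODEOWNERS file, assuming ownership is declared per module directory; the paths, team names, fallback owner, and longest-prefix rule are illustrative assumptions, not our actual setup (GitHub's and GitLab's CODEOWNERS files have their own pattern-matching rules).

```python
from typing import Dict, Iterable, Set

# Hypothetical ownership map: module directory -> owning team.
OWNERS: Dict[str, str] = {
    "features/booking/": "booking-team",
    "features/payments/": "payments-team",
    "libs/network/": "platform-team",
}
DEFAULT_OWNER = "app-core-team"  # hypothetical fallback for paths no module claims


def owner_of(path: str, owners: Dict[str, str] = OWNERS) -> str:
    """Return the team owning `path`, using the longest matching module prefix."""
    matches = [prefix for prefix in owners if path.startswith(prefix)]
    if not matches:
        return DEFAULT_OWNER
    return owners[max(matches, key=len)]


def reviewers_for(changed_files: Iterable[str]) -> Set[str]:
    """Teams to tag on a PR that touches `changed_files`."""
    return {owner_of(path) for path in changed_files}


changed = [
    "features/booking/src/main/kotlin/BookingViewModel.kt",
    "libs/network/src/main/kotlin/HttpClient.kt",
    "app/src/main/AndroidManifest.xml",
]
print(sorted(reviewers_for(changed)))
# ['app-core-team', 'booking-team', 'platform-team']
```

The same path-to-module mapping can attribute crashes, coverage, and binary size to a team, which is what makes the per-team metrics above possible.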
After we, as a tech team, were aligned on why we needed to modularize the driver app, we had one more major hurdle to clear: convincing management to allocate enough budget for us to modularize the driver app. It was not an easy task by any measure. Modularization is not a quick, one-time activity that we can assign people to and execute. It required careful planning and, more importantly, time. Defining the scope of the change was itself a daunting task. Still, we tried our best to work out as many details as we could before talking to management.
Pitching modularization?
We started a discussion with our leadership about measuring team-wise productivity metrics.
For example:
- Stability metrics: e.g., visibility of the crash rate per module
- Health metrics: e.g., test coverage per module
- Administrative capabilities: e.g., PR reviews and code owners
- Maintainability metrics: e.g., app size
These items gave us an entry point to pitch modularization. With most of the code in the app module, it is hard to address administrative problems cleanly. The code areas are spread across the app and entangled with each other, which prevents us from defining clear boundaries.
Stability metrics give us insight into which features contribute most to crashes and drag down our Android vitals scores.
Health metrics give us confidence in the overall health and quality of the code.
Maintainability metrics tackle longer-term issues that arise as a side effect of adding more features.
With these metrics in place, we can allocate a budget for each metric to every feature and enforce it from CI. For example, a feature cannot have more than x crashes, cannot have less than x% test coverage, and cannot consume more than x bytes of the whole app. These were also burning concerns for us in addressing Android vitals issues. It is not that we could not track them before, but with modularization we could track them at a more granular level, identify actionable items, and address them more neatly.
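Enforcing such budgets from CI boils down to comparing per-module metrics against per-module limits and failing the job on any breach. A minimal Python sketch, assuming the metrics have already been collected and attributed per module; the `ModuleMetrics` and `Budget` types, the module name, and all thresholds are hypothetical stand-ins for the "x" values above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModuleMetrics:
    crashes: int         # crashes attributed to the module in the reporting window
    coverage_pct: float  # test coverage of the module, in percent
    size_bytes: int      # bytes the module contributes to the app binary


@dataclass
class Budget:
    max_crashes: int
    min_coverage_pct: float
    max_size_bytes: int


def violations(metrics: Dict[str, ModuleMetrics], budgets: Dict[str, Budget]) -> List[str]:
    """Return one message per budget that a module breaches."""
    problems: List[str] = []
    for module, budget in budgets.items():
        m = metrics.get(module)
        if m is None:
            problems.append(f"{module}: no metrics reported")
            continue
        if m.crashes > budget.max_crashes:
            problems.append(f"{module}: {m.crashes} crashes > {budget.max_crashes}")
        if m.coverage_pct < budget.min_coverage_pct:
            problems.append(f"{module}: coverage {m.coverage_pct}% < {budget.min_coverage_pct}%")
        if m.size_bytes > budget.max_size_bytes:
            problems.append(f"{module}: {m.size_bytes} bytes > {budget.max_size_bytes}")
    return problems


def exit_code(problems: List[str]) -> int:
    """Non-zero when any budget is breached, so a CI step using it fails."""
    return 1 if problems else 0


# Hypothetical numbers for a single feature module.
budgets = {"booking": Budget(max_crashes=10, min_coverage_pct=70.0, max_size_bytes=2_000_000)}
metrics = {"booking": ModuleMetrics(crashes=12, coverage_pct=74.5, size_bytes=1_800_000)}

found = violations(metrics, budgets)
for message in found:
    print(message)  # booking: 12 crashes > 10
# In a CI script: sys.exit(exit_code(found))
```

Treating a module with no reported metrics as a violation is a deliberate choice in this sketch, so that a broken metrics pipeline cannot silently pass the check.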
This made it easy to convince the team of the value proposition of modularization.
In the next post, we delve into how we approached the implementation: How did we analyze? How did we plan? How did we execute the plan?
Check out Part-2 of the blog here: