Channel: Programming – Knowledge is Capital

Scaling Enterprise Database-Bound Applications: I/O


Optimizing Slow Accesses

While most software developers like to think of themselves as computer scientists in the purest sense of the term, with job duties that include intimately understanding and exploiting the efficiencies of the x64 processor platform, optimizing that critical-path O(log n) algorithm to run in O(log log n) time, and other acts of mathematical creativity and scientific application, that’s not what most software developers do, nor what most should be doing.

Most software developers are building business applications (retail B2C, B2B APIs, or LOB apps), not scientific ones — and that means most are developing I/O-bound, not CPU-bound, applications.  Specifically, most business applications are creative user or application programming interfaces around relatively mundane CRUD operations on a data store.  Even more complex applications that perform data synchronization or novel calculations of covariance or multivariate regression spend maybe 5% of their time crunching data, and the other 95% retrieving and sending it on.

So, when you design an enterprise application, get past the ideation phase, and start scaling your next-generation, game-changing application from a cute demo into a serious and robust system serving millions of requests, why would you bother refactoring your string concatenation in loops into string builders, aiming for zero-copy, or otherwise optimizing for CPU performance?  You should not, and you should:  you should not optimize for CPU performance until you have optimized all your slow accesses away — and you should optimize for CPU performance because, hopefully, you’ve already squeezed all the blood out of the I/O turnip that you can.

But you haven’t.  I know you haven’t.  You know you haven’t, if you are being honest.  Have you ever looked at your database queries per second for specific-entity queries?  For instance, let’s say a user logs into your enterprise application, and a service on your application tier needs to retrieve that user’s record.  That service might call another service to make a record of the user’s login.  Then the user navigates to another page in your application 60 seconds later.  How many times did any component of your system retrieve the user by their unique identifier?  If the answer is “I don’t know”, you haven’t scratched the surface of scaling an enterprise application, much less my most important axiom of doing so: “Don’t Repeat Requests”.
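You can answer that question with very little machinery. Below is a minimal sketch, not a specific profiler’s API, of wrapping a data-access call so it counts how often the same entity is fetched; the repository name and the lambda standing in for the database call are illustrative assumptions.

```python
# Hypothetical sketch: wrap a data-access call to count how often the same
# entity is requested. CountingUserRepository and get_user are illustrative.
from collections import Counter

class CountingUserRepository:
    def __init__(self, backing_fetch):
        self._fetch = backing_fetch      # the real database call
        self.requests = Counter()        # request counts keyed by user id

    def get_user(self, user_id):
        self.requests[user_id] += 1      # record every request, hit or not
        return self._fetch(user_id)

# Simulate a login flow in which three components each ask for the same user.
repo = CountingUserRepository(lambda uid: {"id": uid, "name": "alice"})
for _ in range(3):
    repo.get_user(42)

print(repo.requests[42])  # how many times user 42 was requested -> 3
```

If the number surprises you, you have found your first “Don’t Repeat Requests” target.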

This is a lot harder than you might think, because enterprise web application development lends itself to repeating requests.  It is not an easy problem to solve, because you are essentially creating state on an application tier for a web tier that hosts a stateless HTTP application protocol.  When functionality is segregated into multiple services with distinct responsibilities, some duplication of the I/O required to fulfill a request is unavoidable.  Unless you and everyone on your team completely understand this disconnect and work collectively to design solutions that do not repeat requests, you will repeat requests as part of the natural design of any system.

Caching Isn’t a Magic Bullet, But It Is a Bullet

If you thought this post was going to end at “implement second-level caching on your ORM of choice”, you’re wrong, but you should be doing that for sure.  This is usually as easy as installing a caching server like Couchbase and configuring your ORM in a few lines of code or configuration files, and voilà: you are still repeating your requests, but this time, answering those repeated requests will be a lot faster than any SSD-backed database server will ever be.

(I say ‘usually’, because this depends on how you’re using your ORM.  If you use your ORM as an expensive way to execute stored procedures, your ORM will be at best a pass-through for database methods and will not give you the benefit of entity caching that could be reused for multiple queries that include that entity as a result.  As with all caching, YMMV depending on how you have designed your layers.)
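What a second-level cache buys you can be sketched in a few lines with the cache-aside pattern. This is a conceptual illustration, not Couchbase’s or any ORM’s actual API; the key scheme and loader function are assumptions.

```python
# Minimal cache-aside sketch of what an ORM second-level cache does for you:
# answer repeated entity lookups from memory instead of the database.
# The cache client and "User:42" key scheme are illustrative, not a real API.
class EntityCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = loader()                 # only touch the database on a miss
        self._store[key] = value
        return value

cache = EntityCache()
load_calls = []

def load_user_from_db():
    load_calls.append(1)                 # stands in for a real SELECT
    return {"id": 42, "name": "alice"}

for _ in range(3):
    user = cache.get_or_load("User:42", load_user_from_db)

print(len(load_calls), cache.hits)  # 1 database load, 2 cache hits
```

Note the key point for what follows: the question is still being asked three times; only the answer got cheaper.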

Once you enable caching, measure.  Measure how many times you ask for that user record when a user logs in and performs some actions over time.  You’ll be amazed: viewed at the database request level, you are still asking for the same user over and over again unless every component uses the cache for database entities with a consistent cache key.  It’s very hard to get right, from both an application configuration and a caching server configuration perspective.  Do not assume; measure.

Remember: the most important thing is not to get really fast answers to your repeated questions, but to stop asking the same questions over and over again!  Caching at the ORM is your tourniquet to stop the bleeding of your performance into database I/O buffers and wait times, but caching at the inter-component request level is critical.  Let’s say you have an enterprise web application that retrieves a forecast for a city for a given period of time.  The web client makes the request for the locale and date range to your application tier, which translates that into queries of whatever entities comprise your data model.  With ORM second-level caching in effect, the next request for the same locale and date range will not ask the database the question this time; the answer will come instead from the second-level cache… but stop right there.  The question was asked again at a higher level; you’re just answering it in a more intelligent way the second time around.
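To cache at the request level, you need a cache key derived from the request parameters. A minimal sketch of that derivation follows; the service name, parameter names, and hashing choice are illustrative assumptions, not a prescribed scheme.

```python
# Sketch: derive a cache key for a whole service response from the request
# parameters, so the same logical question always maps to the same key.
# "GetForecast" and its parameters are illustrative.
import hashlib
import json

def make_cache_key(service, **params):
    # Canonicalize parameters so ordering doesn't change the key.
    canon = json.dumps(params, sort_keys=True)
    return service + ":" + hashlib.sha256(canon.encode()).hexdigest()

k1 = make_cache_key("GetForecast", locale="Seattle",
                    start="2015-06-01", end="2015-06-07")
k2 = make_cache_key("GetForecast", end="2015-06-07",
                    start="2015-06-01", locale="Seattle")
print(k1 == k2)  # parameter order does not change the key -> True
```

With a key like this in front of the service call, the repeated question never travels below the caching layer at all.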

Enterprise web applications need to cache the responses of service requests using a cache key that accounts for the parameters of the request.  Hopefully your web application faithfully implements a repository pattern; if so, implement a cache in this layer to eliminate repeated requests to the service layer in the first place.  This is not easy.  It is hard because your ORM’s database caching is likely a black-box implementation of complex cache expiration logic that performs all sorts of clever tricks to know when an entity has become ‘dirty’ and needs to be retrieved again from the underlying database rather than served from the cached copy.  If you’re developing business applications, you’re probably not accustomed to being clever at this level, and you will need to spend the time to implement this manually throughout your repository pattern (unless you thought ahead and can add caching as an aspect) and to bust your caches.
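If you did think ahead, caching layered on as an aspect can look like the sketch below. A Python decorator stands in for whatever AOP mechanism your stack provides; the repository method and its backing call are illustrative assumptions.

```python
# Sketch of caching as an aspect: a decorator that memoizes repository
# responses by method name and arguments. Names are illustrative.
import functools

_response_cache = {}

def cached(func):
    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__name__,) + args
        if key not in _response_cache:
            _response_cache[key] = func(*args)   # only call through on a miss
        return _response_cache[key]
    return wrapper

calls = []

@cached
def get_user(user_id):
    calls.append(user_id)        # stands in for the heavy service/DB call
    return {"id": user_id}

get_user(7); get_user(7); get_user(8)
print(len(calls))  # the underlying call happened twice, not three times -> 2
```

The advantage of the aspect approach is exactly what the paragraph above warns about: you bolt caching onto every repository method uniformly instead of hand-rolling it in each one.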

Challenges of Busting Caches

Busting your own caches — that is, invalidating a cached entry when you have reason to know the cached version is no longer good — is one of the trickiest things to get right in this stage of Don’t Repeat Requests.  Take a service method called GetUser() that returns the user and an object graph of some interesting things spanning multiple data entities from the database.  At the web tier, we cache that call when we make it, so subsequent calls from the web tier won’t even request this from the service while it’s in cache.  But what else could change the User object in the database?  If only the user themselves can, then it’s easy enough to bust the cache in a User repository .Save() method.  But if other, unrelated processes can — say, a back-end service that bulk-updates users for some reason — then it gets more challenging: you must identify every path that could invalidate the data, and make sure each has access to bust the cache for the GetUser() response as cached by the web tier, as well as for the User entity as represented in any other cached request (think GetUser(), GetUsersByWhatever(), and all the other variants that may also need cache busting).  When GetUser() actually includes data sourced from other entities, you have to think about the dependent object graph in the data model and ensure you’ve accounted for those as well.  You have to consider, though not necessarily exhaustively handle, this recursive analysis for deep object graphs — it only matters as much as it matters for the user experience.
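One way to keep GetUser(), GetUsersByWhatever(), and friends bustable from a single write path is to tag each cached response with the entities it depends on, and invalidate by tag. This is a sketch of that idea under assumed names, not a particular caching product’s feature.

```python
# Sketch: tag cached responses with the entities they depend on, so a write
# path only needs to know the entity tag, not every cached key shape.
# The "User:42" / "City:Seattle" tag scheme is an illustrative assumption.
from collections import defaultdict

class TaggedCache:
    def __init__(self):
        self._store = {}
        self._keys_by_tag = defaultdict(set)

    def put(self, key, value, tags):
        self._store[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._store.get(key)

    def bust(self, tag):
        # Called from .Save(), a bulk updater, or any other write path.
        for key in self._keys_by_tag.pop(tag, set()):
            self._store.pop(key, None)

cache = TaggedCache()
cache.put("GetUser:42", {"id": 42}, tags={"User:42"})
cache.put("GetUsersByCity:Seattle", [{"id": 42}],
          tags={"User:42", "City:Seattle"})

cache.bust("User:42")   # one call invalidates every dependent response
print(cache.get("GetUser:42"), cache.get("GetUsersByCity:Seattle"))
```

The recursive-object-graph problem remains — you still have to decide which entity tags each response carries — but at least the invalidation itself is one call per entity, not one per cached method variant.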

This kind of task must be reserved for the architects and most senior engineers who know your system design and inter-dependencies inside and out, to avoid data consistency errors.  A key point: as long as all data validation logic is performed at the lowest layer, beneath any custom caching work you perform, data consistency errors will at worst create a poor user experience.  If it isn’t — if you have critical client-side validation that is not mirrored below the caching on the service side of your architecture — you have bigger security risks and other problems than caching, and it will definitely impede your ability to deploy service request caching and scale your application.

Caching From Within

Within any area of your application, beware anti-patterns that repository patterns can create.  If you author MethodA() that calls MethodB() that calls MethodC(), all of which individually call UserRepository.GetUser(), then you’re recursively repeating yourself.  Repository patterns are nice because they reduce the repetitive session and connection management functions involved with making a web service or database call, but they make it easy to forget that they’re very, very heavy methods.

Do not be afraid to accumulate.  Do not be afraid to pass object graphs through method parameters to save I/O.  Think of the call stack as your cache here.  While you shouldn’t load it up with an unnecessarily heavy omnibus object to pass around to every method, and while you definitely should not front-load all your I/O before calling a logical method chain where conditional logic or exception management could make some of those calls unnecessary, intelligently design methods so that they don’t simply take the smallest parameter set possible, but instead create the best scalability when working in concert.
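The MethodA() → MethodB() → MethodC() anti-pattern and its fix can be sketched side by side; the method and repository names here are illustrative stand-ins for the C#-style examples above.

```python
# Sketch: each layer re-fetching the entity vs. fetching once and passing
# the object graph down through parameters. Names are illustrative.
fetches = []

def get_user(user_id):
    fetches.append(user_id)          # stands in for a heavy repository call
    return {"id": user_id, "name": "alice"}

# Anti-pattern: every method in the chain asks the repository again.
def method_c_naive(user_id): return get_user(user_id)["name"]
def method_b_naive(user_id): return method_c_naive(get_user(user_id)["id"])
def method_a_naive(user_id): return method_b_naive(get_user(user_id)["id"])

# Fix: fetch once at the top, then pass the object down the call stack.
def method_c(user): return user["name"]
def method_b(user): return method_c(user)
def method_a(user_id): return method_b(get_user(user_id))

method_a_naive(42)
naive_count = len(fetches)
fetches.clear()
method_a(42)
print(naive_count, len(fetches))  # 3 fetches vs. 1
```

Same result, one third of the I/O — and the difference compounds with every additional layer in the chain.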

Caching Outside Your Boundaries

If you’re writing enterprise web applications for a product that is not dying or decaying, you’re writing it in HTML5 today.  And if your web design isn’t from a FrontPage 98 template, you’re probably using AJAX requests to improve user experience and reduce perceived page load times, or maybe you’ve gone whole-hog into an SPA design.  With HTML5 and a relatively modern web browser, you have LocalStorage.  Use LocalStorage.

You should be using LocalStorage to cache and bust non-error responses to AJAX requests to your web services and REST endpoints.  You’ve thinned out the pipes from the services to the database and from the web tier to the services tier, so why stop there?  Why continue to allow browsers to repeat requests to your web tier as a user moves back and forth between areas or pages?  If you rest on your laurels after a job well done but still repeat unnecessary I/O queries at a higher level in the chain, you’ve made your application more performant but not truly scalable — you’ve just shifted the blame.
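The browser-side logic is the same cache-then-bust pattern as everywhere else. It is sketched here in Python for consistency with the other examples, with a plain dict standing in for window.localStorage; the endpoint URL, TTL, and response shape are illustrative assumptions.

```python
# Sketch of the client-boundary pattern: cache non-error responses keyed by
# the request URL, with a TTL, and only hit the network on a miss.
# A dict stands in for window.localStorage; the endpoint is illustrative.
import json
import time

local_storage = {}   # in a browser, this would be window.localStorage

def cached_get(url, fetch, ttl_seconds=300):
    entry = local_storage.get(url)
    if entry is not None:
        cached = json.loads(entry)
        if time.time() - cached["at"] < ttl_seconds:
            return cached["body"], "cache"
    status, body = fetch(url)
    if status == 200:                    # only cache non-error responses
        local_storage[url] = json.dumps({"at": time.time(), "body": body})
    return body, "network"

network_hits = []
def fake_fetch(url):
    network_hits.append(url)             # stands in for the real AJAX call
    return 200, {"user": "alice"}

cached_get("/api/user/42", fake_fetch)
body, source = cached_get("/api/user/42", fake_fetch)
print(len(network_hits), source)  # 1 network request; second answer: "cache"
```

Busting works the same way as on the server: a client-side write path removes or overwrites the stored key rather than waiting out the TTL.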

The F5 Test

I propose what I will call the “F5 Test” for scalability.  When you’ve cached all you can cache, and every layer is implementing the “Don’t Repeat Requests” mantra, open up your database profiler and your Couchbase cache hit dashboard.  Log into your application’s dashboard, reporting, or whatever page you want to test, then clear your profiler and cache hit counters.  Press F5.  You should see very, very little activity on a reload, and you should be able to explain what you do see.

But, for what you do see, justify each and don’t make excuses for yourself:

  1. If your dashboard makes repeated requests because you feel it “always needs to be up-to-date”, then you’re doing it wrong.  Cache and use server-side events to refresh your cached copy.
  2. If you load a user object to determine whether they have a login session, then do you have a good reason for not using browser evidence such as a signed SAML assertion to validate a session instead of using a database lookup to verify a user exists and is authorized?
  3. If you see something you can’t explain, investigate.  I wish this were as obvious as it is intuitive, but many times software developers will be content with an arbitrary improvement (“I made 232 database calls on login go down to 47”) rather than do the homework to find out why 47 isn’t 5.  Maybe there are 42 extraneous requests made by a service that doesn’t use the cache even though you thought it did.  Maybe one of those 42 requests causes database lock escalations that won’t scale with load.

Optimizing Query Plans

Oh yeah, and optimize query plans.  This is important work, but it’s not the outer-most layer of the onion.  It’s important to remember the difference between scalability and performance:

  • Performance should be determined by the user experience, from dispatching of the request to final rendering of the result to the user in their browser.  Performance is not “how much CPU does the system use under load” — that is resource utilization, though many people use “performance” for both concepts.
  • Scalability is two-fold: how many users can I serve at a certain level of performance on a certain hardware baseline (scaling up), and can I — and how often will I have to — throw money at more hardware to handle more users at the same level of performance (scaling out)?
  • Improving performance may or may not improve scalability
  • Improving scalability rarely improves performance
  • Management will not understand the difference

Optimizing query plans can impact both: improving a query plan from 6 seconds to 1 second improves performance.  It could improve scalability if your queries are over complex joins or large data sets that couldn’t be pinned in memory automagically in your database server.  But optimizing query plans for speed alone is not a function of scalability — optimizing them for I/O is where it’s at.  Simple improvements, like changing JOINs to EXISTSs where feasible, let the query engine skip unnecessary I/O, which opens up buffers and improves throughput through the disk subsystems, where the big performance and scalability penalties hit.  It just so happens that complex queries with I/O in intermediate steps also have high CPU, due to hash matching, rewinds, and other operations that perform calculations on large amounts of data generated from unnecessary I/O.
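As a sketch of the kind of rewrite meant here — the table and column names are illustrative assumptions, and the benefit depends on your schema, indexes, and engine:

```sql
-- Before: the JOIN materializes matching Orders rows just to filter Users,
-- then DISTINCT has to collapse the duplicates, all of which costs reads.
SELECT DISTINCT u.UserId, u.Name
FROM Users u
JOIN Orders o ON o.UserId = u.UserId;

-- After: EXISTS lets the engine stop probing Orders at the first match for
-- each user, skipping the unnecessary I/O and the DISTINCT entirely.
SELECT u.UserId, u.Name
FROM Users u
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.UserId = u.UserId);
```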

It’s work you should do, but you shouldn’t do it first for scalability reasons.

After-Thoughts: Don’t Report Stupid Results

Building highly scalable applications from the ground up with a large team is impossible.  You iterate scalability just as you iterate product features.  Actually, hopefully you iterate scalability tasks along with user stories, but in practice, complex enterprise web applications are usually architected with the best of intentions and intelligent designs, then reach a breaking point at some level of load on some hardware platform that causes a stop-drop-and-roll effort to improve the scaling up and out of the application.  Companies with deadlines and tight deliverable schedules don’t consistently evaluate and factor into iterations the work required to make and keep an application scalable over time.  If someone tells you differently, they’re probably in sales, and they’re definitely lying.

That being said, software developers, do not succumb to the pressure to deliver scalability improvements by reporting true but irrelevant statistics to management.

  • “I sped up database calls for GetUser() by 300%!” suggests anything that gets a user should see a three-fold improvement in speed.  If that database call is 1% of the login process time, then it will have no material impact.
  • “I reduced the size of page requests from 500K to 250K!” means “I doubled the performance or scalability of the application” to management, but in reality, it means neither.
  • “I found a problem between ServiceA and ServiceB and cut out three extraneous calls between them!” means nothing to anyone.  Did you remove three calls that are made once an hour by a batch process, or three calls made for every user login?  What was the impact of those calls on performance and scalability before and after the optimization?
  • “ServiceA is a big problem and has a lot of errors.  I removed a lot of exceptions on ServiceA.  Exceptions cause performance problems.” is problematic on several levels.  Why were the exceptions being thrown?  Did removing them fix or just sweep a real problem under the carpet?  If it was justified, what improvement did it have on the overall system?

When software developers communicate their changes, the implication is that the changes have meaningful impact.  However, many software developers fail to measure the before-and-after impact of their changes on the whole system, evaluating them only in the microcosm of the area they changed.  That is about as useful as management suggesting areas to fix based on intuition or high-level reporting tools.

While most devs don’t do scientific computing, scaling applications is an empirical task that demands meaningful measurement in a realistic testing context.  There is no spec document or product owner guidance on improving scalability: you must treat it as a scientific experiment.  Observe, hypothesize, establish a control (the pre-change measurement), experiment, report data.  If you fail to discretely value each change with before-and-after metrics, you’re just shooting in the dark.  Cowboy coding gets teams into scalability messes, not out of them.

Above all, though, don’t give updates on enhancements that you cannot verify improve scalability with before-and-after numbers.  If you fix a problem that doesn’t improve overall system scalability — which happens often in scalability improvement iterations — highlighting your accomplishment when there is no observable improvement suggests you are either ineffective or not working on the right items.  Worse, in crunch times, such updates give management a false sense of accomplishment.  Improving scalability, or performance for that matter, has no done-state, and providing meaningless accomplishment notes to management will accelerate the sense that “we’re done enough”, when in fact you may not even have identified the most significant issue for your particular scenario.

And if you haven’t, let me do it for you:  You’re repeating your requests.  Trust me on that one. :-)


