Design Webapp from Scratch Part 2 — Presentation Layer

Han
7 min read · Mar 31, 2022


Following Part 1, imagine we have a web app architecture like the following:

Figure 1. Standard web application architecture.

The Core has already been implemented by other teams, and your job is to implement the presentation layer.

Components in this layer handle three things:

  1. Validate the request parameters
  2. Construct a command to call Core for each request from outside
  3. Generate an appropriate response from the result returned by Core

Each point is worth a couple of blog posts, but let's walk through them briefly; we'll go into the details later.

TL;DR: scroll to the bottom

Validate Request Parameters

This part includes authenticating the user and verifying that the input parameters are valid (maybe even partial authorization). Input parameter validation is supposed to be a static check, not a business logic check. For example, you could validate that a date string is actually a date, or that some int parameter is within a certain range. (If you are thinking about domain primitives for validation, that's great; you are ahead of me.) Note that complex validation such as "does this date match any of my blog dates?" should not live here. That kind of logic is business related.

This part is also a good place for some quick RBAC authorization, such as checking whether the user has permission to access a certain service. This is actually a quite controversial topic. I have seen designs where full RBAC authorization is implemented in the controller, so the calls to the service contain no user data. That way the service layer can implement caching easily, because it does not return user-specific data. On the other hand, you cannot really know what data you need until you are actually in the service, so doing RBAC before calling the service is quite hard and limited. That is why I mention partial authorization: I do believe some level of RBAC can be implemented here to issue an early return for the benefit of performance, but investing too much effort in a full authorization framework in the controller is not ideal.

The validation section is preferably a static check, with few or no database trips. Techniques such as stateless JWT verification (e.g. against a JWK set, which avoids querying the database to know the user is valid), or caching a permission hashtable for fast authorization, are very helpful.
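As a minimal sketch of what "static checking" means here, consider two small validators. The helper names (`isValidDate`, `isIntInRange`) are hypothetical; they check shape and range only, never business rules:

```javascript
// Static check: is the string an ISO-like YYYY-MM-DD date that
// parses to a real timestamp? No database involved.
function isValidDate(s) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(s)) return false;
  return !Number.isNaN(Date.parse(s));
}

// Static check: is the value an integer within [min, max]?
function isIntInRange(n, min, max) {
  return Number.isInteger(n) && n >= min && n <= max;
}
```

A "does this date match any of my blog dates?" check, by contrast, would need a database trip and belongs in the service.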

Construct Command to call Service

Calling the service from a controller or command line client should be pretty straightforward (unless it is cross-language and you are building RPC). You just import the service in the controller and call it directly.

import userService from 'services';

class UserController {
  @GET
  async getAll() {
    const users = await userService.getAll();
    return formatResponse(users);
  }
}

GraphQL, however, takes quite a bit of effort to make performant, which is one of my focuses in this series of posts.

GraphQL transfers responsibility to backend

In the old days we had REST APIs, where we define a series of endpoints and return responses using some common definition that both frontend and backend agree on.

User {
  id: Int
  name: String
  friends: Int[]
}

GET /users/:id
GET /users/id[]    // get a list of users
GET /users
POST /users
DELETE /users
PATCH /users/:id

If the JS client in the browser would like more data, the frontend developer needs to write the entire data-fetching logic. For example, to get a user's friends:

1. Get user id = 1
2. call GET /users/id[] to get all the user's friends.

If this person has too many friends, one GET call may time out in the browser, so the frontend developer may need to fetch 50 at a time.

1. Get user id = 1
2. Call GET /users/id[] in batches of 50 to get all the user's friends.
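The batched version of those steps can be sketched as below. The names (`toBatches`, `getAllFriends`) and the injected fetcher are hypothetical; the point is that the frontend now carries this optimization logic:

```javascript
// Split ids into batches (pure helper) so no single request
// carries too many ids and times out.
function toBatches(ids, size = 50) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// fetchUsers is injected, e.g.
// (ids) => fetch(`/users/${ids.join(',')}`).then((r) => r.json())
async function getAllFriends(friendIds, fetchUsers) {
  const friends = [];
  for (const batch of toBatches(friendIds)) {
    friends.push(...(await fetchUsers(batch)));
  }
  return friends;
}
```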

This mixture of "business logic" (get a user's friends) and "optimization logic" (take a batch of 50 at a time) is not ideal for a frontend developer, whose main job should be the GUI (styling, matching the HTML to the designer's handout, handling interactions, etc.), not optimizing how to request all kinds of data. And as the data requests become more complicated, the "optimization part" gets more tedious.

Hence the rise of GraphQL! A lot of posts have talked about GraphQL's advantage of getting all the data in one query, so frontend developers never have to worry about these data-fetching optimizations again. However, the difficulty of data fetching still exists; it has just been transferred to backend developers. Consider the GraphQL schema below:

User {
  id: Int,
  name: String,
  friends: User[]
}

Query {
  users(id: Int): User
}

The "optimization logic" now resides in the web server.

Figure 2. Optimization done in the server.

Now it is the backend developer's responsibility to optimize the resolver. The difference is that with a REST API the frontend developer optimizes for his own need, but in GraphQL the backend developer needs to optimize the resolver in a more generic way, so that it does not behave well for one query but badly for another.
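To make the problem concrete, here is a sketch of a naive resolver for the schema above (`userDb` is a hypothetical in-memory stand-in for a real data source). Resolving `friends` issues one `getOne` per friend id, the classic N+1 pattern the rest of this section sets out to fix:

```javascript
// Hypothetical in-memory data source standing in for a real database.
const userDb = {
  rows: {
    1: { id: 1, name: 'alice', friends: [2, 3] },
    2: { id: 2, name: 'bob', friends: [] },
    3: { id: 3, name: 'carol', friends: [] },
  },
  getOne: async (id) => userDb.rows[id],
};

const resolvers = {
  Query: {
    users: (_, { id }) => userDb.getOne(id),
  },
  User: {
    // naive: one getOne call per friend id, i.e. N+1 database trips
    friends: (user) => Promise.all(user.friends.map((fid) => userDb.getOne(fid))),
  },
};
```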

Path to optimization

Dataloader

There are numerous tutorials about how to use dataloader with GraphQL, so I won't go into detail about dataloaders. In general, it is just a library that can batch multiple calls for single items into one call for multiple items.

const loader = new DataLoader(batchFn); // batchFn handles an array of ids

loader.load(1); // load(key)
loader.load(2);
loader.load(3); // the three loads are batched into one batchFn([1, 2, 3]) call
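As a toy illustration of the batching idea (this is not the real dataloader library, just a sketch), loads queued in the same tick can be flushed together in one microtask:

```javascript
// Minimal loader: collect keys synchronously, flush them as one batch.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // async (keys) => values, same order as keys
    this.queue = [];
  }

  load(key) {
    return new Promise((resolve) => {
      // schedule one flush for the whole batch on the first load
      if (this.queue.length === 0) {
        queueMicrotask(() => this.flush());
      }
      this.queue.push({ key, resolve });
    });
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

The real library adds per-key caching, deduplication, and error handling on top of this idea.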

A couple of things that should be noted:

  1. Dataloader caches results in a key:value map for its lifetime, so it is highly discouraged to create a Dataloader outside the request context. It is recommended to create dataloaders per request and destroy them after the request is finished. If you really want to use a dataloader as a long-lived object, you should disable its cache. Replacing the cache with an expiring LRU cache is not the solution either, because you might have multiple servers, and LRU caches out of sync between them can make the GUI very inconsistent. Now, naturally, you might think, "what about a centralized cache shared by dataloaders on different servers, like memcache?" Not only does that not work with dataloader (the cache map it accepts must have a synchronous get() method), it is also a bad idea, which I will explain in the Caching section.
  2. Dataloader ideally prefers a key that is easy to compare, both for deduplication when load(key) is invoked multiple times with the same key, and for easy retrieval of the value from the cache. If the key is just an id, an int or a string, that is great; but if the key is a complex JSON object, it is quite difficult to dedup. Even if you provide a cacheKeyFn to convert the key into a string with stringify, you may still have to sort the keys first (because two equivalent JSON objects may stringify to different strings when their keys are in different orders).
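One way to get a stable string key for flat JSON objects is to sort the keys before stringifying. This is a sketch that only handles flat objects (nested objects would need recursion, because the replacer array also filters nested keys):

```javascript
// Sort top-level keys so equivalent objects produce the same cache key.
function stableKey(obj) {
  if (obj === null || typeof obj !== 'object') return String(obj);
  // JSON.stringify's array replacer controls both inclusion and order
  return JSON.stringify(obj, Object.keys(obj).sort());
}
```

It can then be passed to the loader as `new DataLoader(batchFn, { cacheKeyFn: stableKey })`.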

Caching

  1. Don’t use in-memory cache globally

In-memory caching is the fastest cache to use, but unless it caches the result of a pure function or immutable data (data that does not change, in which case the getXXX function is also pure), it is discouraged to use an in-memory cache at a global scale. In other words, an in-memory cache should always be scoped to some context.

The reason is that if a function is not pure, for example if the returned value for the same input differs due to a timestamp or a database update, then the in-memory cache might be updated on one server but not on another. This causes cache inconsistency. For example:

const cache = {};

class User {
  update(id, data) {
    const newUser = userRepo.get(id).update(data);
    cache[id] = newUser;
  }
}

Whenever you update a user, you update the cache with the newest user. It might work on server 1, because the update request was handled by server 1, but on all your other servers (2...n) the cache is not updated.
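A safer pattern, sketched below with hypothetical names, is to scope the in-memory cache to a single request context, so any staleness dies with the request:

```javascript
// Cache lives only inside one request; create a fresh context per request.
class RequestContext {
  constructor(userRepo) {
    this.userRepo = userRepo;
    this.userCache = new Map();
  }

  getUser(id) {
    if (!this.userCache.has(id)) {
      // cache the promise so concurrent calls for the same id
      // within this request share one repo trip
      this.userCache.set(id, this.userRepo.get(id));
    }
    return this.userCache.get(id);
  }
}
```

This is essentially what a per-request dataloader gives you, minus the batching.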

2. Use memcache with caution

The next best thing after an in-memory cache is a centralized memcache; some cloud providers offer very fast memcache solutions, with response times of just a couple of milliseconds. However, there is a gotcha that I have seen happen many times: memcache used to store id:model pairs. I am not saying this is wrong, but you must be cautious about this kind of caching.

For example, take the earlier task of getting one user's friends:

// set up memcache
const memcache = // set up connection

async function getListOfUsers(ids) {
  const uncached = [];
  let result = [];
  for (const id of ids) {
    const cachedUser = await memcache.get(id);
    if (cachedUser) {
      result.push(cachedUser);
    } else {
      uncached.push(id);
    }
  }
  const rest = await userDb.getMany(uncached); // get the uncached users
  result = result.concat(rest);
  return result;
}

The above code looks innocent enough, right? For each id, check whether it exists in the cache and collect the users that do; for the uncached ids, make one database trip to get the rest, then merge the two arrays. In practice, however, this approach is much, much slower than a simple call to the database. There are two main reasons:

  1. Memcache is a cloud cache. No matter how fast memcache is, each get is still a network request, costing a couple of milliseconds at least.
  2. Simple indexed db retrieval is limited by network latency. A database retrieval with multiple (indexed) ids is usually very fast (at a reasonable size, 100 or so), fast enough that the execution-time difference between getOne and getMany is negligible. The bottleneck of the request, guess what, is still the latency.

So if you compare the above approach, which may need as many as N trips to memcache, with a simple database.getMany(), which is one network trip, in most cases the simple solution wins on performance.
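A back-of-envelope model makes the gap obvious. The numbers below are illustrative assumptions, not measurements:

```javascript
const cacheRttMs = 2;  // assumed round-trip time to memcache
const dbRttMs = 5;     // assumed round-trip for one indexed getMany
const n = 100;         // ids requested

// per-id cache path: N sequential network trips dominate
const perIdCachePath = n * cacheRttMs; // 200 ms
// single batched query: one trip, latency-bound
const singleGetMany = dbRttMs;         // 5 ms
```

Even if the cache is faster per trip than the database, paying latency N times loses to paying it once.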

Which brings me to my point:

do not use memcache for small, tiny pieces of data

It is useful, but you must be cautious.

Summary

In this post, I talked about what the Presentation Layer should be, and about GraphQL as one of the presentation layer components. I explained:

  1. GraphQL is not magic; it transfers responsibility from the frontend to the backend.
  2. Two ways to optimize GraphQL, dataloader and caching, and some notable points to consider for each.

In the next post, I will explain in detail with some demo code.


Han

Google SWE | Newly Dad | Computational Biology PhD | Home Automation Enthusiast