Writing some actually decent logs

Best practice and advice with logs

Logging isn’t often seen as a crucial, upfront task. An engineer once said to me, “Well, logs are kind of like comments, right?”. No, unnamed engineer. They’re not kind of like comments. They’re so much more than comments. They’re one of the most flexible ways of viewing your system performance. They’re often overlooked, unloved or misused.

I’ve seen some awful logs. I’ve SSH’d into a server at 3 in the morning, only to find logs like processing... over and over again in a file. Useless. So, in light of my sleepless nights and frantic troubleshooting, I thought I’d throw together some of the practices that can take your logs from logs to LOGS.

No not these types of logs, oh just read the damn article.

Put all your eggs in one basket.

Otherwise known as Log Aggregation. The idea here is to have a single portal for all of your application logs. You don’t want to be jumping in and out of servers to find out which instance of your application is failing. The most common tool that people use is the ELK stack. Guru99 produced a great deep-dive into what it is and how to set it up. I would strongly recommend putting together some kind of log aggregation as your first port of call.

And occasionally eat some of those eggs

Something that is often forgotten is pruning logs. Elasticsearch instances become overloaded with vast quantities of irrelevant logs. Tools like Curator come in handy for blowing away old logs. You need to have a conversation about the retention period of logs, taking into account regulatory and investigative requirements.

Write your logs in a parseable format

This is linked to the recommendation above. When you write your logs in JSON (for example), you’ve got some interesting opportunities opened up to you. JSON limits readability but it creates queryable, searchable logs. This is where Log aggregation takes a whole new step forward. For example, using Slf4j in Java, one could write a log like this:

log.info("A user has been created", value("userId", user.getId());

Now, when you’re searching through your logs, you can execute a query like the following in Kibana (the K in the ELK stack):

userId: 1234 AND message: "A user has been created"

No more sifting through a file and writing random grep commands on the fly. Simply query your elasticsearch storage. The same goes for things like correlation IDs, so you can trace the movement of a message all the way through your system.

Include a human readable message

Okay hotshot, your logs are in JSON and stored in Elasticsearch. You’re making good use of Kibana to query them and you’re having a great time. But don’t forget the basics! Include a simple, human readable message that describes what is going on. Keep that in its own field, as part of your log. This doesn’t need to be a long sentence, or even a sentence at all. Even something like:

log.info("UserCreated", ...);

There needs to be some field that specifies, succinctly, what is going on. This sounds obvious but it is often overlooked and log messages like processing user pops up. What does processing actually mean? Are you deleting them? Who knows! Just like your variables, application names and REST resources, be clear and to the point with your logs.

Log consistently

Often, log messages can differ a little bit inside your application. To avoid this, I would recommend abstracting some of your key logging into a service all of its own. This prevents repetition, but it also creates a single place that you can check, to see what the correct log messages are.

public class MyController {    public void handleRequest(Request req) {        MyLogger.logIncomingRequest(req);

}}

public class MyLogger {    public static void logIncomingRequest(Request req) {        log.info("...");    }}

This pattern has some trade offs. You’re hiding your logs behind a service and you lose the annotation value that logs can provide. The advantage is that when your core logs are all in the same class, it is far easier to keep them consistent.

Log at multiple levels

Once you’ve captured your core business events in your logs, you might be tempted to stop. No! Carry on, you’ve got so much more to do! In between all of those UserCreated messages you’ve got database writes, database reads and object creation. This is why the good Lord created multiple log levels. A good example of effectively using these logs is:

log.debug("UserCreationRequestReceived", ...);log.debug("CheckingToSeeIfUserExists", ...);log.trace("DB Query Completed", ...);log.trace("MappingToUserObject", ...);log.debug("UserDoesNotExist", ...);log.debug("Creating User", ...);...

log.info("UserCreated", ...);

This seems excessive, but when you’re in a crisis situation, you want to have this kind of visibility available to you. So how do you control the noise?

Change log level without a restart

When you log at multiple levels, you need to decide which logs are going to be silenced. For example, if you set your log level to info, you’re going to suppress your debug and trace level logs. If you set it to trace, you’re going to get everything.

What is essential here is that you can quickly dial up your logging detail in a crisis. If you’re already making use of Spring Boot, you can do this for free with Actuator. This is very language specific, but many tools already exist for the well known languages and frameworks. This is worth implementing. It forms a key part of your disaster recovery automation.

Include queryable, meaningful values in your logs

If you’ve got an API and you want to link together all the activities associated with a given request, attaching a queryable value to your log lines is a very powerful way of grouping common activities together. This can be something as simple as a UUID generated at the point of request. Imagine some logging code that looks like this:

log.info("UserArrived",     value("sessionId", "1234"),    value("userId", "5432"));// Some stufflog.error("UserError",     value("cause", "User account disabled"),     value("sessionId", "1234"),    value("userId", "5432"));// Some more stufflog.info("UserAccountEnabled",     value("sessionId", "1234"),     value("userId", "5432"));

Now, when you’re looking in Kibana for a given session, all you need to do is query for that session ID. Boom, you’ve got absolutely everything that occurred in your system. If you wanted to know about that particular session, you can search for sessionId: 1234 and if you want it all for that user, userId: 5432.

This will work across microservices too, creating a single thread that links all of the user behaviour together. The trick is picking the correct values. Session IDs are okay but they can be a catch 22. First you need to find the session in order to find the ID. Things like user IDs are better, since that information will likely be available to you when you start your investigation. This creates a property known as “traceability”.

I’m logging out my ideas (in verbose mode) on twitter if you’d like to follow!

Writing some actually decent logs was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.

Publication date

07/02/2019 - 17:51