Crash Reporting and KMP


If you plan on pushing apps to production, you'll need some way of finding out when things aren't working as expected. I won't try to convince you that you should be doing this. This post will just focus on the how.

The world of performance and stability monitoring for all forms of "digital products" is fairly wide. There are a lot of services out there, and many are specialized to do different things. Also, Kotlin Multiplatform covers, well, a lot of potential platforms. To begin, I want to really drill down into a narrow definition of what we're going to be concerned with.

  1. KMM only. That is native mobile. Specifically Android and iOS.
  2. Crash reporting, and very specifically Crashlytics, a bit of Bugsnag, and a discussion of other possible implementations.

We'll also divert a bit into discussing crash reporting in the context of SDKs, where you have all of the same problems, but don't control the deployment and generally don't have access to that data.

Crash Reporting Overview

Broadly speaking, the basic features of crash reporters are the same across products.

Hard Crash

The platform runtime allows you to register a handler that gets called when your app is crashing, and lets you do something before the process is shut down. That's for an uncaught and unexpected issue. I believe most systems send the reports on next app start, but I've spent much more time on iOS recently, so there may be some recency bias involved (i.e. Android may work differently there).

Soft Crash

This would be better described as a "handled exception" as you're not really crashing. You've encountered an issue in code, and you're going to continue on, but would like to know about it. Most, if not all, systems allow you to send a soft crash report.

On the name "soft crash": there is not much commonality across products and platforms on what to call things, so I'm just making up my own terms. What Crashlytics calls a "caught exception", New Relic calls a "handled exception". They also all have different names for logs, keys, etc. I'm going to use my own names, but they're all the same basic thing, and I'll attempt to map each product's name to what we're calling them.

Logs

Most tools let you write to their runtime log, which maintains a running log buffer. They're usually, if not always, capped in length, and the oldest entries get purged to make space for newer ones. When a crash happens, the log is included in the report. Think of it like the black box on a plane.

Product names can include "custom logs" and "breadcrumbs".
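To make the "capped buffer" idea concrete, here is a minimal sketch of how such a log could behave. BreadcrumbLog and its methods are hypothetical names for illustration, not any vendor's actual implementation, and the sketch ignores thread safety.

```kotlin
// Hypothetical sketch of a capped log buffer; not any vendor's real code.
class BreadcrumbLog(private val capacity: Int) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    // kotlin.collections.ArrayDeque, used here as a simple FIFO queue.
    private val entries = ArrayDeque<String>()

    // Append a message, purging the oldest entry once the cap is reached.
    fun log(message: String) {
        if (entries.size == capacity) {
            entries.removeFirst()
        }
        entries.addLast(message)
    }

    // Copy of the current buffer, oldest first. A crash report would attach this.
    fun snapshot(): List<String> = entries.toList()
}
```

With a capacity of 2, logging "a", "b", then "c" leaves only "b" and "c" in the snapshot, which is the purge-oldest behavior described above.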

Keys

This is generally a set of name/value pairs that can be useful for categorizing and searching. You can stuff arbitrary data in there, although I tend to do that in the log if we don't need to search for it.

Most products have other proprietary features and certainly many differentiating features, but we're only concerned with what the client can send and what we're supporting for KMM. The list above pretty much covers it.

Android

Android is the simpler case. You configure your crash reporting client like you would for any Android app. Since KMM is all just "Kotlin" as far as Android is concerned, when a crash happens, you'll get the same report you'd get if you weren't using KMP/KMM.

The only real KMM consideration is how to add log statements, and how to send soft crashes, from shared code. Each product ships an SDK specific to each platform, at least until one of them decides to ship a KMP-specific version (reach out, if interested 😇). The solution is simple. Create an interface in common Kotlin that lets you log and lets you send exception reports (aka Soft Crashes).

interface Crashy {
  fun log(message: String)
  fun softCrash(t: Throwable)
}

You create an implementation in your Android-specific code and make that available in common, generally by "passing it in" on app start.

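import com.google.firebase.crashlytics.FirebaseCrashlytics

// Android implementation of the common Crashy interface, backed by Crashlytics.
class CrashlyticsCrashy : Crashy {
  override fun log(message: String) {
    // Adds the message to the Crashlytics log that is attached to reports.
    FirebaseCrashlytics.getInstance().log(message)
  }
  override fun softCrash(t: Throwable) {
    // Records a non-fatal ("caught") exception.
    FirebaseCrashlytics.getInstance().recordException(t)
  }
}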

I haven't run the above code. It's just to show the bare minimum of what you'd need to do for shared Kotlin crash reporting info. We've created Kermit modules to handle this for you.

Anyway, the common code defines the interface Crashy and you pass in CrashlyticsCrashy when Android starts up. The common code can call the interface and interact with the Android-specific Crashlytics instance. If you wire that up correctly, that's kind of all you need to worry about for Android.
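As a rough sketch of the "pass it in on app start" step, one option is a small holder in common code. CrashReporting, RecordingCrashy, and parsePort are hypothetical names made up for this example. RecordingCrashy is an in-memory stand-in for CrashlyticsCrashy so the sketch runs without Firebase.

```kotlin
// Same shape as the Crashy interface in the text.
interface Crashy {
    fun log(message: String)
    fun softCrash(t: Throwable)
}

// Hypothetical holder in common code; the platform installs an implementation once on start.
object CrashReporting {
    private var delegate: Crashy? = null

    fun install(crashy: Crashy) {
        delegate = crashy
    }

    // Calls are silently dropped until something is installed.
    fun log(message: String) {
        delegate?.log(message)
    }

    fun softCrash(t: Throwable) {
        delegate?.softCrash(t)
    }
}

// In-memory stand-in for a real implementation such as CrashlyticsCrashy.
class RecordingCrashy : Crashy {
    val logs = mutableListOf<String>()
    val softCrashes = mutableListOf<Throwable>()

    override fun log(message: String) {
        logs.add(message)
    }

    override fun softCrash(t: Throwable) {
        softCrashes.add(t)
    }
}

// Example shared code: handle the failure, report it, and keep running.
fun parsePort(raw: String): Int? =
    try {
        raw.toInt()
    } catch (e: NumberFormatException) {
        CrashReporting.log("Could not parse port: $raw")
        CrashReporting.softCrash(e)
        null
    }
```

On Android you would call CrashReporting.install(CrashlyticsCrashy()) early, for example from your Application class. A global holder is only one way to do it; passing the Crashy instance through constructors or a DI container works just as well.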

iOS

As you can imagine, we'll do something similar for iOS. Outside of Kotlin, they are similar systems and support similar features. You initialize your crash reporter, and it'll send crash reports.

If you don't care about Kotlin stack traces, you don't really need to do anything special with Kotlin and KMM.

I think it's important to stress that point. Yes, if you want logs and soft crashes from Kotlin, you'll need to implement a version of Crashy for iOS, probably in Swift, and pass that in on app start. However, even if you don't, you'll get crash reports on iOS without any other configuration.

The problem is in how Android and iOS report crashes, and at a more basic level, in how their languages pass around errors. Kotlin came from the JVM, and on Android, it runs in that same environment 🙋. If an exception is thrown deep in a stack, it'll bubble up until caught, and you can extract the stack trace from that.

On iOS, if you get an error condition that would cause an exception on Android, it'll basically fall over right there. The system can extract the stack traces at that point and report them, but there's no "bubbling up". You can't put a try/catch at the top of a deep stack and catch everything coming out of it.

That works fine on iOS generally, but Kotlin compiled for iOS works more like Kotlin on Android and the JVM. If you make a call into a Kotlin function from Swift, and there's an exception in the Kotlin function, it'll bubble up to where you called it from Swift, and then the Kotlin interop system will helpfully crash the app for you. From the perspective of, say, Crashlytics, it failed at that point, and that's the stack trace you get.

You'll see a lot of konan::abort(), which basically says "I caught something out of the Kotlin code. Let's fall over here, thanks". What actually caused that crash is lost, because Crashlytics (or whatever) doesn't know about Kotlin or what to do with that info.

I will say from experience, that extra info makes all the difference. It's not just the stack trace, which is important, but also the exception message, which will often just tell you what's wrong, and in many cases, what you probably need to do to fix it.

If you do nothing else, add an uncaught Kotlin exception handler that logs the exception message, and maybe the stack trace. That will be far, far better than nothing.
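Here is a minimal sketch of what such a handler might log. describeCrash is a hypothetical helper written in plain common Kotlin. On Kotlin/Native you would call it from whatever unhandled-exception hook your Kotlin version provides (setUnhandledExceptionHook at the time of writing; check the exact signature and opt-in requirements for your version) and write each line to your crash reporter's log.

```kotlin
// Hypothetical helper: flattens a Throwable and its causes into log lines.
fun describeCrash(t: Throwable): List<String> {
    val lines = mutableListOf<String>()
    var current: Throwable? = t
    var depth = 0
    // Cap the walk so a cyclic cause chain cannot loop forever.
    while (current != null && depth < 10) {
        val prefix = if (depth == 0) "Uncaught" else "Caused by"
        lines.add("$prefix: ${current::class.simpleName}: ${current.message}")
        current = current.cause
        depth++
    }
    // The full trace goes last, so the messages survive even if the log is truncated.
    lines.add(t.stackTraceToString())
    return lines
}
```

Putting the exception type and message on their own lines up front is a deliberate choice: those are often the most useful part, and capped log buffers may cut off a long stack trace.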

What we really want from the iOS crash is the following:

File names and line numbers, and function names, in a stack. You can get this on iOS from Kotlin, although it's currently a little hackier than I'd like.

iOS Crash Reporting

A little aside about non-Kotlin crash reporting on iOS. When a crash happens, the system can get some minimal info and function pointers to the stack, but not really much else. That info is generally "stripped out" during compilation, and kept in external maps. Those maps need to be sent to the service processing crashes (Crashlytics, etc) to "symbolicate" the pointers back to human-readable info.

If you've ever set up iOS crash reporting and see instructions about uploading dSYM files, or see errors about missing dSYMs, that's what that is. A dSYM file is a "debug symbol file". It maps function pointers to source code, basically.

Kotlin creates a standard Xcode Framework that you include in your iOS project. Xcode and friends know how to include the Kotlin-generated dSYM info along with everything else, so the map from function pointers to source exists, and should be sent to the server without any special config.

Kotlin Stack Traces

The server has the map, but because of how iOS crash reporting works, we don't record the part where you were inside Kotlin land, so the map doesn't help. We need to get the Kotlin stack and send that to our crash reporter.

The basic process is to:

  1. Catch an exception
  2. Get the function pointers from it
  3. Send them somewhere

How you "catch" the exception depends on hard or soft crash. Soft crashes are caught with try/catch in standard Kotlin code. A hard crash will probably come by registering an unhandled exception handler with the Kotlin runtime. This will be called before we tell iOS to fall over (the konan::abort() above).

You get the function pointers by calling a convenient helper that's only available in Kotlin/Native: getStackTraceAddresses().

fun Throwable.getStackTraceAddresses(): List<Long>

This returns a list of function pointer addresses. That list of addresses is what can be mapped back to useful human-readable info.
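The three steps can be sketched for the soft crash case as follows. SoftCrashReport and reportingSoftCrash are hypothetical names. The address lookup is passed in as a lambda so the sketch runs on any Kotlin target; on Kotlin/Native you would pass { it.getStackTraceAddresses() }, and the send lambda would call your crash reporter.

```kotlin
// Hypothetical report shape; real crash reporters each define their own.
data class SoftCrashReport(val message: String?, val addresses: List<Long>)

// Runs block and, if it throws, reports a soft crash and returns null.
// addressesOf is injected because getStackTraceAddresses() only exists on Kotlin/Native.
inline fun <T> reportingSoftCrash(
    addressesOf: (Throwable) -> List<Long>,
    send: (SoftCrashReport) -> Unit,
    block: () -> T
): T? =
    try {
        block()
    } catch (t: Throwable) {                         // 1. catch an exception
        val addresses = addressesOf(t)               // 2. get the function pointers from it
        send(SoftCrashReport(t.message, addresses))  // 3. send them somewhere
        null
    }
```

Returning null on failure is just one convention for "keep running". Real code may prefer a default value or a result type.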

Sending them somewhere is a trick. So far, all crash reporters I've used that can take a custom stack do so in the form of a "soft crash". There's no way to include the report in a hard crash. Also, not all crash reporters let you send a custom stack. As of today, Crashlytics and Bugsnag do, New Relic does not 🚧.

So, not all crash reporters can do it at all, and the ones that do need a soft crash. The soft crash is fine if you caught the crash with try/catch and intend to keep running. It's less fine if you don't catch it yourself. Currently, our solutions register a Kotlin handler that creates a soft crash report, with the stack pointer addresses, that will properly symbolicate. However, after that is done, the crash proceeds into regular iOS crash land, and submits an additional hard crash without the extra stack info (again, konan::abort()).

That means, to be clear, on Crashlytics and Bugsnag, currently, an unhandled crash that originated from Kotlin code will generate 2 reports. One soft crash with Kotlin-specific info, and one hard crash that is a standard iOS crash (with the Kotlin stack lopped off).

This isn't ideal, but we mitigate that by adding a random UUID that will allow finding the matched pairs. Unfortunately, the way the systems group them differs, so it's for sure not perfect and there is room for improvement, but you can make it work.
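As a rough illustration of the pairing idea, the sketch below sets a shared id as a key before sending the soft crash. It assumes the reporter attaches keys to both the soft crash and the hard crash that follows. PairableReporter, newPairId, reportUnhandled, and the "crash_pair_id" key name are all hypothetical, and a random 128-bit hex token stands in for the UUID so the code stays in common Kotlin.

```kotlin
import kotlin.random.Random

// Hypothetical sink covering the two calls this sketch needs.
interface PairableReporter {
    fun setKey(name: String, value: String)
    fun softCrash(t: Throwable)
}

// Random 128-bit token as 32 hex characters, standing in for a UUID.
fun newPairId(random: Random = Random.Default): String =
    random.nextBytes(16).joinToString("") { byte ->
        (byte.toInt() and 0xFF).toString(16).padStart(2, '0')
    }

// Meant to be called from the unhandled-exception handler before the process aborts.
// Returns the id so the caller can also write it to the log.
fun reportUnhandled(
    t: Throwable,
    reporter: PairableReporter,
    pairId: String = newPairId()
): String {
    // Set the key first so both reports can carry the same value.
    reporter.setKey("crash_pair_id", pairId)
    reporter.softCrash(t)
    return pairId
}
```

Searching the crash dashboard for the same key value should then surface both halves of the pair, subject to how each service groups and indexes reports.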

Future Directions

Crash reporting vendors could expose a way to make a single hard crash possible. In theory, somebody from the community could build a custom client from open-source clients (assuming they're available), but we haven't gone quite that far yet. Most tools now support React Native and Flutter, among other ecosystems, so it's probably best to just be noisy and let them know the Kotlin community wants this.

Kermit and CrashKiOS

Touchlab publishes 2 libraries related to KMP crash reporting. The first, CrashKiOS, has been around for a couple of years now. It helps wire the uncaught handler and prepares the data that needs to be sent to your crash reporter, but does not do any cinterop, so to integrate it you are directed to copy/paste Swift code. It is more work, but not too much. We do need to update the docs, but this is still the way many may need to integrate (manually).

Kermit is a KMP logging library that has modules for Crashlytics and Bugsnag. The modules exist to support logging and soft crash reports to their respective services. You can configure Kermit to include these specific LogWriter instances to ensure you can send log statements and soft crash reports from your integrated logging (with Kermit). The modules also have helper functions to set up the unhandled crash logger.

Using Kermit makes setup easy, but there are some complexities around how to actually link the iOS-side dependencies. Please see the Kermit and Crashlytics post for more detail on setup.

🙋 ART, or whatever, technically, but it's all JVM-like

🚧 I tried Sentry the morning I was giving a talk about this. Droidcon SF 2019. They were there with a booth, so I figured I'd give it a shot. I could not get it to work, but I spent about 90 minutes total, and I've never used it professionally, and that was 2 years ago, so...

Kevin Galligan
