Saturday, August 20, 2011

SCHNELL! Or why patience is a virtue except when testing on Windows Phone

Mystery solved! As I had promised in my last blog entry, I added exception reporting to my two apps, A Pose for That and Yoga-pedia, to determine exactly what was going on with the exceptions that the Microsoft Marketplace was reporting but that I had never seen. I needed to know:

  1. The cause of the exceptions (the stack traces were too cryptic for me to figure out)
  2. How to fix the problem(s)
  3. Why I had never seen a crash myself, even though there is little doubt that they are indeed happening out there in the wild. If I can’t be confident that my testing is complete, I can never be confident that my app will behave when it matters most – in production.

Remember these three objectives because you will be tested later.

Now, I know that my entries are sometimes kind of long – so here are the conclusions up front. And if you want to know how I back them up, then (hopefully) you will enjoy the rest of the post.

(PLUS, there’s a teaser at the very end).

Conclusions:

  1. Always account for “loss of context” in WP7 apps – a try-catch is probably the best approach, but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an InvalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

And here’s the how and why I have come to these conclusions…


HOW – first I had to add my own exception reporting.

Exception Reporting: Adding exception reporting with Runtime Intelligence is very simple. All I had to do was add one exception reporting attribute as follows (from within Dotfuscator for Windows Phone):

[Screenshot: the exception reporting attribute configured in Dotfuscator for Windows Phone]

Note that in the properties of this attribute I am asking that the ExceptionExtendedData method be run. A Runtime Intelligence system probe attribute works fine during normal operations, but if I want custom data after an unhandled exception, this is a more reliable technique. Here is the method that I put in the App class:
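
Roughly, an extended-data method like this might look something like the following minimal sketch. The exact signature that Runtime Intelligence expects, and the specific keys collected, are assumptions for illustration only – but the culture and device lookups themselves are standard WP7 APIs:

    // A minimal sketch of an ExceptionExtendedData method - NOT the actual code from
    // the apps. Assumption: Runtime Intelligence calls a static method that returns
    // simple key/value pairs and attaches them to the exception report.
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using Microsoft.Phone.Info;

    public partial class App
    {
        public static Dictionary<string, string> ExceptionExtendedData()
        {
            return new Dictionary<string, string>
            {
                // Culture settings - the metric that ends up mattering most below
                { "CurrentCulture",   CultureInfo.CurrentCulture.Name },
                { "CurrentUICulture", CultureInfo.CurrentUICulture.Name },

                // Basic device data exposed by Windows Phone 7
                { "Manufacturer", DeviceExtendedProperties.GetValue("DeviceManufacturer") as string },
                { "DeviceName",   DeviceExtendedProperties.GetValue("DeviceName") as string },
                { "OSVersion",    Environment.OSVersion.Version.ToString() }
            };
        }
    }

Whatever the exact shape, the point is to capture the device and culture context alongside the stack trace – that extra context is what makes the comparisons later in this post possible.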

As a side note, if I wanted to track thrown or handled exceptions, I could place the exception attribute down at the method level to get much more targeted data. Anyhow, after this simple step, I deployed the re-instrumented app to the marketplace and (sadly) watched the exceptions roll in…


Runtime Intelligence Exception Reporting
Logging into my Runtime Intelligence portal account, selecting the date range I was interested in, and then selecting “Exceptions” presents me with the following:

[Screenshot: the Exceptions view in the Runtime Intelligence portal]

I can see the total exceptions over time, the types of exceptions (I am only getting one type – and that seems like it might be good news), and a list of all of the specific exceptions on the right. Clicking on any one of these shows me the detail as follows:

[Screenshot: detail views from three of the reported stack traces]

The graphic above shows screen captures from three different stack traces.

Good news item 1 is that (unlike the marketplace stack traces) I can see the diagnostic message. This may not mean much to the serious developers who enjoy offsets and cryptic traces – but I need these messages to go back to MSDN and other resources to see what is really going on and what I can do about them.

It turns out that there were seven different exceptions coming from my app – BUT ALL OF THEM HAD TO DO WITH TIMING – not some error in my general logic (in other words, I’m not dividing by zero or trying to display a non-existent image, etc.). For some reason my app is getting vertigo in my customers’ hands and losing track of which page was current, resulting in any number of InvalidOperationExceptions.

Good news item 2 is that there is a pretty standard way to manage this behavior: the try-catch statement. I’m in no position to explain how this works, but visit the link above for a great explanation.
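
To make that concrete, here is a minimal sketch of the pattern on a WP7 page – the page, URI and handler name below are made up for illustration and are not the actual code from A Pose for That or Yoga-pedia:

    // Hypothetical example - the page, URI and handler name are illustrative only.
    using System;
    using System.Windows;
    using Microsoft.Phone.Controls;

    public partial class PosePage : PhoneApplicationPage
    {
        private void ShowPoseButton_Click(object sender, RoutedEventArgs e)
        {
            try
            {
                // If the user taps again while a previous navigation is still in
                // flight, Navigate throws an InvalidOperationException.
                NavigationService.Navigate(new Uri("/NextPose.xaml", UriKind.Relative));
            }
            catch (InvalidOperationException)
            {
                // A navigation is already underway - ignore the extra tap rather
                // than letting an unhandled exception take the whole app down.
            }
        }
    }

Whether you silently ignore the duplicate tap or guard the call some other way is a design choice – the point is that an impatient double-tap no longer turns into an unhandled exception.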

So with basic Runtime Intelligence exception reporting I have addressed my first two requirements: diagnosing my app’s problem and identifying a fix. BUT – I have not addressed the deeper and perhaps more troubling issue of why I have never seen this problem myself – what’s this all about? If I can’t improve my quality control, I can never feel comfortable that my app will perform in the wild as it does for me.

Good news item 3 is that I have Runtime Intelligence to give me EVEN MORE context on my app and my users. The fundamental flaw in almost every exception handling solution I have ever seen is that they (by necessity) can only look at the app when exceptions occur – they are too heavy-weight and/or too invasive to run all the time everywhere. Not so with Runtime Intelligence.

If you ONLY have exception data, you are robbed of one of the most effective diagnostic heuristics available – comparing populations to identify material differences between them and thereby zero in on a likely root cause. This is the fastest and cheapest way to figure out why I had never seen a crash.

What I did next was to compare the set of users who experienced exceptions with the general population of users and myself – was there something specific about their phones? Their software? Their behavior?

It turns out that the answers to these three questions are no, no and YES!

Process of elimination: First, I compared the system data of the users and phones that hit exceptions with the general population, using the metrics captured by the ExceptionExtendedData method defined above… I won’t bore you with all of the metrics I was able to eliminate, but I will show one: manufacturer.

The two pie charts show the relative percentages of manufacturers in the general population of my users alongside the population that had exceptions – one can eyeball these and pretty quickly see that there is virtually no difference. The bar chart puts a fine point on this by showing the relative difference in share: Dell had only 1% of the total share and was not statistically significant, and looking at the other three manufacturers, we can see that there is no more than a 20% variance between the two populations. This kind of range was consistent across all of the metrics I had been collecting except one.
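
To make “relative difference in share” concrete, here is a small sketch of the arithmetic – the counts below are invented purely for illustration and are not the actual numbers behind my charts:

    // The counts here are invented purely to illustrate the calculation.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class ShareComparison
    {
        static void Main()
        {
            var allUsers = new Dictionary<string, int>
                { { "Samsung", 500 }, { "LG", 300 }, { "HTC", 190 }, { "Dell", 10 } };
            var exceptionUsers = new Dictionary<string, int>
                { { "Samsung", 26 }, { "LG", 14 }, { "HTC", 9 }, { "Dell", 1 } };

            double allTotal = allUsers.Values.Sum();
            double exceptionTotal = exceptionUsers.Values.Sum();

            foreach (var maker in allUsers.Keys)
            {
                double generalShare = allUsers[maker] / allTotal;               // share of all users
                double exceptionShare = exceptionUsers[maker] / exceptionTotal; // share of crashing users
                double relativeDifference = (exceptionShare - generalShare) / generalShare;

                Console.WriteLine("{0}: {1:P0} of users, {2:P0} of exceptions, {3:P0} relative difference in share",
                    maker, generalShare, exceptionShare, relativeDifference);
            }
        }
    }

A relative difference close to zero (like the manufacturer data above) means that metric probably isn’t the culprit; a large positive number means that group is over-represented among the crashes.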

Schnell!

In my last blog I had noted that there appeared to be a disproportionate percentage of German-speaking users in the exception population, and it turns out that this was not a random blip – it showed up again in this latest exception data as follows:

[Charts: exception share vs. general-population share by culture, and the relative difference in share by culture]

The top bar chart shows, by culture, the relative percentage of users that experienced exceptions alongside that culture’s relative percentage of the general population. The second bar chart shows the relative difference in share by culture, and it is truly surprising (at least to me).

Germans crashed my app 13X more often than the norm; Austrians and the Dutch crashed the app at 4X what their relative share would suggest, with the Malaysians right behind.

Given the geographic distance between these populations and the different carriers and jurisdictions that they live under, it seems pretty clear that what these users have in common is their behavior. These users are simply more impatient than the rest of my users. They hit the “show pose” or “take me to the marketplace” button (or whatever) more quickly and more often, and so they are that much more likely to cause my app to lose its place.

Not only am I more patient (being an American and at one with the universe ;), but because I know my app and the areas where it may take a beat (or two) to respond, I naturally did not repeat my commands impatiently at those critical times – and therefore, I did not crash my app! Mysteries solved!

Conclusions: (AGAIN)

  1. Always account for “loss of context” in WP7 apps – a try-catch is probably the best approach, but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an InvalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

TEASER – WOULDN’T IT BE AWESOME IF WE COULD DO ALL OF THIS PROFILING AND EXCEPTION ANALYSIS WITH HTML5/JAVASCRIPT TOO? STAY TUNED (IN)!