Fitbit Flex vs Zip vs Jawbone Up: Kind of old but maybe interesting

Explaining myself

I sit on a lot of data. As someone who does research for a living, I have collected an enormous amount of it. A lot of my data are video footage, and almost anyone who has done research with video (especially in education) knows you get far more than you can reasonably process. This is a known issue, but the point is that I have data I collected but have not deeply analyzed. I also have some data that I analyzed a bit but never got around to sharing. That's the case with some data I tracked over the summer more than a year ago with a Fitbit Zip, a Fitbit Flex, and a Jawbone Up. I'm surprised by the modest interest in a post I made comparing my Apple Watch to the Fitbit Charge HR, and that interest makes me feel I could still share some of this old data. Part of the reason is that in that post I kept alluding to my hunch that wrist-based devices undercount steps relative to waist, hip, or pocket devices. This is why.

The very informal and now quite old mini-experiment

So some time ago, I wore all three of those devices simultaneously. (I'm sure the algorithms used to compute steps have changed and improved since then, but I suspect the underlying issue behind the deviations you will see below is still more or less unavoidable.) I was curious about two things:

  1. How did the step counts on the three devices compare with one another? Two were wrist-based (the Flex and the Up), and two were from the same company (the Flex and the Zip). I had expected some similarities somewhere, and maybe a lot of similarities everywhere.
  2. I was trying to walk more, and I remembered several days when I made a deliberate effort to walk extra and hit 10,000 steps. How did that look in my data? I was hoping for a skewed plot with lots of 10,000-step days and a few bad ones.

Since it was a long time ago, I can't recall the exact procedures, but knowing me, I'm pretty sure I was diligent about wearing all three devices simultaneously whenever I could. The standard screw-ups happened, including batteries not being charged and devices crashing unexpectedly. I did switch arms for the wrist devices partway through so that we weren't strictly seeing a dominant/non-dominant arm effect. For analyses that compared devices, I threw out days that didn't have data from the devices being compared. For plotting a single device's distribution, I kept every day that device recorded.
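For anyone wanting to do something similar, that rule for missing days fits in a few lines of Python. This is a minimal sketch, not what I actually ran (I did this by hand before loading data into TinkerPlots). It assumes each device's log is a dict mapping a date string to that day's step count, with `None` for days when the battery died or the device crashed; the names `non_null_days` and `paired_days` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: date string (e.g. "2013-06-08") -> step count,
# or None when the battery was dead or the device crashed.
DailySteps = Dict[str, Optional[int]]


def non_null_days(log: DailySteps) -> List[int]:
    """Every day one device recorded, in date order; used for that device's own histogram."""
    return [steps for _, steps in sorted(log.items()) if steps is not None]


def paired_days(a: DailySteps, b: DailySteps) -> List[Tuple[str, int, int]]:
    """(date, a_steps, b_steps) for days where BOTH devices have data.

    A day missing from either device is dropped, because a comparison
    needs the same day counted by both devices.
    """
    pairs = []
    for day in sorted(set(a) & set(b)):
        sa, sb = a[day], b[day]
        if sa is not None and sb is not None:
            pairs.append((day, sa, sb))
    return pairs
```

The asymmetry is deliberate: a day the Flex missed still belongs in the Zip's histogram, but it can't contribute to a Zip-versus-Flex comparison.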

The time span

The dates I covered were June 8 - July 25. The year was 20-*cough* *cough*-13. (Yeah, the data are over two years old, which is geriatric by internet standards. I analyzed them in August and then just got busy with life and didn't think anyone but me cared about these minutiae.) There are holes in the data for each device. For example, I forgot a charger when I went on a conference trip that involved a lot of walking. (Drat!) Batteries died. Software/firmware crashed. These things happen. However, there were enough data to run a few comparisons. Again, these data are from some time ago, so if I were to replicate this now, the numbers might be a little different. Still, I don't expect the gaps you'll see below to have closed, because inferring steps from arm movements is inherently hard.

Results

The tl;dr punchline is the wrist-based devices undercounted. However, the Up and the Flex undercounted quite differently from each other.

Most of these visualizations were made in TinkerPlots, a software tool designed for elementary and middle school students that I use a lot in my research. It is also, frankly, one of the quickest ways to make a histogram without having to write R code. Right now, I favor expediency over sophistication (although I think TinkerPlots can be pretty sophisticated). First, here is a plot of steps as counted by the Zip, organized in bins about 3,000 steps wide.

The stack of gray dots on the right, marked with an asterisk, shows days with null values. These are days when the device crashed or the batteries were dead.

So that looks pretty normal-ish? The center is probably around 7,000, which is a little disappointing, because I thought I was being awesome and hitting a lot of 10,000-step days. (I remembered those because I gave myself pats on the back for them, and I remember making the extra effort to hit them on a number of days.) We can turn up the resolution and look at the data in smaller bins to see more of what was going on.

So this time, in bins of 1,000 steps, it does look like my steps tend to be in the 7000s.

This shows that the data are a little 'bumpier'. A normal distribution is an idealized shape that assumes lots of points, and statistically speaking, this isn't that many. Anyway, notice that there are only 2 days in the 9,000 range, but there are 4 days in the 10,000 range, and there are actually a decent number of days between 10,000 and 13,000. I consider these to be the product of a motivational bump. If it is nearing the end of the day, I don't want to end in the 9,000s when I can go around the block and hit 10,000! Of course, there are those two days in the 9,000s when I didn't get that motivated. (Cut me some slack.) And 8,000 seems to be too far from 10,000 for me to get that extra push in.
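Spotting that bump comes down to counting days per 1,000-step bin. Here is a minimal Python sketch of the count, assuming the daily totals are already a list of integers. `bin_counts` and `motivational_bump` are hypothetical names made up for illustration, and a positive bump is only consistent with late-day pushes, not proof of them.

```python
from collections import Counter
from typing import Dict, Iterable


def bin_counts(steps: Iterable[int], width: int = 1000) -> Dict[int, int]:
    """Days per fixed-width bin, keyed by each bin's lower edge.

    With width=1000, 9,400 steps lands in the 9000 bin (the "9,000s")
    and 10,200 lands in the 10000 bin.
    """
    counts = Counter((s // width) * width for s in steps)
    return dict(sorted(counts.items()))


def motivational_bump(steps: Iterable[int]) -> int:
    """Days in the 10,000s minus days in the 9,000s (positive = more 10,000s)."""
    counts = bin_counts(steps, 1000)
    return counts.get(10000, 0) - counts.get(9000, 0)
```

Using the lower edge as the key matches how the bins read on the plot: "in the 7000s" means the 7000 bin.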

Time to look at a different Fitbit product, this one worn on my wrist. Here is the distribution for the Fitbit Flex.

Yeah, so the bin sizes change. Sorry. I'm not trying to publish this in an academic journal - it's just my blog. So I'm not going to sweat it. TinkerPlots will usually optimize the horizontal display to accommodate the data that it is fed, and while I can force it to certain bin sizes, I didn't.
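If you do want the bin sizes to stay put across devices, one approach is to compute a single set of bin edges from the combined range of all the devices and reuse it for every histogram. Here is a minimal Python sketch, assuming each device's non-null daily totals are a list of integers in a dict keyed by device name. `shared_bin_edges` and `histogram` are hypothetical helpers, not TinkerPlots features.

```python
import bisect
from typing import Dict, List


def shared_bin_edges(devices: Dict[str, List[int]], width: int = 2000) -> List[int]:
    """Bin edges spanning every device's data, so all histograms use the same bins."""
    all_steps = [s for steps in devices.values() for s in steps]
    lo = (min(all_steps) // width) * width       # round down to a bin boundary
    hi = (max(all_steps) // width + 1) * width   # first boundary above the max
    return list(range(lo, hi + 1, width))


def histogram(steps: List[int], edges: List[int]) -> List[int]:
    """Count days in each half-open bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for s in steps:
        i = bisect.bisect_right(edges, s) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts
```

You would build the edges once from all three devices, then call `histogram` on each device's list with those same edges, so a bar at a given position means the same range of steps in every plot.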

With bins of 2,000 steps, the center looks like it is somewhere in the 7,000 range. (You can also tell that my awesome 20,000-step day on the Zip was one of the days when I was traveling and had left my charging cable for the Flex at home.) So the two look pretty close, although careful inspection suggests the Flex counted lower on some days. But to see really low counts, look at what the Up had wrought.

Still bin sizes of 2000 steps. At least I can sometimes be consistent.

Ouch. My 20,000+ day isn't even at 20,000. The mode is less than 6,000. These are definitely much smaller numbers than what the Flex gave me, and smaller still than what the Zip gave me. Let's see how far off these are. Here are quick percentage deviations relative to the Zip, which is much more like a traditional, accepted pedometer in its location and functionality (although I haven't used a pedometer that you have to bang on the screen to change displays).

I can't spell 'diff' and there is some weird scientific notation thing going on, but let's not focus on those things.

A good number of the Flex data points are undercounted, with nearly half of the undercounts within -10%. That isn't too bad, I think. There are certainly some weird messed-up days with errors down to -50%, and there is a tiny bit of overcounting too, but only up to +20%. How does the Up fare? Not so good.
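Each deviation is one device's daily count compared against the Zip's count for the same day: (device - Zip) / Zip × 100, so a negative number means undercounting. Here is a minimal Python sketch of that calculation plus a summary along the lines of the one above. It assumes paired days are stored as (date, zip_steps, device_steps) tuples, treats the Zip as the reference rather than as ground truth, and uses hypothetical function names.

```python
from typing import Dict, List, Tuple


def percent_deviation(device_steps: int, zip_steps: int) -> float:
    """Percent difference of a device's daily count from the Zip's count.

    Negative means the device undercounted relative to the Zip, e.g.
    Zip 8,000 and device 6,000 gives (6000 - 8000) / 8000 * 100 = -25.0.
    """
    if zip_steps <= 0:
        raise ValueError("the Zip count must be positive to serve as the reference")
    return (device_steps - zip_steps) / zip_steps * 100.0


def deviations(pairs: List[Tuple[str, int, int]]) -> List[float]:
    """One deviation per (date, zip_steps, device_steps) paired day."""
    return [percent_deviation(device, zip_) for _, zip_, device in pairs]


def summarize(devs: List[float]) -> Dict[str, float]:
    """Share of days undercounted, share of those within -10%, and the extremes."""
    if not devs:
        raise ValueError("need at least one paired day")
    under = [d for d in devs if d < 0]
    mild = [d for d in under if d >= -10.0]
    return {
        "share_undercounted": len(under) / len(devs),
        "share_of_undercounts_within_10pct": len(mild) / len(under) if under else 0.0,
        "min_deviation": min(devs),
        "max_deviation": max(devs),
    }
```

Dividing by the Zip's count (not the wrist device's) keeps every deviation on the same scale, so the Flex and Up numbers can be read side by side.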

Well, the vast majority of Up data points are undercounts of between 20% and 40% relative to the Fitbit Zip. Yeesh. 20% off seems good for a sale or discount (although everything I see now is buy one, get the other half off - sneaky marketing and sales people). It does not seem good for an activity tracker. That is, it is not good unless you deliberately want to push yourself extra, in which case systematic undercounting combined with pushing to 10,000 steps each day likely means you get well over 10,000 steps. But it could also make 10,000 seem really hard to meet, and judging from the Zip data, if I'm not already near the 9,000 mark, I won't bother making the push.
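To put rough numbers on that: if a tracker reads a constant percentage low relative to the Zip, reaching a displayed goal takes goal / (1 + deviation/100) Zip-counted steps. At -20%, a displayed 10,000 corresponds to 12,500 Zip steps; at -40%, about 16,667. Here is a minimal Python sketch, assuming (unrealistically) that the undercount is the same every day; `reference_steps_needed` is a hypothetical name.

```python
def reference_steps_needed(displayed_goal: int, percent_deviation: float) -> float:
    """Zip-equivalent steps behind a displayed goal on a tracker that reads
    percent_deviation % off the Zip (negative = undercounts).

    Assumes a constant percentage error, which is a simplification: the
    real deviations varied from day to day.
    """
    factor = 1 + percent_deviation / 100.0
    if factor <= 0:
        raise ValueError("deviation must be greater than -100%")
    return displayed_goal / factor
```

Run the other way, the same factor explains the discouragement: a 10,000-step day by the Zip would show up as only 6,000 to 8,000 on a tracker reading 20% to 40% low.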

Anyway, there you have it. This is my old 3-device comparison from some time ago. Fitbit seems pretty consistent with itself. Not all wrist-worn trackers are created equal. You still have to bang/tap on the Zip to make it work. TinkerPlots is quick and easy but changes bin sizes on you. I hoard data. That's all for today.