Keeping Big Data Out of the Uncanny Valley

Posted by Taylor Haney on Mon, Apr 29, 2013

We wrap up our April guest blog series on Big Data with a great post from John Stauffer, CTO of True Fit. John discusses how to use big data to target customers without spooking them with the data collected about them. As a reminder, our next guest blog series starts in May and the topic is Mobile. If you are interested in contributing, please e-mail me at taylor [at] MITX [dot] org.

John is a pragmatic software development leader with a 20+ year track record of developing successful product and architecture strategies and leading teams to execute on them. Prior to True Fit, John was Chief Architect at 170 Systems where he led the transformation of the company’s existing product architecture to a modern, scalable, and maintainable Java-based architecture, leading to an acquisition by Kofax. Prior to 170 Systems, John was Chief Architect for Oracle’s Retail Business Unit after its acquisition of ProfitLogic. As Chief Architect for ProfitLogic, John was responsible for the creation of products for Retail Markdown, Pre-season Planning, Allocation and Promotion Forecasting and Optimization that are used by the world’s largest retailers.


The catalog sitting on my counter was the single creepiest piece of mail I had ever received. It was full of dog-oriented tchotchkes aimed at overzealous dog owners.

As the owners of a pair of English Bulldogs filling out our nearly empty nest, my wife and I are certainly the target market. I have a Bulldog statue proudly sitting on my desk, and I've given up the fight against the Halloween costumes that my wife insists on buying each year for our bulldogs (Knight and Princess last year, Greaser and Poodle Skirt before that).

What really made me uneasy was the fact that I recognized the pillow on the cover with the bulldog's face that looked just like our beloved Lulu. I had seen that same pillow about a week earlier, staring back at me from a website I had visited in a fit of sentimentality brought on by a viewing of My Dog: An Unconditional Love Story. Now I’ve said too much.

Was it coincidence that a site I visited for the first time a week ago was now sending me a catalog for the first time, or was I the subject of some amazing Big Data scheme that was able to determine my identity and address from a single visit to the site? I couldn't rule out the possibility that some clever startup was selling sophisticated retargeting software to online catalogers – “just because you're paranoid doesn't mean they aren't after you.” (Joseph Heller, Catch-22)

I'm not really bothered by the fact that my name and address have been sold time and again. In fact, this catalog could have just as easily been the result of registering a litter of puppies with the AKC last year. None of those low-fidelity mechanisms would bother me. So why am I creeped out by the idea that visiting a website could turn into a catalog in my mailbox?

I think the answer is that this (possibly hypothetical) application had fallen into the Uncanny Valley of Big Data. The Uncanny Valley hypothesis was formulated by roboticist Masahiro Mori to describe his observation that people relate better to robots as their appearance becomes more lifelike, but only up to a point. Once their appearance crosses a threshold of similarity, people are instead repulsed. The principle is perhaps best epitomized by criticism of the 2004 film The Polar Express. The gap between the point where more lifelikeness is beneficial and true lifelikeness is the valley referred to in the hypothesis's title.

In a world of Big Data personalization, we face a similar valley of expectations. In our case, the valley is defined in terms of how personal the data is, the level of control I have over sharing the data, and the utility I am receiving from the application. There is a gap between low-fidelity solutions we are comfortable with because we understand how they work, and highly personalized solutions that are almost magically delightful (e.g., how did Netflix know I'd like that film?).

For example, I am comfortable with low-fidelity list-selling mechanisms that fill my mailbox with junk mail, but I'm creeped out by the possibility that visiting a website may allow someone to find my identity and address. The difference is that, in the first case, I understand how the mechanism works, and I chose when to share my information.

Similarly, Target found that sending personalized mailers to women they identified as being pregnant was disconcerting, but sprinkling pregnancy-related coupons into a seemingly generic mailer targeted at those same women was highly effective. This approach avoids the valley by maintaining a little extra distance between the company and the user.

Another bridge across the valley is to provide a service that is shaped around value to the customer. For example, my company, True Fit, provides shoppers with highly personalized fit ratings and size recommendations to help them confidently buy clothes and shoes online. The experience of receiving expert fit guidance makes shoppers feel very comfortable with anonymously sharing a little information about their bodies and style (no personally identifiable information) to confidently purchase clothing and shoes online and eliminate sizing guesswork. Just 60 seconds to answer a few fun questions, combined with our powerful big data fit engine, and each shopper instantly gets their own personal fit rating and size recommendation for every style. In this case, the ratio of customer input to customer value tips dramatically in favor of customer value.

When using Big Data to develop solutions, being clever isn't enough. It's important to be aware of the customers who will be using the system and to build solutions designed with them at the center, so that we don't spook the very people we're trying to court.