BI and the (un)necessity of now

Posted by Taylor Haney on Mon, Jun 8, 2015

This guest blog post is written by David Drollette, Sr. Director, Analytics at Wayfair.

David Drollette is the Senior Director of Analytics at Boston e-commerce retailer Wayfair, overseeing Business Intelligence and Data Science functions. David joined Wayfair in January 2006, working in various finance positions before pivoting to a leadership role within analytics. He holds a degree in Mathematics and Physics from Ithaca College.

I can almost hear the lighting of torches and gathering of pitchforks as I utter the phrase "Real time is not right for you." I know, I know. Real time is buzzy. Real time is exciting. Real time is what you need to have in order to call yourself "cutting edge." The rest is just batch, and batch is boring. I come here in full agreement about the value inherent in real time, but I also bring a caution, having witnessed the paralysis that incomplete or ineffective real time BI can bring. Let's discuss the building blocks of real time and the steps necessary for a successful deployment that adds value.

Data Availability - The small data side of our real time equation is laughably easy for many companies. Transactional (OLTP) data is available in seconds in the operational data store (ODS). Companies with a robust operational data stack can usually tolerate lightweight queries against that system. Data availability challenge solved.

Though the above is true of many companies, plenty of others still encounter small data challenges, especially those whose ODS is guarded by restricted access rules, heavily armed DBAs, and the occasional server rack fortified by razor wire. In those cases, there are two options: implement a batch ETL infrastructure that can mule data to analytical databases with minimal production load, or send production data to two places at once (the ODS and the analytical DBs). While many companies can help with both options, the latter can also help immensely with some of our real time big data challenges.
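For the batch route, one common way to keep production load low is incremental extraction against a high-water mark: each run copies only rows newer than the last one loaded, in small batches over an indexed key. This is a minimal sketch, not a prescription. It uses in-memory SQLite as a stand-in for both the ODS and the analytical database, and the `orders` table, its columns, and the `incremental_extract` and `make_db` helpers are hypothetical. It also assumes an append-only table with a monotonically increasing key, so updates to existing rows are not captured.

```python
import sqlite3


def incremental_extract(ods: sqlite3.Connection,
                        warehouse: sqlite3.Connection,
                        batch_size: int = 1000) -> int:
    """Copy orders newer than the warehouse's high-water mark; return rows copied."""
    # High-water mark: the largest order_id already loaded (0 when empty).
    (watermark,) = warehouse.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM orders").fetchone()
    copied = 0
    while True:
        # Short range scan on the primary key keeps each ODS read cheap.
        rows = ods.execute(
            "SELECT order_id, amount, created_at FROM orders "
            "WHERE order_id > ? ORDER BY order_id LIMIT ?",
            (watermark, batch_size)).fetchall()
        if not rows:
            break
        warehouse.executemany(
            "INSERT INTO orders (order_id, amount, created_at) VALUES (?, ?, ?)",
            rows)
        warehouse.commit()
        watermark = rows[-1][0]  # advance the mark only after the commit
        copied += len(rows)
    return copied


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for an ODS or an analytical database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, created_at TEXT)")
    return conn


ods, warehouse = make_db(), make_db()
ods.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, 10.0 * i, "2015-06-08") for i in range(1, 6)])
ods.commit()
print(incremental_extract(ods, warehouse, batch_size=2))  # 5
print(incremental_extract(ods, warehouse, batch_size=2))  # 0: nothing new
```

The `batch_size` limit is the main lever here: smaller batches mean shorter, cheaper reads against the ODS at the cost of more round trips.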

Enter the messaging system: a move away from batch (ETL) and toward a true real time implementation. In the spirit of staying solution agnostic, I will simply say that there are some great messaging providers out there capable of delivering the two-places-at-once, multi-subscriber data feed alluded to above. Once we have selected and successfully implemented our messaging infrastructure, we are ready to tackle challenge #2 of a real time BI implementation.
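The two-places-at-once pattern boils down to publish/subscribe fan-out: the producer publishes each event once, and every subscriber on the topic receives its own copy. Staying vendor agnostic like the post, here is a toy in-memory broker that only shows the shape of the idea; `Broker`, the `orders` topic, and the two list-backed subscribers are hypothetical stand-ins for a real messaging system, the ODS, and an analytical store.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], None]


class Broker:
    """Toy in-memory topic broker: every subscriber gets every message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Fan out: the producer publishes once; each subscriber gets its own copy.
        for handler in self._subscribers[topic]:
            handler(dict(message))


ods_rows: List[Message] = []
analytics_rows: List[Message] = []

broker = Broker()
broker.subscribe("orders", ods_rows.append)        # operational store
broker.subscribe("orders", analytics_rows.append)  # analytical DB / stream processor

broker.publish("orders", {"order_id": 1, "amount": 42.0})
print(len(ods_rows), len(analytics_rows))  # 1 1
```

A real messaging system adds what this toy omits, such as durable storage, delivery guarantees, and independent consumer progress, so that a slow analytics subscriber does not hold up the operational one.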

Data Consumption - Now that the data is available in a way that does not adversely affect business operations, we are ready for the consumption phase of our real time project. It is in this phase that the title of the post comes into play. In a traditional BI implementation, consumption involves reporting in the form of files and dashboards, published for use by analysts and CxOs alike. This is where my objection lies.

There is no person who can efficiently and effectively monitor a real time data feed for all the anomalies, trends, and outliers that should be spotted in order to add maximum value to a business. To unlock the full potential of the real time data availability we've staged above, we need to put machines on the case.
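As a small example of what "putting machines on the case" can mean, here is a rolling z-score check: each new value is compared with the mean and standard deviation of the trailing window, and values more than `threshold` standard deviations away are flagged. This is one simple technique chosen for illustration, not a method the post prescribes; the function name, the 30-point window, the threshold of 3, and the orders-per-minute numbers are all assumptions.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(values: Iterable[float],
                             window: int = 30,
                             threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for points far from the trailing window's mean."""
    recent: Deque[float] = deque(maxlen=window)  # oldest value drops off automatically
    for i, x in enumerate(values):
        if len(recent) == window:  # only score once the window is full
            mean = sum(recent) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in recent) / window)
            if std > 0:
                z = (x - mean) / std
                if abs(z) >= threshold:
                    yield i, x, z
        recent.append(x)


# Hypothetical orders-per-minute feed with one spike.
stream = [100, 102, 98, 101, 99] * 6 + [160, 100, 101]
print([(i, x) for i, x, _ in rolling_zscore_anomalies(stream)])  # [(30, 160)]
```

In practice the flagged tuples would feed an alerting step rather than a print statement, and the window and threshold would be tuned per metric.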

Real time stream processors and complex event processing (CEP) engines are a great way to make real time data meaningful. They can watch the live stream, identify trends and data points that are "business interesting", and then generate alerts, buzz pagers, and send emails to the proper, pre-configured audience. Only once we've layered meaningful intelligence onto our real time data should we feel free to Edward Tufte-ify the data stream.
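Complex event processing typically expresses "business interesting" as rules over windows of events. A minimal sketch, assuming a hypothetical rule of three or more `payment_failed` events from the same source within 60 seconds, routed to a pre-configured audience. The `Event` fields, the `WindowedCountRule` class, and the `payments-oncall@example.com` recipient are illustrative inventions, and appending to a list stands in for sending an email or buzzing a pager.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Event:
    ts: float     # event time in seconds
    kind: str     # e.g. "payment_failed"
    source: str   # e.g. a payment gateway id


class WindowedCountRule:
    """Fire when min_count events of one kind from one source occur within window_s seconds."""

    def __init__(self, kind: str, min_count: int, window_s: float,
                 notify: Callable[[str], None]) -> None:
        self.kind = kind
        self.min_count = min_count
        self.window_s = window_s
        self.notify = notify
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def on_event(self, ev: Event) -> None:
        if ev.kind != self.kind:
            return  # this rule ignores other event types
        times = self._recent[ev.source]
        times.append(ev.ts)
        # Slide the window: drop timestamps older than window_s (assumes in-order events).
        while times and ev.ts - times[0] > self.window_s:
            times.popleft()
        if len(times) >= self.min_count:
            self.notify(f"{len(times)} x {self.kind} from {ev.source} "
                        f"within {self.window_s:.0f}s")
            times.clear()  # one alert per burst


# Hypothetical pre-configured audience; appending to a list stands in for email/pager.
AUDIENCE: Dict[str, List[str]] = {"payment_failed": ["payments-oncall@example.com"]}
alerts: List[str] = []


def notify_payments(msg: str) -> None:
    for recipient in AUDIENCE["payment_failed"]:
        alerts.append(f"to={recipient}: {msg}")


rule = WindowedCountRule("payment_failed", min_count=3, window_s=60,
                         notify=notify_payments)
for ev in [Event(0, "payment_failed", "gw1"),
           Event(10, "page_view", "web"),
           Event(20, "payment_failed", "gw1"),
           Event(45, "payment_failed", "gw1"),
           Event(200, "payment_failed", "gw1")]:
    rule.on_event(ev)

print(alerts)
```

Clearing the window after an alert is one way to keep a single burst from paging someone repeatedly; a production CEP engine would more likely offer configurable suppression or cooldown periods.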

Just be warned: We've peeled the analysts and CxOs away from the constant update of real time charts and graphs with our best-in-class real time insights system. We've gotten them back to the business of running a business, and they're happier for it. They may not be in a hurry to adopt a dashboard whose objective is already met by a system they trust.

Real time is not right for you...but it is perfect for your machines.