Attending Hortonworks’ DataWorks Summit 2017 in San Jose, CA this past week, I was struck by some key differentiators in various breakout sessions and keynotes. Particularly notable was that the shift in Hortonworks’ toolset has important implications for developers and executives alike. This is the final part in a series of insights for both technologists and decision-makers at companies that have Big Data on their minds. Let’s take a look:
How Companies Are Succeeding With Hadoop
There are some commonalities that stand out among companies experiencing success with Hadoop. The first is that companies tend to succeed when they focus on implementing processes that scale along with the data itself in Hadoop.
What that means is that IT departments that lean on one-off, custom-made processes to obtain and import data (processes that cannot be leveraged repeatedly) will inevitably be overwhelmed by data ingestion requests from their organization, leaving them no time to actually extract the valuable insights needed. Instead, IT departments that develop and deploy robust, repeatable (and hopefully business-self-serve) ingestion processes will free up their own time to work on more value-add aspects of the platform.
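To make the contrast concrete, here is a minimal sketch of the repeatable-ingestion idea: instead of a custom script per feed, a single generic pipeline is driven by declarative specs. All names here (`IngestionSpec`, `ingest`, the paths and tables) are illustrative assumptions, not part of any Hortonworks product.

```python
from dataclasses import dataclass

@dataclass
class IngestionSpec:
    """Declarative description of one data feed; new feeds are
    onboarded by adding a spec, not by writing new code.
    (All fields are hypothetical examples.)"""
    source_path: str   # where the raw feed lands
    target_table: str  # destination table in the data lake
    file_format: str   # e.g. "csv" or "json"

def ingest(spec: IngestionSpec) -> str:
    # One generic, repeatable code path handles every feed
    # described by a spec; in a real platform this would call
    # the actual load tooling rather than return a message.
    return (f"Loaded {spec.file_format} data from {spec.source_path} "
            f"into {spec.target_table}")

# Onboarding a new feed becomes configuration, not custom development:
specs = [
    IngestionSpec("/landing/sales", "lake.sales", "csv"),
    IngestionSpec("/landing/clicks", "lake.clicks", "json"),
]
results = [ingest(s) for s in specs]
```

The point of the pattern is that the marginal cost of the next data source approaches zero, which is what frees the IT team for higher-value work.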
Secondly, companies are finding that the value proposition of the Hadoop platform is increasingly tied to understanding the structure/schema of the data for performance, governance, and business/data science enablement. Those that invest early in strong data governance find that it pays off down the line by making data understandable, discoverable, and consumable. Without strong data governance, data will go unused, and the data lake is much more likely to devolve into the dreaded “data swamp,” with low ROI and no ability to support data warehousing, streaming, and other modern workloads.
Thirdly, among companies that already have robust data lake implementations, those continuing to improve the ROI of their Hadoop platforms are doing so by accessing data closer and closer to the source, where the data itself is intrinsically more valuable due to its timeliness. Many of the components showcased at the DataWorks Summit target these capabilities directly, from improving governance in Atlas with Schema Registry to lowering the barrier to streaming data at the source with NiFi/MiNiFi and Streaming Analytics Manager.
What this means to you:
A good plan is your friend. Before diving into this platform and committing money and talent to big data, establish strong governance. Governance is an investment, and as with most good business decisions, you need to prioritize which data is most valuable to meeting your organization’s goals. The bottom line: you have to lay a good foundation for enabling data science/analytics with the platform in order to get the outcomes and insights you require.
Strong governance and a clear strategy (for which data assets are critical and foundational to the data lake) are absolutely necessary for your team to derive any successful business outcome from the data landed in this process. While planning and deploying governance and strategy may cost more in time and effort initially, the repeated ROI is worth every single penny (virtual or real) spent.
If you’re ready for your big data journey, we’d be happy to discuss Hadoop best practices, strategy and governance with you. Reach out.