Deepak Ghodke
In 2016, many more organizations began storing, processing, and extracting value from data of all forms and sizes. In 2017, adoption of systems that support large volumes of both structured and unstructured data will continue to grow. The market will demand platforms that help data custodians govern and secure big data while empowering end users to analyze it. These systems will mature to operate well within enterprise IT systems and standards. Specialized "big data" systems will converge with existing analytics platforms, fueling adoption across the business.
Here are our industry predictions for 2017.
Big data becomes fast and approachable. Sure, you can perform machine learning and conduct sentiment analysis on Hadoop, but the first question people often ask is: How fast is the interactive SQL? SQL, after all, is the conduit to business users who want to use Hadoop data for faster, more repeatable KPI dashboards as well as exploratory analysis. In 2017, the options for speeding up Hadoop will expand. This shift has already started, as evidenced by the adoption of faster databases like Exasol and MemSQL, Hadoop-based stores like Kudu, and technologies that enable faster queries.
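To make that concrete, here is a minimal sketch of a dashboard-style KPI query issued interactively against Hadoop data through Python's PyHive client. The host, port, and the sales_kpi table are hypothetical placeholders, not any particular vendor's setup:

```python
# Illustrative only: "SQL as the conduit" to Hadoop data.
# The endpoint and table below are hypothetical placeholders.
from pyhive import hive

conn = hive.connect(host="hadoop-edge.example.com", port=10000)
cursor = conn.cursor()

# A typical repeatable KPI query a dashboard might run interactively.
cursor.execute(
    """
    SELECT region, SUM(revenue) AS total_revenue
    FROM sales_kpi
    WHERE sale_date >= '2017-01-01'
    GROUP BY region
    ORDER BY total_revenue DESC
    """
)
for region, total_revenue in cursor.fetchall():
    print(region, total_revenue)

cursor.close()
conn.close()
```

The speed of exactly this kind of round trip is what the faster engines and stores mentioned above compete on.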
Big data is no longer just Hadoop. In the past, we've seen several technologies rise with the big-data wave to fulfill the need for analytics on Hadoop. But for enterprises with complex, heterogeneous environments, answers are buried in a host of sources, ranging from systems of record and cloud warehouses to structured and unstructured data from both Hadoop and non-Hadoop sources. In 2017, customers will demand analytics on all of their data. Platforms that are data- and source-agnostic will thrive, while those purpose-built for Hadoop that fail to deploy across use cases will fall by the wayside.
The convergence of IoT, cloud, and big data creates new opportunities for self-service analytics. It seems that everything in 2017 will have a sensor that sends information back to the mothership. IoT data is often heterogeneous and lives across multiple relational and non-relational systems, from Hadoop clusters to NoSQL databases. While innovations in storage and managed services have sped up the capture process, accessing and understanding the data itself still poses a significant last-mile challenge. As a result, demand is growing for analytical tools that seamlessly connect to and combine a wide variety of cloud-hosted data sources.
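As a rough illustration of that last-mile challenge, the Python sketch below combines device metadata from a relational extract with sensor readings landed in a NoSQL store; the connection string, collection, and field names are all hypothetical:

```python
# Illustrative only: joining IoT data across heterogeneous stores.
# All names and endpoints below are hypothetical placeholders.
import pandas as pd
from pymongo import MongoClient

# Device metadata exported from a system of record (a CSV here for
# simplicity; in practice this might come from a warehouse query).
devices = pd.read_csv("device_registry.csv")  # columns: device_id, site

# Sensor readings captured into a NoSQL store.
client = MongoClient("mongodb://iot-store.example.com:27017")
readings = pd.DataFrame(
    client["iot"]["readings"].find({}, {"_id": 0, "device_id": 1, "temp_c": 1})
)

# The combining step analysts actually need: one table across both systems.
combined = readings.merge(devices, on="device_id", how="left")
print(combined.groupby("site")["temp_c"].mean())
```

Tools that do this connecting and combining behind the scenes are what the growing demand is really about.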
Self-service data prep becomes mainstream as end users begin to shape big data. The rise of self-service analytics platforms has made Hadoop more accessible to business users, but those users want to further reduce the time and complexity of preparing data for analysis. Agile self-service data-prep tools not only allow Hadoop data to be prepped at the source but also make it available as snapshots for faster, easier exploration. We've seen a host of innovation in this space from companies focused on end-user data prep for big data, such as Alteryx, Trifacta, and Paxata. These tools are lowering the barriers to entry for late Hadoop adopters and laggards, and they will continue to gain traction in 2017.
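For a sense of the steps such tools automate for end users, here is a minimal Python sketch of turning a raw Hadoop-sourced extract into a clean snapshot for exploration; the file names and columns are hypothetical:

```python
# Illustrative only: the kind of prep a self-service tool performs.
# File paths and column names are hypothetical placeholders.
import pandas as pd

# Raw extract, e.g. exported from a Hadoop table.
raw = pd.read_csv("web_events_raw.csv")

# Typical prep steps: fix types, drop incomplete rows,
# keep only the fields analysts explore.
raw["event_time"] = pd.to_datetime(raw["event_time"], errors="coerce")
clean = raw.dropna(subset=["event_time", "user_id"])
clean = clean[["event_time", "user_id", "page", "duration_s"]]

# Persist a compact columnar snapshot for fast, repeatable exploration
# (writing Parquet requires an engine such as pyarrow).
clean.to_parquet("web_events_snapshot.parquet", index=False)
```

The value of the self-service tools is letting business users express these steps visually instead of in code.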
The author is Country Manager, Tableau Software, India