In today’s world, data has intrinsic value, and it has become a key asset that organizations use to transform their business operations. Every organization is overwhelmed by the volume and variety of its data, which includes customer profiles, sales information and product specifications, arriving from a wide range of sources such as social networking services (SNS), IoT devices and ERP/CRM systems. Despite having new tools and technologies, organizations find it difficult to collect, store, process and analyse such large volumes of data to generate new business insights. Hence, it is important for an organization to have a clear data strategy to capture this data effectively and derive meaningful business insights from it.
What is a Data Lake?
A data lake is a centralized repository that allows an organization to store structured as well as unstructured data at any scale in its native format, and then process and analyse it to derive new business insights. It helps transform information management into a proactive, real-time practice: because organizations can make use of data throughout its entire life cycle, they can react swiftly when new business challenges are identified, which drastically shortens the time to insight.
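Storing data "in its native format" is often described as schema-on-read: files land in the lake untouched, and a structure is applied only when someone queries them. The sketch below illustrates that idea under stated assumptions: an in-memory dictionary stands in for the lake's raw storage zone (in practice an object store or distributed file system), and `ingest` and `read_with_schema` are hypothetical helpers, not the API of any particular data lake product.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for the lake's raw zone (e.g. an object store).
# Files are kept exactly as they arrived: no schema is imposed on write.
raw_zone: dict[str, bytes] = {}


def ingest(path: str, payload: bytes) -> None:
    """Store a file in its native format without parsing or transforming it."""
    raw_zone[path] = payload


def read_with_schema(path: str, schema: dict[str, type]) -> list[dict]:
    """Schema-on-read: parse the raw bytes only when a consumer asks for them,
    keep just the requested fields and cast them to the requested types."""
    text = raw_zone[path].decode("utf-8")
    if path.endswith(".jsonl"):
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif path.endswith(".csv"):
        records = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"no reader registered for {path}")
    return [{name: cast(rec[name]) for name, cast in schema.items()} for rec in records]


# Structured (CSV), semi-structured (JSON lines) and unstructured (free text)
# data land side by side; the text file is stored but never given a schema here.
ingest("sales/2024/orders.csv", b"order_id,amount,region\n1,19.99,EU\n2,5.00,US\n")
ingest("iot/sensor-7.jsonl", b'{"ts": 1, "temp": "21.5"}\n{"ts": 2, "temp": "22.0"}\n')
ingest("support/call-0042.txt", b"Customer asked about delivery times.")

orders = read_with_schema("sales/2024/orders.csv", {"order_id": int, "amount": float})
readings = read_with_schema("iot/sensor-7.jsonl", {"ts": int, "temp": float})
```

Because the raw bytes are never rewritten, a different team can later read the same files with a different schema (for example, adding `region`) without re-ingesting anything.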
How to Design Your Data Lake
To design a data lake effectively, organizations should take an agile approach: pilot with prioritized use cases, then test and refine them, rather than run an extensive one-time project to connect all data to the data lake. The design should depend on the organization’s business goals, priorities and selection of use cases. The data lake journey should begin with proper alignment between the organization’s IT team and its other business units. They should work in conjunction to develop an agile approach for building the data lake and must share the same outlook when answering questions such as: What do our existing data management capabilities look like? How do we tackle complex streaming data? How complex is our data acquisition process? What skills, tools and technologies are available in our organization?
Walking through these questions and design principles enables businesses to build an agile development model, helping them realize the business benefits of data lakes quickly while limiting future rework and iterations.
An organization should follow the key guiding principles below when designing a data lake:
• Foster data-driven innovation by making raw and modeled data accessible to capable data scientists
• Favour open-standard, technology-independent solutions when developing the data lake
• Avoid duplicated investment and functionality across the organization’s data systems by providing data warehousing and self-service BI services to different divisions within the organization
• Centralize the organization’s data and enable timely information extraction
• Ensure adaptability to a changing customer landscape, and support the variety, volume and velocity of data sources by design
• Adopt proactive and strict information security measures to mitigate unauthorized access, undesired disclosure of private information and cybersecurity threats while preserving the value and utility of the information
Key steps an organization should follow to build its data lake for maximum business impact:
1. Set business goals and prioritize at least two or three use cases
2. Select a data platform based on the organization’s business requirements
3. Choose tools and technologies that are easy to operate and satisfy users’ requirements
4. Supplement your existing talent with specialized data lake consultants to leverage their experience, and train existing staff in Hadoop, analytics, data lakes, etc.
5. Have a clear data governance strategy in place. Avoid dumping everything into the data lake, and ensure that all data residing in it is properly cleansed, classified and protected; otherwise it will eventually become clogged up and turn into a data swamp, which is nothing but a murky business liability.
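Step 5 can be made concrete with a small ingestion gate that refuses datasets lacking an owner or a recognized classification, cleanses the records that pass, masks fields marked as personal data, and records metadata in a catalog. This is a minimal sketch under assumptions: the classification levels, the `Dataset` shape, the cleansing rules (trim strings, drop empty rows and exact duplicates) and the dictionary-based catalog are all hypothetical illustrations, not a standard or a specific governance tool.

```python
from dataclasses import dataclass, field

# Hypothetical classification scheme; real organizations define their own.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


class GovernanceError(ValueError):
    """Raised when a dataset does not meet the (assumed) admission rules."""


@dataclass
class Dataset:
    name: str
    owner: str
    classification: str
    records: list[dict]
    pii_fields: set[str] = field(default_factory=set)


def cleanse(records: list[dict]) -> list[dict]:
    """Trim string values, drop rows that are entirely empty and drop exact duplicates."""
    seen, out = set(), []
    for rec in records:
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if all(v in (None, "") for v in cleaned.values()):
            continue
        key = tuple(sorted(cleaned.items()))  # keys are unique, so only keys are compared
        if key in seen:
            continue
        seen.add(key)
        out.append(cleaned)
    return out


def mask(records: list[dict], pii_fields: set[str]) -> list[dict]:
    """Replace non-empty values of personal-data fields with a fixed placeholder."""
    return [
        {k: ("***" if k in pii_fields and v not in (None, "") else v) for k, v in rec.items()}
        for rec in records
    ]


def admit(ds: Dataset, catalog: dict) -> Dataset:
    """Admit a dataset to the lake only if it is owned and classified."""
    if not ds.owner:
        raise GovernanceError(f"{ds.name}: no data owner assigned")
    if ds.classification not in ALLOWED_CLASSIFICATIONS:
        raise GovernanceError(f"{ds.name}: unknown classification {ds.classification!r}")
    records = mask(cleanse(ds.records), ds.pii_fields)
    catalog[ds.name] = {
        "owner": ds.owner,
        "classification": ds.classification,
        "rows": len(records),
        "pii_masked": sorted(ds.pii_fields),
    }
    return Dataset(ds.name, ds.owner, ds.classification, records, ds.pii_fields)


catalog: dict = {}
customers = Dataset(
    name="crm.customers",
    owner="sales-ops",
    classification="confidential",
    records=[{"id": 1, "email": " a@example.com "}, {"id": 1, "email": "a@example.com"},
             {"id": None, "email": ""}],
    pii_fields={"email"},
)
admitted = admit(customers, catalog)

rejected = None
try:
    admit(Dataset(name="iot.dump", owner="", classification="internal", records=[]), catalog)
except GovernanceError as err:
    rejected = str(err)
```

Rejecting unowned or unclassified data at the door, rather than cleaning it up later, is one way to keep the catalog trustworthy and avoid the data swamp described above.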
The best practice is to narrow down to specific use cases around themes such as predictive analytics, omnichannel marketing and customer engagement, depending on your business requirements, and to take advantage of the agility of data lakes by accessing and analysing any data in its native format to gain deeper business insights.
By Deepak Jha, Deputy General Manager, AIPF (Artificial Intelligence Platform), NEC Technologies India Pvt. Ltd.