Data Lake – Sink or Swim
Data Lake is the next buzzword – but what is it – and how will it bring you business benefit?
Regardless of where you are now with your data, take a minute to consider your data future. Leading businesses are on a journey towards connecting their enterprise data together. As business becomes increasingly digital, access to data is a key priority. Speed of development and deployment is becoming more and more critical. A data lake can help you meet these demands.
As with all data projects (big or small), successful data lake projects stem from clear definition, supported by use cases and best practices that IT and business stakeholders alike can understand and benefit from. Without strategic direction and a level of governance, data lakes run the risk of becoming unsupervised data swamps of information that provide no business value.
Insightful decision making from your data.
BusinessMinds create strategies, manage, design and implement:
Management and Control
Defined as a massive, and relatively cheap, centralised storage repository, a data lake can hold all types of data until it is needed for business analytics or data mining. A data lake holds data in its rawest form, unprocessed and ungoverned.
Having a data lake because it is a cheaper way to store and manage data (saves IT money) is not enough for business benefit (making the business money). A successful lake must be able to provide real-time response to queries and give users an easy and uniform access interface. In this free data environment, enterprise data warehouses, discovery-oriented environments, and highly specialised analytic and operational technologies should coexist.
To ensure your data lake’s success, we have compiled a list of tips:
Start with the use case. You need to have some use cases in mind before constructing a data lake. They could be existing ones or any problem that your business wants to solve but currently can’t.
Hire technically adept facilitators. Data scientists and engineers can provide the necessary expertise required to make the data lake a successful data and analytics tool. You may choose to work with data consultants or in-house staff. The key is in hiring technically adept facilitators.
Involve all facets of the business. Data lakes are resources for the entire organisation, not just IT. Therefore, all interested parties should be involved in planning data lake projects. Data lakes are central to your organisation’s data architecture, and cannot be implemented in isolation. A data lake project should involve business leaders, IT, storage leaders, and end users.
Business benefits won’t come from technology. The business value of a data lake has little to do with the underlying technologies selected. Business value is derived from the analytics skills you can apply to the lake. Data lakes aren’t a replacement for existing analytical platforms or infrastructure. Instead, they complement existing efforts and support the discovery of new questions.
Use your multiple tools and products. Customise your lake based on your existing and available technology stacks.
Domain specification. Data lakes must be specific to cater to your industry’s unique needs. Make sure that IT intervention is not necessary to enable users to obtain data when they need it. A user interface that allows keyword, faceted and graphical search will be necessary.
Automated metadata management. Automated and mandatory metadata management is a must to avoid your lake becoming a swamp. Attributes like data lineage, data quality, and usage history are key to usability.
Configurable ingestion workflows. New sources of external information will be available continuously. Have an easy, secure and trackable content ingestion workflow mechanism that can rapidly add new information into the data lake.
Integrate with the existing environment. Many businesses have existing enterprise data management systems. The data lake must be integrated and be able to support these to avoid replacing or damaging the existing environment.
Allow the data lake to mature gradually. With the data lake, users can take what is relevant and leave the rest. Individual business domains can mature independently and gradually. Perfect data classification is not required. Users throughout the enterprise can see across all disciplines, not limited by organisational silos or rigid schema. Data maturity results as a natural outgrowth of the ongoing user interaction and feedback at the metadata management layer—interaction that continually refines the lake and enhances discovery. (See figure below).
While data lakes are quite new, they can be a low cost, effective and successful data management solution for businesses provided that they allow users to analyse an extensive array and volume of data when and how they want it. The key is to design and implement one that is tailored specifically to address business needs.
This process approach eliminates spreadsheet mania – the problem of having multiple spreadsheets containing unlinked data sources and figures that cannot be tallied against each other.
Getting Business Benefit from a Data Lake – Here Are Some Ideas!
Analyse unstructured data, and only apply a schema (or definition of the data) at the time of analysis
Move long-running ETL to the data lake to reduce the overnight batch window
Store large amounts of infrequently used data in the lake, since cheaper hardware can be used and data lake software is lower cost
Analyse real-time streaming data as it is received by applying real-time predictive models and actions to it
Allow business users to introduce external data they would like to analyse against current structured data warehouse data. Over time it can be determined whether this data is valuable. If it is, it will be easy to justify the effort of creating structured data to be stored in the data warehouse
Gain enterprise wide visibility or put an end to data silos – an opportunity to capture a 360-degree view of your customers
Analyse your customers’ social media trends
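For readers who want to see what "only apply a schema at the time of analysis" can look like in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular data lake product: the sample records, the field names and the helper functions read_with_schema and total_spend_by_customer are all hypothetical. The raw records are kept exactly as they arrived, and the analyst decides which fields matter, and what type they should be, only when running the analysis.

```python
import json

# Hypothetical raw records, stored in the lake exactly as they arrived.
# Note that they do not all share the same fields.
RAW_EVENTS = [
    '{"customer": "C001", "amount": "19.99", "channel": "web"}',
    '{"customer": "C002", "amount": "5", "note": "gift"}',
    '{"customer": "C001", "amount": "7.50", "channel": "store"}',
]

# A schema chosen by the analyst at analysis time: field name -> converter.
SPEND_SCHEMA = {"customer": str, "amount": float}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and keep only the schema's fields, converted."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Skip records that lack a field this particular analysis needs.
        if not all(field in record for field in schema):
            continue
        rows.append(
            {field: convert(record[field]) for field, convert in schema.items()}
        )
    return rows


def total_spend_by_customer(rows):
    """Sum the amount column per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


rows = read_with_schema(RAW_EVENTS, SPEND_SCHEMA)
totals = total_spend_by_customer(rows)
print(totals)
```

A different analysis could read the same raw records with a different schema (for example, one that keeps the channel field) without anything in the lake having to be reloaded or restructured.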