Data Lakes: An Introduction

In business intelligence and analytics today, data is king. “Big data” is the industry buzzword that data and analytics practitioners use for the massive amount of information now available at one’s fingertips.

Much of this data is also real-time, meaning that it changes and updates almost every second of the day. From round-the-clock news coverage and stock market updates to social media posts and feeds, the volume of real-time data is huge.

Data warehousing

Traditional data management systems collect information straight from the source, process it and store it in a data warehouse. Companies then access this information as needed and run it through further analytics to generate reports that guide business decisions in areas ranging from marketing, sales, logistics and delivery to customer service management.

With the deluge of real-time information today, this approach on its own is often no longer efficient or effective. Traditional systems are not equipped to handle live streams of information, and real-time data management solutions such as database replication software are emerging to fill the gap.

Real-time data replication

Real-time data replication does not copy the primary data itself. Instead, it captures the changes recorded in the source system's logs. These log changes are monitored, tracked and copied into an analytical system, which generates the reports.
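
To make the idea concrete, here is a minimal Python sketch of log-based replication. It is an illustration only, not how any particular replication product works: the ChangeEvent and Replica names, the event fields and the sample orders are all assumptions made up for this example. The replica never reads the source tables; it only replays change events in log order and remembers the last one it applied.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One entry in a hypothetical change log (illustrative, not a real product's format)."""
    sequence: int                            # position of the change in the log
    operation: str                           # "insert", "update" or "delete"
    key: str                                 # primary key of the affected row
    values: Optional[Dict[str, Any]] = None  # new column values (None for a delete)


class Replica:
    """Analytical copy that is kept current by replaying log changes."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}
        self.last_applied = 0  # highest log sequence already replayed

    def apply(self, events: List[ChangeEvent]) -> int:
        """Replay events in log order, skipping any that were already applied."""
        applied = 0
        for event in sorted(events, key=lambda e: e.sequence):
            if event.sequence <= self.last_applied:
                continue  # already replicated, so replaying a batch twice is harmless
            if event.operation == "insert":
                self.rows[event.key] = dict(event.values or {})
            elif event.operation == "update":
                self.rows.setdefault(event.key, {}).update(event.values or {})
            elif event.operation == "delete":
                self.rows.pop(event.key, None)
            else:
                raise ValueError(f"unknown operation: {event.operation}")
            self.last_applied = event.sequence
            applied += 1
        return applied


# Made-up sample log: two orders are placed, one ships, one is cancelled.
change_log = [
    ChangeEvent(1, "insert", "order-1", {"status": "placed", "total": 40}),
    ChangeEvent(2, "update", "order-1", {"status": "shipped"}),
    ChangeEvent(3, "insert", "order-2", {"status": "placed", "total": 15}),
    ChangeEvent(4, "delete", "order-2"),
]

replica = Replica()
replica.apply(change_log[:2])  # first batch of changes arrives
replica.apply(change_log)      # later batch overlaps; only events 3 and 4 are new
print(replica.rows)
```

Because only the changes travel to the analytical system, the replica can stay close to the source without repeatedly copying whole tables.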

Companies can then use the insights in a more timely manner to respond to trends, behavior or decisions that consumers are making at a particular point in time. Real-time data replication strategies are very useful for many businesses in various industries, such as retail, banking and finance, shipping and logistics, and even government work such as law enforcement and border security.

Data lakes

Given this new way of handling and processing data, the efficiency and relevance of data warehousing have also come into question. Real-time data analysis calls for a different method of storing data, which experts and industry practitioners are dubbing “data lakes.”

Consider how data is stored in traditional systems such as data warehouses. Experts define them as centralized storehouses of integrated data from one or more disparate sources. They store both current and historical data, and are used for creating trend reports such as those submitted to senior management for annual and quarterly assessments.

This definition gives a good picture of the nature of traditional data analysis: highly structured, periodic and potentially time-consuming. Information in data warehouses is also limited in the sense that it has already been pre-selected and pre-processed with specific uses in mind. It is also usually quantitative and highly specialized.

Data lakes significantly change the way data is collected, handled, and used in analytics:

  • All data is retained. Even data that may not have apparent use in present scenarios would be available for analysis later on. No data is turned away, so to speak, or discarded. Data lakes contain fresh, raw data.
  • All data types are supported. Even non-traditional data types can be stored in data lakes and analyzed. These include web server logs, social network activity, as well as texts and images.
  • All users are welcome. The majority of data users in an organization are involved in operations and want the same types of reports that answer their usual questions immediately. Some bring in outside data for wider analysis, and a few do deep dives into the information. Data lakes can accommodate the needs of all these different users.
  • Change is welcome. Data lakes adapt easily to demands and changes from users. Data warehouse design usually takes time because of the complexity of loading data and creating structures. In data lakes, users are empowered to go beyond these structures and explore answers to new questions.
  • Insight generation is quicker. As a result of all these advantages, a data lake is more immediate in providing insights that can guide more effective business decisions.
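
The points above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a real data lake product: TinyLake, its ingest and read methods, the status_counts helper and the sample records are all hypothetical names and data invented for this example. The lake keeps every record raw, whatever its shape, and structure is applied only when someone reads the data to answer a question.

```python
import json
from collections import Counter
from typing import Any, Dict, List


class TinyLake:
    """Toy in-memory 'lake': every record is kept raw, tagged only with its source."""

    def __init__(self) -> None:
        self._raw: List[str] = []  # records stored as received, serialized to JSON text

    def ingest(self, source: str, payload: Any) -> None:
        """Store a record without validating or reshaping it."""
        self._raw.append(json.dumps({"source": source, "payload": payload}))

    def read(self, source: str) -> List[Any]:
        """Return the raw payloads for one source; interpretation is left to the reader."""
        records = (json.loads(line) for line in self._raw)
        return [rec["payload"] for rec in records if rec["source"] == source]


def status_counts(log_lines: List[str]) -> Dict[str, int]:
    """Structure applied at read time: count the HTTP status codes in web log lines."""
    return dict(Counter(line.split()[-1] for line in log_lines))


lake = TinyLake()
# Different data types land side by side; nothing is turned away.
lake.ingest("web_log", "10.0.0.1 GET /home 200")
lake.ingest("web_log", "10.0.0.2 GET /missing 404")
lake.ingest("web_log", "10.0.0.1 GET /cart 200")
lake.ingest("social", {"user": "sam", "text": "Loving the new store layout"})
lake.ingest("sales", {"order": "order-1", "total": 40})

# A question nobody planned for at load time is answered from the raw records.
print(status_counts(lake.read("web_log")))
```

The design choice worth noticing is that ingest never rejects or reshapes a record. A data warehouse would instead decide on a structure before loading, which is why new questions there often require a redesign first.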

Data lakes provide a number of advantages, and are an important consideration for enterprises looking to make the most of real-time data and information management.

About Mohit Tater

Mohit is the co-founder and editor of Entrepreneurship Life, a place where entrepreneurs, start-ups, and business owners can find wide ranging information, advice, resources, and tools for starting, running, and growing their businesses.
