Hadoop based Data Lakes: The Beast is Born


1997 marked a turning point in the consumer digital revolution – the year when the cost of computation and storage fell drastically, accelerating the shift from paper-based records to digital storage. Almost immediately afterwards, the problem of Big Data emerged. As the digitisation of documents far surpassed estimates, Hadoop became the step forward towards low-cost storage at scale, and it slowly became synonymous and interchangeable with the term Big Data. With the explosion of e-commerce, social chatter and connected things, data has expanded into new realms. It is not just about the volume anymore.


In part 1 of this blog, I set the premise that the market is already moving from PPTware to dashboards and robust machine learning platforms to make the most of the "new oil".

Today, we are constantly inundated with terms like Data Lake and Data Reservoirs. What do these really mean? Why should we care about these buzz words? How does it improve our daily lives?

I have spoken with a number of people over the years and have come to realize that, for the most part, they are enamoured with the term without realizing the value or the complexity behind it. Even when they do, the variety of software components and the velocity with which they change are simply overwhelming.

The big question here would be: how do we quantify Big Data? One useful pivot is that it is no longer the volume of data you collect that matters, but the insight you derive through analysis. Data, when used for purposes beyond its original intent, can generate latent value. Making the most of this latent value requires practitioners to envision the 4 V's in tandem – Volume, Variety, Velocity, and Veracity.

Translating this into reality will require a system that is:

  • Low cost
  • Capable of handling the volume load
  • Not constrained by the variety (structured, unstructured or semi-structured formats)
  • Capable of handling the velocity (streaming) and
  • Endowed with tools to perform the required data discovery, through light or dark data (veracity)

Hadoop — now a household term — had its beginnings in web search. Rather than keeping it proprietary, its creators, with Yahoo's backing, made a life-altering decision to release it as open source; the project itself was spun out of Nutch, an open-source web-search engine whose storage and processing layers formed Hadoop's core.

Over the last decade, with the Apache Software Foundation as its surrogate mother and active collaboration among thousands of open-source contributors, Hadoop has evolved into the beast that it is.

Hadoop is endowed with the following components –

  • HDFS (Hadoop Distributed File System) — provides a single, unified view of storage spread over a number of different physical machines and keeps enough redundant copies of the data for high availability.
  • MapReduce — the model for distributed computing over the stored data using Mappers and Reducers. Mappers work on slices of the data in parallel, reducing it to key-value tuples (and may apply transformations along the way), while Reducers take the tuples from different Mappers and combine them into the final result.
  • YARN (or alternatives such as Apache Mesos) — the resource manager that controls the availability of hardware and software resources, along with scheduling and job management. YARN itself has two distinct components: the ResourceManager and the NodeManagers.
  • Commons – Common set of libraries and utilities that support other Hadoop components.
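To make the Mapper/Reducer split above concrete, here is a minimal single-process sketch of the MapReduce idea in plain Python. A real Hadoop job distributes these phases across a cluster; this toy word count only illustrates the flow of map, shuffle and reduce.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (key, value) tuple for every word in the line.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: combine all values emitted for one key.
    return word, sum(counts)

def map_reduce(lines):
    # Shuffle phase: group the mappers' tuples by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)
    return dict(reducer(w, c) for w, c in groups.items())

print(map_reduce(["big data big insight", "big value"]))
# {'big': 3, 'data': 1, 'insight': 1, 'value': 1}
```

The same mapper and reducer functions, handed to a framework like Hadoop Streaming, would run unchanged over terabytes of input split across many machines.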

While the above forms the foundation, what really drives data processing and analysis are frameworks such as Pig, Hive and Spark, along with other widely used utilities for cluster, metadata and security management. Now that you know what the beast is made of (at its core), we will cover the dressings in the next parts of this series. Au revoir!

The New Customer Satisfaction Era

Using technology to measure and improve customer satisfaction

Let us start with an oft-repeated question: "What do you know about your customer's preferences?"

The answer could be any of the standard responses about their tastes in your merchandise, based on past transactional records. It could also be one of the slightly more personalised answers about the customer's likes and dislikes, based on whatever they have filled in on surveys and feedback forms. Does this tell you all you need to know about your customers? Does it help you make that customer's experience something he or she will remember? Something that gets ingrained into the sub-conscious, decision-making part of their mind? That is the holy grail most CX organisations are after.

Where does data come into the picture?


With 91 properties around the world, in a wide variety of locations, the Ritz-Carlton has a particularly strong need to ensure its best practices spread companywide. If, for example, an employee at their Stockholm hotel comes up with a more effective way to manage front-desk staffing for the busiest check-in times, it only makes sense to consider that approach when the same challenge comes up at a hotel in Tokyo. This is where the hotel group's innovation database comes in. The Ritz-Carlton's employees use this system to share tried and tested ideas that improve customer experience. Properties can submit ideas and implement suggestions from other locations facing similar challenges. The database currently includes over 1,000 innovative practices, each of them tested at a property before being contributed to the system. The Ritz-Carlton is widely considered a global leader in CX practices, and companies like Apple have designed their CX philosophy after studying how the Ritz-Carlton operates.

What does this tell you? Use your data wisely!

The next question that may pop up is, "But there is so much data. It is like noise." This is where programmatic approaches to analysing data come in. Analytics and data science firms across the globe have refined the art of deriving insights from seemingly unconnected data. What you gain is that, in addition to analysing the customer footprint in your own place of business, you get to analyse that footprint across various other channels and social media platforms.


This aims to profile the customers who are most susceptible to local deals, rewards and coupons, based on their buying patterns.

How is this done? The answer is rather simple. Customer segmentation algorithms (both supervised and unsupervised) enable you to piece together random bits of information about the customer and analyse the effect they have on a target event. You will be surprised at the insights that come out of this exercise. Obviously, caution needs to be exercised to ensure that the marketeer does not get carried away by random events that are purely driven by chance.
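As a hedged sketch of the unsupervised side of such segmentation, here is a minimal k-means clustering of customers by spend and visit frequency. The features, toy figures and cluster count are illustrative assumptions, not a prescription; in practice one would use a library implementation such as scikit-learn's KMeans over many more features.

```python
import numpy as np

def kmeans(points, k, iters=20):
    # Deterministic init for reproducibility: first k points as centres.
    centers = points[:k].copy()
    for _ in range(iters):
        # Assign each customer to the nearest segment centre.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre as the mean of its assigned customers.
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Toy data: [annual_spend, visits_per_month] for eight customers.
customers = np.array([[120, 2], [130, 3], [900, 12], [950, 14],
                      [110, 1], [880, 11], [140, 2], [920, 13]], dtype=float)
labels, centers = kmeans(customers, k=2)
# The algorithm separates low-spend occasional visitors from
# high-spend frequent visitors, with no labels supplied up front.
```

Each resulting segment can then be targeted with the deals, rewards or coupons its buying pattern suggests, with the usual caveat from above about chance patterns.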

Okay, so I have made some sense of my data. But this is a rather cumbersome process that does not make any difference to the way I deal with my customers on a day-to-day basis.

“How do I get this information on a real-time basis so that I can actually make some decisions to improve my customer’s experience as and when it is applicable?”

This brings us to the newest and most relevant trend in making data science a mainstream part of decision making. How do we integrate this insight-deriving platform into the client's CRM system, so that the client can make efficient decisions on a real-time basis?


At Anteelo, for one of our leading technology clients, we have built an AI-based orchestration platform that derives actionable insights from past customer data and integrates them into the client's CRM system, so that they are readily available to all marketeers as and when they attempt to send out a communication to their customers.

What does this entail? It entails using the right technology stack to build a system that can deliver insights from the data science modules at scale. I prefer to call it a synergy of data science and software development. Every decision a marketeer tries to make is processed through a system that invokes the built-in DS algorithms in real time on the relevant cloud computing platforms. Insights are delivered immediately, and suitable recommendations are made on a real-time basis.
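The flow just described can be sketched as a tiny scoring-and-recommendation step invoked per CRM event. Everything here is a hypothetical placeholder: the event fields, the hand-rolled score (a stand-in for a real model hosted on a cloud endpoint) and the action thresholds are invented for illustration, not Anteelo's actual platform.

```python
from dataclasses import dataclass

@dataclass
class CustomerEvent:
    # Hypothetical features arriving from the CRM with each interaction.
    customer_id: str
    days_since_last_purchase: int
    avg_basket_value: float

def score(event):
    # Stand-in for a real model call: blend recency and basket value
    # into a 0..1 propensity-to-respond score.
    recency = max(0.0, 1 - event.days_since_last_purchase / 90)
    value = min(1.0, event.avg_basket_value / 500)
    return 0.6 * recency + 0.4 * value

def recommend(event):
    # Turn the score into an action the marketeer sees in the CRM.
    s = score(event)
    if s > 0.7:
        return "send_premium_offer"
    if s > 0.4:
        return "send_discount_coupon"
    return "send_reengagement_email"

evt = CustomerEvent("C042", days_since_last_purchase=10, avg_basket_value=450)
```

The point of the sketch is the shape of the loop: every marketeer decision triggers a scoring call, and the recommendation comes back in the same request, rather than from an overnight batch report.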


This is the final step in ensuring that the personalised recommendations being made to every customer are truly personalised. We at Anteelo call it "The Last Mile adoption". This development is still in its nascent phase. However, companies would be wise to integrate this methodology into their data-science-driven decision making, since they are very unlikely to reach the holy grail of customer satisfaction without delivering real-time personalised recommendations.
