The Data Lake – understanding the concept

June 8, 2019
Filed under: Business Analyst Skills, Data 

As data capture has grown, so have the techniques for handling the data. For about 10 years now, the Data Lake has been appearing in the business world as part of the data capture concept.

When I started out, data was scattered all over the place, and business analysts had to ask various departments for extracts to get an overall view of the company. It was time-consuming.

Next came the large data warehouse, which accepted data from all over the company into a central store. However, it could take years to get that data into the data warehouse. At one place I worked, it took a minimum of 2 years to absorb new data. The delay was caused by the need to model the data and understand it completely before it could be absorbed. Data modelers would have to work out whether new tables were needed, and BAs would have to justify the business cost of storing the data. On top of this, existing reports were expected to use the data from the data warehouse, so all of those reports would have to be rebuilt to use the new data structure.

As companies have evolved to produce even more data, the wait time for the data warehouse has increased significantly. Waiting for centralized data, however, did not tie in well with the corporate strategy of knowing what is going on around the company. At this point the Data Lake concept came into being. The Data Lake is basically a collection point for all data from around a company, in any type of data structure. Data does not need to be refined to end up in the Data Lake: both good and bad data are collected. Visually, the Data Lake term represents the departments that generate data as streams that feed the lake.

As data collects in the Data Lake, some of it will eventually make its way into the enterprise data warehouse based on need and cost justification. The Data Lake approach creates a single source of data for people in a company to access. Data scientists can look at what is being captured and see whether any of it is useful for what they are trying to analyze.
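To make the idea concrete, here is a minimal Python sketch, assuming a toy in-memory store rather than any real Data Lake product. The names `DataLake`, `land`, `search` and `load_to_warehouse` are hypothetical. It illustrates the points above: data lands unrefined (a malformed record is accepted too), discovery relies only on the metadata captured when the data lands, and only data that is later needed and passes validation is promoted toward the warehouse.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    key: str
    source_department: str
    description: str
    tags: set[str]
    landed_at: str


@dataclass
class DataLake:
    # Hypothetical toy lake: raw payloads are kept exactly as received.
    raw_store: dict[str, bytes] = field(default_factory=dict)
    catalog: list[CatalogEntry] = field(default_factory=list)

    def land(self, source_department: str, payload: bytes,
             description: str, tags: set[str]) -> str:
        """Store the payload unrefined and record metadata describing it."""
        key = f"{source_department}/{len(self.raw_store):06d}"
        self.raw_store[key] = payload
        self.catalog.append(CatalogEntry(
            key=key,
            source_department=source_department,
            description=description,
            tags=set(tags),
            landed_at=datetime.now(timezone.utc).isoformat(),
        ))
        return key

    def search(self, tag: str) -> list[CatalogEntry]:
        """Discovery depends entirely on the metadata captured at landing time."""
        return [e for e in self.catalog if tag in e.tags]


def load_to_warehouse(lake: DataLake, tag: str) -> list[dict]:
    """Promote only records that parse cleanly; bad data simply stays in the lake."""
    rows = []
    for entry in lake.search(tag):
        try:
            rows.append(json.loads(lake.raw_store[entry.key]))
        except json.JSONDecodeError:
            pass  # not promoted, but still available in the lake
    return rows


lake = DataLake()
# Any structure is accepted: JSON, CSV text, even a malformed record.
lake.land("sales", json.dumps({"order": 1, "amount": 99.5}).encode(),
          "Web order events", {"orders", "sales"})
lake.land("hr", b"emp_id,dept\n17,Finance\n", "Headcount extract", {"headcount"})
lake.land("sales", b"{not valid json", "Web order events (bad record)", {"orders"})

for entry in lake.search("orders"):
    print(entry.key, "-", entry.description)
print(load_to_warehouse(lake, "orders"))
```

Note that a record tagged badly (or not at all) would never show up in `search`, which is the metadata risk listed in the cons below.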

Pros of a Data Lake:

  • Centralized repository of company data, which in theory makes it easier to find data.
  • Quick to capture data into, as the data is not refined in any way.
  • Allows the data source departments to focus on supporting their applications / business and not on providing formal data extracts that have to be absorbed by a data warehouse or other team.
  • No need to wait for another department to have resources available before you can access its data.

Cons of a Data Lake:

  • Resources have to be hired to support the collection of data into the data lake and the sharing of it.
  • Failure to capture good, searchable metadata on the data being stored in the Data Lake would prevent the data from being discovered at a later date.
  • The people associated with the original data generation are not part of the Data Lake team, which means the Data Lake team's personal knowledge of the data is limited to non-existent. Data knowledge is totally reliant on the metadata captured at the time the data is stored.
  • Both useful and not-so-useful data are captured, as the focus is on capturing data.
  • Dependent on cheap storage to justify the large storage costs and the resources needed to support the physical storage, networks, etc.
  • Sensitive data should not end up in a Data Lake due to the risk that it may be exposed.
  • Not suited to operational reporting, where reports have to be generated within 24 hours of the data being created.

In summary, the Data Lake concept is just a fancy way of saying a centralized raw data store, created from data provided by different departments in a company. A Data Warehouse can pull data from the Data Lake at a later date, once the need for it to be stored formally has been identified.

Data handling – know when to bring the experts on board.

We all know about the Y2K incident with the 2-digit year; however, there are still examples of data storage lengths that are inappropriate for the data being stored.
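As a small illustration of why storage length matters, here is a Python sketch, assuming a hypothetical field that stores customer birth years in two characters. Python's `datetime.strptime` with `%y` follows the POSIX/ISO C convention (69–99 map to 1969–1999, 00–68 map to 2000–2068), so any system that has to expand a two-digit year must guess the century, and some guesses will be wrong.

```python
from datetime import datetime


def parse_two_digit_year(yy: str) -> int:
    # %y applies a fixed pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    return datetime.strptime(yy, "%y").year


# Hypothetical birth years, squeezed into a 2-character storage field.
actual = [1925, 1968, 1970]
stored = [str(y)[-2:] for y in actual]  # "25", "68", "70"
decoded = [parse_two_digit_year(s) for s in stored]

for a, s, d in zip(actual, stored, decoded):
    status = "OK" if a == d else "WRONG"
    print(f"actual={a} stored={s!r} decoded={d} {status}")
```

Only 1970 survives the round trip; 1925 comes back as 2025 and 1968 as 2068. Sizing the field for a four-digit year removes the guesswork, which is exactly the kind of requirement worth questioning up front.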

If you are a Business Analyst who deals with data, it is important to always question the data requirements to ensure that they meet the needs of the business / application now and, especially, in the future.

Industries where data is critical to their function will probably leverage Data Modelers / Engineers / Scientists to manage data definition. As BAs we should not be afraid to say when the data knowledge is beyond us and ask for the project to employ one of these specialists. Do not try to wing it, because the end result can be expensive for the company.

To read about some of the impacts of poor data handling, see the BBC article below:

Data Handling that led to disasters

When selling changing products, filters have to be continually reviewed – BestBuy.com example

The other day I was looking for a new laptop computer on BestBuy.com, and I have to say that, with all the features on a laptop, this is one of the tougher filtering challenges a company faces when presenting choices.

The basic purpose of a filter is to let users narrow the results displayed on screen to exactly, or as close as possible to, what they want to see. For e-commerce storefronts, this ability to give customers what they are looking for can make or break a sale.

Now if you look at the complexities of a computer purchase, this can get very difficult. There is a reason why people can get a degree in librarianship. Content taxonomy is dedicated to presenting content in a way that lets users find what they are looking for. In this case, the content taxonomists have to consider the filter terms for each possible variation in a computer, the tech guys have to build a system that can cope with all the filters applied, and the UX guys have to find room on the screen to display the choices.

Overall I think BestBuy.com does a great job with laptop search; however, there are some opportunities for improvement.

Here is what I found when looking for a laptop on BestBuy.com, and the additional filters that could be added:

1 – Ability to select SSD instead of standard disks, since Solid State Drives are becoming more common.

2 – Battery life expectancy – since some laptops will last 8+ hours but others will not even come close

3 – Weight: ultraportable is set to one value (around 5.4 lbs) even though laptops can be almost half this weight today

4 – Graphics card memory (MB) is important but not an option to filter by
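To show why new filters like these are cheap to add when the system is built for it, here is a minimal Python sketch of faceted filtering. The `Laptop` fields, the catalogue entries and the filter names are all hypothetical and are not how BestBuy.com actually works. Each filter is a named rule, and a product must pass every selected filter to be shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Laptop:
    name: str
    storage_type: str    # "SSD" or "HDD"
    battery_hours: float
    weight_lbs: float
    graphics_mb: int


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    Laptop("Model A", "SSD", 9.5, 2.9, 2048),
    Laptop("Model B", "HDD", 4.0, 5.2, 1024),
    Laptop("Model C", "SSD", 6.0, 4.1, 4096),
]

# Each filter is a named predicate. Adding or retiring a filter is one entry
# here, which keeps the filter list in step with changing products.
FILTERS: dict[str, Callable[[Laptop], bool]] = {
    "ssd_only": lambda p: p.storage_type == "SSD",
    "battery_8h_plus": lambda p: p.battery_hours >= 8,
    "under_3_lbs": lambda p: p.weight_lbs < 3,
    "graphics_2gb_plus": lambda p: p.graphics_mb >= 2048,
}


def apply_filters(products: list[Laptop], selected: list[str]) -> list[Laptop]:
    """Keep only products that satisfy every selected filter (AND semantics)."""
    return [p for p in products if all(FILTERS[f](p) for f in selected)]


print([p.name for p in apply_filters(CATALOGUE, ["ssd_only"])])
print([p.name for p in apply_filters(CATALOGUE, ["ssd_only", "battery_8h_plus"])])
```

The first call returns Model A and Model C; adding the battery filter narrows it to Model A. The hard part in practice is not the code but the taxonomy work of deciding which filters and thresholds customers actually need.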

The above are just some examples of how product changes can leave an existing filter list short of choices. BestBuy.com has done a good job of keeping up with trends, adding filters such as “Touch Screen” and “Blu-ray player”, to name just a few, as products change.

A company selling changing technology has to continually review its products to see whether new filters need to be created and old filters removed to maintain a strong e-commerce shopping experience.

 

5 Common Types of Business Analyst

November 11, 2013
Filed under: Business Analyst Skills 

A lot of people these days call themselves Business Analysts (BAs), but put them in a room to talk and you will soon see they are quite different.

This difference also creates problems for employers, as they need to find the right BA for their role.

For this post I have broken the types of BA into 5 distinct groups, each requiring unique talents to be successful. I am sure there are more, so feel free to comment with your suggestions. It is quite possible for a BA to have been in all 5 roles, but the longer they are in one role, the weaker their skills become in the others. In later posts I will go into depth on each role so you can gain an understanding of the pros and cons.

1 – Business Process Analyst

The focus of the Business Process Analyst is to look at business processes and see opportunities for improvement beyond the limitations of IT. They may suggest hiring additional people in a role, changing the timing of process events or even changing communication methods. They may also suggest IT solutions; note, however, that they are not limited to Information Technology. Generally a Business Process Analyst is knowledgeable enough about the business that they could perform tasks within it. In fact, a lot of Business Process Analysts once worked on the business side, some even gaining licenses in their chosen field. Think of nurses or doctors advising on business process because they know the role very well. It is unusual for an IT Business Analyst to end up in this role, because they do not know the business, not having worked in it. Business Process Analysts are generally the highest paid of the Business Analyst roles.

2 – Business Data Warehousing Analyst

Data Warehousing is all about data, and to be in this role you need to be comfortable spending your days looking at data elements, tables and database structures. This is a great role if you don’t like interviewing people about their jobs to capture requirements. Most of the focus of this job is gathering data to store in the Data Warehouse and responding to requests for data from the Data Warehouse. My friends in this role like it because it does not change much and they don’t have to deal with business users as much.

3 – Business Reporting Analyst

Sometimes this role is included with the Data Warehousing role, but other times it is not. These BAs spend their time pulling data and formatting it for report generation. Knowledge of databases, reporting tools and ways to slice and dice data is usually a requirement for this job, as is a keen understanding of the requirements that go along with reports – summary fields, paging, sort order, etc.
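For readers newer to reporting, here is a minimal Python sketch of how the three report requirements mentioned above (sort order, paging and a summary field) translate into logic. The sales rows and the `build_report` function are hypothetical; a real reporting tool would handle these through its own settings.

```python
# Hypothetical sales rows pulled from a database.
ROWS = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 340},
    {"region": "North", "sales": 90},
    {"region": "South", "sales": 210},
    {"region": "Central", "sales": 150},
]


def build_report(rows, sort_key, descending=False, page_size=2):
    """Apply sort order, split into pages and compute a grand-total summary field."""
    ordered = sorted(rows, key=lambda r: r[sort_key], reverse=descending)
    pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
    grand_total = sum(r["sales"] for r in rows)  # summary field
    return pages, grand_total


pages, total = build_report(ROWS, sort_key="sales", descending=True, page_size=2)
for number, page in enumerate(pages, start=1):
    print(f"Page {number}:", ", ".join(f"{r['region']} {r['sales']}" for r in page))
print("Total sales:", total)
```

With highest sales first and two rows per page, this prints three pages (West and South, then Central and East, then North) and a total of 910. Each of those choices is a requirement the Reporting Analyst has to pin down with the business.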

4 – Business Infrastructure Analyst

They gather the requirements around IT infrastructure upgrades. It is a technical role (at the very least you have to understand technical jargon relating to networks, servers and software upgrades) and rarely involves working directly with the business. The focus is on making sure IT projects have their infrastructure requirements documented and met. Think of a Windows upgrade project and you will get the idea.

5 – Business Application Analyst

Consider this the more traditional IT BA role; however, it is going through some changes, which I will discuss in a later post. These BAs work closely with the business to provide IT solutions that tie in with its business processes. Generally they are approached by the business to provide an IT solution or an enhancement to an existing business process.

So there you have it: 5 possible BA roles that you may end up in. Each role requires different skills to be successful, and I look forward to discussing them with you at a later date.