Written by Luc Machiels
You have followed all the steps to create a solid data foundation for your company: you have implemented a modern data infrastructure, such as a distributed data warehouse or a data lake, that can hold large volumes of data. You have hired data engineers to build robust data pipelines and integrate your data sources into your data lake in a structured manner. Now you want to take your data journey to the next level. The question that remains is: how can you use these assets to their full potential?
Data science work is likely to be distributed throughout the organization rather than centralized in the IT department; for example, both marketing and operations will need data scientists to perform advanced analysis. To unlock the potential of your data infrastructure and enable business users to embark on their data science journey, you need to give them access to three things: 1) data, 2) computing resources, and 3) advanced analytical tools. But how do you give a large group of users access to these three components while ensuring data protection and keeping enough uniformity to reap the benefits of synergies?
How do we set up a data analytics environment?
A data analytics environment builds on the data infrastructure you have set up in your organization, for example a data lake on-premises or in the cloud. It gives users a controlled and uniform environment where they can do exploratory analysis on large data sets without being limited by their laptop’s computing resources.
When expanding data science work to business users, it is key to put the user experience at the heart of the design. Users should get an interface that is easy to navigate, easy to connect to, and built from tools with which they are familiar.
Understanding the needs of the users is crucial. To set up a data analytics environment, you need to complete a thorough assessment of their use cases, their current data science maturity, and their growth strategy in data science. The assessment will give you a common denominator: the tools and packages that are used across departments and should be included in a uniform analytics environment.
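To make the "common denominator" idea concrete, here is a minimal sketch in Python. It assumes the assessment produced a list of tools per department. The function name `common_tools`, the `min_share` threshold, and the survey data are all hypothetical and only serve as illustration.

```python
from collections import Counter


def common_tools(tools_by_department, min_share=1.0):
    """Return the tools used by at least `min_share` of the departments.

    `tools_by_department` maps a department name to the tools it reported.
    With min_share=1.0 this is the strict common denominator (the intersection).
    """
    n_departments = len(tools_by_department)
    if n_departments == 0:
        return []
    # Count each tool once per department, even if it was listed twice.
    counts = Counter(
        tool
        for tools in tools_by_department.values()
        for tool in set(tools)
    )
    return sorted(
        tool for tool, count in counts.items()
        if count / n_departments >= min_share
    )


# Hypothetical assessment results, made up for this example.
survey = {
    "marketing": ["python", "pandas", "sql"],
    "operations": ["python", "sql", "spark"],
    "finance": ["sql", "excel", "python"],
}

print(common_tools(survey))
```

Lowering `min_share` widens the selection to tools that only some departments use, which is a judgment call the assessment itself has to inform.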
When designing the architecture for the environment, it is important to make the project a multidisciplinary one, involving IT architecture, security, infrastructure, data engineering, and the user teams. This ensures that users can access the environment securely, that data is transferred and made available according to internal policies and current access controls, and that the environment is safe to use and the data is protected. Key components that all companies must consider are access management, vulnerability scanning and patching, data leakage prevention, and security information and event management.
When you have built and implemented the data analytics environment with the tools that the users need, the last step is to create a data community. The community can help you realize the benefits of the environment: it can drive the internal data discussions, provide support, and make your organization more data-driven by ensuring that the data strategy lives among the users.
Our experience
At BrightWolves, when building data analytics environments, we work by:
Assessing the maturity and needs of the data science teams, and taking stock of what exists in the market (e.g., open source technologies)
Creating an infrastructure and security design that enables optimal work in your organization (e.g., allowing for open source technologies, proper vulnerability scanning, access management, etc.)
Designing a data architecture that provides governed data access while allowing users to recognize their data in a simple and efficient manner
Interacting with users to ensure that the new data analytics environment is adopted, through training in distributed technologies (e.g., Spark or Flink), training material and playbooks, help with translating their code for efficient parallel computing, and much more
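As an illustration of what translating code for parallel computing can involve, the sketch below rewrites a simple average from a single loop into a partition, summarize, and combine shape, which is the general pattern distributed engines rely on. It is a simplified Python example under stated assumptions, not Spark or Flink code: the function names are hypothetical, and a local thread pool stands in for a cluster only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_mean(values):
    """Original style: one loop over all the data on one machine."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count


def split(values, n_partitions):
    """Cut the data into roughly equal contiguous partitions."""
    size = -(-len(values) // n_partitions)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_count(partition):
    """Per-partition step: return a small summary that can be combined."""
    return sum(partition), len(partition)


def partitioned_mean(values, n_partitions=4):
    """Same result as serial_mean, computed partition by partition."""
    if not values:
        raise ValueError("values must not be empty")
    partitions = split(values, n_partitions)
    # Stand-in for a cluster: each partition is summarized independently.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    # Combine step: merge the small summaries, never the raw data.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))
print(serial_mean(values), partitioned_mean(values))
```

The key change is that each partition returns a sum and a count instead of a finished average, because averages of partitions cannot be combined correctly when the partitions differ in size.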
These topics are cornerstones in our approach to building advanced data analytics environments. We can build robust environments within your organization that can take your data journey to the next level by giving more users more controlled access to data in a uniform environment. Are you facing issues in setting up data analytics environments or do you want to hear more about how BrightWolves can assist you in this journey? Do not hesitate to contact Luc (luc.machiels@brightwolves.eu).