GDPR: The drought for your data lake



Data lake – we’ve seen it mentioned in the IT news headlines

It is the new hope of IT organisations: a way to provide their business units with actual content and valuable insights, rather than just offering servers and empty storage. Almost all companies of size and renown have embarked on this new journey and are building data lakes to sail upon them. Or maybe not?

Concerns raised around the data lake use case, especially since the dawn of GDPR, have made people rethink the share-everything-with-everyone mindset behind these lakes. They have also raised the matter of data ownership, retention, deletion and correction. Most data lake scenarios are viewed from a primarily technical perspective because that is where the idea comes from. Inevitably, and luckily before many of these lakes were released into production, the legal and compliance departments have woken up.

As we are involved in quite a number of these projects, we want to share the main aspects to consider when building your data lake with GDPR in mind. So here we go:

Employee Data

You can argue, of course, that once you work for a company, your data belongs to them. But it’s not that easy. First of all, whether this concept applies depends on the country. Secondly, storing employee data is one thing; using it for analytical purposes may require the employee’s consent. And that is where you run into challenges. In Germany, for example, companies all have one thing in common: they hold extensive employee data and are rarely allowed to use it to their advantage because of current legislation. With the introduction of GDPR, this type of scrutiny will be imposed on all EU countries and hence become a challenge for many more businesses.

Customer Data

This is probably the most traditional use case in data privacy and protection, and it is one of the key reasons why the GDPR debate is so vibrant these days: it concerns almost all companies. There’s a lot to discuss around this particular point, but one aspect of note is the “Right to Explanation”. If you use machine learning on user data to make automated decisions, GDPR requires that “meaningful information about the logic” behind those decisions be made available to users.

Many machine learning models are black boxes, but the type of data used to train them should be made clear to users so that they can make an informed decision to opt out. Users should at all times be offered the option not to have their data used as part of machine learning and artificial intelligence applications.
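
To make this a bit more tangible, here is a minimal Python sketch of how an opt-out could be honoured before any model training happens. The `ml_opt_out` flag, the record layout and the feature names are our own assumptions for illustration, not part of any standard or product.

```python
# Hypothetical user records; "ml_opt_out" is an assumed flag, not a standard field.
users = [
    {"user_id": 1, "ml_opt_out": False, "age_band": "30-39", "basket_value": 42.0},
    {"user_id": 2, "ml_opt_out": True, "age_band": "20-29", "basket_value": 17.5},
    {"user_id": 3, "ml_opt_out": False, "age_band": "40-49", "basket_value": 63.0},
]

# The only data types the (hypothetical) model is allowed to see.
FEATURES = ("age_band", "basket_value")


def training_rows(records, features=FEATURES):
    """Keep only users who have not opted out, and only the disclosed features."""
    return [
        {name: record[name] for name in features}
        for record in records
        if not record["ml_opt_out"]
    ]


def disclosure(features=FEATURES):
    """Plain-language summary of the data types a model is trained on."""
    return "This model is trained on: " + ", ".join(features)


rows = training_rows(users)
print(len(rows), "rows available for training")
print(disclosure())
```

Driving both the training set and the user-facing disclosure from the same feature list is one way to keep what you tell users in line with what the model actually consumes.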

Device Data

With IoT and Connected-X, we all feel like we can’t really participate in modern society without sacrificing some of our privacy to devices and gadgets. From a legal perspective, service providers ask for your consent when you install a mobile app or sign up for a SaaS-type service. This is the easy part. Now imagine you are a car manufacturer that could gain plenty of insights and competitive advantage by collecting device and car data in the field, and that has all the technology to make that happen, but is not allowed to do it.

This is a real issue. People used to buy cars without signing a data privacy agreement. Recently, privacy agreements have become a necessity just to operate connected car services. As a business, you always have to keep in mind that just because it is device data does not mean you can harvest and use it to your advantage. There is a human being or an organisation behind that device. You need their consent; otherwise, no data can be legally processed.
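
As a rough illustration, a consent check can sit right at the point where device data enters your platform. The sketch below is Python with an in-memory registry; the device IDs, purpose names and the `CONSENTS` structure are hypothetical and only stand in for whatever consent system you actually run.

```python
# Hypothetical consent registry: device_id -> purposes the owner has agreed to.
CONSENTS = {
    "car-001": {"connected_services", "analytics"},
    "car-002": {"connected_services"},
}


def may_process(device_id, purpose, consents=CONSENTS):
    """True only if the owner of this device consented to this purpose."""
    return purpose in consents.get(device_id, set())


def ingest(events, purpose, consents=CONSENTS):
    """Split incoming events into those we may process and those we must drop."""
    accepted, rejected = [], []
    for event in events:
        if may_process(event["device_id"], purpose, consents):
            accepted.append(event)
        else:
            rejected.append(event)
    return accepted, rejected


telemetry = [
    {"device_id": "car-001", "speed_kmh": 87},
    {"device_id": "car-002", "speed_kmh": 52},
    {"device_id": "car-003", "speed_kmh": 110},
]

accepted, rejected = ingest(telemetry, purpose="analytics")
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Note that consent is checked per purpose: a device whose owner agreed only to connected services is still rejected for analytics, and an unknown device is rejected by default.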

Prevent the Drought

So does that mean there is a chance your data lake could dry out very soon? Don’t worry, here are some relatively easy ways to address this challenge:

Anonymisation of data is one way to solve this. This means that the data is stripped of all potential identifiers of human beings and of end-user-facing devices, so that only statistical data for very specific use cases is collected. If that isn’t possible in your use case, it’s a different story. Where it is possible, anonymisation must become an inherent part of all data processing in the solution you design; it isn’t bound to the data lake at all but sits within your application.
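
The following Python sketch shows the general idea of stripping identifiers and keeping only aggregated statistics inside the application, before anything reaches the lake. The field names and the minimum group size are assumptions we made up for the example, and removing direct identifiers alone may not be enough to count as anonymisation in the legal sense, so treat this as a starting point only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical field names; adjust to your own schema.
IDENTIFYING_FIELDS = {"customer_id", "name", "email", "device_id", "ip_address"}


def strip_identifiers(record):
    """Return a copy of the record without directly identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def aggregate_by(records, group_key, value_key, min_group_size=2):
    """Average value_key per group, dropping groups too small to hide an individual."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: mean(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


raw_events = [
    {"customer_id": "c1", "email": "a@example.com", "region": "north", "kwh": 10.0},
    {"customer_id": "c2", "email": "b@example.com", "region": "north", "kwh": 14.0},
    {"customer_id": "c3", "email": "c@example.com", "region": "south", "kwh": 9.0},
]

anonymised = [strip_identifiers(event) for event in raw_events]
stats = aggregate_by(anonymised, group_key="region", value_key="kwh")
print(stats)
```

The "south" group is dropped here because a statistic built from a single record would still describe one individual.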

Encryption of data can be a very easy and elegant way to address the challenge without building much of a solution into your cloud platforms. Most public cloud platforms provide several mechanisms that allow encryption on various layers of the platform at no additional cost. The great thing is that you can automate remediation actions based on alerts whenever data is stored unencrypted in the cloud. With that in place, non-compliance with this standard becomes practically impossible.
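
The alert-and-remediate loop can be sketched in a few lines of Python. Everything here is a simplified stand-in: `StorageResource` and `remediate` are hypothetical and do not call any real cloud API, where in practice you would plug in your provider's own configuration and alerting services.

```python
from dataclasses import dataclass


@dataclass
class StorageResource:
    """Hypothetical, simplified view of a bucket or volume in a cloud account."""
    name: str
    encrypted: bool


def find_unencrypted(resources):
    """The 'alert' step: list every resource that stores data unencrypted."""
    return [r for r in resources if not r.encrypted]


def remediate(resource):
    """Stand-in for a real remediation call to the provider API."""
    resource.encrypted = True
    return f"enabled default encryption on {resource.name}"


def enforce_encryption(resources):
    """Remediate every finding and return a log of the actions taken."""
    return [remediate(r) for r in find_unencrypted(resources)]


inventory = [
    StorageResource("raw-zone", encrypted=True),
    StorageResource("landing-zone", encrypted=False),
]

actions = enforce_encryption(inventory)
print(actions)
```

Running the check on a schedule, or on every configuration change, is what turns encryption from a guideline into a standard that is hard to violate.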

Setting up a Data Management Practice is a general requirement for making sure you have full visibility of, and (access) control over, all the data your company holds, manages or has access to. It is also important to apply a proper metadata scheme, as complete as possible, across all data types so that the data is searchable and can be clustered.
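
A metadata scheme does not have to be complicated to be useful. The Python sketch below uses a made-up `DatasetMetadata` record with a handful of assumed attributes (owner, category, personal data flag, retention) to show how a catalogue becomes searchable and can be clustered; a real scheme would carry more fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset in the lake."""
    name: str
    owner: str
    category: str  # e.g. "employee", "customer", "device"
    contains_personal_data: bool
    retention_days: int


def search(catalogue, **criteria):
    """Return every dataset whose attributes match all given criteria."""
    return [
        d for d in catalogue
        if all(getattr(d, key) == value for key, value in criteria.items())
    ]


def cluster_by_category(catalogue):
    """Group dataset names by category."""
    clusters = defaultdict(list)
    for d in catalogue:
        clusters[d.category].append(d.name)
    return dict(clusters)


catalogue = [
    DatasetMetadata("hr_master", "HR", "employee", True, 3650),
    DatasetMetadata("web_orders", "Sales", "customer", True, 730),
    DatasetMetadata("fleet_telemetry", "R&D", "device", True, 365),
    DatasetMetadata("product_catalogue", "Sales", "customer", False, 1825),
]

personal = search(catalogue, contains_personal_data=True)
print([d.name for d in personal])
print(cluster_by_category(catalogue))
```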

There are many more use cases in the Big Data field that require your attention, but we hope we’ve made our point. Just because you have data (in your lake) does not necessarily mean you can actually use it. GDPR demands that you have customer and employee consent before using any form of data collected. At Nordcloud, we combine strong expertise in the Big Data, Machine Learning and IoT fields with years of AWS and Azure project delivery, all wrapped up in a deep awareness of data protection and security.

Please feel free to reach out to us if you think the above sounds familiar but perhaps too complex to tackle on your own. We’re here to help.

Get in Touch.

Let’s discuss how we can help with your cloud journey. Our experts are standing by to talk about your migration, modernisation, development and skills challenges.

    GDPR and the public cloud industry



    Given the large-scale impact of this new regulation, the EU administration has already released the requirements, to give businesses, public agencies and citizens time to adapt before it becomes legally binding.

    GDPR, a Key Topic for 2017 and 2018

    The announcement of this new regulation caused a lot of noise in the market and the relevant IT media. The main challenge, of course, is that not every business in Europe is compliant even with the regulations we have today. With the General Data Protection Regulation (GDPR for short) raising that bar quite a bit, more work is coming up for CIOs and CISOs, and even a CEO may well have to get involved. The topic has a significant business impact. No Europe-based company can afford to ignore it, and neither can any company servicing European customers from the USA or Asia.

    Given the importance of this matter to our customers, our partners and ourselves, we decided to dedicate a small series of blog posts to it. We want to reflect on the core requirements expressed in the GDPR that are relevant in the public cloud context. We will also talk about how we, as a provider of managed cloud services and consultancy to a multitude of businesses across Europe, are affected by it and what we do to remain compliant. Most importantly though, we will talk about the impact on our customers and how they can make sure they are compliant at all times. As always, we’re here to help and guide you towards a secure future in using cloud services.

    The Public Cloud and GDPR

    When new security and data protection standards are released anywhere in the world, they have a lot of impact, especially on the IT parts of a business. Hence, in the context of public cloud services, we see a great deal of attention from customers and cloud providers alike. Although a dedicated blog post will discuss it in more detail, we want to give you a quick overview of the state of affairs in the public cloud market as of today:

    The large players like Amazon Web Services or Microsoft Azure have already implemented a strong set of measures to comply with the GDPR today, and both have published statements on their approach.

    There are non-binding GDPR Codes of Conduct that cloud businesses can follow to show that they are already adhering to the regulation today. The most relevant one is CISPE.

    GDPR is not (just) about Technology

    No matter where your cloud or hosting provider stands today, you should look at their current degree of compliance with the GDPR. Most importantly, you have to make sure they allow your business to remain compliant and to meet your customers’ expectations after May 2018. The compliance of your business with regulations is your responsibility, not that of your IT providers.

    Our goal is to increase awareness of GDPR holistically and of how it applies to our readers, irrespective of who they are. That means we don’t focus purely on the technical side, but also on the process and organisation side of the challenge at hand. Data protection is certainly a technical topic when it comes to implementing defence mechanisms. But without understanding the legal and regulatory background, you will just be buying tools. We look at things end to end and will guide you towards the right setup for your business and customers.

    At Nordcloud, we want you to get the most out of the public cloud. We will help you and your compliance teams understand the requirements of the GDPR and guide you towards a compliant future in the cloud. Look out for our follow-up blog posts, which will be released weekly during the summer.
