    Processing personal data is an integral part of today’s digital reality, and companies face many challenges in handling it. Cybercrime poses a serious threat, society is becoming more aware of its privacy, and legal regulations such as GDPR set specific requirements that companies need to approach flexibly and responsibly. What is harder to determine is how these forces will affect the development of artificial intelligence (AI).

    Cybercrime

    Cybercrime is a significant challenge in today’s world, and the scale of the problem grows every year. Terms such as hacking attacks, phishing, and ransomware have become part of everyday business language. Companies processing personal data must invest in IT security systems and train their teams so that they can counter these threats effectively. Data security is a priority for every business.

    Social awareness of privacy

    The right to privacy is a basic human right, and in the era of digitization it becomes even more important. Customers are increasingly aware of their rights and expect companies to be transparent about how they process data. Companies that do not meet these expectations risk losing customers’ trust and, in turn, their loyalty.

    Privacy gives individuals control over information about themselves and protects that information against misuse. GDPR, as a European regulation, treats this right as central.

    GDPR obliges companies to protect the data they process, to inform individuals about the purposes for which their data is processed, and to give them access to that data.

    GDPR introduces the right to be forgotten, which means that an individual can request the deletion of their data in certain situations. Another element of GDPR is the right to data portability, which allows data to be transferred between service providers.

    In the context of GDPR, then, the right to privacy means control over personal data and protection against its improper processing.

    Increasing legal requirements

    GDPR is a key regulation for data processing and introduces many requirements that companies must meet. Violations of these regulations are subject to severe sanctions: fines can reach up to 20 million euros or 4% of the company’s global annual turnover, whichever is higher. This demonstrates how seriously the European Union treats the right to privacy. Companies therefore need to dedicate resources to GDPR compliance, as well as to other legal provisions.

    Artificial Intelligence vs. GDPR. Who will win the battle?
    Photo by Markus Winkler on Unsplash

    GDPR and artificial intelligence

    Artificial intelligence is an area of rapid progress and innovation, yet it must comply with GDPR. Companies are obligated to inform customers when their data is processed by algorithms and to provide a “right to explanation” for automated decisions. Such requirements can slow down the development of AI, but they can also encourage the creation of more transparent and ethical AI models.

    Solutions and the future

    The challenges in data processing are significant but surmountable. New technologies can increase privacy and data security, and companies investing in advanced AI must integrate GDPR principles at the design stage, an approach known as privacy by design. Training employees and building a culture of respect for privacy, at all levels of the organization, is equally important.

    Conclusion

    Data has become the new “gold” of our world, and the ability to manage it is a key to success. GDPR may pose a challenge, but it also offers an opportunity to build trust and increase customer loyalty. As technology, including artificial intelligence, develops dynamically, an approach to data that combines innovation and ethics will be the key to future success. What do you think: who will win the battle, GDPR or artificial intelligence?

    Data protection is becoming an absolute priority. Anonymization – a process that protects the privacy of individuals whose data we store – is becoming indispensable. In this context, we present Nocturno, a tool that enables efficient data anonymization.

    The Leading Tool for Data Anonymization

    Nocturno, an advanced anonymization tool, uses extensive dictionaries and generators while preserving the characteristics of the database. It anonymizes multiple systems at once, maintaining data consistency. It supports parallel processing and caching, and it anonymizes personal data as well as other sensitive data, such as financial records.

    Who is Nocturno For?

    Nocturno is perfect for teams planning to anonymize data. This includes teams creating, developing, and maintaining systems with sensitive data. It is also useful for the software testing process and for transmitting anonymized reports to other systems. Nocturno integrates with our other product – Detecto, which helps to easily find sensitive data.

    How Does Nocturno Work?

    Nocturno lets you declare anonymization rules and provides built-in generators for common data types. Given the complexity of modern IT systems, the tool is designed to adapt seamlessly to the existing database structure. Nocturno keeps anonymized data consistent across databases and supports multiple database types and data languages.

    Key Features of Nocturno

    Nocturno is a versatile tool in which you declare rules for the values to be anonymized, which significantly strengthens the data protection process. This advanced software can also generate the most common data types, demonstrating its adaptability and broad range of functionality. Nocturno adapts seamlessly to modern IT systems, ensuring robust data handling.

    Additionally, one of the vital features of Nocturno is ensuring the consistency of anonymized data across different databases, facilitating a reliable and uniform data protection strategy. As part of its comprehensive functionality, Nocturno supports various types of databases, including MySQL, DB2, SQL Server, Oracle, and PostgreSQL, offering wide-ranging compatibility for diverse IT environments.

    Furthermore, Nocturno showcases its capabilities by generating various types of data, such as first and last name, PESEL, NIP, REGON, and address, contributing to its extensive data anonymization solutions. And to cater to the global digital era, it even generates data in different languages, proving its readiness to handle multinational and multicultural data privacy requirements.
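    To make the idea of such generators concrete, here is a minimal Python sketch of a synthetic PESEL generator. It is only an illustration of the technique, not Nocturno’s implementation: the goal is to produce values with a valid structure (a birth-date prefix and a correct check digit) that do not belong to any real person.

        import random
        from datetime import date, timedelta

        # Weights used for the PESEL check digit (applied to the first 10 digits).
        PESEL_WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]

        def synthetic_pesel(rng: random.Random) -> str:
            """Generate a structurally valid, purely synthetic PESEL number."""
            # Random birth date between 1950 and 1999 (months are encoded as-is for the 1900s).
            birth = date(1950, 1, 1) + timedelta(days=rng.randrange(0, 365 * 50))
            digits = f"{birth:%y%m%d}" + f"{rng.randrange(0, 10000):04d}"
            checksum = sum(w * int(d) for w, d in zip(PESEL_WEIGHTS, digits))
            return digits + str((10 - checksum % 10) % 10)

        rng = random.Random(42)  # a fixed seed makes the generated test data repeatable
        print([synthetic_pesel(rng) for _ in range(3)])

    Generators for NIP, REGON, or addresses follow the same pattern: respect the format and checksums of the real data type while producing values that are entirely fictitious.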

    Data Anonymization Process

    The first step in preparing a company for the implementation of Nocturno is understanding what data is stored and where. An audit of all systems and databases should be carried out to identify which data is sensitive and requires anonymization.
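    As a rough illustration of what such an audit can automate (a generic sketch, not Detecto or Nocturno functionality), a script can scan sample column values against simple patterns and flag columns that look like personal identifiers:

        import re

        # Simple patterns for common personal identifiers (illustrative only).
        PATTERNS = {
            "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
            "pesel": re.compile(r"^\d{11}$"),
            "phone": re.compile(r"^\+?\d{9,11}$"),
        }

        def classify_column(values, min_ratio=0.8):
            """Guess whether a column holds sensitive data by checking what share
            of its non-empty values matches one of the known patterns."""
            values = [v for v in values if v]
            for label, pattern in PATTERNS.items():
                matches = sum(1 for v in values if pattern.match(str(v)))
                if values and matches / len(values) >= min_ratio:
                    return label
            return None

        print(classify_column(["jan@example.com", "anna@example.org"]))  # 'email'
        print(classify_column(["44051401359", "02070803628"]))           # 'pesel'

    In practice, pattern matching like this is only a starting point; free-text fields and file exports usually require a more thorough review.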

    Next, it is worth identifying all the teams that will use Nocturno. All these teams should be informed about the new tool and provided with appropriate training.

    It will also be helpful to prepare a test plan. Finally, the company should be ready to introduce changes in its systems.

    Benefits of Nocturno

    Quality: Importantly, anonymization with Nocturno guarantees data quality, essential for high-quality development and testing processes.

    Automation: Nocturno integrates with CI/CD tools like Jenkins for efficient anonymization automation.

    Efficiency: Nocturno’s high performance makes the anonymization process fast and unobtrusive.

    Testing Support: Moreover, data anonymization allows software testing without having to operate on actual personal data.

    Safety: Nocturno enables effective access restriction to enhance data security.

    Convenience: Automatic anonymization allows for effortless database restoration, enhancing everyday convenience.

    Photo by Michael Rosner-Hyman on Unsplash

    Integration with Detecto

    Nocturno integrates with Detecto, a tool that helps to easily search for sensitive data in various systems and databases.

    Data Anonymization with Nocturno

    Data anonymization is crucial in today’s digital world. Nocturno enables efficient and safe data anonymization, protecting user privacy and ensuring compliance with the law. Although it requires some effort and time, the benefits of data anonymization are significant. In a digital age where data privacy is key, Nocturno offers a reliable and efficient solution.

    It would be hard to miss all the articles on the topic of GDPR and all the various, terrifying sanctions that could be put upon an entity for non-compliance. Few, however, delve into important details, such as the significance of anonymization or data retention, which allow for avoiding all these sanctions and make the work of developers significantly easier. For this reason, we decided to explain in an accessible way what anonymization and retention of personal data are and show why their proper implementation is of such importance in the software development process. Today, let us tackle anonymization.

    What is anonymization?

    Anonymization is a process that permanently removes the link between personal data and the person to whom the data relates. Thanks to this, data that was previously personal ceases to be personal data.

    What does it look like in practice?

    The definition above becomes less complicated when presented with an example. Let us imagine, for example, Superman – a comic hero from Krypton who wants to hide his identity and blend in with the crowd.

    Name: Superman
    Occupation: Superhero
    Origin: Krypton

    During the anonymization process, Superman enters the telephone booth, puts on glasses and a tweed suit, and becomes Clark Kent, a reporter from Kansas.

    Name: Clark Kent
    Occupation: Reporter
    Origin: Kansas, USA

    Through the anonymization process, Superman’s data turned into Clark Kent’s, and there is no connection between these two people. This is fictitious data that can be safely used, e.g. in test environments.

    The example above illustrates the process of anonymization itself. Let us now consider why it is important that the anonymization is of good quality.

    Irreversibility

    The foundation of anonymization is its irreversibility. We should never be able to find out what the original data looked like, based on the anonymized data. Clark’s associates should not be able to discover his true identity.

    When we anonymize a data set, usually only a fragment of the data will change. However, we must ensure that the non-anonymized data does not allow the anonymization to be reversed for the entire set. In our example, we would not have to change Superman’s favorite color, but if we did not anonymize his origin, we would certainly cause a sensation.
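    This risk can be made concrete with a short Python sketch (an illustration only, not part of Nocturno): it flags values in columns left untouched by anonymization that are rare enough to single out an individual.

        from collections import Counter

        # Toy data set: "name" has been anonymized, the remaining columns have not.
        records = [
            {"name": "Clark Kent",  "favorite_color": "blue", "origin": "Krypton"},
            {"name": "Lois Lane",   "favorite_color": "blue", "origin": "Kansas, USA"},
            {"name": "Jimmy Olsen", "favorite_color": "blue", "origin": "Kansas, USA"},
        ]

        def rare_values(records, column, threshold=2):
            """Return values of a non-anonymized column that occur fewer than
            `threshold` times and could therefore re-identify a person."""
            counts = Counter(r[column] for r in records)
            return [value for value, n in counts.items() if n < threshold]

        print(rare_values(records, "origin"))          # ['Krypton'] -> needs anonymizing
        print(rare_values(records, "favorite_color"))  # [] -> safe to leave as-is

    The same reasoning extends to combinations of columns: several common values taken together can be just as identifying as one rare value.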

    True to reality

    An important qualitative measure of anonymization is also how well it imitates reality. If Superman and all other people in the data set are anonymized as follows:

    Name: X
    Occupation: Y
    Origin: Z

    we have no doubt that the process is irreversible, but its usefulness is questionable. Person X does not look like someone who exists in reality, and the nature of the original data has not been preserved: the lengths of the names were not kept, and the data itself looks implausible, with all of the people having the same name. In an IT system, a tester using such data would run into a lot of issues and would not even be able to distinguish between people.

    Repeatability

    Another feature of good anonymization is its repeatability. When anonymizing a data set, we want to be sure that it will be anonymized in the same way every time. We want Superman to always become Clark Kent, no matter whether it’s the first or the tenth anonymization run. This is especially important from the point of view of Quality Assurance. Testers often create test cases based on specific data, and if that data changed every time, their work would certainly become more difficult!
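    A minimal Python sketch of this idea (an illustration, not Nocturno’s implementation): the replacement is derived deterministically from the original value with a keyed hash, so Superman maps to the same fictitious person on every run, while nobody without the key can work their way back from the replacement to the original.

        import hashlib
        import hmac

        # Tiny dictionaries of plausible replacement values.
        FIRST_NAMES = ["Clark", "Lois", "Jimmy", "Perry"]
        LAST_NAMES = ["Kent", "Lane", "Olsen", "White"]

        # The key must be kept outside the anonymized environment.
        SECRET_KEY = b"store-this-outside-the-test-environment"

        def _index(original: str, field: str, size: int) -> int:
            """Deterministically map an original value to a dictionary index."""
            digest = hmac.new(SECRET_KEY, f"{field}:{original}".encode(), hashlib.sha256).digest()
            return int.from_bytes(digest[:4], "big") % size

        def anonymize_name(original: str) -> str:
            first = FIRST_NAMES[_index(original, "first", len(FIRST_NAMES))]
            last = LAST_NAMES[_index(original, "last", len(LAST_NAMES))]
            return f"{first} {last}"

        # The same input always yields the same fictitious person.
        assert anonymize_name("Superman") == anonymize_name("Superman")
        print(anonymize_name("Superman"))

    Deriving the replacement from the value itself, rather than storing a mapping table, also means there is nothing sensitive to leak alongside the anonymized data.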

    Integrated systems

    Today’s IT world consists of countless interconnected systems. Hardly any application functions as a single organism: systems connect with each other, exchanging data and using each other’s services. Therefore, when approaching anonymization, we must consider the process not just for one system, but for many systems at once. The challenge is for the anonymized data to be consistent throughout the entire ecosystem. This means that if the Daily Planet (Clark’s workplace) has a human resources system and a blog, then Superman becomes Clark Kent in both applications.
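    Continuing the sketch above (again purely illustrative), cross-system consistency falls out naturally if every system routes the original value through the same deterministic mapping:

        import hashlib

        REPORTERS = ["Clark Kent", "Lois Lane", "Jimmy Olsen", "Perry White"]

        def replacement_name(original: str) -> str:
            """Shared deterministic mapping used by every system in the ecosystem."""
            digest = hashlib.sha256(original.encode()).digest()
            return REPORTERS[int.from_bytes(digest[:4], "big") % len(REPORTERS)]

        # Two independent systems holding the same person.
        hr_system = [{"employee": "Superman", "department": "News"}]
        blog_posts = [{"author": "Superman", "posts": 42}]

        for row in hr_system:
            row["employee"] = replacement_name(row["employee"])
        for row in blog_posts:
            row["author"] = replacement_name(row["author"])

        # The same original value gets the same replacement in both systems.
        assert hr_system[0]["employee"] == blog_posts[0]["author"]

    In a real ecosystem the shared piece would be a keyed function or service rather than a plain hash, so the mapping stays consistent without becoming reversible.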

    Efficiency

    The last key parameter affecting the quality of anonymization, from my point of view, is performance. IT systems process huge data sets measured in gigabytes or even terabytes, and anonymizing such databases can be time-consuming; therefore, we must ensure not only security but also good speed of the anonymization process. One of the things Superman learned after arriving on Earth is that time is money. This saying rings even more true in the case of modern IT.
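    On the implementation side, much of that speed comes from unspectacular choices such as batching writes instead of updating row by row. A rough sketch of the idea (using SQLite and a placeholder generator, purely for illustration):

        import sqlite3

        def anonymize_names_in_batches(conn: sqlite3.Connection, batch_size: int = 10_000) -> None:
            """Rewrite the `name` column in fixed-size batches instead of row by row."""
            ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
            for start in range(0, len(ids), batch_size):
                chunk = ids[start:start + batch_size]
                # Placeholder generator; a real tool would plug in its dictionaries here.
                updates = [(f"User {user_id}", user_id) for user_id in chunk]
                conn.executemany("UPDATE users SET name = ? WHERE id = ?", updates)
                conn.commit()  # one commit per batch keeps transactions a manageable size

        # Tiny demo database.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO users (name) VALUES (?)", [("Superman",), ("Lois Lane",)])
        anonymize_names_in_batches(conn, batch_size=1)
        print(conn.execute("SELECT id, name FROM users").fetchall())

    Parallel processing and caching, which Nocturno supports according to the description above, build on the same principle of minimizing per-row overhead.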

    I invite everyone interested in the topic of data retention to read my next article, which I plan on publishing shortly.

    Artur Żórawski, Founder & CTO

    Good quality tests require good data – data that is the most accurate representation of reality. A copy of production data is very often used for this purpose, and such a dedicated test environment is often used to reproduce tickets, debug issues with data, and perform stress tests. Quite apart from the fact that this practice is usually incompatible with the GDPR, there is a stark contrast: the production environment is monitored and audited like a fortress, with only a few people having access to it, while non-production environments are treated much less restrictively. The number of people with access to them (not counting the users) is also much larger. Many serious leaks of personal data were caused not by hacking into the “fortress”, but by abuse of these “unprotected settlements”.

    When it comes to test data, there are usually two extremes: either testers and developers process personal data in copies of production databases, or test environments are refreshed only every half a year with artificial, usually poorly prepared, data. The solution to this problem could be the implementation of anonymization, but as it turns out, this is not an easy task.

    Challenges associated with designing the anonymization process

    Simple data masking can work in simple cases, but you quickly see that it is not enough for the applications we work with every day. On the other hand, when reviewing existing solutions, we noticed that they did not meet our needs – most of them had no mechanisms for maintaining data consistency between different databases, it was difficult to find one that supported automating the anonymization process, and the most popular tools did not allow defining your own generators, neither for a single record nor ones that take the distribution of the data into account. Anyone implementing a solution that meets these requirements on their own will quickly run into obstacles.
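    To show what “taking the distribution into account” means in practice, here is a small Python sketch of a distribution-preserving generator (an illustration of the concept, not Nocturno’s code): each distinct original value is mapped to one replacement, so the frequency profile of the column survives anonymization.

        import random
        from collections import Counter

        def distribution_preserving_generator(original_values, replacement_pool, seed=0):
            """Map each distinct original value to one replacement so the
            frequency distribution of the column is preserved."""
            rng = random.Random(seed)
            distinct = list(Counter(original_values))
            shuffled = rng.sample(distinct, len(distinct))
            mapping = {value: replacement_pool[i % len(replacement_pool)]
                       for i, value in enumerate(shuffled)}
            return [mapping[value] for value in original_values]

        cities = ["Warsaw"] * 7 + ["Krakow"] * 2 + ["Gdansk"]
        fake_cities = ["Metropolis", "Smallville", "Gotham"]
        anonymized = distribution_preserving_generator(cities, fake_cities, seed=42)

        print(Counter(cities).most_common())      # [('Warsaw', 7), ('Krakow', 2), ('Gdansk', 1)]
        print(Counter(anonymized).most_common())  # same 7-2-1 shape under fictitious labels

    Preserving distributions matters for performance testing and for any logic that branches on data characteristics, which is exactly where flat, uniform fake data falls short.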

    Happy medium

    However, there is a happy medium – providing free access to high-quality data that reflects the characteristics of production data, while maintaining the security of the solution and compliance with legal regulations. This happy medium is Nocturno – a data anonymization tool that we designed together as a team. While working on this solution, we decided to take care of:

    – Maintaining full data consistency – not only within a schema or database, but across all data sources within the organization (databases from various vendors, LDAP, file sources, etc.)

    What do we gain by implementing good-quality anonymization?

    By implementing anonymization, we are able to reduce the number of people who have access to personal data to the absolute minimum. Due to the good quality of the anonymized data, its use for software development purposes is transparent and compliant with the GDPR. The process based on Nocturno is easily configurable and maintainable by developers – it can be simultaneously developed in the same codebase as the application.

    Nocturno supports two main implementation scenarios.

    Diagram: Nocturno’s role in the automatic process of providing anonymized copies of databases.

    More information about Nocturno can be found here: https://wizards.io/en/nocturno-en/. If you have questions about the anonymization process, please feel free to reach out.

    Marcin Gorgoń, Senior Software Engineer

    Soon it will be twenty years since I joined the world of IT. During this time, I have watched the environment change, development processes mature, and new tools come into use. Over time, many processes, including repetitive tasks, were automated, and companies implemented Continuous Integration and Continuous Delivery. All of this change has been motivated by a single thought: let software developers focus on system and business development.

    Enter GDPR

    GDPR’s entry into force shook the IT world and changed the rules of the game. The development process became more complicated, and operating on personal data became a big risk that had to be addressed. Working in a software house, we saw these issues clearly because they occurred in each of our projects. In theory, we were prepared for GDPR: we had completed the appropriate courses, and the company was armed with documents and records. In practice, it turned out that the legal restrictions and the uncertainty surrounding the regulation’s entry into force affected our everyday work. Gone was my dream of unhindered development, where we could focus solely on producing quality software.

    Shortly after the appearance of GDPR regulations, we started looking for available solutions. The tools that we were able to find did not meet our project needs because every day we developed entire integrated ecosystems created in various technologies that exchanged personal data. I felt as if I had travelled two decades backwards in time.

    Change of status quo

    Ultimately, a group of people in the company emerged who set themselves the goal of changing the status quo. We knew what was required and how our plan could be implemented, although we had never faced such a challenge before. Together, however, we managed to create a set of tools that ended up being a godsend for us.

    Anonymization of data

    We started by anonymizing data in test environments. We created a tool that was able to handle many applications at once, taking into account the specificity of Polish law, and do its work efficiently.

    The solution we created was meant to support all of our projects, so high configurability and the ability to adapt to various requirements were the priority. We included anonymization in our Continuous Integration processes and quickly rolled it out in our projects. It turned out that the most painful aspects of GDPR were now handled automatically and no longer caused sleepless nights for the development team.

    Retention of personal data

    The next step was the retention of personal data, which is necessary in almost every system. Taking care of this aspect in a single application is easy. Performing data retention in ten integrated systems is much more difficult, and in a hundred – virtually impossible. It was clear to us that we did not want to repeat the same functionality in all systems that we produce. This is how another tool was born, relieving us of this burden.

    Everything was back on track, just as I had dreamed. Fortunately, GDPR turned out to be only a bump on the road in our projects.

    Wizards

    With all of this in mind, we founded a startup. We came to the conclusion that the problems we had been dealing with were shared by many development teams, and we now had a ready solution.

    That is why we decided to create Nocturno and Oblivio, about which you will be able to read more soon on our company profile.

    Artur Żórawski, Founder & CTO of Wizards