Always on Data Masking using DB2 RCAC
Dean Compher
30 March 2020
One way of getting the most representative data for testing is to copy all or a portion of your production Db2 data to a test database. Unfortunately, this is not always feasible because of sensitive data in some columns. Using Db2 Row and Column Access control, you can make sure that data in those columns are masked for certain users no matter how they query the production data. So they can run any query to pull data for testing and that data will be automatically obfuscated without them doing anything. In this article I’ll describe how this can be done using examples.
The examples and methods I use in this document are only intended to show a way to do use RCAC to mask data for testing. The simple masking examples I share are probably not very secure so you should make sure to use stronger masking algorithms for RCAC to use. You should also thoroughly test this before actually using it if the data really needs to be safe.
For more detailed information on RCAC, please see my earlier Db2 Row and Column Access article. RCAC allows you to create an object called a mask in Db2 that ensures that when certain users query a table, that they will get a different result than what is really in the column for which the mask is built. A separate mask is built for each column that you want to obfuscate. No matter how that user queries the table whether directly, through a view or other means, they will always get something different than real value of the column.
I’ll show how to do this using examples. To start with, I have three interesting users defined to my database server:
db2inst1 |
Instance owner that creates the mask objects |
bob |
Super user who is allowed to see the columns as they are |
testdata |
User that should only see obfuscated data. |
To let bob and testdata get access to the table, I started by creating the following roles and granting access to employee and cust_crdt_card tables:
create role EXTRACT_FOR_TESTDB;
create role SUPER_USER;
grant role EXTRACT_FOR_TESTDB to user testdata;
grant role SUPER_USER to user bob;
grant select on table gosaleshr.employee to role EXTRACT_FOR_TESTDB;
grant select on table GOSALESCT.CUST_CRDT_CARD to role EXTRACT_FOR_TESTDB;
grant select, update, delete on table gosaleshr.employee to role SUPER_USER;
grant select, update, delete on table GOSALESCT.CUST_CRDT_CARD to role SUPER_USER;
The tables I use are the ones from the GOSALES sample table and data that are publicly available. You can get it from the GSDB page add the tables to your database and try these examples yourself.
Next I’ll show three examples using different approaches to masking columns. I start by masking the last name by substituting another value from a table of names. Next, I change birth_date by using date arithmetic. Both of these columns are in the employee table. Finally, I’ll create a user defined function and call it to mask the credit card number in the cust_crdt_card table.
A common way of obfuscating a name is to substitute names from a list of names. This typically makes testing better than just substituting random strings of characters. If you don’t have a list of names, address, etc. you can download a list from the web for a small fee. I was able to get a list of 42 names from another sample database, and created a table called sub_name_list with an integer column called seq that starts with 1 in the first row and increments by one in each row up to 42. Here is a description of the table and a few rows of output:
COLNAME |
TYPESCHEMA |
TYPENAME |
LENGTH |
SCALE |
NULLABLE |
SEQ |
SYSIBM |
BIGINT |
8 |
0 |
N |
FIRSTNAME |
SYSIBM |
VARCHAR |
12 |
0 |
N |
LASTNAME |
SYSIBM |
VARCHAR |
15 |
0 |
N |
SEQ |
FIRSTNAME |
LASTNAME |
1 |
CHRISTINE |
HAAS |
2 |
MICHAEL |
THOMPSON |
3 |
SALLY |
KWAN |
4 |
JOHN |
GEYER |
5 |
IRVING |
STERN |
To create the mask for last name, I use a hash function that converts the input string (last name from the table being masked) into a number between 1 and 42. I then use that number to select one of the names in the substitution table (sub_name_list). One desirable characteristic of using this method of substitution is that with the same input string you always get the same output value as long as you don’t change the substitution table. In this example, I’m substituting the LAST_NAME_MB column.
CREATE MASK LAST_NAME ON GOSALESHR.EMPLOYEE FOR
COLUMN LAST_NAME_MB RETURN
CASE WHEN
VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1
THEN (select lastname from db2inst1.sub_name_list
where seq = dbms_utility.get_hash_value(LAST_NAME_MB, 1,42))
ELSE LAST_NAME_MB
END
ENABLE;
ALTER TABLE GOSALESHR.EMPLOYEE ACTIVATE COLUMN ACCESS CONTROL;
Note that the column being masked, LAST_NAME_MB, is created as nullable. Using a select statement in the mask like this does not work on columns declared as not null. Another thing to note is that for security purposes you typically only give full access to certain users or roles and everyone else gets a masked version. Since this is specifically for altering data for testing that may not be for security purposes, this mask only obfuscates data for users in a particular role.
It is convenient that this table has a LAST_NAME and a LAST_NAME_MB column that have the same values. So here is the output for super user bob when he queries the employee table.
CONNECT TO sample USER bob USING password HOST localhost PORT 50000;
select LAST_NAME, LAST_NAME_MB from gosaleshr.employee fetch first 5 rows only;
LAST_NAME |
LAST_NAME_MB |
Pagé |
Pagé |
Michel |
Michel |
Clermont |
Clermont |
Jauvin |
Jauvin |
Wiesinger |
Wiesinger |
However, when testdata does the same select, you can see the last_name_mb is changed:
CONNECT TO sample USER testdata USING password HOST localhost PORT 50000;
select LAST_NAME, LAST_NAME_MB from gosaleshr.employee fetch first 5 rows only;
LAST_NAME |
LAST_NAME_MB |
Pagé |
SMITH |
Michel |
YAMAMOTO |
Clermont |
MONTEVERDE |
Jauvin |
JONES |
Wiesinger |
SCHWARTZ |
Here is my example of masking birthday using some date arithmetic and the differing outputs that bob and testdata users would get.
CREATE MASK BIRTHDAY ON GOSALESHR.EMPLOYEE FOR
COLUMN BIRTH_DATE RETURN
CASE WHEN
VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1
THEN BIRTH_DATE + 2 DAYS - 1 YEARS + 1 MONTH
ELSE BIRTH_DATE
END
ENABLE;
ALTER TABLE GOSALESHR.EMPLOYEE ACTIVATE COLUMN ACCESS CONTROL;
select LAST_NAME, BIRTH_DATE from gosaleshr.employee fetch first 5 rows only;
bob |
testdata |
||||||||||||||||||||||||
|
|
The masking algorithm you use in your mask can be many of the built-in functions like the dbms_utility.get_hash_value function shown in the name mask example above or a function you define (a User Defined Function or UDF). There are certain restrictions on the functions that can be used as described on the CREATE MASK knowledge center page. Creating a function is a good idea if you are going to use the same algorithm to mask more than one column.
Next I show an example of creating a user defined function that changes the 9th digit of a credit card number by adding 4 to it and if the result is more than one digit, then it subtracts 9 from it. This is a very insecure algorithm for masking a credit card number and it would invalidate the check digit, but my only goal was to show how to use a function in a mask. I do not presume I know anything about obfuscating credit cards and a better algorithm is in order if you want to do this. With that said here’s the example:
CREATE OR REPLACE FUNCTION MASK_CARD(IN_CARD CHAR(57))
RETURNS CHAR(57)
CONTAINS SQL NO EXTERNAL ACTION DETERMINISTIC SECURED
BEGIN
DECLARE OUT_CARD CHAR(57);
DECLARE CD CHAR(1);
DECLARE CI INTEGER;
DECLARE EXIT HANDLER FOR SQLEXCEPTION RETURN(FALSE);
SET CD = SUBSTR(IN_CARD,9,1);
SET CI = INT(CD);
SET CI = CI + 4;
IF CI > 9 THEN SET CI=CI-9; END IF;
SET CD = CHAR(CI);
SET OUT_CARD = SUBSTR(IN_CARD,1,8) || CD || SUBSTR(IN_CARD,10,48) ;
RETURN(OUT_CARD);
END
@
CREATE MASK CRED_CARD ON GOSALESCT.CUST_CRDT_CARD FOR
COLUMN CUST_CC_NUMBER RETURN
CASE WHEN
VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1
THEN MASK_CARD(CUST_CC_NUMBER)
ELSE CUST_CC_NUMBER
END
ENABLE;
ALTER TABLE GOSALESCT.CUST_CRDT_CARD ACTIVATE COLUMN ACCESS CONTROL;
select CUST_CC_ID, cust_cc_number from GOSALESCT.CUST_CRDT_CARD fetch first 5 rows only;
bob |
testdata |
||||||||||||||||||||||||
|
|
One interesting thing is that neither bob nor testdata need to have authority to execute the function for the mask to function properly. In my database they didn’t and got an error when I tried to execute the function directly as either of them.
To develop and test the code shown in this article, I primarily used a Python Jupyter notebook. It contains additional stuff that I used in testing and developing this, and you are free to review and/or download it from my notebook on github.
As noted before, any masking process I show are may or may not be that great or very secure. If you want to do really secure obfuscation of your test data, you should consider using the Optim Data Privacy product. In addition to a robust system to copy, subset and mask your test data, it provides several database user defined functions that you can implement into your Db2 database. Once available in the database, you can easily call them in your masks.
***
I’m sure that there are many other things that you can do to ensure data is always changed when queried by certain users Files so please add any thoughts you have on the subject to my Facebook Page and share your thoughts about them.