Protecting your Virtualized Data
24 November 2020
As we discussed in my previous article on Data Virtualization, it is a great tool for making tables from several heterogenous databases appear to be in one database. This is very convenient for Busines Analysts, Data Scientists and Application Developers, because they can easily find the data they need in one database and only have to establish a connection to that database. Another nice thing about having your users’ only access to those many sources through DV is that you now have one place to control that access instead of controlling it through a hodgepodge of different databases with different means of controls. In this article I will discuss various aspects of securing your virtual tables.
For larger Cloud Pak for Data implementations with both Data Virtualization and Watson Knowledge Catalog (WKC), a good way of controlling data access is to use WKC data protection rules. I will discuss that in an upcoming article since it used for more than just Data Virtualization. However, for those implementing just Data Virtualization there are still robust controls available and I will spend the rest of this article on those.
As noted in my previous article, only tables that you have “virtualized” are available to be queried by most DV users. While it would be useful to virtualize most tables so that they can be easily found by your users, it might make sense to not virtualize tables with sensitive data that you don’t anticipate anyone needing. Like any database, only users who have a user id and password can connect to the Data Virtualization database. So, if you only have a small group of users who all have a high level of access, then the choice of tables to virtualize would get less scrutiny. On the other hand, if you have a large constellation of users with varying levels of access this becomes more important.
When a table is first virtualized, the virtual table can only be accessed by the user creating it and certain highly-privileged users. No one else has access. So the most straight forward way of controlling access to virtual tables and views built on them is to only grant access to users who need it. This is done via the “Manage Access” function for each virtual table or view on the My virtualized data page. The manage access menu item is presented when you click the three dots to the right of the table.
When you manage access for a virtual table or view, you can click a radio button to give all DV users authority to select from the table. This may be convenient, but should be done sparingly for implementations with diverse user access levels. From the manage access page, you can also grant select authority by choosing specific users or roles. If this is done then users given authority, either directly or by being part of the role, can select from the tables and those not granted access cannot.
One of the items in the DV menu is the “My virtualized data’ choice. When you select it you see a list of virtual tables (tables that have been virtualized). Depending on the setting of “Restrict Visibility” you may be able to see all of the virtual table names or just the ones on which you have been granted select authority. In either case you can only select from the ones to which you have been granted that authority. The DV administrator sets the Restrict Visibility switch from the Service settings item in the DV menu. Also on the Service Settings page is a check box to enable Watson Knowledge Center data protection rules on virtual objects . If you are not an administrator, you will not see the Service Settings menu choice.
Data Virtualization also comes with four built-in roles. The roles are:
From the User management DV menu choice, the administrator can grant access on virtual tables to one of these roles. When that is done, everyone with that role can query virtual tables or views that have been granted to that role. These roles not only determine which virtual tables you can access, but also determine what you can do on the DV console. For example only a user in the Admin role can manage users and put them into roles, where both the Admin and Engineer roles can virtualize data. You can see all privileges of the roles on the Managing users (Data Virtualization) page.
Group Users By Database Roles
For smaller user communities it may be convenient to grant access to each table to individual users or to group uses by the built-in roles. However, for larger communities with diverse access requirements this could be quite inconvenient. So, you can create database roles to group users for the sole purpose of organizing access to virtual tables and views. You must belong to the built-in Admin DV role to create and manage roles. The general flow is:
1. Create a role. Roles have names.
2. Grant access to all of the tables that the role needs
3. Grant users access to the role.
As new users are added or removed, roles make it quite convenient to give or revoke access to large groups of tables. Database role management can only be done through SQL either through the DV SQL Editor page or through any client that allows you to connect and query the database.
As an example, imagine that I have virtualized tables accounting.PATIENT and user1030.HOSPITAL and I wish give access to users user1008 and user1012. I might use the following commands to implement this access:
CREATE ROLE ACCOUNTING;
GRANT SELECT ON TABLE ACCOUNTING.PATIENT TO ROLE ACCOUNTING;
GRANT SELECT ON TABLE user1030.HOSPITAL TO ROLE ACCOUNTING;
GRANT ROLE ACCOUNTING TO USER user1008;
GRANT ROLE ACCOUNTING TO USER user1012;
Once these commands are successfully run, user1008 and user1012 can query the tables. Further if you then want to allow this same access to user1010, then all I need to do is grant the accounting role to user1010. The database roles feature was borrowed from Db2 and you can read about it on the Roles page for a description and links to command details. As of the writing of this article LDAP groups are not yet available to use instead of roles.
Limit Access to Rows or Columns
So far we’ve discussed access to virtual tables as either on or off. A user has access or he doesn’t. But what if you want a user to have partial access to a table meaning that a user should only see certain rows of a table or certain columns? In this case you would create a view on a virtual table, granting a particular user or role access on the view but not on the virtual table. This strategy has been used for many years in the relational database world.
To illustrate, imagine that you wanted the doctors to be able to view most information about patients, but you only wanted them to see the last four digits of a patient’s social security number and only wanted them to see rows corresponding to their own patients. In this case we need to create a role for primary care physicians called PCP, create a view on the patient table and grant access to the view but not the underlying patient virtual table. Here is how that could be done assuming that one of the doctors connects to DV as user1010.
CREATE ROLE PCP;
create or replace view limited.patient as
select 'XXX-XX-' || substr(SSN,8,4) SSN
where PCP_ID = SESSION_USER;
GRANT SELECT ON TABLE LIMITED.PATIENT TO ROLE PCP;
GRANT ROLE PCP TO USER user1010;
GRANT ROLE PCP TO USER user1011;
The VERIFY_ROLE_FOR_USER function can be useful if you want to use roles in the where clause instead of a user id. Once the objects above are created and I connect as user1010, I run the following select and get the results shown:
select * from limited.patient;
Even though user1010 doesn’t have any access to the virtual table ACCOUNTING.PATIENT, she can still select from the view LIMITED.PATIENT because select access has been granted to a role of which she is a member. However, each user can only see the rows where PCP_ID contains their user id because of the where clause condition in the view. Note that Db2 converts user ids to uppercase when you connect, so the values in the table should be upper case. Also please note I use the SESSION_USER special register to bring the user id into the query. Further, anyone selecting from this view will only see the last 4 digits of the social security number due to the function used in the view. While my example uses a simple predicate and one simple function, you can use any SQL you like including joining other tables that may be needed to cross reference a user id to the rows in the subject table that you really want. Also many other functions can be used to mask data.
So far, we have discussed users that are defined to DV. However, you may want to have an application server or other server that connects using a generic service id. Even in this scenario, you may want to monitor usage by end user or even control access to virtual tables and views by end user. In that case you will want to use Trusted Context. Trusted Context allows a user defined to DV to connect, and then by changing properties on the connection, tell DV who the real end user is for the current transaction. The user can be another user defined in DV, but can also be a user that DV doesn’t know about. Trusted Context is another feature borrowed from Db2 and you can read about how to configure it in my Trusted Context article.
You will notice that in my article, I am only using trusted context to see the end user running queries at any one time. However, you can grant roles and direct privileges to end users that appear as part of a trusted context that control what access those users have. It is interesting to note that you can grant a role or select authority to a user that does not actually exist in DV and only comes into being as part of a trusted context connection. This means that even though there is no user “BOB” defined to DV, you can successfully run this command:
GRANT ROLE ACCOUNTING TO USER BOB;
As you may notice in my article, a role is granted to every user coming in through the trusted connection. In addition, any role granted specifically to a user prior to the trusted connection is also in effect. This means that all roles whether granted explicitly to a user or by a trusted context are in effect.
I’ve put a Python Notebook on my github repo with additional examples of working with Trusted Context in Data Virtualization that contains additional commands that you may find useful.
This article describes some of the features for restricting access in Data Virtualization. If you find other interesting ways of doing this, please tell us about them on my db2Dean Facebook Page and share your thoughts about them.