Audit Logs

Audit logging is an essential part of data control. Typically, an audit log is implemented either as instrumentation in the storage layer or in every service that accesses that storage layer. Both locations are suboptimal. The storage layer rarely has the complete picture: you can see which service account accessed what, but you cannot trace that access back to the application-level action or the end user it was performed for. Implementing the audit log in the services that access the storage layer gives you much richer context to capture, but it often requires duplicating logic across multiple pieces of software, and it leaves you with the worry that you may have missed an access path (e.g. what if the data is accessed directly in an S3 bucket?).

Antimatter gives you a third option. By encrypting the data and generating an audit trail whenever that data is decrypted, you get the best of both worlds:

  • You don't need to implement anything (it's built into Antimatter)
  • You can provide application-level context (in the form of read parameters) that gets captured in the logs
  • You know that all access to the data will leave an audit trail - there is no other way to access the data

There are two kinds of audit log in Antimatter. The capsule access log covers all access to data such as when it is created, when it is accessed, and what data was returned or redacted as part of that access. The control log covers any changes to policy or settings within Antimatter. The logs are partitioned because it is often convenient to expose the capsule access log to the data owner (end customer) whereas it may not always be required to expose the control log.

Capsule Access Log

You do not need to do anything to enable the capsule access log. Whenever data is encapsulated, or whenever encapsulated data is read, a capsule access log entry is recorded immediately (i.e. there is no delay between when data is accessed and when it shows up in the log).

The capsule access log can be viewed in the web UI, or it can be accessed through the language libraries. As an example, here is how you would encapsulate a sentence, read it back, and then query the capsule access log:

import antimatter as am
from datetime import datetime, timedelta
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

# encapsulate a sentence and load it
capsule = amr.encapsulate("hello John Smith", write_context="sensitive")
data = amr.load_capsule(data=capsule, read_context="default").data()

# query the recent access log:
list(amr.query_access_log(start_date=datetime.now()-timedelta(minutes=1)))

This returns:

[{'id': '17bea9761b9f0ad376e0921dd10ab1fe',
'time': datetime.datetime(2024, 3, 21, 3, 20, 33, tzinfo=TzInfo(UTC)),
'domain': 'dm-A3Lxeqy53v8',
'capsule': 'ca-UkKeQkZsep3yHPMYbGnxSU',
'operation': 'read',
'session': 'sn-2exx9c1hhwn71v7htxaai2',
'location': None,
'create_info': None,
'open_info': None,
'read_info': {'parameters': {},
'read_context': 'dm-A3Lxeqy53v8::default',
'allowed_tags': {'unique_tags': [{'tag': {'name': 'tag.antimatter.io/pii/first_name',
'value': '',
'type': <TagTypeField.STRING: 'string'>,
'source': 'fast',
'hook_version': '1.0.0'},
'occurrences': 1}],
'elided_tags': []},
'redacted_tags': {'unique_tags': [], 'elided_tags': []},
'tokenized_tags': {'unique_tags': [], 'elided_tags': []},
'returned_records': 1,
'filtered_records': 0}},
{'id': '17bea97602afd78ba7e51be4166641bc',
'time': datetime.datetime(2024, 3, 21, 3, 20, 32, tzinfo=TzInfo(UTC)),
'domain': 'dm-A3Lxeqy53v8',
'capsule': 'ca-UkKeQkZsep3yHPMYbGnxSU',
'operation': 'open',
'session': 'sn-2exx9c1hhwn71v7htxaai2',
'location': None,
'create_info': None,
'open_info': {'read_context': 'dm-A3Lxeqy53v8::default'},
'read_info': None},
{'id': '17bea975f5a00a7d659404d4ed1b9a38',
'time': datetime.datetime(2024, 3, 21, 3, 20, 32, tzinfo=TzInfo(UTC)),
'domain': 'dm-A3Lxeqy53v8',
'capsule': 'ca-UkKeQkZsep3yHPMYbGnxSU',
'operation': 'create',
'session': 'sn-2exx9c1hhwn71v7htxaai2',
'location': None,
'create_info': {'write_context': 'dm-A3Lxeqy53v8::sensitive'},
'open_info': None,
'read_info': None}]

All records contain the following fields:

| Field | Meaning |
| --- | --- |
| id | The globally unique identifier for the log record. You can lexically sort on this identifier to order by time |
| time | When the event occurred, in UTC |
| domain | Which domain this event occurred in (see Peering for more info) |
| session | Which authenticated user session generated this event. You can correlate this with the control log to see which identity provider and principal is associated with this session |
| location | The location of the capsule, if known |
| operation | Either "create", "open" or "read" |
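
As a small illustration of the lexical ordering on id, the sketch below sorts trimmed copies of the three records from the example output. The helper name oldest_first is made up for this example and is not part of the Antimatter library; the records are plain dicts, so no Antimatter session is needed to run it.

```python
# Trimmed copies of the three example records (only the fields used here).
records = [
    {"id": "17bea9761b9f0ad376e0921dd10ab1fe", "operation": "read"},
    {"id": "17bea97602afd78ba7e51be4166641bc", "operation": "open"},
    {"id": "17bea975f5a00a7d659404d4ed1b9a38", "operation": "create"},
]


def oldest_first(entries):
    """Order log entries oldest-first by sorting on the id string.

    Hypothetical helper: it relies only on the documented property that
    ids sort lexically in time order.
    """
    return sorted(entries, key=lambda entry: entry["id"])


ordered_ops = [entry["operation"] for entry in oldest_first(records)]
print(ordered_ops)
```

Sorting on id rather than time is useful here because the open and create records share the same one-second timestamp in the example output.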

In addition, depending on the operation, there will be additional information. For create operations, the write context used to process the capsule will be recorded. For open operations, the read context used to process the capsule will be recorded. For read operations, there are several additional fields:

| Field | Meaning |
| --- | --- |
| allowed_tags | Which types of tagged data were allowed to be read during the operation |
| redacted_tags | Which pieces of tagged data were redacted during the operation |
| tokenized_tags | Which pieces of tagged data were tokenized during the operation |
| returned_records | How many rows of data were returned in some form (including partial redaction) |
| filtered_records | How many rows of data were filtered as a result of DenyRecord read-context rules |
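
To make the read fields concrete, here is a minimal sketch that condenses a read_info object into a short summary. It assumes the dictionary layout shown in the example output; the helper names tag_names and summarize_read are hypothetical, and the tag type is written as a plain string because the TagTypeField enum is not imported here.

```python
# read_info copied from the example "read" record, with the enum value
# replaced by a plain string so the snippet runs without the library.
read_info = {
    "parameters": {},
    "read_context": "dm-A3Lxeqy53v8::default",
    "allowed_tags": {
        "unique_tags": [
            {
                "tag": {
                    "name": "tag.antimatter.io/pii/first_name",
                    "value": "",
                    "type": "string",
                    "source": "fast",
                    "hook_version": "1.0.0",
                },
                "occurrences": 1,
            }
        ],
        "elided_tags": [],
    },
    "redacted_tags": {"unique_tags": [], "elided_tags": []},
    "tokenized_tags": {"unique_tags": [], "elided_tags": []},
    "returned_records": 1,
    "filtered_records": 0,
}


def tag_names(tag_summary):
    """Collect the distinct tag names listed under unique_tags.

    elided_tags entries are skipped: their layout is not shown in the
    example output, so this sketch makes no assumption about it.
    """
    return sorted({item["tag"]["name"] for item in tag_summary["unique_tags"]})


def summarize_read(info):
    """Condense a read_info dict into tag names and record counts."""
    return {
        "allowed": tag_names(info["allowed_tags"]),
        "redacted": tag_names(info["redacted_tags"]),
        "tokenized": tag_names(info["tokenized_tags"]),
        "returned_records": info["returned_records"],
        "filtered_records": info["filtered_records"],
    }


summary = summarize_read(read_info)
print(summary)
```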

Within the tags summary JSON, the _tags fields are broken up into "unique_tags" and "elided_tags". For tags with values, each tag=value pair is recorded under unique_tags, as long as the tag has only a small number (~10) of unique values. In some use cases, however, there could be thousands or millions of values associated with a tag (e.g. a tag of 'gdrive-uuid=some-uuid'). These high-cardinality tags are instead summarized in the capsule manifest and capsule access log as "elided_tags", where the distinct values are not reported (only their count). Within the capsule, however, all tags are stored in their full form. Unary tags (with no values) do not get elided.
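
The split between unique and elided tags can be pictured with the sketch below. This is not Antimatter's implementation: the function name summarize_tags, the max_unique_values threshold of 10 (taken from the "~10" above), and the shape of the elided entries are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def summarize_tags(tags, max_unique_values=10):
    """Split (name, value) tag pairs into unique and elided summaries.

    Hypothetical sketch: tags with at most max_unique_values distinct
    values are listed per tag=value pair; tags with more distinct values
    are reduced to counts only.
    """
    values_by_name = defaultdict(Counter)
    for name, value in tags:
        values_by_name[name][value] += 1

    summary = {"unique_tags": [], "elided_tags": []}
    for name, values in values_by_name.items():
        if len(values) <= max_unique_values:
            # Unary tags have a single (empty) value, so they always land here.
            for value, count in values.items():
                summary["unique_tags"].append(
                    {"name": name, "value": value, "occurrences": count}
                )
        else:
            # High-cardinality tag: report counts, not the distinct values.
            summary["elided_tags"].append(
                {
                    "name": name,
                    "distinct_values": len(values),
                    "occurrences": sum(values.values()),
                }
            )
    return summary


# Three occurrences of a unary tag plus 1000 distinct gdrive-uuid values.
tags = [("pii/first_name", "")] * 3
tags += [("gdrive-uuid", f"uuid-{i}") for i in range(1000)]
result = summarize_tags(tags)
print(result)
```

In this sketch the unary tag needs no special case, because a tag without values never exceeds the threshold.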

You will notice that in the read_info object, there is a parameters field. This records the read parameters that were passed in by the application during the read operation. Those parameters can be referenced by read-context policy rules, but you can also place information into the parameters purely so it is captured by the audit log. For example, you may capture information about the user request that you are reading this data on behalf of, or capture information about what application action this data read is a part of.
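
Once read parameters are captured, they can be used to search the log. The following sketch filters access log entries by the values recorded under read_info.parameters. The entries are illustrative dicts rather than real log output, the parameter names end_user and action are made up, and passing the parameters into the read call itself is not shown here.

```python
def reads_with_parameters(entries, **wanted):
    """Return read entries whose recorded parameters match every wanted pair.

    Hypothetical helper operating on plain dicts shaped like the access
    log records; entries without read_info (create/open) are skipped.
    """
    matches = []
    for entry in entries:
        read_info = entry.get("read_info")
        if entry.get("operation") != "read" or not read_info:
            continue
        params = read_info.get("parameters") or {}
        if all(params.get(key) == value for key, value in wanted.items()):
            matches.append(entry)
    return matches


# Illustrative entries only; the parameter names and values are invented.
entries = [
    {"operation": "create", "read_info": None},
    {
        "operation": "read",
        "read_info": {"parameters": {"end_user": "user-123", "action": "export_report"}},
    },
    {"operation": "read", "read_info": {"parameters": {}}},
]

matches = reads_with_parameters(entries, end_user="user-123")
print(len(matches))
```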

Control Log

For every interaction with Antimatter that isn't access or creation of data, a control log record is created. This includes:

  • When users begin an authenticated session
  • When read or write context policy is changed
  • When domain settings such as Root Encryption Keys are changed
  • When principals are added to or removed from identity providers

The control log can be viewed in the web UI, or it can be accessed through the language libraries. For example, here is how you would query the control log to find the authentication event for the session that produced the access log entries above:

import antimatter as am
from datetime import datetime, timedelta
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

# query the recent control log
list(amr.query_control_log(start_date=datetime.now()-timedelta(minutes=60)))

This returns:

[{'id': '17bea911aca179a09bd62ba1eb994415',
'time': datetime.datetime(2024, 3, 21, 3, 13, 21, tzinfo=TzInfo(UTC)),
'session': 'sn-2exx9c1hhwn71v7htxaai2',
'url': '/authenticate',
'summary': 'Domain authentication',
'description': {
'method': 'POST',
'subject': 'PCustxqVNJSMOJCRqFFHz8HYTctBn037rDTeE9b9bao=',
'capability_names': 'subject-id,admin',
'issuer': 'apikey',
'expiry': '2024-03-21T07:13:21.81727558Z',
'not_before': '2024-03-21T03:13:11.817275445Z',
}}]
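
The session field is what ties the two logs together. As a minimal sketch, the code below joins trimmed copies of the example records on session to show which issuer and subject were behind each capsule operation. The helper names auth_by_session and annotate are hypothetical, and the sketch assumes that authentication events are the control log records whose url is '/authenticate', as in the example output.

```python
# Trimmed copies of the example records (only the fields used here).
access_entries = [
    {"operation": "read", "session": "sn-2exx9c1hhwn71v7htxaai2"},
    {"operation": "open", "session": "sn-2exx9c1hhwn71v7htxaai2"},
    {"operation": "create", "session": "sn-2exx9c1hhwn71v7htxaai2"},
]
control_entries = [
    {
        "session": "sn-2exx9c1hhwn71v7htxaai2",
        "url": "/authenticate",
        "description": {
            "subject": "PCustxqVNJSMOJCRqFFHz8HYTctBn037rDTeE9b9bao=",
            "issuer": "apikey",
        },
    }
]


def auth_by_session(control):
    """Map each session id to the description of its authentication event."""
    return {
        entry["session"]: entry["description"]
        for entry in control
        if entry.get("url") == "/authenticate"
    }


def annotate(access, control):
    """Pair each access entry with the issuer and subject of its session.

    Sessions with no matching authentication event yield None values.
    """
    auth = auth_by_session(control)
    annotated = []
    for entry in access:
        description = auth.get(entry["session"], {})
        annotated.append(
            (entry["operation"], description.get("issuer"), description.get("subject"))
        )
    return annotated


annotated = annotate(access_entries, control_entries)
for row in annotated:
    print(row)
```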