Write Contexts

When data is encapsulated using Antimatter, a write context is specified. This determines what hooks run on that data, e.g. for classification or for converting the structure of the data into tags that can be referenced by policy.

You can see what hooks are available with the list hooks endpoint, which also gives some indication of the kinds of tags each hook will emit (although, as we will see later, this is not applicable for the regex and data structure classifiers):

import antimatter as am

amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
# list the available hooks
amr.list_hooks()

This returns:

[{'name': 'accurate-pii',
  'version': '1.0.0',
  'summary': 'An accurate classifier',
  'description': 'An accurate PII classifier with a latency of ~1s',
  'output_span_tags': ['tag.antimatter.io/pii/id',
    'tag.antimatter.io/pii/date',
    'tag.antimatter.io/pii/date_of_birth',
    'tag.antimatter.io/pii/credit_card',
    'tag.antimatter.io/pii/email_address',
    'tag.antimatter.io/pii/ip_address',
    'tag.antimatter.io/pii/location',
    'tag.antimatter.io/pii/first_name',
    'tag.antimatter.io/pii/last_name',
    'tag.antimatter.io/pii/phone_number',
    'tag.antimatter.io/pii/ssn',
    'tag.antimatter.io/pii/driver_license',
    'tag.antimatter.io/pii/age',
    'tag.antimatter.io/pii/user_id'],
  'output_capsule_tags': []},
 {'name': 'data-structure-classifier',
  'version': '1.0.0',
  'summary': 'Data structure classifier',
  'description': 'Internal hook to enable tag generation from the capsule data structure.',
  'output_span_tags': [],
  'output_capsule_tags': []},
 {'name': 'fast-pii',
  'version': '1.0.0',
  'summary': 'A fast PII classifier',
  'description': 'A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier',
  'output_span_tags': ['tag.antimatter.io/pii/credit_card',
    'tag.antimatter.io/pii/date',
    'tag.antimatter.io/pii/email_address',
    'tag.antimatter.io/pii/ip_address',
    'tag.antimatter.io/pii/location',
    'tag.antimatter.io/pii/first_name',
    'tag.antimatter.io/pii/phone_number',
    'tag.antimatter.io/pii/ssn',
    'tag.antimatter.io/pii/itin'],
  'output_capsule_tags': []},
 {'name': 'regex-classifier',
  'version': '1.0.0',
  'summary': 'regular expression classifier',
  'description': 'Internal hook to enable tag generation based on regular expression matches.',
  'output_span_tags': [],
  'output_capsule_tags': []}]
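Because list_hooks returns plain dictionaries, you can filter the listing programmatically. For instance, to find which hooks can emit the email-address span tag (a small sketch against the structure shown above):

```python
import antimatter as am

amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

hooks = amr.list_hooks()
# collect the names of hooks that can emit an email-address span tag
email_hooks = [h["name"] for h in hooks
               if "tag.antimatter.io/pii/email_address" in h["output_span_tags"]]
# for the listing above this would be ['accurate-pii', 'fast-pii']
```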

The default write contexts

There are two write contexts that are present by default in a newly created domain:

  • default is a write context with no classifiers, appropriate when you just need to encrypt data but are not using any read-context policy rules that reference tags
  • sensitive is a write context that uses the fast-pii and data-structure-classifier classifiers

You can edit these write contexts, or create your own.
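For example, encapsulating with the built-in sensitive write context runs the fast-pii and data-structure-classifier hooks during encapsulation (a minimal sketch, using the same session setup and encapsulate call as the later examples):

```python
import antimatter as am

amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

# fast-pii and data-structure-classifier run as the data is encapsulated,
# so the resulting capsule carries PII span tags that read contexts can reference
cap = amr.encapsulate("contact me at jane@example.com", write_context="sensitive")
```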

Creating a write context

To create a write context, you can use the language bindings or the CLI. It needs a name and description, along with the list of hooks to be run. It is recommended that your write context specify all the hooks that will be required by your read contexts later: classification on the write path is more efficient than just-in-time classification on the read path, which occurs whenever a read context requires a hook that was not run when the capsule was encapsulated.

import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

amr.add_write_context(
    # the name of the write context; the description is set via the builder
    "my_wctx",
    am.WriteContextBuilder().
    set_summary("A description of your write context for documentation").
    # which hooks should run on the data, what versions of those hooks are
    # acceptable, and whether they should run synchronously or asynchronously
    add_hook("data-structure-classifier", ">1.0.0", am.WriteContextHookMode.Sync).
    add_hook("accurate-pii", ">1.0.0", am.WriteContextHookMode.Sync)
)

warning

In the current release, only Sync is permitted as a hook mode, which means the classification runs as the data is encapsulated. In a future release we will permit data classification to occur in the background, with the resulting tags being stored detached from the capsule and transparently merged at read time.

Regex rules

If you use the regex-classifier, then you also need to configure some rules in your write context that capture the regex patterns you want to match against, and the tags to apply when they match. A regex rule can apply both capsule tags (which apply to the whole capsule) and span tags (which apply only to the data matched by the pattern). You can also specify that the pattern should be matched against the key rather than the data itself (i.e. the column name or field name of the data). If a rule that matches against the key emits span tags, the span will be set to the whole value. Here is a complete example that shows the regex classifier in action:

import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

# Configure a write context to use the regex classifier
amr.add_write_context(
    # the name of the write context; the description is set via the builder
    "my_wctx",
    am.WriteContextBuilder().
    set_summary("A description of your write context for documentation").
    add_hook("regex-classifier", ">1.0.0", am.WriteContextHookMode.Sync)
)
amr.insert_write_context_regex_rule(
    "my_wctx",
    am.WriteContextRegexRuleBuilder("my.*pattern", match_on_key=False)
    # add tags to the whole capsule whenever this pattern matches
    .add_capsule_tag("mycompany.com/tagname")
    .add_capsule_tag("mycompany.com/tagwithvalue", am.TagType.String, "somevalue")
    # add a span tag to the data that actually matched the pattern
    .add_span_tag("mycompany.com/sensitive")
)

# Add a read context that uses the sensitive tag
amr.add_read_context(
    "no_sensitive",
    am.ReadContextBuilder().
    set_summary("Redact everything tagged with our custom sensitive tag")
)
# you can remove all rules from a read context like this, which helps when
# experimenting with different rules:
# amr.delete_read_context_rules("no_sensitive")

# Add a new rule
amr.add_read_context_rules(
    "no_sensitive",
    am.ReadContextRuleBuilder().
    add_match_expression(am.Source.Tags,
                         key="mycompany.com/sensitive",
                         operator=am.Operator.Exists).
    set_action(am.Action.Redact)
)

# Encapsulate a sentence and read it back:
cap = amr.encapsulate("this sentence contains my secret pattern", write_context="my_wctx")
amr.load_capsule(data=cap, read_context="no_sensitive").data()

This returns:

'this sentence contains {redacted}'
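To see why exactly that span was redacted, here is a local illustration of the span semantics using plain Python re (this is not the SDK or its server-side implementation; it only assumes the classifier behaves like a standard regex engine): the pattern matches the span "my secret pattern", and the Redact action replaces exactly that span.

```python
import re

# the same pattern configured in the regex rule above
pattern = re.compile(r"my.*pattern")
text = "this sentence contains my secret pattern"

# the span tag covers only the matched range; redaction replaces
# that range with a placeholder
redacted = pattern.sub("{redacted}", text)
print(redacted)  # this sentence contains {redacted}
```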

You can list your write contexts and their regex rules as follows:

import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

amr.list_write_context()

Which returns:

[{'name': 'default',
  'summary': 'Structured data',
  'description': '',
  'config': {'required_hooks': []},
  'imported': False,
  'source_domain_id': None,
  'source_domain_name': None},
 {'name': 'my_wctx',
  'summary': 'A description of your write context for documentation',
  'description': '',
  'config': {'required_hooks': [{'hook': 'regex-classifier',
     'constraint': '>1.0.0',
     'mode': 'sync'}]},
  'imported': False,
  'source_domain_id': None,
  'source_domain_name': None},
 {'name': 'sensitive',
  'summary': 'Structured data',
  'description': '',
  'config': {'required_hooks': [{'hook': 'fast-pii',
     'constraint': '>1.0.0',
     'mode': 'sync'}]},
  'imported': False,
  'source_domain_id': None,
  'source_domain_name': None}]

To list the regex rules in a write context:

import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.list_write_context_regex_rules("my_wctx")

Which, for the above example, returns:

[{'id': 'rl-b5l2a537ysm4itrw',
  'pattern': 'my.*pattern',
  'match_on_key': False,
  'span_tags': [{'name': 'mycompany.com/sensitive',
    'value': '',
    'type': <TagTypeField.UNARY: 'unary'>}],
  'capsule_tags': [{'name': 'mycompany.com/tagname',
    'value': '',
    'type': <TagTypeField.UNARY: 'unary'>},
   {'name': 'mycompany.com/tagwithvalue',
    'value': 'somevalue',
    'type': <TagTypeField.STRING: 'string'>}]}]
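The rule above matched against the value. As a sketch of matching on the key instead (using the same builder shown earlier; the pattern here is just an illustrative example, not from this page), such a rule tags the whole value of any field whose name matches:

```python
import antimatter as am

amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

# match against the field/column name rather than the data itself;
# span tags emitted by a key-matching rule cover the field's whole value
amr.insert_write_context_regex_rule(
    "my_wctx",
    am.WriteContextRegexRuleBuilder(r"(?i)ssn|social_security", match_on_key=True)
    .add_span_tag("mycompany.com/sensitive")
)
```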

Controlling who can use a write context

To control who is allowed to use a write context to create capsules, you need to create Domain Policy Rules.

Key reuse

By default, every encapsulation of data creates a new capsule, encrypted with a unique data encryption key, obtained from the Antimatter API. This is the ideal pattern in most cases, but when working with large amounts of small data that all has the same tags, you may want to treat multiple pieces of data as one capsule, encrypted with the same key. This means that only one entry gets created in the capsule manifest, and only one round trip to the Antimatter API occurs.

There is an option in the write context configuration, called "Key Reuse TTL", that defines a window of time (in seconds) during which encapsulations of data with the same manually provided capsule tags and the same write context are treated as shards of the same capsule.
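For illustration only, a sketch of configuring this at write context creation time. The builder method name set_key_reuse_ttl is an assumption not confirmed by this page; check the API reference for the exact call in your SDK version.

```python
import antimatter as am

amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

amr.add_write_context(
    "bulk_wctx",
    am.WriteContextBuilder()
    .set_summary("Write context for many small records sharing the same tags")
    # ASSUMED method name: reuse the same data encryption key for 60 seconds,
    # so encapsulations with identical capsule tags become shards of one capsule
    .set_key_reuse_ttl(60)
)
```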