Write Contexts
When data is encapsulated using Antimatter, a write context is specified. This determines what hooks run on that data, e.g. for classification or for converting the structure of the data into tags that can be referenced by policy.
You can see what hooks are available with the list hooks endpoint, which also gives some indication of the kinds of tags each hook will emit (although, as we will see later, this is not applicable to the regex and data structure classifiers):
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
# list the available hooks
amr.list_hooks()
This returns
[{'name': 'accurate-pii',
'version': '1.0.0',
'summary': 'An accurate classifier',
'description': 'An accurate PII classifier with a latency of ~1s',
'output_span_tags': ['tag.antimatter.io/pii/id',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/date_of_birth',
'tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/first_name',
'tag.antimatter.io/pii/last_name',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/driver_license',
'tag.antimatter.io/pii/age',
'tag.antimatter.io/pii/user_id'],
'output_capsule_tags': []},
{'name': 'data-structure-classifier',
'version': '1.0.0',
'summary': 'Data structure classifier',
'description': 'Internal hook to enable tag generation from the capsule data structure.',
'output_span_tags': [],
'output_capsule_tags': []},
{'name': 'fast-pii',
'version': '1.0.0',
'summary': 'A fast PII classifier',
'description': 'A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier',
'output_span_tags': ['tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/first_name',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/itin'],
'output_capsule_tags': []},
{'name': 'regex-classifier',
'version': '1.0.0',
'summary': 'regular expression classifier',
'description': 'Internal hook to enable tag generation based on regular expression matches.',
'output_span_tags': [],
'output_capsule_tags': []}]
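Because the Python binding returns hook metadata as plain dictionaries, you can post-process it locally. For example, a small helper (illustrative only, not part of the SDK) that finds which hooks can emit a particular span tag:

```python
# Illustrative helper, not part of the Antimatter SDK: given the list of
# hook dictionaries returned by list_hooks(), find the hooks that can emit
# a particular span tag.
def hooks_emitting(hooks, span_tag):
    return [h["name"] for h in hooks if span_tag in h["output_span_tags"]]

# A subset of the output shown above, for demonstration:
hooks = [
    {"name": "accurate-pii",
     "output_span_tags": ["tag.antimatter.io/pii/ssn",
                          "tag.antimatter.io/pii/driver_license"]},
    {"name": "fast-pii",
     "output_span_tags": ["tag.antimatter.io/pii/ssn"]},
    {"name": "regex-classifier", "output_span_tags": []},
]

print(hooks_emitting(hooks, "tag.antimatter.io/pii/ssn"))
# → ['accurate-pii', 'fast-pii']
print(hooks_emitting(hooks, "tag.antimatter.io/pii/driver_license"))
# → ['accurate-pii']
```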
am domain hooks
This returns
hooks:
- name: accurate-pii
version: 1.0.0
summary: Accurate PII classifier
description: An accurate PII classifier with a latency of ~1s
outputSpanTags:
- tag.antimatter.io/pii/id
- tag.antimatter.io/pii/date
- tag.antimatter.io/pii/credit_card
- tag.antimatter.io/pii/email_address
- tag.antimatter.io/pii/ip_address
- tag.antimatter.io/pii/location
- tag.antimatter.io/pii/phone_number
- tag.antimatter.io/pii/ssn
- tag.antimatter.io/pii/driver_license
- tag.antimatter.io/pii/age
- tag.antimatter.io/pii/name
- tag.antimatter.io/pii/secret
outputCapsuleTags: []
- name: data-structure-classifier
version: 1.0.0
summary: Data Structure classifier
description: A classifier that tags data based on its structure (e.g. column names or object keys)
outputSpanTags: []
outputCapsuleTags: []
- name: fast-pii
version: 1.0.0
summary: Fast PII classifier
description: A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier
outputSpanTags:
- tag.antimatter.io/pii/credit_card
- tag.antimatter.io/pii/date
- tag.antimatter.io/pii/email_address
- tag.antimatter.io/pii/ip_address
- tag.antimatter.io/pii/location
- tag.antimatter.io/pii/phone_number
- tag.antimatter.io/pii/ssn
- tag.antimatter.io/pii/name
- tag.antimatter.io/pii/driver_license
- tag.antimatter.io/pii/id
- tag.antimatter.io/pii/age
outputCapsuleTags: []
- name: llm-classifier
version: 1.0.0
summary: custom llm classifier
description: Internal hook to enable tag generation based on classification performed by an LLM according to user-supplied classification rules.
outputSpanTags: []
outputCapsuleTags: []
- name: regex-classifier
version: 1.0.0
summary: Regular Expression classifier
description: A classifier that tags data if it matches regular expression rules configured in the write context
outputSpanTags: []
outputCapsuleTags: []
The default write contexts
There are two write contexts that are present by default in a newly created domain:
default
is a write context with no classifiers, appropriate when you just need to encrypt data but are not using any read-context policy rules that reference tags
sensitive
is a write context that uses the fast-pii and data-structure-classifier classifiers
You can edit these write contexts, or create your own.
Creating a write context
To create a write context, use the language bindings or the CLI. A write context needs a name and description, along with the list of hooks to run. It is recommended that your write context specify all the hooks that your read contexts will later require: classification on the write path is more efficient than just-in-time classification on the read path, which occurs whenever a read context requires a hook that was not run when the capsule was encapsulated.
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.add_write_context(
# The name and description of the write context
"my_wctx", am.WriteContextBuilder().
set_summary("A description of your write context for documentation").
# Which hooks need to run on the data. What version of hook should run, and if it should run synchronously or asynchronously
add_hook("data-structure-classifier", ">1.0.0", am.WriteContextHookMode.Sync).
add_hook("accurate-pii", ">1.0.0", am.WriteContextHookMode.Sync)
)
am write-context create \
--name my_wctx \
--summary "A description of your write context for documentation" \
--hook "data-structure-classifier" \
--hook "accurate-pii"
In the current release, only Sync is permitted as a hook mode, which means the classification runs as the data is encapsulated. In a future release we will permit data classification to occur in the background, with the resulting tags being stored detached from the capsule and transparently merged at read time.
Regex rules
If you use the regex-classifier, then you also need to configure rules in your write context that capture the regex patterns you want to match and the tags to apply when they match. A regex rule can apply both capsule tags (which apply to the whole capsule) and span tags (which apply only to the data matched by the pattern). You can also specify that the pattern should be matched against the key rather than the data itself (i.e. the column name or field name of the data). If a rule that matches against the key emits span tags, the span will be set to the whole value. Here is a complete example that shows the regex classifier in action:
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
# Configure a write context to use the regex classifier
amr.add_write_context(
# The name and description of the write context
"my_wctx", am.WriteContextBuilder().
set_summary("A description of your write context for documentation").
add_hook("regex-classifier", ">1.0.0", am.WriteContextHookMode.Sync)
)
amr.insert_write_context_regex_rule("my_wctx", am.WriteContextRegexRuleBuilder("my.*pattern", match_on_key=False)
# add tags to the whole capsule whenever this pattern matches
.add_capsule_tag("mycompany.com/tagname")
.add_capsule_tag("mycompany.com/tagwithvalue", am.TagType.String, "somevalue")
# add a span tag to the data that actually matched the pattern
.add_span_tag("mycompany.com/sensitive")
)
# Add a read context that uses the sensitive tag
amr.add_read_context(
"no_sensitive", am.ReadContextBuilder().
set_summary("Redact everything tagged with our custom sensitive tag")
)
# you can remove all rules from a read context like this, which helps when experimenting with different rules:
# amr.delete_read_context_rules("no_sensitive")
# Add a new rule
amr.add_read_context_rules("no_sensitive", am.ReadContextRuleBuilder().
add_match_expression(am.Source.Tags,
key="mycompany.com/sensitive",
operator=am.Operator.Exists).
set_action(am.Action.Redact)
)
# Encapsulate a sentence and read it back:
cap = amr.encapsulate("this sentence contains my secret pattern", write_context="my_wctx")
amr.load_capsule(data=cap, read_context="no_sensitive").data()
This returns:
'this sentence contains {redacted}'
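To build intuition for how span tags drive redaction, here is a local sketch using Python's re module. This is only an illustration of the semantics, not how Antimatter implements matching and redaction:

```python
import re

# The write context rule tags each span matched by "my.*pattern" with
# mycompany.com/sensitive; the read context rule redacts tagged spans.
# Locally, that is equivalent to replacing each match with a placeholder:
pattern = re.compile(r"my.*pattern")
sentence = "this sentence contains my secret pattern"

redacted = pattern.sub("{redacted}", sentence)
print(redacted)
# → this sentence contains {redacted}
```

Note that the regex is greedy, so "my secret pattern" is matched as a single span.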
# Configure a write context to use the regex classifier
am write-context create \
--name my_wctx \
--summary "A description of your write context for documentation" \
--hook "regex-classifier(>1.0.0, sync)"
am write-context classifier-rule create regex \
--name my_wctx \
--pattern "my.*pattern" \
--capsule-tag "mycompany.com/tagname(unary)" \
--capsule-tag "mycompany.com/tagwithvalue(string, 'somevalue')" \
--span-tag "mycompany.com/sensitive(unary)"
# Add a read context that uses the sensitive tag
am read-context create \
--name no_sensitive \
--summary "Redact everything tagged with our custom sensitive tag"
# Add a data policy
am data-policy create \
--name no_sensitive \
--description "Redact spans tagged with our custom sensitive tag"
# Get the policy ID
POLICY_ID=$(am data-policy list | yq '.policies[] | select(.name == "no_sensitive") | .id')
# Add a rule to the data policy to redact matching spans
am data-policy rule create \
--policy-id ${POLICY_ID} \
--comment "Redact all spans with our custom sensitive tag" \
--effect Redact \
--priority 0 \
--clause '{"operator": "AllOf", "tags": [{"name": "mycompany.com/sensitive", "operator": "Exists"}]}'
# Bind the policy to the no_sensitive read context
am data-policy binding set-read-context-attachment \
--policy-id ${POLICY_ID} \
--read-context-id no_sensitive \
--attachment Attached
# Encapsulate a sentence, writing the capsule to a file, then read it back
am capsule encapsulate --write-context my_wctx --file file.cap <<<"this sentence contains my secret pattern"
am capsule open --read-context no_sensitive --in file.cap
This returns:
this sentence contains {redacted}
Note that classification can optionally be skipped when creating a capsule. This will not generate any capsule tags, and therefore there will be no data redaction. Use the --skip-classification flag as shown below:
am capsule encapsulate --write-context my_wctx --skip-classification --file file.cap <<<"this sentence contains my secret pattern"
You can list your write contexts and their regex rules as follows:
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.list_write_context()
Which returns
[{'name': 'default',
'summary': 'Structured data',
'description': '',
'config': {'required_hooks': []},
'imported': False,
'source_domain_id': None,
'source_domain_name': None},
{'name': 'my_wctx',
'summary': 'A description of your write context for documentation',
'description': '',
'config': {'required_hooks': [{'hook': 'regex-classifier',
'constraint': '>1.0.0',
'mode': 'sync'}]},
'imported': False,
'source_domain_id': None,
'source_domain_name': None},
{'name': 'sensitive',
'summary': 'Structured data',
'description': '',
'config': {'required_hooks': [{'hook': 'fast-pii',
'constraint': '>1.0.0',
'mode': 'sync'}]},
'imported': False,
'source_domain_id': None,
'source_domain_name': None}]
For regex rules, it is:
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.list_write_context_regex_rules("my_wctx")
Which, for the above example, returns:
[{'id': 'rl-b5l2a537ysm4itrw',
'pattern': 'my.*pattern',
'match_on_key': False,
'span_tags': [{'name': 'mycompany.com/sensitive',
'value': '',
'type': <TagTypeField.UNARY: 'unary'>}],
'capsule_tags': [{'name': 'mycompany.com/tagname',
'value': '',
'type': <TagTypeField.UNARY: 'unary'>},
{'name': 'mycompany.com/tagwithvalue',
'value': 'somevalue',
'type': <TagTypeField.STRING: 'string'>}]}]
am write-context list
Which returns
writeContexts:
- name: default
summary: Default write context
description: No classification of encapsulated data
config:
keyReuseTTL: 0
requiredHooks: []
imported: false
- name: my_wctx
summary: A description of your write context for documentation
description: ''
config:
keyReuseTTL: 0
requiredHooks:
- hook: regex-classifier
constraint: '>1.0.0'
mode: sync
imported: false
- name: sensitive
summary: Default write context (sensitive data)
description: Classifies data using the fast-pii and data structure classifiers
config:
keyReuseTTL: 0
requiredHooks:
- hook: data-structure-classifier
constraint: '>1.0.0'
mode: sync
- hook: fast-pii
constraint: '>1.0.0'
mode: sync
imported: false
For regex rules, it is:
am write-context classifier-rule list --name my_wctx
Which, for the above example, returns:
- id: rl-xvp5gg661st8ga64
comment: ''
spanTags:
- name: mycompany.com/sensitive
value: ''
type: unary
capsuleTags:
- name: mycompany.com/tagname
value: ''
type: unary
- name: mycompany.com/tagwithvalue
value: somevalue
type: string
regexConfig:
pattern: my.*pattern
matchOnKey: false
Controlling who can use a write context
To control who is allowed to use a write context to create capsules, you need to create Domain Policy Rules.
Key reuse
By default, every encapsulation of data creates a new capsule, encrypted with a unique data encryption key, obtained from the Antimatter API. This is the ideal pattern in most cases, but when working with large amounts of small data that all has the same tags, you may want to treat multiple pieces of data as one capsule, encrypted with the same key. This means that only one entry gets created in the capsule manifest, and only one round trip to the Antimatter API occurs.
There is an option in the write context configuration, called "Key Reuse TTL", that defines a window of time (in seconds) during which encapsulations of data with the same manually provided capsule tags and the same write context are treated as shards of the same capsule.
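The reuse window can be thought of as a small cache keyed by write context and capsule tags, expiring after the TTL. The sketch below is only an illustration of these semantics, not the SDK's implementation:

```python
import time

class KeyReuseCache:
    """Illustrative sketch of Key Reuse TTL semantics (not the SDK's code):
    encapsulations with the same write context and capsule tags within the
    TTL window share one data encryption key, and hence one capsule."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cache = {}  # (write_context, tags) -> (key_id, created_at)

    def key_for(self, write_context, capsule_tags):
        cache_key = (write_context, tuple(sorted(capsule_tags)))
        entry = self.cache.get(cache_key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # reuse: same capsule, no API round trip
        key_id = f"key-{len(self.cache)}-{now}"  # stand-in for a fresh DEK
        self.cache[cache_key] = (key_id, now)
        return key_id

cache = KeyReuseCache(ttl_seconds=60)
k1 = cache.key_for("my_wctx", ["mycompany.com/tagname"])
k2 = cache.key_for("my_wctx", ["mycompany.com/tagname"])
k3 = cache.key_for("my_wctx", ["mycompany.com/other"])
print(k1 == k2)  # True: same tags within the TTL share a key
print(k1 == k3)  # False: different tags start a new capsule
```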