Write Contexts
When data is encapsulated using Antimatter, a write context is specified. This determines what hooks run on that data, e.g. for classification or for converting the structure of the data into tags that can be referenced by policy.
You can see what hooks are available with the list hooks endpoint, which also gives some indication of the kinds of tags each hook will emit (although, as we will see later, this is not applicable to the regex and data structure classifiers):
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
# list the available hooks
amr.list_hooks()
This returns
[{'name': 'accurate-pii',
'version': '1.0.0',
'summary': 'An accurate classifier',
'description': 'An accurate PII classifier with a latency of ~1s',
'output_span_tags': ['tag.antimatter.io/pii/id',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/date_of_birth',
'tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/first_name',
'tag.antimatter.io/pii/last_name',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/driver_license',
'tag.antimatter.io/pii/age',
'tag.antimatter.io/pii/user_id'],
'output_capsule_tags': []},
{'name': 'data-structure-classifier',
'version': '1.0.0',
'summary': 'Data structure classifier',
'description': 'Internal hook to enable tag generation from the capsule data structure.',
'output_span_tags': [],
'output_capsule_tags': []},
{'name': 'fast-pii',
'version': '1.0.0',
'summary': 'A fast PII classifier',
'description': 'A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier',
'output_span_tags': ['tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/first_name',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/itin'],
'output_capsule_tags': []},
{'name': 'regex-classifier',
'version': '1.0.0',
'summary': 'regular expression classifier',
'description': 'Internal hook to enable tag generation based on regular expression matches.',
'output_span_tags': [],
'output_capsule_tags': []}]
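Because the Python binding returns hook metadata as plain dictionaries, you can post-process it locally. For example, a small helper (illustrative only, not part of the SDK) that finds which hooks can emit a particular span tag:

```python
# Illustrative helper, not part of the Antimatter SDK: given the list of
# hook dictionaries returned by list_hooks(), find the hooks that can emit
# a particular span tag.
def hooks_emitting(hooks, span_tag):
    return [h["name"] for h in hooks if span_tag in h["output_span_tags"]]

# A subset of the output shown above, for demonstration:
hooks = [
    {"name": "accurate-pii",
     "output_span_tags": ["tag.antimatter.io/pii/ssn",
                          "tag.antimatter.io/pii/driver_license"]},
    {"name": "fast-pii",
     "output_span_tags": ["tag.antimatter.io/pii/ssn"]},
    {"name": "regex-classifier", "output_span_tags": []},
]

print(hooks_emitting(hooks, "tag.antimatter.io/pii/ssn"))
# → ['accurate-pii', 'fast-pii']
print(hooks_emitting(hooks, "tag.antimatter.io/pii/driver_license"))
# → ['accurate-pii']
```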
am domain hooks
This returns
hooks:
- name: accurate-pii
version: 1.0.0
summary: Accurate PII classifier
description: An accurate PII classifier with a latency of ~1s
outputSpanTags:
- tag.antimatter.io/pii/id
- tag.antimatter.io/pii/date
- tag.antimatter.io/pii/credit_card
- tag.antimatter.io/pii/email_address
- tag.antimatter.io/pii/ip_address
- tag.antimatter.io/pii/location
- tag.antimatter.io/pii/phone_number
- tag.antimatter.io/pii/ssn
- tag.antimatter.io/pii/driver_license
- tag.antimatter.io/pii/age
- tag.antimatter.io/pii/name
- tag.antimatter.io/pii/secret
outputCapsuleTags: []
- name: data-structure-classifier
version: 1.0.0
summary: Data Structure classifier
description: A classifier that tags data based on its structure (e.g. column names or object keys)
outputSpanTags: []
outputCapsuleTags: []
- name: fast-pii
version: 1.0.0
summary: Fast PII classifier
description: A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier
outputSpanTags:
- tag.antimatter.io/pii/credit_card
- tag.antimatter.io/pii/date
- tag.antimatter.io/pii/email_address
- tag.antimatter.io/pii/ip_address
- tag.antimatter.io/pii/location
- tag.antimatter.io/pii/phone_number
- tag.antimatter.io/pii/ssn
- tag.antimatter.io/pii/name
- tag.antimatter.io/pii/driver_license
- tag.antimatter.io/pii/id
- tag.antimatter.io/pii/age
outputCapsuleTags: []
- name: llm-classifier
version: 1.0.0
summary: custom llm classifier
description: Internal hook to enable tag generation based on classification performed by an LLM according to user-supplied classification rules.
outputSpanTags: []
outputCapsuleTags: []
- name: regex-classifier
version: 1.0.0
summary: Regular Expression classifier
description: A classifier that tags data if it matches regular expression rules configured in the write context
outputSpanTags: []
outputCapsuleTags: []
The default write contexts
There are two write contexts that are present by default in a newly created domain:
default
is a write context with no classifiers, appropriate when you just need to encrypt data but are not using any read-context policy rules that reference tags
sensitive
is a write context that uses the fast-pii and data-structure-classifier classifiers
You can edit these write contexts, or create your own.
Creating a write context
To create a write context, use the language bindings or the CLI. A write context needs a name and description, along with the list of hooks to run. It is recommended that your write context specify all the hooks that your read contexts will later require: classification on the write path is more efficient than just-in-time classification on the read path, which occurs whenever a read context requires a hook that was not run when the capsule was encapsulated.
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.add_write_context(
# The name and description of the write context
"my_wctx", am.WriteContextBuilder().
set_summary("A description of your write context for documentation").
# Which hooks need to run on the data. What version of hook should run, and if it should run synchronously or asynchronously
add_hook("data-structure-classifier", ">1.0.0", am.WriteContextHookMode.Sync).
add_hook("accurate-pii", ">1.0.0", am.WriteContextHookMode.Sync)
)
am write-context create \
--name my_wctx \
--summary "A description of your write context for documentation" \
--hook "data-structure-classifier" \
--hook "accurate-pii"
In the current release, only Sync is permitted as a hook mode, which means the classification runs as the data is encapsulated. In a future release we will permit data classification to occur in the background, with the resulting tags being stored detached from the capsule and transparently merged at read time.
Regex rules
If you use the regex-classifier, then you also need to configure rules in your write context that capture the regex patterns you want to match and the tags to apply when they match. A regex rule can apply both capsule tags (which apply to the whole capsule) and span tags (which apply only to the data matched by the pattern). You can also specify that the pattern should be matched against the key rather than the data itself (i.e. the column name or field name of the data). If a rule that matches against the key emits span tags, the span will be set to the whole value. Here is a complete example that shows the regex classifier in action:
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
# Configure a write context to use the regex classifier
amr.add_write_context(
# The name and description of the write context
"my_wctx", am.WriteContextBuilder().
set_summary("A description of your write context for documentation").
add_hook("regex-classifier", ">1.0.0", am.WriteContextHookMode.Sync)
)
amr.insert_write_context_regex_rule("my_wctx", am.WriteContextRegexRuleBuilder("my.*pattern", match_on_key=False)
# add tags to the whole capsule whenever this pattern matches
.add_capsule_tag("mycompany.com/tagname")
.add_capsule_tag("mycompany.com/tagwithvalue", am.TagType.String, "somevalue")
# add a span tag to the data that actually matched the pattern
.add_span_tag("mycompany.com/sensitive")
)
# Add a read context that uses the sensitive tag
amr.add_read_context(
"no_sensitive", am.ReadContextBuilder().
set_summary("Redact everything tagged with our custom sensitive tag")
)
# you can remove all rules from a read context like this, which helps when experimenting with different rules:
# amr.delete_read_context_rules("no_sensitive")
# Add a new rule
amr.add_read_context_rules("no_sensitive", am.ReadContextRuleBuilder().
add_match_expression(am.Source.Tags,
key="mycompany.com/sensitive",
operator=am.Operator.Exists).
set_action(am.Action.Redact)
)
# Encapsulate a sentence and read it back:
cap = amr.encapsulate("this sentence contains my secret pattern", write_context="my_wctx")
amr.load_capsule(data=cap, read_context="no_sensitive").data()
This returns:
'this sentence contains {redacted}'
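To build intuition for how span tags drive redaction, here is a local sketch using Python's re module. This is only an illustration of the semantics, not how Antimatter implements matching and redaction:

```python
import re

# The write context rule tags each span matched by "my.*pattern" with
# mycompany.com/sensitive; the read context rule redacts tagged spans.
# Locally, that is equivalent to replacing each match with a placeholder:
pattern = re.compile(r"my.*pattern")
sentence = "this sentence contains my secret pattern"

redacted = pattern.sub("{redacted}", sentence)
print(redacted)
# → this sentence contains {redacted}
```

Note that the regex is greedy, so "my secret pattern" is matched as a single span.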
# Configure a write context to use the regex classifier
am write-context create \
--name my_wctx \
--summary "A description of your write context for documentation" \
--hook "regex-classifier(>1.0.0, sync)"
am write-context classifier-rule create regex \
--name my_wctx \
--pattern "my.*pattern" \
--capsule-tag "mycompany.com/tagname(unary)" \
--capsule-tag "mycompany.com/tagwithvalue(string, 'somevalue')" \
--span-tag "mycompany.com/sensitive(unary)"
# Add a read context that uses the sensitive tag
am read-context create \
--name no_sensitive \
--summary "Redact everything tagged with our custom sensitive tag"
# Add a data policy
am data-policy create \
--name no_sensitive \
--description "Redact spans tagged with our custom sensitive tag"
# Get the policy ID
POLICY_ID=$(am data-policy list | yq '.policies[] | select(.name == "no_sensitive") | .id')
# Add a rule to the data policy to redact matching spans
am data-policy rule create \
--policy-id ${POLICY_ID} \
--comment "Redact all spans with our custom sensitive tag" \
--effect Redact \
--priority 0 \
--clause '{"operator": "AllOf", "tags": [{"name": "mycompany.com/sensitive", "operator": "Exists"}]}'
# Bind the policy to the no_sensitive read context
am data-policy binding set-read-context-attachment \
--policy-id ${POLICY_ID} \
--read-context-id no_sensitive \
--attachment Attached
# Encapsulate a sentence, writing the capsule to a file, then read it back
am capsule encapsulate --write-context my_wctx --file file.cap <<<"this sentence contains my secret pattern"
am capsule open --read-context no_sensitive --in file.cap
This returns:
this sentence contains {redacted}
Note that classification can optionally be skipped when creating a capsule. This will not generate any capsule tags, and therefore there will be no data redaction. Use the --skip-classification flag as shown below:
am capsule encapsulate --write-context my_wctx --skip-classification --file file.cap <<<"this sentence contains my secret pattern"
You can list your write contexts and their regex rules as follows:
- Python
- CLI
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.list_write_context()
Which returns
[{'name': 'default',
'summary': 'Structured data',
'description': '',
'config': {'required_hooks': []},
'imported': False,
'source_domain_id': None,
'source_domain_name': None},
{'name': 'my_wctx',
'summary': 'A description of your write context for documentation',
'description': '',
'config': {'required_hooks': [{'hook': 'regex-classifier',
'constraint': '>1.0.0',
'mode': 'sync'}]},
'imported': False,
'source_domain_id': None,
'source_domain_name': None},
{'name': 'sensitive',
'summary': 'Structured data',
'description': '',
'config': {'required_hooks': [{'hook': 'fast-pii',
'constraint': '>1.0.0',
'mode': 'sync'}]},
'imported': False,
'source_domain_id': None,
'source_domain_name': None}]
For regex rules, it is:
import antimatter as am
amr = am.Session.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")
amr.list_write_context_regex_rules("my_wctx")
Which, for the above example, returns:
[{'id': 'rl-b5l2a537ysm4itrw',
'pattern': 'my.*pattern',
'match_on_key': False,
'span_tags': [{'name': 'mycompany.com/sensitive',
'value': '',
'type': <TagTypeField.UNARY: 'unary'>}],
'capsule_tags': [{'name': 'mycompany.com/tagname',
'value': '',
'type': <TagTypeField.UNARY: 'unary'>},
{'name': 'mycompany.com/tagwithvalue',
'value': 'somevalue',
'type': <TagTypeField.STRING: 'string'>}]}]
am write-context list
Which returns
writeContexts:
- name: default
summary: Default write context
description: No classification of encapsulated data
config:
keyReuseTTL: 0
requiredHooks: []
imported: false
- name: my_wctx
summary: A description of your write context for documentation
description: ''
config:
keyReuseTTL: 0
requiredHooks:
- hook: regex-classifier
constraint: '>1.0.0'
mode: sync
imported: false
- name: sensitive
summary: Default write context (sensitive data)
description: Classifies data using the fast-pii and data structure classifiers
config:
keyReuseTTL: 0
requiredHooks:
- hook: data-structure-classifier
constraint: '>1.0.0'
mode: sync
- hook: fast-pii
constraint: '>1.0.0'
mode: sync
imported: false
For regex rules, it is:
am write-context classifier-rule list --name my_wctx
Which, for the above example, returns:
- id: rl-xvp5gg661st8ga64
comment: ''
spanTags:
- name: mycompany.com/sensitive
value: ''
type: unary
capsuleTags:
- name: mycompany.com/tagname
value: ''
type: unary
- name: mycompany.com/tagwithvalue
value: somevalue
type: string
regexConfig:
pattern: my.*pattern
matchOnKey: false
Controlling who can use a write context
To control who is allowed to use a write context to create capsules, you need to create Domain Policy Rules.
Key reuse
By default, every encapsulation of data creates a new capsule, encrypted with a unique data encryption key, obtained from the Antimatter API. This is the ideal pattern in most cases, but when working with large amounts of small data that all has the same tags, you may want to treat multiple pieces of data as one capsule, encrypted with the same key. This means that only one entry gets created in the capsule manifest, and only one round trip to the Antimatter API occurs.
There is an option in the write context configuration, called "Key Reuse TTL", that defines a window of time (in seconds) during which encapsulations of data with the same manually provided capsule tags and the same write context are treated as shards of the same capsule.
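The reuse window can be thought of as a small cache keyed by write context and capsule tags, expiring after the TTL. The sketch below is only an illustration of these semantics, not the SDK's implementation:

```python
import time

class KeyReuseCache:
    """Illustrative sketch of Key Reuse TTL semantics (not the SDK's code):
    encapsulations with the same write context and capsule tags within the
    TTL window share one data encryption key, and hence one capsule."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cache = {}  # (write_context, tags) -> (key_id, created_at)

    def key_for(self, write_context, capsule_tags):
        cache_key = (write_context, tuple(sorted(capsule_tags)))
        entry = self.cache.get(cache_key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # reuse: same capsule, no API round trip
        key_id = f"key-{len(self.cache)}-{now}"  # stand-in for a fresh DEK
        self.cache[cache_key] = (key_id, now)
        return key_id

cache = KeyReuseCache(ttl_seconds=60)
k1 = cache.key_for("my_wctx", ["mycompany.com/tagname"])
k2 = cache.key_for("my_wctx", ["mycompany.com/tagname"])
k3 = cache.key_for("my_wctx", ["mycompany.com/other"])
print(k1 == k2)  # True: same tags within the TTL share a key
print(k1 == k3)  # False: different tags start a new capsule
```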