
Capsules

Antimatter works by encrypting data and placing it in an object format we call a capsule. This is a versatile format designed to support tabular data, dictionary/map data (think JSON object), and simple streams of Unicode text. Binary data is also supported, but classification hooks cannot be run on non-Unicode data.

A tabular-native format lets you, for example, reference the structure of the data in a policy, or write a policy rule that removes an entire row from the output if something is detected in one particular column of that row. Tables are also a good way to group related information into records. Don't worry about making your data adhere perfectly to the rules of relational table schemas, though. It's perfectly acceptable to embed a bunch of unstructured data in a column. We support unbounded-size cells and streaming at every layer of the stack, so you can have a 1 TB cell in your table if you want.
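
To make the row-removal idea concrete, the sketch below shows the effect such a rule has on a table, in plain Python. It is a hypothetical illustration only: it does not use the Antimatter policy language, and `drop_rows_matching`, the `notes` column, and the `looks_like_email` detector are all made up for the example.

```python
from typing import Callable

# Hypothetical row type: column name -> cell value.
Row = dict[str, str]


def drop_rows_matching(rows: list[Row], column: str,
                       predicate: Callable[[str], bool]) -> list[Row]:
    """Return only the rows whose `column` value does NOT satisfy `predicate`.

    Illustrates a row-level rule: if something is detected in one column,
    the entire row is removed from the output, not just that cell.
    """
    return [row for row in rows if not predicate(row.get(column, ""))]


def looks_like_email(text: str) -> bool:
    # Hypothetical, deliberately naive detector used only for this example.
    return "@" in text


rows = [
    {"id": "1", "notes": "call back tomorrow"},
    {"id": "2", "notes": "contact alice@example.com"},
    {"id": "3", "notes": "resolved"},
]

visible = drop_rows_matching(rows, "notes", looks_like_email)
print([r["id"] for r in visible])  # ['1', '3']
```

Note that row 2 disappears entirely, including its `id` cell; that is what distinguishes a row-level rule from a cell- or span-level redaction.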

The language libraries include functionality for transparently converting language-level objects (e.g., a pandas DataFrame in Python) to capsules and back, preserving types. The goal is for the data that comes out of a capsule to look as much as possible like the data you put into it.

The lifecycle of a capsule

Encapsulation

Using one of our Language libraries or Integrations, you encapsulate data, passing in the name of the write context to use. The write context determines which classification hooks, if any, run on the data as it is encrypted and placed in the capsule. At encapsulation time you can also pass in any custom tags that apply either to the entire capsule (capsule tags) or to pieces of data within the encapsulated dataset (row, column, or span tags).

The result of encapsulation is a stream or blob of encrypted capsule bytes, which is safe to store anywhere: in a file, in an S3 object, in a database cell, and so on.

Open

To access the data in a capsule, you first open it. In some integrations this happens transparently, but in the language libraries you usually do it explicitly, passing in a read context. The read context captures why you are reading the data, and therefore what processing needs to happen as it is read. Common read contexts include "model_training", "rag_retrieval", and "customer_support".

If policy permits the open (see Policy), the encryption key is returned to the client and the data can be read.

Read

When you read a capsule, you can pass in read parameters that supply any additional information the policy needs to reference. This is common when the identity authenticated with Antimatter is a service account reading on behalf of end users: the read parameters let the application convey who the end user is, along with any other relevant attributes (for example, if the application has a notion of teams or groups, these can be captured in the read parameters). Read parameters are also recorded in the capsule access log, so you can include information that no policy references but that you want in the audit trail.
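
The split between read parameters that policy uses and those kept only for auditing can be modeled in a few lines. This is a hypothetical, self-contained sketch, not the Antimatter API or policy engine: `AccessLog`, `can_read`, the example rule, and the parameter names `end_user`, `team`, and `request_id` are all invented for illustration. Real rules are expressed as Antimatter policy (see Policy).

```python
from dataclasses import dataclass, field


@dataclass
class AccessLog:
    """Hypothetical stand-in for the capsule access log."""
    entries: list = field(default_factory=list)


def can_read(read_context: str, read_params: dict, capsule_team: str,
             log: AccessLog) -> bool:
    """Hypothetical check for a service account reading on behalf of an end user."""
    # Every read parameter is recorded, including ones the rule below ignores.
    log.entries.append({"read_context": read_context, **read_params})
    # Example rule: allow support reads only by a member of the owning team.
    return read_context == "customer_support" and read_params.get("team") == capsule_team


log = AccessLog()
params = {"end_user": "alice@example.com", "team": "emea", "request_id": "r-123"}
print(can_read("customer_support", params, capsule_team="emea", log=log))  # True
print(can_read("customer_support", params, capsule_team="apac", log=log))  # False
print(log.entries[0]["request_id"])  # r-123 (audit-only; the rule never reads it)
```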

The result of the read is plaintext data that has had any required redactions and transformations applied.
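
At the span level, a redaction can be pictured as replacing each tagged span that policy says to hide. The sketch below is plain Python, not Antimatter's implementation; the `(start, end, tag_name)` span representation, the `redact_spans` helper, and the `[REDACTED]` mask are assumptions, while the tag name is borrowed from the manifest example later on this page.

```python
def redact_spans(text: str, spans: list[tuple[int, int, str]],
                 redacted_tags: set[str], mask: str = "[REDACTED]") -> str:
    """Replace every span whose tag is in `redacted_tags` with `mask`.

    `spans` are hypothetical (start, end, tag_name) triples with `end`
    exclusive; they are assumed not to overlap.
    """
    out, cursor = [], 0
    for start, end, tag in sorted(spans):
        if tag in redacted_tags:
            out.append(text[cursor:start])  # keep text before the span
            out.append(mask)                # hide the span itself
            cursor = end
    out.append(text[cursor:])               # keep everything after the last redaction
    return "".join(out)


text = "Ticket from Alice about billing"
spans = [(12, 17, "tag.antimatter.io/pii/first_name")]
print(redact_spans(text, spans, {"tag.antimatter.io/pii/first_name"}))
# Ticket from [REDACTED] about billing
```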

Capsule manifest

Creating a capsule also adds an entry to the capsule manifest, a list of all capsules in a domain along with information about each one (such as its size and tags). You can view the capsule manifest in the web UI or with the language libraries:

import antimatter as am
amr = am.from_api_key(domain_id="dm-xxxxxxxx", api_key="xxxxxxxxx")

list(amr.list_capsules())[:6]

This returns, for example:

[{'id': 'ca-FNCepje2sjvyowyQnArX45',
  'domain': 'dm-dZdLzWR9Go1',
  'capsule_tags': [],
  'span_tags': {'unique_tags': [{'tag': {'name': 'tag.antimatter.io/pii/first_name',
                                         'value': '',
                                         'type': <TagTypeField.STRING: 'string'>,
                                         'source': 'fast',
                                         'hook_version': '1.0.0'},
                                 'occurrences': 1}],
                'elided_tags': []},
  'size': 145,
  'created': datetime.datetime(2024, 3, 23, 19, 2, 2, tzinfo=TzInfo(UTC))},
 {'id': 'ca-1wwmZjPfwiZuY9X1BDhUhv',
  'domain': 'dm-dZdLzWR9Go1',
  'capsule_tags': [],
  'span_tags': {'unique_tags': [], 'elided_tags': []},
  'size': 224,
  'created': datetime.datetime(2024, 3, 23, 19, 2, 2, tzinfo=TzInfo(UTC))},
 {'id': 'ca-1DDcAaFJzJ5a8c4uyzQ69M',
  'domain': 'dm-dZdLzWR9Go1',
  'capsule_tags': [],
  'span_tags': {'unique_tags': [{'tag': {'name': 'mycompany.com/companyid',
                                         'value': 'company1',
                                         'type': <TagTypeField.STRING: 'string'>,
                                         'source': 'manual',
                                         'hook_version': '0.0.0'},
                                 'occurrences': 2},
                                {'tag': {'name': 'mycompany.com/companyid',
                                         'value': 'company2',
                                         'type': <TagTypeField.STRING: 'string'>,
                                         'source': 'manual',
                                         'hook_version': '0.0.0'},
                                 'occurrences': 1},
                                {'tag': {'name': 'mycompany.com/companyid',
                                         'value': 'company3',
                                         'type': <TagTypeField.STRING: 'string'>,
                                         'source': 'manual',
                                         'hook_version': '0.0.0'},
                                 'occurrences': 2}],
                'elided_tags': []},
  'size': 1980,
  'created': datetime.datetime(2024, 3, 23, 19, 1, 47, tzinfo=TzInfo(UTC))},
 {'id': 'ca-8RysvxquSyiWgZH2YEhbqg',
  'domain': 'dm-dZdLzWR9Go1',
  'capsule_tags': [],
  'span_tags': {'unique_tags': [], 'elided_tags': []},
  'size': 679,
  'created': datetime.datetime(2024, 3, 23, 19, 1, 47, tzinfo=TzInfo(UTC))}]

See the language library reference for how to query for capsules by tag and date.
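
If you already have manifest entries in hand, simple aggregations can also be done client-side. The sketch below assumes each entry is a dict shaped like the example output above; the helpers `tag_occurrences`, `capsules_with_tag`, and `span_tag` are hypothetical, and the sample data is trimmed from that output (the `type`, `source`, and `hook_version` fields are dropped). For large domains, the server-side queries described in the reference are likely the better choice, since client-side filtering has to fetch every entry first.

```python
from collections import Counter
from datetime import datetime, timezone


def tag_occurrences(capsules):
    """Total span-tag occurrences per (tag name, tag value) across capsules."""
    counts = Counter()
    for cap in capsules:
        for entry in cap["span_tags"]["unique_tags"]:
            tag = entry["tag"]
            counts[(tag["name"], tag["value"])] += entry["occurrences"]
    return counts


def capsules_with_tag(capsules, tag_name, since):
    """IDs of capsules created at or after `since` that carry span tag `tag_name`."""
    return [
        cap["id"]
        for cap in capsules
        if cap["created"] >= since
        and any(e["tag"]["name"] == tag_name for e in cap["span_tags"]["unique_tags"])
    ]


def span_tag(name, value, occurrences):
    # Trimmed manifest tag entry (type/source/hook_version omitted).
    return {"tag": {"name": name, "value": value}, "occurrences": occurrences}


# Sample entries trimmed from the example output above.
manifest = [
    {"id": "ca-FNCepje2sjvyowyQnArX45",
     "span_tags": {"unique_tags": [span_tag("tag.antimatter.io/pii/first_name", "", 1)]},
     "created": datetime(2024, 3, 23, 19, 2, 2, tzinfo=timezone.utc)},
    {"id": "ca-1wwmZjPfwiZuY9X1BDhUhv",
     "span_tags": {"unique_tags": []},
     "created": datetime(2024, 3, 23, 19, 2, 2, tzinfo=timezone.utc)},
    {"id": "ca-1DDcAaFJzJ5a8c4uyzQ69M",
     "span_tags": {"unique_tags": [
         span_tag("mycompany.com/companyid", "company1", 2),
         span_tag("mycompany.com/companyid", "company2", 1),
         span_tag("mycompany.com/companyid", "company3", 2)]},
     "created": datetime(2024, 3, 23, 19, 1, 47, tzinfo=timezone.utc)},
]

print(tag_occurrences(manifest)[("mycompany.com/companyid", "company1")])  # 2
print(capsules_with_tag(manifest, "mycompany.com/companyid",
                        since=datetime(2024, 3, 23, tzinfo=timezone.utc)))
# ['ca-1DDcAaFJzJ5a8c4uyzQ69M']
```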

Bundles

A capsule exists in a single domain and is encrypted using that domain's configuration. If you want to partition the data you are encapsulating across multiple domains, use a bundle. A bundle can be thought of as multiple capsules stored together in one object. This lets you, for example, encapsulate a multi-tenant dataset into a single object while still encrypting each tenant's data with its own key, in its own domain. A bundle is therefore a cross-domain object; however, you still open and read a bundle in the context of a single domain. See the Peering section for more information about how this works.
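
One way to picture a bundle is as a dataset split by owning domain, with each part then encapsulated under that domain's key. The sketch below shows only the splitting step, in plain Python; it is not Antimatter's implementation, the encryption itself is omitted, and `TENANT_DOMAINS`, `partition_by_domain`, the `companyid` column, and the domain IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical tenant -> domain mapping; real domain IDs come from your own setup.
TENANT_DOMAINS = {"company1": "dm-tenant1", "company2": "dm-tenant2"}


def partition_by_domain(rows, tenant_column="companyid"):
    """Group rows by the domain that owns each row's tenant.

    A tenant with no mapped domain raises KeyError instead of silently
    landing in another tenant's partition.
    """
    parts = defaultdict(list)
    for row in rows:
        parts[TENANT_DOMAINS[row[tenant_column]]].append(row)
    return dict(parts)


rows = [
    {"companyid": "company1", "text": "Q1 forecast"},
    {"companyid": "company2", "text": "renewal notes"},
    {"companyid": "company1", "text": "support ticket"},
]
for domain, part in partition_by_domain(rows).items():
    print(domain, len(part))
# dm-tenant1 2
# dm-tenant2 1
```

Failing loudly on an unmapped tenant is a deliberate choice in this sketch: in a multi-tenant setting, misplacing a row is worse than rejecting it.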