
Classification and Redaction

To get started with Antimatter, we'll explore a common, yet challenging, use case: detecting and redacting personally identifiable information (PII) in semi-structured and unstructured data sets. This tutorial will guide you through the process step-by-step, demonstrating how Antimatter's tools can help secure sensitive data.

This quickstart uses our Python library and assumes you are working in a notebook. To install the dependencies this demo requires:

pip install antimatter==1.1.0 pandas pyarrow

You can also run this example on Google Colab or Binder.

First, create an Antimatter domain:

import antimatter
amr = antimatter.new_domain("my@email.com")

This sends a confirmation email to your address; it can take a few minutes to arrive. Click the button in the email to activate your domain before proceeding.

We can print the details of the domain we just created. Save these: you can use them to log in to the domain with the CLI or to use the Python library with an existing domain:

# Print domain details
amr.config()
# To interact with an existing domain:
# amr = antimatter.Session.from_api_key(domain_id="dm-xxxxxxxxxxx", api_key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
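If you reconnect to an existing domain, you may prefer not to paste the API key into the notebook. A minimal sketch, assuming you export the values yourself: the environment variable names `ANTIMATTER_DOMAIN_ID` and `ANTIMATTER_API_KEY` and the helper `load_credentials` are hypothetical, chosen for this example, and are not something the library is known to read on its own.

```python
import os


def load_credentials(env=os.environ):
    """Return the domain ID and API key from environment variables.

    The variable names are hypothetical, picked for this sketch.
    """
    names = ("ANTIMATTER_DOMAIN_ID", "ANTIMATTER_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "domain_id": env["ANTIMATTER_DOMAIN_ID"],
        "api_key": env["ANTIMATTER_API_KEY"],
    }


# Usage with the constructor shown in the comment above:
# amr = antimatter.Session.from_api_key(**load_credentials())
```

The returned keys match the `domain_id` and `api_key` keyword arguments of `from_api_key`, so the dictionary can be unpacked directly into the call.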

Now that we have an Antimatter domain, let's see how we can use it to classify some sensitive data and redact it. Let's load a Parquet file that contains a mix of structured data and embedded unstructured data (a comments column):

import pandas

df = pandas.read_parquet("https://get.antimatter.io/data/example_data.parquet")
df

You should get output like this:

|    |   id | first_name   | last_name   | email                    | gender   | ip_address     | cc               | country                | birthdate   | title                    | comments                                                                                                                                                                                                                                                                                |
|---:|-----:|:-------------|:------------|:-------------------------|:---------|:---------------|:-----------------|:-----------------------|:------------|:-------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 1 | Amanda | Jordan | ajordan0@com.com | Female | 1.197.201.2 | 6759521864920116 | Indonesia | 3/8/1971 | Internal Auditor | Hello friends, my name is Alice Johnson and I just turned 29 years old! 🎉 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567. |
| 1 | 2 | Albert | Freeman | afreeman1@is.gd | Male | 218.111.175.34 | | Canada | 1/16/1968 | Accountant IV | Customer feedback: I recently visited your store at 5678 Pine Avenue, Dallas, TX 75201. My name is Jane Doe, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at janedoe@yahoo.com for any further details. |
| 2 | 3 | Evelyn | Morgan | emorgan2@altervista.org | Female | 7.161.136.94 | 6767119071901597 | Russia | 2/1/1960 | Structural Engineer | Booking Confirmation: Thank you, David Smith (DOB: 01/12/1978) for booking with us. We have received your payment through the credit card ending with 1234. Your booking ID is #67890. Please save this email for your records. For any queries, contact us at david.smith@hotmail.com. |
| 3 | 4 | Denise | Riley | driley3@gmpg.org | Female | 140.35.109.83 | 3576031598965625 | China | 4/8/1997 | Senior Cost Accountant | Hi, I am Emily Brown, aged 33, and I recently moved to 123 Harmony Lane, Los Angeles, CA 90001. I am looking to make new friends in the neighborhood. Feel free to call me at 323-987-6543 or email me at emilybrown@aol.com. |
| 4 | 5 | Carlos | Burns | cburns4@miitbeian.gov.cn | | 169.113.235.40 | 5602256255204850 | South Africa | | | Urgent: My name is Sarah Lee, my SSN is 512-34-6789. I noticed some unauthorized transactions on my credit card number ending in 5678. I am 39 years old, and I urgently need assistance with this. Please contact me at 213-123-9876 or sarahlee@gmail.com. |
| 5 | 6 | Kathryn | White | kwhite5@google.com | Female | 195.131.81.179 | 3583136326049310 | Indonesia | 2/25/1983 | Account Executive | Hello, I'm Mark Thompson. I’m 36 years old, residing at 3456 Elm Street, Austin, TX 78701. If anyone nearby wants to connect, feel free to email me at mark.thompson@yahoo.com or call 512-345-6789. |
| 6 | 7 | Samuel | Holmes | sholmes6@foxnews.com | Male | 232.234.81.197 | 3582641366974690 | Portugal | 12/18/1987 | Senior Financial Analyst | Hi, my name is Michael Martinez, I am 40 years old, and my SSN is 543-21-6789. Please contact me regarding my account details at 415-234-5678 or michael.martinez@hotmail.com. |
| 7 | 8 | Harry | Howell | hhowell7@eepurl.com | Male | 91.235.51.73 | | Bosnia and Herzegovina | 3/1/1962 | Web Developer IV | Customer Feedback: I'm Linda White, 32 years old. I had a great experience shopping online at your store. Reach me at 456 Elm Street, Phoenix, AZ 85001 or linda.white@gmail.com for further feedback. |
| 8 | 9 | Jose | Foster | jfoster8@yelp.com | Male | 132.31.53.61 | | South Korea | 3/27/1992 | Software Test Engineer I | Hey, it’s Lisa Davis, I am 28 years old. I noticed a discrepancy in my latest bill. My address is 789 Pine Street, Miami, FL 33101. Please, get in touch at lisa.davis@aol.com or 305-123-4567. |
| 9 | 10 | Emily | Stewart | estewart9@opensource.org | Female | 143.28.251.245 | 3574254110301671 | Nigeria | 1/28/1997 | Health Coach IV | Support Request: My name is Joseph Johnson. I am facing issues with my recent purchase. Reach me at 123-45-6789 or at joseph.johnson@hotmail.com for order number #56789 details. |

You can see that we have a bunch of data here that would probably be considered sensitive in some contexts. Some of it is in clearly labelled columns, which might be easy to deal with manually, but some of it is embedded within free-form text in the comments column. That's pretty common whenever you are storing data coming from users: they can (and often do) enter sensitive info that needs special handling.
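To see why free-form text is harder than labelled columns, here is a minimal standard-library sketch that scans one comment with two naive regular expressions. This only illustrates the problem; it is not how the Antimatter classifiers work, and the patterns and the `scan` helper are assumptions made for this example.

```python
import re

# Naive patterns chosen for this illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(text):
    """Return every pattern match found in text, keyed by pattern name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# Abridged from the comments value in row 4 of the table above.
comment = (
    "Urgent: My name is Sarah Lee, my SSN is 512-34-6789. "
    "Please contact me at 213-123-9876 or sarahlee@gmail.com."
)
print(scan(comment))
# {'email': ['sarahlee@gmail.com'], 'ssn': ['512-34-6789']}
```

The email address and SSN are found because they follow a fixed shape, but the name "Sarah Lee" has no pattern to match. Gaps like that are the reason to run a classifier over the text.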

Let's encapsulate this data. A capsule is Antimatter's object format for tagged data. It stores the full set of data as well as all the classification tags. It's encrypted, so it can be stored anywhere without worrying that the sensitive data is exposed to anyone who can access the object. Having an intermediate file format lets you classify once and then reuse the result many times. This is convenient, because classification is often fairly heavyweight.

When we encapsulate, we have to specify a write_context that contains the configuration for which classifiers to run on the data. New domains come with a default write context that does no classification, and a sensitive write context that uses the fast-pii classifier to tag common PII. You can change the configuration of these write contexts or add new ones as needed, but we're going to just use sensitive for now.

capsule = amr.encapsulate(df, write_context="sensitive")

Once the classification is done, we can write the capsule to a file, or just use it directly to read the data. When reading data, you need to specify a read_context, which contains the configuration for what redaction and transformation should occur on the data. New domains come with a default read context that does no redaction, but we will add some rules to it in a bit:

capsule.save("mycapsule.ca")
data = amr.load_capsule(data=capsule, read_context="default").data()
data

You should see the same data, and you'll notice that we got back a Pandas dataframe. Antimatter stores some information about the shape of the data during encapsulation, and automatically presents the data in the same form when reading (in this case, a Pandas dataframe). This lets you insert Antimatter into your data pipeline without impacting any of the operations that happen after the read. You can use .data_as() instead of .data() if you'd like to read the data in a different format.

So now, let's do some redaction. This is achieved by adding a rule to the read context. Rules can be fairly complex, to deal with advanced cases like reproducing permissions that existed in the original source of data, but we're going to make a simple one that just references a Tag and redacts if it exists:

# Redact every span that carries the name tag
amr.add_read_context_rules('default', antimatter.ReadContextRuleBuilder()
    .add_match_expression(antimatter.Source.Tags,
                          key="tag.antimatter.io/pii/name",
                          operator=antimatter.Operator.Exists)
    .set_action(antimatter.Action.Redact))

Now, if we read the data again, we'll see that names have been redacted not only in the name columns but also in the comments:

|    |   id | first_name   | last_name   | email                    | gender   | ip_address     | cc               | country                | birthdate   | title                    | comments                                                                                                                                                                                                                                                                               |
|---:|-----:|:-------------|:------------|:-------------------------|:---------|:---------------|:-----------------|:-----------------------|:------------|:-------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 1 | {redacted} | {redacted} | ajordan0@com.com | Female | 1.197.201.2 | 6759521864920116 | Indonesia | 3/8/1971 | Internal Auditor | Hello friends, my name is {redacted} and I just turned 29 years old! 🎉 I am looking forward to connecting with all of you. Feel free to drop me a line at alice.johnson@gmail.com or call me at 415-123-4567. |
| 1 | 2 | {redacted} | {redacted} | afreeman1@is.gd | Male | 218.111.175.34 | | Canada | 1/16/1968 | Accountant IV | Customer feedback: I recently visited your store at 5678 Pine Avenue, Dallas, TX 75201. My name is {redacted}, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at janedoe@yahoo.com for any further details. |
| 2 | 3 | {redacted} | {redacted} | emorgan2@altervista.org | Female | 7.161.136.94 | 6767119071901597 | Russia | 2/1/1960 | Structural Engineer | Booking Confirmation: Thank you, {redacted} (DOB: 01/12/1978) for booking with us. We have received your payment through the credit card ending with 1234. Your booking ID is #67890. Please save this email for your records. For any queries, contact us at david.smith@hotmail.com. |
| 3 | 4 | {redacted} | {redacted} | driley3@gmpg.org | Female | 140.35.109.83 | 3576031598965625 | China | 4/8/1997 | Senior Cost Accountant | Hi, I am {redacted}, aged 33, and I recently moved to 123 Harmony Lane, Los Angeles, CA 90001. I am looking to make new friends in the neighborhood. Feel free to call me at 323-987-6543 or email me at emilybrown@aol.com. |
| 4 | 5 | {redacted} | {redacted} | cburns4@miitbeian.gov.cn | | 169.113.235.40 | 5602256255204850 | South Africa | | | Urgent: My name is {redacted}, my SSN is 512-34-6789. I noticed some unauthorized transactions on my credit card number ending in 5678. I am 39 years old, and I urgently need assistance with this. Please contact me at 213-123-9876 or sarahlee@gmail.com. |
| 5 | 6 | {redacted} | {redacted} | kwhite5@google.com | Female | 195.131.81.179 | 3583136326049310 | Indonesia | 2/25/1983 | Account Executive | Hello, I'm {redacted}. I’m 36 years old, residing at 3456 Elm Street, Austin, TX 78701. If anyone nearby wants to connect, feel free to email me at mark.thompson@yahoo.com or call 512-345-6789. |
| 6 | 7 | {redacted} | {redacted} | sholmes6@foxnews.com | Male | 232.234.81.197 | 3582641366974690 | Portugal | 12/18/1987 | Senior Financial Analyst | Hi, my name is {redacted}, I am 40 years old, and my SSN is 543-21-6789. Please contact me regarding my account details at 415-234-5678 or michael.martinez@hotmail.com. |
| 7 | 8 | {redacted} | {redacted} | hhowell7@eepurl.com | Male | 91.235.51.73 | | Bosnia and Herzegovina | 3/1/1962 | Web Developer IV | Customer Feedback: I'm {redacted}, 32 years old. I had a great experience shopping online at your store. Reach me at 456 Elm Street, Phoenix, AZ 85001 or linda.white@gmail.com for further feedback. |
| 8 | 9 | {redacted} | {redacted} | jfoster8@yelp.com | Male | 132.31.53.61 | | South Korea | 3/27/1992 | Software Test Engineer I | Hey, it’s {redacted}, I am 28 years old. I noticed a discrepancy in my latest bill. My address is 789 Pine Street, Miami, FL 33101. Please, get in touch at lisa.davis@aol.com or 305-123-4567. |
| 9 | 10 | {redacted} | {redacted} | estewart9@opensource.org | Female | 143.28.251.245 | 3574254110301671 | Nigeria | 1/28/1997 | Health Coach {redacted} | Support Request: My name is {redacted}. I am facing issues with my recent purchase. Reach me at 123-45-6789 or at joseph.johnson@hotmail.com for order number #56789 details. |
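One way to picture what the rule does is as a filter over tagged character spans: classification attaches tags to ranges of text, and a read rule replaces the ranges whose tag matches. The sketch below is a conceptual model only, not Antimatter's implementation; the `Span` type and the `span_of` and `redact` helpers are hypothetical names introduced for this illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    tag: str


def span_of(text, needle, tag):
    """Build a Span covering the first occurrence of needle in text."""
    start = text.index(needle)
    return Span(start, start + len(needle), tag)


def redact(text, spans, redacted_tags, placeholder="{redacted}"):
    """Replace every span whose tag is in redacted_tags with the placeholder."""
    out, cursor = [], 0
    for span in sorted(s for s in spans if s.tag in redacted_tags):
        if span.start >= cursor:
            # Copy the untouched text before the span, then the placeholder.
            out.append(text[cursor:span.start])
            out.append(placeholder)
        # A span overlapping the previous one is absorbed into its placeholder.
        cursor = max(cursor, span.end)
    out.append(text[cursor:])
    return "".join(out)


text = "Hi, I am Emily Brown, aged 33."
spans = [
    span_of(text, "Emily Brown", "tag.antimatter.io/pii/name"),
    span_of(text, "33", "tag.antimatter.io/pii/age"),
]
print(redact(text, spans, {"tag.antimatter.io/pii/name"}))
# Hi, I am {redacted}, aged 33.
```

Because the tags are stored with the data and the rules are applied at read time, changing which tags are redacted does not require classifying the text again.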

We added the rule to the default read context, but the purpose of read contexts is to capture policy about what data can be used under which conditions. So you might have different read contexts for different use cases (e.g. model training) or for different teams (e.g. fraud). Let's make a new read context for analytics and configure some rules to redact more of the PII in this dataset:

# add a new read context
amr.add_read_context('analytics', antimatter.ReadContextBuilder()
    .set_summary("redacts data for use in analytics"))

# set some rules: redact names, email addresses, and SSNs
amr.add_read_context_rules('analytics', antimatter.ReadContextRuleBuilder()
    .add_match_expression(antimatter.Source.Tags,
                          key="tag.antimatter.io/pii/name",
                          operator=antimatter.Operator.Exists)
    .set_action(antimatter.Action.Redact))
amr.add_read_context_rules('analytics', antimatter.ReadContextRuleBuilder()
    .add_match_expression(antimatter.Source.Tags,
                          key="tag.antimatter.io/pii/email_address",
                          operator=antimatter.Operator.Exists)
    .set_action(antimatter.Action.Redact))
amr.add_read_context_rules('analytics', antimatter.ReadContextRuleBuilder()
    .add_match_expression(antimatter.Source.Tags,
                          key="tag.antimatter.io/pii/ssn",
                          operator=antimatter.Operator.Exists)
    .set_action(antimatter.Action.Redact))

Now, if we load the capsule again, we'll see that more of the PII has been redacted:

amr.load_capsule(data=capsule, read_context="analytics").data()

Which prints:

|    |   id | first_name   | last_name   | email      | gender   | ip_address     | cc               | country                | birthdate   | title                    | comments                                                                                                                                                                                                                                                                  |
|---:|-----:|:-------------|:------------|:-----------|:---------|:---------------|:-----------------|:-----------------------|:------------|:-------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 1 | {redacted} | {redacted} | {redacted} | Female | 1.197.201.2 | 6759521864920116 | Indonesia | 3/8/1971 | Internal Auditor | Hello friends, my name is {redacted} and I just turned 29 years old! 🎉 I am looking forward to connecting with all of you. Feel free to drop me a line at {redacted} or call me at 415-123-4567. |
| 1 | 2 | {redacted} | {redacted} | {redacted} | Male | 218.111.175.34 | | Canada | 1/16/1968 | Accountant IV | Customer feedback: I recently visited your store at 5678 Pine Avenue, Dallas, TX 75201. My name is {redacted}, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at {redacted} for any further details. |
| 2 | 3 | {redacted} | {redacted} | {redacted} | Female | 7.161.136.94 | 6767119071901597 | Russia | 2/1/1960 | Structural Engineer | Booking Confirmation: Thank you, {redacted} (DOB: 01/12/1978) for booking with us. We have received your payment through the credit card ending with 1234. Your booking ID is #67890. Please save this email for your records. For any queries, contact us at {redacted}. |
| 3 | 4 | {redacted} | {redacted} | {redacted} | Female | 140.35.109.83 | 3576031598965625 | China | 4/8/1997 | Senior Cost Accountant | Hi, I am {redacted}, aged 33, and I recently moved to 123 Harmony Lane, Los Angeles, CA 90001. I am looking to make new friends in the neighborhood. Feel free to call me at 323-987-6543 or email me at {redacted}. |
| 4 | 5 | {redacted} | {redacted} | {redacted} | | 169.113.235.40 | 5602256255204850 | South Africa | | | Urgent: My name is {redacted}, my SSN is {redacted}. I noticed some unauthorized transactions on my credit card number ending in 5678. I am 39 years old, and I urgently need assistance with this. Please contact me at 213-123-9876 or {redacted}. |
| 5 | 6 | {redacted} | {redacted} | {redacted} | Female | 195.131.81.179 | 3583136326049310 | Indonesia | 2/25/1983 | Account Executive | Hello, I'm {redacted}. I’m 36 years old, residing at 3456 Elm Street, Austin, TX 78701. If anyone nearby wants to connect, feel free to email me at {redacted} or call 512-345-6789. |
| 6 | 7 | {redacted} | {redacted} | {redacted} | Male | 232.234.81.197 | 3582641366974690 | Portugal | 12/18/1987 | Senior Financial Analyst | Hi, my name is {redacted}, I am 40 years old, and my SSN is {redacted}. Please contact me regarding my account details at 415-234-5678 or {redacted}. |
| 7 | 8 | {redacted} | {redacted} | {redacted} | Male | 91.235.51.73 | | Bosnia and Herzegovina | 3/1/1962 | Web Developer IV | Customer Feedback: I'm {redacted}, 32 years old. I had a great experience shopping online at your store. Reach me at 456 Elm Street, Phoenix, AZ 85001 or {redacted} for further feedback. |
| 8 | 9 | {redacted} | {redacted} | {redacted} | Male | 132.31.53.61 | | South Korea | 3/27/1992 | Software Test Engineer I | Hey, it’s {redacted}, I am 28 years old. I noticed a discrepancy in my latest bill. My address is 789 Pine Street, Miami, FL 33101. Please, get in touch at {redacted} or 305-123-4567. |
| 9 | 10 | {redacted} | {redacted} | {redacted} | Female | 143.28.251.245 | 3574254110301671 | Nigeria | 1/28/1997 | Health Coach {redacted} | Support Request: My name is {redacted}. I am facing issues with my recent purchase. Reach me at 123-45-6789 or at {redacted} for order number #56789 details. |
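The three nearly identical rule calls used for the analytics context can be collapsed into a loop. The sketch below uses only the builder calls already shown in this tutorial; the helper name `add_redact_rules` is hypothetical, and the session and the `antimatter` module are passed in as arguments so the function itself carries no import dependency.

```python
def add_redact_rules(session, am, read_context, tags):
    """Add one "redact if this tag exists" rule per tag to a read context.

    session: the object returned by antimatter.new_domain(...)
    am: the imported antimatter module
    """
    for tag in tags:
        rule = (
            am.ReadContextRuleBuilder()
            .add_match_expression(am.Source.Tags, key=tag, operator=am.Operator.Exists)
            .set_action(am.Action.Redact)
        )
        session.add_read_context_rules(read_context, rule)


# Usage in the notebook from this tutorial:
# add_redact_rules(amr, antimatter, "analytics", [
#     "tag.antimatter.io/pii/name",
#     "tag.antimatter.io/pii/email_address",
#     "tag.antimatter.io/pii/ssn",
# ])
```

Keeping the tag list in one place makes it easier to review which categories a read context redacts.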

You can see which tags are available to reference in your rules by calling list_hooks. The sensitive write context uses fast-pii by default:

amr.list_hooks()

Which prints:

[{'name': 'accurate-pii',
'version': '1.0.0',
'summary': 'An accurate classifier',
'description': 'An accurate PII classifier with a latency of ~1s',
'output_span_tags': ['tag.antimatter.io/pii/id',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/driver_license',
'tag.antimatter.io/pii/age',
'tag.antimatter.io/pii/name',
'tag.antimatter.io/pii/secret'],
'output_capsule_tags': []},
{'name': 'data-structure-classifier',
'version': '1.0.0',
'summary': 'Data structure classifier',
'description': 'Tagging based on the capsule data structure.',
'output_span_tags': [],
'output_capsule_tags': []},
{'name': 'fast-pii',
'version': '1.0.0',
'summary': 'A fast PII classifier',
'description': 'A fast PII classifier with a latency of 10-30 ms, but does not perform as well as the accurate classifier',
'output_span_tags': ['tag.antimatter.io/pii/credit_card',
'tag.antimatter.io/pii/date',
'tag.antimatter.io/pii/email_address',
'tag.antimatter.io/pii/ip_address',
'tag.antimatter.io/pii/location',
'tag.antimatter.io/pii/phone_number',
'tag.antimatter.io/pii/ssn',
'tag.antimatter.io/pii/name',
'tag.antimatter.io/pii/driver_license',
'tag.antimatter.io/pii/id',
'tag.antimatter.io/pii/age'],
'output_capsule_tags': []},
{'name': 'regex-classifier',
'version': '1.0.0',
'summary': 'regular expression classifier',
'description': 'Tagging based on regular expression matches.',
'output_span_tags': [],
'output_capsule_tags': []}]
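Since the output is a list of dictionaries, it is easy to compare classifiers programmatically. A minimal sketch, assuming `list_hooks()` returns the structure printed above: the data below is copied from that output (other fields omitted), and `tags_by_hook` is a hypothetical helper name.

```python
PREFIX = "tag.antimatter.io/pii/"

# Hook names and span tags copied from the list_hooks() output above.
hooks = [
    {"name": "accurate-pii",
     "output_span_tags": [PREFIX + t for t in (
         "id", "date", "credit_card", "email_address", "ip_address", "location",
         "phone_number", "ssn", "driver_license", "age", "name", "secret")]},
    {"name": "fast-pii",
     "output_span_tags": [PREFIX + t for t in (
         "credit_card", "date", "email_address", "ip_address", "location",
         "phone_number", "ssn", "name", "driver_license", "id", "age")]},
]


def tags_by_hook(hooks):
    """Map each hook name to the set of span tags it can emit."""
    return {hook["name"]: set(hook["output_span_tags"]) for hook in hooks}


tags = tags_by_hook(hooks)
only_accurate = tags["accurate-pii"] - tags["fast-pii"]
print(sorted(only_accurate))
# ['tag.antimatter.io/pii/secret']
```

In the notebook you could pass `amr.list_hooks()` to `tags_by_hook` instead of the copied data. The result shows that a rule keyed on the secret tag would never match data encapsulated with fast-pii alone.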

One of the advantages of using Antimatter is that the policy captured in the read context is separate from your data pipeline. Often, the rules of what data is allowed to be used by whom are actually decided by different stakeholders (e.g. the security or legal teams) than the folks who are doing data cleaning and augmentation for the purposes of analytics.

The data policy rules can be configured by anyone who is invited to the domain, using the Python library, the command line tool, or the web app. Let's create an API key for a colleague on the security team to configure the read contexts. For simplicity, we'll make them an admin:

# Create an API key principal with admin capability
apik = amr.insert_identity_provider_principal('apikey',
    capabilities={'admin': None},
    principal_type=antimatter.PrincipalType.ApiKey)
print(f"Login with --domain-id={amr.config()['domain_id']} --api-key={apik['api_key']}")

Which, for example, prints something like:

Login with --domain-id=dm-AnV7uMCPQ3r --api-key=YXBpa2V5OlhWdFhNnmdmYWEzS3dYRjY5d053UnBVdWFoeEp3Vkpn

They can use the CLI (for example) to interact with the domain by first downloading the CLI and logging in, substituting your domain ID and API key. The download URL below is the macOS arm64 build:

sudo curl https://get.antimatter.io/cli/darwin/arm64/am -o /usr/local/bin/am
sudo chmod a+x /usr/local/bin/am

am config domain login --domain-id=dm-AnV7uMCPQ3r --api-key=YXBpa2V5OlhWdFhNnmdmYWEzS3dYRjY5d053UnBVdWFoeEp3Vkpn

Now they can list the read contexts:

am read-context list
readContexts:
- name: analytics
  summary: redacts data for use in analytics
  description: ''
  disableReadLogging: false
  keyCacheTTL: 0
  readParameters: []
  imported: false
- name: default
  summary: Default read context
  description: The default read context
  disableReadLogging: false
  keyCacheTTL: 0
  readParameters: []
  imported: false

They can add a data policy rule, for example to redact physical addresses, as follows.

First, create a new data policy.

am data-policy create \
--name redact-location \
--description "Redact all spans with a location tag"
policyID: pl-mhjm8gbjz5x11529

Now create a data policy rule. The effect is Redact, and the clause matches any span tagged with the tag.antimatter.io/pii/location tag.

am data-policy rule create \
--policy-id pl-mhjm8gbjz5x11529 \
--comment "Redact all spans with a location tag" \
--effect Redact \
--priority 0 \
--clause '{"operator": "AllOf", "tags": [{"name": "tag.antimatter.io/pii/location", "operator": "Exists"}]}'
newRules:
- rl-lsv7ez3f3rap5wdl
deletedRules: []
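Hand-writing the JSON for `--clause` inside shell quotes is easy to get wrong. One option is to generate it in Python. This is a minimal sketch that mirrors the clause structure used in the command above; the helper name `exists_clause` is hypothetical, and no clause fields beyond those shown in the command are assumed.

```python
import json
import shlex


def exists_clause(tag_name):
    """Build the --clause JSON that matches any span carrying tag_name."""
    return json.dumps({
        "operator": "AllOf",
        "tags": [{"name": tag_name, "operator": "Exists"}],
    })


clause = exists_clause("tag.antimatter.io/pii/location")
# shlex.quote wraps the JSON in single quotes so it can be pasted after --clause.
print(shlex.quote(clause))
```

The printed value is the same single-quoted JSON string passed to `--clause` in the command above.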

Finally, attach the data policy to the analytics read context by setting the attachment state to Attached.

am data-policy binding set-read-context-attachment \
--policy-id pl-mhjm8gbjz5x11529 \
--read-context-id analytics \
--attachment Attached

To review the attachment state of a given policy for each read context, describe the bindings for the policy.

am data-policy binding describe --policy-id pl-mhjm8gbjz5x11529
policyId: pl-mhjm8gbjz5x11529
policyName: redact-location
imported: false
readContexts:
- name: analytics
  configuration: Attached
  status: Attached
  source: ContextConfiguration
- name: default
  configuration: NotAttached
  status: NotAttached
  source: Default
defaultAttachment: NotAttached

Now, in Python, if we read the same capsule as before, we will see that addresses have been redacted. We don't need to re-encapsulate or re-materialize any datasets. For example purposes, let's read it from the file instead of using the capsule variable:

# we saved "mycapsule.ca" earlier
amr.load_capsule(path="mycapsule.ca",read_context="analytics").data()

We will see:

|    |   id | first_name   | last_name   | email      | gender   | ip_address     | cc               | country                   | birthdate   | title                    | comments                                                                                                                                                                                                                                                                  |
|---:|-----:|:-------------|:------------|:-----------|:---------|:---------------|:-----------------|:--------------------------|:------------|:-------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 1 | {redacted} | {redacted} | {redacted} | Female | 1.197.201.2 | 6759521864920116 | {redacted} | 3/8/1971 | Internal Auditor | Hello friends, my name is {redacted} and I just turned 29 years old! 🎉 I am looking forward to connecting with all of you. Feel free to drop me a line at {redacted} or call me at 415-123-4567. |
| 1 | 2 | {redacted} | {redacted} | {redacted} | Male | 218.111.175.34 | | {redacted} | 1/16/1968 | Accountant IV | Customer feedback: I recently visited your store at {redacted}. My name is {redacted}, age 43. I had a wonderful experience and the staff was very friendly. You can reach out to me at {redacted} for any further details. |
| 2 | 3 | {redacted} | {redacted} | {redacted} | Female | 7.161.136.94 | 6767119071901597 | {redacted} | 2/1/1960 | Structural Engineer | Booking Confirmation: Thank you, {redacted} (DOB: 01/12/1978) for booking with us. We have received your payment through the credit card ending with 1234. Your booking ID is #67890. Please save this email for your records. For any queries, contact us at {redacted}. |
| 3 | 4 | {redacted} | {redacted} | {redacted} | Female | 140.35.109.83 | 3576031598965625 | {redacted} | 4/8/1997 | Senior Cost Accountant | Hi, I am {redacted}, aged 33, and I recently moved to {redacted}. I am looking to make new friends in the neighborhood. Feel free to call me at 323-987-6543 or email me at {redacted}. |
| 4 | 5 | {redacted} | {redacted} | {redacted} | | 169.113.235.40 | 5602256255204850 | South Africa | | | Urgent: My name is {redacted}, my SSN is {redacted}. I noticed some unauthorized transactions on my credit card number ending in 5678. I am 39 years old, and I urgently need assistance with this. Please contact me at 213-123-9876 or {redacted}. |
| 5 | 6 | {redacted} | {redacted} | {redacted} | Female | 195.131.81.179 | 3583136326049310 | {redacted} | 2/25/1983 | Account Executive | Hello, I'm {redacted}. I’m 36 years old, residing at {redacted}. If anyone nearby wants to connect, feel free to email me at {redacted} or call 512-345-6789. |
| 6 | 7 | {redacted} | {redacted} | {redacted} | Male | 232.234.81.197 | 3582641366974690 | {redacted} | 12/18/1987 | Senior Financial Analyst | Hi, my name is {redacted}, I am 40 years old, and my SSN is {redacted}. Please contact me regarding my account details at 415-234-5678 or {redacted}. |
| 7 | 8 | {redacted} | {redacted} | {redacted} | Male | 91.235.51.73 | | {redacted} and {redacted} | 3/1/1962 | Web Developer IV | Customer Feedback: I'm {redacted}, 32 years old. I had a great experience shopping online at your store. Reach me at {redacted} or {redacted} for further feedback. |
| 8 | 9 | {redacted} | {redacted} | {redacted} | Male | 132.31.53.61 | | {redacted} | 3/27/1992 | Software Test Engineer I | Hey, it’s {redacted}, I am 28 years old. I noticed a discrepancy in my latest bill. My address is {redacted}. Please, get in touch at {redacted} or 305-123-4567. |
| 9 | 10 | {redacted} | {redacted} | {redacted} | Female | 143.28.251.245 | 3574254110301671 | {redacted} | 1/28/1997 | Health Coach {redacted} | Support Request: My name is {redacted}. I am facing issues with my recent purchase. Reach me at 123-45-6789 or at {redacted} for order number #56789 details. |

You can see that the addresses are now redacted too, as are most values in the country column.

We used a Pandas dataframe above, but you can encapsulate data of many different shapes. For example, even a plain string can be encapsulated by itself:

string_cap = amr.encapsulate(
"""
This works with arbitrary data, e.g. 'contact Alan McKinsey at some@email.com'",
We support many shapes of data, like strings, dicts, lists of dicts, pandas dataframes, pytorch data loaders etc
""", write_context="sensitive")
print(amr.load_capsule(data=string_cap, read_context="analytics").data())

Which should print:

    This works with arbitrary data, e.g. 'contact {redacted}'",
We support many shapes of data, like strings, dicts, lists of dicts, pandas dataframes, pytorch data loaders etc
