Pangeanic Anonymization Solution (PAS) is built on Neural Networks (NN). These networks process text and output annotated content that identifies entities such as person names and addresses.
However, Neural Networks only identify patterns they were trained on, meaning they may not detect new entities or adapt to specific use cases.
To improve identification performance, PAS combines Neural Networks with:
Dictionaries
Rules
Anonymization Dictionaries
The simplest way to help the NN identify an entity is to declare it in a dictionary. These are called Anon Dictionaries as they list terms that can be anonymized.
For example, in a hospital setting where you want to ensure that all doctor names are anonymized, you can create an Anon Dictionary named DoctorsList. This file simply contains a list of names, one per line, in plain text format. The dictionary is used during the anonymization process to force detection.
An Anon Dictionary is linked to an entity type, such as PER (Person Name), or you can create a new entity type (e.g., DOCTOR) and assign the list to that type.
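As a rough illustration of what dictionary-forced detection can look like, the sketch below loads a one-term-per-line list and tags every occurrence in a text with an entity type. It is not the PAS implementation: the function names, the Match tuple, and the case-insensitive whole-word matching are assumptions made for this example.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    text: str
    entity_type: str


def load_dictionary(raw: str) -> list[str]:
    """Parse a plain-text dictionary: one term per line, blank lines ignored."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def find_dictionary_matches(text: str, terms: list[str], entity_type: str) -> list[Match]:
    """Return every whole-word, case-insensitive occurrence of a dictionary term."""
    if not terms:
        return []
    # Try longer terms first so a full name wins over a shorter name it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [Match(m.start(), m.end(), m.group(), entity_type) for m in pattern.finditer(text)]


# Hypothetical contents of a DoctorsList dictionary, assigned to a DOCTOR entity type.
doctors_list = load_dictionary("Ana García\nJohn Smith\n")
matches = find_dictionary_matches(
    "Patient was seen by Dr. John Smith and later by Ana García.",
    doctors_list,
    "DOCTOR",
)
for m in matches:
    print(m.entity_type, m.start, m.end, m.text)
```

Escaping each term with re.escape keeps names that contain punctuation from being interpreted as regex syntax.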
Clear Dictionaries
Rules
Rules are based on Regular Expressions and are used to detect patterns in text. Learn more about regex syntax here: https://en.wikipedia.org/wiki/Regular_expression
Like Anon Dictionaries, rules are associated with a specific or new entity type.
Rules are particularly useful for detecting:
Driving license numbers
Employee IDs
Case or file reference numbers
Bank account numbers
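To make the idea of a rule concrete, here is a minimal sketch that maps entity types to regular expressions and reports each match. The ID formats (EMP- followed by six digits, and a year/department/number case reference), the entity type names, and the apply_rules helper are all invented for illustration; real formats depend on your organization, and this is not how PAS stores or evaluates rules.

```python
import re

# Hypothetical rules: entity type -> pattern. Both formats are made up for this example.
RULES = {
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
    "CASE_REF": re.compile(r"\b\d{4}/[A-Z]{2}/\d{5}\b"),
}


def apply_rules(text, rules):
    """Return (start, end, matched_text, entity_type) tuples sorted by position."""
    found = []
    for entity_type, pattern in rules.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(), entity_type))
    return sorted(found)


hits = apply_rules("Employee EMP-004217 opened case 2023/HR/00981 yesterday.", RULES)
for start, end, value, entity_type in hits:
    print(entity_type, value)
```

The word-boundary anchors (\b) keep a rule from firing inside a longer token, which matters for numeric identifiers that can appear as substrings of other numbers.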
Anonymization Profiles
When users anonymize a document via ECO, they must provide:
The language (e.g., English, Spanish, Japanese)
The list of entities to anonymize
The anonymization mode (e.g., redaction, pseudonymization)
It would be cumbersome to ask users to manually choose every dictionary and rule each time. To streamline the process, admins can create Anonymization Profiles.
An Anonymization Profile bundles dictionaries and rules under a memorable name. Users simply select the language, entity list, anonymization mode, and optionally one of the admin-defined profiles. The selected profile then applies the appropriate dictionaries and rules automatically.
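One way to picture a profile is as a named record that a request can point to. The sketch below models that relationship with two small data classes. The class names, field names, and the HospitalRecords and EmployeeIdRule identifiers are hypothetical; they do not reflect the actual ECO or PAS data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnonymizationProfile:
    """Hypothetical bundle of dictionaries and rules under one memorable name."""
    name: str
    dictionaries: tuple = ()
    rules: tuple = ()


@dataclass
class AnonymizationRequest:
    """What a user supplies: language, entities, mode, and optionally a profile."""
    language: str
    entities: list
    mode: str
    profile: Optional[AnonymizationProfile] = None

    def resources(self) -> dict:
        """Dictionaries and rules implied by the selected profile, if any."""
        if self.profile is None:
            return {"dictionaries": (), "rules": ()}
        return {"dictionaries": self.profile.dictionaries, "rules": self.profile.rules}


# An admin defines the profile once...
hospital = AnonymizationProfile(
    name="HospitalRecords",
    dictionaries=("DoctorsList",),
    rules=("EmployeeIdRule",),
)

# ...and a user only has to pick it by name alongside the usual three choices.
request = AnonymizationRequest("English", ["PER", "DOCTOR"], "redaction", hospital)
print(request.resources())
```

Keeping the profile optional mirrors the description above: a request without a profile is still valid and simply uses no extra dictionaries or rules.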