# Migrating to version `3.0.0`

Version `3.0.0` of `deduce` includes many optimizations that allow more accurate de-identification, some of which were already included in `2.1.0` - `2.5.0`. It also includes some structural optimizations. Version `3.0.0` should be backwards compatible, but some functionality is scheduled for removal in `3.1.0`. Those changes are listed below.

## Custom config

A custom config can now be provided either as a `dict` or as a filename pointing to a `json` file. Both are passed to `deduce` with the `config` keyword, e.g.:

```python
from deduce import Deduce

deduce = Deduce(config='my_own_config.json')
deduce = Deduce(config={'redactor_open_char': '**', 'redactor_close_char': '**'})
```

The `config_file` keyword is no longer used; please use `config` instead.

## Lookup structure names

For consistency, lookup structure names are now all in singular form:

| **Old name**            | **New name**           |
|-------------------------|------------------------|
| prefixes                | prefix                 |
| first_names             | first_name             |
| interfixes              | interfix               |
| interfix_surnames       | interfix_surname       |
| surnames                | surname                |
| streets                 | street                 |
| placenames              | placename              |
| hospitals               | hospital               |
| healthcare_institutions | healthcare_institution |

Additionally, the `first_name_exceptions` and `surname_exceptions` lists are removed. The exception items are now removed from the original lists in a more structured way, so there is no need to explicitly filter exceptions in patterns, etc.

## The `annotator_type` field in config

In a config, each annotator should specify `annotator_type`, so `Deduce` knows which annotator to load. In `3.0.0` we simplified this a bit. In most cases, the `annotator_type` field should be set to the `module.Class` of the annotator that should be loaded, and `Deduce` will handle the rest (sometimes with a little bit of magic, so that all arguments are presented with the right type).
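The renames listed in the table that follows are mechanical, so an existing config can be updated with a small script. The sketch below is a hypothetical helper, not part of `deduce`. It assumes each annotator entry is a `dict` with an `annotator_type` key and an optional `args` dict that holds the `module` and `class` fields for `custom` annotators.

```python
# Hypothetical helper, not part of deduce: rewrites one annotator entry
# from a pre-3.0.0 config to the 3.0.0 `annotator_type` convention.
from copy import deepcopy

# Old short names and the `module.Class` value that replaces them.
RENAMED_TYPES = {
    "multi_token": "docdeid.process.MultiTokenLookupAnnotator",
    "token_pattern": "deduce.annotator.TokenPatternAnnotator",
    "annotation_context": "deduce.annotator.ContextAnnotator",
    "regexp": "docdeid.process.RegexpAnnotator",
}


def migrate_annotator(entry: dict) -> dict:
    """Return a copy of `entry` with a 3.0.0-style `annotator_type`."""
    migrated = deepcopy(entry)
    old_type = migrated.get("annotator_type")

    if old_type == "dd_token_pattern":
        # This type can no longer be loaded through config, and the
        # replacement annotator needs a different pattern, so no
        # automatic rewrite is attempted.
        raise ValueError("dd_token_pattern entries must be migrated by hand")

    if old_type == "custom":
        # Assumption: `module` and `class` are stored under `args`.
        # They are joined into `module.Class` and removed from `args`.
        args = migrated.get("args", {})
        migrated["annotator_type"] = f"{args.pop('module')}.{args.pop('class')}"
    elif old_type in RENAMED_TYPES:
        migrated["annotator_type"] = RENAMED_TYPES[old_type]

    # Entries that already use `module.Class` pass through unchanged.
    return migrated
```

Calling `migrate_annotator` on every annotator entry of an old config and saving the result gives a starting point; the output is worth reviewing by hand before use.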
You should make the following changes:

| **annotator_type** | **Change** |
|--------------------|------------|
| multi_token        | `docdeid.process.MultiTokenLookupAnnotator` |
| dd_token_pattern   | This used to load `docdeid.process.TokenPatternAnnotator`, but this is now replaced by `deduce.annotator.TokenPatternAnnotator`. The latter is more powerful, but needs a different pattern. A `docdeid.process.TokenPatternAnnotator` can no longer be loaded through config, although adding it manually to `Deduce.processors` is always possible. |
| token_pattern      | `deduce.annotator.TokenPatternAnnotator` |
| annotation_context | `deduce.annotator.ContextAnnotator` |
| custom             | Use `module.Class` directly, where the `module` and `class` fields used to be specified in `args`. They should be removed there. |
| regexp             | `docdeid.process.RegexpAnnotator` |

# Migrating to version `2.0.0`

Version `2.0.0` of `deduce` sees a major refactor that enables speedup, configuration, customization, and more. With it, the interface to apply `deduce` to text changes slightly. Updating your code to the new interface should not take more than a few minutes. The details are outlined below.

## Calling `deduce`

`deduce` is now called through `Deduce.deidentify`, which replaces the `annotate_text` and `deidentify_annotations` functions. Those functions give a `DeprecationWarning` from version `2.0.0`, and will be removed from version `2.1.0`.
**Deprecated:**

```python
from deduce import annotate_text, deidentify_annotations

text = "Jan Jansen"

annotated_text = annotate_text(text)
deidentified_text = deidentify_annotations(annotated_text)
```

**New:**

```python
from deduce import Deduce

text = "Jan Jansen"

deduce = Deduce()
doc = deduce.deidentify(text)
```
## Accessing output

The annotations and deidentified text are now available in the `Document` object. In-text annotations can still be useful for comparisons; they can be obtained by passing the document to a util function from the `docdeid` library (note that the format has changed).
**Deprecated:**

```python
print(annotated_text)
# ''

print(deidentified_text)
# ''
```

**New:**

```python
import docdeid as dd

print(dd.utils.annotate_intext(doc))
# 'Jan Jansen'

print(doc.annotations)
# AnnotationSet({
#     Annotation(
#         text="Jan Jansen",
#         start_char=0,
#         end_char=10,
#         tag="persoon",
#         length="10"
#     )
# })

print(doc.deidentified_text)
# ''
```
## Adding patient names

The `patient_first_names`, `patient_initials`, `patient_surname` and `patient_given_name` keywords of `annotate_text` are replaced with a structured way to enter this information, in the `Person` class. An instance of this class can be passed to `deidentify()` as metadata. The use of a given name is deprecated; it can instead be added as a separate first name. The behaviour is still the same.
**Deprecated:**

```python
from deduce import annotate_text, deidentify_annotations

text = "Jan Jansen"

annotated_text = annotate_text(
    text,
    patient_first_names="Jan Hendrik",
    patient_initials="JH",
    patient_surname="Jansen",
    patient_given_name="Joop"
)
deidentified_text = deidentify_annotations(annotated_text)
```

**New:**

```python
from deduce import Deduce
from deduce.person import Person

text = "Jan Jansen"
patient = Person(
    first_names=['Jan', 'Hendrik', 'Joop'],
    initials="JH",
    surname="Jansen"
)

deduce = Deduce()
doc = deduce.deidentify(text, metadata={'patient': patient})
```
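If many call sites still pass the old `patient_*` keywords, the conversion to `Person` arguments can be centralised. The function below is a hypothetical helper, not part of `deduce`. It assumes the old `patient_first_names` value is a whitespace-separated string, as in the example above, and it appends the given name as an extra first name.

```python
# Hypothetical helper, not part of deduce: maps the old `annotate_text`
# patient keywords onto keyword arguments for `Person`.
from typing import Optional


def legacy_patient_to_person_kwargs(
    patient_first_names: Optional[str] = None,
    patient_initials: Optional[str] = None,
    patient_surname: Optional[str] = None,
    patient_given_name: Optional[str] = None,
) -> dict:
    """Build `Person` keyword arguments from the deprecated keywords."""
    # Assumption: first names arrive as one whitespace-separated string.
    first_names = patient_first_names.split() if patient_first_names else []

    # The given name is no longer separate; add it as another first name.
    if patient_given_name:
        first_names.append(patient_given_name)

    kwargs = {}
    if first_names:
        kwargs["first_names"] = first_names
    if patient_initials:
        kwargs["initials"] = patient_initials
    if patient_surname:
        kwargs["surname"] = patient_surname
    return kwargs
```

The resulting dict can then be unpacked into the new class, e.g. `Person(**legacy_patient_to_person_kwargs(...))`.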
## Enabling/disabling specific categories

Previously, the `annotate_text` function allowed disabling specific categories through the `dates`, `ages`, `names`, etc. keywords. The same behaviour can now be achieved by setting the `disabled` argument of the `Deduce.deidentify` method. Note that the identification logic of Deduce is now further split up into `Annotator` classes, which allows disabling/enabling specific components. You can read more about the specific annotators and other components in the tutorial [here](tutorial.md#annotators), and find more information on enabling, disabling, replacing or modifying specific components [here](tutorial.md#customizing-deduce).
**Deprecated:**

```python
from deduce import annotate_text, deidentify_annotations

text = "Jan Jansen"

annotated_text = annotate_text(
    text,
    dates=False,
    ages=False
)
deidentified_text = deidentify_annotations(annotated_text)
```

**New:**

```python
from deduce import Deduce

text = "Jan Jansen"

deduce = Deduce()
doc = deduce.deidentify(
    text,
    disabled={'dates', 'ages'}
)
```
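Code that still carries the old boolean keywords can derive the `disabled` set from them. The function below is a hypothetical helper, not part of `deduce`. It assumes the names accepted by `disabled` match the old keyword names, which the example above shows only for `dates` and `ages`; other names should be checked against the tutorial.

```python
# Hypothetical helper, not part of deduce: turns the old boolean category
# keywords of `annotate_text` into a set for the `disabled` argument.
def legacy_flags_to_disabled(**flags: bool) -> set:
    """Collect the names of all categories that were switched off."""
    # A category is disabled when its old keyword was passed as False.
    return {name for name, enabled in flags.items() if not enabled}
```

For example, `legacy_flags_to_disabled(dates=False, ages=False, names=True)` yields a set containing `'dates'` and `'ages'`, which can be passed as `disabled=` to `Deduce.deidentify`.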