deduce

Introduction

  • ✨ Remove sensitive information from clinical text written in Dutch

  • 🔍 Rule-based logic for detecting e.g. names, locations, institutions, identifiers, phone numbers

  • 📐 Useful out of the box, but customization highly recommended

  • 🌱 Originally validated in Menger et al. (2017), but further optimized since

❗ Deduce is useful out of the box, but please validate and customize on your own data before using it in a critical environment. Remember that de-identification is almost never perfect, and that clinical text often contains other specific details that can link it to a specific person. Be aware that de-identification should primarily be viewed as a way to mitigate risk of identification, rather than a way to obtain anonymous data.
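One way to approach validation on your own data is to compare detected spans against a small hand-annotated sample. The snippet below is a minimal sketch under stated assumptions: the function name `span_scores`, the `(start, end, tag)` span format and the example spans are all hypothetical and are not part of deduce's API. It uses exact span matching only.

```python
def span_scores(gold, predicted):
    """Return (precision, recall) for exact-match spans.

    Both arguments are iterables of (start, end, tag) tuples.
    This span format is an assumption made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # With nothing predicted (or nothing to find), treat the score as perfect.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Hypothetical annotations for one document.
gold = [(0, 10, "persoon"), (25, 35, "datum"), (50, 59, "bsn")]
predicted = [(0, 10, "persoon"), (50, 59, "bsn"), (70, 75, "locatie")]

precision, recall = span_scores(gold, predicted)
```

For de-identification, recall is usually the number to watch most closely, since every missed span is sensitive information left in the text.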

Currently, deduce can remove the following types of Protected Health Information (PHI):

  • 👤 person names, including prefixes and initials

  • 🌎 geographical locations smaller than a country

  • 🏥 names of hospitals and healthcare institutions

  • 📆 dates (combinations of day, month and year)

  • 🎂 ages

  • 🔢 BSN numbers

  • 🔢 identifiers (7+ digits without a specific format, e.g. patient identifiers, AGB, BIG)

  • ☎️ phone numbers

  • 📧 e-mail addresses

  • 🔗 URLs
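To give a feel for what rule-based detection of a few of these categories can look like, here is a minimal, self-contained sketch. It is not deduce's implementation: the patterns, the function names (`is_valid_bsn`, `find_phi`, `deidentify`) and the placeholder tags are assumptions made for illustration. It covers only e-mail addresses, URLs, 7+ digit identifiers and 9-digit BSNs, the latter checked with the 11-test (elfproef).

```python
import re

# Illustrative patterns only; these are assumptions for this sketch,
# not the rules that deduce ships with.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>]+"),
    "identifier": re.compile(r"\b\d{7,}\b"),
}


def is_valid_bsn(digits):
    """Check a 9-digit string against the BSN 11-test (elfproef)."""
    if len(digits) != 9 or not digits.isdigit():
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    return sum(w * int(d) for w, d in zip(weights, digits)) % 11 == 0


def find_phi(text):
    """Return sorted, non-overlapping (start, end, tag) spans."""
    spans = []
    for tag, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            label = tag
            # A digit run that passes the 11-test is tagged as a BSN.
            if tag == "identifier" and is_valid_bsn(match.group()):
                label = "bsn"
            spans.append((match.start(), match.end(), label))
    # Earliest span first; on equal start, prefer the longer one.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result, last_end = [], 0
    for span in spans:
        if span[0] >= last_end:
            result.append(span)
            last_end = span[1]
    return result


def deidentify(text):
    """Replace each detected span with a placeholder such as [EMAIL]."""
    parts, pos = [], 0
    for start, end, tag in find_phi(text):
        parts.append(text[pos:start])
        parts.append(f"[{tag.upper()}]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


text = ("Patient 1234567 (BSN 111222333) mailde j.jansen@example.nl, "
        "zie https://example.nl/info")
result = deidentify(text)
```

Names, locations, institutions, dates and ages need lookup lists and context rules rather than a single regular expression, which is why this sketch leaves them out.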

Citing

If you use deduce, please cite the following paper:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics. ISSN 0736-5853.