How we’re supercharging our services with machine learning
Machine learning is undoubtedly cool – but it comes at a cost. Here’s how we’re improving efficiency for our experts while keeping the environment in mind.
Environmental, health, and safety (EHS) compliance is perhaps not the most obvious target for state-of-the-art solutions based on machine learning (ML). However, appearances can be deceiving. Over the last few years, the team at Enhesa has been developing machine learning tools that support our EHS experts and help them deliver better services to our customers.
Machine learning in EHS at Enhesa
At Enhesa, we’re currently developing and using machine learning tools in three main areas:
- Information retrieval: We want to retrieve all the necessary information about any EHS compliance issue as soon as it is published. We also need to be sure that we can identify the relevant elements of any new information and communicate their implications to our clients as quickly as possible. Machine learning tools in this area include a custom named entity recognition application, which we built to focus on EHS-relevant topics within our specific field of coverage. It is available in seven languages to cover our most active jurisdictions.
- Quality assurance: We need to be confident that we are providing actionable information presented concisely to our clients. Examples of tools for this purpose include a commercially available AI-powered grammar and spell-checking tool, which is integrated into the company’s text processing platforms. Our experts also have access to a dedicated optical character recognition tool that enables them to make legal documents machine-readable.
- Translation: Our experts and clients don’t always work in English (we cover more than 35 languages across 300 jurisdictions), so our tools must operate in several languages. However, commercial translation tools don’t always perform well on highly technical documents, because they are trained on general-language corpora. We therefore use 27 custom-trained neural machine translation models between English and other selected languages, each specialized in EHS terminology.
In all of these applications, we combine custom-built and commercially available tools so that we don’t reinvent the wheel.
Integrating machine learning and EHS expertise
Machine learning models need training, and training requires a lot of data. At Enhesa, we’ve been harvesting and curating information about EHS regulation on a global scale for more than two decades. This put us in the fortunate position of being able to use our own data to train many of the tools we’ve built in-house.
We also drew on the expertise of our consultants to ensure that the training data was correctly delineated and labeled. Using them as a specialist resource, we were able to label data in multiple languages, confident that the labels were as accurate as humanly possible.
Despite this access to expertise and vast quantities of structured data, we still encountered some teething issues. For example, one of our models struggled to distinguish between ‘lead’ (the act of leading) and ‘lead’ (the hazardous heavy metal) in English. To most English speakers, such nuances are obvious, but not necessarily to an algorithm. Another error, which we caught after the initial training of a classifier, illustrates a different challenge of working with data. The algorithm was classifying almost every text that contained the term “woman” under the label “pregnant & breastfeeding workers”, simply because “woman” appeared so often in texts under that heading.
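The “woman” error is an example of a spurious correlation: a term that often co-occurs with a label in the training data is learned as if it defined that label. One way to catch such shortcuts before training is to take a suspect term and check what share of the texts containing it carry each label. The sketch below is a minimal illustration under stated assumptions, not Enhesa’s actual tooling. It assumes the training data is a list of (text, labels) pairs. The function names, the simple tokenizer, the toy texts and the “equal treatment” label are all hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Lower-case word tokens; a deliberately simple tokenizer for this sketch."""
    return set(re.findall(r"[a-z]+", text.lower()))

def label_share_for_term(examples, term):
    """For texts containing `term`, return the share that carries each label.

    `examples` is a list of (text, labels) pairs, where `labels` is a set.
    A share close to 1.0 means the term alone almost predicts the label,
    a hint that a model may learn it as a shortcut.
    """
    hits = [labels for text, labels in examples if term.lower() in tokens(text)]
    if not hits:
        return {}
    counts = Counter(label for labels in hits for label in labels)
    return {label: n / len(hits) for label, n in counts.items()}

# Hypothetical toy training data (not real Enhesa content).
examples = [
    ("Employers must assess risks for a woman who is pregnant.",
     {"pregnant & breastfeeding workers"}),
    ("A woman who is breastfeeding must be given a place to rest.",
     {"pregnant & breastfeeding workers"}),
    ("Night work by pregnant workers is restricted.",
     {"pregnant & breastfeeding workers"}),
    ("Every woman and man must receive equal pay.",
     {"equal treatment"}),
]

print(label_share_for_term(examples, "woman"))
```

In the toy data, two of the three texts that mention “woman” carry the pregnancy label. The term therefore looks predictive, even though it is not what the label is about. A share like this should prompt expert review rather than trigger an automatic fix.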
It’s therefore easy to see how errors and bias can creep in when training and using machine learning models. That’s why we carefully check and monitor all our tools to ensure that they continue to do the job we want them to do. We also keep in mind that they may need updating to reflect changes in the external environment, such as the countless developments that occurred over the course of the COVID-19 pandemic.
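Monitoring can start with simple signals. One such signal is a sharp shift in the mix of labels a classifier assigns, comparing a reference period with a recent one. This might happen when a wave of new documents on a single topic arrives. The sketch below is minimal and assumes predictions are logged as plain label strings. The label names, the choice of total variation distance and the 0.2 threshold are illustrative assumptions, not a description of Enhesa’s monitoring.

```python
from collections import Counter

def label_distribution(predictions):
    """Turn a list of predicted labels into {label: relative frequency}."""
    counts = Counter(predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)

def needs_review(reference, recent, threshold=0.2):
    """Flag a model for expert review when its output mix shifts by more than `threshold`."""
    shift = total_variation(label_distribution(reference), label_distribution(recent))
    return shift > threshold

# Hypothetical predicted labels before and after a change in incoming documents.
reference = ["chemicals"] * 50 + ["waste"] * 30 + ["health & safety"] * 20
recent = ["chemicals"] * 20 + ["waste"] * 20 + ["health & safety"] * 60
print(needs_review(reference, recent))
```

A flag here says only that the model’s output has changed. Whether that reflects a genuinely new situation or a degrading model is still a question for the experts.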
Balancing the cost and benefits of machine learning
That said, we’re going to throw in a cautionary note: We believe that we should only innovate when we can provide real added value to our employees or customers. Any innovation must also address a genuine business need or pain point.
There is a reason for this belief. Innovation costs money, but not only money. First, it takes time to develop machine learning models, to monitor them, and to ensure that they remain accurate. Second, we must recognize that running these models has an environmental cost. Calculators such as “ML CO2 Impact” even let us estimate the carbon emissions that machine learning generates. The computing workload can be significant. We don’t often talk about this because we all like to focus on the benefits of these tools, but it’s not a factor that can be ignored.
For me, the key is to find a balance between two goals: using machine learning tools that enable us to provide a better service for our clients, and minimizing our impact on the environment. This is a crucial part of how Enhesa works, and an important part of our company values. Ultimately, everything boils down to this: we cannot advise our clients to follow environmental, health, and safety best practices if we don’t innovate in a way that respects environmental needs ourselves.
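The arithmetic behind estimates of this kind is straightforward. The energy a run consumes (power draw × hours) is multiplied by the carbon intensity of the electricity grid that supplies it. The sketch below is a minimal version of that calculation. The function name and all figures are hypothetical. Calculators such as ML CO2 Impact base their figures on the hardware and location you select rather than on values typed in by hand.

```python
def training_emissions_kg(power_kw, hours, grid_kg_per_kwh, num_devices=1):
    """Rough CO2-equivalent estimate for a training run.

    energy (kWh)         = power draw per device (kW) * number of devices * hours
    emissions (kg CO2eq) = energy (kWh) * grid carbon intensity (kg CO2eq/kWh)
    """
    if min(power_kw, hours, grid_kg_per_kwh) < 0 or num_devices < 1:
        raise ValueError("inputs must be non-negative and num_devices >= 1")
    energy_kwh = power_kw * num_devices * hours
    return energy_kwh * grid_kg_per_kwh

# Hypothetical figures: 4 GPUs drawing about 0.3 kW each, running for 24 hours
# on a grid that emits 0.4 kg CO2eq per kWh.
print(round(training_emissions_kg(0.3, 24, 0.4, num_devices=4), 2))
```

The estimate scales linearly with hardware, runtime and grid intensity. Retraining many models, such as a set of translation models, therefore multiplies the footprint accordingly. Moving a run to a lower-carbon grid reduces it proportionally.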