Best practices for bolstering machine learning security
Securing machine learning systems is increasingly important for businesses that rely on them.
ML security has the same goal as all cybersecurity measures: reducing the risk of sensitive data being exposed. If a bad actor interferes with your ML model or the data it uses, that model can produce inaccurate results, at best undermining the benefits of ML and at worst negatively impacting your business or customers.
“Executives should be concerned about this because there is nothing worse than getting it wrong very quickly and confidently,” said Zach Hanif, vice president of machine learning platforms at Capital One. And while Hanif works in a regulated industry – financial services – that requires additional levels of governance and security, he says every business adopting ML should take the opportunity to test its security operations.
Devon Rollins, vice president of network engineering and machine learning at Capital One, adds: “Securing business-critical applications requires a different level of protection. It’s safe to assume many ML tools will be deployed at scale, given the role they play for businesses and how directly they impact user outcomes.”
New security considerations to keep in mind
While best practices for securing ML systems are similar to those for any software or hardware system, greater adoption of ML also introduces new considerations. Hanif explains: “Machine learning adds another layer of complexity. This means that organizations have to consider many points in the machine learning process that might represent entirely new vectors.” These core workflow elements include the ML models and their documentation, the systems around them, the data they use, and the use cases they enable.
It is imperative that ML models and supporting systems be developed with security in mind from the outset. It is not uncommon for engineers to rely on freely available open source libraries developed by the software community, rather than coding every aspect of their program. These libraries are often designed by software engineers, mathematicians, or academics who may not be well versed in writing secure code. Hanif adds: “The people and skills required to develop advanced or high-performance ML software may not always intersect with security-focused software development.”
According to Rollins, this highlights the importance of vetting the open source libraries used in ML models. Developers should consider confidentiality, integrity, and availability as a framework to guide information security policy. Confidentiality means that data assets are protected from unauthorized access; integrity refers to the quality and accuracy of data; and availability ensures that the right authorized users can easily access the data needed for the job at hand.
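One concrete way to act on the integrity piece when pulling in third-party libraries is to verify a downloaded artifact against the checksum its maintainers publish. The sketch below is illustrative, not a complete supply-chain control; the payload and digest are hypothetical stand-ins for a real release file and its published SHA-256 hash.

```python
# Integrity check for a third-party artifact: compare its SHA-256 digest
# against the checksum published by the maintainers. The payload and the
# "published" digest here are hypothetical placeholders.
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, expected_digest: str) -> bool:
    """Accept the artifact only if its digest matches the published one."""
    return sha256_of(data) == expected_digest

payload = b"example library contents"
published = sha256_of(payload)  # in practice, copied from the project's release page

print(verify_artifact(payload, published))              # True: digests match
print(verify_artifact(b"tampered contents", published)) # False: artifact was altered
```

In a real pipeline this check typically runs automatically, e.g. via lockfiles with pinned hashes, so a tampered dependency fails the build rather than reaching production.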
In addition, ML input data can be manipulated to compromise a model. One risk is inference manipulation: essentially altering the data to fool the model. Because ML models interpret data differently than the human brain does, data can be manipulated in ways that are imperceptible to humans but that still alter the results. For example, changing a pixel or two in the image of a stop sign might be all it takes to compromise a computer vision model: the human eye still sees a stop sign, but the ML model may no longer classify it as one. Alternatively, an attacker can probe a model by submitting a series of varied inputs and learning how it behaves. As Hanif explains, by observing how inputs affect the system, external actors can figure out how to disguise a malicious file so that it goes undetected.
Another risk vector is the data used to train the system. Third parties can “poison” the training data so that the model learns the wrong thing. As a result, the trained model will make mistakes – for example, misclassifying all stop signs as yield signs.
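Poisoning can be illustrated with a deliberately simple model: a one-dimensional nearest-centroid classifier. If an attacker relabels the stop-sign training examples, the model that comes out of training no longer recognizes stop signs at all. The data and labels below are hypothetical.

```python
# Toy illustration of training-data poisoning: flipping labels in the
# training set changes what the model learns. Data here is hypothetical.

def train_centroids(samples):
    """Compute the mean feature value for each label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(x, centroids):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

clean = [(1.0, "stop"), (1.2, "stop"), (5.0, "yield"), (5.2, "yield")]
model = train_centroids(clean)
print(predict(1.1, model))  # "stop"

# An attacker relabels the stop-sign examples as "yield" before training:
poisoned = [(x, "yield") for x, _ in clean]
bad_model = train_centroids(poisoned)
print(predict(1.1, bad_model))  # "yield" -- the only label the model ever saw
```

Defenses against this class of attack center on controlling and auditing the provenance of training data, so that untrusted parties cannot silently alter labels.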