Responsible, Privacy-preserving Machine Learning
Machine learning workflows consist of a common set of steps: gathering data; pre-processing data; devising model inputs and choosing the best model; training and testing the model; and evaluating the model. Traditionally, the best model is chosen based on performance metrics such as accuracy or error rate. However, performance metrics are no longer considered the sole, or even the primary, criterion for deciding which model should be moved into production.
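To make those steps concrete, the sketch below runs through a minimal workflow on a toy problem and selects between two candidate models purely on held-out accuracy. The dataset, models, and metric are illustrative assumptions chosen for brevity, not a recommendation.

```python
# A minimal, illustrative ML workflow: gather data, pre-process, train
# candidate models, and pick the "best" one by accuracy alone.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 1. Gather data
X, y = load_breast_cancer(return_X_y=True)

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3. Pre-processing + candidate models
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=5000)),
    "decision_tree": make_pipeline(StandardScaler(),
                                   DecisionTreeClassifier(random_state=0)),
}

# 4.-5. Train, test, and evaluate each candidate; select purely on accuracy
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```

The final selection step above is exactly the part that performance metrics alone no longer settle: regulation and responsible-ML guidelines ask what else, beyond accuracy, the chosen model must satisfy.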
For instance, in financial services, insurance, healthcare, and supply-chain domains, regulation is coming into force (e.g., the EU's GDPR, China's updated Information Security Technology specification on personal information security, and the California Consumer Privacy Act), and domain-specific guidelines are being developed for the responsible use of data analytics and machine learning (e.g., the Monetary Authority of Singapore's Principles to Promote Fairness, Ethics, Accountability and Transparency in the Use of AI and Data Analytics in Singapore's financial sector, or the EU's Ethics Guidelines for Trustworthy Artificial Intelligence). Their main objectives are to help protect people, preserve privacy, and instil confidence in data-driven, machine learning-based decision making. Devising machine learning solutions that are secure, robust, and transparent is no easy feat and remains an ongoing focus across both academia and start-ups.
In October 2019, we helped to organise and conduct the second Singapore OpenMined meetup (jointly organised with the Singapore ACM SIGKDD Chapter), hosting two talks addressing different aspects of responsible, privacy-preserving machine learning:
- “Data Privacy in Machine Learning: From Centralized Platforms to Federated Learning” by Assistant Professor Reza SHOKRI (CompSci, NUS); and
- “Tests and Metrics to Evaluate ML Model Explanations” by Naresh R. SHAH (Co-founder and CTO at Untangle AI).
Prof Reza, an expert in data privacy and trustworthy machine learning, first introduced common types of inference attacks on machine learning algorithms (i.e., membership inference attacks and reconstruction attacks) and then detailed a range of possible privacy risk scenarios in both black-box and white-box settings. He then presented possible protection mechanisms drawing on trusted hardware, federated learning, secure multi-party computation, and differential privacy.
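To make the first attack type concrete, here is a minimal sketch of a confidence-thresholding membership inference attack against a toy model. The dataset, model, threshold, and helper names are illustrative assumptions, not the speaker's actual setup or experiments.

```python
# Minimal membership inference sketch: guess that a record was in the
# training set when the model is very confident in its true label.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Target model trained only on the "member" half of the data.
target = LogisticRegression(max_iter=5000).fit(X_member, y_member)

def attack(model, X, y, threshold=0.9):
    """Flag a record as a training member if the predicted probability
    of its true label exceeds the threshold (black-box access only)."""
    confidence = model.predict_proba(X)[np.arange(len(y)), y]
    return confidence > threshold

# A gap between these two rates indicates the model leaks membership info.
print("flagged as member (true members):    ",
      attack(target, X_member, y_member).mean())
print("flagged as member (true non-members):",
      attack(target, X_nonmember, y_nonmember).mean())
```

Defences such as differential privacy aim to shrink exactly this gap, by bounding how much any single training record can influence the model's behaviour.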
Dr. Naresh, an expert in image processing, artificial intelligence, and explainability, addressed the black-box problem inherent in deep learning-based solutions. While a range of explanation methods is available today, it is essential to understand their limitations and pitfalls. As such, capabilities that enable data scientists to tell whether an issue arises from the explanation method or from the model itself are highly desirable. To that end, Naresh presented a suite of tests and metrics for evaluating model explanations, covering most prevailing machine learning methods.
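As one concrete example of such a check, the sketch below implements a simple deletion-style faithfulness test: remove the features an explanation ranks as most important and measure how much the prediction drops. The model, the attribution used as a stand-in explanation, and the deletion baseline are illustrative assumptions, not the specific tests and metrics presented in the talk.

```python
# Minimal deletion / faithfulness check for a feature-attribution explanation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

def deletion_test(model, x, importance, baseline, k=5):
    """Replace the k most 'important' features with a baseline value and
    report the drop in predicted probability for the original class."""
    original_class = model.predict(x.reshape(1, -1))[0]
    p_before = model.predict_proba(x.reshape(1, -1))[0, original_class]
    x_deleted = x.copy()
    top_k = np.argsort(importance)[::-1][:k]
    x_deleted[top_k] = baseline[top_k]
    p_after = model.predict_proba(x_deleted.reshape(1, -1))[0, original_class]
    return p_before - p_after  # larger drop => more faithful explanation

# Use the model's global feature importances as a stand-in "explanation"
# and the feature-wise mean as the deletion baseline.
x = X[0]
drop = deletion_test(model, x, model.feature_importances_, X.mean(axis=0))
print("probability drop after deleting top-ranked features:", drop)
```

A tiny drop would suggest the explanation is pointing at features the model does not actually rely on, which is the kind of pitfall such tests are designed to surface.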