The fourth meeting of the Expert Group Privacy Technologies for Data Collaboration took place in Zurich and online on the afternoon of November 23, 2021. Nine participants joined us.
Juan Troncoso-Pastoriza, from EPFL and co-founder of the startup Tune Insight, explained the concept of homomorphic encryption and its use for federated learning. He showed several use cases where homomorphic encryption had been applied (e.g., in the health sector).
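The talk itself is not reproduced here, but the core idea can be illustrated with a toy example of our own (not Tune Insight's technology). It uses Paillier, an additively homomorphic scheme chosen only because it fits in a few lines; the schemes discussed in the talk may differ. Three hypothetical clients encrypt their model updates, an aggregator adds them while they stay encrypted, and only the sum is ever decrypted, which is the basic pattern behind encrypted aggregation in federated learning. The tiny primes, the fixed-point `SCALE` and all function names are illustrative assumptions, and the code needs Python 3.9+ for `math.lcm`.

```python
import math
import random

SCALE = 1000  # assumed fixed-point precision for encoding real-valued updates

def keygen(p=104729, q=1299709):
    """Toy Paillier key pair; real deployments use primes of 1024+ bits."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because we fix the generator g = n + 1
    return n, (n, lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:  # pick a random r coprime to n
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m mod n^2 equals 1 + m*n mod n^2
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def encode(x, n):
    return round(x * SCALE) % n

def decode(v, n):
    if v > n // 2:  # values above n/2 represent negative numbers
        v -= n
    return v / SCALE

rng = random.Random(42)
pub, priv = keygen()
# Hypothetical model updates from three clients (two weights each).
client_updates = [[0.125, -0.5], [0.25, 0.1], [-0.05, 0.3]]
encrypted = [[encrypt(pub, encode(w, pub), rng) for w in update] for update in client_updates]
# The aggregator combines ciphertexts element-wise without seeing any plaintext.
agg = encrypted[0]
for update in encrypted[1:]:
    agg = [add_encrypted(pub, a, b) for a, b in zip(agg, update)]
aggregate = [decode(decrypt(priv, c), pub) for c in agg]
print(aggregate)
```

In this sketch a single key holder decrypts the aggregate; a real system would also have to protect that decryption key (for example by splitting it among the parties), which is not modelled here.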
In the second half of the meeting, the participants discussed the modus operandi for next year's meetings and set three dates for 2022.
The ETH AI Center celebrated its first birthday on October 15, 2021, at the AI+X Summit, and the data innovation alliance was there to congratulate it and join the inspiring crowd. The day started with workshops.
David Sturzenegger and Stefan Deml from Decentriq organized one of the workshops, "Privacy-preserving analytics and ML", on behalf of the alliance.
It was our first in-person workshop in a long time, and a great experience for us. We gave an overview of various privacy-enhancing technologies (PETs) to a very engaged and diverse audience of about 30 people. We had in-depth discussions about the use cases that PETs could unlock, and we also presented Decentriq's data clean rooms and our use of confidential computing. Our product generated a lot of follow-up interest, especially from attendees who wanted a demo of the platform. We were also joined by a guest speaker from Hewlett Packard who spoke about "Swarm Learning".
David Sturzenegger, Stefan Deml
Melanie Geiger from the data innovation alliance office attended the workshop on AI + Industry & Manufacturing led by Olga Fink from ETH. The overall goal of the workshop was to identify the next research topics. Small groups mixing representatives of manufacturing companies with researchers discussed the challenges and opportunities of predictive maintenance, quality control, optimization, and computer vision. We identified research topics such as more generalizable predictive maintenance methods that work across multiple machines or even multiple manufacturing companies. However, we also realized that some challenges lie more on the operational or applied-research side, such as integrating the methods into the overall manufacturing process and closing the feedback loop.
In the evening, the exhibition and the program on the main stage attracted 1000 participants. We had many interesting discussions at our booth with a wonderful mix of students, entrepreneurs, researchers, and people from industry. Of course, we also saw many familiar faces, and thanks to the 3G policy (vaccinated, recovered, or tested), we regained some "normality".
The third meeting of the Expert Group Privacy Technologies for Data Collaboration took place online on the afternoon of September 8, 2021. Fourteen participants joined us.
Nico Ebert from ZHAW opened the meeting with a discussion of whether the fourth meeting, on November 26, could take place in person. The participants agreed to meet in Zurich. He also introduced the speaker for that meeting, Juan Troncoso-Pastoriza from EPFL and co-founder of the startup Tune Insight, who will introduce the group to the basics of homomorphic encryption.
Afterwards, Matthias Templ, an expert from ZHAW in data anonymization and synthetic data, presented the concept of synthetic data. According to the McGraw-Hill Dictionary of Scientific and Technical Terms, synthetic data is "any production data applicable to a given situation that are not obtained by direct measurement". Synthetic data is generated from original datasets that often contain personal data and therefore should not be shared with third parties. The synthetic dataset, however, preserves the major properties of the original dataset, so it can be used for similar purposes, such as learning about distributions.
Matthias explained that creating synthetic data first requires a good understanding of the original dataset (e.g. personal data about a population). This includes understanding how the data was generated and its inherent distributions (including marginal distributions). These distributions are then rebuilt with one or more models (e.g. neural networks, decision trees), and the models are used to generate the synthetic dataset. Matthias has developed and published an R library for this task. He also presented some real-world examples in which synthetic data had been applied. After his presentation, the participants discussed the potential of synthetic data. Another point of discussion was which modelling techniques suit original datasets of different complexity (e.g. datasets with only a few features require less complex techniques).
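The three steps Matthias described (understand the distributions, model them, sample from the models) can be illustrated with a deliberately small sketch. This is not his R library but a hypothetical Python example: the "model" is simply a pair of frequency tables, P(region) and P(age_group | region), standing in for the neural networks or decision trees a real tool would use. The field names, records and function names are invented for illustration.

```python
import random
from collections import Counter, defaultdict

def fit(records):
    """Steps 1-2: estimate P(region) and P(age_group | region) from the original data."""
    region_counts = Counter(r["region"] for r in records)
    age_given_region = defaultdict(Counter)
    for r in records:
        age_given_region[r["region"]][r["age_group"]] += 1
    return region_counts, age_given_region

def sample(counter, rng):
    """Draw one value with probability proportional to its observed frequency."""
    values = list(counter)
    weights = [counter[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def generate(model, n, seed=0):
    """Step 3: sample synthetic records, first the region, then the age group given the region."""
    region_counts, age_given_region = model
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        region = sample(region_counts, rng)
        age_group = sample(age_given_region[region], rng)
        synthetic.append({"region": region, "age_group": age_group})
    return synthetic

# Hypothetical "original" records (in practice: personal data that must not be shared).
original = [
    {"region": "Zurich", "age_group": "18-39"},
    {"region": "Zurich", "age_group": "18-39"},
    {"region": "Zurich", "age_group": "40-64"},
    {"region": "Bern", "age_group": "40-64"},
    {"region": "Bern", "age_group": "65+"},
    {"region": "Geneva", "age_group": "18-39"},
]
synthetic = generate(fit(original), 10000)
print(Counter(s["region"] for s in synthetic))
```

The region shares in the synthetic data come out close to those of the original (about half Zurich), which is the kind of property synthetic data is meant to preserve. Because the sketch only reproduces the dependencies it explicitly models and only samples combinations seen in the original data, it offers no formal privacy guarantee; on very small datasets it effectively reproduces real records.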
In the second half of the meeting, the participants discussed the potential benefits of the "Data Collaboration Canvas", a graphical workshop tool developed with the help of the Expert Group. It is aimed at organizations that want to explore, at an early stage, the potential of data innovation with other organizations to create mutual added value. It offers a simple, visual structuring aid, e.g. in workshops, for identifying common potentials and hurdles of collaboration. The canvas can be used to identify data collaboration opportunities not only between organizations such as companies but also within an organization (e.g. between different divisions or departments). Participants applied the canvas to two different use cases and afterwards discussed its usability and comprehensibility.