The fourth meeting of the Expert Group Privacy Technologies for Data Collaboration took place in Zurich and online on November 23, 2021 in the afternoon. We were joined by 9 participants.
Juan Troncoso-Pastoriza from EPFL, co-founder of the startup Tune Insight, explained the concept of homomorphic encryption as well as its use for federated learning. He showed several use cases where homomorphic encryption had been applied (e.g., in the health sector).
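Conceptually, homomorphic encryption lets computations run on encrypted data. As a purely illustrative aside (not material from Juan's talk), a textbook Paillier scheme with toy, insecure parameters demonstrates the additive-homomorphic property: multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

```python
# Toy Paillier cryptosystem: for illustration only, never use such
# tiny parameters in practice.
import math
import random

p, q = 11, 13                        # toy primes
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # precomputed decryption constant

def encrypt(m, rng=random.Random(0)):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(7), encrypt(35)
# Addition on plaintexts = multiplication on ciphertexts mod n^2:
print(decrypt((c1 * c2) % n2))  # 42
```

Because the server multiplying the ciphertexts never sees 7 or 35, several parties can contribute encrypted values to an aggregate, which is the core idea behind its use in federated learning.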
In the second half of the meeting, the participants discussed the modus operandi for next year's meetings and fixed three dates for 2022.
Geodata are used in various industries and academic fields and often have to meet specific requirements in order to be usable, for example in terms of geometry, time of recording or semantics. Often, different geodata sets have similar geometric properties but different semantics, or are captured at different times – or vice versa. Accordingly, added value arises when different data sources start to «talk to each other», when connection points between the data are exploited, or when gaps can be filled. In this context there is a multitude of technical terms, which are sometimes used differently depending on the subject area, sometimes used synonymously and sometimes used inappropriately – so you will read about «append», «merge», «relate», «link», «connect», «join», «combine», or «fuse», just to mention a few.
In the last meeting of the Spatial Data Expert Group on the 4th of November, this topic was presented and discussed, and the challenges and potential of the concept were highlighted. This included a critical examination of the semantic classification as well as the presentation of various possible applications in research and industry. Our host was the UZH Space Hub at the University of Zurich, represented by Dr. Claudia Röösli.
So, what is Data Fusion – with a strong focus on spatial data? For some, it means little more than a list of different data sets, with a narrative relating one data set to the next. For others, it means visualizing different data sources on the same graph to spot trends, dynamics, or relations. In the spatial domain, the basic concept of data fusion is often the extraction of the best-fit geometry data as well as the most suitable semantic data and acquisition times from existing datasets.
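To make the abstract idea concrete, here is a minimal, purely hypothetical Python sketch of that basic concept: the datasets, field names, and selection rules are invented for illustration. For features present in two datasets, it keeps the more detailed geometry and the more recently captured semantics.

```python
# Hypothetical spatial data fusion: best-fit geometry + newest semantics.

def fuse(dataset_a, dataset_b):
    """Fuse two feature dictionaries keyed by feature id."""
    fused = {}
    for fid in dataset_a.keys() | dataset_b.keys():
        a, b = dataset_a.get(fid), dataset_b.get(fid)
        if a is None or b is None:
            fused[fid] = a or b       # fill gaps from the other source
            continue
        # "best-fit" geometry: here simply the one with more vertices
        geometry = max(a, b, key=lambda f: len(f["geometry"]))["geometry"]
        # "most suitable" semantics: here simply the more recent record
        semantics = max(a, b, key=lambda f: f["captured"])["semantics"]
        fused[fid] = {"geometry": geometry,
                      "semantics": semantics,
                      "captured": max(a["captured"], b["captured"])}
    return fused

cadastre = {1: {"geometry": [(0, 0), (0, 9), (9, 9), (9, 0)],
                "semantics": "building", "captured": 2015}}
survey = {1: {"geometry": [(0, 0), (0, 9), (5, 9), (9, 9), (9, 0), (5, 0)],
              "semantics": "warehouse", "captured": 2021}}

result = fuse(cadastre, survey)
print(result[1]["semantics"])  # the newer label wins
```

Real projects would of course use richer criteria (accuracy metadata, coordinate reference systems, topology), but the pattern of picking the best attribute per source stays the same.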
The keynote was given by Dr. André Bruggmann, Co-CEO, Data Scientist and Geospatial Solutions Expert at Crosswind GmbH. Under the motto “Unlock the Where.”, he presented how data fusion techniques help customers gain new insights, from (spatial) visualizations and web applications to facilitate strategic business decisions (e.g., selection of optimal point-of-sale locations). In addition, he presented a project where data fusion techniques are applied to make detailed and future-oriented statements about the adoption of e-mobility and to identify relevant trends for the automotive industry.
These inputs led to an exciting discussion between the experts present – not only on the technical implementations presented, but also regarding the potential for optimisation and possible future cooperation. This is exactly how the initiators of the event had envisioned it – an open and inspiring exchange in line with the basic idea of open innovation.
HR analytics is the study of all people processes, functions, challenges, and opportunities at work, with the aim of elevating these systems and achieving sustainable business success. The active use of advanced analytics can dramatically improve the way organisations identify, attract, develop, and retain talent. On the other hand, decisions supported by data and algorithms may hide risks for employees as well as employers that most companies do not even consider.
On November 4, 2021, these topics were examined from different perspectives – by renowned speakers from both the academic and business environments.
With the event generously hosted by the Digital Society Initiative at the University of Zurich, Markus Christen (UZH) and Karin Lange (die Mobiliar) opened the event – both on the leadership team of the Data Ethics Expert Group of the data innovation alliance.
“If your PA makes your employees visible: Let them participate in the design and use and explicitly forego automation. If your PA bears automation potential: Allow your employees to withdraw OR explain and justify its “raison d’être” comprehensively!”
Nadia Fischer of witty works – a company that uses its product to help companies formulate job ads in such a way that the open positions can appeal to all types of talent – proved to the audience that we all have cultivated a language bias. With the help of artificial intelligence, this can be made visible and corrected.
Finally, the Mobiliar Lab for Analytics, represented by Mara Nägelin and Jasmine Kerr, used a research project on digital stress intervention to demonstrate on the one hand the great benefits for health management within companies – but on the other hand also the great potential ethical dangers. In an exemplary manner, the scientists had clarified all ethical implications before the project started and repeatedly checked whether privacy and informational self-determination were guaranteed.
Three examples of how digital responsibility can and must be assumed in order to keep our world worth living – and working – in, and not to generate black boxes!
The panel discussion with the speakers continued in a lively manner during the subsequent aperitif. The audience agreed: A successful event with valuable insights on an important topic.
On October 22, the expert group Smart Services welcomed top experts from around the world to the fourth Smart Services Summit. The focus was on how Smart Services allow firms to adapt during the COVID-19 pandemic. Examples of remote and collaborative working have created new forms of co-delivery where customers are integrated into the service processes. Such a change requires a mindset change for more traditional firms as the service model migrates from ‘do it for you’ to ‘do it yourself’ or some mix of ‘do it together’. Considering service science, the switch makes perfect sense, as it means that the full set of resources within the ecosystem is now being used rather than only a part. Services can be delivered faster and at lower cost with the support of new technologies and when working with the customer in a co-delivery mode. The changes are leading to new value propositions and business models today and will lead to an evolution in Smart Services in the future. The changes themselves must be understood, and we may need to consider new or different implementation and delivery models for Smart Services. These new working approaches may also require us to re-evaluate both training and education.
Across the papers and presentations, it became apparent that digital service innovation has substantially changed and accelerated since the start of the pandemic. Customer needs and service processes have undergone dramatic disruption, which is still ongoing. A common thread throughout all the papers was the concept of ecosystem thinking, which was discussed from a wide field of perspectives and in a comprehensive way. In line with the concept of the Service-Dominant Logic, the needs of the different actors in the ecosystem need to be identified and integrated into the design of the services and the integration of the various resources in the ecosystem. The ecosystem perspective integrates not only the different human actors, but also technological, digital resources.
Innovation through intensive collaboration allows switching between different perspectives and innovation approaches. This results in seamless value propositions and solutions for the beneficiary actors, which is a necessary prerequisite for economic value creation. Well-designed service experiences based on a consistently customer-centric view and approach are thus at the basis of value creation.
This transition to digital service innovation in ecosystems requires not only fundamental changes to the technological platforms. In particular, collaboration across actors, organizations, and industries requires a new level of trust, culture, skills, marketing approaches and innovation frameworks.
What a great late afternoon we had at the Kongresshaus in Zurich: 33 representatives of the members of our alliance met for the General Assembly, an exciting member talk, and an apéro like in old times.
The General Assembly itself was rather unspectacular: the financial report for 2020, the budget for 2022, and the discharge of the board and auditors were all unanimously approved without much discussion.
Congratulations to the board members Matthias Brändle, La Mobilière, Hans Peter Gränicher, D ONE Solutions AG (Vice-President), Christoph Heitz, ZHAW IDP (President), Anne Herrmann, FHNW Institute for Market Supply & Consumer Decision-Making, Thilo Stadelmann, ZHAW Center of AI and Matthias Werner, Trumpf Schweiz AG that were re-elected for another two years.
The member talk on “An AI practitioner’s guide to data privacy“ by Jacqueline Stählin from D ONE on a mandate for La Mobilière kick-started a discussion with many questions and shared experiences from the community. It was a great transition to the informal apero.
After two years without a physical conference among members, plenty of news, old and new, had to be shared. Some members who joined during the pandemic had the chance to meet the community in such a setting for the first time, while for the others it felt a bit like a class reunion. I think I can speak for all of us when I say that we look forward to the next community gathering.
The ETH AI Center celebrated its first birthday on October 15, 2021, at the AI+X Summit and the data innovation alliance was there to congratulate and to join the inspiring crowd. The day started with workshops.
David Sturzenegger and Stefan Deml from Decentriq organized one of the workshops, on “Privacy-preserving analytics and ML”, in the name of the alliance.
It was our first in-person workshop again, and such a great experience for us. We gave an overview of various privacy-enhancing technologies (PETs) to a very engaged and diverse audience of about 30 people. We had in-depth discussions about the use cases that PETs could unlock, and also presented Decentriq’s data clean rooms and our use of confidential computing. Our product certainly generated a lot of follow-up interest, especially from those who wanted to reach out for a demo of the platform. We were also joined by a guest speaker from Hewlett Packard who spoke about “Swarm Learning”.
David Sturzenegger, Stefan Deml
Melanie Geiger from the data innovation alliance office attended the workshop about AI + Industry & Manufacturing led by Olga Fink from ETH. The overall goal of the workshop was to identify the next research topics. Small groups of representatives from manufacturing companies mixed with researchers discussed the challenges and opportunities of predictive maintenance, quality control, optimization, and computer vision. We identified research topics such as more generalizable predictive maintenance methods that work for multiple machines or even multiple manufacturing companies. But we also realized that some challenges lie more on the operational side or in applied research, such as integrating the method into the whole manufacturing process and closing the feedback loop.
In the evening the exhibition and the program on the main stage attracted 1000 participants. We had many interesting discussions at our booth with a wonderful mix of students, entrepreneurs, researchers, and people from the industry. Of course, we also saw many familiar faces and due to the 3G policy, we got back some “normality”.
The 13th meeting of the expert group “Blockchain Technology in Interorganisational Collaboration” took place on the evening of the 30th of September.
At the beginning of the meeting, the group discussed the topic of a Swiss e-ID. The members were able to exchange valuable insights regarding self-sovereign identity. Self-sovereign identity enables individuals to have control over their digital identity. Due to its immutability, blockchain technology can enable self-sovereign identity. Wallet recovery, scalability and API availability are the main issues being worked on in this field right now. The group decided to send a joint statement with implementation recommendations to the federal office of justice.
Because it has been a long time since the last in-person meeting, the remainder of the time was used by the members to share ongoing blockchain projects and their status, recent blockchain-related news and lessons learned. Talking points included the issue of scaling and potential solutions to it and decentralized autonomous organizations.
Afterwards, the informal part of the evening continued with a beer at the local pub.
Spatial Data was a workshop topic at the #wetechtogether conference which took place on October 2. The data innovation alliance, its member Litix and WiMLDS Zurich sponsored and organized two workshops entitled “Jump into Geodata”. The participants learned how to access Swiss geoservices with Python and how to use them for the presentation and analysis of geodata. We would like to thank the participants for their attendance and the lively discussions in the workshops. We hope that this will lead to new innovative geospatial projects in the future!
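For readers who want to try this themselves, here is a small sketch of the kind of access practiced in the workshop. The endpoint and parameters follow the public GeoAdmin REST API as we understand it; treat them as our assumption rather than workshop material.

```python
# Building a location-search request for the Swiss federal
# geoportal's REST API (api3.geo.admin.ch). The URL is constructed
# locally; performing the request requires network access.
from urllib.parse import urlencode

BASE = "https://api3.geo.admin.ch/rest/services/api/SearchServer"

def search_url(text, limit=5):
    """Build a GeoAdmin location-search URL for the given query text."""
    params = {"searchText": text, "type": "locations", "limit": limit}
    return f"{BASE}?{urlencode(params)}"

url = search_url("Zurich")
print(url)
# To execute the query, e.g.: requests.get(url).json()
# returns matching locations with their coordinates.
```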
Innovative ideas around geodata are indeed welcome, since Spatial Data Analytics is one of four focus topics in the Databooster initiative. Project ideas related to geospatial data have increased chances of receiving Databooster support!
Being part of the #wetechtogether conference was a great experience for data innovation alliance and Litix. We support the main goal of the conference to empower people to bring more diversity into tech. We are already looking forward to the 2022 edition!
First, Simon Würsten from SBB introduced various methods of data anonymization. Each of the presented methods can be considered a trade-off between anonymization strength and expressiveness of the data (i.e., minimizing disclosure risk while maximizing data utility). For instance, some methods randomly change the values of data while others reshuffle the content of values between different attributes. Depending on which type of data analysis is performed, the respective anonymization methods can be chosen, along with a report about the strength of the methods. The presented approaches have a very high potential to be used in various data-sensitive areas such as health care or e-government. The technology is ready to be used, for instance, in a PoC by other Alliance members (see the R library sdcMicro).
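As a toy illustration of two of the method families mentioned (plain Python, not sdcMicro), the sketch below perturbs numeric values with random noise and reshuffles one attribute's values between records:

```python
# Two toy anonymization methods: noise addition and value swapping.
import random

def add_noise(values, scale=1.0, rng=None):
    """Perturb numeric values with uniform random noise."""
    rng = rng or random.Random(0)
    return [v + rng.uniform(-scale, scale) for v in values]

def swap_attribute(records, attr, rng=None):
    """Reshuffle one attribute's values across records."""
    rng = rng or random.Random(0)
    values = [r[attr] for r in records]
    rng.shuffle(values)
    return [dict(r, **{attr: v}) for r, v in zip(records, values)]

people = [{"age": 34, "zip": "8001"},
          {"age": 51, "zip": "3011"},
          {"age": 28, "zip": "1201"}]

noisy_ages = add_noise([p["age"] for p in people], scale=2.0)
swapped = swap_attribute(people, "zip")

# The marginal distribution of each attribute is preserved, but the
# link between attributes within a single record is weakened.
print(sorted(p["zip"] for p in people) == sorted(r["zip"] for r in swapped))
```

This is exactly the trade-off described above: noise and swapping reduce disclosure risk, but each distorts a different aspect of the data's utility.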
Next, Gerald Reif from IPT presented the big data and AI architecture blueprint on the Microsoft Azure Cloud. Currently, one of the most widely used approaches is the lambda architecture, which consists of three layers: (1) the Speed Layer for real-time stream processing, (2) the Batch Layer for processing large amounts of stored data, and (3) the Serving Layer for presenting and reacting to the analysis results. There is currently a clear trend of combining and consolidating big data and machine learning technology from Apache Spark and Azure PaaS services. The advantage of the combined solution is the bleeding-edge open-source technology of Apache Spark coupled with the enterprise features and user management functionality of Microsoft.
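The three layers can be illustrated with a deliberately simplified sketch (plain Python, not an Azure or Spark implementation): the batch layer recomputes a view from the full store, the speed layer keeps an incremental view of recent events, and the serving layer merges both at query time.

```python
# Toy lambda architecture: batch, speed, and serving layers.

store = [("page_a", 1), ("page_b", 1), ("page_a", 1)]  # master dataset

def batch_view(events):
    """Batch layer: full recomputation over the stored data."""
    view = {}
    for key, n in events:
        view[key] = view.get(key, 0) + n
    return view

def speed_update(view, event):
    """Speed layer: incremental update for a single new event."""
    key, n = event
    view[key] = view.get(key, 0) + n
    return view

def serve(batch, speed):
    """Serving layer: merge batch and real-time views for queries."""
    merged = dict(batch)
    for key, n in speed.items():
        merged[key] = merged.get(key, 0) + n
    return merged

speed = {}
for event in [("page_a", 1), ("page_c", 1)]:  # events not yet batched
    speed = speed_update(speed, event)

print(serve(batch_view(store), speed))
# {'page_a': 3, 'page_b': 1, 'page_c': 1}
```

In a production system, Spark typically fills the batch role, a streaming engine the speed role, and a query service the serving role; the merge logic above is the part that ties them together.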
Finally, Luca Furrer from Trivadis provided insights into the latest tools for enabling reproducibility of data science experiments. In principle, three different aspects need to be reproducible: data, code/models, and parameters. Promising tools for these aspects are dvc, mlflow and git. The advantage of these tools is that data scientists can easily keep the history of code and data and track the results of various machine learning experiments along with the chosen parameters. The tools integrate well with each other through git.
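As a dependency-free illustration of the idea behind these tools (not their actual APIs), the sketch below records a data hash, a code version, and the chosen parameters alongside a result: the minimal information needed to reproduce a run later.

```python
# What dvc/mlflow/git capture, reduced to a single record.
import hashlib
import json

def run_record(data_bytes, code_version, params, metric):
    """Bundle everything needed to reproduce an experiment run."""
    return {
        "data_hash": hashlib.sha256(data_bytes).hexdigest(),  # dvc's role
        "code_version": code_version,                          # git's role
        "params": params,                                      # mlflow's role
        "metric": metric,
    }

record = run_record(b"col_a,col_b\n1,2\n", "a1b2c3d", {"alpha": 0.5}, 0.91)
print(json.dumps(record, indent=2))
```

If any of the three aspects changes, the record changes too, which is exactly why versioning data, code, and parameters together makes an experiment reproducible.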
The presentations were followed by lively discussions about the methods, the architectures, and the experiences of using them in real life. One of the main questions was about the experience of deploying machine learning models in production over longer periods of time. A typical phenomenon is that big data and AI technology is often successfully used in proofs of concept, but there is little information on how the approaches “pass the test of time” in real production environments.
As part of a future event – and possibly in collaboration with the expert group on machine learning – we are planning to report on the experience of using machine learning models in production. Typical questions to be addressed are: What models should be deployed? How often should models be deployed? When should re-training be done? How do we handle rapidly changing data? How do models degrade over time and what can we do to mitigate model degradation?
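One generic way to approach the re-training question, sketched here as an assumption rather than a method presented at the event, is to monitor distribution shift of the incoming data against the training data, for example with a simplified Population Stability Index (PSI):

```python
# Simplified PSI over equal-width bins: large values indicate drift.
import math

def psi(expected, actual, bins=5, eps=1e-4):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def shares(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        return [max(c / len(sample), eps) for c in counts]  # avoid log(0)
    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [0.1 * i for i in range(100)]        # training-time feature values
same = [0.1 * i for i in range(100)]         # no drift
shifted = [0.1 * i + 5 for i in range(100)]  # strong drift

print(psi(train, same) < 0.1)      # low PSI: distribution unchanged
print(psi(train, shifted) > 0.25)  # high PSI: consider re-training
```

The thresholds (0.1, 0.25) are conventional rules of thumb; in practice they would be tuned per feature and combined with monitoring of the model's own performance metrics.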
The 3rd meeting of the Expert Group Privacy Technologies for Data Collaboration took place online on September 8, 2021 in the afternoon. We were joined by 14 participants.
Nico Ebert from ZHAW opened the meeting with a discussion about the possibilities for a physical meeting at the fourth meeting on November 26. The participants agreed to meet in Zurich. He also introduced the speaker for the upcoming meeting, namely Juan Troncoso-Pastoriza from EPFL and co-founder of the startup Tune Insight. Juan will introduce the group to the basics of homomorphic encryption.
Afterwards, Matthias Templ, a ZHAW expert in the areas of data anonymization and synthetic data, presented the concept of synthetic data. Synthetic data is “any production data applicable to a given situation that are not obtained by direct measurement”, according to the McGraw-Hill Dictionary of Scientific and Technical Terms. Synthetic data is generated from datasets that often contain personal data and should not be shared with third parties. However, major statistical properties of the synthetic dataset match those of the original dataset, so it can be used for similar purposes, such as learning about distributions.
Matthias explained that creating synthetic data first requires a good understanding of the original dataset (e.g. personal data about a population). This includes understanding its generation process and its inherent distributions (including marginal distributions). Afterwards, these distributions are rebuilt with one or more models (e.g. neural networks, decision trees). The models are then used to generate the synthetic dataset. Matthias has developed and published an R library to accomplish this task. He also demonstrated some real-world examples in which synthetic data had been applied. After Matthias’ presentation, the participants discussed the potential of synthetic data. Another discussion point was which modelling techniques are required for which complexities of the original dataset (e.g. datasets with only a few features require less complex techniques).
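As a deliberately simple illustration of this workflow (plain Python, not Matthias' R library), the sketch below fits a distribution per numeric column, here just a Gaussian, and then samples a synthetic dataset from the fitted models:

```python
# Fit per-column distributions, then sample synthetic records.
import random
import statistics

def fit(column):
    """Fit a Gaussian model (mean, stdev) to one numeric column."""
    return statistics.mean(column), statistics.stdev(column)

def synthesize(original, n, seed=0):
    """Generate n synthetic rows per column from the fitted models."""
    rng = random.Random(seed)
    models = {name: fit(col) for name, col in original.items()}
    return {name: [rng.gauss(mu, sigma) for _ in range(n)]
            for name, (mu, sigma) in models.items()}

original = {"age": [23, 35, 31, 44, 52, 38, 29, 47],
            "income": [48, 62, 55, 80, 95, 70, 50, 88]}
synthetic = synthesize(original, n=1000)

# The synthetic values are fresh draws, not copies of real records,
# yet their distribution resembles the original column's.
print(round(statistics.mean(synthetic["age"])))
```

Real datasets need richer models that also preserve correlations between columns, which is exactly the complexity question the participants discussed: the more structure the original data has, the more expressive the model must be.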
In the second half of the meeting the participants discussed the potential benefits of the “Data Collaboration Canvas”. The Data Collaboration Canvas is a graphical workshop tool and has been developed with the help of the Expert Group. It is aimed at organizations that want to explore the potential of data innovation with other organizations at an early stage to create mutual added value. It offers a simple, visual structuring aid, e.g. in workshops, to identify common potentials and hurdles of collaboration. The canvas can not only be used to identify data collaboration opportunities between organizations such as companies but also within an organization (e.g. opportunities between different divisions or departments). Participants applied the canvas in two different use cases and discussed usability and comprehensibility of the canvas afterwards.