
Event

MCCHE Precision Convergence Webinar Series with Barend Mons

Thursday, October 31, 2024, 11:00 to 13:00

STOP SHARING DATA: Visiting Algorithms, Swarm Learning and Next Generation FAIR (Federated AI Ready) Principles and Practice

By Barend Mons

Leiden University

Date: October 31, 2024
Time: 11:00am to 1:00pm

Abstract

The rapid developments in the field of machine learning have also brought along some existential challenges, which are in essence all related to the broad concept of 'trust'. Aspects of this broad concept include trust in the output of any ML process (and the prevention of black boxes, hallucinations and so forth). The very trust in science is at stake, especially now that LLMs can generate 'good-looking nonsense' and paper mills spring up in response to the perverse reward systems in current research environments. The other side of the same coin is that ML, if not properly controlled, will also break through security and privacy barriers and violate the GDPR and other Ethical, Legal and Societal constraints, including equitability. In addition, the existence of data 'somewhere' by no means automatically implies its actual Reusability. This concerns all four of the by now well-established FAIR principles: much data is not even Findable; if found, it is not Accessible under well-defined conditions; if accessed, it is not Interoperable (understandable by third parties and machines); and as a result the vast majority of data and information is not Reusable without violating copyrights, privacy regulations, or the basic conceptual models that implicitly or explicitly underpin the query or the deep learning algorithm. Now that more and more data will also be 'independently' used by machines, all these challenges will be severely aggravated.

This keynote will address how 'data visiting', as opposed to classical 'data sharing' with its connotation of data downloads, transport and loss of control, mitigates most, if not all, of these unwanted side effects. For federated data visiting, the data should be FAIR in an additional sense: they should be 'Federated, AI-Ready', so that visiting algorithms can answer questions related to Access Control, Consent and Format, and can read rich (FAIR) metadata about the data themselves to determine whether they are 'fit for purpose' and machine-actionable (i.e. FAIR Digital Objects, or Machine-Actionable Units). The 'fitness for purpose' concept goes way beyond (but includes) information about methods, quality, error bars, etc. The 'immutable logging' of all operations of visiting algorithms is crucial, especially when self-learning algorithms in 'swarm learning' are being used. Enough to keep us busy for a while.
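To make the 'data visiting' idea concrete, the following is a minimal, purely illustrative Python sketch. The class, field names and policy checks are assumptions for illustration only, not an existing FAIR, GO FAIR or swarm-learning API: an algorithm travels to a data station, the station checks the declared purpose against its consent policy, runs the algorithm locally on data that never leaves the station, returns only the aggregate result, and appends a hash-chained entry to an audit log in the spirit of 'immutable logging'.

```python
# Illustrative sketch only: hypothetical names, not an official data-visiting API.
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataStation:
    """A data holder that lets algorithms visit, but never releases raw records."""
    records: list            # the local data; stays behind the station's boundary
    metadata: dict           # rich (FAIR) metadata describing the local data
    allowed_purposes: set    # purposes covered by consent / access conditions
    audit_log: list = field(default_factory=list)

    def visit(self, purpose: str, algorithm: Callable[[list], Any]) -> Any:
        # 1. Access control / consent check against the station's own policy.
        if purpose not in self.allowed_purposes:
            raise PermissionError(f"Purpose '{purpose}' not covered by consent")
        # 2. Run the visiting algorithm locally; only its (aggregate) result leaves.
        result = algorithm(self.records)
        # 3. Immutable-style logging: hash-chain each operation for later audit.
        prev = self.audit_log[-1]["hash"] if self.audit_log else ""
        entry = {"time": time.time(), "purpose": purpose, "prev": prev}
        entry["hash"] = hashlib.sha256(
            (prev + json.dumps(entry, sort_keys=True)).encode()
        ).hexdigest()
        self.audit_log.append(entry)
        return result


# Usage: the researcher sends a question to the data, not a download request.
station = DataStation(
    records=[{"age": 54, "bmi": 27.1}, {"age": 61, "bmi": 31.4}],
    metadata={"license": "CC-BY-4.0", "schema": "https://example.org/patient-schema"},
    allowed_purposes={"diabetes-research"},
)
mean_bmi = station.visit(
    "diabetes-research",
    lambda rows: sum(r["bmi"] for r in rows) / len(rows),
)
print(mean_bmi, len(station.audit_log))
```

In a real federated or swarm-learning setting, many such stations would each run the visiting algorithm locally and exchange only model updates or aggregates; the sketch above compresses that architecture into a single station to show the access check, local execution and logged operation in a few lines.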
