OCIS 2018 Keynote, featuring Sirkka Jarvenpaa (The University of Texas at Austin)
Generativity in Data Infrastructures:
Exploring Tensions in Linked Health Data in Medical Genetics Initiatives
Summary by Jean-Charles Pillet and Maheshwar Boodraj,
OCIS Student Representatives-at-Large.
Medicine is at a turning point because of advances in the ways that we can handle large amounts of unstructured and semi-structured data. Hefty investments are being made in strategic initiatives aimed at developing a digital infrastructure for the collection, integration, and distribution of health data. Often coordinated by governments, these strategic initiatives involve a myriad of private actors who collectively contribute to shaping an infrastructure that will pave the way to new medical discoveries over the next 10 to 20 years. In the United States, the “Precision Medicine Initiative” (also known as the “All of Us” Research Program) is collecting genetic data, biological samples, and other information about the health of one million volunteers with the goals of better predicting disease risk, understanding how diseases occur, and improving diagnosis and treatment strategies. Our keynote speaker, Sirkka Jarvenpaa (The University of Texas at Austin), and her colleague M. Lynne Markus (Bentley University) are currently involved in six of these strategic medical genetic initiatives worldwide. During her keynote address, Sirkka highlighted several critical challenges that are currently being faced in this area.
While typical information systems projects focus on defining and meeting a set of requirements to address a relatively known and bounded problem or set of problems, medical genetic initiatives require planning for problems which are totally unknown today. In this context, the challenge is to establish an open-ended infrastructure that will not hinder future discoveries for unforeseen uses and users but rather facilitate their emergence. Two additional difficulties arise with these large-scale initiatives: the heterogeneity of the profiles of prospective users of the data, who do not share a mental model, and the ever-changing nature of these mental models. For example, disease or genomic categories are prone to change in areas of medical breakthrough. Consequently, discoveries that are made along the way may point to the importance of factors that might have been overlooked at the time when the initial scope of data collection was defined. Therefore, the very foundation upon which these infrastructures are built appears ever-changing and largely unknown. This highly uncertain and unknown world represents a considerable challenge given what is at stake both financially and for our health.
Diverse membership in these programs is another intriguing aspect that the research has identified. There are many groups of participants, including private actors that make substantial investments in these initiatives and volunteers who agree to share their personal data, often for no benefit of their own. For private actors, success is far from guaranteed, and it is uncertain what proportion of the 35,000 genome projects involving data infrastructures will remain 20 years from now. While there is much entrepreneurial energy devoted to these strategic initiatives, the road to financial sustainability is very long indeed. These programs require long-term engagement from actors who do not have a clear idea of the results they will yield, even under conditions of success. The strategic initiatives are also dependent on major culture change, particularly in terms of data reuse. Traditionally, scientists have secured cohorts, collected their own data, and built their reputations on such data. Researchers who rely on the datasets of others, so-called “data parasites,” have been viewed as free-riders. Yet these “data parasites” are exactly what is needed to maximize value from the data investments. The error-prone nature of the data and challenges in data updating were also mentioned. Many tensions surround the data governance of the initiatives.
Societal trends are also critical for success. Such trends will influence the quantity of data that is collected: without the consent of participants to voluntarily share sensitive information, such initiatives would remain wishful thinking. Yet major obstacles to participation arise when neither the question of who will be able to access the data nor what it will be used for can be defined. Such uncertainties could deter volunteers from participating in these programs or prompt them to quit after a few years. This poses a considerable challenge given that the success of these initiatives requires the long-term commitment of their participants.
In closing, Sirkka outlined what she and Lynne have discovered so far. They observed that successful initiatives tend to have high degrees of control over the data they collect, for example, allowing access to the data but denying the ability to download or manipulate it. While this strong sense of control might prove beneficial in the short term, it could impede scientific advances over the long run. Indeed, when designing for generativity, relational aspects where unexpected combinations are made are more important than the sheer amount of data collected. This will be a key challenge in contexts dominated by protective data controls. These large-scale data collection initiatives continue to struggle with how to balance the ethical concerns of obtaining broad consent in an open-ended program while at the same time informing participants about how their personal data is most likely to be used and who is most likely to derive value from it.
While there is much to be learnt from these six initiatives, which are still at a nascent stage, the many insightful questions posed by the keynote participants show that much is still unknown about generativity in data infrastructures.