Best Practices

Curation & Harmonisation

Data curation is becoming more important for research communities. Funders expect the outputs of the research they support, including data, to be shared (Johnston, 2017, p.6)[1]. Researchers are also increasingly asking for access to secondary data so that they can assess the reproducibility of research. As Manu and Gala (2021)[2] state, ‘data curation is the technical function that ensures research datasets are stored and managed in ways that promote ongoing integrity and accessibility’. Curation allows research data to be found and reused now and into the future. It also matters because a researcher who receives shared data as secondary data may need to clean it before it is fit for their purpose. You can read more about how the IRISS project assessed data curation in the report here.

 

[1] Johnston, L.R. (Ed.). (2017). Curating Research Data. Volume One: Practical Strategies for Your Digital Repository. Association of College and Research Libraries. https://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/booksanddigitalresources/digital/9780838988596_crd_v1_OA.pdf 

  

[2] Manu, T.R., & Gala, B. (2021). Data Curation Activities in Research Data Repositories: Best Practices. International Conference on Statistical Tools and Techniques for Research Data Analysis (ICSTRDA 2021). ResearchGate. https://www.researchgate.net/publication/348873527_Data_Curation_Activities_in_Research_Data_Repositories_Best_Practices

 

Data Integration

 

Data integration is the overarching process of aggregating data from a number of different sources. Because different sources often represent the same kind of information in different ways, integration typically includes additional steps such as data cleaning or transformation, as well as the ingestion of the (newly tidied) data. Combining information from disparate but complementary datasets can provide data analytics with greatly enriched and diversified information [1].

There is no single universal process for data integration, but certain elements of digital infrastructure are typical: a network of original data sources, a master server on which they are all aggregated, and a single point of access. Adherence to shared data standards, vocabularies, ontologies, and schemas is often a key component of data aggregation.
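
As a rough illustration of the cleaning, transformation and ingestion steps described above, the following Python sketch maps two sources onto one shared schema. Everything in it is an assumption made for the example: the two source layouts, the field names (`site`, `collected_on`, `temperature_c`) and the helper functions are hypothetical and are not taken from the IRISS reports.

```python
from datetime import datetime

# Hypothetical shared schema that every integrated record must follow.
SHARED_FIELDS = ("record_id", "site", "collected_on", "temperature_c")


def from_source_a(row):
    """Source A (assumed layout): DD/MM/YYYY dates, Celsius stored as text."""
    return {
        "record_id": f"A-{row['id']}",
        "site": row["site"].strip().title(),  # cleaning: tidy whitespace and case
        "collected_on": datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
        "temperature_c": float(row["temp_c"]),
    }


def from_source_b(row):
    """Source B (assumed layout): ISO dates, temperatures in Fahrenheit."""
    return {
        "record_id": f"B-{row['ref']}",
        "site": row["location"].strip().title(),
        "collected_on": row["collected"],
        # transformation: convert Fahrenheit to Celsius
        "temperature_c": round((row["temp_f"] - 32) * 5 / 9, 1),
    }


def integrate(source_a_rows, source_b_rows):
    """Ingest both sources into one list of records in the shared schema."""
    combined = [from_source_a(r) for r in source_a_rows]
    combined += [from_source_b(r) for r in source_b_rows]
    for record in combined:
        # Every record must carry exactly the shared fields.
        assert set(record) == set(SHARED_FIELDS)
    return sorted(combined, key=lambda r: r["record_id"])


if __name__ == "__main__":
    a_rows = [{"id": 1, "site": " north field ", "date": "03/02/2021", "temp_c": "11.5"}]
    b_rows = [{"ref": "x9", "location": "South Field", "collected": "2021-02-04", "temp_f": 68}]
    for record in integrate(a_rows, b_rows):
        print(record)
```

In a real project the shared schema would more likely come from an agreed standard, vocabulary or ontology than from a hand-written tuple; the point here is only that each source gets its own mapping into a common form before the records are combined.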

Data aggregation is a powerful tool for analysis and research, but where it concerns the data of individual people, it has the potential to become a privacy disaster [1]. Practical methods for anonymising data centre on the removal of details, and the process of data aggregation can (re)fill those gaps, so that people may become identifiable again. For this reason, careful and ethical consideration of data aggregation, and of subsequent access to aggregated datasets, is a critical feature of data science [1].
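
The way aggregation can (re)fill the gaps left by anonymisation can be shown with a small, hedged Python sketch. It assumes a hypothetical anonymised dataset with names removed and a second, named dataset that happens to share three attributes (`postcode_area`, `birth_year`, `sex`). All records, field names and the `link_records` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical attributes that appear in both datasets.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")


def link_records(anonymised, named, keys=QUASI_IDENTIFIERS):
    """Attach a name to each anonymised record that matches exactly one person."""
    index = defaultdict(list)
    for person in named:
        index[tuple(person[k] for k in keys)].append(person["name"])

    reidentified = []
    for record in anonymised:
        matches = index.get(tuple(record[k] for k in keys), [])
        if len(matches) == 1:  # a unique match fills the gap left by anonymisation
            reidentified.append({**record, "name": matches[0]})
    return reidentified


if __name__ == "__main__":
    # Invented example data: names were removed from the first dataset.
    anonymised = [
        {"postcode_area": "AB1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
        {"postcode_area": "CD2", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    ]
    named = [
        {"name": "Person One", "postcode_area": "AB1", "birth_year": 1980, "sex": "F"},
        {"name": "Person Two", "postcode_area": "CD2", "birth_year": 1975, "sex": "M"},
        {"name": "Person Three", "postcode_area": "CD2", "birth_year": 1975, "sex": "M"},
    ]
    for record in link_records(anonymised, named):
        print(record)
```

In this invented example only the first record is re-identified, because the second matches two people. A check of this kind can also be run defensively before aggregated data is released, to see how many records would become unique once sources are combined.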

You can read more about data integration in the IRISS reports here.

 

[1] Nurmikko-Fuller, T. (2023). Linked Open Data for Digital Humanities. Taylor & Francis.