What is a Trusted Research Environment (TRE)?
A TRE, also known as a Secure Access Environment (SAE) or Data Safe Haven, is a highly secure and controlled computing environment designed to let approved researchers access, store, share, and analyse sensitive data safely for approved research questions.
The primary purpose of a TRE is to protect the privacy and security of sensitive data, particularly health data, while enabling its utilisation for research to benefit the public. This addresses significant challenges in healthcare research, where vast amounts of biomedical and health-relevant data, such as whole genomes and linked Electronic Health Records (EHRs), are often stored in institutional silos, making them difficult to access and integrate. TREs streamline data access and allow multiple researchers to collaborate on a single project, increasing confidence among patients and data custodians that their data will be kept safe.
Key Features and Components of a TRE
- Curated Gateway: A checkpoint for all data and files moving in and out of the environment, ensuring that only approved data enters and only screened results leave.
- Contained Environment: Project data, tools, and software are kept in a single, isolated space. This prevents unauthorised data movement and ensures project-specific segregation.
- Access Control: Stringent processes and protocols ensure that only authorised individuals can access sensitive data. This involves secure login with two-factor authentication and role-based access controls that limit each individual's data access to what their role or responsibilities require (see the sketch after this list).
- Data Encryption: All data stored within the TRE is encrypted, protecting it from unauthorised access even if intercepted.
- Incident Response and Disaster Recovery Plans: Robust plans are in place to minimise the impact of security incidents or data breaches and to restore data integrity.
- Analytics Enabled: TREs provide the tools and resources analysts need, typically including statistical software such as R and SPSS, and increasingly support in-environment programming for AI and natural language processing.
- Platform Governance: Well-defined governance frameworks outline the roles and responsibilities of all stakeholders, ensuring data is handled responsibly and in compliance with regulations.
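To make the access-control feature concrete, here is a minimal, illustrative sketch in Python. The role names, actions, and policy table are hypothetical examples, not a real TRE's configuration; in practice these controls are enforced at the platform level (identity provider, network isolation, audited workspaces) rather than in application code.

```python
# Minimal sketch of role-based access control in a TRE project space.
# Role names, actions, and the policy table are hypothetical examples.

ROLE_PERMISSIONS = {
    "researcher": {"read_project_data", "run_analysis"},
    "data_custodian": {"read_project_data", "approve_ingress", "approve_egress"},
    "output_checker": {"review_outputs", "approve_egress"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role explicitly grants the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())

# Example: a researcher may run analyses but cannot release outputs.
assert is_allowed("researcher", "run_analysis")
assert not is_allowed("researcher", "approve_egress")
```

The key design point is that permissions are granted explicitly per role and default to denial, mirroring the "only what your role requires" principle described above.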
Five Safes framework
The operation of TREs is fundamentally rooted in the internationally recognised Five Safes framework:
- Safe Data: Data confidentiality is maintained through measures such as pseudonymisation or anonymisation, and researchers can access only the data necessary for their project.
- Safe Projects: Data owners formally approve research projects.
- Safe People: Researchers are trained to use the data safely, understand their responsibilities, and adhere to user agreements.
- Safe Settings: The computing environment is secure, preventing unauthorised data access.
- Safe Outputs: All exported results are rigorously screened and approved to prevent re-identification, typically requiring a minimum number of individuals in any aggregate output (illustrated in the sketch after this list).
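The following is a minimal sketch of a Safe Outputs check in Python. The threshold of 10 and the group names are hypothetical examples; the actual minimum cell size and suppression rules are set by each data custodian and applied as part of formal output checking, not by researcher-run code alone.

```python
# Minimal sketch of a Safe Outputs check: suppress any aggregate count that
# falls below a minimum cell size before the result can leave the TRE.
# The threshold of 10 and the group names are hypothetical examples.

MIN_CELL_SIZE = 10

def screen_counts(counts: dict[str, int]) -> dict[str, object]:
    """Replace small counts with a suppression marker rather than exporting them."""
    return {
        group: (n if n >= MIN_CELL_SIZE else "<suppressed>")
        for group, n in counts.items()
    }

# Example: the count of 3 would risk re-identification, so it is withheld.
print(screen_counts({"clinic_A": 42, "clinic_B": 3}))
# {'clinic_A': 42, 'clinic_B': '<suppressed>'}
```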
The "Next-Generation" TRE Capabilities and Challenges
While traditional TREs are mature in supporting observational studies based on classical statistical methods, there is growing demand for "next-generation" capabilities to meet current scientific needs. These include:
- Support for Big, Unstructured Data: Handling large-scale genomic and imaging datasets, which can run to several terabytes.
- AI and Machine Learning (ML) Algorithm Development: The ability to develop and export AI/ML algorithms within the secure environment. This presents a significant challenge because sensitive individual-level data can be inadvertently encoded in algorithm weights or memorised by overfitted models. Disclosure control for exporting AI algorithms is particularly challenging, and additional controls and tools are clearly needed.
- Federated Analysis/Learning: Enabling secure analysis of datasets stored in multiple TREs across different locations without directly sharing or moving sensitive data (see the sketch after this list). This allows for global insights and research while ensuring data security and confidentiality.
- Automation of Output Checks: Moving beyond time-consuming and error-prone manual checks for data export. While full automation is not yet seen as feasible due to security concerns, a hybrid model with automated tools could significantly accelerate the process and reduce risks.
- Scalability: The need for infrastructure that supports processor-intensive projects and on-demand computing power, with the ability to access High Performance Computing (HPC).
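To illustrate the federated-analysis idea, here is a minimal sketch in Python, assuming the simplest possible case: each TRE computes a local aggregate (a sum and a count) and only those aggregates, never row-level records, are combined centrally. The site data shown are hypothetical; real federated learning adds model updates, secure aggregation, and its own disclosure controls on what each site releases.

```python
# Minimal sketch of federated analysis: each TRE computes a local aggregate
# and only those aggregates (never row-level data) are combined centrally.
# Site values below are hypothetical examples.

def local_summary(values: list[float]) -> tuple[float, int]:
    """Computed inside each TRE: returns (sum, count), not individual records."""
    return sum(values), len(values)

def pooled_mean(summaries: list[tuple[float, int]]) -> float:
    """Computed centrally from the shared aggregates alone."""
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    return total / n

# Each site runs local_summary on its own data and shares only the result.
site_summaries = [local_summary([5.1, 6.3, 7.0]), local_summary([4.8, 5.9])]
print(round(pooled_mean(site_summaries), 2))  # 5.82
```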
UQ TRE
UQ is strategically moving towards a comprehensive sensitive data framework built around a next-generation TRE.
Contact us to discuss your sensitive data requirements.