WP3 Tools for operating trajectories medicine | My way to health

Objectives

One of the principal shortcomings in trajectories medicine is the lack of integration of the full spectrum of individual determinants, including clinical, environmental, social, societal and behavioural factors, into the prediction of patients’ treatment responses and outcomes. Different data sets already exist or will be produced and collected by partners of the CDP project in a wide variety of formats. To overcome this heterogeneity, this WP addresses both the technical challenges of integrating these data into a single centralized repository and the complex security and regulatory issues raised by the need for remote access by academic and industrial partners, who will perform analyses on a big data analytics cloud. This secured platform will provide infrastructure, analytics, interoperability, quality control, security, privacy and service layers for collecting, aggregating, storing, accessing, analysing and visualizing multiscale data according to a user-centred design. We will build this secured platform on the experience we have acquired with the different databases described below.

MARS database tools

The MARS database is a real-life clinical cohort that was set up in 2016 with the appropriate administrative and ethical authorizations (C.C.T.I.R.S N°15.925bis, obtained on March 23rd, 2016; CNIL declaration of conformity with the reference methodology MR003 N° 1996650v0, obtained on October 5th, 2016). All patients in the database gave written informed consent before inclusion. The inclusion rate exceeds 300 patients/month, with 8,000 patients from our university hospital (CHU) and 4,000 patients from other hospital centres already included in the database. This complete and secured data solution addresses three needs: acquiring health data during the process of care; banking raw data (~1,000 polysomnographies and DICOM imaging data: 300 stroke cerebral CT scans and MRI, 150 thoracic CT scans and 250 echocardiographies); and connecting these data sets with IoT and connected devices to capture the health trajectories of patients on OSA treatments. We are now in the process of linking this highly phenotyped database to SNDS data from national health services.

EpiMed information system, data storage and management

An information processing and storage system has been developed by EpiMed to enable and facilitate the use of omics data. This system mainly integrates expression and epigenomic data, either from publicly available sources or produced by our teams from individuals in various clinical or epidemiological contexts. Its distinguishing feature is that it is designed to facilitate “concept-driven” approaches to data analysis, which are guided by specific scientific questions and can be implemented across various technologies and genomic platforms.

More specifically, this information system relies on three main building blocks, all regularly updated, for the management of: i) omics data in various formats (methylomes, transcriptomes, ChIPseq…), ii) clinical and sample annotations, and iii) gene and genomic region annotations. These pipelines are accessible via the EpiMed website. In addition, specific pipelines, tools and packages developed by EpiMed are also accessible online.
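
As an illustration only, the three building blocks could be modelled as three linked record types. The sketch below is not the EpiMed schema: every class, field and function name is a hypothetical placeholder chosen for the example, and the only thing taken from the text above is the split into omics data, clinical and sample annotations, and gene and genomic region annotations.

```python
from dataclasses import dataclass, field


@dataclass
class OmicsDataset:
    """Block 1 (hypothetical model): one omics data set and the samples it covers."""
    dataset_id: str
    data_type: str  # e.g. "methylome", "transcriptome", "ChIPseq"
    sample_ids: list[str] = field(default_factory=list)


@dataclass
class SampleAnnotation:
    """Block 2 (hypothetical model): clinical and sample annotations for one sample."""
    sample_id: str
    clinical: dict[str, str] = field(default_factory=dict)


@dataclass
class GenomicRegionAnnotation:
    """Block 3 (hypothetical model): annotation of a gene or genomic region."""
    name: str
    chromosome: str
    start: int
    end: int


def annotated_samples(dataset: OmicsDataset,
                      annotations: list[SampleAnnotation]) -> list[SampleAnnotation]:
    """Return the annotations of the samples in `dataset`, in data-set order.

    Samples without an annotation record are skipped rather than raising.
    """
    by_id = {a.sample_id: a for a in annotations}
    return [by_id[s] for s in dataset.sample_ids if s in by_id]
```

Keeping the three blocks as separate records linked by identifiers means that each one can be updated independently, which is one way to accommodate the regular updates mentioned above.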

EpiMed makes a financial contribution to the CIMENT/GRICAD mesocentre computing infrastructure (4 storage elements and 4 computing nodes) and sits on the users' committee, which gives it priority access to part of the facility while it benefits from the pooled computing resources. EpiMed has access to its servers for hosting the EpiMed databases and information system (3 servers Winter UAR GRICAD) and to data storage facilities (contribution of EpiMed to Bettick storage: ~250 TB out of 1.3 PB; approximately 60 TB of secured storage on the summer UAR).

Exploring the Dynamics of Proteomes (EDyP) information system, data storage and management

The EDyP team is ISO 9001 and NF X50-900 certified. The proteomics data produced will be of three types: i) raw files output by the mass spectrometers (.raw or other proprietary format), ii) processed data (.mgf, .dat, .txt, .mzIdentML, etc.), iii) reprocessed and consolidated results (.xlsx, .pptx, .docx, etc.).
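
The three data types above can be told apart by file extension. The following is a minimal sketch of such a classifier, not the logic of the EDyP LIMS: the function name and category labels are hypothetical, the `.raw` extension is assumed for raw spectrometer output, and because the extension lists in the text end in "etc.", the mapping is illustrative rather than exhaustive.

```python
from pathlib import PurePath

# Extension-to-category map built from the examples listed in the text.
# Assumption: ".raw" stands for the raw spectrometer output; other
# proprietary formats would have to be added to this map.
CATEGORY_BY_EXTENSION = {
    ".raw": "raw",
    ".mgf": "processed",
    ".dat": "processed",
    ".txt": "processed",
    ".mzidentml": "processed",
    ".xlsx": "consolidated",
    ".pptx": "consolidated",
    ".docx": "consolidated",
}


def classify_proteomics_file(filename: str) -> str:
    """Return "raw", "processed", "consolidated" or "unknown" for a file name.

    Matching is case-insensitive and uses only the final extension.
    """
    suffix = PurePath(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")
```

Returning "unknown" instead of raising keeps the sketch tolerant of formats that the text does not enumerate.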

All the mass spectrometry raw data are entered into an in-house LIMS (ePims) allowing automatic organization, storage and backup. Metadata such as acquisition parameters, device type or software version are included in the raw files. The storage uses NetApp arrays with a mirroring system, managed by CEA outsourcing. The Proline software used for protein identification and quantification retains all of the metadata associated with the process. The underlying databases as well as quantitative analyses, identification results, etc. are stored on the team's servers, backed up weekly by the CEA system.