FIMMG_pred Dataset

The FIMMG_pred dataset is a subset of the FIMMG dataset, obtained by applying the following inclusion/exclusion criteria:

i) exclusion of all ICD-9 diabetic patients;

ii) inclusion of only demographic, monitoring and laboratory exam EHR fields;

iii) inclusion of patients with at least three measurements of triglycerides (TG; mg/dl) and fasting glycemia (Gb; mg/dl) collected simultaneously;

iv) inclusion of patients with a temporal distance between the last two TG and Gb measurements greater than 12 months;

v) inclusion of EHR features with less than 90% missing values overall.
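
Criteria iii) to v) can be sketched in Python as follows. This is only an illustration, not the code used to build FIMMG_pred: the function names, the use of None to mark a missing value, and the approximation of 12 months as 365 days are all assumptions made for the example.

```python
from datetime import date


def missing_fraction(values):
    """Fraction of entries that are None (None stands for a missing value here)."""
    return sum(v is None for v in values) / len(values)


def keep_features(table, max_missing=0.90):
    """Criterion v: keep columns whose missing fraction is strictly below max_missing.

    `table` is a hypothetical mapping from feature name to a list of values.
    """
    return {name: col for name, col in table.items()
            if missing_fraction(col) < max_missing}


def eligible(measurement_dates, min_count=3, min_gap_days=365):
    """Criteria iii and iv for one patient.

    `measurement_dates` holds the dates on which TG and Gb were measured together.
    The patient qualifies with at least `min_count` such dates and more than
    `min_gap_days` between the last two (365 days approximates 12 months).
    """
    if len(measurement_dates) < min_count:
        return False
    ordered = sorted(measurement_dates)
    return (ordered[-1] - ordered[-2]).days > min_gap_days


# Toy data, invented for this example only.
table = {
    "age": [50, 61, None, 47, 58, 66, 70, 49, 53, 62],
    "rare_exam": [None] * 9 + [4.2],  # 90% missing, so it is dropped
}
kept = keep_features(table)

patient_a = [date(2010, 1, 10), date(2011, 3, 1), date(2013, 6, 20)]
patient_b = [date(2010, 1, 10), date(2012, 1, 1), date(2012, 6, 1)]
```

With this toy data, only the "age" column is kept, patient_a satisfies both criteria, and patient_b is rejected because the last two measurements are less than a year apart.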

The FIMMG_pred dataset contains a total of 256 patients and 49 EHR features.

The number of different EHR features for each main field is enclosed in square brackets:

  • Demographic (Gender, Age) [2]
  • Monitoring (Systolic and diastolic blood pressure) [2]
  • Clinical (Laboratory exams) [45]


To request the FIMMG_pred dataset:

  • Send an email to: (Note: you should send the email from an email address that is linked to your research institution/university)
  • You will be sent a form to fill out, followed by a download link

Please cite our work using the following BibTeX entry:

  title={Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach},
  author={Bernardini, Michele and Morettini, Micaela and Romeo, Luca and Frontoni, Emanuele and Burattini, Laura},
  journal={Artificial Intelligence in Medicine},

The code to replicate all the experiments is publicly available at the following link.