The FIMMG_pred dataset is a subset of the FIMMG dataset, obtained by applying the following inclusion/exclusion criteria:
i) exclusion of all ICD-9 diabetic patients;
ii) inclusion of only demographic, monitoring and laboratory exam EHR fields;
iii) inclusion of patients with at least three measurements of triglycerides (TG; mg/dl) and fasting glycemia (Gb; mg/dl) collected simultaneously;
iv) inclusion of patients whose two most recent TG and Gb measurements are more than 12 months apart;
v) inclusion of EHR features with less than 90% missing values overall.
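Criteria iii) to v) can be illustrated with a minimal Python sketch. This is not the code used to build FIMMG_pred: the function names, the input layout (a list of `(date, tg, gb)` tuples per patient and a list of per-patient feature dictionaries) and the approximation of 12 months as 365 days are all assumptions made for illustration.

```python
from datetime import date


def is_eligible(visits, min_visits=3, min_gap_days=365):
    """Check criteria iii) and iv) for one patient (hypothetical helper).

    visits: list of (date, tg, gb) tuples; tg or gb is None when not measured.
    min_gap_days: 12 months approximated as 365 days (assumption).
    """
    # Criterion iii): keep only visits where TG and Gb were recorded together.
    paired = sorted(d for d, tg, gb in visits if tg is not None and gb is not None)
    if len(paired) < min_visits:
        return False
    # Criterion iv): gap between the two most recent paired measurements.
    return (paired[-1] - paired[-2]).days > min_gap_days


def keep_dense_features(rows, max_missing=0.90):
    """Check criterion v) across patients (hypothetical helper).

    rows: list of dicts mapping feature name -> value (None means missing).
    Returns the sorted names of features whose missing fraction is below max_missing.
    """
    if not rows:
        return []
    features = sorted({name for row in rows for name in row})
    kept = []
    for name in features:
        # A feature absent from a row counts as missing for that patient.
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) < max_missing:
            kept.append(name)
    return kept


# Example: three paired measurements, the last two about 16 months apart.
example_visits = [
    (date(2015, 1, 10), 150.0, 95.0),
    (date(2016, 2, 1), 160.0, 99.0),
    (date(2017, 6, 1), 170.0, 101.0),
]
print(is_eligible(example_visits))
```

Criterion i) (ICD-9 diabetic patients) and criterion ii) (field selection) depend on how diagnoses and fields are encoded in the full FIMMG dataset, so they are left out of this sketch.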
The FIMMG_pred dataset contains a total of 256 patients and 49 EHR features.
The number of different EHR features for each main field is enclosed in square brackets:
- Demographic (Gender, Age) [2]
- Monitoring (Systolic and diastolic blood pressure) [2]
- Clinical (Laboratory exams) [45]
To request the FIMMG_pred dataset:
- Send an email to: vrai@dii.univpm.it (Note: please send the email from an address linked to your research institution/university)
- You will receive a form to fill out and, once it is completed, a download link
Please cite our work using the following BibTeX entry:
@article{bernardini2020discovering,
  title={Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach},
  author={Bernardini, Michele and Morettini, Micaela and Romeo, Luca and Frontoni, Emanuele and Burattini, Laura},
  journal={Artificial Intelligence in Medicine},
  year={2020},
  publisher={Elsevier}
}
The code to replicate all the experiments is publicly available at the following link.