EventLog

This table is the event log, which can be either a single-entity or a multi-entity event log. An entity is a distinct thing that events refer to, such as a patient, an admission, or an outpatient encounter. The terms "case notion," "case," "object," and "dimension" are often used interchangeably for an entity, and "multi-entity event log" is sometimes treated as equivalent to "object-centric event log" or "multi-dimensional event log." Each entity is stored as a pair of columns: its origin and its ID. The table contains 121,577,735 records and the following columns:

  • Event_ID: Contains the ID of each event. There are a total of 121,577,735 events.
  • Timestamp: Contains the time and date of activities.
  • Activity: Consists of the activity label of the event. There are a total of 95 distinct activities.
  • Activity_Synonym: Contains abbreviations of activity labels. For example, BGT for Blood Gas Test. There are 95 synonyms for 95 activities.
  • Activity_Attributes_ID: A unique foreign key ID for each distinct feature and value. For example:
    • po2=295 → 1
    • lactate=3.23 → 2
    • Blood pressure=137/79 → 3
    • po2=412 → 4 (same feature, different value)
    • lactate=0.73 → 5 (same feature, different value)
    • po2=295 → 1 (same feature and value)
    • lactate=3.23 → 2 (same feature and value)
  • Activity_Instance_ID: A unique foreign key identifier for each distinct activity, considering its features and values. This identification allows tracking of activities throughout the event log. Examples:
    • First event: Blood Gas Test: po2=295, lactate=3.23 → 1
    • Second event: BP_measurement: Blood pressure=137/79 → 2
    • Third event: Blood Gas Test: po2=412, lactate=0.73 → 3
    • Fourth event: Blood Gas Test: po2=295, lactate=3.23 → 1
  • Entity1_origin and Entity1_ID: Identify each patient (equivalent to subject_id in MIMIC).
    • All values for Entity1_origin are Patients.
    • Entity1_ID has 282,484 distinct patient IDs.
  • Entity2_origin and Entity2_ID: Identify each admission (equivalent to hadm_id in MIMIC).
    • All values for Entity2_origin are Admission.
    • Entity2_ID has 448,709 distinct IDs and includes nulls for outpatient-only patients.
  • Entity3_origin and Entity3_ID: Identify each outpatient encounter.
    • All values for Entity3_origin are Outpatient.
    • Entity3_ID has 550,405 distinct IDs and includes nulls for admitted patients.
  • Entity4_origin and Entity4_ID: Represent the admission sequence per patient.
    • All values for Entity4_origin are Admission_Sequence.
    • Entity4_ID ranges from 1 to 238, with nulls for outpatient-only patients.
  • Entity5_origin and Entity5_ID: Represent the outpatient sequence per patient.
    • All values for Entity5_origin are Outpatient_Sequence.
    • Entity5_ID ranges from 1 to 114, with nulls for admitted patients.
  • temp_patient_id and temp_encounter_id: Helper columns for analyzing subsets of the log, for example after clustering.
    • Exclude these columns when generating the final CEKG event log.
    • temp_patient_id is the equivalent of Entity1_ID.
    • temp_encounter_id is equivalent to:
      • Entity2_ID, if Entity3_ID is null.
      • the character "O" followed by Entity3_ID, if Entity2_ID is null.
    • temp_patient_id has 282,484 distinct patient IDs.
    • temp_encounter_id has 999,112 distinct encounter IDs.
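
The ID columns described above can be illustrated with a small Python sketch. It is not the code used to build the table. It assumes that Activity_Attributes_ID and Activity_Instance_ID are assigned sequentially in order of first appearance, which the table description does not state, and the helper names assign_ids and temp_encounter_id_for, as well as the sample ID values, are hypothetical.

```python
from typing import Hashable, Iterable, List, Optional, Union


def assign_ids(keys: Iterable[Hashable]) -> List[int]:
    """Give each distinct key an integer ID, reusing the ID on repeats.

    Assumption: IDs start at 1 and follow order of first appearance.
    """
    seen = {}
    ids = []
    for key in keys:
        if key not in seen:
            seen[key] = len(seen) + 1
        ids.append(seen[key])
    return ids


def temp_encounter_id_for(
    entity2_id: Optional[int], entity3_id: Optional[int]
) -> Union[int, str, None]:
    """Derive temp_encounter_id from the admission and outpatient IDs."""
    if entity3_id is None:
        # Admission (or no encounter at all): reuse Entity2_ID unchanged.
        return entity2_id
    if entity2_id is None:
        # Outpatient encounter: prefix Entity3_ID with the character "O".
        return "O" + str(entity3_id)
    # Both IDs set: this case is not covered by the column description.
    raise ValueError("Entity2_ID and Entity3_ID are both non-null")


# Each event is (activity label, tuple of (feature, value) pairs).
events = [
    ("Blood Gas Test", (("po2", "295"), ("lactate", "3.23"))),
    ("BP_measurement", (("Blood pressure", "137/79"),)),
    ("Blood Gas Test", (("po2", "412"), ("lactate", "0.73"))),
    ("Blood Gas Test", (("po2", "295"), ("lactate", "3.23"))),
]

# One ID per distinct (feature, value) pair, as in Activity_Attributes_ID.
attribute_pairs = [pair for _, attrs in events for pair in attrs]
attribute_ids = assign_ids(attribute_pairs)

# One ID per distinct (activity, features, values), as in Activity_Instance_ID.
instance_ids = assign_ids(events)

print(attribute_ids)
print(instance_ids)
print(temp_encounter_id_for(20001, None))  # hypothetical admission ID
print(temp_encounter_id_for(None, 731))    # hypothetical outpatient ID
```

Under these assumptions, a repeated feature-value pair or a repeated activity with identical features and values receives the same ID as its first occurrence, and the "O" prefix keeps outpatient IDs from colliding with admission IDs in temp_encounter_id.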