---
title: "Case Study: Hospital Discharge Analysis (DEIS Chile)"
author: "Rodolfo Tasso Suazo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Case Study: Hospital Discharge Analysis (DEIS Chile)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ciecl)
library(dplyr)
```

## Introduction

This case study describes the technical workflow required to transform administrative health databases in Chile into a standardized analytical format. We use the **Hospital Discharge** databases published by the Department of Health Statistics and Information (DEIS) as a reference.

Health administrative records frequently present structural inconsistencies in ICD-10 coding. The two most common variations in the Chilean context are:

1.  **Compact formats**: Codes without a decimal point (e.g., `J189` instead of `J18.9`).
2.  **Filler suffixes**: Use of the letter `X` to pad three-character categories to the full field length (e.g., `I10X` for Essential hypertension).

These variations hinder interoperability and cross-referencing with international standards. The `ciecl` package corrects them with vectorized functions that operate on entire columns of codes at once.
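Before correcting anything, it can help to measure how often each variation occurs. The following is a minimal base R sketch, not part of `ciecl`: the helper `classify_icd_format()` and its labels are hypothetical, and the regular expressions only cover the two variations listed above plus the standard dotted form.

```r
# Hypothetical helper: label each code by the formatting variation it shows.
classify_icd_format <- function(codes) {
  codes <- toupper(trimws(codes))
  ifelse(
    # Three-character category padded with a trailing X (e.g. I10X)
    grepl("^[A-Z][0-9]{2}X$", codes), "x_suffix",
    ifelse(
      # Subcategory written without the decimal point (e.g. J189)
      grepl("^[A-Z][0-9]{2}[0-9A-Z]{1,2}$", codes), "compact",
      ifelse(
        # Standard form, with or without a subcategory (e.g. I10, J18.9)
        grepl("^[A-Z][0-9]{2}(\\.[0-9A-Z]{1,2})?$", codes), "standard",
        "unrecognized"
      )
    )
  )
}

sample_codes <- c("J189", "I10X", "J18.9", "I10", "??")
formats <- classify_icd_format(sample_codes)
table(formats)
```

A large share of `unrecognized` values in a real extract would suggest problems beyond the two variations covered here.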

## 1. Original Data Structure

The chunk below generates a synthetic dataset that replicates the structure and typical anomalies of DEIS `.csv` files. The `DIAG1` column holds the primary diagnosis, written in the compact and `X`-padded formats described above.

```{r datos}
set.seed(42)

# Simulation of 200 records with typical DEIS Chile formats
discharges <- data.frame(
  DISCHARGE_ID = 1:200,
  PATIENT_ID   = sample(1:50, 200, replace = TRUE),
  YEAR         = sample(2018:2022, 200, replace = TRUE),
  DIAG1        = sample(
    c(
      "J189", "O800", "Z380", "K359", "N390",
      "I10X", "J449", "E119", "O829", "J069",
      "K922", "N185", "I509", "C509", "A099",
      "N40X", "K800", "I259", "J180", "E149"
    ),
    size    = 200,
    replace = TRUE
  ),
  stringsAsFactors = FALSE
)

head(discharges)
```

## 2. Code Normalization with `cie_norm()`

Normalization comes first because the catalog lookups in the following sections match on the standardized code. The `cie_norm()` function processes codes by applying official MINSAL coding rules:

*   **Suffix removal**: Identifies and removes the trailing `X` used as padding (e.g., `I10X` becomes `I10`).
*   **Punctuation formatting**: Inserts the decimal point after the third character, the standard ICD-10 position (e.g., `J189` becomes `J18.9`).
*   **String cleaning**: Removes whitespace and non-printable characters.

```{r normalizacion}
# Cleaning and standardization of diagnoses
discharges <- discharges %>%
  mutate(
    DIAG1_NORM = cie_norm(codes = DIAG1)
  )

# Comparison between original and normalized formats
discharges %>% 
  select(DIAG1, DIAG1_NORM) %>% 
  distinct() %>% 
  head(5)
```
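To make the three rules described in this section concrete, they can be approximated with a few regular expressions in base R. This is only a sketch under simplifying assumptions: `normalize_icd_sketch()` is a hypothetical name, it is not the implementation used by `cie_norm()`, and it does not validate codes against the catalog.

```r
# Hypothetical stand-in for illustration; not the ciecl implementation.
normalize_icd_sketch <- function(codes) {
  # String cleaning: drop anything that is not a letter, digit, or dot
  x <- toupper(gsub("[^A-Za-z0-9.]", "", codes))
  # Suffix removal: strip the padding X from three-character categories
  x <- sub("^([A-Z][0-9]{2})X$", "\\1", x)
  # Punctuation formatting: insert the dot after the third character
  x <- sub("^([A-Z][0-9]{2})([0-9A-Z]+)$", "\\1.\\2", x)
  x
}

normalized <- normalize_icd_sketch(c("J189", "I10X", " n40x ", "J18.9", "E119"))
normalized
```

The order of the steps matters: removing the `X` suffix before inserting the dot keeps `I10X` from being turned into `I10.X`.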

## 3. Semantic Enrichment with `cie_describe()`

After normalization, standardized clinical descriptions are assigned to make results easier to interpret. The `ciecl` package provides the vectorized `cie_describe()` function, designed to fit into `dplyr` workflows.

Unlike a traditional `left_join`, `cie_describe()` returns a character vector with one description per input code, so it can be called directly inside `mutate()` without creating additional join columns.

```{r describe}
# Direct integration of descriptions into the main dataframe
discharges_full <- discharges %>%
  mutate(
    description = cie_describe(DIAG1_NORM)
  )

discharges_full %>%
  select(DISCHARGE_ID, DIAG1, description) %>%
  head()
```
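The idea of returning one description per input element, rather than joining tables, can be illustrated with `match()` in base R. This is a sketch only: `toy_catalog` and `describe_sketch()` are hypothetical and say nothing about how `cie_describe()` is implemented internally.

```r
# Toy catalog with two entries (hypothetical stand-in for the package catalog)
toy_catalog <- data.frame(
  code        = c("I10", "J18.9"),
  description = c("Essential (primary) hypertension", "Pneumonia, unspecified"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one description per input element, NA when not found
describe_sketch <- function(codes, catalog) {
  catalog$description[match(codes, catalog$code)]
}

codes <- c("J18.9", "I10", "Z99.9", "J18.9")  # Z99.9 is absent from the toy catalog
descriptions <- describe_sketch(codes, toy_catalog)
descriptions
```

Because the output has the same length and order as the input, repeated codes and codes missing from the catalog need no special handling by the caller.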

For cases where full metadata is required (chapter, category, group), `cie_lookup()` can still be used in conjunction with a table join:

```{r lookup}
# Extracting full metadata via lookup + join
metadata <- cie_lookup(
  code = unique(discharges$DIAG1_NORM),
  full_description = TRUE
)

discharges_metadata <- discharges %>%
  left_join(metadata, by = c("DIAG1_NORM" = "codigo"))
```
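After a join like this one, two checks are worth running: that the row count did not change, and which codes found no match. The sketch below uses toy data and base R `merge()` in place of `left_join()` to stay dependency-free. The `chapter` column is an assumed name for illustration, not a documented output of `cie_lookup()`.

```r
records <- data.frame(
  DISCHARGE_ID = 1:4,
  DIAG1_NORM   = c("J18.9", "I10", "Z99.9", "J18.9"),
  stringsAsFactors = FALSE
)

# Toy metadata table; 'chapter' is a hypothetical column name
toy_metadata <- data.frame(
  codigo  = c("I10", "J18.9"),
  chapter = c("IX", "X"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every discharge record, matched or not
joined <- merge(records, toy_metadata,
                by.x = "DIAG1_NORM", by.y = "codigo", all.x = TRUE)
joined <- joined[order(joined$DISCHARGE_ID), ]  # merge() sorts by key; restore order

# A lookup table with duplicated keys would multiply rows, so check the count
stopifnot(nrow(joined) == nrow(records))

# Codes that found no metadata
unmatched <- unique(joined$DIAG1_NORM[is.na(joined$chapter)])
unmatched
```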

## 4. Catalog Exploration with `cie_search()`

In exploratory phases, where the exact code is unknown or transcription errors in the original clinical descriptions are suspected, `cie_search()` allows text searches using string similarity (fuzzy matching).

```{r busqueda}
# Example of search with intentional typo ("diabetis")
# The function returns the most likely matches ordered by score
cie_search(text = "diabetis", threshold = 0.7)
```
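How a similarity threshold filters candidates can be sketched with base R's `utils::adist()`, which computes edit distances. The scoring used here (one minus the edit distance divided by the longer string's length) is an assumption chosen for illustration, since the metric behind `cie_search()` is not described in this vignette. `fuzzy_search_sketch()` and `toy_terms` are hypothetical.

```r
# Hypothetical helper: rank candidate terms by normalized edit-distance similarity
fuzzy_search_sketch <- function(text, terms, threshold = 0.7) {
  d <- utils::adist(tolower(text), tolower(terms))[1, ]
  score <- 1 - d / pmax(nchar(text), nchar(terms))
  hits <- data.frame(term = terms, score = score, stringsAsFactors = FALSE)
  hits <- hits[hits$score >= threshold, , drop = FALSE]
  hits[order(-hits$score), , drop = FALSE]
}

toy_terms <- c("diabetes", "diabetic", "dialysis", "hypertension")
result <- fuzzy_search_sketch("diabetis", toy_terms)
result
```

With this scoring, "diabetis" is one edit away from "diabetes", giving 1 - 1/8 = 0.875, which passes the 0.7 threshold. "dialysis" is three edits away (0.625) and is dropped.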

## 5. Comorbidity Index Calculation with `cie_comorbid()`

An advanced application of `ciecl` is population risk stratification through comorbidity indices. The `cie_comorbid()` function maps normalized diagnoses to Charlson or Elixhauser categories, adapted to the reality of Chilean data.

```{r comorbilidad, eval=FALSE}
# Requires the 'comorbidity' package to be installed
# Calculation of the Charlson Index by patient identifier
comorbidities <- cie_comorbid(
  data = discharges,
  id   = "PATIENT_ID",
  code = "DIAG1_NORM",  # normalized column created in section 2
  map  = "charlson"
)

# The result allows immediate use in statistical models
head(comorbidities, 10)
```
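The logic of a weighted comorbidity score can be shown with a deliberately small base R sketch. Each category is flagged at most once per patient, and the weights of the flagged categories are summed. Everything here is illustrative: `toy_charlson` covers only five Charlson categories with simplified code patterns, `charlson_sketch()` is a hypothetical helper, and neither reproduces the mapping that `cie_comorbid()` or the `comorbidity` package applies.

```r
# Partial, simplified mapping for illustration only (not a complete Charlson map)
toy_charlson <- data.frame(
  category = c("chf", "chronic_pulmonary", "diabetes_uncomplicated",
               "renal", "malignancy"),
  pattern  = c("^I50", "^J44", "^E1[14]\\.9", "^N18", "^C50"),
  weight   = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per patient, one 0/1 column per category, plus score
charlson_sketch <- function(data, id, code, map = toy_charlson) {
  ids <- sort(unique(data[[id]]))
  out <- data.frame(id = ids)
  for (i in seq_len(nrow(map))) {
    hit <- grepl(map$pattern[i], data[[code]])
    # A category counts once per patient, however many records match
    out[[map$category[i]]] <- as.integer(ids %in% data[[id]][hit])
  }
  out$score <- as.vector(as.matrix(out[map$category]) %*% map$weight)
  out
}

toy_records <- data.frame(
  PATIENT_ID = c(1, 1, 2, 3, 3, 3),
  DIAG1_NORM = c("I50.9", "I50.9", "J18.9", "E11.9", "N18.5", "C50.9"),
  stringsAsFactors = FALSE
)
scores <- charlson_sketch(toy_records, id = "PATIENT_ID", code = "DIAG1_NORM")
scores
```

In this toy data, patient 1 has two heart failure records but the category is counted once. Patient 2 has no mapped diagnosis and scores zero. Patient 3 accumulates the weights of three categories.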

## Process Summary

The `ciecl` workflow enables a reproducible transition from raw administrative data to an analytical dataset in four stages:

1.  **Standardization**: Format correction using `cie_norm()`.
2.  **Contextualization**: Assignment of official glosses with `cie_describe()`, or with `cie_lookup()` when full metadata is needed.
3.  **Validation**: Term discovery and checking with `cie_search()`.
4.  **Aggregation**: Generation of complex clinical indicators with `cie_comorbid()`.

---

**Data source:**
This tool uses the ICD-10 catalog standardized by the Centro FIC of the DEIS, Ministry of Health of Chile. For more information, visit [deis.minsal.cl](https://deis.minsal.cl).