---
title: "Introduction to ciecl: Chilean ICD-10 in R"
author: "Rodolfo Tasso Suazo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to ciecl: Chilean ICD-10 in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
library(ciecl)
```

> **Version 0.9.8**: Available on CRAN with SQLite optimizations, XLSX support, and standardized English arguments.

## The problem: working with Chilean diagnostic codes in R

Chilean health information systems (DEIS, GRD, REM) store diagnoses using the ICD-10 classification in its official MINSAL/DEIS v2018 version. Analysts working with these datasets in R typically have to cross-reference PDF catalogs, Excel tables, or ministry websites to look up codes, which breaks the analytical workflow.

`ciecl` solves this by embedding all **39,877 ICD-10 codes** from the official catalog directly into R, with functions for fast lookup, hierarchical expansion, fuzzy search, and comorbidity scoring.

## Installation

The package is available on CRAN:

```{r eval=FALSE}
install.packages("ciecl")
```

To install the development version with the latest fixes:

```{r eval=FALSE}
# Requires the pak package: install.packages("pak")
pak::pak("RodoTasso/ciecl")
```

## Direct SQL queries against the catalog

`cie10_sql()` exposes the full catalog through a local SQLite database, so you can filter it with the full expressiveness of SQL. This is useful when you know the code structure but not the exact label, for instance to retrieve all subcodes within a category. The example below fetches the first five type 2 diabetes codes (category E11):

```{r}
cie10_sql("SELECT codigo, descripcion FROM cie10 WHERE codigo LIKE 'E11%' LIMIT 5")
```

Only `SELECT` queries are accepted, which protects the catalog from accidental modification.
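This vignette does not describe how `cie10_sql()` enforces the `SELECT`-only rule. Purely as an illustration, the base-R sketch below shows one conservative way such a check could be written: the query must start with `SELECT`, may not contain a second statement, and may not mention keywords that write to or reconfigure the database. `is_select_only()` is a hypothetical helper, not part of `ciecl`.

```{r}
# Hypothetical validator (not ciecl's internal check): accept a single
# SELECT statement and reject anything else.
is_select_only <- function(query) {
  q <- trimws(query)
  # Tolerate one trailing semicolon
  q <- sub(";[[:space:]]*$", "", q)
  starts_with_select <- grepl("^SELECT\\b", q, ignore.case = TRUE, perl = TRUE)
  # Any remaining semicolon could separate a second statement
  has_second_statement <- grepl(";", q, fixed = TRUE)
  # Keywords that write to or reconfigure the database
  writes <- grepl("\\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|PRAGMA)\\b",
                  q, ignore.case = TRUE, perl = TRUE)
  starts_with_select && !has_second_statement && !writes
}

is_select_only("SELECT codigo FROM cie10 WHERE codigo LIKE 'E11%'")
is_select_only("DROP TABLE cie10")
is_select_only("SELECT 1; DELETE FROM cie10")
```

The check errs on the side of rejection: a harmless query whose string literal contains a word such as `drop` is also refused. A sturdier design might rely on opening the SQLite database in read-only mode rather than on pattern matching alone.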
## Looking up known codes

When codes are already present in your data, such as hospital discharge diagnoses, `cie_lookup()` retrieves the official description for one or more codes at once.

```{r}
# Single code
cie_lookup("E11.0")
```

The function is vectorized, so you can pass several codes in a single call, which also makes it easy to use inside a `dplyr` pipeline:

```{r}
# Multiple codes from different chapters
cie_lookup(c("E11.0", "I10", "Z00", "J44.0"))
```

When working at the category level (three-character codes such as `E11`), you may need every subcode within a category. The `expand = TRUE` argument traverses the hierarchy and returns the parent category together with all of its children:

```{r}
cie_lookup("E11", expand = TRUE)
```

## Extracting descriptions for use in tables and plots

When you only need the description text, without the full structure returned by `cie_lookup()`, `cie_describe()` returns a character vector of descriptions in the same order as the input codes. It is designed for use inside `mutate()` or for labeling plot axes:

```{r}
cie_describe(c("E11.0", "I10"))
```

A typical use case with hospital discharge data:

```{r}
library(dplyr)

discharges <- data.frame(
  id = 1:4,
  diag_code = c("E11.0", "I10", "J44.0", "E11.0")
)

# Add the official description next to each diagnosis code
discharges %>%
  mutate(description = cie_describe(diag_code))
```

## Fuzzy search with tolerance for misspellings

Clinical text data often contains spelling errors, abbreviations, or non-standard terms. `cie_search()` uses Jaro-Winkler string similarity to find matching codes even when the search term contains typos. The `threshold` parameter controls strictness: higher values require closer matches. In practice, values between 0.70 and 0.85 work well:

```{r}
# "diabetis" instead of "diabetes": the typo does not prevent finding the code
cie_search("diabetis with coma", threshold = 0.75)
```

## Charlson and Elixhauser comorbidity indices

The Charlson and Elixhauser indices are widely used in clinical research and risk adjustment.
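At its core, a Charlson-type score assigns a weight to each comorbid condition and sums the weights per patient, counting each condition at most once. The base-R sketch below illustrates only that idea. The mapping table `charlson_toy` and the helper `toy_charlson_score()` are hypothetical and deliberately tiny: they match diagnoses by three-character category, whereas a real Charlson map works at the subcode level (for example, it separates diabetes with and without chronic complications) and applies hierarchy rules that this sketch ignores. The weights are those of the original Charlson index for these five conditions.

```{r}
# Hypothetical, simplified Charlson mapping by three-character ICD-10 category
charlson_toy <- data.frame(
  category  = c("I50", "J44", "E11", "C50", "N18"),
  condition = c("heart_failure", "chronic_pulmonary", "diabetes",
                "malignancy", "renal_disease"),
  weight    = c(1, 1, 1, 2, 2)
)

# Hypothetical helper: returns one weighted score per patient
toy_charlson_score <- function(df, id, code, map) {
  # Reduce each code to its category, e.g. "E11.0" -> "E11"
  category <- substr(gsub(".", "", df[[code]], fixed = TRUE), 1, 3)
  hits <- merge(data.frame(id = df[[id]], category = category), map,
                by = "category")
  # Count each condition at most once per patient
  hits <- unique(hits[, c("id", "condition", "weight")])
  ids <- unique(df[[id]])
  # Sum weights per patient; patients with no mapped condition score 0
  totals <- tapply(hits$weight, factor(hits$id, levels = ids), sum)
  totals[is.na(totals)] <- 0
  data.frame(id = ids, score = as.numeric(totals))
}

toy_dx <- data.frame(
  patient_id = c(1, 1, 2, 2, 3),
  diagnosis  = c("E11.0", "I50.9", "C50.9", "N18.5", "J44.0")
)
toy_charlson_score(toy_dx, id = "patient_id", code = "diagnosis", map = charlson_toy)
```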
`cie_comorbid()` computes these indices from a data frame containing patient identifiers and their associated diagnostic codes. This function requires the `comorbidity` package; install it with `install.packages("comorbidity")` if needed:

```{r eval=FALSE}
# Requires: install.packages("comorbidity")
patient_df <- data.frame(
  patient_id = c(1, 1, 2, 2, 3),
  diagnosis = c("E11.0", "I50.9", "C50.9", "N18.5", "J44.0")
)

cie_comorbid(patient_df, id = "patient_id", code = "diagnosis", map = "charlson")
```

The result is a data frame with one row per patient and one column for each condition in the selected index, plus a weighted total score.

## Formatted tables with gt

For reports and presentations, `cie_table()` uses the `gt` package to generate an enriched HTML table of all codes within a category. It requires `gt` to be installed:

```{r eval=FALSE}
# Requires: install.packages("gt")
cie_table("E11")
```

## Data source

The data included in `ciecl` comes from the official ICD-10 catalog published by the Chilean Ministry of Health (MINSAL) through its Department of Health Statistics and Information (DEIS):

- FIC Chile Center:
- DEIS Repository:

## Further information

- Report issues or suggestions:
- Package repository: