---
title: "Workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  comment = "#>"
)
```

## What is rater?

The {rater} package is designed to allow easy fitting and analysis of Bayesian models of categorical data annotation using [Stan](https://mc-stan.org/). Here we demonstrate the basic workflow for using the package.

## Data

We will use the *anesthesia* data set taken from the paper *Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm* by A. P. Dawid and A. M. Skene, the paper which introduced the original Dawid-Skene model, the type of model used in this package. This data set is included in {rater}. We can prepare the package and data with:

```{r package-load}
# Load the rater package.
library(rater)

# Access the 'anesthesia' data set.
data("anesthesia")
```

The data comes in the form of a `data.frame` with three columns: `item`, `rater` and `rating`. In the nomenclature of the package we would describe this as *long* data. *Long* data is the standard format for passing data to inference functions in {rater}. The `item` column is the index of each item, the `rater` column is the index of the rater and the `rating` column is the actual rating. For example, the twentieth row of the data set:

```{r one-row}
anesthesia[20, ]
```

means that item 3 was rated as being in category 2 by the fourth rater. {rater} also allows the use of *grouped* data for fitting some of the models, but that feature is not covered in this vignette.

## Inference

The core function of the {rater} package is `rater()`, which fits a specified categorical rating model to the given data. This function has two main arguments: `data`, the data in an appropriate format, and `model`, a character string or function specifying the model you would like to fit. By default `rater()` will fit the model using MCMC (specifically NUTS) provided by Stan.

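
The *long* format described in the Data section can be sketched with a small made-up data set. The chunk below uses only base R. The names `toy_ratings` and `is_long_format()` are hypothetical, introduced here for illustration, and are not part of {rater}. The check that every index is a positive whole number is an assumption based on how the anesthesia data looks, not a documented requirement of the package.

```{r long-data-sketch}
# Hypothetical toy data in long format: 3 items, 2 raters, ratings in 1..2.
# These values are made up for illustration; they are not the anesthesia data.
toy_ratings <- data.frame(
  item   = c(1, 1, 2, 2, 3, 3),
  rater  = c(1, 2, 1, 2, 1, 2),
  rating = c(1, 1, 2, 1, 2, 2)
)

# Hypothetical helper (not part of {rater}): checks the layout described above.
is_long_format <- function(df) {
  required <- c("item", "rater", "rating")
  is.data.frame(df) &&
    all(required %in% names(df)) &&
    all(vapply(df[required], is.numeric, logical(1))) &&
    all(unlist(df[required]) >= 1) &&       # assumed: indices start at 1
    all(unlist(df[required]) %% 1 == 0)     # assumed: indices are whole numbers
}

is_long_format(toy_ratings)

# One row per rating, so the number of rows is the number of ratings.
nrow(toy_ratings)
```

A data frame that passes a check of this kind has the same shape as `anesthesia`, which is the shape the `data` argument of `rater()` expects by default.
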

To fit the basic Dawid-Skene model[^1] to the anesthesia data we can run:

```{r inference, warning = FALSE, message = FALSE}
fit <- rater(anesthesia, "dawid_skene", chains = 1, verbose = FALSE)
```

Note that here we have set `verbose = FALSE` to suppress the normal Stan sampling output. We have also specified that only 1 chain should be used, simply to speed up the creation of the vignette. Other fitting parameters can be passed directly to the underlying Stan functions through the `...` in `rater()`.

We can also compute maximum a posteriori (MAP) estimates by specifying `method = "optim"` in `rater()`:

```{r optim-fit}
optim_fit <- rater(anesthesia, "dawid_skene", method = "optim")
```

## Plotting

Having fit the Dawid-Skene model to the data we can now plot parameter estimates from the model. To plot the population prevalence estimates (the parameter $\pi$ in the model) we run:

```{r plot-pi, fig.width = 6, fig.height = 4}
plot(fit, pars = "pi")
```

To plot the raters' error matrices (the parameter $\theta$ in the model) we run:

```{r plot-theta, fig.width = 6, fig.height = 4}
plot(fit, pars = "theta")
```

To plot the latent class estimates we run:

```{r plot-z, fig.width = 6, fig.height = 8}
plot(fit, pars = "latent_class")
```

## Point estimates

In addition, we can extract point estimates for all the parameters using the `point_estimate()` function. Specific parameters can be selected with the `pars` argument, for example:

```{r point-estimates}
# Extract all parameters.
all_parameters <- point_estimate(fit)

# Extract only the 'pi' parameter.
point_estimate(fit, pars = "pi")
```

Note that the interpretation of the point estimates returned will differ depending on whether the model has been fit using MCMC or optimisation.

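
To see why point estimates from the two fitting methods can differ, consider a toy posterior that has nothing to do with {rater} itself. The sketch below uses a Beta(3, 9) distribution as a stand-in posterior and assumes that an MCMC-based estimate summarises posterior draws (here by their mean), while an optimisation-based estimate is the posterior mode. Exactly how `point_estimate()` summarises each parameter is not stated in this vignette, so treat this only as an illustration of the general idea. For this distribution the exact mean is $3 / 12 = 0.25$ and the exact mode is $2 / 10 = 0.2$.

```{r mean-vs-mode-sketch}
# Toy posterior: Beta(3, 9). An illustrative stand-in, not a {rater} model.
a <- 3
b <- 9

# MCMC-style summary: the average of draws from the posterior.
set.seed(1)
draws <- rbeta(10000, a, b)
posterior_mean <- mean(draws)

# Optimisation-style summary: the value that maximises the posterior density.
posterior_mode <- optimize(
  function(x) dbeta(x, a, b),
  interval = c(0, 1),
  maximum = TRUE
)$maximum

c(mean = posterior_mean, mode = posterior_mode)
```

The two summaries disagree because the distribution is skewed. The same effect can appear when comparing estimates from `fit` and `optim_fit`.
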

## Other functions

{rater} also provides several other functions for extracting useful quantities from fit objects:

* `posterior_samples()`
* `posterior_interval()`
* `class_probabilities()`
* `mcmc_diagnostics()`

Hopefully the uses of these functions are fairly self-explanatory.

[^1]: {rater} also supports the 'class conditional' and 'hierarchical' Dawid-Skene models as well as setting (some of) the prior parameters in all three models.
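
As a rough illustration of what a class probability means under the Dawid-Skene model, the sketch below applies Bayes' rule by hand to a single item. The probability that the item belongs to class $k$ is proportional to $\pi_k$ multiplied by the probability, under $\theta$, of each rating the item received. All values are made up, the names `pi_hat`, `theta_hat` and `class_probs` are hypothetical, and this is not a description of how `class_probabilities()` is implemented.

```{r class-probability-sketch}
# Hypothetical parameter values for 2 classes and 2 raters (made up).
pi_hat <- c(0.7, 0.3)

# theta_hat[j, k, l]: probability that rater j gives rating l
# when the true class of the item is k.
theta_hat <- array(0, dim = c(2, 2, 2))
theta_hat[1, , ] <- matrix(c(0.9, 0.1,
                             0.2, 0.8), nrow = 2, byrow = TRUE)
theta_hat[2, , ] <- matrix(c(0.8, 0.2,
                             0.3, 0.7), nrow = 2, byrow = TRUE)

# Ratings given to one item by rater 1 and rater 2.
ratings <- c(2, 2)

# Unnormalised posterior for each class: prevalence times the
# probability of every observed rating given that class.
unnormalised <- vapply(seq_along(pi_hat), function(k) {
  pi_hat[k] * prod(theta_hat[cbind(seq_along(ratings), k, ratings)])
}, numeric(1))

# Normalise so the class probabilities sum to 1.
class_probs <- unnormalised / sum(unnormalised)
class_probs
```

Although class 1 is more prevalent in this toy setup, both raters chose category 2, so most of the probability ends up on class 2.
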