Title: | Statistical Models of Repeated Categorical Rating Data |
---|---|
Description: | Fit statistical models based on the Dawid-Skene model - Dawid and Skene (1979) <doi:10.2307/2346806> - to repeated categorical rating data. Full Bayesian inference for these models is supported through the Stan modelling language. 'rater' also allows the user to extract and plot key parameters of these models. |
Authors: | Jeffrey Pullin [aut, cre, cph] , Damjan Vukcevic [aut] , Lars Mølgaard Saxhaug [ctb] |
Maintainer: | Jeffrey Pullin <[email protected]> |
License: | GPL-2 |
Version: | 1.3.1.9000 |
Built: | 2024-10-26 05:00:51 UTC |
Source: | https://github.com/jeffreypullin/rater |
Fit statistical models based on the Dawid-Skene model to repeated categorical rating data. Full Bayesian inference for these models is supported through the Stan modelling language. rater also allows the user to extract and plot key parameters of these models.
Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.18.2. http://mc-stan.org
The data consist of ratings, on a 4-point scale, made by five anaesthetists of patients' pre-operative health. The ratings were based on the anaesthetists' assessments of a standard form completed for all of the patients. There are 45 patients (items) and five anaesthetists (raters) in total. The first anaesthetist assessed the forms a total of three times, spaced several weeks apart. The other anaesthetists each assessed the forms once. The data is in 'long' format.
anesthesia
A data.frame
with 315 rows and 3 columns:
item: The item index - which item is being rated
rater: The rater index - which rater is doing the rating
rating: The rating given
Dawid, A. P., and A. M. Skene. "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm." Applied Statistics 28, no. 1 (1979): 20.
Convert a rater_fit object to a coda mcmc.list object.
as_mcmc.list(fit)
fit |
A rater_fit object. |
A coda mcmc.list object.
# Fit a model using MCMC (the default).
mcmc_fit <- rater(anesthesia, "dawid_skene")
# Convert it to an mcmc.list
rater_mcmc_list <- as_mcmc.list(mcmc_fit)
The data consist of binary ratings, made by 5 dentists, of whether a given tooth was healthy (sound) or had caries, also known as cavities. The ratings were performed using X-ray only, which was thought to be more error-prone than visual/tactile assessment of each tooth. In total 3,689 ratings were made. The data is in 'grouped' format: each row is one rating 'pattern', with the final column being a tally of how many times that pattern occurs in the dataset.
caries
A data.frame
with 6 columns and 32 rows.
The rating of dentist 1
The rating of dentist 2
The rating of dentist 3
The rating of dentist 4
The rating of dentist 5
The number of times the rating pattern appears in the dataset
Espeland, Mark A., and Stanley L. Handelman. “Using Latent Class Models to Characterize and Assess Relative Error in Discrete Measurements.” Biometrics 45, no. 2 (1989): 587–99.
Extract latent class probabilities from a rater fit object
class_probabilities(fit, ...)
## S3 method for class 'mcmc_fit'
class_probabilities(fit, ...)
## S3 method for class 'optim_fit'
class_probabilities(fit, ...)
fit |
A rater fit object. |
... |
Extra arguments. |
The latent class probabilities are obtained by marginalising out the latent class and then calculating, for each draw of pi and theta, the conditional probability of the latent class given the other parameters and the data. Averaging these conditional probabilities gives the (unconditional) latent class probabilities returned by this function.
An I * K matrix where each element is the probability of item i being of class k (I is the number of items and K the number of classes).
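The averaging described above can be illustrated with a small base R sketch. It is not the package's internal code: the toy draws, the helper name class_prob_one_draw and the array layout (draws x raters x classes x ratings) are all assumptions made for illustration.

```r
# Hypothetical toy inputs: 2 posterior draws, J = 2 raters, K = 2 classes.
K <- 2
pi_draws <- rbind(c(0.5, 0.5), c(0.6, 0.4))           # draws x K
error_mat <- matrix(c(0.8, 0.2,
                      0.2, 0.8), K, K, byrow = TRUE)  # rows: true class
theta_draws <- array(NA_real_, dim = c(2, 2, K, K))   # draws x J x K x K
for (s in 1:2) for (j in 1:2) theta_draws[s, j, , ] <- error_mat

# One item, rated 1 by rater 1 and 1 by rater 2.
item_raters <- c(1, 2)
item_ratings <- c(1, 1)

# Conditional probability of each latent class given one draw and the data.
class_prob_one_draw <- function(pi, theta, raters, ratings) {
  log_p <- log(pi)
  for (n in seq_along(ratings)) {
    # theta[j, k, l]: probability rater j gives rating l when the class is k.
    log_p <- log_p + log(theta[raters[n], , ratings[n]])
  }
  p <- exp(log_p - max(log_p))  # subtract the max for numerical stability
  p / sum(p)
}

per_draw <- t(sapply(1:2, function(s) {
  class_prob_one_draw(pi_draws[s, ], theta_draws[s, , , ],
                      item_raters, item_ratings)
}))

# Averaging over draws gives the class probabilities for this item.
item_class_probs <- colMeans(per_draw)
item_class_probs
```

Stacking such vectors for every item would give a matrix of the I * K shape described above.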
fit <- rater(anesthesia, "dawid_skene")
class_probabilities(fit)
Get the underlying stanfit object from a rater_fit object.
get_stanfit(fit)
fit |
A |
A stanfit object from rstan.
fit <- rater(anesthesia, "dawid_skene", verbose = FALSE)
stan_fit <- get_stanfit(fit)
stan_fit
Compute the PSIS LOO CV - a measure of model fit - of a rater fit object.
## S3 method for class 'rater_fit'
loo(x, ..., cores = getOption("mc.cores", 1))
x |
A |
... |
Other arguments passed. |
cores |
The number of cores to use when calling the underlying
functions. By default the value of the |
This function is somewhat experimental; model comparison is always difficult and choosing between variants of the Dawid-Skene model should be largely guided by considerations of data size and what is known about the characteristics of the raters. loo is, however, one of the leading methods for Bayesian model comparison and should provide a helpful guide in many situations.
When calculating loo we always use the relative effective sample size, calculated using loo::relative_eff(), to improve the estimates of the PSIS effective sample sizes and Monte Carlo error.
For further information about the details of loo and PSIS please consult the provided references.
A loo object.
Vehtari, A., Gelman, A., and Gabry, J. (2017a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. preprint arXiv:1507.02646
fit_ds <- rater(anesthesia, "dawid_skene", verbose = FALSE, chains = 1)
fit_ccds <- rater(anesthesia, "class_conditional_dawid_skene", verbose = FALSE, chains = 1)
loo_ds <- loo(fit_ds)
loo_ccds <- loo(fit_ccds)
# To compare the loos easily we can use the loo_compare function from the
# loo package:
library(loo)
loo_compare(loo_ds, loo_ccds)
# The documentation of the loo package contains more information about how
# the output should be interpreted.
Produce simulation data from a 'complete' rating design
make_complete_rating_design_sim_data(I, J, N)
I |
The number of items. |
J |
The number of raters. |
N |
The number of times each rater rates each item. |
A 'complete' rating design is a situation where every rater rates each item the same number of times. In this function the number of times each rater rates each item is N.
Simulation data in the format required by simulate_dawid_skene_model() or simulate_hier_dawid_skene_model().
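As a rough illustration of what a complete design looks like, the base R sketch below builds one by hand. It assumes the result is a data.frame with item and rater columns, as in the sim_data examples elsewhere in this manual; the exact row order and column types produced by the package function may differ.

```r
# Hypothetical sizes: 3 items, 2 raters, each rater rates each item twice.
I <- 3
J <- 2
N <- 2

# Every (item, rater) pair appears exactly N times.
design <- expand.grid(
  rep = seq_len(N),
  rater = seq_len(J),
  item = seq_len(I)
)
design <- design[, c("item", "rater")]

nrow(design)                       # I * J * N rows
table(design$item, design$rater)   # every cell equals N
```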
make_complete_rating_design_sim_data(100, 5, 2)
Make a theta parameter
make_theta(diag_values, J, K)
diag_values |
The diagonal entries of each error matrix. |
J |
The number of raters (the number of matrices in the 3D array). |
K |
The number of latent classes. |
The diag_values argument can either be a numeric vector of length 1 or J. If it is length J, the jth element gives the diagonal value of the error matrix for the jth rater. If it is length 1, all raters have the same diagonal values.
A c(J, K, K) array; the theta parameter
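To make the shape of the theta array concrete, here is a minimal base R sketch. The function name sketch_theta is hypothetical, and spreading the remaining probability evenly over the off-diagonal entries is an assumption for illustration; the source does not state how make_theta() fills the off-diagonals.

```r
# Hypothetical helper, not part of the package.
sketch_theta <- function(diag_values, J, K) {
  stopifnot(length(diag_values) %in% c(1, J))
  diag_values <- rep(diag_values, length.out = J)
  theta <- array(0, dim = c(J, K, K))
  for (j in seq_len(J)) {
    # Assumption: off-diagonal mass is shared equally so each row sums to 1.
    off_diag <- (1 - diag_values[j]) / (K - 1)
    error_mat <- matrix(off_diag, nrow = K, ncol = K)
    diag(error_mat) <- diag_values[j]
    theta[j, , ] <- error_mat
  }
  theta
}

theta <- sketch_theta(c(0.7, 0.9), J = 2, K = 4)
dim(theta)             # J x K x K
rowSums(theta[1, , ])  # each row of an error matrix sums to 1
```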
theta <- make_theta(0.7, 5, 4)
theta[1, , ]
Retrieve MCMC convergence diagnostics for a rater fit
mcmc_diagnostics(fit, pars = c("pi", "theta"))
fit |
An rater |
pars |
A character vector of parameter names to return. By default
|
MCMC diagnostics cannot be calculated for z due to the marginalisation used to fit the models.
These MCMC diagnostics are intended as a basic sanity check of the quality of the MCMC samples returned. Users who want more in-depth diagnostics should consider using as_mcmc.list() to convert the samples to a coda::mcmc.list() object, or get_stanfit() to extract the underlying stanfit object.
A matrix where the columns represent different diagnostics and the rows are different parameters. Currently the first column contains the Rhat statistic and the second the bulk effective sample size. The rownames contain the parameter names.
Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner (2019). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.
rstan::Rhat(), rstan::ess_bulk(), as_mcmc.list(), get_stanfit().
fit <- rater(anesthesia, "dawid_skene")
# Calculate the diagnostics for all parameters.
mcmc_diagnostics(fit)
# Calculate the diagnostics just for the pi parameter.
mcmc_diagnostics(fit, pars = "pi")
Functions to set up models and change their prior parameters for use in rater().
dawid_skene(alpha = NULL, beta = NULL)
hier_dawid_skene(alpha = NULL)
class_conditional_dawid_skene(alpha = NULL, beta_1 = NULL, beta_2 = NULL)
alpha |
prior parameter for pi |
beta |
prior parameter for theta. This can either be a K * K matrix, in which case it is interpreted as the prior parameter of all of the J raters, or a J by K by K array in which case it is the fully specified prior parameter for all raters. (Here K is the number of categories in the data and J is the number of raters in the data.) |
beta_1 |
First on diagonal prior probability parameter |
beta_2 |
Second on diagonal prior probability parameter for theta |
A rater model object that can be passed to rater().
# Model with default prior parameters:
default_m <- dawid_skene()

# Changing alpha:
set_alpha_m <- dawid_skene(alpha = c(2, 2, 2))

# Changing beta, single matrix:
# (See details for how this is interpreted.)
beta_mat <- matrix(1, nrow = 4, ncol = 4)
diag(beta_mat) <- 4
beta_mat_m <- dawid_skene(beta = beta_mat)

# The above is equivalent (when the model is fit - see details) to:
beta_array <- array(NA, dim = c(2, 4, 4))
for (i in 1:2) {
  beta_array[i, , ] <- beta_mat
}
beta_array_m <- dawid_skene(beta = beta_array)

# But you can also specify an array where each slice is different.
# (Again, see details for how this is interpreted.)
beta_array[1, , ] <- matrix(1, nrow = 4, ncol = 4)
beta_array_m <- dawid_skene(beta = beta_array)

# Default:
hier_dawid_skene()

# Changing alpha
hier_dawid_skene(alpha = c(2, 2))

# Default:
class_conditional_dawid_skene()

# Not default:
class_conditional_dawid_skene(
  alpha = c(2, 2),
  beta_1 = c(4, 4),
  beta_2 = c(2, 2)
)
Plot a rater_fit object
## S3 method for class 'rater_fit'
plot(
  x,
  pars = "theta",
  prob = 0.9,
  rater_index = NULL,
  item_index = NULL,
  theta_plot_type = "matrix",
  ...
)
x |
An object of class |
pars |
A length one character vector specifying the parameter to plot.
By default |
prob |
The coverage of the credible intervals shown in the |
rater_index |
The indexes of the raters shown in the |
item_index |
The indexes of the items shown in the class probabilities
plot. If not plotting the class probabilities this argument will be
ignored. By default |
theta_plot_type |
The type of plot of the "theta" parameter. Can be
either |
... |
Other arguments. |
The use of pars to refer to only one parameter is for backwards compatibility and consistency with the rest of the interface.
A ggplot2 object.
fit <- rater(anesthesia, "dawid_skene")
# By default will just plot the theta plot
plot(fit)
# Select which parameter to plot.
plot(fit, pars = "pi")
# Plot the theta parameter for rater 1, showing uncertainty.
plot(fit, pars = "theta", theta_plot_type = "points", rater_index = 1)
Extract point estimates of parameters from a fit object
point_estimate(fit, pars = c("pi", "theta", "z"), ...)
fit |
A rater fit object |
pars |
A character vector of parameter names to return. By default
|
... |
Extra arguments |
If the passed fit object was fit using MCMC then the posterior means are returned. If it was fit through optimisation the maximum a posteriori (MAP) estimates are returned. The z estimate returned for each item is the class with the largest class probability. To return the full posterior distributions of the latent class use class_probabilities().
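The rule for z amounts to a row-wise argmax over the class probability matrix. The base R sketch below uses a made-up matrix (an assumption for illustration; in practice it would come from class_probabilities()).

```r
# Hypothetical class probabilities: 3 items (rows), 3 classes (columns).
class_probs <- rbind(
  c(0.70, 0.20, 0.10),
  c(0.10, 0.30, 0.60),
  c(0.25, 0.50, 0.25)
)

# For each item, pick the class with the largest probability.
z_hat <- apply(class_probs, 1, which.max)
z_hat
```

Note that which.max() returns the first maximum, so ties are broken in favour of the lower class index in this sketch.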
For the class conditional model the 'full' theta parameterisation (i.e. appearing to have the same number of parameters as the standard Dawid-Skene model) is calculated and returned. This is designed to allow easier comparison with the full Dawid-Skene model.
A named list of the parameter estimates.
class_probabilities()
# A model fit using MCMC.
mcmc_fit <- rater(anesthesia, "dawid_skene")
# This will return the posterior mean (except for z)
post_mean_estimate <- point_estimate(mcmc_fit)
# A model fit using optimisation.
optim_fit <- rater(anesthesia, dawid_skene(), method = "optim")
# This will output MAP estimates of the parameters.
map_estimate <- point_estimate(optim_fit)
Extract posterior intervals for parameters of the model
## S3 method for class 'mcmc_fit'
posterior_interval(object, prob = 0.9, pars = c("pi", "theta"), ...)
object |
A rater |
prob |
A single probability. The size of the credible interval
returned. By default |
pars |
The parameters to calculate the intervals for |
... |
Other arguments. |
Posterior intervals can only be calculated for models fit with MCMC. In addition, posterior intervals are not meaningful for the latent class (and indeed cannot be calculated). The full posterior distribution of the latent class can be extracted using class_probabilities().
For the class conditional model the 'full' theta parameterisation (i.e. appearing to have the same number of parameters as the standard Dawid-Skene model) is calculated and returned. This is designed to allow easier comparison with the full Dawid-Skene model.
A matrix with 2 columns. The first column is the lower bound of the credible interval and the second is the upper bound. Each row corresponds to one individual parameter. The rownames are the parameter names.
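For intuition, a credible interval of a given size can be computed from posterior draws with quantiles. The sketch below assumes a central (equal-tailed) interval and uses simulated draws in place of real MCMC output; the source does not say which interval type the package computes.

```r
set.seed(1)
# Stand-in for the posterior draws of a single parameter (an assumption).
draws <- rbeta(4000, 8, 2)

prob <- 0.9
tail_prob <- (1 - prob) / 2

# Central interval: cut off equal probability in each tail.
interval <- quantile(draws, probs = c(tail_prob, 1 - tail_prob))
interval

# Share of draws inside the interval, which should be close to prob.
coverage <- mean(draws >= interval[1] & draws <= interval[2])
coverage
```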
fit <- rater(anesthesia, "dawid_skene", verbose = FALSE, chains = 1)
intervals <- posterior_interval(fit)
head(intervals)
Extract posterior intervals for parameters of the model
## S3 method for class 'optim_fit'
posterior_interval(object, prob = 0.9, pars = c("pi", "theta"), ...)
object |
A rater optim_fit object |
prob |
A probability |
pars |
The parameters to calculate the intervals for |
... |
Other arguments |
Draw from the posterior predictive distribution
## S3 method for class 'rater_fit'
posterior_predict(object, new_data, seed = NULL, ...)
object |
A |
new_data |
New data for the model to be fit to. This must be in the form
used in |
seed |
An optional random seed to use. |
... |
Other arguments. |
The number of raters implied by the entries in the rater column must match the number of raters in the fitted model.
The passed new_data augmented with a column 'z' containing the latent class of each item and 'rating' containing the simulated rating.
fit <- rater(anesthesia, "dawid_skene", verbose = FALSE)
new_data <- data.frame(item = rep(1:2, each = 5), rater = rep(1:5, 2))
predictions <- posterior_predict(fit, new_data)
predictions
Extract posterior samples from a rater fit object
posterior_samples(fit, pars = c("pi", "theta"))
fit |
A rater fit object. |
pars |
A character vector of parameter names to return. By default
|
Posterior samples can only be returned for models fit using MCMC, not optimisation. In addition, posterior samples cannot be returned for the latent class due to the marginalisation technique used internally.
For the class conditional model the 'full' theta parameterisation (i.e. appearing to have the same number of parameters as the standard Dawid-Skene model) is calculated and returned. This is designed to allow easier comparison with the full Dawid-Skene model.
A named list of the posterior samples for each parameter. For each parameter the samples are in the form returned by rstan::extract().
fit <- rater(anesthesia, "dawid_skene")
samples <- posterior_samples(fit)
# Look at first 6 samples for each of the pi parameters
head(samples$pi)
# Look at the first 6 samples for the theta[1, 1, 1] parameter
head(samples$theta[, 1, 1, 1])
# Only get the samples for the pi parameter:
pi_samples <- posterior_samples(fit, pars = "pi")
Print a mcmc_fit object
## S3 method for class 'mcmc_fit'
print(x, ...)
x |
An object of class |
... |
Other arguments. |
# Suppress sampling output.
mcmc_fit <- rater(anesthesia, "dawid_skene", verbose = FALSE)
print(mcmc_fit)
Print an optim_fit object
## S3 method for class 'optim_fit'
print(x, ...)
x |
An object of class |
... |
Other arguments. |
optim_fit <- rater(anesthesia, "dawid_skene", method = "optim")
print(optim_fit)
Print a rater_model object.
## S3 method for class 'rater_model'
print(x, ...)
x |
A |
... |
Other arguments |
mod <- dawid_skene()
print(mod)
Provide a summary of the priors specified in a rater_fit object.
## S3 method for class 'rater_fit'
prior_summary(object, ...)
object |
A |
... |
Other arguments. |
# Fit a model using MCMC (the default).
fit <- rater(anesthesia, "dawid_skene", verbose = FALSE)
# Summarise the priors (and model) specified in the fit.
prior_summary(fit)
This function allows the user to fit statistical models of noisy categorical ratings, based on the Dawid-Skene model, using Bayesian inference. A variety of data formats and models are supported. Inference is done using Stan, allowing models to be fit efficiently, using both optimisation and Markov Chain Monte Carlo (MCMC).
rater(
  data,
  model,
  method = "mcmc",
  data_format = "long",
  long_data_colnames = c(item = "item", rater = "rater", rating = "rating"),
  inits = NULL,
  verbose = TRUE,
  ...
)
data |
A 2D data object: data.frame, matrix, tibble etc. with data in either long or grouped format. |
model |
Model to fit to data - must be rater_model or a character string - the name of the model. If the character string is used, the prior parameters will be set to their default values. |
method |
A length 1 character vector, either |
data_format |
A length 1 character vector, |
long_data_colnames |
A 3-element named character vector that specifies
the names of the three required columns in the long data format. The vector
must have the required names:
* item: the name of the column containing the item indexes,
* rater: the name of the column containing the rater indexes,
* rating: the name of the column containing the ratings.
By default, the names of the columns are the same as the names of the
vector: |
inits |
The initialization points of the fitting algorithm |
verbose |
Should |
... |
Extra parameters which are passed to the Stan fitting interface. |
The default MCMC algorithm used by Stan is the No-U-Turn Sampler (NUTS) and the default optimisation method is L-BFGS. For MCMC, 4 chains are run by default with 2000 iterations in total each.
An object of class rater_fit containing the fitted parameters.
rstan::sampling(), rstan::optimizing()
# Fit a model using MCMC (the default).
mcmc_fit <- rater(anesthesia, "dawid_skene")
# Fit a model using optimisation.
optim_fit <- rater(anesthesia, dawid_skene(), method = "optim")
# Fit a model passing grouped data.
grouped_fit <- rater(caries, dawid_skene(), data_format = "grouped")
Simulate data from the Dawid-Skene model
simulate_dawid_skene_model(pi, theta, sim_data, seed = NULL)
pi |
The pi parameter of the Dawid-Skene model. |
theta |
The theta parameter of the Dawid-Skene model. |
sim_data |
Data to guide the simulation. The data must be in the long
data format used in
|
seed |
An optional random seed to use. |
The number of raters implied by the entries in the rater column must match the number of raters implied by the passed theta parameter.
This function can also be used to simulate from the class-conditional Dawid-Skene model by specifying theta in the required form (i.e. where all off-diagonal entries of the error matrices are equal).
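The generative process behind such a simulation can be sketched in base R: draw one latent class per item from pi, then draw each rating from the row of the relevant rater's error matrix that corresponds to that class. This is an illustrative sketch under those assumptions, not the package's implementation; names such as pi_par are hypothetical.

```r
set.seed(42)
K <- 3  # number of classes
J <- 2  # number of raters
pi_par <- c(0.5, 0.3, 0.2)

# Error matrices with 0.8 on the diagonal and 0.1 elsewhere (rows sum to 1).
error_mat <- matrix(0.1, nrow = K, ncol = K)
diag(error_mat) <- 0.8
theta <- array(0, dim = c(J, K, K))
for (j in seq_len(J)) theta[j, , ] <- error_mat

# Long-format design: 4 items, each rated once by both raters.
sim_data <- data.frame(item = rep(1:4, each = J), rater = rep(1:J, 4))

# One latent class per item, shared by all ratings of that item.
z_item <- sample.int(K, max(sim_data$item), replace = TRUE, prob = pi_par)
sim_data$z <- z_item[sim_data$item]

# Each rating is drawn from theta[rater, true class, ].
sim_data$rating <- vapply(seq_len(nrow(sim_data)), function(n) {
  sample.int(K, 1, prob = theta[sim_data$rater[n], sim_data$z[n], ])
}, integer(1))

sim_data
```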
The passed sim_data augmented with columns: "z" containing the latent class of each item, and "rating" containing the simulated ratings.
J <- 5
K <- 4
pi <- rep(1 / K, K)
theta <- make_theta(0.7, J, K)
sim_data <- data.frame(item = rep(1:2, each = 5), rater = rep(1:5, 2))
simulations <- simulate_dawid_skene_model(pi, theta, sim_data)
simulations
Simulate data from the hierarchical Dawid-Skene model
simulate_hier_dawid_skene_model(pi, mu, sigma, sim_data, seed = NULL)
pi |
The pi parameter of the hierarchical Dawid-Skene model. |
mu |
The mu parameter of the hierarchical Dawid-Skene model. |
sigma |
The sigma parameter of the hierarchical Dawid-Skene model. |
sim_data |
Data to guide the simulation. The data must be in the long
data format used in
|
seed |
An optional random seed to use. |
The number of raters implied by the entries in the rater column must match the number of raters implied by the passed theta parameter.
The passed sim_data augmented with columns: "z" containing the latent class of each item, and "rating" containing the simulated rating.
J <- 5
K <- 4
pi <- rep(1 / K, K)
mu <- matrix(0, nrow = K, ncol = K)
diag(mu) <- 5
# `pi` now holds the class prevalences, so refer to the constant as base::pi.
sigma <- matrix(sqrt(2) / sqrt(base::pi), nrow = K, ncol = K)
sim_data <- data.frame(item = rep(1:2, each = 5), rater = rep(1:5, 2))
sim_result <- simulate_hier_dawid_skene_model(pi, mu, sigma, sim_data)
sim_result$sim
sim_result$theta
Summarise a mcmc_fit object
## S3 method for class 'mcmc_fit'
summary(object, n_pars = 8, ...)
object |
An object of class |
n_pars |
The number of pi/theta parameters and z 'items' to display. |
... |
Other arguments passed to function. |
For the class conditional model the 'full' theta parameterisation (i.e. appearing to have the same number of parameters as the standard Dawid-Skene model) is calculated and returned. This is designed to allow easier comparison with the full Dawid-Skene model.
fit <- rater(anesthesia, "dawid_skene", verbose = FALSE)
summary(fit)
Summarise an optim_fit object
## S3 method for class 'optim_fit'
summary(object, n_pars = 8, ...)
object |
An object of class |
n_pars |
The number of pi/theta parameters and z 'items' to display. |
... |
Other arguments passed to function. |
For the class conditional model the 'full' theta parameterisation (i.e. appearing to have the same number of parameters as the standard Dawid-Skene model) is calculated and returned. This is designed to allow easier comparison with the full Dawid-Skene model.
fit <- rater(anesthesia, "dawid_skene", method = "optim")
summary(fit)
Summarise a rater_model.
## S3 method for class 'rater_model'
summary(object, ...)
object |
A |
... |
Other arguments. |
mod <- dawid_skene()
summary(mod)
Compute the WAIC - a measure of model fit - of a rater fit object.
## S3 method for class 'rater_fit'
waic(x, ...)
x |
A |
... |
Other arguments passed. |
This function provides an additional method for model comparison, on top of the loo() function. In general we recommend that loo() is preferred: see the documentation of the loo package for details. Also, note the comments regarding model selection in the details section of loo().
A waic/loo object.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571-3594.
Vehtari, A., Gelman, A., and Gabry, J. (2017a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
fit_ds <- rater(anesthesia, "dawid_skene", verbose = FALSE, chains = 1)
fit_ccds <- rater(anesthesia, "class_conditional_dawid_skene", verbose = FALSE, chains = 1)
waic(fit_ds)
waic(fit_ccds)
Convert wide data to the long format
wide_to_long(data)
data |
Data in a wide format. Must be a 2D data object which can be converted to a data.frame. |
Wide data refers to a way of laying out categorical rating data where each item is one row and each column represents the ratings of each rater. Elements of the data can be NA, indicating that an item wasn't rated by a rater. Wide data cannot represent the same rater rating an item multiple times.
Currently any column names of the data are ignored and the raters are labelled by their column position (1 indexed, left to right). Only numeric ratings are currently supported.
The data converted into long format. A data.frame with three columns item, rater and rating.
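The conversion itself can be sketched in base R as follows. This is an illustration rather than the package's code: dropping NA cells and sorting by item then rater are assumptions about the output, and the toy column names a and b are ignored, with raters labelled by column position as described above.

```r
# Hypothetical wide data: 3 items (rows), 2 raters (columns), one missing cell.
wide <- data.frame(a = c(3, 2, NA), b = c(4, 2, 2))
wide_mat <- as.matrix(wide)

# row() and col() give the item and rater index of every cell.
long <- data.frame(
  item = as.vector(row(wide_mat)),
  rater = as.vector(col(wide_mat)),
  rating = as.vector(wide_mat)
)

# Assumption: an NA means the rater did not rate the item, so drop that row.
long <- long[!is.na(long$rating), ]
long <- long[order(long$item, long$rater), ]
rownames(long) <- NULL
long
```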
wide_data <- data.frame(rater_1 = c(3, 2, 2), rater_2 = c(4, 2, 2))
wide_data
long_data <- wide_to_long(wide_data)
long_data