Title: | Association Analysis for Categorical Variables |
---|---|
Description: | Association analysis between categorical variables using the Goodman and Kruskal tau measure. This asymmetric association measure allows the detection of asymmetric relations between categorical variables (e.g., one variable obtained by re-grouping another). |
Authors: | Ron Pearson [aut, cre] |
Maintainer: | Ron Pearson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.3 |
Built: | 2025-01-28 03:09:20 UTC |
Source: | https://github.com/cran/GoodmanKruskal |
GKtau
returns forward and backward Goodman and Kruskal tau measures
between categorical variables.
GKtau(x, y, dgts = 3, includeNA = "ifany")
x |
A categorical vector (factor). |
y |
A categorical vector (factor). |
dgts |
Integer, number of digits for results; optional (default = 3). |
includeNA |
Character, passed to the useNA parameter of table; default is "ifany"; other valid options are "no" and "always". |
The Goodman and Kruskal tau measure is an asymmetric association measure between two categorical variables, based on the extent to which variation in one variable can be explained by the other. This function returns a dataframe with both forward and backward associations.
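As a rough illustration of the measure described above, the following base R sketch computes tau from a two-way table using the usual variance-reduction form of the definition. The function name `gk_tau_sketch` is hypothetical and this is not the package's own code; it omits the rounding and the `includeNA` handling that GKtau provides.

```r
# Hypothetical helper, not part of the GoodmanKruskal package.
# tau(x -> y): the fraction of the variation in y that is removed by knowing x.
gk_tau_sketch <- function(x, y) {
  p  <- prop.table(table(x, y))             # joint proportions p_ij (NAs dropped)
  px <- rowSums(p)                          # marginal proportions of x
  py <- colSums(p)                          # marginal proportions of y
  vy <- 1 - sum(py^2)                       # variation in y on its own
  ev <- 1 - sum(sweep(p^2, 1, px, "/"))     # expected variation in y given x
  (vy - ev) / vy
}

x <- rep(c("a", "b", "c", "d"), each = 3)
y <- rep(c("a", "b", "c", "d"), times = 3)
z <- rep(c("a", "b", "a", "c"), each = 3)

round(gk_tau_sketch(x, z), 3)   # 1: z is a regrouping of x, so x fully explains z
round(gk_tau_sketch(z, x), 3)   # 0.667: z only partly explains x
round(gk_tau_sketch(x, y), 3)   # 0.111
```

The first two results show the asymmetry: the forward and backward values differ because z is obtained by merging levels of x.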
A one-row dataframe with the following columns:
the names of the x and y variables,
the numbers of distinct values Nx and Ny for each variable, and
the forward and backward associations, tau(x,y) and tau(y,x).
Ron Pearson
x <- rep(c("a", "b", "c", "d"), each = 3)
y <- rep(c("a", "b", "c", "d"), times = 3)
z <- rep(c("a", "b", "a", "c"), each = 3)
GKtau(x, y)
GKtau(x, z)
GKtau(y, z)
GKtauDataframe
returns the square matrix of Goodman and Kruskal tau
measures computed between each pair of columns in a dataframe. Numeric
variables in the dataframe are treated as factors.
GKtauDataframe(df, dgts = 3, includeNA = "ifany")
df |
Dataframe from which to compute association measures. |
dgts |
Integer, number of digits for results; optional (default = 3). |
includeNA |
Character, passed to the useNA parameter of table; default is "ifany"; other valid options are "no" and "always". |
The Goodman and Kruskal tau measure is an asymmetric association measure between two categorical variables, based on the extent to which variation in one variable can be explained by the other. This function returns an S3 object of class 'GKtauMatrix' that gives the number of levels for each variable on the diagonal of the matrix and the association between variables in the off-diagonal elements. Note that this matrix is generally NOT symmetric, in contrast to standard correlation matrices.
An S3 object of class 'GKtauMatrix' consisting of a square matrix with one row and column for each column of the dataframe df. The structure of this matrix is:
row and column names are the names of the variables in the dataframe.
each diagonal element contains the number of unique levels for the corresponding variable.
off-diagonal matrix elements contain the forward Goodman-Kruskal tau association from the variable listed in the row names to the variable listed in the column names.
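The layout described above can be sketched in base R as follows. This is only an illustration of the matrix structure under the standard definition of tau; `gk_tau_pair` and `gk_tau_matrix_sketch` are hypothetical names, and the result is a plain matrix rather than an object of class 'GKtauMatrix'.

```r
# Hypothetical helpers, not part of the GoodmanKruskal package.
gk_tau_pair <- function(x, y) {
  p  <- prop.table(table(x, y))                    # joint proportions
  vy <- 1 - sum(colSums(p)^2)                      # variation in y
  ev <- 1 - sum(sweep(p^2, 1, rowSums(p), "/"))    # expected variation in y given x
  (vy - ev) / vy
}

gk_tau_matrix_sketch <- function(df) {
  vars <- names(df)
  m <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      m[i, j] <- if (i == j) {
        length(unique(df[[i]]))            # diagonal: number of distinct levels
      } else {
        gk_tau_pair(df[[i]], df[[j]])      # off-diagonal: tau from row to column
      }
    }
  }
  m
}

df <- data.frame(
  x = rep(c("a", "b", "c", "d"), each = 3),
  z = rep(c("a", "b", "a", "c"), each = 3)
)
m <- gk_tau_matrix_sketch(df)
round(m, 3)
```

Here `m["x", "z"]` is 1 while `m["z", "x"]` is about 0.667, so the matrix is not symmetric.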
Ron Pearson
Association analysis between categorical variables using the Goodman and Kruskal tau measure
GroupNumeric
converts a numerical variable x
into a factor variable with n groups for categorical
association analysis.
GroupNumeric(x, n = NULL, groupNames = NULL, orderedFactor = FALSE, style = "quantile", ...)
x |
Numeric vector to be grouped. |
n |
Integer number of groups; if NULL (the default), the number of groups will be inferred from the groupNames parameter. |
groupNames |
Character vector of names for the levels of the factor variable created; if NULL (the default), the default names from R's cut function will be used. |
orderedFactor |
Logical, specifying whether the factor returned is ordered or not; default is FALSE. |
style |
Character string, passed to the classIntervals function from the classInt package as its style parameter (see the help file for the classIntervals function for details); default is "quantile". |
... |
Optional parameters passed to the classIntervals function from the classInt package. |
This function uses the classIntervals function from the classInt package to compute the breakpoints that define the groups. The style parameter is passed to the classIntervals function to specify the grouping method. Note that some methods may return a different number of groups than that requested via the n parameter. If groupNames is specified consistently with n, this different number of returned groups will cause an error. The recommended approach in this case is to either change the style parameter or to re-run without groupNames specified.
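The grouping step, and the way a method can return fewer groups than requested, can be illustrated with a base R approximation. This sketch assumes quantile breakpoints computed with `quantile` and applied with `cut`; it does not call classIntervals, and `group_numeric_sketch` and its arguments are hypothetical stand-ins for the documented parameters.

```r
# Hypothetical base R approximation of quantile grouping; not the package's code.
group_numeric_sketch <- function(x, n, group_names = NULL, ordered_factor = FALSE) {
  probs  <- seq(0, 1, length.out = n + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))  # ties collapse breaks
  n_groups <- length(breaks) - 1
  if (!is.null(group_names) && length(group_names) != n_groups) {
    stop("got ", n_groups, " groups but ", length(group_names), " group names")
  }
  cut(x, breaks = breaks, labels = group_names,
      include.lowest = TRUE, ordered_result = ordered_factor)
}

# Well-spread data: four groups of equal size.
g1 <- group_numeric_sketch(1:100, n = 4, group_names = c("Q1", "Q2", "Q3", "Q4"))
table(g1)

# Heavily tied data: the quantile breakpoints coincide, so fewer groups come back.
tied <- c(rep(0, 90), 1:10)
g2 <- group_numeric_sketch(tied, n = 4)
nlevels(g2)
```

With the tied data, supplying four group names would stop with an error, which mirrors the situation described above; dropping the names or choosing a different grouping method avoids it.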
Factor variable with n distinct levels, named according to groupNames (if specified).
Ron Pearson
Method for R's generic plot function for S3 objects of class GKtauMatrix. This function plots the array of Goodman-Kruskal tau association measures as described under Details. Note that, in general, this matrix is asymmetric.
## S3 method for class 'GKtauMatrix'
plot(x, y, colorPlot = TRUE, corrColors = NULL, backgroundColor = "gray", diagColor = "black", diagSize = 1, ...)
x |
S3 object of class GKtauMatrix to be plotted. |
y |
Not used; included for conformance with plot() generic function parameter requirements. |
colorPlot |
Logical variable indicating whether to generate a color plot (the default, for colorPlot = TRUE) or a black-and-white plot. |
corrColors |
Character vector giving the color names for the correlation values printed on the plot; default value is NULL, causing rainbow(n) to be used, where n is the number of rows and columns in the matrix x. |
backgroundColor |
Character variable naming the background color used for the correlation ellipses in the plot. |
diagColor |
Character variable naming the color of the text used to display the number of levels per variable along the diagonal of the correlation matrix plot. |
diagSize |
Numeric scale factor to adjust the text size for the number of levels displayed on the diagonal of the plot array. |
... |
Not used; included for conformance with plot() generic function parameter requirements. |
This function calls the corrplot function from the corrplot package to generate an array of correlation plots from the matrix of Goodman-Kruskal tau measures returned by the GKtauDataframe function. The off-diagonal elements of this array contain ellipses whose shape is determined by the corresponding tau value, and the diagonal elements display the number of unique levels for each variable in the plot.
This plot may be rendered either in color (the default, obtained by specifying colorPlot as TRUE) or in black-and-white. In color plots, the color of the text for the correlation values is set by the corrColors parameter. The default value for this parameter is NULL, which causes the function to use the color vector rainbow(n), where n is the number of rows and columns in the GKtauMatrix object x. The background color used to fill in each ellipse is specified by the backgroundColor parameter, and the color of the text for the diagonal entries is determined by the diagColor parameter. In cases where the default choices make the correlation values difficult to read, a useful alternative is to specify corrColors = "blue".
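A minimal usage sketch of the plotting workflow follows. It assumes the GoodmanKruskal package (and its corrplot dependency) is installed, and the small dataframe is made up for illustration; only the arguments documented above are used.

```r
# Illustrative data only: three categorical columns built from short patterns.
df <- data.frame(
  x = rep(c("a", "b", "c", "d"), each = 3),
  y = rep(c("a", "b", "c", "d"), times = 3),
  z = rep(c("a", "b", "a", "c"), each = 3),
  stringsAsFactors = TRUE
)

# Guarded so the snippet does nothing if the package is not available.
if (requireNamespace("GoodmanKruskal", quietly = TRUE)) {
  gk_matrix <- GoodmanKruskal::GKtauDataframe(df)

  # Default color plot.
  plot(gk_matrix)

  # Single text color, suggested above when the defaults are hard to read.
  plot(gk_matrix, corrColors = "blue")

  # Black-and-white rendering.
  plot(gk_matrix, colorPlot = FALSE)
}
```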
None. This function is called for its side-effect of generating a plot.
Ron Pearson