Title: | Miscellaneous Functions for the Analysis of Educational Assessments |
---|---|
Description: | Miscellaneous functions for data cleaning and data analysis of educational assessments. Includes functions for descriptive analyses, character vector manipulations and weighted statistics. Mainly a lightweight dependency for the packages 'eatRep', 'eatGADS', 'eatPrep' and 'eatModel' (which will be subsequently submitted to 'CRAN'). The function for defining (weighted) contrasts in weighted effect coding refers to te Grotenhuis et al. (2017) <doi:10.1007/s00038-016-0901-1>. Functions for weighted statistics refer to Wolter (2007) <doi:10.1007/978-0-387-35099-8>. |
Authors: | Sebastian Weirich [aut, cre], Martin Hecht [aut], Karoline Sachse [aut], Benjamin Becker [aut], Nicole Mahler [aut], Edna Grewers [ctb] |
Maintainer: | Sebastian Weirich <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7.8.9000 |
Built: | 2025-02-21 06:01:19 UTC |
Source: | https://github.com/weirichs/eattools |
The eatTools package provides various groups of functions. The main groups are: transformation of vector types, modification of character variables, descriptive analyses and weighted statistics. The package's main purpose is to serve as a lightweight dependency for other packages.
The functions asNumericIfPossible and catch_asNumericIfPossible transform character and factor variables to numeric. facToChar transforms factor variables to character. set.col.type allows manually setting the type of multiple variables within a data.frame.
Multiple convenience functions exist for modifying character variables: removing a certain pattern (removePattern), removing numerics (removeNumeric), removing non-numerics (removeNonNumeric), substituting multiple patterns within a string (gsubAll), and splitting strings into two parts at a specific position (halveString).
The function descr provides simple descriptive statistics for a data.frame, but in a format especially useful for further automated processing (a long-format data.frame).
wtdVar computes weighted variances (this can also be done with the package Hmisc, which, however, has a very high number of dependencies). wtdTable provides a weighted frequency table.
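For orientation only, a weighted variance can be sketched in base R. The helper wtd_var_sketch below is hypothetical and not part of eatTools; it assumes frequency weights (a weight of 3 counts like three identical rows) with denominator sum(w) - 1. wtdVar itself may use a different weighting convention.

```r
# Hypothetical helper, not part of eatTools: weighted variance assuming
# frequency weights, i.e. denominator sum(w) - 1.
wtd_var_sketch <- function(x, w, na.rm = FALSE) {
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)   # drop cases with missing value or weight
    x <- x[keep]
    w <- w[keep]
  }
  xbar <- sum(w * x) / sum(w)            # weighted mean
  sum(w * (x - xbar)^2) / (sum(w) - 1)   # weighted sum of squares / (N - 1)
}

# Frequency weights give the same result as expanding the data
wtd_var_sketch(c(1, 2), w = c(2, 3))   # 0.3
var(c(1, 1, 2, 2, 2))                  # 0.3
```

With unit weights the sketch reduces to the ordinary sample variance returned by var().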
%$$% is an operator that is mainly used internally in the eatRep and eatModel packages.
%$$% is similar to $, but gives an error instead of NULL if the corresponding element does not exist.
x %$$% y
x | a list |
y | name of the corresponding element of x |
the selected element of the list x
## Not run:
x <- list(value1 = 14, value2 = NULL)
x$value2                   # NULL
x$value_not_defined        # NULL
x %$$% value2              # NULL
x %$$% value_not_defined   # error
## End(Not run)
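The difference between $ and a strict extractor can be illustrated with a small base-R sketch. The operator %get% below is hypothetical and is not the eatTools implementation of %$$%; it only shows the behavior described above: an existing element (even one holding NULL) is returned, a missing name raises an error.

```r
# Hypothetical operator, not the eatTools implementation: like `$`, but
# stops with an error when the requested element name is absent.
`%get%` <- function(x, y) {
  name <- deparse(substitute(y))   # element name as typed, e.g. value2
  if (!name %in% names(x)) {
    stop("element '", name, "' does not exist", call. = FALSE)
  }
  x[[name]]
}

x <- list(value1 = 14, value2 = NULL)
x %get% value1                   # 14
x %get% value2                   # NULL: the element exists but is empty
try(x %get% value_not_defined)   # error instead of NULL
```

Because y is captured with substitute(), the element name is written without quotes, as with $.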
Adds leading zeros to all columns that can be identified as integers in a data.frame that consists of character columns only.
addLeadingZerosToCharInt(dat)
dat | a data.frame consisting of character columns only |
a data.frame of only character columns and the same dimensions as the input data.frame. In any column containing strings that can be converted to integers, these strings will be padded with leading zeros so that all values in the column have the same number of digits.
Karoline Sachse
dat <- data.frame(v1 = c("0","300","e",NA), v2 = c("0","90","10000",NA),
                  v3 = c("k","kk","kkk",NA), v4 = NA, v5 = c("0","90","100","1"))
dat <- set.col.type(dat)
addLeadingZerosToCharInt(dat)
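The padding rule can be sketched for a single character column in base R. The helper pad_int_strings is hypothetical; it assumes that "integer-like" means non-negative digit strings (so columns containing, for example, "e" are left unchanged), which may differ from the exact rule used by addLeadingZerosToCharInt.

```r
# Hypothetical helper: pad integer-like strings with leading zeros so that
# all non-missing values have the same number of digits.
pad_int_strings <- function(v) {
  v <- as.character(v)
  present <- !is.na(v)
  if (!any(present)) return(v)                         # all missing: unchanged
  if (!all(grepl("^[0-9]+$", v[present]))) return(v)  # not integer-like: unchanged
  width <- max(nchar(v[present]))                      # longest value sets the width
  v[present] <- paste0(strrep("0", width - nchar(v[present])), v[present])
  v
}

pad_int_strings(c("0", "90", "10000", NA))   # "00000" "00090" "10000" NA
pad_int_strings(c("0", "300", "e", NA))      # unchanged
```

Applied column-wise, e.g. dat[] <- lapply(dat, pad_int_strings), the sketch mimics the data.frame behavior described above.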
This function converts vectors and matrices of all kinds to numeric. It can also be used to convert to numeric all columns of a data.frame for which this conversion is possible, i.e. for which it does not create NA values. Non-convertible columns are left unchanged.
asNumericIfPossible(x, maintain.factor.scores = TRUE, force.string = TRUE, transform.factors = TRUE, varName = NULL)
x | A vector or data frame which should be converted. |
maintain.factor.scores | Logical: If |
force.string | Logical indicating whether columns should be forced to numeric, even if NAs are induced. If |
transform.factors | Logical indicating whether columns of class |
varName | Optional: name of the corresponding variable. Does not have to be changed by the user. |
In R, factors may represent ordered categories or categorical variables. Depending on the meaning of the variable, converting the nominal values (of a factor variable) to numeric values may or may not be desirable. The arguments transform.factors and maintain.factor.scores specify if and how factor variables should be treated. See examples.
Sebastian Weirich, Karoline Sachse, Benjamin Becker
dat <- data.frame(X1 = c("1",NA,"0"), X2 = c("a",NA,"b"), X3 = c(TRUE,FALSE,FALSE),
                  X4 = as.factor(c("a",NA,"b")), X5 = as.factor(c("5","6","7")),
                  stringsAsFactors = FALSE)
str(dat)
asNumericIfPossible(dat)
asNumericIfPossible(dat, transform.factors = TRUE, maintain.factor.scores = FALSE)
asNumericIfPossible(dat, transform.factors = TRUE, maintain.factor.scores = TRUE)
asNumericIfPossible with modified warning.
This function uses asNumericIfPossible but lets the user change the warning issued by asNumericIfPossible. It is suited for use in other R packages.
catch_asNumericIfPossible(x, warn, maintain.factor.scores = TRUE, force.string = TRUE, transform.factors = TRUE)
x | A vector or data frame which should be converted. |
warn | A character vector of length 1 with the desired warning. |
maintain.factor.scores | Logical: If |
force.string | Logical indicating whether columns should be forced to numeric, even if NAs are induced. If |
transform.factors | Logical indicating whether columns of class |
For details, see asNumericIfPossible.
Benjamin Becker
char <- c("a", "b", 1)
catch_asNumericIfPossible(x = char, warn = "Vector could not be converted")
Removes special characters from a character string. Also applicable to factor variables and data.frames.
cleanifyString(x, removeNonAlphaNum = TRUE, replaceSpecialChars = TRUE, oldEncoding = NULL, ...)
x | a character variable, factor variable or data.frame |
removeNonAlphaNum | logical. If |
replaceSpecialChars | logical. If |
oldEncoding | character. The encoding of the input data if it should be transformed to |
... | further arguments passed to other methods. |
If unwanted characters are removed from a character string in a factor variable, this can change the factor structure (for example, the reference category). cleanifyString restores the factor structure after removing special characters. The function is mainly used internally in the eatRep, eatGADS, and eatModel packages.
a character variable, factor variable or data.frame with special characters removed
fac1 <- factor(c("Tablet-Paper", "Computer.(Laptop)", "Computer.(Laptop)"),
               levels = c("Tablet-Paper", "Computer.(Laptop)"))
table(fac1)
# Remove special characters
fac2 <- cleanifyString(fac1)
fac2
The function works like contr.wec from the wec package, but allows for weighted contrasts.
contr.wec.weighted (x, omitted, weights)
x | grouping variable of class factor |
omitted | label of the factor level that should be taken as the omitted category |
weights | numeric vector of non-negative weights |
Returns a contrast matrix based on weighted effect coding.
Sebastian Weirich, based upon the contr.wec function of the wec package
### exemplary data according to wec paper
dat <- data.frame(group = as.factor(c(rep(1,3), rep(2,2))),
                  wgt = c(2/3, 4/3, 2, 3/8, 5/8))
### default contrasts
contrasts(dat[,"group"])
### weighted effect coding for weighted data
contr.wec.weighted(x = dat[,"group"], omitted = 1, weights = dat[,"wgt"])
### equal to weighted effect coding:
wec::contr.wec(x = dat[,"group"], omitted = 1)
contr.wec.weighted(x = dat[,"group"], omitted = 1, weights = rep(1, nrow(dat)))
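The idea behind weighted effect coding can be sketched in base R, following the construction of te Grotenhuis et al. (2017) with group sizes replaced by sums of weights (our assumption about the weighted variant). The helper wec_contrasts_sketch is hypothetical; each non-omitted level j codes itself as 1 and the omitted level as -W_j / W_omitted, where W denotes the sum of weights per level. Row/column naming and layout of contr.wec.weighted may differ.

```r
# Hypothetical helper: weighted effect coding contrast matrix
# (rows = factor levels, columns = non-omitted levels).
wec_contrasts_sketch <- function(x, omitted, weights) {
  x <- as.factor(x)
  lev <- levels(x)
  omitted <- as.character(omitted)
  # sum of weights within each factor level, named by level
  wsum <- vapply(lev, function(l) sum(weights[which(x == l)]), numeric(1))
  keep <- setdiff(lev, omitted)   # every non-omitted level gets a column
  cm <- matrix(0, nrow = length(lev), ncol = length(keep),
               dimnames = list(lev, keep))
  for (j in keep) {
    cm[j, j] <- 1                                    # own level coded 1
    cm[omitted, j] <- -wsum[[j]] / wsum[[omitted]]   # omitted level: -W_j / W_omitted
  }
  cm
}

group <- as.factor(c(1, 1, 1, 2, 2))
wgt <- c(2/3, 4/3, 2, 3/8, 5/8)
cm <- wec_contrasts_sketch(group, omitted = 1, weights = wgt)
cm   # level 1: -0.25, level 2: 1
```

By construction the weighted mean of each coded column is zero, e.g. sum(cm[as.character(group), "2"] * wgt) is 0 up to rounding; with unit weights the omitted level receives -n_j / n_omitted, as in unweighted effect coding.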
Similar to the function trim from the gdata package, this function can be used to remove trailing and leading spaces from character strings. However, in contrast to trim, crop can remove any character.
crop(x, char = " ")
x | character string |
char | character to be removed from beginning and end of x |
Martin Hecht, Sebastian Weirich
str <- c(" 12 kk ", "op j q ", "110")
crop(str)
crop(str, "op")
Function computes descriptive statistics for one variable or several variables within a data frame.
descr (variable, na = NA, p.weights = NULL, na.rm = FALSE, verbose=TRUE)
variable | one variable or a data.frame with several variables |
na | optional values which should be considered as missing values |
p.weights | optional: vector with individual weights if weighted statistics should be computed |
na.rm | logical: should missings be removed prior to estimation? |
verbose | logical: print messages to console? |
a data frame with the following columns
N | number of observations |
N.valid | number of non-missing observations |
Missing | number of missings |
Minimum | minimum of numeric variables |
Maximum | maximum of numeric variables |
Sum | sum of numeric variables |
Mean | arithmetic mean of numeric variables |
std.err | standard error of the arithmetic mean. Note: for weighted means, the standard error is estimated according to Cochran (1977): |
sig | p value |
Median | median of numeric variables |
SD | standard deviation of numeric variables |
Var | variance of numeric variables |
Sebastian Weirich
Cochran W. G. (1977). Sampling Techniques (3rd Edn). Wiley, New York
data(mtcars)
descr(mtcars)
Applies do.call(rbind, ...) to a list of data.frames while creating a new variable (colName) which contains, for example, the names of the original list elements (name).
do_call_rbind_withName(df_list, name = names(df_list), colName)
df_list | A list of data.frames. |
name | Vector of names to fill |
colName | A single character; name for the new column. |
Returns a data.frame.
Benjamin Becker
### create example list
df_list <- lapply(mtcars, function(x) {
  data.frame(m = mean(x), sd = sd(x))
})
### transform to a single data.frame
do_call_rbind_withName(df_list, colName = "variable")
The function is needed by eatRep and eatModel as well and is therefore exported to the namespace.
existsBackgroundVariables (dat, variable, warnIfMissing = FALSE, stopIfMissingOnVars = NULL)
dat | A data frame |
variable | column number or variable name |
warnIfMissing | Logical: gives a warning if the variable contains missing values |
stopIfMissingOnVars | Character vector of variable names. Only for these variables, warnings as raised through |
a structured list of variable names
data(mtcars)
existsBackgroundVariables(mtcars, 2:4)
Function transforms all data frame columns of a specific class into another class.
facToChar ( dataFrame, from = "factor", to = "character")
dataFrame | a data frame |
from | which column class should be transformed? |
to | target column class |
a data frame
Sebastian Weirich
data(mtcars)
### original classes
sapply(mtcars, class)
mtcars1 <- facToChar(mtcars, from = "numeric", to = "character")
sapply(mtcars1, class)
The function is a wrapper for gsub() which allows replacing more than one pattern. Regular expressions are not supported (internally, gsub(..., fixed = TRUE) is used).
gsubAll ( string, old, new)
string | a character vector where matches are sought |
old | character vector containing strings to be matched in the given character vector named string |
new | character vector of replacements for the matched patterns |
Internally, the function calls gsub() repeatedly, beginning with the longest string in old. String length is evaluated using nchar(). This is done to avoid repeated modifications if strings in old match each other.
character vector with replaced patterns
### replace all numbers by words
txt <- "1 example for 2 reasons in 4 seasons"
gsubAll(txt, old = as.character(1:4), new = c("one", "two", "three", "four"))
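The longest-first strategy can be sketched in a few lines of base R. The helper gsub_all_sketch is hypothetical and only illustrates the idea; recycling new to the length of old is an assumption, not documented gsubAll behavior.

```r
# Hypothetical re-implementation for illustration only.
gsub_all_sketch <- function(string, old, new) {
  new <- rep_len(new, length(old))               # assumption: recycle replacements
  ord <- order(nchar(old), decreasing = TRUE)    # longest patterns first
  for (i in ord) {
    string <- gsub(old[i], new[i], string, fixed = TRUE)   # no regular expressions
  }
  string
}

txt <- "1 example for 2 reasons in 4 seasons"
gsub_all_sketch(txt, old = as.character(1:4), new = c("one", "two", "three", "four"))
# "one example for two reasons in four seasons"

# Why the order matters: "10" must be replaced before "1"
gsub_all_sketch("10 and 1", old = c("1", "10"), new = c("one", "ten"))
# "ten and one"
```

Replacing "1" first would instead yield "one0 and one", which is the kind of repeated modification the longest-first rule avoids.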
strsplit splits a string according to a specific regular expression. The number of occurrences of the splitting regular expression defines the number of splits. halveString splits the string into only two parts, no matter how often the splitting regular expression occurs.
halveString (string, pattern, first = TRUE , colnames=c("X1", "X2"))
string | A character vector. |
pattern | character vector (or object which can be coerced to such) to use for splitting. |
first | Logical: relevant if the pattern occurs more than once in the string. Defines whether the first (default) or last occurrence is used for splitting. |
colnames | Optional: character vector of length 2 to specify the column names of the resulting data.frame. |
A matrix with two columns
str1 <- c("John_Bolton", "Richard_Milhouse_Nixon", "Madonna")
strsplit(str1, split = "_")
halveString(str1, pattern = "_")
halveString(str1, pattern = "_", first = FALSE)
# split patterns with more than one character and regular expression
str2 <- c("John._.Bolton", "Richard._.Milhouse._.Nixon", "Madonna")
halveString(str2, pattern = encodeString("._."), first = FALSE)
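Splitting at the first or last match only can be sketched with gregexpr() and substr(). The helper halve_string_sketch is hypothetical and returns a character matrix; returning NA in the second column for strings without a match is an assumption, not necessarily what halveString does.

```r
# Hypothetical helper: split each string into two parts at the first or
# last match of a regular expression (assumes no NA input strings).
halve_string_sketch <- function(string, pattern, first = TRUE,
                                colnames = c("X1", "X2")) {
  parts <- vapply(string, function(s) {
    m <- gregexpr(pattern, s)[[1]]                 # all match positions in s
    if (m[1] == -1) return(c(s, NA_character_))    # assumption: no match -> NA
    i <- if (first) 1 else length(m)               # first or last occurrence
    start <- m[i]
    len <- attr(m, "match.length")[i]
    c(substr(s, 1, start - 1), substr(s, start + len, nchar(s)))
  }, character(2), USE.NAMES = FALSE)
  out <- t(parts)                                  # one row per input string
  colnames(out) <- colnames
  out
}

str1 <- c("John_Bolton", "Richard_Milhouse_Nixon", "Madonna")
halve_string_sketch(str1, "_")                 # "Richard" | "Milhouse_Nixon"
halve_string_sketch(str1, "_", first = FALSE)  # "Richard_Milhouse" | "Nixon"
```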
Insert Columns into a data.frame at a Specific Position
Insert columns into a data.frame at a specific position. Transforms tibble or data.table objects to data.frame.
insert.col(dat, toinsert, after)
dat | A data frame |
toinsert | Column name(s) or column number(s) of the columns to be reinserted |
after | Column name or column number after which the columns specified in |
A data frame with columns in specified positions.
Converting tbl or data.table objects to plain data.frames for internal processing
The function is mainly used for internal checks in the eatRep and eatModel packages: objects which are expected to be data.frames for further processing are converted to data.frame when their class is, for example, tbl.
makeDataFrame (dat, name = "dat", minRow = 1, onlyWarn=TRUE, verbose=TRUE)
dat | An object which is intended to be a data.frame. |
name | Optional: name of the data.frame for use in messages |
minRow | When used internally, the function reports when the data.frame has fewer rows than specified in minRow |
onlyWarn | If |
verbose | Logical: print messages to console? |
data frame.
dat <- data.table::data.table(x1 = 1:5, y1 = letters[1:5])
# unexpected in 'classical' data frames
class(dat[,"x1"])
dat <- makeDataFrame(dat)
The function is mainly used for the eatAnalysis::wtdHetcor function from the eatAnalysis package (https://github.com/beckerbenj/eatAnalysis/) and the eatModel::q3FromRes function in the eatModel package: triangular covariance/correlation matrices are tidily reshaped.
makeTria (dfr)
dfr | A data frame consisting of a row name column and a square matrix. |
Covariance/correlation matrices, which are inherently symmetric, are often displayed in a space-saving manner by showing only the upper or lower triangular part and omitting the symmetric counterpart. In R, covariance/correlation matrices tend to be displayed with both their upper and lower halves. Whereas lower.tri and upper.tri allow replacing the upper or lower half with NAs, the triangular shape can be lost if the covariance/correlation matrix was provided in long format and reshaped afterwards. makeTria sorts rows and columns appropriately to provide a triangular shape if redundant entries are replaced by NA. Please note that the function expects row names in the first column of the input data.frame.
data frame.
dfr <- data.frame(vars = paste0("var", 2:4),
                  matrix(c(1:3, NA, NA, 5, 4, NA, 6), nrow = 3, ncol = 3,
                         dimnames = list(NULL, paste0("var", 1:3))))
makeTria(dfr)
This is a wrapper for the merge function. merge does not maintain variable attributes; mergeAttr might be useful if variable attributes should be maintained. For example, if SPSS data are imported via read.spss, variable and value labels are stored as attributes, which get lost if the data are merged subsequently. Moreover, the function gives additional messages if (combinations of) by-variables are not unique in at least one data.frame, if by-variables have different classes, or if some units of the by-variables are missing in one of the data sets. Users are free to specify which kinds of messages are desirable.
mergeAttr(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(".x",".y"), setAttr = TRUE, onlyVarValLabs = TRUE, homoClass = TRUE, unitName = "unit", xName = "x", yName = "y", verbose = c("match", "unique", "class", "dataframe", "common", "convert"))
x | first data frame to be merged. |
y | second data frame to be merged. |
by | specifications of the columns used for merging |
by.x | specifications of the columns used for merging |
by.y | specifications of the columns used for merging |
all | logical; |
all.x | logical; if |
all.y | logical; analogous to |
sort | logical. Should the result be sorted on the |
suffixes | a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which are not used for merging (appearing in |
setAttr | Logical: restore the variable attributes? If FALSE, the behavior of |
onlyVarValLabs | Logical: If TRUE, only the variable and value labels as captured by |
homoClass | Logical: Beginning with R version 3.5, |
unitName | Optional: set the name for the unit variable to get more informative messages. This is mainly relevant if |
xName | Optional: set the name for the x data.frame to get more informative messages. This is mainly relevant if |
yName | Optional: set the name for the y data.frame to get more informative messages. This is mainly relevant if |
verbose | Optional: choose whether messages concerning missing levels in by-variables should be printed on console ( |
data frame. See the help page of merge for further details.
### data frame 1, variable 'happy' with variable.label 'happiness in the workplace'
df1 <- data.frame(id = 1:3, sex = factor(c("male", "male", "female")),
                  happy = c("low", "low", "medium"))
attr(df1[,"happy"], "variable.label") <- "happiness in the workplace"
### data frame 2 without labels
df2 <- data.frame(id = as.factor(c(2,2,4)), status = factor(c("married", "married", "single")),
                  convicted = c(FALSE, FALSE, TRUE))
### lost label after merging
df3 <- merge(df1, df2, all = TRUE)
attr(df3[,"happy"], "variable.label")
### maintain label
df4 <- mergeAttr(df1, df2, all = TRUE, onlyVarValLabs = FALSE)
attr(df4[,"happy"], "variable.label")
### adapt messages
df5 <- mergeAttr(df1, df2, all = TRUE, onlyVarValLabs = FALSE, unitName = "student",
                 xName = "student questionnaire", yName = "school questionnaire",
                 verbose = c("match", "unique"))
Creates a sequence for every unique value in a vector.
multiseq(v)
v | a vector |
a vector with multiple sequences
Martin Hecht
v <- c("a", "a", "a", "c", "b", "b", "a")
multiseq(v)
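Assuming that multiseq numbers the occurrences of each value in order of appearance (a running count within each unique value), the same result can be sketched with base R's ave(). The helper multiseq_sketch is hypothetical and may not match multiseq in every detail.

```r
# Hypothetical equivalent: running count within each unique value of v.
multiseq_sketch <- function(v) {
  ave(seq_along(v), v, FUN = seq_along)   # restart the count for each group
}

v <- c("a", "a", "a", "c", "b", "b", "a")
multiseq_sketch(v)   # 1 2 3 1 1 2 4
```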
Drop rows containing missing values in selected columns.
na_omit_selection (dat, varsToOmitIfNA)
dat | a data.frame |
varsToOmitIfNA | Name(s) or column number(s) of the variables which should be considered for row deletion due to NAs |
A data.frame with deleted rows
dat1 <- data.frame(v1 = c(1,NA,3), v2 = c(letters[1:2],NA), v3 = c(NA, NA, TRUE),
                   stringsAsFactors = FALSE)
na.omit(dat1)
na_omit_selection(dat1, "v2")
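In contrast to na.omit, which considers all columns, the selective behavior can be sketched with complete.cases() restricted to the chosen columns. The helper na_omit_selection_sketch is hypothetical and not the eatTools implementation.

```r
# Hypothetical helper: drop rows with NA only in the selected columns.
na_omit_selection_sketch <- function(dat, varsToOmitIfNA) {
  keep <- stats::complete.cases(dat[, varsToOmitIfNA, drop = FALSE])
  dat[keep, , drop = FALSE]
}

dat1 <- data.frame(v1 = c(1, NA, 3), v2 = c(letters[1:2], NA),
                   v3 = c(NA, NA, TRUE), stringsAsFactors = FALSE)
na.omit(dat1)                          # no rows left: every row has some NA
na_omit_selection_sketch(dat1, "v2")   # keeps rows 1 and 2
```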
Function is useful if parameters on the ‘PISA’ metric should be transformed into competence levels.
num.to.cat(x, cut.points, cat.values = NULL)
x | Numeric vector. |
cut.points | Numeric vector with cut scores. |
cat.values | Optional: vector with labels for the resulting categories. Note: if specified, length of cat.values should be length(cut.points)+1. |
Vector with factor values.
Sebastian Weirich
values <- rnorm(10, 0, 1.5) * 100 + 500
num.to.cat(x = values, cut.points = 390 + 0:3*75)
num.to.cat(x = values, cut.points = 390 + 0:3*75, cat.values = c("1a", "1b", 2:4))
Some (error) messages are easier to understand if small (frequency) tables are included for clarity. The function simplifies the integration of these tables. It is intended to be used in combination with message, stop, or cat, for example.
print_and_capture (x, spaces = 0)
x | The object which should be integrated. Normally, a (small) table or data frame. |
spaces | Number of spaces between the left border and the table |
a string which may be combined with messages
frequency.table <- as.table(matrix(c(12,0,5,7), 2, 2))
attr(frequency.table, "dimnames") <- list("sex" = c("male", "female"),
                                          "migration" = c(TRUE, FALSE))
message("Some combinations of variables with zero observations: \n",
        print_and_capture(frequency.table, spaces = 5))
Computes the part-whole correlation (correlation of an item with the whole scale except for this item).
pwc(dat)
dat | a data.frame with numeric columns (items) |
A data.frame with three columns: First column item identifier, second column with conventional item-scale correlation, third column with part-whole correlation
dat <- data.frame(item1 = c(0,1,1,3), item2 = c(2,3,1,3), item3 = c(1, NA, 3,3))
pwc(dat)
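The difference between the conventional item-scale correlation and the part-whole correlation can be sketched in base R: the latter correlates each item with the sum of all other items, so the item does not correlate with itself. The helper pwc_sketch is hypothetical; handling missing values via use = "complete.obs" is an assumption and may differ from pwc.

```r
# Hypothetical helper: item-total and part-whole correlations per item.
pwc_sketch <- function(dat) {
  total <- rowSums(dat)   # scale score; NA if any item is missing
  item_total <- vapply(seq_along(dat), function(i) {
    cor(dat[[i]], total, use = "complete.obs")
  }, numeric(1))
  part_whole <- vapply(seq_along(dat), function(i) {
    rest <- rowSums(dat[, -i, drop = FALSE])   # scale score without item i
    cor(dat[[i]], rest, use = "complete.obs")
  }, numeric(1))
  data.frame(item = names(dat), item_total = item_total,
             part_whole = part_whole, stringsAsFactors = FALSE)
}

dat_c <- data.frame(item1 = c(0, 1, 1, 3), item2 = c(2, 3, 1, 3), item3 = c(1, 2, 3, 3))
pwc_sketch(dat_c)
```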
rbinds a list of data.frames, using only those columns which occur in each of the single data.frames.
rbind_common(...)
... | input data frames to row bind together. The first argument can be a list of data frames, in which case all other arguments are ignored. Any NULL inputs are silently dropped. If all inputs are NULL, the output is NULL. If the data.frames have no common columns, the output is NULL and a warning is given. |
a single data frame
### data frame 1
df1 <- data.frame(a = 1:3, b = TRUE)
### data frame 2
df2 <- data.frame(d = 100, a = 11:13)
### data frame 3
df3 <- data.frame(d = 1000, x = 101:103)
### one common col
rbind_common(df1, df2)
### no common cols
rbind_common(df1, df2, df3)
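The documented behavior (list as first argument, NULL inputs dropped, NULL plus a warning when no columns are shared) can be sketched with Reduce(intersect, ...). The helper rbind_common_sketch is hypothetical and not the eatTools implementation.

```r
# Hypothetical helper: row-bind data frames on their common columns only.
rbind_common_sketch <- function(...) {
  dfs <- list(...)
  if (length(dfs) > 0 && is.list(dfs[[1]]) && !is.data.frame(dfs[[1]])) {
    dfs <- dfs[[1]]                          # a list of data frames was passed
  }
  dfs <- Filter(Negate(is.null), dfs)        # silently drop NULL inputs
  if (length(dfs) == 0) return(NULL)
  common <- Reduce(intersect, lapply(dfs, names))
  if (length(common) == 0) {
    warning("data frames have no common columns")
    return(NULL)
  }
  do.call(rbind, lapply(dfs, function(d) d[, common, drop = FALSE]))
}

df1 <- data.frame(a = 1:3, b = TRUE)
df2 <- data.frame(d = 100, a = 11:13)
rbind_common_sketch(df1, df2)   # a single column 'a' with 6 rows
```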
rbinds a list of vectors of unequal length to a data.frame. Missing columns are filled with NA.
rbind_fill_vector(x)
x | A list of vectors. Each element of x must have a dimension of |
a single data frame
a <- list(NULL, 1:2, NA, "a", 11:13)
rbind_fill_vector(a)
Reads character-separated data into a data.frame; the separator may consist of one or more bytes.
readMultisep(file, sep, colnames=TRUE)
file | the name of the file which the data are to be read from. |
sep | the field separator character(s). |
colnames | logical. Whether the first line in the file contains column names. |
A data frame containing a representation of the data in the file.
filePath <- tempfile(fileext = ".txt")
dat <- data.frame(v1 = c("0","300","e",NA), v2 = c("0","90","10000",NA),
                  v3 = c("k","kk","kkk",NA), v4 = NA, v5 = c("0","90","100","1"))
write.table(dat, file = filePath, row.names = FALSE, col.names = FALSE, sep = "]&;")
readMultisep(filePath, sep = "]&;")
Recodes the values of a variable. The function resembles the recode function from the car package, but uses a lookup table to specify old and new values.
recodeLookup(var, lookup)
var | a vector (e.g. numeric, character, or factor) |
lookup | a data.frame with exactly two columns. The first column contains old values, the second column new values. Values which do not occur in the old column remain unchanged. |
a vector of the same length as var with recoded values
num_var <- sample(1:10, size = 10, replace = TRUE)
lookup <- data.frame(old = c(2, 4, 6), new = c(200, 400, 600))
num_var2 <- recodeLookup(num_var, lookup)
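A lookup-table recode can be sketched with match(): values found in the first column are replaced by the corresponding entry of the second column, all other values stay unchanged. The helper recode_lookup_sketch is hypothetical and covers numeric and character vectors only; the factor handling of recodeLookup is not reproduced.

```r
# Hypothetical helper: recode values via a two-column lookup table.
recode_lookup_sketch <- function(var, lookup) {
  stopifnot(ncol(lookup) == 2)
  idx <- match(var, lookup[[1]])        # row of each value in the 'old' column
  hit <- !is.na(idx)
  out <- var
  out[hit] <- lookup[[2]][idx[hit]]     # replace only values found in the table
  out
}

lookup <- data.frame(old = c(2, 4, 6), new = c(200, 400, 600))
recode_lookup_sketch(c(1, 2, 3, 4, 6, 6), lookup)   # 1 200 3 400 600 600
```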
Function removes all non-numeric characters from a string.
removeNonNumeric ( string)
string | a character vector |
a character string
Sebastian Weirich
str <- c(".d1.nh.120", "empty", "110", ".nh.dgd", "only.nh")
removeNonNumeric(str)
Function removes all numeric characters from a string.
removeNumeric ( string)
string | a character vector |
a character string
Sebastian Weirich
str <- c(".d1.nh.120", "empty", "110", ".nh.dgd", "only.nh")
removeNumeric(str)
Function removes a specified string from a character vector.
removePattern ( string, pattern)
string | a character vector |
pattern | a character pattern of length 1 |
a character string
str <- c(".d1.nh.120", "empty", "110", ".nh.dgd", "only.nh")
removePattern(str, ".nh.")
Round all numeric variables in a data.frame and leave the other variables untouched. Column and row names are preserved.
roundDF(dat, digits = 3)
dat | A data.frame. |
digits | Integer indicating the number of decimal places. |
Returns the rounded data.frame.
roundDF(mtcars, digits = 0)
Creates a sequence of integers. Modified version of seq returning an empty vector if the starting point is larger than the end point. Originally provided by rlang::seq2().
seq2(from, to)
from | The starting value of the sequence. Of length 1. |
to | The end value of the sequence. Of length 1. |
A numerical sequence
seq2(from = 1, to = 5)
This function converts the classes of columns to character, numeric, logical, integer or factor.
set.col.type(dat, col.type = list("character" = NULL), verbose = FALSE, ...)
dat | A data frame |
col.type | A named list of column names that are to be converted. The names of the list indicate the class to which the respective column should be converted ( |
verbose | if |
... | Additional arguments to be passed to |
Use col.type = "numeric.if.possible" if conversion to numeric should be tested upfront; see asNumericIfPossible for details.
A data frame with column classes changed according to the specifications in col.type
Martin Hecht, Karoline Sachse
asNumericIfPossible
str(d <- data.frame("var1" = 1, "var2" = TRUE, "var3" = FALSE, "var4" = as.factor(1), "var5" = as.factor("a"),"var6" = "b", stringsAsFactors = FALSE)) str(set.col.type(d)) str(set.col.type(d, list("numeric" = NULL))) str(set.col.type(d, list("character" = c("var1" , "var2"), "numeric" = "var3", "logical" = "var4"))) str(set.col.type(d, list("numeric.if.possible" = NULL))) str(set.col.type(d, list("numeric.if.possible" = NULL), transform.factors = TRUE)) str(set.col.type(d, list("numeric.if.possible" = NULL), transform.factors = TRUE, maintain.factor.scores = FALSE))
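The col.type convention (list names give the target class, list elements give the column names) can be illustrated with a small base-R sketch. The helper set_col_type_sketch() is hypothetical, does not handle "numeric.if.possible", and assumes that NULL selects every column, which is how the examples above appear to use it.

```r
# Sketch only: hypothetical helper illustrating the col.type convention.
# Not the eatTools implementation; "numeric.if.possible" is not supported.
set_col_type_sketch <- function(dat, col.type) {
  converters <- list(character = as.character, numeric = as.numeric,
                     logical = as.logical, integer = as.integer,
                     factor = as.factor)
  for (cls in names(col.type)) {
    stopifnot(cls %in% names(converters))
    cols <- col.type[[cls]]
    # assumption: NULL selects every column
    if (is.null(cols)) cols <- names(dat)
    for (v in cols) dat[[v]] <- converters[[cls]](dat[[v]])
  }
  dat
}

d <- data.frame(var1 = 1, var2 = TRUE, var3 = FALSE, stringsAsFactors = FALSE)
str(set_col_type_sketch(d, list(character = c("var1", "var2"), numeric = "var3")))
```

Note that base as.numeric() applied to a factor returns the internal level codes rather than the labels: as.numeric(factor(c("10", "20"))) gives 1 2.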
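Takes a vector of values and creates a frequency table that includes the values given in pattern, so that values which do not occur in x can still appear in the table. This models the behavior of factor variables with predefined levels.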
tablePattern (x, pattern = NULL, weights, na.rm = TRUE, useNA = c("no", "ifany", "always"))
x | a vector |
pattern | desired values for table output |
weights | optional: weights |
na.rm | should missing values be removed? |
useNA | whether to include NA values in the table |
a frequency table
Sebastian Weirich
grades <- c(1,1,3,4,2,3,4,5,5,3,2,1)
table(grades)
tablePattern(grades, pattern = 1:6)
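In base R, the same "factor-like" behaviour can be obtained by fixing the factor levels to the desired pattern, so that absent values are reported with a count of 0. The sketch below is an illustration of that idea, not the eatTools code; the weights vector w is made up for the example, and tapply() with default = 0 is one possible way to get weighted counts.

```r
# Base-R illustration (not the eatTools code): fixed factor levels make
# values that never occur show up with a count of 0.
grades <- c(1, 1, 3, 4, 2, 3, 4, 5, 5, 3, 2, 1)
pattern <- 1:6

tab <- table(factor(grades, levels = pattern))
tab   # grade 6 is listed with frequency 0

# weighted variant with hypothetical weights: sum the weights per level,
# using 0 for levels that never occur
w <- rep(c(1, 2), length.out = length(grades))
wtab <- tapply(w, factor(grades, levels = pattern), sum, default = 0)
wtab
```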
Replaces the somewhat buggy function combination table(unlist(data)).
tableUnlist(dataFrame, useNA = c("no", "ifany", "always"))
dataFrame | Data frame with more than one column. |
useNA | Whether to include NA values in the table. See the help file of table. |
A frequency table
dat <- data.frame(matrix(data = sample(0:1, 200, replace = TRUE), nrow = 20, ncol = 10))
tableUnlist(dat)
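The text does not say what exactly is buggy about table(unlist(data)). One known pitfall (an assumption about what is meant) is that unlist() on a data frame mixing factor and non-factor columns carries the factors' integer codes instead of their labels. A hypothetical helper that avoids this by converting every column to character first:

```r
# Sketch only (hypothetical helper, not the eatTools code): tabulate all
# cells of a data frame after turning every column into character, so that
# factor columns contribute their labels rather than their integer codes.
table_unlist_sketch <- function(dataFrame, useNA = c("no", "ifany", "always")) {
  useNA <- match.arg(useNA)
  vals <- unlist(lapply(dataFrame, as.character), use.names = FALSE)
  table(vals, useNA = useNA)
}

df <- data.frame(a = factor(c("x", "y", NA)), b = c("x", "z", "z"),
                 stringsAsFactors = FALSE)
table_unlist_sketch(df)
table_unlist_sketch(df, useNA = "ifany")
```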
The function closely resembles the match function, but allows for multiple matches.
whereAre(a,b,verbose=TRUE)
a | a scalar |
b | a numeric or character vector |
verbose | logical: print messages on console? |
A numeric vector
Sebastian Weirich
a <- 12
b <- c(10, 11, 12, 10, 11, 12)
match(a, b)
whereAre(a = a, b = b)
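The contrast with match can be shown in base R alone: match() stops at the first hit, while which() returns every matching position. This is only an illustration of the idea; whereAre's verbose messages are not reproduced.

```r
# Base-R illustration of the difference (not the eatTools implementation)
a <- 12
b <- c(10, 11, 12, 10, 11, 12)

match(a, b)     # 3: position of the first match only
which(b == a)   # 3 6: positions of all matches
```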
Data from large-scale assessments are often provided in wide format. This function transforms such data into the long format required by eatRep.
wideToLong (datWide, noImp, imp, multipleColumns = TRUE, variable.name = "variable", value.name = "value")
datWide | Data set in the wide format, i.e. one row per person. |
noImp | Character vector of non-imputed variables that are needed for subsequent analyses. |
imp | Named list of character vectors containing the imputed variables that are needed for subsequent analyses. |
multipleColumns | Logical: use one column for each imputed variable (if more than one imputed variable is used)? Alternatively, only one column for all imputed variables is used (this is the default behavior of the |
variable.name | Applies only if multipleColumns = FALSE. |
value.name | Applies only if multipleColumns = FALSE. |
A data.frame in the long format.
Sebastian Weirich
### create arbitrary wide format large-scale assessment data for two
### subjects, each with three imputations
datWide <- data.frame(id = paste0("P", 1:5), weight = abs(rnorm(5, 10, 1)),
                      country = c("USA", "BRA", "TUR", "GER", "AUS"),
                      sex = factor(c("female", "male", "female", "female", "male")),
                      matrix(data = rnorm(n = 15, mean = 500, sd = 75), nrow = 5,
                             dimnames = list(NULL, paste0("mat.pv", 1:3))),
                      matrix(data = rnorm(n = 15, mean = 480, sd = 80), nrow = 5,
                             dimnames = list(NULL, paste0("sci.pv", 1:3))),
                      stringsAsFactors = FALSE)
datLong <- wideToLong(datWide = datWide, noImp = c("id", "weight", "country", "sex"),
                      imp = list(math = paste0("mat.pv", 1:3), science = paste0("sci.pv", 1:3)))
datLong2 <- wideToLong(datWide = datWide, noImp = c("id", "weight", "country", "sex"),
                       imp = list(math = paste0("mat.pv", 1:3), science = paste0("sci.pv", 1:3)),
                       multipleColumns = FALSE, variable.name = "varName", value.name = "val")
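For intuition about the multipleColumns = TRUE layout (one column per imputed variable, one row per person and imputation), base R's stats::reshape() can produce a comparable result. This is a swapped-in technique, not what wideToLong uses internally, and the time column name imp is an arbitrary choice for the sketch.

```r
# Sketch with stats::reshape(), not the eatTools implementation:
# two imputations for two subjects, reshaped to one row per person x imputation.
set.seed(1)
datWide <- data.frame(id = paste0("P", 1:3),
                      mat.pv1 = rnorm(3, 500, 75), mat.pv2 = rnorm(3, 500, 75),
                      sci.pv1 = rnorm(3, 480, 80), sci.pv2 = rnorm(3, 480, 80),
                      stringsAsFactors = FALSE)

datLong <- reshape(datWide, direction = "long",
                   # each list element groups the columns of one imputed variable
                   varying = list(c("mat.pv1", "mat.pv2"), c("sci.pv1", "sci.pv2")),
                   v.names = c("math", "science"),
                   timevar = "imp", times = 1:2, idvar = "id")
datLong
```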
This function works much like the wtd.table function from the Hmisc package.
wtdTable(x , weights , na.rm = FALSE)
x | a character or category or factor vector |
weights | a numeric vector of non-negative weights |
na.rm | set to |
a frequency table
x <- c(50, 1, 50)
w <- c(1, 4, 1)
wtdTable(x, w)
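A weighted frequency table sums the weights of all observations that share the same value. In base R this can be sketched with xtabs() or tapply(); this illustrates the idea and is not the eatTools or Hmisc code.

```r
# Base-R illustration (not the eatTools or Hmisc code)
x <- c(50, 1, 50)
w <- c(1, 4, 1)

xtabs(w ~ x)                 # value 1 -> 4, value 50 -> 2
tapply(w, factor(x), sum)    # the same sums as a named array
```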
This function works much like the wtd.var function from the Hmisc package.
wtdVar(x , weights , na.rm = FALSE)
x | numeric vector |
weights | a numeric vector of non-negative weights |
na.rm | set to |
a scalar
Benjamin Becker
x <- c(50, 1, 25)
w <- c(1, 4, 1)
wtdVar(x, w)
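Assuming wtdVar follows the Hmisc::wtd.var default of treating weights as frequency weights, the estimate is sum(w * (x - xbar)^2) / (sum(w) - 1), where xbar = sum(w * x) / sum(w) is the weighted mean. The hypothetical helper below sketches that formula; with integer weights it equals the ordinary variance of the data with each value repeated w times.

```r
# Sketch only: assumes frequency-weight semantics as in Hmisc::wtd.var()'s
# default. wtd_var_sketch() is a hypothetical name, not the eatTools code.
wtd_var_sketch <- function(x, weights, na.rm = FALSE) {
  if (na.rm) {
    keep <- !is.na(x)
    x <- x[keep]
    weights <- weights[keep]
  }
  sw   <- sum(weights)               # total weight
  xbar <- sum(weights * x) / sw      # weighted mean
  sum(weights * (x - xbar)^2) / (sw - 1)
}

x <- c(50, 1, 25)
w <- c(1, 4, 1)
wtd_var_sketch(x, w)   # approx. 417.77
var(rep(x, w))         # same value: each x repeated w times
```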