Software

R packages on CRAN

Name	Latest Release	Downloads (RStudio CRAN Mirror Since October 2012)
aftgee
clusrank
coga
copula
dacc
dynsurv
eva
geepack
intsurv
jds.rmd
KMsurv
ramps
reda
rrpack
sgee
smam
som
spef
splines2
tls
tpr
touch
wdnet

{aftgee}: Accelerated Failure Time Model with Generalized Estimating Equations

{aftgee} provides a collection of methods for both rank-based and least-squares estimation of the accelerated failure time (AFT) model.

For rank-based estimation, it provides the computationally efficient Gehan weight as well as general weights such as the logrank weight. See Chiou et al. (2014, 2015) for details.
For least-squares estimation, the estimating equations are solved with generalized estimating equations (GEE). In multivariate cases, a working correlation structure for the dependence can be specified within the GEE framework. See Chiou et al. (2014) for details.
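
To make the rank-based idea concrete, here is a minimal sketch in plain Python of the standard Gehan-type objective for a single covariate. It is not the {aftgee} implementation: the function names `gehan_loss` and `gehan_fit`, the toy data, and the ternary-search minimizer are all assumptions made for illustration, and no induced smoothing is applied.

```python
def gehan_loss(beta, log_time, x, event):
    """Gehan-weighted loss for one covariate (illustrative, not aftgee's code).

    Sums max(e_j - e_i, 0) over all pairs (i, j) in which subject i has an
    observed event, where e_i = log_time[i] - beta * x[i] is the residual.
    """
    resid = [lt - beta * xi for lt, xi in zip(log_time, x)]
    total = 0.0
    for i, e_i in enumerate(resid):
        if not event[i]:
            continue  # censored subjects enter only as comparators j
        for e_j in resid:
            total += max(e_j - e_i, 0.0)
    return total


def gehan_fit(log_time, x, event, lo=-10.0, hi=10.0, tol=1e-8):
    """Minimize the convex, piecewise-linear loss by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if gehan_loss(m1, log_time, x, event) <= gehan_loss(m2, log_time, x, event):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Hypothetical toy data: log survival time is exactly 2 * x, one subject censored.
x = [0.5, 1.0, 1.5, 2.0]
log_time = [1.0, 2.0, 3.0, 4.0]
event = [1, 0, 1, 1]
beta_hat = gehan_fit(log_time, x, event)
```

On this noise-free toy data all residuals coincide at a slope of 2, so the loss vanishes there and the search recovers that slope. With several covariates the same loss can be minimized by linear programming or by a smoothed version of its gradient.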

References

Chiou, S., Kang, S., Kim, J., & Yan, J. (2014). Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations. Lifetime Data Analysis, 20(4), 599–618. https://doi.org/10.1007/s10985-014-9292-x
Chiou, S., Kang, S., & Yan, J. (2014). Fast accelerated failure time modeling for case-cohort data. Statistics and Computing, 24(4), 559–568. https://doi.org/10.1007/s11222-013-9388-2
Chiou, S., Kang, S., & Yan, J. (2015). Rank-based estimating equations with general weight for accelerated failure time models: An induced smoothing approach. Statistics in Medicine, 34(9), 1495–1510. https://doi.org/10.1002/sim.6415

{clusrank}: Wilcoxon Rank Tests for Clustered Data

{clusrank} provides functions for the Wilcoxon rank-sum test and the Wilcoxon signed-rank test for clustered data. See Jiang et al. (2020) for details.

References

Jiang, Y., He, X., Lee, M. T., Rosner, B., & Yan, J. (2020). Wilcoxon rank-based tests for clustered data with R package clusrank. Journal of Statistical Software, 96(6), 1–26. https://doi.org/10.18637/jss.v096.i06

{coga}: Convolution of Gamma Distributions

{coga} provides functions for evaluating the density and distribution functions of convolutions of gamma distributions. Two related exact methods and one approximate method are implemented with efficient algorithms in C++.
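
As a rough illustration of what "convolution of gamma distributions" means, the sketch below evaluates the density of the sum of two independent gamma variables by brute-force numerical integration. This is deliberately not one of the exact or approximate methods in {coga}; the function names and the midpoint rule are assumptions chosen for simplicity, and the rule is only appropriate for shapes of at least 1, where the integrand stays bounded.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of a gamma distribution in the shape/rate parameterization."""
    if x <= 0:
        return 0.0
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def conv_gamma_pdf(x, shape1, rate1, shape2, rate2, n=2000):
    """Density of X1 + X2 at x via the midpoint rule on the convolution integral.

    Integrates f1(t) * f2(x - t) over t in (0, x) using n equal subintervals.
    """
    if x <= 0:
        return 0.0
    h = x / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += gamma_pdf(t, shape1, rate1) * gamma_pdf(x - t, shape2, rate2)
    return total * h


# Sanity check: with a common rate, the sum is again gamma with shape 2 + 3.
numeric = conv_gamma_pdf(2.0, 2, 1.5, 3, 1.5)
closed_form = gamma_pdf(2.0, 5, 1.5)
```

The equal-rate case is the only one with such a simple closed form. With different rates the density has no such form, which is the situation the package addresses.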

References

Hu, C., Pozdnyakov, V., & Yan, J. (2020). Density and distribution evaluation for convolution of independent gamma variables. Computational Statistics, 35(1), 327–342. https://doi.org/10.1007/s00180-019-00924-9

{copula}: Multivariate Dependence with Copulas

{copula} provides classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.

References

Yan, J. (2007). Enjoy the joy of copulas: With a package copula. Journal of Statistical Software, 21(4), 1–21. https://doi.org/10.18637/jss.v021.i04
Kojadinovic, I., & Yan, J. (2010). Modeling multivariate distributions with continuous margins using the copula R package. Journal of Statistical Software, 34(9), 1–20. https://doi.org/10.18637/jss.v034.i09
Hofert, M., Kojadinovic, I., Mächler, M., & Yan, J. (2018). Elements of Copula Modeling with R. Springer. https://doi.org/10.1007/978-3-319-89635-9

{dacc}: Detection and Attribution Analysis of Climate Change

{dacc} provides functions for detection and attribution analysis of climate change. The methods include optimal fingerprinting via generalized total least squares and the estimating equation approach of Ma et al. (2023). It also offers the Ledoit and Wolf shrinkage estimators of the covariance matrix.

References

Ma, S., Wang, T., Yan, J., & Zhang, X. (2023). Optimal fingerprinting with estimating equations. Journal of Climate, 36(20), 7109–7122. https://doi.org/10.1175/JCLI-D-22-0681.1

{dynsurv}: Dynamic Models for Survival Data

{dynsurv} provides functions to fit time-varying coefficient models for interval censored and right censored survival data. Three major approaches are implemented:

Bayesian Cox model with time-independent, time-varying or dynamic coefficients for right censored and interval censored data.
Spline based time-varying coefficient Cox model for right censored data.
Transformation model with time-varying coefficients for right censored data using estimating equations.

References

Wang, X., Chen, M., & Yan, J. (2013). Bayesian dynamic regression models for interval censored survival data with application to children dental health. Lifetime Data Analysis, 19(3), 297–316. https://doi.org/10.1007/s10985-013-9246-8
Wang, W., Chen, M., Chiou, S. H., Lai, H., Wang, X., Yan, J., & Zhang, Z. (2016). Onset of persistent Pseudomonas aeruginosa infection in children with cystic fibrosis with interval censored data. BMC Medical Research Methodology, 16(1), 122. https://doi.org/10.1186/s12874-016-0220-5

{eva}: Extreme Value Analysis with Goodness-of-Fit Testing

{eva} provides functions for:

Goodness-of-fit tests for selection of r in the r-largest order statistics (GEVr) model.
Goodness-of-fit tests for threshold selection in the Generalized Pareto distribution (GPD).
Random number generation and density functions for the GEVr distribution.
Profile likelihood for return level estimation using the GEVr and Generalized Pareto distributions.
P-value adjustments for sequential, multiple testing error control.
Non-stationary fitting of GEVr and GPD.

References

Bader, B., Yan, J., & Zhang, X. (2017). Automated selection of \(r\) for the \(r\) largest order statistics approach with adjustment for sequential testing. Statistics and Computing, 27(6), 1435–1451. https://doi.org/10.1007/s11222-016-9697-3
Bader, B., Yan, J., & Zhang, X. (2018). Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Annals of Applied Statistics, 12(1), 310–329. https://doi.org/10.1214/17-AOAS1092

{geepack}: Generalized Estimating Equations

{geepack} provides a generalized estimating equations solver for parameters in the mean, scale, and correlation structures, through a mean link, a scale link, and a correlation link. It can also handle clustered categorical responses.

References

Yan, J., & Fine, J. (2004). Estimating equations for association structures. Statistics in Medicine, 23(6), 859–874. https://doi.org/10.1002/sim.1650
Halekoh, U., Højsgaard, S., & Yan, J. (2006). The R package geepack for generalized estimating equations. Journal of Statistical Software, 15(2), 1–11. https://doi.org/10.18637/jss.v015.i02

{intsurv}: Integrative Survival Modeling

{intsurv} contains implementations of

integrative Cox model with uncertain event times (Wang et al., 2020)
Cox cure rate model with uncertain event status (Wang et al., 2020)

and other survival analysis routines, including

regular Cox cure rate model
regularized Cox cure rate model with elastic net penalty
weighted concordance index

References

Wang, W., Aseltine, R. H., Chen, K., & Yan, J. (2020). Integrative survival analysis with uncertain event times in application to a suicide risk study. Annals of Applied Statistics, 14(1), 51–73. https://doi.org/10.1214/19-AOAS1287
Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2020). Suicide risk modeling with uncertain diagnostic records. arXiv preprint arXiv:2009.02597. https://doi.org/10.48550/arXiv.2009.02597

{jds.rmd}: R Markdown Templates for Journal of Data Science

{jds.rmd} provides R Markdown templates for the Journal of Data Science, which are useful for authoring a manuscript with code chunks or for producing tables and figures on the fly.

{KMsurv}: Data sets from Klein and Moeschberger (1997), Survival Analysis

{KMsurv} provides data sets and functions for Klein and Moeschberger (1997), “Survival Analysis, Techniques for Censored and Truncated Data”, Springer.

{ramps}: Bayesian Geostatistical Modeling with RAMPS

{ramps} provides functions for Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm, designed to lower autocorrelation in MCMC samples.

References

Smith, B., Yan, J., & Cowles, M. (2008). Unified geostatistical modeling for data fusion and spatial heteroskedasticity with R package ramps. Journal of Statistical Software, 25(10), 1–21. https://doi.org/10.18637/jss.v025.i10

{reda}: Recurrent Event Data Analysis

{reda} provides functions for:

Simulating survival, recurrent event, and multiple event data from a stochastic process point of view.
Exploring and modeling recurrent event data through the mean cumulative function (MCF), also called the Nelson-Aalen estimator of the cumulative hazard rate function, and the gamma frailty model with a spline rate function.
Comparing two-sample recurrent event responses with the pseudo-score tests.
Fitting the gamma frailty model with a spline baseline rate function.
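
The nonparametric MCF estimate mentioned in the list can be sketched in a few lines of plain Python. This is an illustrative assumption-laden version, not the {reda} code: the function name `mcf_estimate`, the input layout (one pair of event times and end of follow-up per subject), and the toy data are made up for the example.

```python
def mcf_estimate(subjects):
    """Nelson-Aalen-type estimate of the mean cumulative function (MCF).

    `subjects` is a list of (event_times, end_of_followup) pairs. At each
    distinct event time t the estimate increases by
    (number of events at t) / (number of subjects still under observation at t).
    """
    event_times = sorted({t for events, _ in subjects for t in events})
    estimate = []
    running = 0.0
    for t in event_times:
        at_risk = sum(1 for _, end in subjects if end >= t)
        n_events = sum(events.count(t) for events, _ in subjects)
        running += n_events / at_risk
        estimate.append((t, running))
    return estimate


# Hypothetical follow-up data for three subjects.
subjects = [
    ([1.0, 3.0], 5.0),  # two recurrent events, followed until time 5
    ([2.0], 4.0),       # one event, followed until time 4
    ([], 2.5),          # no events, followed until time 2.5
]
curve = mcf_estimate(subjects)
```

In this toy data the third subject leaves observation before time 3, so the last increment is 1/2 rather than 1/3.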

{rrpack}: Reduced-Rank Regression

{rrpack} provides implementations for multivariate regression methodologies including reduced-rank regression (RRR), reduced-rank ridge regression (RRS), robust reduced-rank regression (R4), generalized/mixed-response reduced-rank regression (mRRR), row-sparse reduced-rank regression (SRRR), reduced-rank regression with a sparse singular value decomposition (RSSVD), and sparse and orthogonal factor regression (SOFAR).

{sgee}: Stagewise Generalized Estimating Equations

{sgee} provides stagewise techniques implemented with generalized estimating equations to handle individual, group, bi-level, and interaction selection. Stagewise approaches start with an empty model and slowly build the model over several iterations, which yields a ‘path’ of candidate models from which model selection can be performed. This ‘slow brewing’ approach gives stagewise techniques a unique flexibility that allows simple incorporation of generalized estimating equations. See Vaughan et al. (2017) for details.
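
The "start empty, build slowly" idea can be illustrated with a minimal forward stagewise sketch in plain Python. It assumes the simplest possible setting, an identity link with an independence working correlation, in which the estimating function reduces to the least-squares score. It is not the {sgee} algorithm: the name `stagewise_path`, the fixed step size, and the toy data are assumptions, and grouped or bi-level selection is not covered.

```python
def stagewise_path(X, y, step=0.1, n_steps=30, tol=1e-8):
    """Forward stagewise path for a linear model (illustrative sketch).

    At each iteration the score X'(y - X beta) is evaluated, and only the
    coefficient with the largest absolute score moves, by a small fixed step.
    Returns the list of coefficient vectors visited (the 'path').
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p  # start with an empty model
    path = []
    for _ in range(n_steps):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        score = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j_best = max(range(p), key=lambda j: abs(score[j]))
        if abs(score[j_best]) < tol:
            break  # estimating equations are (numerically) solved
        beta[j_best] += step if score[j_best] > 0 else -step
        path.append(list(beta))
    return path


# Hypothetical toy data: y depends on the first column only.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3, -3, 3, -3]
path = stagewise_path(X, y)
```

Each entry of `path` is a candidate model. In practice one would pick a point on the path with a criterion such as cross-validation rather than running to convergence.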

References

Vaughan, G., Aseltine, R., Chen, K., & Yan, J. (2017). Stagewise generalized estimating equations with grouped variables. Biometrics, 73(4), 1332–1342. https://doi.org/10.1111/biom.12669

{smam}: Statistical Modeling of Animal Movements

{smam} provides functions for animal movement models, including:

Moving-resting process with embedded Brownian motion. See Yan et al. (2014) and Pozdnyakov et al. (2019) for details.
Brownian motion with measurement error. See Pozdnyakov et al. (2014) for details.
Moving-resting-handling process with embedded Brownian motion. See Pozdnyakov et al. (2020) for details.
Moving-resting process with measurement error. See Hu et al. (2021) for details.
Moving-moving process with two embedded Brownian motions.

References

Pozdnyakov, V., Meyer, T., Wang, Y., & Yan, J. (2014). On modeling animal movements using Brownian motion with measurement error. Ecology, 95(2), 247–253. https://doi.org/10.1890/13-0532.1
Yan, J., Chen, Y., Lawrence-Apfel, K., Ortega, I., Pozdnyakov, V., Williams, S., & Meyer, T. (2014). A moving-resting process with an embedded Brownian motion for animal movements. Population Ecology, 56(2), 401–415. https://doi.org/10.1007/s10144-013-0428-8
Pozdnyakov, V., Elbroch, L., Labarga, A., Meyer, T., & Yan, J. (2019). Discretely observed Brownian motion governed by telegraph process: Estimation. Methodology and Computing in Applied Probability, 21(3), 907–920. https://doi.org/10.1007/s11009-017-9547-6
Pozdnyakov, V., Elbroch, L., Hu, C., Meyer, T., & Yan, J. (2020). On estimation for Brownian motion governed by telegraph process with multiple off states. Methodology and Computing in Applied Probability, 22(3), 1275–1291. https://doi.org/10.1007/s11009-020-09774-1
Hu, C., Elbroch, L., Pozdnyakov, V., & Yan, J. (2021). Moving-resting process with measurement error in animal movement modeling. Methods in Ecology and Evolution, 12(11), 2221–2233. https://doi.org/10.1111/2041-210X.13694

{som}: Self-Organizing Map

{som} provides functions for self-organizing maps, with application to gene clustering.

{spef}: Semiparametric Estimating Functions

{spef} provides a collection of functions to fit semiparametric regression models for panel count survival data. Estimating procedures include:

Wang-Yan’s augmented estimating equations (AEE, AEEX)
Huang-Wang-Zhang’s method (HWZ)
Zhang’s maximum pseudolikelihood (MPL)
Maximum pseudolikelihood with I-Splines (MPLs)
Maximum likelihood with I-Splines (MLs)
Sun-Wei’s method (‘EE.SWa’, ‘EE.SWb’, ‘EE.SWc’)
Hu-Sun-Wei’s method (‘EE.HSWc’, ‘EE.HSWm’)
Accelerated mean model (‘AMM’).

References

Wang, X., & Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer Methods and Programs in Biomedicine, 104(2), 278–285. https://doi.org/10.1016/j.cmpb.2010.10.005
Chiou, S., Huang, C., Xu, G., & Yan, J. (2019). Semiparametric regression analysis of panel count data: A practical review. International Statistical Review, 87(1), 24–43. https://doi.org/10.1111/insr.12271

{splines2}: Regression Spline Functions and Classes

{splines2} is a supplement package to the base package {splines}. It provides functions to construct basis matrices of

B-splines
M-splines
I-splines
convex splines (C-splines)
periodic splines
natural cubic splines
generalized Bernstein polynomials
their integrals (except C-splines) and derivatives of given order by closed-form recursive formulas

In addition to the R interface, {splines2} provides a C++ header-only library integrated with {Rcpp}, which allows the construction of spline basis functions directly in C++ with the help of {Rcpp} and {RcppArmadillo}. Thus, it can also be treated as one of the Rcpp* packages. See Wang and Yan (2021) for details.
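
For readers unfamiliar with recursive spline constructions, the following plain-Python sketch evaluates a B-spline basis with the textbook Cox-de Boor recursion. It only illustrates the kind of recursion involved and is not the {splines2} R or C++ interface; the function name `bspline_basis` and the convention of passing the full knot vector (boundary knots repeated) are assumptions.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of a given degree at x.

    `knots` is the full nondecreasing knot vector, with each boundary knot
    repeated degree + 1 times. Returns len(knots) - degree - 1 values.
    """
    last = knots[-1]
    # Degree 0: indicator of the half-open knot interval containing x;
    # the last nonempty interval is closed so that x == last is covered.
    basis = []
    for i in range(len(knots) - 1):
        lo, hi = knots[i], knots[i + 1]
        inside = lo <= x < hi or (x == last and lo < hi == last)
        basis.append(1.0 if inside else 0.0)
    # Raise the degree one step at a time (Cox-de Boor recursion).
    for p in range(1, degree + 1):
        raised = []
        for i in range(len(basis) - 1):
            left_den = knots[i + p] - knots[i]
            right_den = knots[i + p + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + p + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


# Cubic basis on [0, 1] with one internal knot at 0.5 (five basis functions).
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
values = bspline_basis(0.25, knots, 3)
```

The values at any point inside the boundary knots are nonnegative and sum to one, which is a quick way to check an implementation of this recursion.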

References

Wang, W., & Yan, J. (2021). Shape-restricted regression splines with R package splines2. Journal of Data Science, 19(3), 498–517. https://doi.org/10.6339/21-JDS1020

{tls}: Tools of Total Least Squares in Error-in-Variables Models

{tls} provides functions for point and interval estimation in error-in-variables models via the total least squares or generalized total least squares method.
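
As a small illustration of total least squares, the sketch below fits a straight line by minimizing perpendicular distances, the classical case in which the predictor and the response carry errors of equal variance. It is a textbook closed form for one predictor with an intercept, not the {tls} interface; the function name `tls_line` and the toy data are assumptions.

```python
import math


def tls_line(x, y):
    """Total least squares (orthogonal regression) fit of y = a + b * x.

    Assumes equal error variances in x and y. Returns (intercept, slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is not identified when the sample covariance is zero")
    # Slope of the first principal axis of the centered data cloud.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noise-free data on the line y = 1 + 2 * x.
intercept, slope = tls_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Unlike ordinary least squares, this fit treats the two variables symmetrically, which is the point of the method when the predictor is itself measured with error.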

{touch}: Tools of Utilization and Cost in Healthcare

{touch} provides an R implementation of the software tools developed in the H-CUP (Healthcare Cost and Utilization Project) and AHRQ (Agency for Healthcare Research and Quality). It contains functions for:

Mapping ICD-9 codes to the AHRQ comorbidity measures.
Translating ICD-9 (resp. ICD-10) codes to ICD-10 (resp. ICD-9) codes based on GEM (General Equivalence Mappings) from CMS (Centers for Medicare and Medicaid Services).

{tpr}: Temporal Process Regression

{tpr} provides functions to fit regression models for temporal process responses with time-varying and time-independent coefficients.

References

Fine, J. P., Yan, J., & Kosorok, M. R. (2004). Temporal process regression. Biometrika, 91(3), 683–703. https://doi.org/10.1093/biomet/91.3.683
Yan, J., & Fine, J. (2004). Estimating equations for association structures. Statistics in Medicine, 23(6), 859–874. https://doi.org/10.1002/sim.1650

{wdnet}: Weighted and Directed Networks

{wdnet} provides functions for network analysis, including:

Assortativity coefficient of weighted and directed networks.
Centrality measures for weighted and directed networks.
Clustering coefficient of weighted and directed networks.
Rewiring networks with given assortativity coefficients.
Preferential attachment network generation.

References

Yuan, Y., Yan, J., & Zhang, P. (2021). Assortativity measures for weighted and directed networks. Journal of Complex Networks, 9(2), cnab017. https://doi.org/10.1093/comnet/cnab017
Zhang, P., Wang, T., & Yan, J. (2022). PageRank centrality and algorithms for weighted, directed networks. Physica A: Statistical Mechanics and its Applications, 586, 126438. https://doi.org/10.1016/j.physa.2021.126438
Wang, T., Yan, J., Yuan, Y., & Zhang, P. (2022). Generating directed networks with predetermined assortativity measures. Statistics and Computing, 32(5), 91. https://doi.org/10.1007/s11222-022-10161-8
Yuan, Y., Wang, T., Yan, J., & Zhang, P. (2023). Generating general preferential attachment networks with R package wdnet. Journal of Data Science, 21(3), 538–556. https://doi.org/10.6339/23-JDS1110