Software

R packages on CRAN

Name Latest Release Downloads (RStudio CRAN Mirror Since October 2012)
aftgee aftgee-version aftgee-log
clusrank clusrank-version clusrank-log
coga coga-version coga-log
copula copula-version copula-log
dacc dacc-version dacc-log
dynsurv dynsurv-version dynsurv-log
eva eva-version eva-log
geepack geepack-version geepack-log
intsurv intsurv-version intsurv-log
jds.rmd jds.rmd-version jds.rmd-log
KMsurv KMsurv-version KMsurv-log
ramps ramps-version ramps-log
reda reda-version reda-log
rrpack rrpack-version rrpack-log
sgee sgee-version sgee-log
smam smam-version smam-log
som som-version som-log
spef spef-version spef-log
splines2 splines2-version splines2-log
tls tls-version tls-log
tpr tpr-version tpr-log
touch touch-version touch-log
wdnet wdnet-version wdnet-log

{aftgee}: Accelerated Failure Time Model with Generalized Estimating Equations

gpl-badge aftgee-dev aftgee-ghs

{aftgee} provides a collection of methods for both the rank-based estimates and least-square estimates to the Accelerated Failure Time (AFT) model.

  • For rank-based estimation, it provides the computationally efficient Gehan’s weight and the general’s weight such as the logrank weight. See Chiou et al. (2014, 2015) for details.
  • For the least-square estimation, the estimating equation is solved with generalized estimating equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in GEE’s setting. See Chiou et al. (2014) for details.

References

  • Chiou, S., Kang, S., Kim, J., & Yan, J. (2014). Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations. Lifetime Data Analysis, 20(4), 599–618. https://doi.org/10.1007/s10985-014-9292-x
  • Chiou, S., Kang, S. & Yan, J. (2014). Fast accelerated failure time modeling for case-cohort data. Statistics and Computing, 24(4), 559–568. https://doi.org/10.1007/s11222-013-9388-2
  • Chiou, S., Kang, S. & Yan, J. (2015). Rank-based estimating equations with general weight for accelerated failure time models: An induced smoothing approach. Statistics in Medicine, 34(9): 1495–1510. https://doi.org/10.1002/sim.6415

{clusrank}: Wilcoxon Rank Tests for Clustered Data

gpl-badge clusrank-dev clusrank-ghs

{clusrank} provides functions for Wilcoxon rank sum test and Wilcoxon signed rank test for clustered data. See Jiang et. al (2020) for details.

References

  • Jiang, Y., He, X., Lee, M. T., Rosner, B., & Yan, J. (2020). Wilcoxon rank-based tests for clustered data with R package clusrank. Journal of Statistical Software, 96(6), 1–26. https://doi.org/10.18637/jss.v096.i06

{coga}: Convolution of Gamma Distributions

gpl-badge coga-dev coga-ghs

{coga} provides functions for density and distribution evaluation of convolution of gamma distributions. Two related exact methods and one approximate method are implemented with efficient algorithm and C++ code.

References

  • Hu, C., Pozdnyakov, V. & Yan, J. (2020). Density and distribution evaluation for convolution of independent gamma variables. * Computational Statistics*, 35(1), 327–342. https://doi.org/10.1007/s00180-019-00924-9

{copula}: Multivariate Dependence with Copulas

gpl-badge

{copula} provides classes (S4) of commonly used elliptical, Archimedean, extreme-value and other copula families, as well as their rotations, mixtures and asymmetrizations. Nested Archimedean copulas, related tools and special functions. Methods for density, distribution, random number generation, bivariate dependence measures, Rosenblatt transform, Kendall distribution function, perspective and contour plots. Fitting of copula models with potentially partly fixed parameters, including standard errors. Serial independence tests, copula specification tests (independence, exchangeability, radial symmetry, extreme-value dependence, goodness-of-fit) and model selection based on cross-validation. Empirical copula, smoothed versions, and non-parametric estimators of the Pickands dependence function.

References


{dacc}: Detection and Attribution Analysis of Climate Change

gpl-badge dacc-dev dacc-ghs

{dacc} provides functions for conducting detection and attribution of climate change using methods that include optimal fingerprinting via generalized total least squares or estimating equation approach from Ma et al. (2023). Additionally, it offers shrinkage estimators for covariance matrix from Ledoit and Wolf.

References


{dynsurv}: Dynamic Models for Survival Data

gpl-badge dynsurv-dev dynsurv-ghs

{dynsurv} provides functions to fit time-varying coefficient models for interval censored and right censored survival data. Three major approaches are implemented:

  1. Bayesian Cox model with time-independent, time-varying or dynamic coefficients for right censored and interval censored data.
  2. Spline based time-varying coefficient Cox model for right censored data.
  3. Transformation model with time-varying coefficients for right censored data using estimating equations.

References

  • Wang, X., Chen, M., & Yan, J. (2013). Bayesian dynamic regression models for interval censored survival data with application to children dental health. Lifetime Data Analysis, 19(3), 297–316. https://doi.org/10.1007/s10985-013-9246-8
  • Wang, W., Chen, M., Chiou, S. H., Lai, H., Wang, X., Yan, J., & Zhang, Z. (2016). Onset of persistent pseudomonas aeruginosa infection in children with cystic fibrosis with interval censored data. BMC Medical Research Methodology, 16(1), 122. https://doi.org/10.1186/s12874-016-0220-5

{eva}: Extreme Value Analysis with Goodness-of-Fit Testing

gpl-badge eva-dev eva-ghs

{eva} provides functions for:

  1. Goodness-of-fit tests for selection of r in the r-largest order statistics (GEVr) model.
  2. Goodness-of-fit tests for threshold selection in the Generalized Pareto distribution (GPD).
  3. Random number generation and density functions for the GEVr distribution.
  4. Profile likelihood for return level estimation using the GEVr and Generalized Pareto distributions.
  5. P-value adjustments for sequential, multiple testing error control.
  6. Non-stationary fitting of GEVr and GPD.

References

  • Bader, B., Yan, J. & Zhang, X. (2017). Automated selection of \(r\) for the \(r\) largest order statistics approach with adjustment for sequential testing. Statistics and Computing, 27(6), 1435–1451. https://doi.org/10.1007/s11222-016-9697-3
  • Bader, B., Yan, J. & Zhang, X. (2018) Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Annals of Applied Statistics, 12(1), 310–329. https://doi.org/10.1214/17-AOAS1092

{geepack}: Generalized Estimating Equations

gpl-badge geepack-dev geepack-ghs

{geepack} provides generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses.

References

  • Yan, J., & Fine, J. (2004). Estimating equations for association structures. Statistics in Medicine, 23(6), 859–874. https://doi.org/10.1002/sim.1650
  • Halekoh, U., Højsgaard, S., & Yan, J. (2006). The R package geepack for generalized estimating equations. Journal of Statistical Software, 15(2), 1–11. https://doi.org/10.18637/jss.v015.i02

{intsurv}: Integrative Survival Modeling

gpl-badge intsurv-dev intsurv-ghs

{intsurv} contains implementations of

  • integrative Cox model with uncertain event times (Wang et al., 2020)
  • Cox cure rate model with uncertain event status (Wang et al., 2020)

and other survival analysis routines, including

  • regular Cox cure rate model
  • regularized Cox cure rate model with elastic net penalty
  • weighted concordance index

References

  • Wang, W., Aseltine, R. H., Chen, K., & Yan, J. (2020). Integrative survival analysis with uncertain event times in application to a suicide risk study. Annals of Applied Statistics, 14(1), 51–73. https://doi.org/10.1214/19-AOAS1287
  • Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2020). Suicide risk modeling with uncertain diagnostic records. arXiv preprint arXiv:2009.02597. https://doi.org/10.48550/arXiv.2009.02597

{jds.rmd}: R Markdown Templates for Journal of Data Science

gpl-badge jds.rmd-dev jds.rmd-ghs

{jds.rmd} provides R Markdown templates intended for Journal of Data Science, which can be useful for authoring a manuscript with code chunks or producing tables/figures on the fly.


{KMsurv}: Data sets from Klein and Moeschberger (1997), Survival Analysis

gpl-badge

{KMsurv} provides data sets and functions for Klein and Moeschberger (1997), “Survival Analysis, Techniques for Censored and Truncated Data”, Springer.


{ramps}: Bayesian Geostatistical Modeling with RAMPS

gpl-badge ramps-dev ramps-ghs

{ramps} provides functions for Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm, designed to lower autocorrelation in MCMC samples.

References

  • Smith, B., Yan, J., Cowles, M. (2008). Unified geostatistical modeling for data fusion and spatial heteroskedasticity with R package ramps. Journal of Statistical Software, 25(10), 1–21. https://doi.org/10.18637/jss.v025.i10

{reda}: Recurrent Event Data Analysis

gpl-badge reda-dev reda-ghs

{reda} provides functions for:

  1. Simulating survival, recurrent event, and multiple event data from stochastic process point of view.
  2. Exploring and modeling recurrent event data through the mean cumulative function (MCF) or also called the Nelson-Aalen estimator of the cumulative hazard rate function, and gamma frailty model with spline rate function.
  3. Comparing two-sample recurrent event responses with the pseudo-score tests.
  4. Fitting Gamma fraitly model with spline baseline rate function.

{rrpack}: Reduced-Rank Regression

gpl-badge

{rrpack} provides implementations for multivariate regression methodologies including reduced-rank regression (RRR), reduced-rank ridge regression (RRS), robust reduced-rank regression (R4), generalized/mixed-response reduced-rank regression (mRRR), row-sparse reduced-rank regression (SRRR), reduced-rank regression with a sparse singular value decomposition (RSSVD), and sparse and orthogonal factor regression (SOFAR).


{sgee}: Stagewise Generalized Estimating Equations

gpl-badge

{sgee} provides stagewise techniques implemented with generalized estimating equations to handle individual, group, bi-level, and interaction selection. Stagewise approaches start with an empty model and slowly build the model over several iterations, which yields a ‘path’ of candidate models from which model selection can be performed. This ‘slow brewing’ approach gives stagewise techniques a unique flexibility that allows simple incorporation of generalized estimating equations. See Vaughan et al. (2017) for details.

References

  • Vaughan, G., Aseltine, R., Chen, K., & Yan, J. (2017). Stagewise generalized estimating equations with grouped variables. Biometrics, 73(4), 1332–1342. https://doi.org/10.1111/biom.12669

{smam}: Statistical Modeling of Animal Movements

gpl-badge smam-dev smam-ghs

{smam} provides functions for animal movement models, including:

  • Moving-resting process with embedded Brownian motion. See Yan et al. (2014) and Pozdnyakov et al. (2019) for details.
  • Brownian motion with measurement error. See Pozdnyakov et al. (2014) for details.
  • Moving-resting-handling process with embedded Brownian motion. See Pozdnyakov et al. (2020) for details.
  • Moving-resting process with measurement error. See Hu et al. (2021) for details.
  • Moving-moving process with two Embedded Brownian motions.

References

  • Pozdnyakov, V., Meyer, T., Wang, Y., & Yan, J. (2014). On modeling animal movements using Brownian motion with measurement error. Ecology, 95(2), 247–253. https://doi.org/10.1890/13-0532.1
  • Yan, J., Chen, Y., Lawrence-Apfel, K., Ortega, I., Pozdnyakov, V., Williams, S., & Meyer, T. (2014). A moving-resting process with an embedded Brownian motion for animal movements. Population Ecology, 56(2), 401–415. https://doi.org/10.1007/s10144-013-0428-8
  • Pozdnyakov, V., Elbroch, L., Labarga, A., Meyer, T., & Yan, J. (2019). Discretely observed Brownian motion governed by telegraph process: Estimation. Methodology and Computing in Applied Probability, 21(3), 907–920. https://doi.org/10.1007/s11009-017-9547-6
  • Pozdnyakov, V., Elbroch, L., Hu, C., Meyer, T., & Yan, J. (2020). On estimation for Brownian motion governed by telegraph process with multiple off states. Methodology and Computing in Applied Probability, 22(3), 1275–1291. https://doi.org/10.1007/s11009-020-09774-1
  • Hu, C., Elbroch, L., Pozdnyakov, V., & Yan, J. (2021). Moving-resting process with measurement error in animal movement modeling. Methods of Ecology and Evolution, 12(11), 2221–2233. https://doi.org/10.1111/2041-210X.13694

{som}: Self-Organizing Map

gpl-badge

{som} provides functions for self-organizing map (with application in gene clustering).


{spef}: Semiparametric Estimating Functions

gpl-badge spef-dev spef-ghs

{spef} provides a collection of functions to fit semiparametric regression models for panel count survival data. Estimating procedures include:

  • Wang-Yan’s augmented estimating equations (AEE, AEEX)
  • Huang-Wang-Zhang’s method (HWZ)
  • Zhang’s maximum pseudolikelihood (MPL)
  • Maximum pseudolikelihood with I-Splines (MPLs)
  • Maximum likelihood with I-Splines (MLs)
  • Sun-Wei’s method (‘EE.SWa’, ‘EE.SWb’, ‘EE.SWc’)
  • Hu-Sun-Wei’s method (‘EE.HSWc’, ‘EE.HSWm’)
  • Accelerated mean model (‘AMM’).

References

  • Wang, X., & Yan, J. (2011). Fitting semiparametric regressions for panel count survival data with an R package spef. Computer Methods and Programs in Biomedicine, 104(2), 278–285. https://doi.org/10.1016/j.cmpb.2010.10.005
  • Chiou, S., Huang, C., Xu, G., & Yan, J. (2019). Semiparametric regression analysis of panel count data: A practical review. International Statistical Review, 87(1), 24–43. https://doi.org/10.1111/insr.12271

{splines2}: Regression Spline Functions and Classes

gpl-badge splines2-dev splines2-ghs JDS

{splines2} is a supplement package to the base package {splines}. It provides functions to construct basis matrices of

  • B-splines
  • M-splines
  • I-splines
  • convex splines (C-splines)
  • periodic splines
  • natural cubic splines
  • generalized Bernstein polynomials
  • their integrals (except C-splines) and derivatives of given order by closed-form recursive formulas

In addition to the R interface, {splines2} provides a C++ header-only library integrated with {Rcpp}, which allows the construction of spline basis functions directly in C++ with the help of {Rcpp} and {RcppArmadillo}. Thus, it can also be treated as one of the Rcpp* packages. See Wang and Yan (2021) for details.

References


{tls}: Tools of Total Least Squares in Error-in-Variables Models

gpl-badge tls-dev tls-ghs

{tls} provides functions for point and interval estimation in error-in-variables models via total least squares or generalized total least squares method.


{touch}: Tools of Utilization and Cost in Healthcare

gpl-badge touch-dev touch-ghs

{touch} provides R implementation of the software tools developed in the H-CUP (Healthcare Cost and Utilization Project) and AHRQ (Agency for Healthcare Research and Quality). It contains functions for:

  1. Mapping ICD-9 codes to the AHRQ comorbidity measures.
  2. Translating ICD-9 (resp. ICD-10) codes to ICD-10 (resp. ICD-9) codes based on GEM (General Equivalence Mappings) from CMS (Centers for Medicare and Medicaid Services).

{tpr}: Temporal Process Regression

gpl-badge

{tpr} provides functions to fit regression models for temporal process responses with time-varying and time-independent coefficients.

References


{wdnet}: Weighted and Directed Networks

gpl-badge wdnet-dev wdnet-gls

{wdnet} provides functions for implementations of network analysis, including:

  1. Assortativity coefficient of weighted and directed networks.
  2. Centrality measures for weighted and directed networks.
  3. Clustering coefficient of weighted and directed networks.
  4. Rewiring networks with given assortativity coefficients.
  5. Preferential attachment network generation.

References

  • Yuan, Y., Yan, J., & Zhang, P. (2021). Assortativity measures for weighted and directed networks. Journal of Complex Networks, 9(2), cnab017. https://doi.org/10.1093/comnet/cnab017
  • Zhang, P., Wang, T., & Yan, J. (2022). PageRank centrality and algorithms for weighted, directed networks. Physica A: Statistical Mechanics and its Applications, 586, 126438. https://doi.org/10.1016/j.physa.2021.126438
  • Wang, T., Yan, J., Yuan, Y., & Zhang, P. (2022). Generating directed networks with predetermined assortativity measures. Statistics and Computing, 32(5), 91. https://doi.org/10.1007/s11222-022-10161-8
  • Yuan, Y., Wang, T., Yan, J., & Zhang, P. (2023). Generating general preferential attachment networks with R package wdnet. Journal of Data Science, 21(3), 538–556. https://doi.org/10.6339/23-JDS1110