Title: | Joint and Individual Regression |
---|---|
Description: | An R package that implements the JICO algorithm [Wang, P., Wang, H., Li, Q., Shen, D., & Liu, Y. (2022). <arXiv:2209.12388>]. It aims to solve the multi-group regression problem. The algorithm decomposes the responses from multiple groups into shared and group-specific components, which are driven by low-rank approximations of the joint and individual structures of the covariates, respectively. |
Authors: | Peiyao Wang [aut, cre] |
Maintainer: | Peiyao Wang <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.0.9000 |
Built: | 2024-11-01 05:55:24 UTC |
Source: | https://github.com/peiyaow/jico |
This function converts the outputs of the continuum regression (CR) algorithm into regression coefficients.
C2beta(X, Y, C, lambda)

Argument | Description |
---|---|
X | The input feature matrix. |
Y | The input response vector. |
C | The weight matrix computed by the CR algorithm. |
lambda | Deprecated. The regularization parameter used when CR is fitted with an L2 penalty. JICO uses zero by default. |

A list of regression coefficients that can be used for prediction.
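As a rough illustration of what such a conversion can look like, the sketch below assumes the usual construction for latent-variable regression: form the scores Z = X C, regress Y on Z (with an optional ridge term lambda), and map the score coefficients back through C. The name `sketch_C2beta` is hypothetical, it returns a p x 1 matrix rather than the package's list, and the package's own computation may differ.

```r
# Hypothetical helper (not the package's C2beta): map CR weight vectors,
# stored as the columns of C, to coefficients on the original features.
sketch_C2beta <- function(X, Y, C, lambda = 0) {
  Z <- X %*% C                       # latent scores, n x om
  # Regress Y on the scores; lambda > 0 adds an L2 (ridge) penalty
  alpha <- solve(crossprod(Z) + lambda * diag(ncol(Z)), crossprod(Z, Y))
  C %*% alpha                        # p x 1 coefficient matrix
}

set.seed(1)
X <- matrix(rnorm(40 * 5), 40, 5)
Y <- X %*% c(1, -2, 0, 0, 0.5) + rnorm(40, sd = 0.1)
beta_full <- sketch_C2beta(X, Y, diag(5))                     # full-rank weights: plain OLS
beta_one  <- sketch_C2beta(X, Y, diag(5)[, 1, drop = FALSE])  # a single weight vector
```

With a full-rank C the result reduces to ordinary least squares, while a single weight vector restricts the coefficients to one direction in feature space.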
This function performs one iteration update of the JICO algorithm using the CR algorithm. Details can be found in Appendix B of the JICO paper [Wang, P., Wang, H., Li, Q., Shen, D., & Liu, Y. (2022). <arXiv:2209.12388>].
continuum( X, Y, lambda, gam, om, U_old = matrix(, nrow = nrow(X), ncol = 0), D_old = matrix(, nrow = 0, ncol = 0), V_old = matrix(, nrow = 0, ncol = 0), Z_old = matrix(, nrow = 0, ncol = 0), verbose = FALSE )

Argument | Description |
---|---|
X | The input feature matrix. |
Y | The input response vector. |
lambda | Deprecated. The regularization parameter used when CR is fitted with an L2 penalty. JICO uses zero by default. |
gam | The gamma parameter of the CR algorithm. Set gam = 0 for an OLS model, gam = 0.5 for a PLS model, and gam >= 1e10 for a PCR model. |
om | The desired number of weight vectors to obtain from the CR algorithm, i.e., the predefined rank of the joint or individual component. |
U_old | The value of U from the previous JICO iteration. |
D_old | The value of D from the previous JICO iteration. |
V_old | The value of V from the previous JICO iteration. |
Z_old | The value of Z from the previous JICO iteration. |
verbose | Boolean. Whether to print intermediate outputs. |

A list of CR outputs that serve as the input for the next JICO iteration.
This function iteratively solves the multi-group regression problem using the JICO algorithm [Wang, P., Wang, H., Li, Q., Shen, D., & Liu, Y. (2022). <arXiv:2209.12388>].
continuum.multigroup.iter( X.list, Y.list, lambda = 0, gam, rankJ, rankA, maxiter = 1000, conv = 1e-07, center.X = TRUE, scale.X = TRUE, center.Y = TRUE, scale.Y = TRUE, orthIndiv = FALSE, I.initial = NULL, sd = 0 )

Argument | Description |
---|---|
X.list | The list of feature matrices from multiple groups. |
Y.list | The list of response vectors from multiple groups. |
lambda | Deprecated. The regularization parameter used when CR is fitted with an L2 penalty. JICO uses zero by default. |
gam | The gamma parameter of the CR algorithm. Set gam = 0 for an OLS model, gam = 0.5 for a PLS model, and gam >= 1e10 for a PCR model. |
rankJ | The rank of the joint component. |
rankA | The ranks of the individual components, one per group (e.g., c(1, 1) for two groups). |
maxiter | The maximum number of iterations to run if the algorithm has not yet converged. |
conv | The tolerance level for convergence. |
center.X | Boolean. Whether X should be centered during preprocessing. |
scale.X | Boolean. Whether X should be scaled during preprocessing. |
center.Y | Boolean. Whether Y should be centered during preprocessing. |
scale.Y | Boolean. Whether Y should be scaled during preprocessing. |
orthIndiv | Boolean. Whether to impose the orthogonality constraint on the individual components. |
I.initial | The initial values for the individual components. |
sd | The standard deviation used to generate random initial values for the individual weight vectors. |

The estimated parameters from JICO.
set.seed(76)
X1 = MASS::mvrnorm(50, rep(0, 200), diag(200)) # covariates of the first group
X2 = MASS::mvrnorm(50, rep(0, 200), diag(200)) # covariates of the second group
X.list = list(X1, X2)
Y1 = matrix(stats::rnorm(50)) # responses for the first group
Y2 = matrix(stats::rnorm(50)) # responses for the second group
Y.list = list(Y1, Y2)
ml.JICO = continuum.multigroup.iter(
  X.list, Y.list, gam = 1e10, rankJ = 1, rankA = c(1, 1), maxiter = 300
)
This function generates data folds for cross-validation from stratified samples.
createFolds(strat_id, k)

Argument | Description |
---|---|
strat_id | A vector of stratum IDs, one per sample. For example, with 5 samples where the first three come from group 1 and the last two from group 2, use c(1, 1, 1, 2, 2). |
k | The number of folds to create. |

A list of sample indices for the k folds.
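A minimal sketch of stratified fold assignment, assuming folds are formed by shuffling the samples within each stratum and dealing them to folds in turn, so that every fold keeps roughly the group proportions of the full data. `sketch_createFolds` is a hypothetical name, not the package's implementation; it returns one vector of sample indices per fold.

```r
# Hypothetical stratified k-fold split (not the package's createFolds):
# samples are shuffled within each stratum and dealt to folds in turn.
sketch_createFolds <- function(strat_id, k) {
  fold <- integer(length(strat_id))
  for (s in unique(strat_id)) {
    idx <- which(strat_id == s)
    idx <- idx[sample.int(length(idx))]            # shuffle within the stratum
    fold[idx] <- rep_len(seq_len(k), length(idx))  # assign folds 1, 2, ..., k, 1, ...
  }
  split(seq_along(strat_id), factor(fold, levels = seq_len(k)))
}

set.seed(76)
strat_id <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
folds <- sketch_createFolds(strat_id, k = 2)
folds  # each fold holds three group-1 samples and two group-2 samples
```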
This function performs K-fold cross validations to select the best tuning parameters for JICO.
cv.continnum.iter( X.list, Y.list, lambda = 0, parameter.set, nfolds = 10, maxiter = 100, center.X = TRUE, scale.X = TRUE, center.Y = TRUE, scale.Y = TRUE, orthIndiv = FALSE, plot = F, criteria = c("min", "1se"), sd = 0 )

Argument | Description |
---|---|
X.list | The list of feature matrices from multiple groups. |
Y.list | The list of response vectors from multiple groups. |
lambda | Deprecated. The regularization parameter used when CR is fitted with an L2 penalty. JICO uses zero by default. |
parameter.set | The set of candidate parameters to tune over, containing choices of rankJ, rankA, and gamma. |
nfolds | The number of folds for cross-validation. |
maxiter | The maximum number of iterations to run if the algorithm has not yet converged. |
center.X | Boolean. Whether X should be centered during preprocessing. |
scale.X | Boolean. Whether X should be scaled during preprocessing. |
center.Y | Boolean. Whether Y should be centered during preprocessing. |
scale.Y | Boolean. Whether Y should be scaled during preprocessing. |
orthIndiv | Boolean. Whether to impose the orthogonality constraint on the individual components. |
plot | Boolean. Whether to plot the rMSE against the candidate parameters. |
criteria | The criterion for selecting the best parameter. Use "min" to choose the parameter with the best performance, or "1se" to choose the simplest model whose performance is within one standard error of the best one. |
sd | The standard deviation used to generate random initial values for the individual weight vectors. |

The parameter from parameter.set that fits the training data best, as judged by cross-validation.
set.seed(76)
X1 = MASS::mvrnorm(50, rep(0, 200), diag(200)) # covariates of the first group
X2 = MASS::mvrnorm(50, rep(0, 200), diag(200)) # covariates of the second group
X.list = list(X1, X2)
Y1 = matrix(stats::rnorm(50)) # responses for the first group
Y2 = matrix(stats::rnorm(50)) # responses for the second group
Y.list = list(Y1, Y2)
# enumerate the set of tuning parameters
cv.parameter.set = parameter.set.G_2(maxrankA = 1, maxrankJ = 1, gamma = 1e10)
# fit the model and use CV to find the best parameters
cv.ml.JICO = cv.continnum.iter(
  X.list, Y.list, parameter.set = cv.parameter.set,
  criteria = "min", nfolds = 5, maxiter = 300
)
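The two selection rules can be sketched as follows, assuming each candidate has a mean CV error, a standard error of that mean, and a complexity score (for instance the total rank rankJ + sum(rankA)) where smaller means simpler. `sketch_select_param` and its arguments are hypothetical and the error values are toy inputs; this is not the internal code of `cv.continnum.iter()`.

```r
# Hypothetical selection helper illustrating the "min" and "1se" rules;
# not the internal code of cv.continnum.iter().
sketch_select_param <- function(cv_mean, cv_se, complexity,
                                criteria = c("min", "1se")) {
  criteria <- match.arg(criteria)
  best <- which.min(cv_mean)                           # lowest mean CV error
  if (criteria == "min") return(best)
  ok <- which(cv_mean <= cv_mean[best] + cv_se[best])  # within one SE of the best
  ok[which.min(complexity[ok])]                        # simplest such candidate
}

# Toy values for four candidates ordered from simplest to most complex
cv_mean    <- c(1.30, 1.05, 1.00, 1.02)
cv_se      <- c(0.05, 0.04, 0.06, 0.05)
complexity <- c(1, 2, 3, 4)
sketch_select_param(cv_mean, cv_se, complexity, "min")  # 3
sketch_select_param(cv_mean, cv_se, complexity, "1se")  # 2
```

Here "min" picks candidate 3, while "1se" trades a slightly higher error for the simpler candidate 2, whose error lies within one standard error of the best.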
This function returns a diagonal matrix with the input vector or number on its diagonal.
DIAG(e)

Argument | Description |
---|---|
e | The diagonal element(s). Can be a vector or a single number. |

A square diagonal matrix with the input as its diagonal elements.
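One plausible reason for such a wrapper (an assumption, not stated by the package) is that base R's `diag()` treats a single number n as a size and returns the n x n identity matrix. A minimal sketch of the documented behaviour, using the hypothetical name `sketch_DIAG`:

```r
# Hypothetical helper mirroring the documented behaviour of DIAG(); not the package's code.
sketch_DIAG <- function(e) {
  # Pass nrow/ncol explicitly so a single number is placed on the diagonal
  # instead of being read as the size of an identity matrix.
  diag(e, nrow = length(e), ncol = length(e))
}

dim(diag(3))          # 3 3: base diag() builds a 3 x 3 identity
sketch_DIAG(3)        # 1 x 1 matrix holding 3
sketch_DIAG(c(1, 2))  # 2 x 2 matrix with 1 and 2 on the diagonal
```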
This function computes the SVD of a given matrix X, which is used to initialize the continuum regression.
initialize.UDVZ(X)

Argument | Description |
---|---|
X | The input feature matrix. |

A list of SVD results that serve as inputs to the CR algorithm.
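For reference, base R's `svd()` returns the factors of X = U D V^T. The sketch below assumes that U, D and V in `initialize.UDVZ()` denote these usual SVD factors, with D stored as a diagonal matrix; how Z is built is not documented here, so it is left out.

```r
# Base-R SVD of a feature matrix, assuming U, D, V follow the usual
# convention X = U D V^T; the Z output of initialize.UDVZ() is not reproduced.
set.seed(2)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)
s <- svd(X)                           # s$u: 20 x 6, s$d: length 6, s$v: 6 x 6
U <- s$u
D <- diag(s$d, nrow = length(s$d))    # singular values as a diagonal matrix
V <- s$v
X_hat <- U %*% D %*% t(V)             # equals X up to rounding error
max(abs(X_hat - X))
```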
This function generates a set of hyperparameters when there are two groups.
parameter.set.G_2(maxrankA, maxrankJ, gamma)

Argument | Description |
---|---|
maxrankA | The maximum rank of the individual components. |
maxrankJ | The maximum rank of the joint component. |
gamma | The gamma parameter. It must be fixed to a single value. |

A list of hyperparameter candidates.
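A sketch of how such a candidate grid could be enumerated, assuming each candidate is a list holding a joint rank, a vector of individual ranks (one per group) and a fixed gamma, and that ranks run from 0 up to the stated maximum. The function `sketch_parameter_set` and its element names are hypothetical; the real `parameter.set.G_2()` may use a different structure or ordering. Setting G = 3 gives the analogous grid for three groups.

```r
# Hypothetical candidate-grid builder; not the package's parameter.set.G_2().
sketch_parameter_set <- function(G, maxrankA, maxrankJ, gamma) {
  # One rank column per group, named rankA1, ..., rankAG (ranks assumed to start at 0)
  rankA_cols <- rep(list(0:maxrankA), G)
  names(rankA_cols) <- paste0("rankA", seq_len(G))
  grid <- expand.grid(c(list(rankJ = 0:maxrankJ), rankA_cols))
  # Turn each grid row into one candidate; gamma is held fixed
  lapply(seq_len(nrow(grid)), function(i) {
    list(rankJ = grid$rankJ[i],
         rankA = unlist(grid[i, -1], use.names = FALSE),
         gam = gamma)
  })
}

cands <- sketch_parameter_set(G = 2, maxrankA = 1, maxrankJ = 1, gamma = 1e10)
length(cands)  # 8 = 2 joint ranks x 2 x 2 individual ranks
cands[[8]]     # rankJ = 1, rankA = c(1, 1), gam = 1e10
```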
This function generates a set of hyperparameters when there are three groups.
parameter.set.G_3(maxrankA, maxrankJ, gamma)

Argument | Description |
---|---|
maxrankA | The maximum rank of the individual components. |
maxrankJ | The maximum rank of the joint component. |
gamma | The gamma parameter. It must be fixed to a single value. |

A list of hyperparameter candidates.
This function generates a set of hyperparameters when all groups share the same individual rank.
parameter.set.rankA_eq(G, maxrankA, maxrankJ, gamma.list)

Argument | Description |
---|---|
G | The number of groups. |
maxrankA | The maximum rank of the individual components. |
maxrankJ | The maximum rank of the joint component. |
gamma.list | The list of candidate gamma values to tune over. |

A list of hyperparameter candidates.
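A minimal sketch of an equal-rank grid, assuming each candidate is a list with a joint rank, one individual rank repeated for all G groups, and one gamma from gamma.list, and that ranks start at 0. `sketch_parameter_set_rankA_eq` is a hypothetical name; the package's element names and ordering may differ. Tying the individual ranks together keeps the grid size at (maxrankA + 1) x (maxrankJ + 1) x length(gamma.list), regardless of G.

```r
# Hypothetical equal-rank grid builder; not the package's parameter.set.rankA_eq().
sketch_parameter_set_rankA_eq <- function(G, maxrankA, maxrankJ, gamma.list) {
  grid <- expand.grid(rankA = 0:maxrankA, rankJ = 0:maxrankJ,
                      gam = unlist(gamma.list))
  lapply(seq_len(nrow(grid)), function(i) {
    list(rankJ = grid$rankJ[i],
         rankA = rep(grid$rankA[i], G),  # the same individual rank for every group
         gam = grid$gam[i])
  })
}

cands <- sketch_parameter_set_rankA_eq(G = 3, maxrankA = 2, maxrankJ = 1,
                                       gamma.list = list(0, 0.5, 1e10))
length(cands)  # 18 = 3 individual ranks x 2 joint ranks x 3 gammas
```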
This function computes the general inverse of X when it exists. If X has a degenerate dimension, it returns X unchanged.
SOLVE(x)

Argument | Description |
---|---|
x | The input matrix X. |

Either the general inverse of X or X itself.
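A minimal sketch of this guard, assuming "degenerate dimension" means a matrix with zero rows or zero columns, such as the empty `matrix(, nrow = 0, ncol = 0)` defaults in `continuum()`. `sketch_SOLVE` is hypothetical and uses base `solve()` as the inverse; the package may compute a generalized inverse instead.

```r
# Hypothetical guard around matrix inversion; not the package's SOLVE().
sketch_SOLVE <- function(x) {
  # Assumed meaning of "degenerate": zero rows or zero columns. Such a matrix
  # is returned as-is instead of being passed to solve().
  if (any(dim(x) == 0)) return(x)
  solve(x)  # ordinary inverse; the package may use a generalized inverse
}

empty <- matrix(, nrow = 0, ncol = 0)
identical(sketch_SOLVE(empty), empty)  # TRUE
A <- matrix(c(2, 1, 1, 3), nrow = 2)
A %*% sketch_SOLVE(A)                  # 2 x 2 identity, up to rounding
```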