| Title: | Leave-Out Variance Component Estimation for Two-Way Fixed Effects Models |
|---|---|
| Description: | Implements leave-out estimation of variance components in two-way fixed effects models as an 'R' translation of the original 'MATLAB' package of Kline, Saggio, and Solvsten (2020) <doi:10.3982/ECTA16410>. The package includes graph-based connected-set pruning, leave-out bias correction, leverage computation by exact and randomized algorithms, fixed effect estimation helpers, and companion model-fit summaries for matched worker-firm panels in the spirit of Abowd, Kramarz, and Margolis (1999) <doi:10.1111/1468-0262.00020>. |
| Authors: | Vahid Moghani [aut, cre] |
| Maintainer: | Vahid Moghani <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-22 08:48:40 UTC |
| Source: | https://github.com/cran/LeaveOutKSS |
Constructs a symmetric sparse adjacency matrix of firm mobility links using worker transitions. Only movers contribute edges.
build_adj(id, firmid)
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
A sparse square adjacency matrix whose nonzero entries count observed worker moves between firms.
connected_set(), pruning_unbal_v3()
build_adj(id = c(1, 1, 2, 2, 3, 3), firmid = c(1, 2, 2, 3, 3, 3))
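The construction described above can be sketched in base R. This is an illustration only, not the package's implementation: `sketch_adj` is a hypothetical helper, it returns a dense matrix rather than a sparse one, and it assumes that observations are already ordered within worker and that each move adds one to both symmetric entries.

```r
# Hypothetical helper (not the package's build_adj): dense firm-by-firm
# adjacency matrix counting worker moves between distinct firms.
sketch_adj <- function(id, firmid) {
  firms <- sort(unique(firmid))
  A <- matrix(0, length(firms), length(firms),
              dimnames = list(as.character(firms), as.character(firms)))
  for (w in unique(id)) {
    f <- firmid[id == w]                 # firm sequence for worker w
    if (length(f) < 2) next
    from <- f[-length(f)]
    to   <- f[-1]
    for (k in which(from != to)) {       # stayers contribute no edges
      i <- match(from[k], firms)
      j <- match(to[k], firms)
      A[i, j] <- A[i, j] + 1
      A[j, i] <- A[j, i] + 1             # keep the matrix symmetric
    }
  }
  A
}

A <- sketch_adj(id = c(1, 1, 2, 2, 3, 3), firmid = c(1, 2, 2, 3, 3, 3))
print(A)
```

In this toy input, worker 3 never changes firm, so only workers 1 and 2 create links.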
Builds a mobility graph from worker moves across firms and keeps only the largest connected component of firms. This is the first graph-based trimming step used by the leave-out routines before leave-one-worker-out pruning.
connected_set(y, id, firmid, lagfirmid, controls, prov_indicator = rep(1, length(y)), progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
lagfirmid |
Lagged firm identifier vector, typically constructed within worker. |
controls |
Matrix of controls aligned with the observations. |
prov_indicator |
Optional provider indicator carried along for interface compatibility. |
progress |
Logical scalar indicating whether stage messages should be emitted. |
The graph is built from observed worker transitions between lagged and current firms. Firms not connected to the largest component are removed. The function relabels worker and firm identifiers internally but preserves the originals in the returned table.
A list with two elements: DT, a data.table containing the
restricted sample and original identifiers, and DT_controls, the
correspondingly restricted controls.
pruning_unbal_v3(), strongc_set(), build_adj()
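As a rough illustration of the largest-connected-component step, the following base R sketch labels components by breadth-first search on a small dense adjacency matrix. The helper name `component_labels` and the toy graph are assumptions made for this example; the package may use a different graph routine.

```r
# Hypothetical helper: label connected components of an undirected graph
# given as a square adjacency matrix, using breadth-first search.
component_labels <- function(A) {
  n <- nrow(A)
  label <- integer(n)                    # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (label[start] != 0L) next
    current <- current + 1L
    queue <- start
    label[start] <- current
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(A[v, ] != 0 & label == 0L)   # unvisited neighbours
      label[nb] <- current
      queue <- c(queue, nb)
    }
  }
  label
}

# Firms 1-2-3 form a chain; firms 4-5 form a separate pair.
A <- matrix(0, 5, 5)
A[cbind(c(1, 2, 4), c(2, 3, 5))] <- 1
A <- A + t(A)

lab <- component_labels(A)
largest <- as.integer(names(which.max(table(lab))))
keep_firm <- lab == largest
print(which(keep_firm))
```

Observations at firms with `keep_firm == FALSE` would then be dropped from the sample.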
Solves a fixed effects model using conjugate gradients and returns fitted
values and adjusted outcomes as an object. When firmid is omitted, the
routine estimates a one-way worker fixed effects model. When firmid is
supplied, it estimates a two-way worker-firm fixed effects model.
fast_fe_est(y, id, firmid = NULL, controls = NULL, csv_file = NULL, progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Optional firm identifier vector. If |
controls |
Optional matrix or vector of controls. |
csv_file |
Optional path for exporting the fitted values table as a
|
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
This helper is useful when the goal is to recover fitted values and
residualized outcomes rather than the leave-out variance decomposition. The
returned fitted-values table includes y_hat, y_adj, and the original
identifiers. When csv_file is supplied, that table is also written to disk.
An object of class "fast_fe_est_result" containing the fitted
values table, model metadata, and elapsed time.
leave_out_KSS(), rsquared_comp()
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
res <- fast_fe_est(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = cbind(year = dt[[3]])
)
print(res)
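To make the conjugate-gradient idea concrete, here is a minimal base R sketch that solves the normal equations of a tiny two-way fixed effects model and compares the fitted values with a direct least-squares fit. The helper `cg_solve`, the toy data, and the choice to drop the last firm dummy for identification are all assumptions of this example, not a description of how fast_fe_est() is implemented.

```r
# Hypothetical helper: conjugate gradients for a symmetric positive
# definite system M b = v.
cg_solve <- function(M, v, tol = 1e-10, max_iter = 1000) {
  b  <- numeric(length(v))
  r  <- v - drop(M %*% b)
  p  <- r
  rs <- sum(r^2)
  for (it in seq_len(max_iter)) {
    Mp     <- drop(M %*% p)
    alpha  <- rs / sum(p * Mp)
    b      <- b + alpha * p
    r      <- r - alpha * Mp
    rs_new <- sum(r^2)
    if (sqrt(rs_new) < tol) break
    p  <- r + (rs_new / rs) * p
    rs <- rs_new
  }
  b
}

set.seed(1)
id     <- c(1, 1, 2, 2, 3, 3, 4, 4)
firmid <- c(1, 2, 2, 3, 3, 1, 1, 1)
y      <- rnorm(length(id))

# Worker dummies plus firm dummies with the last firm dropped, so that
# X'X is nonsingular on this connected toy sample.
Dw <- model.matrix(~ factor(id) - 1)
Fm <- model.matrix(~ factor(firmid) - 1)
X  <- cbind(Dw, Fm[, -ncol(Fm), drop = FALSE])

b     <- cg_solve(crossprod(X), drop(crossprod(X, y)))
y_hat <- drop(X %*% b)

# Compare with a direct least-squares fit.
y_ref <- drop(X %*% qr.solve(X, y))
print(max(abs(y_hat - y_ref)))
```

An iterative solver of this kind only needs matrix-vector products, which is why it scales to sparse designs with many worker and firm dummies.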
Computes a covariance-like quadratic form from transformed coefficient
vectors and subtracts the Kline, Saggio, and Solvsten (KSS) bias adjustment
based on observation-specific variances and Bii weights.
kss_quadratic_form(sigma_i, A_1, A_2, beta, Bii)
sigma_i |
Vector of leave-out variance estimates. |
A_1 |
Matrix used to transform the coefficient vector on the left side of the quadratic form. |
A_2 |
Matrix used to transform the coefficient vector on the right side of the quadratic form. |
beta |
Estimated coefficient vector. |
Bii |
Vector of observation-specific bias terms for the target variance component. |
A named list with theta, the plug-in estimate, and theta_KSS, the
bias-corrected estimate.
A <- diag(2)
kss_quadratic_form(
  sigma_i = c(1, 2),
  A_1 = A,
  A_2 = A,
  beta = c(0.5, 1),
  Bii = c(0.1, 0.2)
)
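The arithmetic behind such a call can be sketched as follows. This is a hypothetical re-implementation for illustration: it assumes the plug-in estimate is the sample covariance of the two transformed vectors and that the correction subtracts `sum(Bii * sigma_i)` divided by the degrees of freedom. The package's exact scaling is not stated in this manual and may differ.

```r
# Hypothetical re-implementation for illustration only.
sketch_quadratic_form <- function(sigma_i, A_1, A_2, beta, Bii) {
  left  <- drop(A_1 %*% beta)
  right <- drop(A_2 %*% beta)
  theta <- cov(left, right)                        # plug-in estimate
  dof   <- length(left) - 1
  theta_KSS <- theta - sum(Bii * sigma_i) / dof    # assumed correction form
  list(theta = theta, theta_KSS = theta_KSS)
}

out <- sketch_quadratic_form(
  sigma_i = c(1, 2),
  A_1 = diag(2),
  A_2 = diag(2),
  beta = c(0.5, 1),
  Bii = c(0.1, 0.2)
)
print(out)
```

Under these assumptions the plug-in value is 0.125 and the corrected value is 0.125 - 0.5 = -0.375, which shows that a bias-corrected variance estimate can be negative in very small samples.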
Estimates plug-in and leave-out bias-corrected variance components for a two-way fixed effects model as part of the R translation of the original 'MATLAB' package of Kline, Saggio, and Solvsten (2020). The function starts from worker identifiers, firm identifiers, and an outcome, constructs the leave-one-worker-out connected set, optionally partials out controls, computes statistical leverages either exactly or via the Johnson-Lindenstrauss approximation (JLA), and returns decomposition summaries together with estimated worker and firm effects.
leave_out_KSS(y, id, firmid, controls = NULL, leave_out_level = "matches", type_algorithm = "JLA", simulations_JLA = 200, lincom_do = 0, Z_lincom = NULL, labels_lincom = NULL, csv_file = NULL, txt_file = NULL, paral = TRUE, Cd = 12345, progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of controls. When supplied, the function prepends an intercept internally and residualizes the outcome with respect to worker, firm, and control regressors before computing variance components. |
leave_out_level |
Character scalar. Use |
type_algorithm |
Character scalar. Use the randomized
Johnson-Lindenstrauss approximation ( |
simulations_JLA |
Integer number of random projections when
|
lincom_do |
Integer flag equal to |
Z_lincom |
Optional matrix of observables used by |
labels_lincom |
Optional labels for the columns of |
csv_file |
Optional path for exporting the estimated effects table as a
|
txt_file |
Optional path for exporting a text summary of the decomposition. |
paral |
Logical scalar indicating whether leverage computation should use
the parallel routine |
Cd |
Integer random seed passed to |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
Relative to the original 'MATLAB' package, this implementation follows the same broad sequence: connected-set construction, leave-out pruning, optional residualization of controls, leverage computation, and bias correction of the variance of firm effects, the covariance of worker and firm effects, and the variance of worker effects.
The decomposition is based on an Abowd, Kramarz, and Margolis (1999;
AKM)-style model with worker effects, firm effects, and optional controls.
By default, the function leaves out matches, which corresponds to allowing
unrestricted heteroskedasticity and arbitrary serial correlation within
worker-firm matches, in line with the discussion in the original vignette.
When leave_out_level = "obs", the correction is based on leaving out one
person-year observation at a time.
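For the observation-level case, the leave-out variance estimate from Kline, Saggio, and Solvsten (2020) is the product of the outcome and its leave-one-out prediction error, which can be computed from the leverage without refitting. The sketch below checks that shortcut against brute-force refits on simulated data; the small regression design is an assumption of the example and stands in for the worker-firm design matrix.

```r
set.seed(2)
n <- 12
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2)) + rnorm(n)

XtX_inv <- solve(crossprod(X))
beta    <- drop(XtX_inv %*% crossprod(X, y))
resid   <- y - drop(X %*% beta)
Pii     <- rowSums((X %*% XtX_inv) * X)   # leverage of each observation

# Leave-one-out shortcut: y_i - x_i' beta_{-i} = resid_i / (1 - P_ii)
sigma_i <- y * resid / (1 - Pii)

# Brute-force check: refit without observation i.
sigma_check <- sapply(seq_len(n), function(i) {
  b_i <- qr.solve(X[-i, , drop = FALSE], y[-i])
  y[i] * (y[i] - sum(X[i, ] * b_i))
})
print(max(abs(sigma_i - sigma_check)))
```

The shortcut requires every leverage to be strictly below one, which is what the leave-out connected set is meant to guarantee.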
When controls are supplied, the function first estimates their coefficients
in the leave-out connected set and then works with the residualized outcome.
When lincom_do = 1, the function additionally reports linear projections of
firm effects on observables using lincom_KSS().
The input vectors must be sorted by worker identifier and, within worker,
from earlier to later time periods before calling the function. When
controls or Z_lincom are supplied, they must follow that same sorted row
order.
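One way to meet this ordering requirement in base R is to compute a single permutation and apply it to every input. The data frame, the `year` column, and the `controls` matrix below are hypothetical names used only for this example.

```r
# Toy panel in arbitrary row order (hypothetical column names).
panel <- data.frame(
  id     = c(2, 1, 2, 1),
  year   = c(2001, 2002, 2000, 2001),
  firmid = c(5, 7, 5, 6),
  y      = c(1, 2, 3, 4)
)
controls <- cbind(age = c(41, 30, 40, 29))

# Sort by worker, then by time within worker, and reuse the same
# permutation for anything that is aligned row by row with the panel.
ord      <- order(panel$id, panel$year)
panel    <- panel[ord, ]
controls <- controls[ord, , drop = FALSE]

print(panel)
```

Reusing `ord` keeps the outcome, identifiers, and controls aligned after sorting.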
The returned object is the primary estimation record. It stores the
decomposition summaries, estimated worker and firm effects, and optional
lincom output. When csv_file or txt_file is supplied, the corresponding
output is also written to disk.
An object of class "leave_out_kss_result" containing biased and
bias-corrected estimates, estimated worker and firm effects, optional
lincom results, sample summaries, and elapsed time.
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.
Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms. Econometrica, 67(2), 251-333.
leave_out_KSS_fe(), rsquared_comp(), lincom_KSS(),
leverages(), leverages_parallel()
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
res <- leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
print(res)
Variant of leave_out_KSS() that allows selected control columns to be
treated as categorical regressors and expanded into dummy variables inside the
routine. This mirrors the use case discussed in the original 'MATLAB'
vignette where time effects or other discrete controls are partialled out
before the leave-out variance decomposition is computed.
leave_out_KSS_fe(y, id, firmid, controls = NULL, absorb_col = NULL, leave_out_level = "matches", type_algorithm = "JLA", simulations_JLA = 200, lincom_do = 0, Z_lincom = NULL, labels_lincom = NULL, csv_file = NULL, txt_file = NULL, paral = TRUE, Cd = 12345, progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of controls. When supplied, the function prepends an intercept internally and residualizes the outcome with respect to worker, firm, and control regressors before computing variance components. |
absorb_col |
Optional integer vector identifying columns of |
leave_out_level |
Character scalar. Use |
type_algorithm |
Character scalar. Use the randomized
Johnson-Lindenstrauss approximation ( |
simulations_JLA |
Integer number of random projections when
|
lincom_do |
Integer flag equal to |
Z_lincom |
Optional matrix of observables used by |
labels_lincom |
Optional labels for the columns of |
csv_file |
Optional path for exporting the estimated effects table as a
|
txt_file |
Optional path for exporting a text summary of the decomposition. |
paral |
Logical scalar indicating whether leverage computation should use
the parallel routine |
Cd |
Integer random seed passed to |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
The function follows the same workflow as leave_out_KSS() but modifies the
control-adjustment step. When absorb_col is supplied, the corresponding
columns are treated as categorical effects and expanded into dummy variables
inside the leave-out connected set before residualization. This is convenient
for year effects or other high-level discrete controls that are easier to
supply in coded form than as a pre-built model matrix.
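The dummy expansion described here can be mimicked with `model.matrix()`. The sketch assumes one coded year column and one continuous column, and it drops the first year as the reference category; which category the package omits internally is not stated in this manual.

```r
# Controls supplied in coded form: column 1 is a year code, column 2 is
# a continuous regressor (hypothetical toy values).
controls <- cbind(
  year   = c(2000, 2001, 2002, 2000, 2001),
  tenure = c(1, 2, 3, 1, 2)
)
absorb_col <- 1

# Expand the coded column into indicator variables. The intercept column
# is removed, so the first level acts as the reference category.
year_f  <- factor(controls[, absorb_col])
dummies <- model.matrix(~ year_f)[, -1, drop = FALSE]

# Recombine with the columns that are not absorbed.
expanded <- cbind(dummies, controls[, -absorb_col, drop = FALSE])
print(expanded)
```

The resulting matrix has one column per non-reference year plus the untouched continuous control.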
As with leave_out_KSS(), the input vectors must be sorted by worker
identifier and, within worker, from earlier to later time periods before
calling the function. Any supplied control columns must follow that same row
order.
The rest of the decomposition logic is unchanged: the function constructs a
leave-one-worker-out connected set, computes leverages, and returns plug-in
and bias-corrected variance components together with estimated worker and
firm effects. When csv_file or txt_file is supplied, the corresponding
output is also written to disk.
An object of class "leave_out_kss_result" containing biased and
bias-corrected estimates, estimated worker and firm effects, optional
lincom results, sample summaries, and elapsed time.
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.
leave_out_KSS(), rsquared_comp()
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
res <- leave_out_KSS_fe(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = cbind(year = dt[[3]]),
  absorb_col = 1,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
print(res)
Computes the observation-level leverage quantities used in the Kline, Saggio, and Solvsten (KSS) bias correction, either exactly or with a Johnson-Lindenstrauss approximation (JLA).
leverages(X_fe, X_pe, X, xx, type_algorithm, scale, progress = FALSE)
X_fe |
Matrix used for the firm-effect variance component. |
X_pe |
Matrix used for the person-effect variance component. |
X |
Main design matrix. |
xx |
Crossproduct matrix |
type_algorithm |
Character scalar, either |
scale |
Number of random projections when |
progress |
Logical scalar indicating whether leverage progress should be displayed. |
The exact branch solves one linear system per observation. The Johnson-Lindenstrauss approximation (JLA) branch follows the randomized projection logic described in the original vignette to approximate the same quantities at lower computational cost on large panels.
A list with elements Pii, Mii, correction_JLA, Bii_fe,
Bii_cov, and Bii_pe.
leverages_parallel(), leave_out_KSS()
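The difference between the exact and randomized branches can be illustrated on a small dense design. In this sketch the exact leverages come from the projection matrix diagonal, and the approximation averages squared projections of random sign vectors. The Rademacher draws, the number of draws `p`, and the dense algebra are assumptions of the example rather than a description of the package internals.

```r
set.seed(3)
n <- 30
k <- 3
X  <- cbind(1, matrix(rnorm(n * (k - 1)), n))
xx <- crossprod(X)

# Exact leverages: P_ii = x_i' (X'X)^{-1} x_i
Pii_exact <- rowSums((X %*% solve(xx)) * X)

# Randomized approximation: project p random sign vectors onto the
# column space of X and average the squared entries row by row.
p  <- 2000
R  <- matrix(sample(c(-1, 1), p * n, replace = TRUE), n, p)
PR <- X %*% solve(xx, crossprod(X, R))
Pii_jla <- rowMeans(PR^2)

print(round(cbind(exact = Pii_exact, approx = Pii_jla)[1:5, ], 3))
```

The approximation needs `p` linear solves instead of one per observation, which is the source of the savings when the number of observations is much larger than `p`.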
Parallel version of leverages() using foreach and doParallel.
leverages_parallel(X_fe, X_pe, X, xx, type_algorithm, scale, progress = FALSE)
X_fe |
Matrix used for the firm-effect variance component. |
X_pe |
Matrix used for the person-effect variance component. |
X |
Main design matrix. |
xx |
Crossproduct matrix |
type_algorithm |
Character scalar, either |
scale |
Number of random projections when |
progress |
Logical scalar indicating whether leverage progress should be displayed. |
The exact and Johnson-Lindenstrauss approximation (JLA) branches mirror
leverages(), but the repeated linear solves are distributed across worker
processes. This routine is intended for larger problems where the leverage
stage dominates runtime.
A list with the same elements returned by leverages().
Regresses transformed fixed effects on observables and reports both naive and Kline, Saggio, and Solvsten (KSS)-corrected standard errors. This corresponds to the "lincom" discussion in the original vignette on regressing firm effects on observables.
lincom_KSS(y, X, Z, Transform, sigma_i, labels = NULL)
y |
Outcome vector used to estimate the original model. |
X |
Design matrix used to estimate the fixed effects model. |
Z |
Matrix of observables used in the linear projection. |
Transform |
Matrix that maps model coefficients into the fixed effect of interest, typically firm effects. |
sigma_i |
Observation-specific leave-out variance estimates. |
labels |
Optional labels for the columns of |
An object of class "lincom_kss_result" containing a results table
with coefficient estimates, naive standard errors, KSS-corrected standard
errors, and t statistics.
leave_out_KSS(), kss_quadratic_form()
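The projection of fixed effects on observables is linear in the outcome, so its sampling variance can be built from observation-specific variance estimates. The sketch below uses a toy model with firm dummies only and a hypothetical observable `size`; it assumes the variance estimate takes the form of a weighted sum of leave-out variances, which is a simplified reading of the approach and not the package's code.

```r
set.seed(4)
n    <- 40
firm <- rep(1:4, each = 10)
X    <- model.matrix(~ factor(firm) - 1)          # firm dummies only (toy model)
y    <- c(0.2, 0.5, 0.9, 1.4)[firm] + rnorm(n)
Transform <- X                                     # maps coefficients to firm effects per row
Z    <- cbind(1, size = c(10, 20, 30, 40)[firm])   # hypothetical observable

XtX_inv <- solve(crossprod(X))
beta    <- drop(XtX_inv %*% crossprod(X, y))
Pii     <- rowSums((X %*% XtX_inv) * X)
sigma_i <- y * (y - drop(X %*% beta)) / (1 - Pii)  # leave-out variance estimates

# The projection coefficients equal W' y for this weight matrix W.
W <- X %*% XtX_inv %*% t(Transform) %*% Z %*% solve(crossprod(Z))
proj_coef <- drop(t(W) %*% y)

# Assumed variance estimate: sum over observations of W^2 times sigma_i.
proj_var <- colSums(W^2 * sigma_i)

print(cbind(estimate = proj_coef, variance = proj_var))
```

Standard errors would be the square roots of `proj_var` whenever those values are positive.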
Print a Fixed Effects Fit Result
## S3 method for class 'fast_fe_est_result'
print(x, ...)
x |
A result returned by |
... |
Unused. |
x, invisibly.
Print a LeaveOutKSS Decomposition Result
## S3 method for class 'leave_out_kss_result'
print(x, ...)
x |
A result returned by |
... |
Unused. |
x, invisibly.
Print a Lincom Result
## S3 method for class 'lincom_kss_result'
print(x, ...)
x |
A result returned by |
... |
Unused. |
x, invisibly.
Print an R-Squared Comparison Result
## S3 method for class 'rsquared_comp_result'
print(x, ...)
x |
A result returned by |
... |
Unused. |
x, invisibly.
Iteratively removes articulation workers from the worker-firm mobility graph until the remaining sample stays connected after dropping any single worker. This implements the leave-one-worker-out connectivity requirement used by the main Kline, Saggio, and Solvsten (KSS) routines.
pruning_unbal_v3(y, firmid, id, id_old, firmid_old, controls, prov_indicator = rep(1, length(y)), progress = FALSE)
y |
Numeric outcome vector. |
firmid |
Firm identifier vector. |
id |
Worker identifier vector. |
id_old |
Original worker identifiers. |
firmid_old |
Original firm identifiers. |
controls |
Matrix of controls aligned with the observations. |
prov_indicator |
Optional provider indicator carried along with the sample. |
progress |
Logical scalar indicating whether iterative pruning progress should be emitted. |
The routine constructs a bipartite worker-firm graph for movers, identifies articulation workers, removes them, and recomputes the largest connected component until no articulation worker remains.
A list containing the pruned outcome, identifiers, controls, and provider indicator.
connected_set(), build_adj(), leave_out_KSS()
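The notion of an articulation worker can be illustrated by brute force on a tiny panel: drop each worker in turn and check whether every firm is still reachable. The helpers `firms_connected` and `is_articulation` are hypothetical and far slower than a proper articulation-point algorithm; they are meant only to show what one pruning round looks for.

```r
# Hypothetical helper: TRUE if all firms in `firmid` are linked through
# the workers in `id` (bipartite worker-firm graph).
firms_connected <- function(id, firmid) {
  firms   <- unique(firmid)
  reached <- firms[1]
  repeat {
    workers <- unique(id[firmid %in% reached])   # workers seen at reached firms
    grown   <- unique(firmid[id %in% workers])   # every firm of those workers
    if (length(grown) == length(reached)) break
    reached <- grown
  }
  length(reached) == length(firms)
}

# Hypothetical helper: does removing worker w break connectivity?
is_articulation <- function(w, id, firmid) {
  keep <- id != w
  # A firm left with no workers counts as disconnected.
  if (!all(unique(firmid) %in% firmid[keep])) return(TRUE)
  !firms_connected(id[keep], firmid[keep])
}

id     <- c(1, 1, 2, 2, 3, 3)
firmid <- c("A", "B", "B", "C", "A", "B")

art <- Filter(function(w) is_articulation(w, id, firmid), unique(id))
print(art)
```

Here firm C is attached to the rest of the graph only through worker 2, so a pruning round would remove that worker and then keep the largest remaining component (firms A and B).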
Computes goodness-of-fit summaries for a two-way fixed effects model and for a saturated worker-firm interaction model on the same sample. The function is intended as a diagnostic companion to the leave-out decomposition routines and follows the same basic data-preparation conventions.
rsquared_comp(y, id, firmid, controls = NULL, txt_file = NULL, progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Optional matrix or vector of additional controls. |
txt_file |
Optional path for exporting a text summary of the comparison. |
progress |
Logical scalar indicating whether stage progress messages should be emitted. |
The two-way fixed effects model includes worker effects, firm effects, and optional controls. The saturated model replaces separate worker and firm effects with worker-firm interaction indicators. Comparing the two summaries can be useful when evaluating how much additional fit is obtained by moving from the standard Abowd, Kramarz, and Margolis (1999; AKM) specification to a fully saturated match design.
An object of class "rsquared_comp_result" containing a summary
table for the two fitted models and the elapsed time.
leave_out_KSS(), leave_out_KSS_fe(), fast_fe_est()
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
res <- rsquared_comp(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  progress = FALSE
)
print(res)
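The comparison between the two specifications can be reproduced on a toy panel with `lm()`. This sketch uses simulated data and dense factor regressions, which is only feasible for small samples, and it is not how rsquared_comp() computes its summaries.

```r
set.seed(5)
id     <- rep(1:6, each = 4)
firmid <- c(1, 1, 2, 2,  2, 2, 3, 3,  3, 3, 1, 1,
            1, 1, 1, 1,  2, 2, 3, 3,  1, 1, 3, 3)
y      <- rnorm(24) + 0.5 * id + 0.3 * firmid

# Additive worker and firm effects (AKM-style specification).
akm <- lm(y ~ factor(id) + factor(firmid))

# Saturated specification: one indicator per observed worker-firm match.
sat <- lm(y ~ interaction(id, firmid, drop = TRUE))

fit <- rbind(
  akm       = c(r2 = summary(akm)$r.squared, adj_r2 = summary(akm)$adj.r.squared),
  saturated = c(r2 = summary(sat)$r.squared, adj_r2 = summary(sat)$adj.r.squared)
)
print(round(fit, 3))
```

Because the match indicators span the worker and firm dummies, the saturated model can never fit worse in terms of raw R-squared; the adjusted R-squared is the more informative comparison.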
Computes the stayer-specific adjustment used when the main decomposition is performed at the match level. In that case, the current implementation uses a leave-one-observation-out style adjustment for stayers, following the approximation discussed in the original vignette.
sigma_for_stayers(y, id, firmid, peso, b)
y |
Outcome vector in person-year space. |
id |
Worker identifier vector in collapsed match space. |
firmid |
Firm identifier vector in collapsed match space. |
peso |
Match weights used to expand back to person-year space. |
b |
Estimated coefficient vector from the worker-firm fixed effects regression. |
A vector of averaged stayer variance adjustments at the match level.
leave_out_KSS(), leave_out_KSS_fe()
Graph-based trimming helper that keeps firms whose degree in the mobility
graph is at least min_degree. This is a stronger restriction than the basic
connected-set filter and can be useful when the analyst wants a denser firm
network.
strongc_set(y, id, firmid, controls, min_degree = 1, progress = FALSE)
y |
Numeric outcome vector. |
id |
Worker identifier vector. |
firmid |
Firm identifier vector. |
controls |
Matrix of controls aligned with the observations. |
min_degree |
Minimum graph degree required for a firm to remain in the sample. |
progress |
Logical scalar indicating whether graph summary messages should be emitted. |
A list with DT and DT_controls, analogous to connected_set().
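A degree filter of this kind can be sketched on a small adjacency matrix. The example assumes that a firm's degree is the number of distinct firms it is linked to; the package may instead count moves, and the firm labels are hypothetical.

```r
# Toy mobility graph among four firms (entries count worker moves).
firm_names <- paste0("f", 1:4)
A <- matrix(0, 4, 4, dimnames = list(firm_names, firm_names))
A["f1", "f2"] <- 2
A["f2", "f3"] <- 1
A["f1", "f3"] <- 1
A["f3", "f4"] <- 1
A <- A + t(A)

# Assumed definition: degree = number of distinct neighbouring firms.
degree     <- rowSums(A != 0)
min_degree <- 2
keep       <- names(degree)[degree >= min_degree]
print(keep)
```

Firm f4 is linked to a single neighbour and is dropped when `min_degree` is 2, while the default of 1 would keep every firm that has at least one link.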
Summarize a LeaveOutKSS Decomposition Result
## S3 method for class 'leave_out_kss_result'
summary(object, ...)
object |
A result returned by |
... |
Unused. |
object, invisibly.