• 登录
Skip to content

一起大数据-技术文章心得

一起大数据网由数据爱好者发起并维护,专注数据分析、挖掘、大数据相关领域的技术分享、交流。不定期组织爱好者聚会,期待通过跨行业的交流和碰撞,更好的推进各领域数据的价值落地。

Menu
  • 首页
  • 大数据案例
  • 数据&电子书
  • 视频
    • Excel视频
    • VBA视频
    • Mysql视频
    • 统计学视频
    • SPSS视频
    • R视频
    • SAS视频
    • Python视频
    • 数据挖掘视频
    • 龙星计划-数据挖掘
    • 大数据视频
    • Machine Learning with Python
  • 理论
    • 统计学
    • 数据分析
    • 机器学习
    • 大数据
  • 软件
    • Excel
    • Modeler
    • Python
    • R
    • SAS
    • SPSS
    • SQL
    • PostgreSQL
    • KNIME
  • 技术教程
    • SQL教程
    • SPSS简明教程
    • SAS教程
    • The Little SAS Book
    • SAS EG教程
    • R语言教程
    • Python3教程
    • IT 技术速查手册
    • Data Mining With Python and R
    • SAS Enterprise Miner
  • 问答社区
  • 我要提问
Menu

Structural Equation Models Using R

Posted on 2021年6月3日
Simply speaking, a structural equation model (SEM) is a combination of confirmatory factor analysis and path analysis. Structural equation modeling includes two sets of models – the measurement model and the structural model. The measurement model can be expressed as a factor model. Figure 1 is a model to measure cognitive ability using three variables – verbal ability, math ability, and speed ability (note each of them can be viewed as factors measured by lower level observed variables).

1. A factor model to measure cognitive ability

Figure 2 gives another example of measurement model – a model to measure health.

2. A factor model to measure health

If one believes that health influences cognitive ability, then one can fit a path model using the factors – cognitive ability and health. Therefore, a structural model is actually a path model. Putting them together, we have a model in Figure 3. This model is called SEM model.

3. A factor model to measure health

Example 1. Autoregressive model

In ACTIVE study, we have three variables – word series (ws), letter series (ls), and letter sets (ls) to measure reasoning ability. Also, we have data on all these three variables before and after training. Assume we want to test whether reasoning ability before training can predict reasoning ability after training. Then the SEM model in Figure 4 can be used. Not that we allow the factor in time 1 to predict the factor at time 2. In addition, we allow the uniqueness factors for each observed variable to be correlated. The R code for the analysis is given below.

4. An autoregressive factor model

First look at model fit. The chi-square value is 27 with 5 degrees of freedom. The p-value for chi-square test is almost 0. Thus, based on chi-square test, this is not a good model. However, CFI and TFI are both close to 1. The RMSEA is about 0.063 and SRMR is about 0.011. Considering the sample size here is large – N=1114, overall, we may accept this model is a fairly good model. Then we can answer our question. Because the regression coefficient from reasoning1 to reasoning2 is significant, reasoning ability before training seems to predict reasoning ability after training. In other words, those with higher reasoning ability before training tend to have higher reasoning ability after training.

> library(lavaan)
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.
> usedata('active.full.csv')
> automodel <- '
+ reasoning1 =~ ws1 + ls1 + lt1
+ reasoning2 =~ ws2 + ls2 + lt2
+ reasoning2 ~ reasoning1
+ ws1 ~~ ws2
+ ls1 ~~ ls2
+ lt1 ~~ lt2
+ '
> 
> auto.res <- sem(automodel, data=active.full)
> summary(auto.res, fit=TRUE)
lavaan (0.5-23.1097) converged normally after  66 iterations

  Number of observations                          1114

  Estimator                                         ML
  Minimum Function Test Statistic               27.213
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic             5827.630
  Degrees of freedom                                15
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.996
  Tucker-Lewis Index (TLI)                       0.989

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -16447.377
  Loglikelihood unrestricted model (H1)     -16433.771

  Number of free parameters                         16
  Akaike (AIC)                               32926.755
  Bayesian (BIC)                             33007.006
  Sample-size adjusted Bayesian (BIC)        32956.186

Root Mean Square Error of Approximation:

  RMSEA                                          0.063
  90 Percent Confidence Interval          0.041  0.087
  P-value RMSEA <= 0.05                          0.151

Standardized Root Mean Square Residual:

  SRMR                                           0.013

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning1 =~                                       
    ws1               1.000                           
    ls1               1.192    0.030   40.091    0.000
    lt1               0.422    0.016   26.301    0.000
  reasoning2 =~                                       
    ws2               1.000                           
    ls2               1.110    0.026   43.371    0.000
    lt2               0.411    0.014   29.443    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning2 ~                                        
    reasoning1        1.073    0.024   43.919    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .ws1 ~~                                              
   .ws2               1.216    0.327    3.718    0.000
 .ls1 ~~                                              
   .ls2               0.356    0.401    0.888    0.375
 .lt1 ~~                                              
   .lt2               1.596    0.138   11.544    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ws1               5.511    0.385   14.324    0.000
   .ls1               4.547    0.475    9.575    0.000
   .lt1               3.996    0.181   22.065    0.000
   .ws2               5.021    0.419   11.995    0.000
   .ls2               5.161    0.507   10.182    0.000
   .lt2               3.963    0.181   21.875    0.000
    reasoning1       19.103    1.060   18.014    0.000
   .reasoning2        2.447    0.289    8.479    0.000

> 

Example 2. Mediation analysis with latent variables

In path analysis, we have fitted a complex mediation model. Since we know that ws1, ls1, and lt1 are measurements of reasoning ability, we can form a latent reasoning ability variable. Thus, our mediation model can be expressed as in Figure 5.

5. A mediation model with latent variables

Given CFI = 0.997, RMSEA =  0.034 and SRMR = 0.015, we accept the model as a good model even though the chi-square test is significant. Based on the Sobel test, the total indirect effect from age to ept1 through hvltt1 and reasoning is significant.

> library(lavaan)
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.
> usedata('active.full.csv')
> #head(active.full)
> med.model <- '
+ reasoning =~ ws1 + ls1 + lt1
+ reasoning ~ p4*age + p8*edu
+ hvltt1 ~ p2*age + p7*edu
+ hvltt1 ~~ reasoning
+ ept1 ~ p1*age + p6*edu + p3*hvltt1 + p5*reasoning
+ indirect := p2*p3 + p4*p5
+ total := p1 + p2*p3 + p7*p3
+ '
> 
> med.res <- sem(med.model, data=active.full)
> summary(med.res, fit=TRUE)
lavaan (0.5-23.1097) converged normally after  51 iterations

  Number of observations                          1114

  Estimator                                         ML
  Minimum Function Test Statistic               18.363
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.019

Model test baseline model:

  Minimum Function Test Statistic             3344.605
  Degrees of freedom                                20
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.997
  Tucker-Lewis Index (TLI)                       0.992

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -20675.968
  Loglikelihood unrestricted model (H1)     -20666.787

  Number of free parameters                         17
  Akaike (AIC)                               41385.937
  Bayesian (BIC)                             41471.204
  Sample-size adjusted Bayesian (BIC)        41417.207

Root Mean Square Error of Approximation:

  RMSEA                                          0.034
  90 Percent Confidence Interval          0.013  0.055
  P-value RMSEA <= 0.05                          0.889

Standardized Root Mean Square Residual:

  SRMR                                           0.015

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning =~                                        
    ws1               1.000                           
    ls1               1.181    0.029   41.262    0.000
    lt1               0.430    0.016   26.758    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning ~                                         
    age       (p4)   -0.228    0.023   -9.737    0.000
    edu       (p8)    0.745    0.047   15.792    0.000
  hvltt1 ~                                            
    age       (p2)   -0.161    0.027   -6.074    0.000
    edu       (p7)    0.429    0.052    8.177    0.000
  ept1 ~                                              
    age       (p1)    0.030    0.022    1.389    0.165
    edu       (p6)    0.396    0.046    8.527    0.000
    hvltt1    (p3)    0.169    0.026    6.488    0.000
    reasoning (p5)    0.643    0.035   18.186    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .reasoning ~~                                        
   .hvltt1            7.255    0.592   12.263    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ws1               5.334    0.366   14.594    0.000
   .ls1               4.850    0.444   10.910    0.000
   .lt1               3.917    0.181   21.657    0.000
   .hvltt1           20.618    0.874   23.601    0.000
   .ept1             11.505    0.522   22.027    0.000
   .reasoning        13.814    0.777   17.777    0.000

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indirect         -0.174    0.019   -9.368    0.000
    total             0.075    0.025    3.008    0.003

> 

推荐访问


数据分析交流:数据分析交流
Excel学习: Excel学习交流
Python交流:一起学习Python(数据分
SQL交流:一起学习SQL(数据分析
微博:一起大数据

最新提问

  • SQL Chat
  • sql server 不允许保存更改。您所做的更改要求删除并重新创建以下表。您对无法重新创建的表进行了更改或者启用了”阻止保存要求重新创建表的更改”选项。
  • 偏相关分析
  • 复相关系数
  • 【R语言】熵权法确定权重
  • 如何破解Excel VBA密码
  • 解决 vba 报错:要在64位系统上使用,请检查并更新Declare 语句
  • 基于 HuggingFace Transformer 的统一综合自然语言处理库
  • sqlserver分区表索引
  • Navicat连接数据库后不显示库、表、数据

文章标签

ARIMA CBC Excel GBDT KNN Modeler Mysql pandas PostgreSQL python python数据可视化 R SAS sklearn SPSS SQL SVM Tableau TensorFlow VBA 主成分分析 关联规则 决策树 协同过滤 可视化 因子分析 大数据 大数据分析 推荐系统 数据分析 数据可视化 数据挖掘 数据透视表 文本挖掘 时间序列 机器学习 深度学习 神经网络 结构方程 统计学 联合分析 聚类 聚类分析 逻辑回归 随机森林
©2023 一起大数据-技术文章心得 | Design: Newspaperly WordPress Theme