• 登录
Skip to content

一起大数据-技术文章心得

一起大数据网由数据爱好者发起并维护,专注数据分析、挖掘、大数据相关领域的技术分享、交流。不定期组织爱好者聚会,期待通过跨行业的交流和碰撞,更好的推进各领域数据的价值落地。

Menu
  • 首页
  • 大数据案例
  • 数据&电子书
  • 视频
    • Excel视频
    • VBA视频
    • Mysql视频
    • 统计学视频
    • SPSS视频
    • R视频
    • SAS视频
    • Python视频
    • 数据挖掘视频
    • 龙星计划-数据挖掘
    • 大数据视频
    • Machine Learning with Python
  • 理论
    • 统计学
    • 数据分析
    • 机器学习
    • 大数据
  • 软件
    • Excel
    • Modeler
    • Python
    • R
    • SAS
    • SPSS
    • SQL
    • PostgreSQL
    • KNIME
  • 技术教程
    • SQL教程
    • SPSS简明教程
    • SAS教程
    • The Little SAS Book
    • SAS EG教程
    • R语言教程
    • Python3教程
    • IT 技术速查手册
    • Data Mining With Python and R
    • SAS Enterprise Miner
  • 问答社区
  • 我要提问
Menu
Granular Monotonic Binning in SAS

Granular Monotonic Binning in SAS

Posted on 2017年10月31日

In the post (https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression), it is shown how to do a finer monotonic binning with isotonic regression in R.

Below is a SAS macro implementing the monotonic binning with the same idea of isotonic regression. This macro is more efficient than the one shown in (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) without iterative binning and is also able to significantly increase the binning granularity.

%macro monobin(data = , y = , x = );
options mprint mlogic;
data _data_ (keep = _x _y);
  set &data;
  where &y in (0, 1) and &x ~= .;
  _y = &y;
  _x = &x;
run;
proc transreg data = _last_ noprint;
  model identity(_y) = monotone(_x);
  output out = _tmp1 tip = _t;
run;
proc summary data = _last_ nway;
  class _t_x;
  output out = _data_ (drop = _freq_ _type_) mean(_y) = _rate;
run;
proc sort data = _last_;
  by _rate;
run;
data _tmp2;
  set _last_;
  by _rate;
  _idx = _n_;
  if _rate = 0 then _idx = _idx + 1;
  if _rate = 1 then _idx = _idx - 1;
run;
  
proc sql noprint;
create table
  _tmp3 as
select
  a.*,
  b._idx
from
  _tmp1 as a inner join _tmp2 as b
on
  a._t_x = b._t_x;
  
create table
  _tmp4 as
select
  a._idx,
  min(a._x)                                               as _min_x,
  max(a._x)                                               as _max_x,
  sum(a._y)                                               as _bads,
  count(a._y)                                             as _freq,
  mean(a._y)                                              as _rate,
  sum(a._y) / b.bads                                      as _bpct,
  sum(1 - a._y) / (b.freq - b.bads)                       as _gpct,
  log(calculated _bpct / calculated _gpct)                as _woe,
  (calculated _bpct - calculated _gpct) * calculated _woe as _iv
from
  _tmp3 as a, (select count(*) as freq, sum(_y) as bads from _tmp3) as b
group by
  a._idx;
quit;
title "Monotonic WoE Binning for %upcase(%trim(&x))";
proc print data = _last_ label noobs;
  var _min_x _max_x _bads _freq _rate _woe _iv;
  label
    _min_x = "Lower"
    _max_x = "Upper"
    _bads  = "#Bads"
    _freq  = "#Freq"
    _rate  = "BadRate"
    _woe   = "WoE"
    _iv    = "IV";
  sum _bads _freq _iv;
run;
title;
%mend monobin;
Below is the sample output for LTV, showing an identical binning scheme to the one generated by the R isobin() function.

 更多相关文章请看 https://statcompute.wordpress.com/category/scorecard/

发表评论 取消回复

要发表评论,您必须先登录。

推荐访问


数据分析交流:数据分析交流
Excel学习: Excel学习交流
Python交流:一起学习Python(数据分
SQL交流:一起学习SQL(数据分析
微博:一起大数据

最新提问

  • SQL Chat
  • sql server 不允许保存更改。您所做的更改要求删除并重新创建以下表。您对无法重新创建的表进行了更改或者启用了”阻止保存要求重新创建表的更改”选项。
  • 偏相关分析
  • 复相关系数
  • 【R语言】熵权法确定权重
  • 如何破解Excel VBA密码
  • 解决 vba 报错:要在64位系统上使用,请检查并更新Declare 语句
  • 基于 HuggingFace Transformer 的统一综合自然语言处理库
  • sqlserver分区表索引
  • Navicat连接数据库后不显示库、表、数据

文章标签

ARIMA CBC Excel GBDT KNN Modeler Mysql pandas PostgreSQL python python数据可视化 R SAS sklearn SPSS SQL SVM Tableau TensorFlow VBA 主成分分析 关联规则 决策树 协同过滤 可视化 因子分析 大数据 大数据分析 推荐系统 数据分析 数据可视化 数据挖掘 数据透视表 文本挖掘 时间序列 机器学习 深度学习 神经网络 结构方程 统计学 联合分析 聚类 聚类分析 逻辑回归 随机森林
©2023 一起大数据-技术文章心得 | Design: Newspaperly WordPress Theme