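Propensity score methods seem to be used more and more. Computing the propensity score itself is fairly simple; the difficult part is matching the subjects. The methods commonly seen are stratification, nearest neighbour, radius, and kernel matching, which are also the four methods available in Stata.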
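To make the difference between the four approaches concrete, here is a minimal Python sketch of how each one uses the control subjects for a single treated subject with score t. The function names, the caliper r, the bandwidth h, the Gaussian kernel and the cut points are all illustrative assumptions, not Stata's implementation.

```python
import bisect
import math

def nearest_neighbour(t, controls):
    """Index of the control score closest to t (the first one wins on ties)."""
    return min(range(len(controls)), key=lambda j: abs(controls[j] - t))

def radius_match(t, controls, r):
    """Indices of all controls whose score lies within the caliper r of t."""
    return [j for j, c in enumerate(controls) if abs(c - t) <= r]

def kernel_weights(t, controls, h):
    """Gaussian-kernel weights on every control (bandwidth h), summing to 1."""
    w = [math.exp(-0.5 * ((c - t) / h) ** 2) for c in controls]
    total = sum(w)
    return [x / total for x in w]

def stratum(score, cutpoints):
    """Stratum index of a score, given sorted interior cut points."""
    return bisect.bisect_left(cutpoints, score)

controls = [0.10, 0.30, 0.35, 0.80]
t = 0.32
print(nearest_neighbour(t, controls))      # 1
print(radius_match(t, controls, r=0.05))   # [1, 2]
print(stratum(t, [0.2, 0.4, 0.6, 0.8]))    # 1
```

Nearest neighbour keeps one control, radius keeps every control inside the caliper, kernel matching keeps all controls but down-weights distant ones, and stratification compares treated and control subjects that fall into the same score stratum.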
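I wrote a SAS macro for nearest neighbour matching. It allows the same control subject to be matched to several treated subjects (matching with replacement), and the two groups end up 1:1 overall, with exactly one matched control record per treated subject. The macro is as follows: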
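%macro compare(infile, class, logit, diff, var, N, outfile);
%local i;
*----- split data file -----*;
/* As written, class=0 marks the treated subjects to be matched
   and class=1 marks the pool of control subjects */
data t11; set &infile; if &class=0 and &logit ^= .; keep &var &class &logit; run;
proc sort data=t11; by &logit; run;
data t12; set &infile; if &class=1 and &logit ^= .; keep &var &class &logit; run;
proc sort data=t12; by &logit; run;
/* redirect log and output once, before the loop (straight quotes, not curly ones) */
proc printto log="d:\pscores.log" print="d:\pscores.out"; run;
*----- match -----*;
%do i=1 %to &N;
  /* take the i-th treated subject */
  data t21; set t11(firstobs=&i obs=&i); run;
  proc iml;
    /* read the columns in a fixed order so that &logit is always the last column;
       all variables in &var must be numeric */
    use t21; read all var {&var &class &logit} into a; close t21;
    use t12; read all var {&var &class &logit} into b; close t12;
    bcols=ncol(b);
    /* append |logit(control) - logit(treated subject i)| as an extra column */
    bb=b || abs(b[,bcols]-a[1,bcols]);
    cn={&var &class &logit &diff};
    create t31 from bb[colname=cn];
    append from bb;
    close t31;
  quit;
  /* keep the closest control; controls stay in t12, so matching is with replacement */
  proc sort data=t31; by &diff; run;
  data t32; set t31; if _n_=1; run;
  %if &i=1 %then %do;
    data &outfile; set t32; run;
  %end;
  %else %do;
    proc append base=&outfile data=t32; run;
  %end;
%end;
/* restore the default log and output destinations */
proc printto; run;
%mend compare;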
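For readers who want to check the matching logic outside SAS, here is a minimal Python sketch of what the macro does: treated subjects are processed in ascending logit order, every control stays available (with replacement), and ties go to the control that comes first in logit order, which is what PROC SORT BY diff yields under its default EQUALS behaviour. The function name and the (id, logit) input layout are hypothetical, and unlike the macro's outfile the sketch also records the treated id.

```python
def nn_match_with_replacement(treated, controls):
    """1:1 nearest-neighbour matching on the logit, with replacement.

    treated, controls: lists of (id, logit) pairs.
    Returns one (treated_id, control_id, diff) triple per treated subject.
    """
    # Like the macro, process treated subjects in ascending logit order.
    treated = sorted(treated, key=lambda r: r[1])
    # A stable sort of the pool by logit makes ties resolve to the control
    # with the smaller logit, mirroring the macro's sorted t12.
    controls = sorted(controls, key=lambda r: r[1])
    matches = []
    for tid, tlogit in treated:
        # Every control stays in the pool, so one control can serve
        # several treated subjects.
        best = min(controls, key=lambda r: abs(r[1] - tlogit))
        matches.append((tid, best[0], abs(best[1] - tlogit)))
    return matches

treated = [(1, -0.50), (2, 0.20), (3, 0.25)]
controls = [(10, -0.40), (11, 0.22), (12, 1.00)]
for tid, cid, diff in nn_match_with_replacement(treated, controls):
    print(tid, cid, round(diff, 2))
# prints: 1 10 0.1 / 2 11 0.02 / 3 11 0.03
```

Control 11 is used twice, which is exactly the "one control for several treated subjects" behaviour. Because both lists are sorted, a binary search over the control logits (for example with Python's bisect module) would find each neighbour faster than a full scan, at the cost of extra tie-handling code.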
Explanation of the parameters:
%macro compare(infile, class, logit, diff, var, N, outfile);
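infile: the prepared input data set, in which the logit must already have been computed.
class: indicates whether a subject belongs to the treatment group or the control group. As the code is written, class=0 marks the treated subjects to be matched and class=1 marks the control pool.
logit: the logit of the propensity score, log(p/(1-p)).
diff: the name of the new variable that holds the absolute difference in logit between a treated subject and a control subject.
var: the variables in infile that you want to keep. Because the computation is done in matrices, fewer variables mean less memory; it is best to keep only a numeric id so the results can be merged back with the other variables afterwards.
N: the number of treated subjects.
outfile: the output data set containing the nearest-neighbour control subject matched to each treated subject.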
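The parameters imply a small workflow around the macro: compute the logit before calling it, keep only an id in var, and merge the matched ids back to the full records afterwards. A minimal Python sketch of both ends, assuming the propensity score p comes from a model fitted elsewhere and that records are dicts keyed by a hypothetical "id" field:

```python
import math

def logit(p):
    """Logit of a propensity score p, log(p / (1 - p)); requires 0 < p < 1."""
    return math.log(p / (1.0 - p))

def merge_back(matched_ids, full_rows, key="id"):
    """Attach full records to matched control ids; repeated ids give repeated rows."""
    by_id = {row[key]: row for row in full_rows}
    return [by_id[i] for i in matched_ids]

full_rows = [{"id": 10, "age": 30}, {"id": 11, "age": 41}]
print(logit(0.5))                                                # 0.0
print([r["age"] for r in merge_back([11, 11, 10], full_rows)])   # [41, 41, 30]
```

Keeping repeated rows matters here: since matching is with replacement, a control that serves two treated subjects should appear twice in the matched sample.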
If you try the macro and run into problems, please leave a comment!