6 Stata
6.1 Simple linear regression
6.1.1 Setup
We will also set the pseudo-random number generator seed to 02138 to make the stochastic components of our simulations reproducible.
6.1.2 Step 4: Simulate
Next, we create a simulated dataset based on our assumptions about the model under the alternative hypothesis, and fit the model.
clear
set obs 400
generate age = runiformint(18,65)
generate female = rbinomial(1,0.5)
generate interact = age*female
generate e = rnormal(0,20)
generate sbp = 110 + 0.5*age + (-20)*female + 0.35*interact + e
We can then test the null hypothesis that the interaction term equals zero using a likelihood-ratio test.
regress sbp age i.female c.age#i.female
estimates store full
regress sbp age i.female
estimates store reduced
Likelihood-ratio test LR chi2(1) = 13.38
(Assumption: reduced nested in full) Prob > chi2 = 0.0003
The test yields a p-value of 0.0003.
6.1.3 Step 5: Automate
Next, let’s write a program that creates datasets under the alternative hypothesis, fits the models, and uses the simulate
command to test the program.
capture program drop simregress
program simregress, rclass
version 16
// DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
syntax, n(integer) /// Sample size
[ alpha(real 0.05) /// Alpha level
intercept(real 110) /// Intercept parameter
age(real 0.5) /// Age parameter
female(real -20) /// Female parameter
interact(real 0.35) /// Interaction parameter
esd(real 20) ] // Standard deviation of the error
quietly {
// GENERATE THE RANDOM DATA
clear
set obs `n'
generate age = runiformint(18,65)
generate female = rbinomial(1,0.5)
generate interact = age*female
generate e = rnormal(0,`esd')
generate sbp = `intercept' + `age'*age + `female'*female + ///
`interact'*interact + e
// TEST THE NULL HYPOTHESIS
regress sbp age i.female c.age#i.female
estimates store full
regress sbp age i.female
estimates store reduced
lrtest full reduced
}
// RETURN RESULTS
return scalar reject = (r(p)<`alpha')
end
Below, we use simulate
to run simregress
200 times and summarize the variable reject. The results indicate that we would have 74% power to detect an interaction parameter of 0.35 given a sample of 400 participants and the other assumptions about the model.
simulate reject=r(reject), reps(200) seed(12345): ///
simttest, n(200) m0(70) ma(75) sd(15) alpha(0.05)
Simulations (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
summarize reject
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
reject | 200 .735 .4424407 0 1
Next, let’s write a program called power\_cmd\_simregress
so that we can integrate simregress
into Stata’s power
command.
capture program drop power_cmd_simregress
program power_cmd_simregress, rclass
version 17
// DEFINE THE INPUT PARAMETERS AND THEIR DEFAULT VALUES
syntax, n(integer) /// Sample size
[ alpha(real 0.05) /// Alpha level
intercept(real 110) /// Intercept parameter
age(real 0.5) /// Age parameter
female(real -20) /// Female parameter
interact(real 0.35) /// Interaction parameter
esd(real 20) /// Standard deviation of the error
reps(integer 100)] // Number of repetitions
// GENERATE THE RANDOM DATA AND TEST THE NULL HYPOTHESIS
quietly {
simulate reject=r(reject), reps(`reps'): ///
simregress, n(`n') age(`age') female(`female') ///
interact(`interact') esd(`esd') alpha(`alpha')
summarize reject
}
// RETURN RESULTS
return scalar power = r(mean)
return scalar N = `n'
return scalar alpha = `alpha'
return scalar intercept = `intercept'
return scalar age = `age'
return scalar female = `female'
return scalar interact = `interact'
return scalar esd = `esd'
end
Finally, run power simregress
for a range of input parameter values, including the parameters listed in double quotes. To do this, we first need to create a program called power\_cmd\_simregress\_init
.
6.1.4 Step 6: Summarize
Now, we’re ready to use power simregress
! The output below shows the simulated power when the interaction parameter equals 0.2 to 0.4 in increments of 0.05 for samples of size 400, 500, 600, and 700.
power simregress, n(400(100)700) intercept(110) ///
age(0.5) female(-20) interact(0.2(0.05)0.4) ///
reps(1000) table graph(xdimension(interact) ///
legend(rows(1)))
Estimated power
Two-sided test
+--------------------------------------------------------------------+
| alpha power N intercept age female interact esd |
|--------------------------------------------------------------------|
| .05 .3 400 110 .5 -20 .2 20 |
| .05 .421 400 110 .5 -20 .25 20 |
| .05 .546 400 110 .5 -20 .3 20 |
| .05 .685 400 110 .5 -20 .35 20 |
| .05 .767 400 110 .5 -20 .4 20 |
| .05 .34 500 110 .5 -20 .2 20 |
| .05 .509 500 110 .5 -20 .25 20 |
| .05 .63 500 110 .5 -20 .3 20 |
| .05 .767 500 110 .5 -20 .35 20 |
| .05 .872 500 110 .5 -20 .4 20 |
| .05 .412 600 110 .5 -20 .2 20 |
| .05 .556 600 110 .5 -20 .25 20 |
| .05 .712 600 110 .5 -20 .3 20 |
| .05 .829 600 110 .5 -20 .35 20 |
| .05 .886 600 110 .5 -20 .4 20 |
| .05 .471 700 110 .5 -20 .2 20 |
| .05 .634 700 110 .5 -20 .25 20 |
| .05 .771 700 110 .5 -20 .3 20 |
| .05 .908 700 110 .5 -20 .35 20 |
| .05 .957 700 110 .5 -20 .4 20 |
+--------------------------------------------------------------------+
6.2 Mixed effects model
6.2.2 Step 4: Simulate
Next, we create a simulated dataset based on our assumptions about the model under the alternative hypothesis, and fit the model. We will simulate 5 observations at 4-month increments for 200 children.
clear
set obs 200
generate child = _n
generate female = rbinomial(1,0.5)
generate u_0i = rnormal(0,0.25)
generate u_1i = rnormal(0,0.60)
expand 5
bysort child: generate age = (_n-1)*0.5
generate interaction = age*female
generate e_ij = rnormal(0,1.2)
generate weight = 5.35 + 3.6*age + (-0.5)*female + (-0.25)*interaction ///
+ u_0i + age*u_1i + e_ij
Our dataset includes the random deviations that we would not observe in a real dataset. We can then use mixed to fit a model to our simulated data.
mixed weight age i.female c.age#i.female || child: age , stddev nolog noheader
estimates store full
mixed weight age i.female || child: age , stddev nolog noheader
estimates store reduced
lrtest full reduced
We can then test the null hypothesis that the interaction term equals zero using a likelihood-ratio test.
lrtest full reduced
Likelihood-ratio test LR chi2(1) = 8.23
(Assumption: reduced nested in full) Prob > chi2 = 0.0041
The \(p\)-value for our test is 0.0041, so we would reject the null hypothesis that the interaction term equals zero.
6.2.3 Step 5: Automate
Next, let’s write a program that creates datasets under the alternative hypothesis, fits the mixed effects models, tests the null hypothesis of interest, and uses the simulate
command to run many iterations of the program.
capture program drop simmixed
program simmixed, rclass
version 16
// PARSE INPUT
syntax, n1(integer) ///
n(integer) ///
[ alpha(real 0.05) ///
intercept(real 5.35) ///
age(real 3.6) ///
female(real -0.5) ///
interact(real -0.25) ///
u0i(real 0.25) ///
u1i(real 0.60) ///
eij(real 1.2) ]
// COMPUTE POWER
quietly {
drop _all
set obs `n'
generate child = _n
generate female = rbinomial(1,0.5)
generate u_0i = rnormal(0,`u0i')
generate u_1i = rnormal(0,`u1i')
expand `n1'
bysort child: generate age = (_n-1)*0.5
generate interaction = age*female
generate e_ij = rnormal(0,`eij')
generate weight = `intercept' + `age'*age + `female'*female + ///
`interact'*interaction + u_0i + age*u_1i + e_ij
mixed weight age i.female c.age#i.female || child: age, iter(200)
local conv1 = e(converged)
estimates store full
mixed weight age i.female || child: age, iter(200)
local conv2 = e(converged)
estimates store reduced
lrtest full reduced
local reject = cond(`conv1' + `conv2'==2, (r(p)<`alpha'), .)
}
// RETURN RESULTS
return scalar reject = `reject'
return scalar conv = `conv1'+`conv2'
end
We then use simulate to run simmixed 10 times using the default parameter values for 5 observations on each of 200 children.
simulate reject=r(reject) converged=r(conv), reps(10) seed(12345):
simmixed, n1(5) n(200)
command: simmixed, n1(5) n(200)
reject: r(reject)
converged: r(conv)
Simulations (10)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
simulate
saved the results of the hypothesis tests to a variable named reject. The mean of reject is our estimate of the power to test the null hypothesis that the age×sex interaction term equals zero, assuming that the weight of 200 children is measured 5 times.
We could stop with our quick simulation if we were interested only in a specific set of assumptions. But it’s easy to write an additional program named power\_cmd\_simmixed
that will allow us to use Stata’s power
command to create tables and graphs for a range of sample sizes.
capture program drop power_cmd_simmixed
program power_cmd_simmixed, rclass
version 16
// PARSE INPUT
syntax, n1(integer) ///
n(integer) ///
[ alpha(real 0.05) ///
intercept(real 5.35) ///
age(real 3.6) ///
female(real -0.5) ///
interact(real -0.25) ///
u0i(real 0.25) ///
u1i(real 0.60) ///
eij(real 1.2) ///
reps(integer 1000) ]
// COMPUTE POWER
quietly {
simulate reject=r(reject), reps(`reps'): ///
simmixed, n1(`n1') n(`n') alpha(`alpha') intercept(`intercept') ///
age(`age') female(`female') interact(`interact') ///
u0i(`u0i') u1i(`u1i') eij(`eij')
summarize reject
}
// RETURN RESULTS
return scalar power = r(mean)
return scalar n1 = `n1'
return scalar N = `n'
return scalar alpha = `alpha'
return scalar intercept = `intercept'
return scalar age = `age'
return scalar female = `female'
return scalar interact = `interact'
return scalar u0i = `u0i'
return scalar u1i = `u1i'
return scalar eij = `eij'
end
It’s also easy to write a program named power\_cmd\_simmixed\_init
that will allow us to simulate power for a range of values for the parameters in our model.
capture program drop power_cmd_simmixed_init
program power_cmd_simmixed_init, sclass
version 16
sreturn clear
// ADD COLUMNS TO THE OUTPUT TABLE
sreturn local pss_colnames "n1 intercept age female interact u0i u1i eij"
// ALLOW NUMLISTS FOR ALL PARAMETERS
sreturn local pss_numopts "n1 intercept age female interact u0i u1i eij"
end
6.2.4 Step 6: Summarize
Now, we can use power simmixed
to simulate power for a variety of assumptions. The example below simulates power for a range of sample sizes at both levels 1 and 2. Level 2 sample sizes range from 100 to 500 children in increments of 100. At level 1, we consider 5 and 6 observations per child.
power simmixed, n1(5 6) n(100(100)500) reps(1000)
table(n1 N power)
graph(ydimension(power) xdimension(N) plotdimension(n1)
xtitle(Level 2 Sample Size) legend(title(Level 1 Sample Size)))
xxxxxxxxxxxxxxxxxxxxxxxxxxx
Estimated power
Two-sided test
+-------------------------+
| n1 N power |
|-------------------------|
| 5 100 .2629 |
| 6 100 .313 |
| 5 200 .397 |
| 6 200 .569 |
| 5 300 .621 |
| 6 300 .735 |
| 5 400 .734 |
| 6 400 .855 |
| 5 500 .828 |
| 6 500 .917 |
+-------------------------+