
You can learn a lot from simple simulations (an example of experimental design with unequal allocation of treatment and control units).


We were talking about blocking in today's class on experimental design, and a student asked, "When should there be unequal numbers of units in the treatment and control groups?"

I replied that the simplest example is when treatment is expensive. There might be 10,000 people in the population, but 99% will be in the control group because the budget is only enough to apply the treatment to 100 people. In other settings, the treatment can be destructive, so it can only be applied to a small fraction of the available units.

But even if cost is not a concern and you just want to maximize statistical efficiency, it can make sense to assign different numbers of units to the two groups.

For example, I started by supposing that your outcomes are much more variable under the treatment than under the control. Then the basic estimate of the treatment effect (the average outcome in the treatment group minus the average outcome in the control group) will be less noisy if you take more treatment observations, to compensate for their higher dispersion.
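
Concretely, the standard error of that difference-in-means estimate is sqrt(s_c^2/n_c + s_t^2/n_t). Here is a minimal simulation sketch, with illustrative group sizes and standard deviations, checking that formula:

# minimal sketch with illustrative numbers: simulate the difference-in-means
# estimate and compare its spread to the analytic se, sqrt(s_c^2/n_c + s_t^2/n_t)
set.seed(123)
n_c <- 50; n_t <- 50        # assumed group sizes
s_c <- 1;  s_t <- 2         # control and treatment sds (treatment twice as variable)
est <- replicate(1e4, mean(rnorm(n_t, 0, s_t)) - mean(rnorm(n_c, 0, s_c)))
sd(est)                            # empirical se, about 0.32
sqrt(s_c^2/n_c + s_t^2/n_t)        # analytic se, sqrt(0.1) = 0.316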

But then I paused. I was confused.

There were two intuitions, pointing in opposite directions:

(1) Treated observations are more variable than controls, so more treatment measurements are needed to get a precise estimate for the treatment group.

(2) Treated observations are more variable than controls, so they will be noisy no matter what, and more of the budget should go toward high-quality measurement of the control group.

My feeling was that the correct reasoning was (2), not (1), but I wasn't sure.

How did I resolve the question?

Brute force.

Here's the R code:

n <- 100
# standard error of the difference in means when a proportion p of the n units
# is assigned to treatment, with sds s_c (control) and s_t (treatment)
expt_sim <- function(n, p=0.5, s_c=1, s_t=2){
  n_c <- round((1-p)*n)
  n_t <- round(p*n)
  se_dif <- sqrt(s_c^2/n_c + s_t^2/n_t)
  se_dif
}
curve(expt_sim(100, x), from=.01, to=.99,
  xlab="Proportion of data in the treatment group",
  ylab="se of estimated treatment effect",
  main="Assuming sd of measurements is\ntwice as high for treated as for controls",
  bty="l")

The results are as follows:

Oh, shoot, I don't like that the y-axis doesn't go down to zero; it makes the decline in the standard error look more dramatic than it really is. Zero is in the neighborhood, so let's include it:

curve(expt_sim(100, x), from=.01, to=.99,
  xlab="Proportion of data in the treatment group",
  ylab="se of estimated treatment effect",
  main="Assuming sd of measurements is\ntwice as high for treated as for controls", 
  bty="l",
  xlim=c(0, 1), ylim=c(0, 2), xaxs="i", yaxs="i")

And now we can see the answer. If the treated measurements are twice as variable as the control measurements, you want twice as many units in the treatment group as in the control group. The curve is minimized at x = 2/3 (you could check this without plotting, but the graph gives some intuition along with a visual check). Argument (1) above is correct.
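
If you want to confirm the location of the minimum without reading it off the plot, a quick brute-force grid search over the proportion, using the same expt_sim function, does the job (the 0.01 grid spacing is just an illustrative choice):

# brute-force check of the minimizing proportion
p_grid <- seq(0.01, 0.99, by=0.01)
p_grid[which.min(expt_sim(100, p_grid))]   # 0.67, i.e., about 2/3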

On the other hand, the standard error under the optimal design is not much lower than under the simple 50/50 design, as we can see by computing the ratio:

print(expt_sim(100, 2/3) / expt_sim(100, 1/2))

This prints 0.95.

So the improved design reduces the standard error by 5%, which is roughly a 10% gain in efficiency. Not nothing, but not huge.
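
To see where that efficiency figure comes from: efficiency scales with 1/se^2, so squaring the ratio of standard errors gives the gain. A quick check with the same function:

# efficiency comparison: square the ratio of standard errors
(expt_sim(100, 1/2) / expt_sim(100, 2/3))^2   # about 1.11, i.e., roughly a 10% gain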

Anyway, the main point of this post is that you can learn a lot from simulation. Of course, in this case the problem can also be solved analytically: the optimal sample sizes are proportional to the within-group standard deviations, so the ratio of treated to control units should be s_t/s_c. That's fine, but I like the brute-force solution.
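
As a quick sanity check on the analytic rule (the particular values s_t = 3, s_c = 1 here are just illustrative), the brute-force grid search lands on the predicted proportion s_t/(s_t + s_c):

# check the analytic allocation rule against brute force for another choice of sds
s_c <- 1; s_t <- 3
p_grid <- seq(0.01, 0.99, by=0.01)
p_grid[which.min(expt_sim(100, p_grid, s_c=s_c, s_t=s_t))]   # 0.75
s_t/(s_t + s_c)                                              # 0.75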
