Output Stata’s ttest results with esttab including means of each group

Esttab’s website provides a way of outputting results from ttest but there is no option to include the means of each group separately, only the difference between them.

Using a combination of estpost and estadd I have come up with a way of also including the means of each group in the table.

You can actually use this system to include any other summary statistic you may want in the table.

See the code below.

* use Stata’s dataset auto

sysuse auto

* the goal here is to build a table with two-sample mean-comparison tests for price, mpg and headroom by foreign;

* if you just want differences to be displayed in the table run:

estpost ttest price mpg headroom trunk, by(foreign)
esttab , noobs cells("b(star fmt(4)) se(fmt(4)) count(fmt(0))") star(* 0.1 ** .05 *** 0.01) ///
collabels("Diff." "Std. Error" "Obs.")

* if you instead also want the means of each group to be displayed, you need some extra steps:

* save summarize results for foreign==1 to an e-list using estpost

estpost sum price mpg headroom trunk if foreign==1

* save the matrix that stores the means e(mean) as matrix meanf1

matrix meanf1=e(mean)
matrix list meanf1

* save summarize results for foreign==0 to an e-list using estpost

estpost sum price mpg headroom trunk if foreign==0

* save the matrix that stores the means e(mean) as matrix meanf0

matrix meanf0=e(mean)
matrix list meanf0

* save ttest results to an e-list using estpost

estpost ttest price mpg headroom trunk, by(foreign)

* add the two matrices containing the means to ttest’s e-list

estadd matrix meanf1
estadd matrix meanf0

* now e-list contains the matrices e(meanf1) and e(meanf0) and you can add them in the cells option of esttab

esttab , noobs cells("meanf1(fmt(4)) meanf0(fmt(4)) b(star fmt(4)) se(fmt(4)) count(fmt(0))") star(* 0.1 ** .05 *** 0.01) ///
collabels("Mean(Foreign=1)" "Mean(Foreign=0)" "Diff." "Std. Error" "Obs.")

Here’s the output.



Merge datasets by partially matching key variables in Stata

I work with messy administrative data and very often have to merge datasets by people’s or cities’ names. String variables often come with typos, different spelling, etc. Think about languages that use diacritical marks and you have a complete mess.

Stata has a nice user written command called reclink built for this purpose. It uses record linkage methods to match identifiers in the two datasets. A nice feature is that you can use more than one identifier variable and give weights to each of them. For example, in the code below I want to merge the datasets based on make and foreign, but I know the mismatches are in the variable make, so I give a large weight (10) to it and only 2 to foreign.

To use reclink you also have to create ids in each dataset and feed them into the options idm() and idu(). This will allow the code to display the matches in terms of ids as well. Additionally, it creates a variable that contains the matching scores for the merged observations. The scores are in the 0-1 range and one means exact matches.

Differently from Stata’s merge, the resulting dataset will include all the variables from both datasets. The code adds a capital “U” to the beginning of each of the Using dataset variables. This is pretty handy because you can eyeball the matches straight away and check whether the weights you defined have done a good job. It’s easy enough to just drop all these afterwards. See the example below in which I have used Stata’s auto.dta. Check here for more details.

ssc install reclink
sysuse auto, clear
keep make price foreign
gen id_using=_n
replace make = make + "aa" in 1
replace make = make + "bb" in 2
save tempauto
sysuse auto, clear
keep make price foreign
gen id_master=_n
reclink make foreign using tempauto, gen(myscore) idm(id_master) idu(id_using) wmatch(10 2)
list in 1/5

How to save contents from Stata’s review window into a do-file

Screen Shot 2016-04-10 at 10.48.53

When I need to test some commands or run quick queries in a dataset it’s always easier to jump in and type the commands directly into the command window; specially because you can just pick the variables you need from the variables window (yes Stata Corp, you do need to add more functionality to your do-file editor!). In the end, I always regret not have written those commands in a do-file instead. There is a solution for that:

  1. Right click on a line of code in the command window, and
  2. One menu window will pop-up with the options of selecting all commands and sending them to a do-file. Alternatively, you can select only those lines you need with left click + cmd (or ctrl in Windows) and then right click and choose send to do-file editor. In both cases a new do-file window will show up with all the commands selected.

rbind multiple data frames loaded in the current environment

do.call("rbind", lapply(ls(),get))

“do.call” applies the function “rbind”, which combines data frames by rows (only two at a time), to a list of data frames loaded in the current environment. “ls()” gives a string vector of the data frames’ names. We therefore need to use “get” to input the data frames that carry these names instead of just their names. “lapply” applies “get” to each of the elements of “ls()” and returns a list containing the data frames, which is necessary when working with “do.call”.