Stata Data Analysis
Stata is a general-purpose statistical software package available for PC, Mac OS, and UNIX. It works in interactive, non-interactive (batch), and point-and-click modes. Stata has four flavors: Small, Intercooled (Standard), Special Edition (SE), and SE for multiprocessors (MP).
In UNIX, type in "stata -b do file_name" to run Stata in the non-interactive mode.
Version 8 revised graphics commands with enhanced features; Version 9 introduced Mata, a new matrix manipulation language, which is similar to SAS/IML; Version 12 introduced .sem to fit structural equation models.

STATA BASICS

Basic Commands

  • help regress
  • display "Hello"
  • di (1-normal(1.96))*2
  • di sqrt(3.14)
  • describe x1 x2
  • des using jeeshim.dta
  • list in 1/10
  • list male-income
  • list pop* pro? if male==1
  • list in -2
  • summarize male grade
  • sum x1-x10 y* post?
  • sum income family if male~=.
  • sum income if (male==1) & (class >=3)
  • tabulate male educate
  • tab male grade, chi2 row col
  • tabi 12 33 \ 34 53, chi2 exact
  • tabi 34 53 23 \ 23 56 34 \ 45 32 21, chi2 all
  • tabstat math english, by(male) stats(n mean sum sd var max min skewness)

Management Commands

  • codebook male
  • label data "Pew Internet and American Life Project"
  • label variable male "Gender"
  • label variable male // to remove a variable label
  • label define yn 1 yes 0 no
  • label values open yn
  • compare math english
  • cf math english using indiana.dta
  • ci grade if male==0; /* confidence interval */
  • count if math >90
  • lookfor gnp
  • notes var: Need to be verified
  • update all
  • net search Spost
  • net from http://www.indiana.edu/~jslsoc/stata
  • net describe spost9_ado
  • net install spost9_ado
  • net get spost9_do
  • ssc whatsnew
  • ssc describe ...
  • ssc install ...

STATA FUNCTIONS

Operators

  • +, -, *, /, ^
  • >, >=, <, <=, == (equal), ~= (not equal)
  • & (and), | (or), ~ (not)
  • +=, =+, -=, =-, /=, %=, &=, |=, *=
  • in (in if command)
  • + (other variables), a/b (from a through b), . (missing values)
  • wild card (*, ?, /, -)
  • concatenation (+)

Math Functions

  • abs(); sin(); cos(); tan(); asin(); acos(); atan()
  • ceil(); floor(); int() or trunc(); round()
  • exp(); sqrt(); log(); ln(); log10()
  • min(); max(); sign(); sum(); mod(x,y); comb(x,k)

Probability Distribution

  • binormal(h,k,rho) // joint cumulative distribution of bivariate normal
  • chi2(df,x) // cumulative chi squared distribution
  • chi2tail(df,x) // reverse cumulative (upper-tail) chi-squared, 1-chi2(df,x)
  • F(df1, df2, f) // cumulative F distribution
  • Ftail(df1, df2, f) // reverse cumulative (upper-tail) F
  • normal(z); normal(1.96) // returns .9750021
  • ttail(df, t) // reverse cumulative (upper-tail) T
  • uniform() // uniform pseudo-random numbers on [0,1)
  • di chi2tail(10,18.31) // returns .04995417, p-value
  • di F(5, 10, 3.325) // returns .9499661
  • di Ftail(5, 10, 3.325) // returns .05000, the p-value
  • di (1-normal(z))*2 // compute the p-value for the two-tailed test
  • di ttail(20, 2.086) // returns .02499818
  • di ttail(df, t)*2 // compute the p-value for the two-tailed test
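
As a rough sketch of how the tail functions above combine into p-values and critical values, the do-file fragment below uses illustrative statistics (z = 1.96, and t = 2.086 with 20 degrees of freedom). The scalar names zstat, tstat, and dfree are hypothetical, chosen only for this example. invnormal() and invttail() are the inverse functions of normal() and ttail(). The // comments work in do-files but not at the interactive prompt.

```stata
* Two-tailed p-values from illustrative test statistics
scalar zstat = 1.96
scalar tstat = 2.086
scalar dfree = 20

display 2*(1 - normal(abs(zstat)))     // two-tailed p-value from a z statistic
display 2*ttail(dfree, abs(tstat))     // two-tailed p-value from a t statistic

* Critical values for a two-tailed test at the .05 level
display invnormal(.975)                // z such that normal(z) = .975
display invttail(dfree, .025)          // t such that ttail(dfree, t) = .025
```

Taking abs() of the statistic first keeps the two-tailed formula valid for negative test statistics.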

String Functions

  • char(n); length(s); trim(s); ltrim(s); rtrim(s)
  • string(n); substr(s,begin, length)
  • real(s); reverse(s); word(); lower(); upper()

INPUT / OUTPUT

Handling Data Sets

  • use "k:\kucc625\open.dta", clear
  • use jeeshim.dta, clear nolabel
  • use if gender==1 using jeeshim.dta, clear nolabel
  • save "c:\kucc625\open.dta", replace
  • save open.dta, replace nolabel
  • saveold "c:\kucc625\jeeshim.dta", replace
  • log using "c:\kucc625\open.log", append
  • log using open.log, append text
  • log on // log off; log close
  • cmdlog using "c:\kucc625\open_cmd.log", replace
  • cmdlog on // cmdlog off; cmdlog close

Import

  • infile a b c using jeeshim.txt, clear
  • infile str15 name float weight int height using student.txt
  • inf id _skip(1) q1-q3 using student.txt, clear
  • inf str20 id long (q1-q3) using student.txt, clear
  • inf id double (q10-q13) income using student.txt if income >50000
  • inf using student.dct, clear
  • infix year 1-4 gnp 5-9 interest 10-13 using macro.txt
  • infix using macro.dct in 1/100, clear
  • insheet using jeeshim.csv, clear
  • insheet a b c d e f g using student.txt, comma clear
  • insheet using student.txt, delim("#") clear

Export

  • odbc list
  • odbc load ID=year gnp interest in 1/500, table("macro") dsn("jeeshim")
  • outfile using jeeshim.txt
  • outfile x1-x10 using jeeshim.txt, wide replace
  • outfile using jeeshim.txt, nolabel noquote replace
  • outsheet using jeeshim.xls, nolabel /* tab delimited */
  • outsheet using jeeshim.xls, comma replace

DATA MANIPULATION

Editing

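  • keep gender grade korean math english // keep variables
  • keep if gender==1 // keep observations (cannot be combined with a variable list)
  • keep id q1-q20
  • drop temp1-temp5
  • drop if gender==0
  • drop temp* pro?
  • drop if income <5000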
  • drop in 1/10
  • drop if gender==1 in 1/100
  • edit
  • edit in 10/-5
  • edit if gender==0
  • edit in 1/100 if income >5000
  • edit male class if income >=30000
  • mark yn_miss
  • markout yn_miss q1-q10 // 0 if any of the variables is missing
  • isid college // to check for unique identifiers

Recoding

  • generate gender=male // or abbreviated: gen gender=male
  • gen square=gnp^2
  • gen grade=(score <= 90 | attendance==0) if final~=.
  • egen avg = rowmean(english math stats)
  • egen gnp_bar = mean(gnp), by(country)
  • replace gender=0
  • replace gender=1 if male==1
  • replace male=1 in 3
  • recode class 1=0 2=1 *=.
  • recode class 1/3=0 4=1 if male==0
  • recode grade 1 2 3 5=1 4=2
  • recode grade 9999=.
  • recode grade min/5=min
  • recode grade 6/max=max

Reshaping Data Sets

  • set obs 100 // to change the number of observations
  • sort male grade
  • gsort -grade name, gen(rank)
  • append using c:\data\class
  • app using c:\data\class, keep(id state q1-q10)
  • expand 5 in -10/-1 // duplicate observations n-1 times
  • merge using school // one-to-one merging
  • merge state using school // match merging
  • merge state using school university, update replace
  • joinby id using secondary, unmatched(master) // unm(both), unm(using)
  • move male grade
  • order grade male // order variables as listed
  • rename male gender; /* from male to gender */
  • expand 5 if state=="IN" // duplicate a subset of observations
  • collapse a b (sd) c (count) d (max)
  • collapse a b (sd) c (count) d (max), by(state)
  • contract gender degree area, freq(count) zero
  • reshape long choice, i(id) j(orders)
  • stack best1-best3, into(best) clear
  • pkshape id row col1-col3, order(abc cab bca) outcome(y) sequence(rows) treat(treat) period(columns)
  • compress // all variables
  • compress name grade
  • xpose, clear varname

REGRESSION

Ordinary Least Squares (OLS)

  • regress dv iv1 iv2
  • regress depend indep1-indep10, noconstant
  • regress income school job location if gender==0
  • regress income school job location if gender==0, noconstant level(95)
  • predict p // xb
  • predict r, residual
  • fitstat
  • quietly fitstat, saving(model1)
  • fitstat, using(model1)
  • fitstat, dif
  • bgodfrey, lag(1 2 3); estat bgodfrey, lag(1 2 3)
  • dwstat; estat dwatson
  • stepwise, pr(.2): regress y x1-x10 // backward stepwise regression
  • constraint define 1 d1+d2+d3=0 // LSDV2 in Stata
  • constraint define 2 g1+g2+g3+g4=0
  • cnsreg y x1 x2 d1-d3 g1-g4, constraints(1 2)

Hypothesis Test

  • test school /* Wald Test */
  • test school location; test school=location
  • test job; test school, accumulate
  • lrtest, saving(0); /* Likelihood Ratio Test */
  • lrtest, saving(1)
  • lrtest, using(1) model(0) /* 1 for full model and 0 for nested one*/
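
A minimal sketch of the estimate, predict, and test sequence listed above, run on the auto data set shipped with Stata. The model specification and the names yhat and ehat are assumptions chosen only for illustration.

```stata
sysuse auto, clear

* OLS with three regressors
regress price mpg weight foreign

* Fitted values (xb is the default) and residuals
predict yhat
predict ehat, residuals
summarize yhat ehat

* Wald tests on the most recent estimates
test mpg                      // H0: coefficient on mpg is zero
test mpg weight               // H0: both coefficients are jointly zero
test mpg = weight             // H0: the two coefficients are equal
```

predict and test always refer to the most recently fitted model, so they must follow the regress command they are meant to use.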

Advanced Models

  • boxcox //Box-Cox regression model
  • eivreg // errors-in-variables regression
  • fracpoly // Fractional polynomial regression
  • frontier //Stochastic frontier models
  • glm // generalized linear model
  • intreg //interval regression
  • ivreg //instrumental variables (two-stage least squares) regression
  • ivreg dv iv1 iv2 (iv3= x1 x2 x3) iv4 iv5
  • mfp //multivariable fractional polynomial models
  • mvreg //multivariate regression
  • newey //Regression with Newey-West standard errors
  • nl //nonlinear least-squares estimation
  • orthog //Orthogonalize variables and compute orthogonal polynomials
  • prais dv1 rhs, rho(tscorr) twostep //Prais-Winsten two-step
  • prais dv1 rhs, rho(dw) // iterative two-step
  • prais dv1 rhs, rho(dw) corc // Cochrane-Orcutt
  • qreg //Quantile (including median) regression
  • reg3 //three-stage estimation for systems of simultaneous equations
  • reg3 (dv1 x1 x2) (dv2 x1 x3)
  • reg3 (dv1 dv2 = x1 x2 x3)
  • reg3 (dv1 dv2 = x1 x2 x3) (dv3 x1 x3)
  • rocfit //fit ROC model
  • rreg //robust regression
  • stcox //fit Cox proportional hazards model
  • streg //fit parametric survival model
  • sureg //Zellner's seemingly unrelated regression
  • stepwise //stepwise estimation
  • treatreg //treatment-effects model
  • treatreg y x1 x2 x3, treat(x4=z1 z2) twostep
  • vwls //variance-weighted least squares

PANEL DATA
  • tsset group year // set group and time
  • xtreg y x1 x2, re i(year) // random effect model
  • xtreg y x1 x2, fe i(group) // fixed effect model
  • xtreg y x1 x2, be i(group) // between effect model
  • areg // linear regression with a large dummy-variable set
  • xtabond // Arellano-Bond linear, dynamic panel-data estimator
  • xtcloglog // Random-effects, population-averaged cloglog models
  • xtgee // fit population-averaged panel-data models using GEE
  • xtfrontier // stochastic frontier models for panel data
  • xtgls // fit panel-data models using GLS
  • xthtaylor // Hausman-Taylor estimator for error components models
  • xtintreg // random-effects interval data regression models
  • xtivreg // Instrumental variables and two-stage least squares
  • xtlogit //fixed-effects, random-effects, population-averaged logit
  • xtmixed // multilevel mixed-effects linear regression
  • xtprobit // random-effects and population averaged probit models
  • xttobit // random-effects tobit models
  • xtnbreg //fixed-effects, random-effects, and population-averaged NB
  • xtpcse // Prais-Winsten models with panel-corrected standard errors
  • xtpoisson //fixed-effects, random-effects, population-averaged Poisson
  • xtrc // random-coefficients models
  • xtregar // fixed- and random-effects linear models with an AR(1) disturbance

LOGIT / PROBIT MODELS

Binary Logit/Probit

  • logit dv iv1 iv2
  • logit card income school job if gender==0, nolog nocon
  • logistic dv iv1 iv2
  • probit dv iv1 iv2
  • predict p
  • prchange; prchange income, x(school=1 job=1)
  • prchange school, x(income=10000) help
  • prtab income school, rest(mean)
  • prgen income, from(0) to(10000) x(school=1) rest(mean)
  • prvalue, rest(mean)
  • prvalue, x(income=10000 job=1) rest(mean)
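
As a small illustration of the built-in binary-response commands above, the following sketch fits a logit and a probit on the auto data set shipped with Stata. The choice of foreign as the outcome and the name phat are assumptions for the example. The pr* commands in the list come from the user-written SPost package and are not used here.

```stata
sysuse auto, clear

* Logit: coefficients are log-odds
logit foreign mpg weight, nolog

* Predicted probability Pr(foreign = 1), the default for predict after logit
predict phat
summarize phat

* Same model reported as odds ratios, then as a probit
logistic foreign mpg weight
probit foreign mpg weight, nolog
```

logit and logistic fit the same model; they differ only in whether coefficients or odds ratios are displayed.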

Ordinal and Multinomial

  • ologit dv iv1 iv2 // Ordinal
  • ologit grade income school job if gender==0, nolog nocon
  • oprobit dv iv1 iv2
  • omodel logit card income school job // Approximate LR test
  • mlogit dv iv1 iv2; /* Nominal */
  • mlogit mode income school job, basecategory(1) nolog
  • mlogtest, lr
  • mlogtest, wald
  • mlogtest, hausman base
  • mprobit // multinomial probit regression
  • clogit dv iv1 iv2, group(var) // Conditional logit
  • clogit mode income school job, group(gender) nolog
  • nlogit // nested logit regression

Special Logit/Probit

  • asmprobit // alternative-specific multinomial probit
  • biprobit (dv1=rhs1) (dv2=rhs2) // bivariate probit
  • glogit //logit and probit for grouped data
  • heckprob dv rhs, select(rhs2) // probit model with selection
  • hetprob // heteroskedastic probit
  • ivprobit // probit model with endogenous regressors
  • rologit // rank-ordered logistic
  • scobit // Skewed logit
  • slogit // stereotype logistic
  • xtlogit // logit models for panel data
  • xtprobit // probit models for panel data

EVENT COUNT / LIMITED DV MODELS

Event Count Data Models

  • poisson dv iv1 iv2
  • nbreg dv iv1 iv2
  • zip dv iv1 iv2 // zero-inflated Poisson Model
  • zinb dv iv1 iv2 // zero-inflated NB Model
  • ztp dv iv1 iv2 // zero-truncated Poisson Model
  • ztnb dv iv1 iv2 // zero-truncated NB Model

Truncated/Censored/Self-selected

  • cnreg // Censored-normal regression
  • heckman // Heckman selection model
  • ivtobit // Tobit model with endogenous regressors
  • tobit // Tobit regression
  • truncreg // truncated regression
  • ztp dv iv1 iv2 // zero-truncated Poisson Model
  • ztnb dv iv1 iv2 // zero-truncated NB Model

Related Commands

  • bootstrap // bootstrap sampling and estimation
  • bsample // Sampling with replacement
  • jackknife // Jackknife estimation
  • impute // imputation
  • permute // Monte Carlo permutation test
  • simulate // Monte Carlo simulation
  • sampsi // sample size and power determination

ANOVA / T-TEST

T-Test

  • ttest grade==10
  • ttest grade, by(male)
  • ttest grade, by(male) unequal welch
  • ttest math=english; ttest math==english, unpaired
  • ttesti 100 88.1 5.2 90; /* N mean sd hypothesis */
  • ttesti 100 88.1 5.2 200 91 10.2; /* N1 mean1 sd1 N2 mean2 sd2 */
  • ttesti 100 88.1 5.2 200 91 10.2, unequal
  • mean // estimate means
  • total // estimate totals
  • ratio // estimate ratios
  • proportion // estimate proportions
  • ci // confidence intervals for means, proportions, and counts
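
The t-test commands above store their results in r(), which makes it possible to recompute a p-value by hand. The sketch below uses the auto data set shipped with Stata. The hypothesized mean of 20 and the summary figures passed to ttesti are arbitrary assumptions for illustration.

```stata
sysuse auto, clear

* One-sample test of H0: mean(mpg) = 20
ttest mpg == 20

* Two-sample test allowing unequal variances
ttest mpg, by(foreign) unequal

* Recompute the two-tailed p-value from the stored t and degrees of freedom
display 2*ttail(r(df_t), abs(r(t)))
display r(p)                  // the p-value stored by ttest, for comparison

* Immediate form: N, mean, sd, hypothesized mean
ttesti 74 21.3 5.8 20
```

The two display lines should agree, since r(p) is the two-sided p-value of the most recent test.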

ANOVA

  • anova score gender
  • anova score gender year gender*year
  • oneway score gender, tabulate
  • loneway //large one-way ANOVA, random effect, and reliability
  • sdtest // Variance-comparison test
  • Related: .manova; .pkshape; .xtmixed

MULTIVARIATE ANALYSIS

Correlation Analysis

  • correlate gnp interest inflation
  • corr gnp interest inflation, covariance
  • pcorr x1-x10 // partial correlation coefficients
  • pwcorr gnp interest inflation, sig
  • pwcorr gnp interest inflation, print(5) // .05 significance level
  • pwcorr gnp interest inflation, sig star(.05) // .05 level

Factor Analysis

  • factor x1-x30 // by default pf (principal factor)
  • factor x1-x30, ml // maximum likelihood factor
  • factor x1-x30, factors(5)
  • rotate, varimax // orthogonal, oblique, quartimax, equamax, parsimax, promax
  • pca // principal component analysis

Other Analysis

  • alpha // Cronbach's alpha
  • ca // correspondence analysis
  • canon // Canonical correlation
  • cluster // cluster analysis
  • mvreg // multivariate regression
  • manova // multivariate analysis of variance (MANOVA)
  • mds // multidimensional scaling for two way data
  • mdslong
  • mdsmat
  • biplot

NONPARAMETRIC ANALYSIS
  • swilk math english
  • sfrancia x1-x10
  • ranksum //Equality tests on unmatched data
  • signrank math=english // Equality tests on matched data
  • runtest // test for random order
  • spearman x1-x10
  • kwallis score, by(gender)
  • ksmirnov math, by(area)
  • alpha x1-x10, item
  • kappa eval1 eval2
  • bitest //Binomial probability test
  • prtest // one- and two-sample tests of proportions

PROGRAMMING

Basics

  • version; version 7
  • memory
  • set memory 100m
  • macro list // list all macros
  • macro list _x // display contents of a macro x
  • macro drop _all

Macros

  • global var "dv"; global i=1;
  • gen $var$i = income; gen ${var}${i}=income; // gen dv1=income
  • global rhs "x1 x2 x3"; regress y $rhs
  • local var "dv"; local i=1;
  • gen `var'`i' = income; // gen dv1=income
  • local rhs "x1 x2 x3"; regress y `rhs'
  • local x=stat[1,1] // from a matrix
  • display `x'
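
To make the macro expansion above concrete, here is a short do-file sketch using the auto data set shipped with Stata. The macro names stub, i, and rhs and the variable choices are hypothetical. Local macros exist only within the do-file or program that defines them, so the fragment should be run as a whole rather than line by line.

```stata
sysuse auto, clear

* Local macros: referenced with a left quote ` and a right quote '
local stub "dv"
local i = 1
generate `stub'`i' = price            // expands to: generate dv1 = price

* Global macro: referenced with a dollar sign
global rhs "mpg weight"
regress `stub'`i' $rhs                // expands to: regress dv1 mpg weight

* Inspect what the macros contain
display "`stub'`i' regressed on $rhs"
macro list rhs
```

Curly braces, as in ${rhs}, are only needed when the macro name is followed directly by other characters that would otherwise be read as part of the name.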

Loop

local i=0
foreach var of varlist var1-var10 {
   local i=`i'+1
   sum `var'
   display `i'
}

foreach var of varlist var1-var10 {
   ...
}

local car "Sonata Camry Accord"
foreach lm of local car {
   display "`lm'"
}

global car "Sonata Camry Accord"
foreach gm of global car {
   display "`gm'"
}

foreach str in Sonata Camry Accord {
   display "`str'"
}

forvalues num = 1/100 {
   ...
}

  • forvalues num = 1(1)100 {...}
  • forvalues num = 100(-1)1 {...}
  • forvalues num = 1 2: 100 {...}
  • forvalues num = 1 2 to 100 {...}
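
The loop skeletons above can be filled in as follows. This is a minimal runnable sketch on the auto data set shipped with Stata. The variable list and the loop bounds are arbitrary choices for illustration.

```stata
sysuse auto, clear

* foreach over a variable list, with a counter kept in a local macro
local i = 0
foreach var of varlist price mpg weight {
    local i = `i' + 1
    quietly summarize `var'
    display "`i'. `var': mean = " r(mean)
}

* forvalues over a numeric range with a step of 2
forvalues k = 1(2)7 {
    display "k = `k', k squared = " `k'^2
}
```

With of varlist, Stata expands ranges and wildcards such as var1-var10 into individual variable names before looping. With in, the list is taken literally, one space-separated word at a time.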

Matrices

  • mata // beginning of mata
  • : end // end of mata
  • :a=(1, 2 \ 3, 4); :b=(6, 3 \ 4, 9)
  • :a; :a[1,1]; :a[2,]; :b[,2]
  • :c=a+b; :d=a-b
  • :e=a*b; :f=a:*b //element by element
  • :diag(); :diagonal() // create a diagonal matrix and extract diagonal
  • :vec() //convert matrix into column vector
  • :trace(); :det(); :rank()
  • :cond(); :eigenvalues();
  • :cholesky()
  • :I(); :e() // identity matrix, unit vectors
  • :J(row, col, constant) // matrix of constants
  • matrix input Y=(1, 2 \ 3, 4)
  • mkmat x1 x2 x3, matrix(X) // convert variables into a matrix
  • svmat x, name(x) // convert a matrix into variables (x1, x2, x3)
  • matrix COV=get(VCE)
  • matrix BETA=get(_b)
  • matrix list X
  • matrix INVX=inv(X)
  • matrix DETX=det(X)
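
The following sketch runs a few of the matrix operations above, once in Mata and once with Stata's matrix command, using the same 2 x 2 matrices as in the list. The matrix names A and AINV are hypothetical. luinv() is one of several Mata inverse functions and is used here as an assumption about which one suits a small square matrix.

```stata
* Mata: entered with mata: and left with end
mata:
a = (1, 2 \ 3, 4)
b = (6, 3 \ 4, 9)
a*b                 // matrix product
a:*b                // element-by-element product
det(a)              // determinant
luinv(a)            // inverse of a square matrix via LU decomposition
end

* The same matrix with Stata's matrix command
matrix A = (1, 2 \ 3, 4)
matrix AINV = inv(A)
matrix list AINV
display det(A)
```

Inside a mata: block, typing an expression without an assignment simply displays its value.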

STATA GRAPHICS

Graphics Basics

  • sysuse auto, clear
  • graph bar (mean) mpg turn, by(foreign)
  • graph bar (mean) mpg turn, over(foreign)
  • graph hbar (mean) mpg turn, over(foreign)
  • graph hbar (mean) mpg, over(foreign) over(class)
  • graph hbar (mean) mpg, over(class) over(foreign)
  • graph hbar (mean) mpg, over(class) over(foreign, sort(1) descending)
  • graph hbar (sum) mpg turn, over(class) stack
  • graph hbar (sum) mpg turn, over(class) by(foreign)
  • graph hbar (sum) mpg turn, over(class) by(foreign) stack
  • graph dot (mean) mpg, over(class)
  • graph dot (mean) mpg, over(class) over(foreign)
  • graph matrix mpg price turn, half
  • graph pie, over(class)
  • graph pie mpg turn trunk, plabel(_all name)

Scatter and Two-way Plotting (Examples)

  • scatter mpg weight
  • scatter mpg weight, sort
  • scatter mpg weight, sort connect(l)
  • scatter mpg weight, sort title("MPG versus Weight") subtitle("Year 2006")
  • scatter mpg weight, title("MPG versus Weight") caption("Source: Stata Corp. 2006")
  • scatter mpg weight, title("MPG versus Weight") xsize(4) ysize(3)
  • scatter mpg weight, ytitle("MPG (Mileage)") xtitle("Car Weight")
  • scatter mpg weight, title("MPG versus Weight") ylabel(#8) xlabel(0(2000)6000)
  • scatter mpg weight, title("MPG versus Weight") ylabel(minmax) xlabel(minmax)
  • scatter mpg weight, title("MPG versus Weight") yscale(log) xlabel(#5) // log scales
  • scatter mpg weight, sort xline(4000) yline(25)
  • scatter mpg weight, title("MPG versus Weight") msymbol(triangle)
  • scatter mpg weight || fpfit mpg weight
  • twoway fpfitci mpg weight
  • twoway fpfitci mpg weight || scatter mpg weight, m(d)
  • scatter mpg weight, sort title("MPG versus Weight") m(diamond) by(foreign)
  • scatter mpg weight, sort m(t) by(foreign, total row(1))
  • twoway fpfitci mpg weight, sort m(t) by(foreign, total row(1))
  • twoway fpfitci mpg weight || scatter mpg weight, sort m(t) by(foreign, total row(1))
  • scatter mpg turn weight
  • scatter mpg turn weight , yline(30) xline(3500)
  • scatter mpg trunk turn weight
  • scatter mpg weight || scatter trunk weight || scatter turn weight
  • scatter mpg weight, sort c(l) || line trunk weight, sort || scatter turn weight
  • twoway (line mpg weight, sort c(l)) (dropline trunk weight, sort) (scatter turn weight)

Plotting by Functions

  • twoway function y=x^3, range(-5 5) xsize(4) ysize(3) xlabel(#10) xline(0)
  • twoway function y=normalden(x), range(-5 5) xsize(4) ysize(2) xlabel(#10) xline(0)
  • twoway function y=1/sqrt(2*_pi)*exp(-x^2/2), range(-5 5) xsize(4) ysize(2) xlabel(#10) xline(0)
  • twoway function y=normalden(x), range(-4 -1.96) xlabel(#10) xline(0) recast(area) || function y=normalden(x), range(1.96 4) recast(area) || function y=normalden(x), range(-1.96 1.96) lstyle(foreground)
  • twoway function t=tden(3, x), range(-5 5) xsize(4) ysize(2) xline(0)
  • twoway function t=tden(1, x), range(-5 5) xsize(4) ysize(2) color(blue) lstyle(p1solid) xlabel(-5(1)5) recast(area) || function z=normalden(x), range(-5 5) color(maroon) lwidth(thick)
  • scatter gear_ratio headroom, xsize(4) || function y=x, range(0 5)
  • twoway function c=chi2(1,x), range(0 5) xsize(4) ysize(3) yline(.5)
  • twoway function c=Fden(5, 10, x), range(0 5) xsize(4) ysize(3) yline(.3)