Stata

Better insight starts with Stata®

Stata statistical software provides everything you need for data science and inference:

data manipulation, exploration, visualization, statistics, reporting, and reproducibility.

Linear models

regression • censored outcomes • endogenous regressors • bootstrap, jackknife, and robust and cluster–robust variance • instrumental variables • three-stage least squares • constraints • quantile regression • GLS • more

Time series

ARIMA • ARFIMA • ARCH/GARCH • VAR • VECM • multivariate GARCH • unobserved-components model • dynamic factors • state-space models • Markov-switching models • business calendars • tests for structural breaks • threshold regression • forecasts • impulse–response functions • unit-root tests • filters and smoothers • rolling and recursive estimation • more

Data wrangling/data management

data transformations • data frames • match-merge • import/export data • ODBC • SQL • Unicode • by-group processing • append files • sort • row–column transposition • labeling • save results • more

Panel/longitudinal data

random and fixed effects with robust standard errors • linear mixed models • random-effects probit • GEE • random- and fixed-effects Poisson • dynamic panel-data models • instrumental variables • panel unit-root tests • more

Survival analysis

Kaplan–Meier and Nelson–Aalen estimators • Cox regression (frailty) • parametric models (frailty, random effects) • competing risks • hazards • time-varying covariates • left-, right-, and interval-censoring • Weibull, exponential, and Gompertz models • more

Reporting

reproducible reports • Word • Excel • PDF • HTML • dynamic documents • Markdown • Stata results and graphs • SVG • EPS • PNG • TIF • formatted text and tables • more

Multilevel mixed-effects models

continuous, binary, count, and survival outcomes • two-, three-, and higher-level models • generalized linear models • nonlinear models • random intercepts • random slopes • crossed random effects • BLUPs of effects and fitted values • hierarchical models • residual error structures • DDF adjustments • support for survey data • more

Bayesian analysis

thousands of built-in models • univariate and multivariate models • linear and nonlinear models • multilevel models • continuous, binary, ordinal, and count outcomes • bayes: prefix for 46 estimation commands • continuous univariate, multivariate, and discrete priors • add your own models • multiple chains • convergence diagnostics • posterior summaries • hypothesis testing • model fit • model comparison • predictions • more

Graphics

lines • bars • areas • ranges • contours • confidence intervals • interaction plots • survival plots • publication quality • customize anything • Graph Editor • more

Binary, count, and limited outcomes

logistic, probit, tobit • Poisson and negative binomial • conditional, multinomial, nested, ordered, rank-ordered, and stereotype logistic • multinomial probit • zero-inflated and left-truncated count models • selection models • marginal effects • more

Meta-analysis

effect sizes • common, fixed, and random effects • forest, funnel, and more plots • subgroup and cumulative analysis • meta-regression • small-study effects • publication bias • more

Programming features

adding new commands • scripting • object-oriented programming • menu and dialog-box programming • dynamic documents • Markdown • Project Manager • Python integration • Java plugins • C/C++ plugins • more

Choice models

discrete choice • rank-ordered alternatives • conditional logit • multinomial probit • nested logit • mixed logit • panel data • case-specific and alternative-specific predictors • interpret results—expected probabilities, covariate effects, comparisons across alternatives • more

Power, precision, and sample size

power • sample size • effect size • minimum detectable effect • CI width • means • proportions • variances • correlations • ANOVA • regression • cluster randomized designs • case–control studies • cohort studies • contingency tables • survival analysis • balanced or unbalanced designs • results in tables or graphs • more

Mata—Stata's serious programming language

interactive sessions • large-scale development projects • optimization • matrix inversions • decompositions • eigenvalues and eigenvectors • LAPACK engine • real and complex numbers • string matrices • interface to Stata datasets and matrices • numerical derivatives • object-oriented programming • more

Extended regression models (ERMs)

endogenous covariates • sample selection • nonrandom treatment • panel data • account for problems alone or in combination • continuous, interval-censored, binary, and ordinal outcomes • more

Treatment effects/Causal inference

inverse probability weight (IPW) • doubly robust methods • propensity-score matching • regression adjustment • covariate matching • multilevel treatments • endogenous treatments • average treatment effects (ATEs) • ATEs on the treated (ATETs) • potential-outcome means (POMs) • continuous, binary, count, fractional, and survival outcomes • panel data • more

Graphical user interface

menus and dialogs for all features • Data Editor • Variables Manager • Graph Editor • Project Manager • Do-file Editor • Clipboard Preview Tool • multiple preference sets • more

Generalized linear models (GLMs)

ten link functions • user-defined links • seven distributions • ML and IRLS estimation • nine variance estimators • seven residuals • more

Lasso

lasso • elastic net • model selection • prediction • inference • continuous, binary, and count outcomes • cross-validation • adaptive lasso • double selection • partialing out • cross-fit partialing out • double machine learning • endogenous covariates • more

Documentation

31 manuals • 15,000+ pages • seamless navigation • thousands of worked examples • quick starts • methods and formulas • references • more

Finite mixture models (FMMs)

fmm: prefix for 17 estimators • mixtures of a single estimator • mixtures combining multiple estimators or distributions • continuous, binary, count, ordinal, categorical, censored, truncated, and survival outcomes • more

SEM (structural equation modeling)

graphical path diagram builder • standardized and unstandardized estimates • modification indices • direct and indirect effects • continuous, binary, count, ordinal, and survival outcomes • multilevel models • random slopes and intercepts • factor scores, empirical Bayes, and other predictions • groups and tests of invariance • goodness of fit • handles MAR data by FIML • correlated data • survey data • more

Basic statistics

summaries • cross-tabulations • correlations • z and t tests • equality-of-variance tests • tests of proportions • confidence intervals • factor variables • more

Spatial autoregressive models

spatial lags of dependent variable, independent variables, and autoregressive errors • fixed and random effects in panel data • endogenous covariates • analyze spillover effects • more

Latent class analysis

binary, ordinal, continuous, count, categorical, fractional, and survival items • add covariates to model class membership • combine with SEM path models • expected class proportions • goodness of fit • predictions of class membership • more

Nonparametric methods

nonparametric regression • Wilcoxon–Mann–Whitney, Wilcoxon signed ranks, and Kruskal–Wallis tests • Spearman and Kendall correlations • Kolmogorov–Smirnov tests • exact binomial CIs • survival data • ROC analysis • smoothing • bootstrapping • more

ANOVA/MANOVA

balanced and unbalanced designs • factorial, nested, and mixed designs • repeated measures • marginal means • contrasts • more

Multiple imputation

nine univariate imputation methods • multivariate normal imputation • chained equations • explore pattern of missingness • manage imputed datasets • fit model and pool results • transform parameters • joint tests of parameter estimates • predictions • more

GMM and nonlinear regression

generalized method of moments (GMM) • nonlinear regression • more

Exact statistics

exact logistic and Poisson regression • exact case–control statistics • binomial tests • Fisher’s exact test for r × c tables • more

Survey methods

multistage designs • bootstrap, BRR, jackknife, linearized, and SDR variance estimation • poststratification • raking • calibration • DEFF • predictive margins • means, proportions, ratios, totals • summary tables • almost all estimators supported • more

Simple maximum likelihood

specify likelihood using simple expressions • no programming required • survey data • standard, robust, bootstrap, and jackknife SEs • matrix estimators • more

Epidemiology

standardization of rates • case–control • cohort • matched case–control • Mantel–Haenszel • pharmacokinetics • ROC analysis • ICD-10 • more

Cluster analysis

hierarchical clustering • kmeans and kmedian nonhierarchical clustering • dendrograms • stopping rules • user-extensible analyses • more

Programmable maximum likelihood

user-specified functions • NR, DFP, BFGS, BHHH • OIM, OPG, robust, bootstrap, and jackknife SEs • Wald tests • survey data • numeric or analytic derivatives • more

DSGE models

specify models algebraically • solve models • estimate parameters • identification diagnostics • policy and transition matrices • IRFs • dynamic forecasts • more

IRT (item response theory)

binary (1PL, 2PL, 3PL), ordinal, and categorical response models • item characteristic curves • test characteristic curves • item information functions • test information functions • multiple-group models • differential item functioning (DIF) • more

Other statistical methods

kappa measure of interrater agreement • Cronbach's alpha • stepwise regression • tests of normality • more

Tests, predictions, and effects

Wald tests • LR tests • linear and nonlinear combinations • predictions and generalized predictions • marginal means • least-squares means • adjusted means • marginal and partial effects • forecast models • Hausman tests • more

Multivariate methods

factor analysis • principal components • discriminant analysis • rotation • multidimensional scaling • Procrustean analysis • correspondence analysis • biplots • dendrograms • user-extensible analyses • more

Functions

statistical • random-number • mathematical • string • date and time • regular expressions • Unicode • more

Contrasts, pairwise comparisons, and margins

compare means, intercepts, or slopes • compare with reference category, adjacent category, grand mean, etc. • orthogonal polynomials • multiple-comparison adjustments • graph estimated means and contrasts • interaction plots • more

Internet capabilities

ability to install new commands • web updating • web file sharing • latest Stata news • more

Resampling and simulation methods

bootstrap • jackknife • Monte Carlo simulation • permutation tests • more

Community-contributed commands

search and download thousands of free additions • discover new features in the Stata Journal • share commands by posting to the SSC • discuss community-contributed commands on Statalist • more

Installation Qualification

IQ report for regulatory agencies such as the FDA • installation verification • more

FDA Compliance

Adherence to FDA regulatory requirements for statistical software • more

Accessibility

Section 508 compliance, accessibility for persons with disabilities • more

Linear models

Fit classical linear models of the relationship between a continuous outcome, such as weight, and the determinants of weight, such as height, diet, and levels of exercise.

Under the heading of least squares, Stata can fit ordinary regression models, instrumental-variables models, constrained linear regression, nonlinear least squares, and two-stage least-squares models. Stata can also fit quantile regression models, including median regression, which minimizes the sum of the absolute residuals.
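
To make the idea behind median regression concrete, here is a minimal Python sketch (not Stata code, and not how Stata fits these models). It covers only the simplest case, a model with no covariates, where the value that minimizes the sum of absolute residuals is the sample median. The weights are made-up illustrative numbers.

```python
from statistics import mean, median


def sum_abs_residuals(values, center):
    """Sum of absolute residuals around a candidate center."""
    return sum(abs(v - center) for v in values)


# Hypothetical weights in kilograms (illustrative only).
weights = [61.0, 64.5, 70.2, 72.8, 95.0]

med = median(weights)  # minimizes the sum of absolute residuals
avg = mean(weights)    # minimizes the sum of squared residuals

sad_at_median = sum_abs_residuals(weights, med)
sad_at_mean = sum_abs_residuals(weights, avg)
print(med, sad_at_median, sad_at_mean)
```

The median gives a smaller sum of absolute residuals than the mean, which is pulled toward the one large value. Median regression extends the same criterion to models with covariates.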

Time series

Handle all the statistical challenges inherent to time-series data—autocorrelations, common factors, autoregressive conditional heteroskedasticity, unit roots, cointegration, and much more. From graphing and filtering to fitting complex multivariate models, let Stata reveal the structure in your time-series data.

Data wrangling/data management

Scrape data from the web, import it from standard formats, or pull it in via ODBC and SQL. Match-merge, link, append, reshape, transpose, sort, filter. Stata handles Unicode, frames (multiple datasets in memory), BLOBs, regular expressions, and more, whether working with hundreds of thousands or even billions of data points.

Import from and export to Excel, and import SAS and SPSS files. Import from Haver Analytics databases. Low-level, cell-by-cell access lets you read and write Excel data, including graphs, formulas, date formats, currency formats, bold, italics, and more. Stata supports up to 1.5TB of RAM, with Stata/SE able to handle 32,767 variables. Stata/MP can accommodate 20 billion or more observations and 120,000 variables.

Panel/longitudinal data

Take full advantage of the extra information that panel data provide, while simultaneously handling the peculiarities of panel data. Study the time-invariant features within each panel, the relationships across panels, and how outcomes of interest change over time. Fit linear models or nonlinear models for binary, count, ordinal, censored, or survival outcomes with fixed-effects, random-effects, or population-averaged estimators. Fit dynamic models or models with endogeneity. And much more.

Survival analysis

Analyze duration outcomes—outcomes measuring the time to an event such as failure or death—using Stata's specialized tools for survival analysis. Account for the complications inherent in this type of data such as sometimes not observing the event (censoring), individuals entering the study at differing times (delayed entry), and individuals who are not continuously observed throughout the study (gaps). You can estimate and plot the probability of survival over time. Or model survival as a function of covariates using Cox, Weibull, lognormal, and other regression models. Predict hazard ratios, mean survival time, and survival probabilities. Do you have groups of individuals in your study? Adjust for within-group correlation using a random-effects or shared-frailty model.
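
As a rough illustration of estimating survival probabilities when some events are not observed, the following Python sketch computes a Kaplan–Meier curve by hand. It is not Stata code and ignores delayed entry and gaps. The function name `kaplan_meier` and the sample data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical data: five subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects count toward the number at risk until they leave, but they never reduce the survival estimate themselves.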

Reporting

With Stata's reporting features, you can easily incorporate Stata results and graphs with formatted text and tables in Word, PDF, HTML, and Excel formats. Take advantage of Stata's integrated versioning to create reproducible reports. Dynamic documents can be updated as your data change.

Multilevel mixed-effects models

Whether the groupings in your data arise in a nested fashion (students nested in schools and schools nested in districts) or in a nonnested fashion (regions crossed with occupations), you can fit a multilevel model to account for the lack of independence within these groups. Fit models for continuous, binary, count, ordinal, and survival outcomes. Estimate variances of random intercepts and random coefficients. Compute intraclass correlations. Predict random effects. Estimate relationships that are population averaged over the random effects. And much more.

Bayesian analysis

Fit Bayesian regression models using one of the Markov chain Monte Carlo (MCMC) methods. You can choose from a variety of supported models or even program your own. Extensive tools are available to check convergence, including multiple chains. Compute posterior mean estimates and credible intervals for model parameters and functions of model parameters. You can perform both interval- and model-based hypothesis testing. Compare models using Bayes factors. Compute model fit using posterior predictive p-values. Generate predictions. And much more.

Graphics

An important feature of Stata is that it has no modes or modules. The graphics commands are always available, so you can fit a regression and graph the residuals without performing computer gymnastics. Stata's graphs are designed not only to look good but to be informative analytic tools.

Graph types include bar charts, box plots, histograms, spike plots, pie charts, scatterplot matrices, dot charts, line charts, area charts, and two-way scatterplots.

Customize your charts with point accuracy. Merge and combine graphs.

Binary, fractional, count, and limited outcomes

Is your response binary (for example, employed or unemployed), ordinal (education level), count (number of children), or censored (ticket sales in an existing venue)? Stata has maximum likelihood estimators—logistic, probit, ordered probit, multinomial logit, Poisson, tobit, and many others—that estimate the relationship between such outcomes and their determinants. A vast array of tools is available to analyze such models. Predict outcomes and their confidence intervals. Test equality of parameters or any linear or nonlinear combination of parameters. And much more.

Meta-analysis

Combine results of multiple studies to estimate an overall effect. Use forest plots to visualize results. Evaluate study heterogeneity with subgroup analysis or meta-regression. Use funnel plots and formal tests to explore publication bias and small-study effects. Assess the impact of publication bias on results with trim-and-fill analysis. Perform cumulative meta-analysis. Use the meta suite of commands, or let the Control Panel interface guide you through your entire meta-analysis.
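
One common way to combine study results is inverse-variance weighting under a common-effect model. The Python sketch below shows that calculation. It is not Stata code and not necessarily the method Stata's meta commands use by default. The function name `pool_common_effect` and the three studies are hypothetical.

```python
import math


def pool_common_effect(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.05]

pooled, pooled_se = pool_common_effect(effects, std_errors)
print(round(pooled, 4), round(pooled_se, 4))
```

More precise studies (smaller standard errors) get larger weights, so the pooled estimate sits closest to the third study here.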

Programming features

Stata provides powerful programming features to extend the scope of Stata. Program in ado, the Stata scripting language, or in Mata, Stata's compiled matrix programming language. Integrate with Python, or call Java plugins directly from Stata.

Choice models

Model your discrete-choice data—say, a choice to travel by bus, train, car, or airplane—with a conditional logit, multinomial probit, or mixed logit model. Is your outcome instead a ranking of preferred travel methods? Fit a rank-ordered probit or rank-ordered logit model. Regardless of the model fit, you can use margins to easily interpret the results. Estimate how much wait times at the airport affect the probability of traveling by air or even by train.

Power, precision, and sample size

Before you conduct your experiment, determine the sample size needed to detect meaningful effects without wasting resources. Do you intend to compute confidence intervals (CIs) or perform hypothesis tests? For hypothesis testing, use Stata's power commands or interactive Control Panel to compute power and sample size, create customized tables, and automatically graph the relationships between power, sample size, and effect size for your planned study. For CIs, use Stata's ciwidth commands to do the same but compute precision or CI width instead of effect size and probability of CI width instead of power.
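
To show how power, sample size, and effect size relate, here is a Python sketch of the textbook normal-approximation formula for comparing two means with equal group sizes and a known common standard deviation. It is an assumption-laden illustration, not Stata's algorithm, and Stata's power commands may give slightly different answers. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-sample z test.

    delta -- difference in means to detect
    sd    -- common standard deviation, assumed known
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)


# Detect a 5-unit difference when the standard deviation is 10.
print(n_per_group(5, 10))
print(n_per_group(5, 10, power=0.9))
```

Asking for more power, or for a smaller detectable difference, raises the required sample size.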

Mata—Stata's serious programming language

Mata is a programming language that looks a lot like Java and C, but adds direct support for matrix programming. Mata is a compiled language, which makes it fast. You can use Mata interactively when you want to quickly perform matrix calculations, or you can use Mata when you need to write complex programs. Mata has the structures, pointers, and classes that you expect in your programming language. In fact, Mata is Stata's development language. Most new features of Stata are written in Mata. This includes multilevel modeling, latent class analysis, Bayesian estimation, and even the core algorithms of the graphical SEM Builder. But Mata is not just for Stata developers; you too can take advantage of this powerful programming language.

Extended regression models (ERMs)

Extended regression models (ERMs) is our name for a specific class of models that address several complications that arise frequently in data: 1) endogenous covariates, 2) sample selection, 3) nonrandom treatment assignment, and 4) within-panel correlation. These complications can occur alone or in any combination. ERMs allow you to make valid inferences as if these complications did not occur in your data.

Treatment effects/Causal inference

Stata's treatment effects allow you to estimate experimental-type causal effects from observational data. Whether you are interested in a continuous, binary, count, fractional, or survival outcome, and whether you are modeling the outcome process or the treatment process, Stata can estimate your treatment effect. With the most comprehensive set of treatment-effects estimators available in any software package, you will find the one that's right for you.

Graphical user interface

You can access all of Stata’s data management, statistical, and analysis features from the menus and associated dialogs. Select any feature from the Data, Graphics, or Statistics menu and fill in the resulting dialog. All features can be found in the menus, from generating a new variable to match-merges and reshaping datasets, from tabulations and summary statistics to negative binomial regression of a count outcome with survey data.

To reach the dialog for that last example, click the Statistics menu, then select Survey data analysis, Count outcomes, and Negative binomial regression.

Besides making Stata easier to use, the GUI allows you to discover features you never knew existed. Just to make it easier, there is a topical index built into the online help system.

Generalized linear models

Access a range of link functions and distribution families: Gaussian (normal), inverse Gaussian, Bernoulli/binomial, Poisson, negative binomial, and gamma. Use predictors and calculate residuals.

Lasso

With Stata's lasso and elastic net features, you can perform model selection and prediction for your continuous, binary, and count outcomes. Want to estimate effects and test coefficients? With cutting-edge inferential methods, you can make inferences for variables of interest while lassos select control variables for you. You can even account for endogenous covariates.

Documentation

Every installation of Stata includes all the documentation in PDF format. Stata’s documentation consists of over 15,000 pages detailing each feature in Stata including the methods and formulas and fully worked examples. You can transition seamlessly across entries using the links within each entry.

Finite mixture models (FMMs)

Populations are often divided into groups or subpopulations—age groups, income brackets, levels of education. Regression models or distributions likely differ across these groups. But sometimes we don't have a variable that identifies the groups. Perhaps the identifying variable is simply missing. Perhaps it is hard to collect—honest reporting of drug use, sex of goldfish, etc. Perhaps it is inherently unobservable—penchant for risky behavior, high propensity to save money, etc. In such cases, we can use finite mixture models (FMMs) to model the probability of belonging to each unobserved group, to estimate distinct parameters of a regression model or distribution in each group, to classify individuals into the groups, and to draw inferences about how each group behaves.

Structural equation modeling (SEM)

Estimate mediation effects, analyze the relationship between an unobserved latent concept such as depression and the observed variables that measure depression, model a system with many endogenous variables and correlated errors, or fit a model with complex relationships among both latent and observed variables. Fit models with continuous, binary, count, ordinal, fractional, and survival outcomes. Even fit multilevel models with groups of correlated observations such as children within the same schools. Evaluate model fit. Compute indirect and total effects. Fit models by drawing a path diagram or using the straightforward command syntax.

Basic statistics

Summaries, tables, tabulations, pairwise comparisons, factor variables, t tests, z tests, tests of proportions, effect sizes, correlations, and other common tests such as the binomial test, Bartlett's test, the chi-squared test, and the variance-ratio test (F test).

Spatial autoregressive models

Spatial autoregressive (SAR) models are fit using datasets that contain observations on geographical areas or on any units with a spatial representation. Fit linear models with autoregressive errors and spatial lags of the dependent and independent variables. Specify spatial lags using spatial weighting matrices. Create standard weighting matrices, such as inverse distance or nearest neighbor, or create custom matrices. Fit random- and fixed-effects models for spatial panel data. Explore direct and indirect effects of covariates after fitting models.

Latent class analysis (LCA)

Discover and understand unobserved groups (latent classes) in your data, whether the groups are consumers with different buying preferences, healthy and unhealthy individuals, or teens with high, medium, and low risk of high school dropout. You can use LCA as a model-based method of classification. Or you can fit SEM path models and test for differences across the unobserved groups. Estimate the proportion of the population in each group, estimate group means, and more.

Nonparametric methods

Stata provides a range of nonparametric tests, nonparametric kernel regression, and nonparametric series regression. Access quantile regression, centiles and confidence intervals, resampling and simulations, smoothing, and ROC analysis.

ANOVA/MANOVA

Fit one- and two-way models. Or fit models with three, four, or even more factors. Analyze data with nested factors, with fixed and random factors, or with repeated measures. Use ANCOVA models when you have continuous covariates and MANOVA models when you have multiple outcome variables. Further explore the relationships between your outcome and predictors by estimating effect sizes and computing least-squares and marginal means. Perform contrasts and pairwise comparisons. Analyze and plot interactions. And much more.

Multiple imputation

Account for missing data in your sample using multiple imputation. Choose from univariate and multivariate methods to impute missing values in continuous, censored, truncated, binary, ordinal, categorical, and count variables. Then, in a single step, estimate parameters using the imputed datasets, and combine results. Fit a linear model, logit model, Poisson model, multilevel model, survival model, or one of the many other supported models. Use the mi command, or let the Control Panel interface guide you through your entire MI analysis.

Generalized method of moments (GMM) and nonlinear regression

The generalized method of moments (GMM) is a very flexible estimation framework that has become a workhorse of modern econometric analysis. Unlike maximum likelihood estimation, GMM does not require the user to make strong distributional assumptions, thus providing for more robust estimates. Moreover, GMM is broad-based in that other commonly used estimators like least-squares and maximum likelihood can be viewed as special cases of GMM. GMM is popular in economics not only because of its favorable statistical properties, but also because many theoretical models, such as those involving rational expectations, naturally yield the moment conditions that underlie GMM.

Exact statistics

Stata's exlogistic fits exact logistic regression models and provides more reliable statistical inference with small-sample datasets. The dependent variable can be Bernoulli (0 or 1) or binomial (the number of successes in n trials). Exact joint hypothesis tests can be performed, and predictions with exact confidence intervals can be obtained.

Stata also supports exact Poisson regression, postestimation selectors, exact epidemiologic statistics, and more.

Survey methods

Whether your data require simple weighted adjustment because of differential sampling rates or you have data from a complex multistage survey, Stata's survey features can provide you with correct standard errors and confidence intervals for your inferences. All you need to do is specify the relevant characteristics of your sampling design: sampling weights (including weights at multiple stages), clustering (at one, two, or more stages), stratification, and poststratification. After that, most of Stata's estimation commands can adjust their estimates to correct for your sampling design.

Maximum likelihood without programming

Maximization of user-specified likelihood functions has long been a hallmark of Stata, but you have had to write a program to calculate the log-likelihood function. Now it is even easier. The only requirements are that you be able to write the log likelihood for individual observations and that the log likelihood for the entire sample be the sum of the individual values.
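
The two requirements can be illustrated with a short Python sketch (not Stata syntax): a log likelihood written for one observation, and a sample log likelihood that is simply the sum of those values. The example assumes a Poisson model with a single rate parameter and uses a crude grid search in place of a real optimizer. The data and function names are hypothetical.

```python
import math


def loglik_obs(y, rate):
    """Poisson log likelihood for a single observation y."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def loglik_sample(ys, rate):
    """Sample log likelihood: the sum of the per-observation values."""
    return sum(loglik_obs(y, rate) for y in ys)


# Hypothetical count data.
counts = [2, 0, 3, 1, 4]

# Grid search over candidate rates 0.01, 0.02, ..., 10.00.
grid = [k / 100 for k in range(1, 1001)]
rate_hat = max(grid, key=lambda rate: loglik_sample(counts, rate))
print(rate_hat)
```

For the Poisson model, the maximizing rate coincides with the sample mean of the counts, which gives a quick check on the search.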

Epidemiology

Stata provides a range of tools specific to epidemiology, from epidemiological tables to ICD-10 and ICD-9 codes. Stata handles receiver operating characteristic (ROC) analysis, pharmacokinetics, the kappa measure of interrater agreement, and Brier score decomposition.

Cluster analysis

Stata is known for its cluster analysis, with features such as hierarchical clustering with single, complete, average, Ward's, weighted-average, centroid, and median linkage. It includes a range of similarity and dissimilarity measures for binary data, the Gower measure for mixed binary and continuous data, stopping rules, and dendrograms (full trees, subtrees, upper portion of trees, vertical or horizontal orientation, and branch counts).

Programmable maximum likelihood

In addition to providing built-in commands to fit many standard maximum likelihood models, such as logistic, Cox, Poisson, etc., Stata can maximize user-specified likelihood functions through programs to calculate the log of the likelihood function.

Stata’s likelihood-maximization procedures have been designed for both quick-and-dirty work and writing prepackaged estimation routines that obtain results quickly and robustly. For instance, Stata fits negative binomial regressions (a variation on Poisson regression) and Heckman selection models. We wrote those routines using Stata's ml command, although most users are not aware of that. They think that negative binomial and Heckman selection are just two more things Stata can do.

DSGE models

Write your model using a simple syntax. Solve the model at specified parameter values or estimate model parameters by maximum likelihood. Graph impulse responses and compare models.

Item response theory (IRT)

Explore the relationship between unobserved latent characteristics such as mathematical aptitude and the probability of correctly answering test questions (items). Or explore the relationship between unobserved health and self-reported responses to questions about mobility, independence, and other health-affected activities. IRT can be used to create measures of such unobserved traits or place individuals on a scale measuring the trait. It can also be used to select the best items for measuring a latent trait. IRT models are available for binary, graded, rated, partial-credit, and nominal response items. Visualize the relationships using item characteristic curves, and measure overall test performance using test information functions. And much more.

Other statistical methods

In Stata you can compute Cronbach's alpha and intraclass correlations, and use stepwise regression, Box-Cox transformations, power transformations, orthogonal polynomials, and tests of normality (Shapiro-Wilk, Shapiro-Francia, skewness and kurtosis test, Doornik-Hansen, Henze-Zirkler, and two tests by Mardia).
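
For readers unfamiliar with the first statistic in that list, the Python sketch below computes Cronbach's alpha from the standard formula: k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. It is an illustration only, not Stata code. The function name cronbach_alpha and the response data are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for k items scored on the same respondents.

    items: list of k lists, one per item, each holding n respondent scores.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_variances = [statistics.variance(scores) for scores in items]
    # Total score per respondent, then its sample variance.
    totals = [sum(row) for row in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores: three items, five respondents.
responses = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```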

Tests, predictions, and effects

Perform hypothesis tests and generalized tests, obtain predictions and generalized predictions, and carry out marginal analysis. You can also use the postestimation selectors and access forecast models.

Multivariate methods

Use multivariate analyses to evaluate relationships among variables from many different perspectives. Perform multivariate tests of means, or fit multivariate regression and MANOVA models. Explore relationships between two sets of variables, such as aptitude measurements and achievement measurements, using canonical correlation. Examine the number and structure of latent concepts underlying a set of variables using exploratory factor analysis. Or use principal component analysis to find underlying structure or to reduce the number of variables used in a subsequent analysis. Discover groupings of observations in your data using cluster analysis. If you have known groups in your data, describe differences between them using discriminant analysis. And much more.

Functions

Statistical functions: beta and noncentral beta distributions, binomial distribution, Cauchy distribution, chi-squared and noncentral chi-squared distributions, Dunnett's multiple range, exponential distribution, F and noncentral F distributions, gamma distribution, hypergeometric distribution, Laplace distribution, logistic distribution, negative binomial distribution, normal distribution, log of normal and binormal distributions, Poisson distribution, Student's t and noncentral Student's t distributions, Tukey's Studentized range, Weibull distribution, Weibull (proportional hazards) distribution, Wishart and inverse Wishart distributions.

Date and time functions: a complete suite of functions for manipulating dates and times, including support for business calendars and leap seconds. Convert dates and times from strings; extract seconds, minutes, hours, day, day of week, week of year, month, quarter, half year, or year from a specified date; and convert between units and between business dates and regular dates.

Time series functions

Random number functions and generators

Mathematical functions and trigonometric functions

String functions

Matrix functions and low-level programming functions

Contrasts, pairwise comparisons, and margins

Contrasts, pairwise comparisons, marginal means and marginal effects let you analyze the relationships between your outcome variable and your covariates, even when that outcome is binary, count, ordinal, categorical, or survival. Compute adjusted predictions with covariates set to interesting or representative values. Or compute marginal means for each level of a categorical covariate. Make comparisons of the adjusted predictions or marginal means using contrasts. After fitting almost any model in Stata, analyze the effect of covariate interactions, and easily create plots to visualize those interactions.

Internet capabilities

Stata can share datasets and programs over the Internet. You can use Stata to search the Internet for community contributions to Stata, update Stata over the web, obtain help files over the Internet, and download datasets and manuals from the web.

Resampling and simulation methods

Stata can be used for bootstrap sampling and estimation, jackknife estimation, permutation tests, and Monte Carlo simulation.
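
To show what the first of these methods does, here is a small Python sketch of a nonparametric bootstrap for the standard error of a sample mean. It illustrates the general technique only. It is not Stata code, and the function name bootstrap_se_mean, the seed, the replication count, and the data are all assumptions made for the example.

```python
import math
import random
import statistics

def bootstrap_se_mean(data, reps=2000, seed=12345):
    """Bootstrap standard error of the mean.

    Draws `reps` resamples of len(data) observations with replacement,
    computes the mean of each, and returns the standard deviation of
    those replicate means.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    replicate_means = []
    for _ in range(reps):
        resample = rng.choices(data, k=n)
        replicate_means.append(statistics.mean(resample))
    return statistics.stdev(replicate_means)

# Hypothetical measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
se_boot = bootstrap_se_mean(sample)
se_formula = statistics.stdev(sample) / math.sqrt(len(sample))
print(f"bootstrap SE: {se_boot:.3f}   textbook SE: {se_formula:.3f}")
```

For the mean, the bootstrap estimate should land close to the textbook formula. The technique earns its keep for statistics that have no simple standard-error formula. Fixing the seed is what makes a resampling analysis reproducible from run to run.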

Community-contributed commands

The Stata community is represented by a diverse group of researchers from a broad spectrum of fields, from anthropology to biostatistics, economics, finance, political science, psychology, public health, sociology, survey research, and zoology. Stata’s programming language lets users write commands that behave just like official Stata commands, and many users make their commands available to others through channels such as the Stata Journal, the SSC archive, or their own website. Stata’s search, net search, and ssc commands make finding and installing those commands a snap. So even if you don’t see something listed on our Features page, another user may have already written and made available a command to solve your problem.

Installation Qualification

Installation Qualification (IQ) is provided by a tool you can download for free. The Installation Qualification Tool produces a report suitable for submission to regulatory agencies, such as the FDA, verifying that Stata has been installed properly. The report can be printed or saved as a PDF.

Regulatory compliance

The U.S. Food and Drug Administration (FDA) accepts new drug applications and medical device trials performed using Stata. Although the FDA does not require use of any specific software for statistical analyses, they emphasize in their Statistical Software Clarifying Statement that:

“The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.”

Stata satisfies these requirements and is one of the most respected and validated statistical tools for analyzing clinical data from pre-clinical through phase IV trials. At each step of analysis, you can rely on Stata’s extensive suite of statistics, data management, and graphics tools to provide accurate and reproducible results. We take reproducibility seriously. Stata is the only statistical package with integrated versioning. If you wrote a script to perform an analysis in 1985, that same script will still run and still produce the same results today. The same is true for any dataset you create.

Accessible

All of Stata’s features are accessible to persons with disabilities through Stata’s command line interface and plain text output options. In addition, Stata is interoperable with common assistive technologies for vision-impaired users.
