Fit classical linear models of the relationship between a continuous outcome, such as weight, and the determinants of weight, such as height, diet, and levels of exercise.
Under the heading of least squares, Stata can fit ordinary regression models, instrumental-variables models, constrained linear regression, nonlinear least squares, and two-stage least-squares models. Stata can also fit quantile regression models, which include median regression (minimization of the sum of the absolute residuals).
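The Python sketch below (not Stata code) illustrates the difference between the two criteria. It fits the simplest possible model, a single constant, to a small hypothetical sample that contains one outlier, and uses a brute-force grid search in place of a real optimizer. Minimizing the sum of squared residuals recovers the mean. Minimizing the sum of absolute residuals recovers the median.

```python
from statistics import mean, median

def sum_squared(c, ys):
    """Sum of squared residuals when every y is predicted by the constant c."""
    return sum((y - c) ** 2 for y in ys)

def sum_absolute(c, ys):
    """Sum of absolute residuals when every y is predicted by the constant c."""
    return sum(abs(y - c) for y in ys)

# Hypothetical sample with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

# Brute-force search over candidate constants 0.00, 0.01, ..., 100.00.
grid = [i / 100 for i in range(10001)]
ls_fit = min(grid, key=lambda c: sum_squared(c, ys))
lad_fit = min(grid, key=lambda c: sum_absolute(c, ys))

print(ls_fit, mean(ys))     # 22.0 22.0  (least squares -> mean)
print(lad_fit, median(ys))  # 3.0 3.0    (least absolute deviations -> median)
```

The single outlier pulls the least-squares constant up to 22 but leaves the median fit at 3. This is why median regression is often preferred when outliers are a concern.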
Handle the statistical challenges inherent to time-series data: autocorrelation, common factors, autoregressive conditional heteroskedasticity, unit roots, cointegration, and much more. From graphing and filtering to fitting complex multivariate models, let Stata reveal the structure in your time-series data.
Scrape data from the web, import it from standard formats, or pull it in via ODBC and SQL. Match-merge, link, append, reshape, transpose, sort, and filter your data. Stata handles Unicode, frames (multiple datasets in memory), BLOBs, regular expressions, and more, whether you are working with hundreds of thousands or even billions of data points.
Import data from and export data to Excel, and import SAS and SPSS files. Import from Haver Analytics databases. Use low-level, cell-by-cell access to read and write Excel data, including graphs, formulas, date formats, currency formats, bold, italics, and more. Stata supports up to 1.5TB of RAM, with Stata/SE able to handle 32,767 variables. Stata/MP can accommodate 20 billion or more observations and 120,000 variables. See for details of Stata/MP.
Take full advantage of the extra information that panel data provide, while simultaneously handling the peculiarities of panel data. Study the time-invariant features within each panel, the relationships across panels, and how outcomes of interest change over time. Fit linear models or nonlinear models for binary, count, ordinal, censored, or survival outcomes with fixed-effects, random-effects, or population-averaged estimators. Fit dynamic models or models with endogeneity. And much more.
Analyze duration outcomes, that is, outcomes measuring the time to an event such as failure or death, using Stata's specialized tools for survival analysis. Account for the complications inherent in this type of data, such as sometimes not observing the event (censoring), individuals entering the study at different times (delayed entry), and individuals who are not continuously observed throughout the study (gaps). Estimate and plot the probability of survival over time, or model survival as a function of covariates using Cox, Weibull, lognormal, and other regression models. Estimate hazard ratios, and predict mean survival time and survival probabilities. Do you have groups of individuals in your study? Adjust for within-group correlation using a random-effects or shared-frailty model.
With Stata's reporting features, you can easily combine Stata results and graphs with formatted text and tables in Word, PDF, HTML, and Excel documents. Take advantage of Stata's integrated versioning to create reproducible reports, and use dynamic documents that update as your data change.
Whether the groupings in your data arise in a nested fashion (students nested in schools and schools nested in districts) or in a nonnested fashion (regions crossed with occupations), you can fit a multilevel model to account for the lack of independence within these groups. Fit models for continuous, binary, count, ordinal, and survival outcomes. Estimate variances of random intercepts and random coefficients. Compute intraclass correlations. Predict random effects. Estimate relationships that are population averaged over the random effects. And much more.
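The simplest case is a linear two-level model with a random intercept. Here the intraclass correlation is the share of total variance that lies between groups. The Python helper below illustrates that formula. It is hypothetical, not a Stata command, and the variance components passed to it stand in for estimates from a fitted model.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """ICC for a linear two-level random-intercept model: the share of the
    total variance that is due to differences between groups."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total

# Hypothetical estimates: school-level intercept variance 2.0,
# student-level residual variance 6.0.
print(intraclass_correlation(2.0, 6.0))  # 0.25
```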
Fit Bayesian regression models using one of the Markov chain Monte Carlo (MCMC) methods. You can choose from a variety of supported models or even program your own. Extensive tools are available to check convergence, including multiple chains. Compute posterior mean estimates and credible intervals for model parameters and functions of model parameters. You can perform both interval- and model-based hypothesis testing. Compare models using Bayes factors. Assess model fit using posterior predictive p-values. Generate predictions. And much more.
An important feature of Stata is that it has no modes or modules. The graphics commands are always available, so you can fit a regression and graph the residuals without performing computer gymnastics. Stata's graphs are designed not only to look good but to be informative analytic tools.
Graph types include bar charts, box plots, histograms, spike plots, pie charts, scatterplot matrices, dot charts, line charts, area charts, and two-way scatterplots.
Customise your charts with pinpoint accuracy, and merge and combine graphs.
Is your response binary (for example, employed or unemployed), ordinal (education level), count (number of children), or censored (ticket sales in an existing venue)? Stata has maximum likelihood estimators, including logistic, probit, ordered probit, multinomial logit, Poisson, tobit, and many others, that estimate the relationship between such outcomes and their determinants. A vast array of tools is available to analyze such models. Predict outcomes and their confidence intervals. Test equality of parameters or any linear or nonlinear combination of parameters. And much more.
Combine results of multiple studies to estimate an overall effect. Use forest plots to visualize results. Evaluate study heterogeneity with subgroup analysis or meta-regression. Use funnel plots and formal tests to explore publication bias and small-study effects. Assess the impact of publication bias on results with trim-and-fill analysis. Perform cumulative meta-analysis. Use the meta suite of commands, or let the Control Panel interface guide you through your entire meta-analysis.
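The core of a common-effect (fixed-effect) meta-analysis is inverse-variance weighting. The Python sketch below uses hypothetical study results. It pools the estimates, computes the standard error of the pooled effect, and reports Cochran's Q and the I-squared heterogeneity measure. It illustrates the arithmetic only and does not reproduce the random-effects methods or the full output of Stata's meta suite.

```python
import math

def common_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect, its standard error,
    Cochran's Q heterogeneity statistic, and I-squared."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need matching, non-empty lists of effects and SEs")
    weights = [1.0 / se ** 2 for se in std_errors]  # precision weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of variability attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2

# Hypothetical log odds ratios and standard errors from three studies.
pooled, se, q, i2 = common_effect_meta([0.2, 0.5, 0.3], [0.1, 0.2, 0.2])
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # approximate 95% CI
```

The most precise study (standard error 0.1) receives four times the weight of each of the others, so the pooled estimate sits closer to its value.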
Stata provides powerful programming features to extend its scope. Program in ado, Stata's scripting language, or in Mata, Stata's compiled matrix programming language. Integrate Python code, and call Java plugins directly from Stata.
Model your discrete-choice data (say, a choice to travel by bus, train, car, or airplane) with a conditional logit, multinomial probit, or mixed logit model. Is your outcome instead a ranking of preferred travel methods? Fit a rank-ordered probit or rank-ordered logit model. Regardless of the model you fit, you can use margins to easily interpret the results. For example, estimate how much wait times at the airport affect the probability of traveling by air or even by train.
Before you conduct your experiment, determine the sample size needed to detect meaningful effects without wasting resources. Do you intend to compute confidence intervals (CIs) or perform hypothesis tests? For hypothesis testing, use Stata's power commands or interactive Control Panel to compute power and sample size, create customized tables, and automatically graph the relationships between power, sample size, and effect size for your planned study. For CIs, use Stata's ciwidth commands to do the same but compute precision or CI width instead of effect size and probability of CI width instead of power.
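The Python sketch below gives a rough sense of what a sample-size calculation does. It applies the textbook normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes and a common standard deviation. The function name and defaults are hypothetical. Stata's power commands may use the t distribution or other refinements, so their answers can differ slightly.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample test of means
    (equal group sizes, common SD), using the normal approximation."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the two-sided test
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                   # round up to whole subjects

# Detect a half-standard-deviation difference with 80% power at alpha = 0.05.
print(n_per_group(delta=0.5, sd=1.0))  # 63
```

Raising the target power to 90% increases the requirement to 85 per group under the same approximation.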
Mata is a programming language that looks a lot like Java and C, but adds direct support for matrix programming. Mata is a compiled language, which makes it fast. You can use Mata interactively when you want to quickly perform matrix calculations, or you can use Mata when you need to write complex programs. Mata has the structures, pointers, and classes that you expect in your programming language. In fact, Mata is Stata's development language. Most new features of Stata are written in Mata. This includes multilevel modeling, latent class analysis, Bayesian estimation, and even the core algorithms of the graphical SEM Builder. But Mata is not just for Stata developers; you too can take advantage of this powerful programming language.
Extended regression models (ERMs) is our name for a specific class of models that address several complications that arise frequently in data: 1) endogenous covariates, 2) sample selection, 3) nonrandom treatment assignment, and 4) within-panel correlation. These complications can occur alone or in any combination. ERMs allow you to make valid inferences as if these complications did not occur in your data.
Stata's treatment-effects features allow you to estimate experimental-type causal effects from observational data. Whether you are interested in a continuous, binary, count, fractional, or survival outcome, and whether you are modeling the outcome process or the treatment process, Stata can estimate your treatment effect. With the most comprehensive set of treatment-effects estimators available in any software package, you will find the one that's right for you.
You can access all of Stata’s data management, statistical, and analysis features from the menus and associated dialogs. Select any feature from the Data, Graphics, or Statistics menu and fill in the resulting dialog. All features can be found in the menus, from generating a new variable to match-merges and reshaping datasets, from tabulations and summary statistics to negative binomial regression of a count outcome with survey data.
We displayed the dialog by clicking the Statistics menu, selecting Survey data analysis, selecting Count outcomes, and selecting Negative binomial regression.
Besides making Stata easier to use, the GUI helps you discover features you never knew existed. To make discovery easier still, the online help system includes a built-in topical index.
Fit generalized linear models with a range of link functions and distribution families, from the Gaussian (normal) to the negative binomial and gamma, along with Bernoulli/binomial, inverse Gaussian, and Poisson. Compute predictions and residuals.
With Stata's lasso and elastic net features, you can perform model selection and prediction for your continuous, binary, and count outcomes. Want to estimate effects and test coefficients? With cutting-edge inferential methods, you can make inferences for variables of interest while lassos select control variables for you. You can even account for endogenous covariates.
Every installation of Stata includes all the documentation in PDF format. Stata’s documentation consists of over 15,000 pages detailing each feature in Stata including the methods and formulas and fully worked examples. You can transition seamlessly across entries using the links within each entry.
Check out the manuals - they are all here.
Populations are often divided into groups or subpopulations, such as age groups, income brackets, or levels of education. Regression models or distributions likely differ across these groups. But sometimes we don't have a variable that identifies the groups. Perhaps the identifying variable is simply missing. Perhaps it is hard to collect (honest reporting of drug use, sex of goldfish, etc.). Perhaps it is inherently unobservable (penchant for risky behavior, high propensity to save money, etc.). In such cases, we can use finite mixture models (FMMs) to model the probability of belonging to each unobserved group, to estimate distinct parameters of a regression model or distribution in each group, to classify individuals into the groups, and to draw inferences about how each group behaves.
Estimate mediation effects, analyze the relationship between an unobserved latent concept such as depression and the observed variables that measure depression, model a system with many endogenous variables and correlated errors, or fit a model with complex relationships among both latent and observed variables. Fit models with continuous, binary, count, ordinal, fractional, and survival outcomes. Even fit multilevel models with groups of correlated observations such as children within the same schools. Evaluate model fit. Compute indirect and total effects. Fit models by drawing a path diagram or using the straightforward command syntax.
Basic statistics include summaries, tables and tabulations, pairwise comparisons, factor variables, t tests, z tests, tests of proportions, effect sizes, correlations, and other common tests such as binomial tests, Bartlett's test, chi-squared tests, and variance-ratio (F) tests.
Spatial autoregressive (SAR) models are fit using datasets that contain observations on geographical areas or on any units with a spatial representation. Fit linear models with autoregressive errors and spatial lags of the dependent and independent variables. Specify spatial lags using spatial weighting matrices. Create standard weighting matrices, such as inverse distance or nearest neighbor, or create custom matrices. Fit random- and fixed-effects models for spatial panel data. Explore direct and indirect effects of covariates after fitting models.
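A spatial weighting matrix records how strongly each unit is connected to every other unit. The Python sketch below builds a hypothetical inverse-distance matrix for points with (x, y) coordinates and sets the diagonal to zero. It then row-standardizes the matrix, so that the spatial lag of a variable is a weighted average of the neighbors' values. Row standardization is chosen here for readability. Stata's own spatial commands may normalize weighting matrices differently by default.

```python
import math

def inverse_distance_weights(coords, row_standardize=True):
    """Inverse-distance spatial weighting matrix for (x, y) points.
    Diagonal entries are zero: a unit is not its own neighbor."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d == 0:
                    raise ValueError("distinct units must have distinct locations")
                w[i][j] = 1.0 / d
    if row_standardize:
        for row in w:
            s = sum(row)
            if s > 0:
                for j in range(n):
                    row[j] /= s  # each row now sums to 1
    return w

def spatial_lag(w, values):
    """Spatial lag W @ values: for each unit, a weighted sum of the others."""
    return [sum(wij * v for wij, v in zip(row, values)) for row in w]

# Hypothetical: three units on a line at x = 0, 1, and 3.
w = inverse_distance_weights([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
lag = spatial_lag(w, [10.0, 20.0, 30.0])
```

Closer units get larger weights. The first unit's nearest neighbor (distance 1) receives three times the weight of the unit at distance 3.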
Discover and understand unobserved groups (latent classes) in your data, whether the groups are consumers with different buying preferences, healthy and unhealthy individuals, or teens with a high, medium, or low risk of dropping out of high school. You can use latent class analysis (LCA) as a model-based method of classification. Or you can fit SEM path models and test for differences across the unobserved groups. Estimate the proportion of the population in each group, estimate group means, and more.
Stata provides a range of nonparametric tests, as well as nonparametric kernel regression and nonparametric series regression. Access quantile regression, centiles with confidence intervals, resampling and simulation, smoothing, and ROC analysis.
Fit one- and two-way models. Or fit models with three, four, or even more factors. Analyze data with nested factors, with fixed and random factors, or with repeated measures. Use ANCOVA models when you have continuous covariates and MANOVA models when you have multiple outcome variables. Further explore the relationships between your outcome and predictors by estimating effect sizes and computing least-squares and marginal means. Perform contrasts and pairwise comparisons. Analyze and plot interactions. And much more.
Account for missing data in your sample using multiple imputation. Choose from univariate and multivariate methods to impute missing values in continuous, censored, truncated, binary, ordinal, categorical, and count variables. Then, in a single step, estimate parameters using the imputed datasets, and combine results. Fit a linear model, logit model, Poisson model, multilevel model, survival model, or one of the many other supported models. Use the mi command, or let the Control Panel interface guide you through your entire MI analysis.
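After each imputed dataset has been analyzed, the results are combined with Rubin's rules. The point estimates are averaged, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The Python sketch below shows only that pooling arithmetic, using hypothetical estimates from five imputations. It omits the degrees-of-freedom adjustments that a full MI analysis reports.

```python
import math
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Pool m completed-data estimates and their variances with Rubin's rules.
    Returns the combined estimate and its standard error."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)            # combined point estimate
    w_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b    # total variance
    return q_bar, math.sqrt(total)

# Hypothetical coefficient estimates and squared SEs from five imputed datasets.
q, se = rubin_combine([1.0, 1.2, 1.1, 0.9, 1.3], [0.04, 0.05, 0.04, 0.05, 0.04])
```

The between-imputation term is what makes the combined standard error larger than any single completed-data standard error. It reflects the extra uncertainty caused by the missing values.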
The generalized method of moments (GMM) is a very flexible estimation framework that has become a workhorse of modern econometric analysis. Unlike maximum likelihood estimation, GMM does not require you to specify the full distribution of the data, so its estimates are robust to misspecification of that distribution. Moreover, GMM is broad-based in that other commonly used estimators, such as least squares and maximum likelihood, can be viewed as special cases of GMM. GMM is popular in economics not only because of its favorable statistical properties but also because many theoretical models, such as those involving rational expectations, naturally yield the moment conditions that underlie GMM.
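The Python sketch below shows how least squares fits into the moment framework. It solves the two sample moment conditions for a simple regression with an intercept: the residuals average to zero, and they are uncorrelated with x. Because there are exactly as many conditions as parameters, the conditions can be solved exactly, and the solution is the ordinary least-squares fit. The function name and data are hypothetical. The overidentified case, where GMM minimizes a weighted quadratic form in the moments, is not shown.

```python
def solve_moment_conditions(xs, ys):
    """Solve the sample moment conditions
        mean(y - a - b*x) = 0   and   mean(x * (y - a - b*x)) = 0
    for (a, b). With as many conditions as parameters the solution is exact
    and coincides with ordinary least squares."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # The conditions form the linear system [n sx; sx sxx] [a; b] = [sy; sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x has no variation")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
a, b = solve_moment_conditions([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Hypothetical noisy data: the fitted residuals satisfy both conditions.
xs2, ys2 = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 4.0]
a2, b2 = solve_moment_conditions(xs2, ys2)
resid = [y - a2 - b2 * x for x, y in zip(xs2, ys2)]
```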
Stata's exlogistic fits exact logistic regression models and provides more reliable statistical inference with small-sample datasets. The dependent variable can be Bernoulli (0 or 1) or binomial (the number of successes in n trials). Exact joint hypothesis tests can be performed, and predictions with exact confidence intervals can be obtained.
Stata also supports exact Poisson regression, postestimation selectors, exact epidemiological statistics, and more.
Whether your data require simple weighted adjustment because of differential sampling rates or you have data from a complex multistage survey, Stata's survey features can provide you with correct standard errors and confidence intervals for your inferences. All you need to do is specify the relevant characteristics of your sampling design, including sampling weights (including weights at multiple stages), clustering (at one, two, or more stages), stratification, and poststratification. After that, most of Stata's estimation commands can adjust their estimates to correct for your sampling design.
Maximization of user-specified likelihood functions has long been a hallmark of Stata, but you have had to write a program to calculate the log-likelihood function. Now it is even easier. The only requirements are that you be able to write the log likelihood for individual observations and that the log likelihood for the entire sample be the sum of the individual values.
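The Python sketch below (not Stata code) shows the principle in miniature. It writes the Poisson log likelihood for a single observation, sums it over a hypothetical sample, and maximizes the total with Newton-Raphson on the log of the mean, using derivatives worked out by hand. The Poisson model and the Newton-Raphson optimizer are illustrative choices. The answer can be checked against the analytic result, which is the sample mean.

```python
import math

def poisson_loglik_obs(y, lam):
    """Log-likelihood contribution of a single count y under Poisson(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def sample_loglik(ys, lam):
    """Sample log likelihood: the sum of the per-observation contributions."""
    return sum(poisson_loglik_obs(y, lam) for y in ys)

def fit_poisson_mean(ys, theta=0.0, tol=1e-10, max_iter=100):
    """Maximize sample_loglik over theta = log(lam) with Newton-Raphson."""
    n, total = len(ys), sum(ys)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    for _ in range(max_iter):
        lam = math.exp(theta)
        gradient = total - n * lam  # first derivative with respect to theta
        hessian = -n * lam          # second derivative (always negative)
        step = gradient / hessian
        theta -= step
        if abs(step) < tol:
            break
    return math.exp(theta)

ys = [2, 0, 3, 1, 4]            # hypothetical counts
lam_hat = fit_poisson_mean(ys)  # analytic MLE is the sample mean, 2.0
```

Working on the log scale keeps the Poisson mean positive at every iteration without any explicit constraint.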
Stata provides a range of tools specific to epidemiology, from epidemiological tables to ICD-10 and ICD-9 codes. Stata handles receiver operating characteristic (ROC) analysis, pharmacokinetics, kappa measures of interrater agreement, and Brier score decomposition.
Stata's cluster analysis features include hierarchical clustering with single, complete, average, Ward's, weighted-average, centroid, and median linkage. Stata also includes a range of similarity and dissimilarity measures for binary data, the Gower measure for mixed binary and continuous data, stopping rules, and dendrograms (full trees, subtrees, the upper portion of trees, vertical or horizontal orientation, and branch counts).
In addition to providing built-in commands to fit many standard maximum likelihood models, such as logistic, Cox, Poisson, etc., Stata can maximize user-specified likelihood functions through programs to calculate the log of the likelihood function.
Stata’s likelihood-maximization procedures have been designed for both quick-and-dirty work and writing prepackaged estimation routines that obtain results quickly and robustly. For instance, Stata fits negative binomial regressions (a variation on Poisson regression) and Heckman selection models. We wrote those routines using Stata's ml command, although most users are not aware of that. They think that negative binomial and Heckman selection are just two more things Stata can do.
Write your dynamic stochastic general equilibrium (DSGE) model using a simple syntax. Solve the model at specified parameter values or estimate model parameters by maximum likelihood. Graph impulse responses and compare models.
Explore the relationship between unobserved latent characteristics such as mathematical aptitude and the probability of correctly answering test questions (items). Or explore the relationship between unobserved health and self-reported responses to questions about mobility, independence, and other health-affected activities. Item response theory (IRT) can be used to create measures of such unobserved traits or place individuals on a scale measuring the trait. It can also be used to select the best items for measuring a latent trait. IRT models are available for binary, graded, rated, partial-credit, and nominal response items. Visualize the relationships using item characteristic curves, and measure overall test performance using test information functions. And much more.
In Stata you can compute Cronbach's alpha and intraclass correlations, perform stepwise regression, apply Box-Cox and power transformations, create orthogonal polynomials, and run tests of normality (Shapiro-Wilk, Shapiro-Francia, skewness and kurtosis, Doornik-Hansen, Henze-Zirkler, and Mardia's multivariate skewness and kurtosis tests).
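Cronbach's alpha compares the sum of the item variances with the variance of the total score. For k items, α = k/(k − 1) × (1 − sum of item variances / total-score variance). The Python sketch below implements that classical, unstandardized formula on hypothetical item scores. It is an illustration, not Stata's implementation.

```python
from statistics import variance

def cronbach_alpha(items):
    """Classical (unstandardized) Cronbach's alpha.

    `items` holds one list of scores per item; every list must contain one
    score per respondent, in the same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of scores")
    totals = [sum(col[i] for col in items) for i in range(n)]  # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical scores of three respondents on two items.
alpha = cronbach_alpha([[1, 2, 3], [1, 3, 2]])
```

When all items are identical, the formula returns 1, the maximum possible internal consistency.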
After fitting a model, perform hypothesis tests, generalised tests, predictions, generalised predictions, and marginal analysis; use the postestimation selectors; and create forecast models.
Use multivariate analyses to evaluate relationships among variables from many different perspectives. Perform multivariate tests of means, or fit multivariate regression and MANOVA models. Explore relationships between two sets of variables, such as aptitude measurements and achievement measurements, using canonical correlation. Examine the number and structure of latent concepts underlying a set of variables using exploratory factor analysis. Or use principal component analysis to find underlying structure or to reduce the number of variables used in a subsequent analysis. Discover groupings of observations in your data using cluster analysis. If you have known groups in your data, describe differences between them using discriminant analysis. And much more.
Statistical functions: beta and noncentral beta distributions, binomial distribution, Cauchy distribution, chi-squared and noncentral chi-squared distributions, Dunnett's multiple range distribution, exponential distribution, F and noncentral F distributions, gamma distribution, hypergeometric distribution, Laplace distribution, logistic distribution, negative binomial distribution, normal distribution, log of the normal distribution, binormal (bivariate normal) distribution, Poisson distribution, Student's t and noncentral Student's t distributions, Tukey's Studentised range distribution, Weibull distribution, Weibull (proportional hazards) distribution, and Wishart and inverse Wishart distributions.
Date and time functions: a complete suite of functions for manipulating dates and times, including support for business calendars and leap seconds. Convert dates and times from strings; extract the second, minute, hour, day, day of week, week of year, month, quarter, half-year, or year from a specified date; and convert between units and between business dates and regular dates.
Time series functions
Random number functions and generators
Mathematical and trigonometric functions
Matrix functions and low-level programming functions
Contrasts, pairwise comparisons, marginal means and marginal effects let you analyze the relationships between your outcome variable and your covariates, even when that outcome is binary, count, ordinal, categorical, or survival. Compute adjusted predictions with covariates set to interesting or representative values. Or compute marginal means for each level of a categorical covariate. Make comparisons of the adjusted predictions or marginal means using contrasts. After fitting almost any model in Stata, analyze the effect of covariate interactions, and easily create plots to visualize those interactions.
Stata can share datasets and programs over the Internet. You can use Stata to search the Internet for community contributions, update Stata over the web, obtain help files over the Internet, and download datasets and manuals from the web.
Stata can be used for bootstrap sampling and estimation, jackknife estimation, permutation tests, and Monte Carlo simulation.
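The idea behind bootstrap standard errors fits in a few lines. The Python sketch below resamples a hypothetical dataset with replacement, recomputes a statistic on each resample, and reports the standard deviation of those replicates. The function name, seed, and number of replications are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

def bootstrap_se(data, statistic, reps=2000, seed=12345):
    """Bootstrap standard error of `statistic`: resample the data with
    replacement `reps` times and take the SD of the replicate statistics."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(reps)]
    return stdev(replicates)

sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.9, 5.1, 4.4, 6.2, 5.0]  # hypothetical data
se_mean = bootstrap_se(sample, mean)
```

Passing `statistics.median` instead of `mean` gives a bootstrap standard error for the median, a statistic with no simple closed-form standard error.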
The Stata community is represented by a diverse group of researchers from a broad spectrum of fields, from anthropology to biostatistics, economics, finance, political science, psychology, public health, sociology, survey research, and zoology. Stata’s programming language lets users write commands that behave just like official Stata commands, and many users make their commands available to others through channels such as the Stata Journal, the SSC archive, or their own website. Stata’s search, net search, and ssc commands make finding and installing those commands a snap. So even if you don’t see something listed on our Features page, another user may have already written and made available a command to solve your problem.
Installation Qualification (IQ) is provided by a tool you can download for free. The Installation Qualification Tool produces a report suitable for submission to regulatory agencies, such as the FDA, verifying that Stata has been installed properly. The report can be printed or saved as a PDF.
The U.S. Food and Drug Administration (FDA) accepts new drug applications and medical device trials performed using Stata. Although the FDA does not require the use of any specific software for statistical analyses, it emphasizes in its Statistical Software Clarifying Statement that:
"The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.”
Stata satisfies these requirements and is one of the most respected and validated statistical tools for analyzing clinical data from pre-clinical through phase IV trials. At each step of analysis, you can rely on Stata’s extensive suite of statistics, data management, and graphics tools to provide accurate and reproducible results. We take reproducibility seriously. Stata is the only statistical package with integrated versioning. If you wrote a script to perform an analysis in 1985, that same script will still run and still produce the same results today. The same is true for any dataset you create.
All of Stata’s features are accessible to persons with disabilities through Stata’s command line interface and plain text output options. In addition, Stata is interoperable with common assistive technologies for vision-impaired users.