Stata Fundamentals and Structure

Stata is a comprehensive statistical software programme for managing and storing data (in both small and large data sets), analyzing data, and creating visually appealing graphs. Because it lets you do practically anything with your data, Stata is widely used among health researchers, particularly those working with very large data sets. Stata is not the only statistical programme available: if you choose a career that involves working with data, you will likely come across others, such as SPSS, SAS, R, MedCalc and Jamovi.

Stata can be used in two ways. The first is interactively: open Stata, load your data and start typing commands. This is a fantastic way to explore your data, work out what you want to do with it, and check that your commands work as intended. It is also excellent for learning, because you get immediate feedback. Interactive work, however, cannot be easily or reliably reproduced, and it is awkward to revise if you change your mind. Mistakes are also hard to repair because Stata has no "undo" command. The second approach, writing your commands in a do-file (see "Creating a 'Do' File in Stata" below), avoids these problems because the whole analysis can be rerun from start to finish.

To get the most out of Introduction to Stata, you must participate actively: open Stata and type in the example code yourself rather than just reading it. You will remember more of the details that way, and Stata will tell you immediately, with an error message, when a command is wrong.
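As a first interactive exercise, the commands below can be typed one at a time into the Command window. They use auto.dta, a small example data set on 1978 car models that ships with Stata and is loaded with sysuse, so you do not need a file of your own. This is a minimal sketch; any built-in example data set would do.

```stata
* Load the example automobile data that ships with Stata
sysuse auto, clear

* List the variables with their storage types and labels
describe

* Summary statistics (N, mean, SD, min, max) for two variables
summarize price mpg
```

If you mistype a command, for example sumarize instead of summarize, Stata responds with an error message rather than guessing what you meant, which is exactly the quick feedback described above.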


Between 1985 and 2021 there were 17 major releases, with additional code and documentation updates in between. In the early days, extra sets of Stata programmes were sometimes sold as "kits" or supplied on Support Disks. With the introduction of Stata 6 in 1999, users began receiving updates over the web.

Over its 36-year history, hundreds of commands have been added. Extensibility, platform independence and an active user community have all proven extremely important and continue to shape the user experience today.


Stata was first released for the DOS operating system. It has since been released for Unix variants (including Linux), Windows and Macintosh operating systems. All Stata files, including do-files and saved datasets, are platform-independent: a file created on one operating system can be used on the others.

User community

A strong user community has driven a number of significant innovations. The Stata Technical Bulletin was first published in 1991 and appeared six times a year, allowing community-contributed commands to be shared. In 2001, it was replaced by the peer-reviewed Stata Journal, a quarterly journal that publishes descriptions of community-contributed commands and tips on using Stata effectively.

Structure of a Stata Data Set

Observations and Variables

In a Stata data set, each observation occupies one row and each variable occupies one column. You might wonder, "What does an observation mean in this dataset?" An observation is simply the unit you are observing, that is, the thing you are collecting data on. If you are collecting data on people, then people are your observations. If you are collecting data on vehicles, then vehicles are your observations.
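To see the rows-and-columns structure for yourself, the sketch below again uses the bundled auto.dta, in which each observation is a car model and each variable records one characteristic of that car. _N and c(k) are Stata's built-in counts of observations and variables.

```stata
sysuse auto, clear

* Number of observations (rows), one per car model: 74 in auto.dta
display _N

* Number of variables (columns): 12 in auto.dta
display c(k)

* Show the first five observations for three of the variables
list make price mpg in 1/5
```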

Variable Types

Stata can handle variables at all the usual levels of measurement: interval-ratio, ordinal and nominal. Remember, though, that as the researcher you must be sure of each variable's type yourself, because Stata does not classify variables automatically; it only records whether a variable is stored as a number or as text. This matters because the appropriate inferential statistics depend heavily on the variable type.
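A minimal sketch of what this means in practice, using auto.dta: rep78 (repair record, coded 1 to 5) is ordinal, foreign is nominal and price is interval-ratio, but Stata stores all three simply as numbers. Choosing the right summary, and using the i. factor-variable prefix to tell an estimation command such as regress that a variable is categorical, is up to you.

```stata
sysuse auto, clear

* Storage types only; Stata does not record the level of measurement
describe price rep78 foreign

* Nominal and ordinal variables: frequency tables
tabulate foreign
tabulate rep78

* Interval-ratio variable: mean, standard deviation, etc.
summarize price

* rep78 treated as a continuous number (a single slope) ...
regress price rep78

* ... versus as categorical with the i. prefix (one coefficient per category)
regress price i.rep78
```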

Value Labels

Value labels attach descriptive text to the numeric codes of a variable. For example, if a variable gender is coded 1 and 0, you can label the value 1 as "males" and the value 0 as "females", so that tables and output show the text rather than the bare codes.
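The gender example above can be written as a short sketch. The three observations and the label name genderlbl are hypothetical; the pattern is to define the label once with label define and then attach it to the variable with label values.

```stata
* Hypothetical example data: gender coded 0/1
clear
input gender
1
0
1
end

* Define a value label (a mapping from codes to text) ...
label define genderlbl 0 "females" 1 "males"

* ... and attach it to the variable
label values gender genderlbl

* A variable label, by contrast, describes the variable itself
label variable gender "Sex of respondent"

* The table now shows "females" and "males" instead of 0 and 1
tabulate gender
```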

Missing Values

Missing values are very common in data analysis, usually because respondents do not answer every question they are asked. A question may not apply to them, they may not remember, or they may prefer not to say.

A Stata data set is a rectangular matrix, so every observation must have a value for every variable. If one of your participants has not answered a particular question, Stata stores a period (".") in that cell, which denotes a missing numeric value (missing text values are stored as empty strings). Missing values can be recoded to a code such as 999, or replaced using methods such as substituting the mean or regression-based imputation. We will cover missing-value analysis in Stata in detail in a separate blog post.
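A short sketch of working with missing values. The first part uses auto.dta, where rep78 has a few missing values. The second part uses hypothetical survey data in which 999 was typed for "no answer"; mvdecode turns that code into Stata's missing value, and mvencode does the reverse. Note the comparison pitfall in the middle: Stata treats "." as larger than any number.

```stata
sysuse auto, clear

* Which variables have missing values, and how many (rep78 has 5)
misstable summarize

* Count missing values in a single variable
count if missing(rep78)

* Caution: "." sorts above every number, so this count includes missing rep78
count if rep78 > 3
* Exclude missing values explicitly to count only real values above 3
count if rep78 > 3 & !missing(rep78)

* Hypothetical survey data where 999 was entered for "no answer"
clear
input id income
1 2500
2 999
3 3100
end

* Convert the 999 code to Stata's missing value before analysis
* (mvencode does the reverse, writing . out as a numeric code)
mvdecode income, mv(999)
summarize income
```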

The fundamentals of Stata are as follows:

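Data Files in the Stata Format (.dta): Stata stores data sets in its own file format, which uses the .dta extension. These are the files you open with File > Open (or the use command) and write with File > Save (or the save command).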

1. Importing an Excel or Text Data File into Stata: To import an Excel file (for example, "Example Dataset.xlsx"), click File, then Import, then Excel spreadsheet. In the window that opens, click Browse, navigate to the folder containing the data file you want to use, and click Open. The "Import Excel" window then shows a preview of the data; click OK to load it. Text files such as .csv files are imported in the same way through the Text data (delimited) option under File > Import.

2. Saving a Dataset in Stata Format: Whenever you change an original dataset (for example, by recoding variables or adding new ones), it is good practice to save the modified dataset as a new .dta file (File > Save As) rather than overwriting the original. That way, if the updated file has problems, you can always start over from the original dataset.

3. Recoding and Labelling Variables: Recoding categorical or quantitative variables is useful in a variety of situations. For example, you may want to use fewer, broader categories than those used in data collection, reorder the categories of a variable, or recode a quantitative variable into a categorical one. After recoding, attach value labels to the new codes so that your output stays readable.

4. Creating a "Do" File in Stata: A do-file is a plain-text file, with the extension .do, that lists Stata commands to be executed in order. It saves you from typing commands one at a time into the Command window. By putting all the commands for a given study in a do-file, you can easily replicate your results, rerun the analysis with revisions and extensions, or rerun it after fixing errors. You can open Stata's Do-file Editor by typing doedit.
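Putting the fundamentals above together, a do-file for a small project might look like the sketch below: import an Excel file, recode a quantitative variable into groups, label the result, and save it as a new .dta file. All file names (example_raw.xlsx, example_clean.dta) and variable names (id, age, agegroup) are hypothetical. The setup block only writes a tiny Excel file so the sketch runs as-is; with your own data you would delete it and import your file instead.

```stata
* clean_data.do: hypothetical example; file and variable names are placeholders

* Setup only: create a tiny Excel file so this sketch can run on its own
clear
input id age
1 25
2 47
3 70
4 .
end
export excel using "example_raw.xlsx", firstrow(variables) replace

* 1. Import the Excel file, using the first row as variable names
import excel using "example_raw.xlsx", firstrow clear

* 2. Recode age (assumed to be whole years) into three groups;
*    missing ages are not matched by any rule and stay missing
recode age (min/39 = 1) (40/64 = 2) (65/max = 3), generate(agegroup)

* 3. Label the new codes and the new variable
label define agegrp 1 "Under 40" 2 "40-64" 3 "65 and over"
label values agegroup agegrp
label variable agegroup "Age group"
tabulate agegroup, missing

* 4. Save as a NEW file so the original data stay untouched
save "example_clean.dta", replace
```

Saved as, say, clean_data.do, the whole sequence can be rerun at any time by typing do clean_data.do, which is what makes a do-file-based analysis reproducible.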
