A SAS program is a collection of SAS statements, which may contain keywords, names (e.g., of data sets and variables), special characters, and operators. Depending on its type, a SAS statement is used in a DATA step, in a PROC (procedure) step, or, in the case of global statements such as OPTIONS and LIBNAME, anywhere in a SAS program.
A SAS program consists of DATA steps and PROC (procedure) steps.
DATA steps create and manipulate data sets, while PROC steps analyze data sets or report on them.
A DATA step is used to create or modify data sets: creating and modifying variables, checking and correcting errors in data, and writing programs (e.g., for simulations).
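SAS has the following basic rules.
A statement can begin and end anywhere on a line, and it may span more than one line (SAS is free-format).
A statement ends with a semicolon (;). A line can contain more than one statement.
SAS keywords and names (of data sets, variables, and so on) are not case-sensitive; character data values, however, are.
Arithmetic operators (+, -, *, and /) return a missing value when any operand is missing, while statistical functions such as SUM and MEAN ignore missing arguments.
A comment is enclosed by /* and */. A statement that begins with an asterisk (*) and ends with a semicolon is also treated as a comment.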
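The rules above can be seen together in one short DATA step. This is a minimal sketch; the data set name rules_demo and the values are made up for illustration. After it runs, the log should show c=. d=10 e=15.

```sas
/* Block comment: everything between the delimiters is ignored */
* Comment statement: starts with an asterisk and ends with a semicolon;
DATA rules_demo;
   a = 10; b = .;          /* two statements on one line; b is missing */
   c = a + b;              /* operator with a missing operand: c is missing */
   d = SUM(a, b);          /* SUM ignores the missing argument: d = 10 */
   e = MEAN(a, b, 20);     /* MEAN averages nonmissing values only: e = 15 */
   PUT c= d= e=;           /* write the three values to the log */
RUN;

proc print data=RULES_DEMO;   /* lowercase keywords, uppercase name: still works */
run;
```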
SAS statements used in a DATA step are either executable (e.g., DO, INPUT, INFILE, OUTPUT) or declarative (e.g., ARRAY, DATALINES, DROP, RETAIN).
SAS has arithmetic, relational, logical, and concatenation (||) operators. SAS has no modulus operator, however; the MOD function is used instead.
SAS has various functions for mathematics, statistics, string, date/time, probability, and randomization.
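A sketch of the operators and a few functions in a DATA step follows; the variable names are hypothetical, and the functions shown (MOD, SQRT, UPCASE, TODAY) are only a small sample of what SAS provides.

```sas
DATA ops_demo;
   x = 17;
   r = MOD(x, 5);                   /* remainder of 17 divided by 5: 2 */
   p = 2 ** 3;                      /* exponentiation: 8 */
   is_big = (x > 10);               /* relational result: 1 (true) or 0 (false) */
   both = (x > 10) AND (r = 2);     /* logical AND: 1 */
   words = 'DATA' || ' ' || 'step'; /* concatenation: 'DATA step' */
   root = SQRT(x);                  /* mathematical function */
   caps = UPCASE('sas');            /* string function: 'SAS' */
   run_date = TODAY();              /* date function: today's SAS date value */
   FORMAT run_date DATE9.;          /* display the date in ddMONyyyy form */
   PUT r= p= is_big= both= words= caps= run_date=;
RUN;
```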
The OPTIONS statement changes the values of SAS system options, which affect SAS system initialization, hardware and software interfacing, and the input, processing, and output of jobs and SAS files. See Chapter 8 of SAS Language Reference: Dictionary (pp. 1451-1647).
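For example, the following sketch changes a few output-related options and then writes the current setting of one of them to the log with PROC OPTIONS. The option values are arbitrary choices for illustration.

```sas
/* Omit the date and page numbers from output; set line and page size */
OPTIONS NODATE NONUMBER LINESIZE=80 PAGESIZE=60;

/* Write the current setting of one option to the log */
PROC OPTIONS OPTION=LINESIZE;
RUN;
```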
Let us consider a typical DATA step example that reads an ASCII text file "tiger.dat".
INPUT id name $ math stat;
The DATA statement names the data set "student" in which the output of the DATA step is stored. Because no library is specified, "student" is a temporary data set in the WORK library, which SAS deletes when the SAS session ends. If you want to save a data set as a permanent file, store it in a SAS data library declared with the LIBNAME statement.
INFILE 'A:\tiger.dat'; reads data from the file tiger.dat on the A: drive. The data file should be in ASCII text format.
The INPUT statement specifies the arrangement of the data to be read. The example reads "id" as numeric, "name" as character (the $ sign after a variable name marks it as character), and "math" and "stat" as numeric. See the INPUT statement for details.
RUN; tells SAS to execute the preceding DATA step or PROC step.
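Assembled into one program, the statements described above look roughly like the following. The PROC PRINT step is an addition for checking the result, and the A: path is kept from the example; adjust it to wherever tiger.dat actually lives.

```sas
/* Read the raw text file into the temporary data set WORK.student */
DATA student;
   INFILE 'A:\tiger.dat';        /* raw data file (path from the example) */
   INPUT id name $ math stat;    /* $ marks name as a character variable */
RUN;

/* Added for checking: list the observations that were read */
PROC PRINT DATA=student;
RUN;
```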
The following example reads three variables directly from instream data, that is, data lines typed into the DATA step itself.
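A minimal sketch of such a step is below; the data set name grades, the variable names, and the three data lines are hypothetical. The DATALINES statement must be the last statement of the step, and a line containing only a semicolon ends the data.

```sas
DATA grades;
   INPUT id name $ math;   /* three variables: numeric, character, numeric */
   DATALINES;
1 Kim 85
2 Lee 92
3 Park 78
;
RUN;
```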
What is a data set in SAS? A SAS data set is a group of data values that SAS creates and processes. It contains a table of observations (rows) and variables (columns) as well as descriptor information (e.g., variable names and formats). A SAS data set is often referred to as a SAS data file. A SAS data view is a virtual data set: it holds descriptor information that points to data stored in other sources.
SAS has powerful data-management features that can handle various data sources, such as ASCII text files, databases, and spreadsheets. You may also type in data and read them directly using the DATALINES (or CARDS) statement.
SAS can read ASCII text files delimited with spaces, commas (CSV), tabs, and other characters using the INPUT, INFILE, and DATALINES statements in a DATA step. The INFILE statement can also read remote files through the SOCKET (TCP/IP), FTP, and URL access methods of the FILENAME statement.
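As a sketch of delimited input, the steps below read a comma-separated file and a tab-separated file whose first line holds column names. The file paths and variable names are hypothetical. DLM= sets the delimiter, DSD treats two consecutive delimiters as a missing value and strips quotation marks, and FIRSTOBS=2 skips the header line.

```sas
/* Comma-separated file (hypothetical path and layout) */
DATA survey_csv;
   INFILE 'c:\temp\survey.csv' DLM=',' DSD FIRSTOBS=2;
   INPUT id name :$20. score;    /* :$20. reads a character value up to 20 bytes */
RUN;

/* Tab-delimited file: the tab character is the hex constant '09'x */
DATA survey_tab;
   INFILE 'c:\temp\survey.txt' DLM='09'x DSD FIRSTOBS=2;
   INPUT id name :$20. score;
RUN;
```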
The IMPORT procedure can read these ASCII text files and can also import database (dBASE III, FoxPro, Access) and spreadsheet (Excel and Lotus 1-2-3) files. PROC SQL also allows you to connect to those database and spreadsheet files through ODBC (Open Database Connectivity), using the SAS/ACCESS Interface to ODBC.
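A minimal PROC IMPORT sketch for a CSV file follows; the file path and output name are hypothetical. Importing Excel or database files uses other DBMS= values and typically requires the SAS/ACCESS Interface to PC Files.

```sas
PROC IMPORT DATAFILE='c:\temp\survey.csv'   /* hypothetical file */
            OUT=work.survey
            DBMS=CSV
            REPLACE;                        /* overwrite work.survey if it exists */
   GETNAMES=YES;                            /* take variable names from the first row */
RUN;
```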
SAS data sets may also be generated by PROC steps. For example, the MEANS procedure can write aggregate statistics to a data set, and SAS/IML can convert matrices into data sets. The following PROC REG step saves the residuals and predicted values to "pew_work", which also includes the original variables in jeeshim.pew2004.
PROC REG DATA=jeeshim.pew2004;
MODEL engagement = interest knowledge income egov /R P;
OUTPUT OUT=pew_work R=residual P=predict;
RUN;
QUIT;
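The MEANS procedure mentioned above works the same way. The sketch below writes summary statistics of income (a variable used in the PROC REG model) to a new data set instead of printing them; the output data set and statistic names are hypothetical. SAS also adds the variables _TYPE_ and _FREQ_ to such a data set.

```sas
/* Store summary statistics of income in a data set */
PROC MEANS DATA=jeeshim.pew2004 NOPRINT;
   VAR income;
   OUTPUT OUT=income_stats N=n_obs MEAN=avg_income STD=sd_income;
RUN;

PROC PRINT DATA=income_stats;
RUN;
```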
Finally, you can generate data in a DATA step using functions, in particular random number generators. The following skeleton writes ten observations (i = 1, ..., 10) to the data set dgp.
DATA dgp;
DO i=1 TO 10 BY 1;
OUTPUT dgp; END;
RUN;
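Filling the loop with random number functions gives a small simulated data set. This sketch uses the RAND function and CALL STREAMINIT, available in recent SAS releases; the seed and the data-generating process y = 1 + 2x + error are hypothetical choices for illustration.

```sas
DATA dgp;
   CALL STREAMINIT(2004);              /* fix the seed so results can be reproduced */
   DO i = 1 TO 10 BY 1;
      u = RAND('UNIFORM');             /* uniform on (0, 1) */
      x = RAND('NORMAL', 0, 1);        /* standard normal */
      y = 1 + 2*x + RAND('NORMAL');    /* hypothetical data-generating process */
      OUTPUT dgp;                      /* one observation per loop iteration */
   END;
RUN;
```

Because the step reads no input data, it runs through its statements once, so the DO loop alone determines how many observations are written.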
A SAS data library is an alias for a collection of data sets that makes data management more convenient and efficient. Like a directory or folder, a library tells SAS where data sets are stored. Unlike a directory or folder, however, a library is logical rather than physical: the library itself is not stored on a disk; it only points to a location that is.
Every data set is referred to through a library, although the default library, WORK, is usually omitted (e.g., student is the same as WORK.student).
If you want to retrieve a data set by a point-and-click method, use the SAS Enterprise Guide.
The LIBNAME statement associates a library reference (libref), the short name of a SAS data library, with a physical location such as a directory or folder. Libraries should be declared before the DATA steps and PROC steps that use them.
If you want to use the default WORK library, you do not need to declare any library. However, data sets in the WORK library are temporary: SAS keeps them in a scratch location and deletes them when the SAS session ends. If you want to keep data sets as permanent files (e.g., on a hard disk or memory stick), you must store them in your own libraries.
The following LIBNAME statement declares a library "jeeshim" that is associated with c:\temp. A specific SAS data file is referred to by a library name and a data set name separated by a period. Thus "jeeshim.nes2004" refers to "nes2004.sas7bdat" in the "jeeshim" library (c:\temp).
LIBNAME jeeshim 'c:\temp\';
PROC REG DATA=jeeshim.nes2004;
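The sketch below saves a permanent copy of a temporary data set and then reads a permanent one. It assumes WORK.student from the tiger.dat example exists in the current session; the name jeeshim.student is hypothetical.

```sas
LIBNAME jeeshim 'c:\temp\';

/* Save WORK.student permanently as c:\temp\student.sas7bdat */
DATA jeeshim.student;
   SET work.student;
RUN;

/* Print the first five observations of a permanent data set */
PROC PRINT DATA=jeeshim.nes2004 (OBS=5);
RUN;
```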
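How do you know which data sets are included in a library? Use the CONTENTS procedure with the special name _ALL_ in place of a data set name, or use the DATASETS procedure.
The DATASETS procedure can also manage data sets (e.g., copy, rename, and delete them).
PROC CONTENTS DATA=jeeshim._ALL_;   /* describe every data set in the library */
RUN;
PROC DATASETS LIBRARY=jeeshim DETAILS;   /* list library members with details */
QUIT;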
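As a sketch of the management side of PROC DATASETS, the step below copies jeeshim.nes2004 into WORK, renames the copy, and deletes it. The name nes_copy is hypothetical. PROC DATASETS is interactive, so each RUN; executes the statements before it and QUIT; ends the procedure.

```sas
PROC DATASETS LIBRARY=work NOLIST;
   COPY IN=jeeshim OUT=work;    /* copy from jeeshim into WORK */
   SELECT nes2004;              /* only this data set */
RUN;
   CHANGE nes2004=nes_copy;     /* rename WORK.nes2004 */
RUN;
   DELETE nes_copy;             /* delete the renamed copy */
RUN;
QUIT;
```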
If you need to use specific libraries frequently, declare them in autoexec.sas, an ASCII text file in the SAS root directory. SAS automatically executes the statements in this file immediately after it starts. Consider the following example.
OPTIONS PAGESIZE=55 LINESIZE=80 NOCENTER;
LIBNAME jeeshim 'c:\data\sas';
FILENAME nes 'c:\data\sas\nes2004.txt';
The FILENAME statement assigns a file reference (fileref), here "nes", to a physical file on a disk or other storage device.
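Once a fileref is assigned, INFILE can use it in place of a quoted path. In this sketch the variable layout of nes2004.txt (id, age, income) is hypothetical.

```sas
FILENAME nes 'c:\data\sas\nes2004.txt';

DATA nes_raw;
   INFILE nes;               /* fileref instead of a quoted path */
   INPUT id age income;      /* hypothetical variable layout */
RUN;
```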
In a LIBNAME statement, you may also specify a SAS engine name such as EXCEL before the path. To deassign a library, write CLEAR in place of the path (e.g., LIBNAME jeeshim CLEAR;).
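A sketch of both ideas follows. It assumes the EXCEL engine is available (it requires the SAS/ACCESS Interface to PC Files) and uses a hypothetical workbook; each worksheet appears as a table whose name ends in $, so a name literal is used to refer to it.

```sas
/* Assign a library through the EXCEL engine (hypothetical workbook) */
LIBNAME xl EXCEL 'c:\temp\survey.xls';

PROC PRINT DATA=xl.'Sheet1$'n;   /* worksheet Sheet1 as a table */
RUN;

LIBNAME xl CLEAR;                /* deassign the xl library */
```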
The DATA statement begins a DATA step and names the data sets to be created. The output of the DATA step is stored in the data set(s) specified.
LIBNAME jeeshim 'c:\temp\';
A SAS DATA step can create more than one data set. The following example creates two data sets, WORK.egov1 and WORK.egov2, from jeeshim.egov. The two data sets are identical except that egov2 does not include the variables state and msa, and its variable id is renamed from respid.
DATA egov1 egov2 (DROP=state msa RENAME=(respid=id));
SET jeeshim.egov;
RUN;
If the data set name is omitted (DATA;), SAS automatically names successive data sets WORK.DATA1, WORK.DATA2, WORK.DATA3, and so on. These data sets, however, still consume computing resources and may slow down processing.
If you need a DATA step only for processing that does not require an output data set (e.g., writing messages to the log or records to an external file), use _NULL_ in the DATA statement. _NULL_ tells SAS not to create any data set when it executes the DATA step, which saves resources.
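For instance, the sketch below counts the observations in jeeshim.pew2004 and writes the count to the log without creating any data set. END= names a temporary variable (here last) that equals 1 when the final observation is read.

```sas
DATA _NULL_;
   SET jeeshim.pew2004 END=last;   /* last = 1 on the final observation */
   IF last THEN PUT 'Number of observations: ' _N_;
RUN;
```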
How do you select or delete observations in a data set? The IF statement can do that for you.
The following example reads observations from the data set jeeshim.pew2004, keeps only male respondents (male=1), discards female respondents, and stores the result in the data set WORK.pew_work. A subsetting IF statement like this one has no THEN clause; an explicit OUTPUT statement gives the same result (IF male EQ 1 THEN OUTPUT;).
IF male = 1;
You may use the DELETE statement, which works the other way around: it discards the observations that meet the condition provided.
IF male = 0 THEN DELETE;
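The REMOVE statement, used together with the MODIFY statement in a DATA step, also deletes observations. Because MODIFY changes a data set in place, the observations are removed from that data set itself.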
IF male EQ 0 THEN REMOVE;
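For reference, the three fragments above can be written as complete DATA steps, as sketched below. The first two give the same result when male is coded only 0 or 1 (with missing values, the subsetting IF drops them while DELETE keeps them). The MODIFY example works on a copy named pew_copy, a hypothetical name, because MODIFY alters the data set it reads.

```sas
/* 1. Subsetting IF: keep male respondents */
DATA pew_work;
   SET jeeshim.pew2004;
   IF male = 1;
RUN;

/* 2. DELETE: discard female respondents */
DATA pew_work;
   SET jeeshim.pew2004;
   IF male = 0 THEN DELETE;
RUN;

/* 3. MODIFY + REMOVE: delete observations in place, on a copy */
DATA pew_copy;
   SET jeeshim.pew2004;
RUN;
DATA pew_copy;
   MODIFY pew_copy;
   IF male EQ 0 THEN REMOVE;
RUN;
```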
You may also select observations by their position in the data set. Use _N_, an automatic variable that counts the iterations of the DATA step; when each iteration reads one observation with SET, _N_ equals the observation number.
DATA pew_user pew_nonuser;
SET jeeshim.pew2004;   /* source data set assumed from the examples above */
IF _N_ <= 500 THEN OUTPUT pew_user;
ELSE OUTPUT pew_nonuser;
RUN;
The first 500 observations are saved in WORK.pew_user, while the remaining observations are put into WORK.pew_nonuser.
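The same split can also be sketched with the OBS= and FIRSTOBS= data set options instead of _N_, at the cost of reading the input data set twice. As above, jeeshim.pew2004 is assumed to be the source.

```sas
DATA pew_user;
   SET jeeshim.pew2004 (OBS=500);         /* observations 1 to 500 */
RUN;
DATA pew_nonuser;
   SET jeeshim.pew2004 (FIRSTOBS=501);    /* observation 501 to the end */
RUN;
```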
You may also use the WHERE statement, which selects observations from an existing SAS data set as they are read, so observations that do not meet the condition never enter the DATA step.
WHERE male EQ 0;
The above DATA step reads only female respondents (male=0) from jeeshim.pew2004 and stores them in pew_female.
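Written out in full, the step looks like the first DATA step below; the second is an equivalent sketch using the WHERE= data set option on the SET statement.

```sas
/* WHERE statement */
DATA pew_female;
   SET jeeshim.pew2004;
   WHERE male EQ 0;
RUN;

/* Equivalent WHERE= data set option */
DATA pew_female;
   SET jeeshim.pew2004 (WHERE=(male EQ 0));
RUN;
```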
Note that the WHERE condition is applied when observations are read by the SET, MERGE, MODIFY, or UPDATE statement.
In a DATA step, WHERE cannot be used with raw data read through INFILE or DATALINES; use a subsetting IF instead. In a PROC step, the WHERE statement limits the observations used in the analysis.
PROC REG DATA=pew_female;
MODEL money = income;
WHERE it_use AND age >= 20;
RUN;
QUIT;
"WHERE it_use" selects observations whose value of it_use is neither missing nor zero.