In UNIX, PROC IMPORT and PROC EXPORT must be executed under the X window. Otherwise, you have to add "noterminal" to a command (e.g., "sas -noterminal file_name.sas").
In SAS, there are various data sources as shown in the following figure. In general, SAS reads data using the INFILE statement and PROC IMPORT.
You may use the PUT statement in a Data Step or PROC EXPORT to export data sets into external files.
INFILE should be used in a DATA step, while PROC IMPORT and PROC EXPORT are independent procedures.
If you have simple data, read them using INFILE and DATALINES; otherwise, use PROC IMPORT. Personally, I prefer the IMPORT Wizard to PROC IMPORT due to its user-friendly interface and flexibility. If you have a data set generated in other software packages (e.g., Excel, dBASE III, Paradox, Stata, and SPSS), use a data conversion utility like Stat/Transfer.
The INFILE statement identifies external files to be read in the INPUT statement.
It reads an ASCII text file that is typically delimited by a space (default), Tab, comma, or other delimiters.
Let us take the following example first.
INFILE egov LRECL=250 FIRSTOBS=7 OBS=700;
INPUT name $ male training corrupt negative;
The LIBNAME statement designate a library, an alias of the collection of data sets, to the specified directory (c:\sas).
The FILENAME statement associates a file reference with a external file (drive+path+filename). Without the statement, you should explicitly specify the drive, paths, and file name in the INFILE statement as "INFILE 'c:\sas\egov.txt';
The "DATA js.seoul" creates a SAS data set "seoul" in the "js" library. The result of this DATA step is stored into the "seoul.sas7bdat" in the c:\sas\.
The INFILE statement specifies the file to be read. You may add options, such as LRECL=n FIRSTOBS=n OBS=n, if necessary.
The "LRECL=250", the logical record length, tells SAS to read a line up to 250 columns.
The "FIRSTOBS=7" tells SAS to read data from the 7th line; The first 6 lines are ignored.
The "OBS=700" tells SAS to read first 700 observations. Omitting this option reads all observations in a data file.
The INPUT statement specifies a list of variable name, type, and length depending on input styles.
DSD (delimiter-sensitive data) tells SAS to treats two consecutive delimiters as a missing value and removes quotation marks from character values.
FIRSTOBS specifies a record number that SAS begin reading input data records.
FLOWOVER (default) reads the next data line if there is no enough data in the current data line for all variables specified. This method may be problematic when there is missing values.
LRECL (logical record length) specifies the logical record length. This option is not valid when the DATALINES is specified.
MISSOVER sets all remaining variables without values to missing. And then read the next line for new observations.
STOPOVER stops a DATA step when INPUT does not find values for all the variables specified. This option is useful when we need to check missing values in raw data.
TRUNCOVER does not read a new data line when INPUT does not find values in the current data lines for all variables specified. This option sets all remaining variables without values to missing.
INFILE DATALINES MISSOVER;
INPUT id korean english;
01 87 65
The above example reads four observations with missing. Note that "english" of the second observation is missing since consecutive blanks are interpreted as a delimiter.
If MISSOVER option is omitted, SAS reads only two observations with data massed up; the "english" of the second observations is set "03".
If STOPOVER option is used instead of MISSOVER, SAS will stop at the second data line since there is not enough data values for variables specified.
Since the version 8.0, SAS provides the IMPORT procedure to read various types of external files, such as ACESS, EXCEL, and dBASE III.
In previous versions, the DBF and DIF procedures deal with dBASE III and spreadsheet files, respectively.
The IMPORT procedure creates data sets from various types of external files.
So an additional DATA step is not necessary unless the data set needs to be manipulated. In other words, the IMPORT procedure do not require the INPUT and DATALINES (or CARDS) statements.
There are two ways of using the IMPORT procedure. One is to write a IMPORT statement in a standard SAS program, the other is to use the IMPORT/EXPORT Wizard which is available under the FILE menu. Personally,
I prefer the IMPORT Wizard to the procedure, since the former provides a user-friendly interface, high flexibility, and other useful features.
Is the IMPORT statement always better than the INFILE? It depends.
For example, when you need to read a huge ACCESS file (say larger than 1GB), but only several variables are needed for analyses, the INFILE statement is recommended.
The Import/Export Wizard provides extremely flexible ways of reading external data.
The wizard works in the graphic user interface (GUI) environment such as X-window and Microsoft Windows.
If your data file is messy and ill-organized, the wizard will be a good solution.
The Import/Export Wizard actually works in the same way that the Import and Export procedures in a SAS program do; The Wizard can generate SAS programs for Import and Export procedures.
Click "File" from the top menu bar and choose "Import Data..."
Choose a file type from the list. You may choose "User-defined formats" for messy data.
Locate an external file to be imported.
Provide a library and SAS data set name.
If you want SAS to create a SAS program file that is equivalent to what the Wizard does, specify the file name.
READ DIRECT DATA ENTRY
You can directly input data in a Data Step. INFILE DATALINES tells SAS to construct a data set from data lines after the DATALINES statement.
The following example reads a numeric variable id, a string variable depart, and a numeric variable price.
Note that data items are delimited (separated) by a space. In this case of the space-delimited format, INFILE DATALINES can be omitted.
INPUT id depart $ employee;
The above example implies that ASCII text data (egov.txt) are delimited with space (or blank).
If a ASCII file is delimited by other than a space, you need to explicitly specify delimiter using DELIMITER (or DLM) option with using various characters or combinations of characters as a delimiter.
For example, DLM='09'x for Tab, DLM='&' for ampersand , DLM='^' for caret, DLM=';' for semicolon, DLM='END', and so forth.
INFILE egov.txt DELIMITER=',';
INPUT name $ male training corrupt negative;
The INFILE is also able to read the data stream of DATALINES (or CARDS).
You may specify DATALINES instead of an external file name.
Note that each data element in the following example is separated by caret (^).
INFILE DATALINES DELIMITER='^';
INPUT id fruit $ sales;
What if a comma is used as a delimiter and there is a missing value in the second data line?
The DSD option is the answer. DSD reads a value as missing between two consecutive delimiters (,,). Without this option, the above program reads only first observation.
INFILE DATALINES DLM=',' DSD;
INPUT id fruit $ sales;
"DATA fruit" does not specify a library, thus the data set "fruit" is stored in the "work" library, which is the default library existing on the RAM only.
Since the DSD option sets a comma as the default delimiter, DLM=',' is not necessary in this example. But if other delimiter is used, both DLM and DSD option should be used to read missing values correctly.
If you want to export a data set into an external file, use PUT statement. PUT allows you to control the output format flexibly, but it is a bit difficult, especially for beginners, to use this statement correctly.
The SHEET statement specifies the worksheet to be imported. If this statement is omitted, the worksheet will be referred to "sheet1", "sheet2", and so on.
If you want to read a subset of a worksheet, provide a starting and ending cell point after the worksheet name. For example, SHEET="computer$B3:F100"; reads data from B3 through F100, excluding column A and row 1 and 2.
The EXCEL2000 indicates the Microsoft EXCEL 2000 format. Other releases include EXCEL4, EXCEL5, and EXCEL97. For LOTUS 123 files, use WK1, WK3, or WK4.
The GETNAMES=YES reads variable names from the first row of the worksheet. The default is YES.
The following example exports a data set to an Excel file.
Notice that there is only one semicolon at the end of EXPORT statement.
Unlike a dBASE III or FoxPro file, an Access file can has more than one table with security features.
Thus, you have to provide database, table, account (identification), and password to access a database file.
The DATABASE statement specifies the drive, path, and access file name (e.g., "asset.mdb").
The TABLE option specifies the table name in the access file (e.g., "computer").
The UID statement specifies the user ID who is allowed to access the database. Similarly, the PWD specifies the password to log in the database. You may ignore them unless specific UID and PWD are designated.
The EXPORT procedure can export a SAS data set to a Access table.
Note that the OUTTABLE specifies a table name to be generated in the ACCESS file.
SAS can access data through FTP, HTTP, and TCP/IP.
This network functionality provides high flexibility and convenience in the information era.
The following example uses FTP to access the data resource.
Note that FTP, USER, PASS, and HOST are reserved words for protocol, account name, password, and host computer name, respectively.
The PROMPT specifies to prompt for the user login password, if necessary.
SAS cannot directly read customized data formats such as STATA *.dta. Accordingly, you have to export a data set in the software into a general file format (e.g., *.csv) and then import it into a SAS data set.
Or you may use the COPY procedure to create a transport file that can be recognized by other software.
However, these tasks often become cumbersome and burdensome to most researchers.
The most efficient solution in this case is to use professional data transferring utilities such as Stat/Transfer and DBMS/COPY that transfer data from one software to another.
These utilities support a variety of file formats such as SAS, Stata, Gauss, Rats, ACCESS, dBASE, FoxPro, Paradox, Lotus 1-2-3, Quattro, Excel, Sigmaplot, Minitab, and SPSS.
SAS Institute. 2005. SAS Language Reference: Concepts, Version 9. Cary, NC: SAS Institute.
SAS Institute. 2005. SAS Language Reference: Dictionary, 2nd ed., Version 9, Volumn 3. Cary, NC: SAS Institute.
Burlew, Michele M. 2002. Reading External Data Files Using SAS: Examples Handbook. Cary, NC: SAS Institute.