Korea University
Korea Hangeul Help Sitemap Calc Link
SAS Import & Export
This page explains how to import from and export to ASCII text (Tab and CSV), spreadsheet (Excel), and database files in SAS.

In UNIX, PROC IMPORT and PROC EXPORT must be executed under the X window. Otherwise, you have to add "noterminal" to a command (e.g., "sas -noterminal file_name.sas").

This web document may not be used for any commercial purposes. If you have any question and suggestion, please leave a message on the message board.

ASCII Text Files | Excel Files | dBASE III Files | ACCESS Files | Matrices | Network (FTP/HTTP)
INFILE Statement | INFILE Options | IMPORT & EXPORT | IMPORT & EXPORT Wizard
Data Conversion Utilities | Using Informats | Clipboard | ODBC/SQL | References

OVERVIEW

In SAS, there are various data sources as showin in the following figure. In general, SAS reads data using the INFILE statement and PROC IMPORT. You may use the PUT statement in a Data Step or PROC EXPORT to export data sets into external files. INFILE should be used in a DATA step, while PROC IMPORT and PROC EXPORT are independent procedures.

The INFILE statement reads data directly using the DATALINES (CARDS) statement, imports various ASCII text files, and imports data sets through network (i.e., FTP and HTTP). INFILE and DATALINES also read data in a matrix form. PROC IMPORT reads ASCII text files, database (ACCESS, dBASE), and spreadsheet (Lotus 1-2-3, Excel).

If you have simple data, read them using INFILE and DATALINES; otherwise, use PROC IMPORT. Personally, I prefer the IMPORT Wizard to PROC IMPORT due to its user-friendly interface and flexibility. If you have a data set generated in other software packages (e.g., Excel, dBASE III, Paradox, Stata, and SPSS), use a data conversion utility like Stat/Transfer.


INFILE STATEMENT

The INFILE statement identifies external files to be read in the INPUT statement. It reads an ASCII text file that is typically delimited by a space (default), Tab, comma, or other delimiters. Let us take the following example first.

LIBNAME js 'c:\sas';
FILENAME egov 'c:\sas\egov.txt';

DATA js.seoul;
   INFILE egov LRECL=250 FIRSTOBS=7 OBS=700;
   INPUT name $ male training corrupt negative;
RUN;
  • The LIBNAME statement designate a library, an alias of the collection of data sets, to the specified directory (c:\sas).
  • The FILENAME statement associates a file reference with a external file (drive+path+filename). Without the statement, you should explicitly specify the drive, paths, and file name in the INFILE statement as "INFILE 'c:\sas\egov.txt';
  • The "DATA js.seoul" creates a SAS data set "seoul" in the "js" library. The result of this DATA step is stored into the "seoul.sas7bdat" in the c:\sas\.
  • The INFILE statement specifies the file to be read. You may add options, such as LRECL=n FIRSTOBS=n OBS=n, if necessary.
  • The "LRECL=250", the logical record length, tells SAS to read a line up to 250 columns.
  • The "FIRSTOBS=7" tells SAS to read data from the 7th line; The first 6 lines are ignored.
  • The "OBS=700" tells SAS to read first 700 observations. Omitting this option reads all observations in a data file.
  • The INPUT statement specifies a list of variable name, type, and length depending on input styles.

Up

INFILE OPTIONS
  • DSD (delimiter-sensitive data) tells SAS to treats two consecutive delimiters as a missing value and removes quotation marks from character values.
  • FIRSTOBS specifies a record number that SAS begin reading input data records.
  • FLOWOVER (default) reads the next data line if there is no enough data in the current data line for all variables specified. This method may be problematic when there is missing values.
  • LRECL (logical record length) specifies the logical record length. This option is not valid when the DATALINES is specified.
  • MISSOVER sets all remaining variables without values to missing. And then read the next line for new observations.
  • STOPOVER stops a DATA step when INPUT does not find values for all the variables specified. This option is useful when we need to check missing values in raw data.
  • TRUNCOVER does not read a new data line when INPUT does not find values in the current data lines for all variables specified. This option sets all remaining variables without values to missing.
DATA js.miss;
INFILE DATALINES MISSOVER;
INPUT id korean english;

DATALINES;
01 87 65
02 86
03 95
04
RUN;

The above example reads four observations with missing. Note that "english" of the second observation is missing since consecutive blanks are interpreted as a delimiter. If MISSOVER option is omitted, SAS reads only two observations with data massed up; the "english" of the second observations is set "03". If STOPOVER option is used instead of MISSOVER, SAS will stop at the second data line since there is not enough data values for variables specified.

Up

IMPORT & EXPORT PROCEDURES

Since the version 8.0, SAS provides the IMPORT procedure to read various types of external files, such as ACESS, EXCEL, and dBASE III. In previous versions, the DBF and DIF procedures deal with dBASE III and spreadsheet files, respectively.

The IMPORT procedure creates data sets from various types of external files. So an additional DATA step is not necessary unless the data set needs to be manipulated. In other words, the IMPORT procedure do not require the INPUT and DATALINES (or CARDS) statements.

There are two ways of using the IMPORT procedure. One is to write a IMPORT statement in a standard SAS program, the other is to use the IMPORT/EXPORT Wizard which is available under the FILE menu. Personally, I prefer the IMPORT Wazard to the procedure, since the former provides a user-frendly interface, high flexibility, and other useful features.

Is the IMPORT statement always better than the INFILE? It depends. For example, when you need to read a huge ACCESS file (say larger than 1GB), but only several variables are needed for analyses, the INFILE statement is recommended.


IMPORT & EXPORT WIZARD

The Import/Export Wizard provides extremely flexible ways of reading external data. The wizard works in the graphic user interface (GUI) environment such as X-window and Microsoft Windows. If your data file is messy and ill-organized, the wizard will be a good solution.

The Import/Export Wizard actually works in the same way that the Import and Export procedures in a SAS program do; The Wizard can generate SAS programs for Import and Export procedures.

In order to use the Wizard,

  1. Click "File" from the top menu bar and choose "Import Data..."
  2. Choose a file type from the list. You may choose "User-defined formats" for messy data.
  3. Locate an external file to be imported.
  4. Provide a library and SAS data set name.
  5. If you want SAS to create a SAS program file that is equivalent to what the Wizard does, specify the file name.

Up

READ DIRECT DATA ENTRY

You can directly input data in a Data Step. INFILE DATALINES tells SAS to construct a data set from data lines after the DATALINES statement. The following example reads a numeric variable id, a string variable depart, and a numeric variable price. Note that data items are delimited (separated) by a space. In this case of the space-delimited format, INFILE DATALINES can be omitted.

DATA fruit;
INFILE DATALINES;
INPUT id depart $ employee;

DATALINES;
1 sales 16
2 finance 25
3 research 10
RUN;

Up

IMPORT & EXPORT ASCII TEXT FILES

The above example implies that ASCII text data (egov.txt) are delimited with space (or blank). If a ASCII file is delimited by other than a space, you need to explicitly specify delimiter using DELIMITER (or DLM) option with using various characters or combinations of chracters as a delimiter. For example, DLM='09'x for Tab, DLM='&' for ampersand , DLM='^' for caret, DLM=';' for semicolon, DLM='END', and so forth.

DATA js.seoul;
   INFILE egov.txt DELIMITER=',';
   INPUT name $ male training corrupt negative;
RUN;

The INFILE is also able to read the data stream of DATALINES (or CARDS). You may specify DATALINES instead of an external file name. Note that each data element in the following example is separated by caret (^).

DATA js.fruit;
INFILE DATALINES DELIMITER='^';
INPUT id fruit $ sales;

DATALINES;
1^Grape^100
2^Pear^77
RUN;

What if a comma is used as a delimiter and there is a missing value in the second data line? The DSD option is the answer. DSD reads a value as missing between two consecutive delimiters (,,). Without this option, the above program reads only first observation.

DATA fruit;
INFILE DATALINES DLM=',' DSD;
INPUT id fruit $ sales;

DATALINES;
1,Grape,100
2,,77
RUN;
  • "DATA fruit" does not specify a library, thus the data set "fruit" is stored in the "work" library, which is the default library existing on the RAM only.
  • Since the DSD option sets a comma as the default delimiter, DLM=',' is not necessary in this example. But if other delimiter is used, both DLM and DSD option should be used to read missing values correctly.

If you want to export a data set into an external file, use PUT statement. PUT allows you to control the output format flexibly, but it is a bit difficult, especially for beginners, to use this statement correctly.

Up

Now, let us use IMPORT and EXPORT procedures to handle ASCII text files. You need to specify delimiter of the ASCII text file in the DBMS option.

PROC IMPORT DATAFILE="c:\sas\ego.csv" OUT=jeeshim.egov DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;
RUN;
  • The DATAFILE specifies an external file to be imported.
  • The OUT specifies the SAS data file to be created.
  • The DBMS specifies the type of the external file. For example,"DBMS=DLM", "DBMS=CSV", and "DBMS=TAB".
  • The REPLACE overwrites an existing file, if any.
  • The GETNAMES reads variable names from the first line of the data file.
  • The DATAROW option tells the row from which SAS reads observations.
  • GETNAMES and DATAROW require semicolon at the end.

The following example exports a data set to a space delimited ASCII text file. The DATA and the OUTFILE respectively specify a data set to be exported and an external file to which data are exported.

PROC EXPORT
   DATA=jeeshim.egov
   OUTFILE="c:\sas\ego.txt"
   DBMS=DLM REPLACE;
RUN;

Up

IMPORT & EXPORT EXCEL FILES

PROC IMPORT imports spreadsheet files such as Excel and Lotus 1-2-3.

PROC IMPORT DATAFILE="c:\sas\assets.xls" OUT=js.pc DBMS=EXCEL2000 REPLACE;
   SHEET="computer";
   GETNAMES=YES;
RUN;
  • The SHEET statement specifies the worksheet to be imported. If this statement is omitted, the worksheet will be referred to "sheet1", "sheet2", and so on.
  • If you want to read a subset of a worksheet, provide a starting and ending cell point after the worksheet name. For example, SHEET="computer$B3:F100"; reads data from B3 through F100, excluding column A and row 1 and 2.
  • The EXCEL2000 indicates the Microsoft EXCEL 2000 format. Other releases include EXCEL4, EXCEL5, and EXCEL97. For LOTUS 123 files, use WK1, WK3, or WK4.
  • The GETNAMES=YES reads variable names from the first row of the worksheet. The default is YES.

The following example exports a data set to an Excel file. Notice that there is only one semicolon at the end of EXPORT statement.

PROC EXPORT
   DATA=js.egov
   OUTFILE="c:\sas\egov.xls"
   DBMS=EXCEL2000 REPLACE;
RUN;

Up

IMPORT & EXPORT dBASE III FILES

A dBase III file has ony one table that has a well defined data structure. You may omit the "DBMS=DBF" option since SAS can recognized file format with the extension, "dbf."

PROC IMPORT
   DATAFILE="c:\sas\comuter.dbf"
   OUT=js.pc
   DBMS=DBF REPLACE;
RUN;

The following example exports a data set to a dBase III file.

PROC EXPORT
   DATA=js.pc
   OUTFILE="c:\sas\pc.dbf"
   DBMS=DBF REPLACE;
RUN;

Up

IMPORT & EXPORT ACCESS FILES

Unlike a dBASE III or FoxPro file, an Access file can has more than one table with security features. Thus, you have to provide database, table, account (identification), and password to access a database file.

PROC IMPORT TABLE="comuter" OUT=js.pc DBMS=ACCESS REPLACE;
   UID=""; PWD=""; WGDB="";
   DATABASE="c:\sas\asset.mdb";
RUN;
  • The DATABASE statement specifies the drive, path, and access file name (e.g., "asset.mdb").
  • The TABLE option specifies the table name in the access file (e.g., "computer").
  • The UID statement specifies the user ID who is allowed to access the database. Similarly, the PWD specifies the password to log in the database. You may ignore them unless specific UID and PWD are designated.

The EXPORT procedure can export a SAS data set to a Access table. Note that the OUTTABLE specifies a table name to be generated in the ACCESS file.

PROC EXPORT
   DATA=js.egov
   OUTTABLE="egov2000"
   DBMS=ACCESS2000 REPLACE;
   DATABASE="c:\sas\egov.mdb";
RUN;

Up

USING NETWORK RESOURCES

SAS can access data through FTP, HTTP, and TCP/IP. This network functionality provides high flexibility and convenience in the information era.

The following example uses FTP to access the data resource. Note that FTP, USER, PASS, and HOST are reserved words for protocol, account name, password, and host computer name, respectively. The PROMPT specifies to prompt for the user login password, if necessary.

FILENAME myftp FTP 'gov.txt' USER='kucc625' PASS='xxxxx' PROMPT
   HOST='mdss.iu.edu' CD='/sas/egov';

DATA js.egov;
   INFILE myftp;
   INPUT year country $ domain index;
RUN;

The following examples use URL to access data files available on a website.

FILENAME myurl URL 'http://masil.org/archives/airline.txt';

DATA js.egov;
   INFILE myurl;
   INPUT airline year output cost fuel load;
RUN;

FILENAME myurl URL 'http://www.masil.org/archives/smoking.txt';

DATA js.smoking;
   INFILE myurl FIRSTOBS=34;
   INPUT state $ cigar bladder lung kidney leukemia area;
RUN;

In order to use the TCP/IP connection, you have to use the SOCKET instead of URL.

Up

USING INFORMATS

If data include comma in numbers, dollar ($), and percent (%), you have to use informats such as COMMAn.n, DOLLARn.n, and PERCENTn.n in the formatted input style.

DATA sales;
INFILE DATALINES;
INPUT year quan COMMA9.0 sales DOLLAR10.0 rate PERCENT3.;

DATALINES;
2000 184,871 $2,875,879 80%
2001 875,877 $5,987,972 89%
RUN;

If data are in date informats, you may use MMDDYY10. and MMDDYY8. formats. Note that the period (.) in the third observation indicates a missing value.

DATA book;
INFILE DATALINES;
INPUT start MMDDYY10. end MMDDYY8.;

DATALINES;
08/12/1999 03/01/04
03/02/1987 02/25/91
03/02/1991 .
RUN;

Up

READING MATRICES

SAS can also read such matrix forms as correlation coefficients (TYPE=CORR), covariance (TYPE=COV), and parameter estimators (TYPE=EST).

DATA corr_mat (TYPE=CORR);
INFILE DATALINES MISSOVER;
matrix ='Correlation';
INPUT _NAME_ $ var1 var2 var3;

DATALINES;
var1 1.00000
var2 0.25757 1.00000
var3 0.57844 0.54865 1.00000
RUN;
  • The _TYPE_, a special SAS variable, is used to distinguish the various statistics such as _TYPE_='CORR', _TYPE='MEAN', and _TYPE_'STD'.
  • The _NAME_ is needed to identify the row of the correlation matrix.

Up

USING CLIPBOARD

SAS can read text data from and write text data to the clipboard.

You may import a worksheet of an Excel file. First highlight the part of worksheet in Excel and copy it into the Windows Clipboard. Suppose you choose 5 variables.

FILENAME clipboard CLIPBRD;

DATA excel;
   INFILE clipboard
   INPUT x1-x5;
RUN;

ACCESSING THROUGH ODBC

SAS can access data through the ODBC connection. You need to define a DSN for database or spreadsheet files in advance ("access_dsn").

  1. Run the "ODBC Data Source Administrator" at the Control Panel
  2. Click "System DSN" and then choose "Addd" to create a new DSN
  3. Select a proper driver (mdb, xls, etc.) and then provide DSN name and description
  4. Specify database, spreadsheet, or database server (local or SQL server name)
  5. Click "Advanced" to enter login ID and password, if necessary

The DSN is connected by the CONNECT statement in the SQL procedure. A series of SQL statements such as CREATE and SELECT follow the CONNECT stateemnt.

PROC SQL;;
CONNECT TO ODBC AS db_con
   (DATASRC="access_dsn" USER=kucc625 PASSWORD=xxxx);
SELECT * FROM CONNECTION TO ODBC(SELECT * FROM egov2004);
...;
QUIT;

Up

DATA CONVERSION UTILITIES

SAS cannot directly read customized data formats such as STATA *.dta. Accordingly, you have to export a data set in the software into a general file format (e.g., *.csv) and then import it into a SAS data set. Or you may use the COPY procedure to create a transport file that can be recognized by other software.

However, these tasks often become cumbersome and burdensome to most researchers. The most efficient solution in this case is to use professional data transferring utilities such as Stat/Transfer and DBMS/COPY that transfer data from one software to another.

These utilities support a variety of file formats such as SAS, Stata, Gauss, Rats, ACCESS, dBASE, FoxPro, Paradox, Lotus 1-2-3, Quattro, Excel, Sigmaplot, Minitab, and SPSS.

Up

REFERENCES
  • SAS Institute. 2005. SAS Language Reference: Concepts, Version 9. Cary, NC: SAS Institute.
  • SAS Institute. 2005. SAS Language Reference: Dictionary, 2nd ed., Version 9, Volumn 3. Cary, NC: SAS Institute.
  • Burlew, Michele M. 2002. Reading External Data Files Using SAS: Examples Handbook. Cary, NC: SAS Institute.