The INPUT statement describes the arrangement of the data to be read in a DATA step.
An INPUT statement lists variable names (each followed by $ if it holds a character value) and, as needed, pointer controls, column specifications, informats, and line-hold specifiers (@ and @@).
Column pointer controls such as @n and +n move the input pointer to a specified column in the input buffer.
Line pointer controls such as #n and / move the input pointer to a specified line in the input buffer.
Column specifications specify the columns of the input record that contain the value to read.
An informat is an instruction that SAS uses to read data values into variables.
@, a single trailing @, holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. Thus, the next INPUT statement reads from the same record (line).
@@, a double trailing @, holds the input record for the execution of the next INPUT statement across iterations of the DATA step. Thus, the INPUT statement for the next iteration of the DATA step continues to read the same record (line).
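The two line-hold specifiers can be compared in a small sketch. The data set names (mixed, pairs), variable names, and data values below are hypothetical and serve only as illustration.

```sas
/* Single trailing @: hold the record within one iteration */
DATA mixed;
   INPUT type $ @;                          /* read type, keep the same line */
   IF type = 'A' THEN INPUT score1 score2;  /* same line, two more values */
   ELSE INPUT score1;                       /* same line, one more value */
   DATALINES;
A 90 85
B 70
;
RUN;

/* Double trailing @: hold the record across iterations */
DATA pairs;
   INPUT x y @@;                            /* read one pair, keep the line */
   DATALINES;
1 2 3 4 5 6
;
RUN;
```

With this input, mixed should contain two observations, with score2 missing for type B. pairs should contain three observations, (1,2), (3,4), and (5,6), all read from a single data line.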
The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a DATA step.
In order to read external data files, you have to use the INFILE statement.
There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input.
The following table summarizes features of four major styles.
. or delimiter
Variable order, DSD
:, &, ~
@n, +n, #, /
Which input style is best? It depends on your skills and on the characteristics of the data.
If your data set has only a few observations with several variables, list input or named input is more convenient than column input or formatted input.
When data values are not separated by blanks or other delimiters, you cannot use list input.
When data are neatly aligned in fixed columns, column input or formatted input works better than list input.
Therefore, you need to examine the data structure carefully before choosing an input style.
Ideally, this issue should be considered as early as the data-coding stage.
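Column input, which the comparison above recommends for neatly aligned data, assigns each variable a fixed range of columns. The following is a minimal sketch that reuses the names and scores from this tutorial's list-input example. The column positions and the data set name scores are assumptions chosen for illustration.

```sas
DATA scores;
   INPUT name $ 1-8 id 10-16 score 18-21;   /* start-end columns for each variable */
   DATALINES;
Park     8740031 87.5
Hwang            94.3
;
RUN;
```

Every field occupies fixed columns, so the blank id field for Hwang should be read as a missing value without needing a period.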
The list input style simply lists variables separated by blanks.
This style is also called the free format.
DATA students;
INPUT name $ id score;
DATALINES;
Park 8740031 87.5
Hwang . 94.3
;
A character variable should be followed by $.
A missing value should be marked with a period (.); a blank does not mean a missing value in this input style.
Do not use more than one "." for a missing value.
In list input, a character variable has a default length of 8 characters; that is, 8 bytes of storage are assigned to it.
Therefore, a string longer than 8 characters will be truncated.
If you want to read a string longer than 8 characters, use LENGTH, INFORMAT, or ATTRIB statements.
Or you may use different input styles such as column input or formatted input.
LENGTH analysis $15;
INFORMAT year MMDDYY10.;
INPUT analysis year;
FORMAT year DATE9.;
In the example above, you may use "INFORMAT analysis $15." instead of the LENGTH statement. INFORMAT tells SAS how data are read, while FORMAT tells SAS how values are displayed.
MMDDYY10. reads dates written as MM/DD/YYYY, and DATE9. displays dates as DDMMMYYYY (e.g., 01JAN1960). Without the FORMAT statement for year, SAS displays the internal
SAS date value, which is the number of days since January 1, 1960 (for example, 14884).
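The following sketch makes the internal value visible by storing an unformatted copy of the date. The data set name, the variable raw, and the two data lines are hypothetical.

```sas
DATA dates;
   LENGTH analysis $15;
   INFORMAT year MMDDYY10.;   /* read MM/DD/YYYY */
   INPUT analysis year;
   raw = year;                /* unformatted copy: days since 01JAN1960 */
   FORMAT year DATE9.;        /* display year as, e.g., 01JAN1960 */
   DATALINES;
Baseline 01/01/1960
FollowUp 01/02/1960
;
RUN;

PROC PRINT DATA=dates;
RUN;
```

January 1, 1960 is day 0, so raw should be 0 for the first row and 1 for the second. year is printed with DATE9.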
The following example reads a comma-delimited ASCII text file. Remember that the default delimiter is a blank.
See the INFILE statement for details.
DATA tiger;
INFILE 'a:\tiger.dat' DELIMITER=',' STOPOVER;
INPUT name $ id score;
RUN;
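If the external file is not available, you can try the same INFILE options on in-stream data by naming DATALINES as the file. The sketch below assumes comma-separated versions of the two records from the list-input example. The data set name tiger_inline is hypothetical.

```sas
DATA tiger_inline;
   INFILE DATALINES DELIMITER=',' STOPOVER;  /* comma-separated; stop on short records */
   INPUT name $ id score;
   DATALINES;
Park,8740031,87.5
Hwang,.,94.3
;
RUN;
```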
The modified list style is a mixture of the list input and the formatted input. This style can deal with ill-structured data.
There are three format modifiers to be used especially when reading complex data.
colon (:) lets you attach an informat to a variable read with list input. SAS reads the value until it reaches the next delimiter, and the informat width sets the variable's length. This allows strings longer than the standard 8 characters to be stored, and characters beyond the width are dropped.
ampersand (&) reads character values that contain embedded blanks with list input; SAS keeps reading until it encounters two or more consecutive delimiters. A value read this way may also contain double quotation marks.
tilde (~), used with the DSD option of INFILE, reads and retains single quotation marks, double quotation marks, and delimiters within quoted character values. That is, the double quotation marks enclosing
a string are kept as part of the value of the character variable.
The following example illustrates how : and & work in INPUT.
The value "Lindblom80" in the first row is truncated because it exceeds 8 characters. Only the first 8 characters,
as specified in the INPUT statement, are stored, and the last two characters, "80", are ignored.
In the second row, SAS reads "Park", which is shorter than 8 characters, and then encounters a comma (the delimiter).
SAS therefore stops reading the value of "name" and moves on to the next variable.
The variable "title" is read with & and a maximum length of 50 characters. The comma inside the quoted string in the first and third rows is treated as part of the character value.
Two consecutive double quotation marks are read as a single double quotation mark.
Therefore, the title of the second observation is Reading "Small Is Beautiful", as shown in the output.
Characters beyond the maximum, 50 characters in this case, are ignored.
DATA books;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $8. title & $50.;
DATALINES;
Lindblom80,"Still Muddling, Not Yet Through"
Park, "Reading ""Small Is Beautiful"""
Simon, """It was a disaster,"" he continue..."
;
RUN;
Lindblom Still Muddling, Not Yet Through
Park Reading "Small Is Beautiful"
Simon "It was a disaster," he continue...
The INFILE statement above says that data are comma delimited and will be listed after DATALINES.
DSD at the end of INFILE eliminates double quotation marks enclosing the character value when reading data.
If you omit DSD, SAS will consider a comma in character values as a delimiter and read enclosing double quotation marks as character values.
As a result, the output would look like this:
Lindblom "Still Muddling
Park "Reading ""Small Is Beautiful"""
Simon """It was a disaster,"" he continue..."
The second example shows how ~ (tilde) and DSD work together to read a string with a delimiter.
SAS reads a comma in the string as a character value but does not remove the double quotation marks enclosing the string.
If you omit DSD, the title of the second row will be '"Still Muddling' because SAS treats a comma in the string as the delimiter and stops reading the character value for variable "title."
DATA articles;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $20. year : 4.0 title ~ $50.;
DATALINES;
Meyer and Rowan,1977,"Institutionalized Organization"
Lindblom,1979,"Still Muddling, Not Yet Through"
;
RUN;
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling, Not Yet Through"
Output without DSD:
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling
You may not omit the colon (:) after "year" in the INPUT statement above, even though all the years have the same fixed width.
If "year" were the last variable in the INPUT statement, the colon would not be necessary.
The formatted input style reads input values with specified informats after variable names.
Informats provide the data type and the width of an input value.
Numeric informats take the w.d form, where w is the total width of the field and d is the number of digits to the right of the decimal point.
You can omit d when it is 0 (7. is the same as 7.0), but you cannot omit the period; without it, SAS would treat the number as a column specification.
$CHARw. or $w. is used for character variables, while DATEw., DDMMYYw., and similar informats are used for dates.
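The role of d is clearest with a value that has no decimal point. In this sketch, the same digits are read with 5.2 and with 5. The data set name, variable names, and data are hypothetical.

```sas
DATA decimals;
   INPUT a 5.2 +1 b 5.;   /* a: columns 1-5, skip column 6, b: columns 7-11 */
   DATALINES;
12345 12345
  1.5   1.5
;
RUN;
```

In the first row, 5.2 divides 12345 by 10**2, so a should be 123.45, while 5. gives b = 12345. In the second row the data already contain a decimal point, so d is ignored and both a and b should be 1.5.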
The following example illustrates how effectively formatted input combines pointer controls, informats (e.g., COMMAw., DOLLARw., PERCENTw., and MMDDYY10.), and parentheses.
SAS reads x1 as a five-character string, x2 as a seven-digit number without a decimal point, and x3 through x5 as three-digit numbers. It then skips one column (+1) before reading income, a numeric value containing commas.
INPUT (x1-x5) ($CHAR5. 7. 3*3.0) +1 income COMMA7.;
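To see this INPUT statement at work, here is a sketch with one hypothetical data record. The record is laid out to match the informats: x1 in columns 1-5, x2 in 6-12, x3 to x5 in 13-21, a blank column 22 skipped by +1, and income in columns 23-29. The data set name formatted is also hypothetical.

```sas
DATA formatted;
   /* 3*3.0 repeats the 3.0 informat for x3, x4, and x5 */
   INPUT (x1-x5) ($CHAR5. 7. 3*3.0) +1 income COMMA7.;
   DATALINES;
ABCDE1234567 10 20 30  12,345
;
RUN;
```

This should yield x1 = 'ABCDE', x2 = 1234567, x3 = 10, x4 = 20, x5 = 30, and income = 12345, with COMMA7. removing the comma.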
Formatted input can use both column and line pointer controls.
These pointer controls are very useful when reading multiple observations from the same line or reading an observation from multiple lines.
@n, a column pointer control, moves the input pointer to column n
@@, a line-hold specifier, holds the current line so that the next iteration of the DATA step continues reading from it
+n, a column pointer control, moves the pointer n columns to the right
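The two column pointer controls can be combined in one INPUT statement. The following is a minimal sketch. The data set name, the variable grade, and the data line are hypothetical, with the fields placed at the columns noted in the comments.

```sas
DATA pointers;
   INPUT @1 name $CHAR5.     /* start at column 1, read columns 1-5 */
         @10 score 3.        /* jump to column 10, read columns 10-12 */
         +2 grade $CHAR1.;   /* skip columns 13-14, read column 15 */
   DATALINES;
Park      87  A
;
RUN;
```

This should give name = 'Park', score = 87, and grade = 'A'.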
The named input reads a data value that follows its variable name.
A variable name and its data value are separated by an equal sign.
String data are NOT enclosed by double quotation marks in this style.
Like list input, named input assigns character variables the standard length of 8 characters by default.
This style offers some flexibility, but it is not appropriate for large data sets because every value must be preceded by its variable name.
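A minimal sketch of named input follows, reusing the records from the list-input example. The data set name named is hypothetical.

```sas
DATA named;
   INPUT name=$ id= score=;   /* each value follows "variable=" */
   DATALINES;
name=Park id=8740031 score=87.5
name=Hwang id=. score=94.3
;
RUN;
```

As in list input, the period marks Hwang's id as missing, and the string values are written without quotation marks.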
Let us read multiple observations from one line using the formatted input style. The following script reads the string variables "name" and "id" consecutively, reads the three-digit numeric variables x1 through x3, and then keeps reading the next observations, if any, without moving to the next line.
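A minimal sketch of such a script follows. It assumes field widths of 5 columns for name, 7 for id, and 3 for each of x1 to x3. These widths, the data set name multi, the second id, and the scores are all hypothetical.

```sas
DATA multi;
   /* hypothetical widths: name 5, id 7, x1-x3 three columns each */
   INPUT name $CHAR5. id $CHAR7. (x1-x3) (3.) @@;
   /* guard (assumption): skip an empty slot if the data line is blank-padded */
   IF name = ' ' THEN DELETE;
   DATALINES;
Park 8740031 87 90 85Hwang8740032 94 88 91
;
RUN;
```

The @@ keeps the line so that the second iteration starts at column 22, where Hwang's record begins. When the pointer passes the end of the line, SAS releases it and the step ends, leaving two observations.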
Now, let us read observations whose data are provided in multiple lines.
#n moves the input pointer to the nth of the lines that make up one observation, and / moves it to the next line.
INPUT #1 No 7.0 #2 name $CHAR15. / address $CHAR50. #4 phone $CHAR12.;
2451 E. 10th St. APT 311
800 N. Union St. APT 525
1 Park 2451 E. 10th St. APT 311 812-857-9425
2 Hun 800 N. Union St. APT 525 812-857-6256
The INPUT statement above tells SAS to read a 7-digit numeric variable "No" from the first line (#1), a 15-character string variable "name" from the second line (#2), a 50-character string variable "address" from the next line (/), and a 12-character string variable "phone" from the fourth line (#4). Alternatively, the INPUT statement may be written as "INPUT No 7.0 / name $15. / address $50. / phone $12.;".
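As the #1 to #4 pointers imply, each observation occupies four data lines with one field per line. The runnable sketch below takes its values from the output above. The data set name contacts is hypothetical. TRUNCOVER is an assumption added so that lines shorter than the informat widths (for example, "1" read with 7.0) are read in place rather than flowing to the next line.

```sas
DATA contacts;
   INFILE DATALINES TRUNCOVER;   /* keep partial values on short lines */
   INPUT #1 No 7.0 #2 name $CHAR15. / address $CHAR50. #4 phone $CHAR12.;
   DATALINES;
1
Park
2451 E. 10th St. APT 311
812-857-9425
2
Hun
800 N. Union St. APT 525
812-857-6256
;
RUN;
```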