May 27, 2021 SAS
Multiple SAS datasets can be combined based on specific public variables to give a single dataset. T his is done using the MERGE statement and the BY statement. T he total number of observations in the merged dataset is usually smaller than the sum of the observations in the original dataset. T his is because when the values of the public variables match, the variables form two data sets that are merged into one record.
Here are two prerequisites for merging datasets:
The basic syntax of THEGE and BY statements in SAS is:
MERGE Data-Set 1 Data-Set 2 BY Common Variable
Here is a description of the parameters used:
Consider two SAS datasets, one containing an employee ID with a name and salary, and the other containing an employee ID with an employee ID and department. I n this case, in order to get complete information about each employee, we can combine the two datasets. T he final dataset will still have an observation for each employee, but it will contain salary and department variables.
# Data set 1 ID NAME SALARY 1 Rick 623.3 2 Dan 515.2 3 Mike 611.5 4 Ryan 729.1 5 Gary 843.25 6 Tusar 578.6 7 Pranab 632.8 8 Rasmi 722.5 # Data set 2 ID DEPT 1 IT 2 OPS 3 IT 4 HR 5 FIN 6 IT 7 OPS 8 FIN # Merged data set ID NAME SALARY DEPT 1 Rick 623.3 IT 2 Dan 515.2 OPS 3 Mike 611.5 IT 4 Ryan 729.1 HR 5 Gary 843.25 FIN 6 Tusar 578.6 IT 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN
The above results are implemented by using the following code, where public variables (IDs) are used in BY statements. N ote that observations in both datasets are sorted in the ID column.
DATA SALARY; INPUT empid name $ salary ; DATALINES; 1 Rick 623.3 2 Dan 515.2 3 Mike 611.5 4 Ryan 729.1 5 Gary 843.25 6 Tusar 578.6 7 Pranab 632.8 8 Rasmi 722.5 ; RUN; DATA DEPT; INPUT empid dEPT $ ; DATALINES; 1 IT 2 OPS 3 IT 4 HR 5 FIN 6 IT 7 OPS 8 FIN ; RUN; DATA All_details; MERGE SALARY DEPT; BY (empid); RUN; PROC PRINT DATA=All_details; RUN;
There may be cases where some values of public variables do not match between datasets. I n this case, the dataset is still merged, but the missing value is given in the result.
ID NAME SALARY DEPT 1 Rick 623.3 IT 2 Dan 515.2 OPS 3 . . IT 4 Ryan 729.1 HR 5 Gary 843.25 FIN 6 Tusar 578.6 . 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN
To avoid missing values in the results, we can consider keeping only observations with matching values for public variables. T his is done by using an IN statement. T he TARGET statement of the SAS program needs to be changed.
In the following example, the IN-value retains only the observations that match the values from the dataset SALARY and DEPT.
DATA All_details; MERGE SALARY(IN=a) DEPT(IN=b); BY (empid); IF a=1 and b=1; RUN; PROC PRINT DATA=All_details; RUN;
When we execute the SAS program with the above-mentioned changes, we get the following output.
1 Rick 623.3 IT 2 Dan 515.2 OPS 4 Ryan 729.1 HR 5 Gary 843.25 FIN 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN