Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

The SAS connection dataset


May 27, 2021 SAS


Table of contents


You can use set statements to connect multiple SAS datasets to provide a single dataset. T he total number of observations in the connected dataset is the sum of the observations in the original dataset. T he order of observation is continuous. A ll observations from the first dataset are followed by all observations from the second dataset, and so on.

All combined datasets have the same variables, but if they have different numbers of variables, all variables appear in the result and have missing values for smaller datasets.

Syntactic

The basic syntax of set statements in SAS is:

SET data-set 1 data-set 2 data-set 3.....;

Here is a description of the parameters used:

  • data-set1, data-set2 is the name of the dataset written one by one.

Cases

Consider the employee data of the organization available in two different data sets, one for it and one for non-IT departments. F or full details of all employees, we use the SET statement shown below to connect the two datasets.

DATA ITDEPT; 
  INPUT empid name $ salary  ; 
DATALINES; 
1 Rick 623.3 
3 Mike 611.5 
6 Tusar 578.6 
; 
RUN; 
DATA NON_ITDEPT; 
  INPUT empid name $ salary  ; 
DATALINES; 
2 Dan 515.2 
4 Ryan 729.1 
5 Gary 843.25 
7 Pranab 632.8 
8 Rasmi 722.5 
RUN; 
DATA All_Dept; 
  SET ITDEPT NON_ITDEPT; 
RUN; 
PROC PRINT DATA=All_Dept; 
RUN;  

When executing the above code, we can get the following output.

The SAS connection dataset

Scheme

Variables can have different results when we have many variations in the dataset used for cascading, but the total number of observations in the cascading dataset is always the sum of the observations in each dataset. W e will consider many of these changes below.

Different numbers of variables

If one of the original datasets has more variables and the other dataset, the dataset is still combined, but in a smaller dataset, these variables appear to be missing.

Cases

In the following example, the first dataset has an additional variable called DOJ. I n the result, the DOJ value of the second dataset is displayed as missing.

DATA ITDEPT; 
  INPUT empid name $ salary DOJ date9.  ; 
DATALINES; 
1 Rick 623.3 02APR2001
3 Mike 611.5 21OCT2000
6 Tusar 578.6 01MAR2009  
; 
RUN; 
DATA NON_ITDEPT; 
  INPUT empid name $ salary  ; 
DATALINES; 
2 Dan 515.2 
4 Ryan 729.1 
5 Gary 843.25 
7 Pranab 632.8 
8 Rasmi 722.5 
RUN; 
DATA All_Dept; 
  SET ITDEPT NON_ITDEPT; 
RUN; 
PROC PRINT DATA=All_Dept; 
RUN;  

When executing the above code, we can get the following output.

The SAS connection dataset

Different variable names

In this case, the dataset has the same number of variables, but the variable names are different between them. I n this case, the normal connection produces all the variables in the result set and gives the missing results of the two different variables. A lthough we can't change the variable name in the original dataset, we can apply the RENAME function in the connection dataset we created. T his will produce the same result as a normal connection, but of course a new variable name will be used instead of the two different variable names that exist in the original data set.

Cases

In the following example, the dataset ITDEPT has the variable name ename, while the dataset NON_ITDEPT the variable name empame. B ut these two variables represent the same type (characters). W e apply the RENAME function in the SET statement, as shown below.

DATA ITDEPT; 
  INPUT empid ename $ salary  ; 
DATALINES; 
1 Rick 623.3 
3 Mike 611.5 
6 Tusar 578.6 
; 
RUN; 
DATA NON_ITDEPT; 
  INPUT empid empname $ salary  ; 
DATALINES; 
2 Dan 515.2 
4 Ryan 729.1 
5 Gary 843.25 
7 Pranab 632.8 
8 Rasmi 722.5 
RUN; 
DATA All_Dept; 
  SET ITDEPT(RENAME =(ename=Employee) ) NON_ITDEPT(RENAME =(empname=Employee) ); 
RUN; 
PROC PRINT DATA=All_Dept; 
RUN;  

When executing the above code, we can get the following output.

The SAS connection dataset

Different variable lengths

If the variable lengths in the two datasets are different, the connected dataset will have some of the values that the data is truncated for variables with a smaller length. T his can happen if the length of the first dataset is small. T o solve this problem, we apply a higher length to the dataset, as shown below.

Cases

In the following example, the variable ename is 5 in the first dataset and 7 in the second dataset. W hen connecting, we apply the LENGTH statement in the connection dataset to set the enumerity length to 7.

DATA ITDEPT; 
  INPUT  empid 1-2 ename $ 3-7 salary 8-14  ; 
DATALINES; 
1 Rick  623.3 
3 Mike  611.5 
6 Tusar 578.6 
; 
RUN;
DATA NON_ITDEPT; 
  INPUT  empid 1-2 ename $ 3-9  salary 10-16 ; 
DATALINES; 
2 Dan    515.2 
4 Ryan   729.1 
5 Gary   843.25
7 Pranab 632.8 
8 Rasmi  722.5 
RUN; 
DATA All_Dept; 
	 LENGTH ename $ 7   ;
  SET ITDEPT  NON_ITDEPT ; 
RUN; 
PROC PRINT DATA=All_Dept; 
RUN;  

When executing the above code, we can get the following output.

The SAS connection dataset