Data maybe structured or unstructured and in different formats and locations (Excel, SAS, etc).
Structured data has defined rows and columns that SAS knows how to read. For example, SAS, Microsoft Access, Excel, Oracle, Teradata, Hadoop, etc. SAS has engines that enable SAS to read structured data.
Unstructured data doesn’t have defined columns. Tab-delimited data may appear to be in columns, but it’s not really. For example, text, delimited, JSON, weblogs, etc. Data must be imported into SAS.
SAS Table
Structured data with defined rows and columns.
Extension of .sas7bdat
Two portions of table: Descriptor (metadata or properties) – table name, number of rows, data/time created, column names, column attributes. The data portion – data values stored in columns.
In SAS, a table is also called a data set. A column can also be called a variable. A row can also be called an observation.
Column attributes – in order to be defined
Name – 1 to 32 chars long; start with letter or underscore; continues with letters, numbers or underscores. Uppercase, lowercase, or mixed case. Ideally, stick to one naming convention.
Type – Character (letters; numbers; special characters, blanks) and Numeric (digits 0 to 9; minus sign; decimal point; scientific notation). SAS Dates are of numeric and start with 01Jan1960 (positive) and negative before that date.
Length – number of bytes allocated to store column values. Numeric is default to 8 bytes (16 sig digits). Characters are 1 to 32,767 bytes with one byte being one character.
Listing table and column attributes
Can use Proc contents data against the data set to examine values.
PROC CONTENTS DATA=data-set;
RUN;
Accessing Data thru Libraries
-It’s all about location and filetype. By using libraries, you can resolve many issues that would otherwise be hard-coding within code.
-Creating a library: LIBNAME libref engine “path”;
libref – is the library name
engine – name of behind scenes engine for reading structured data – Base; Excel; Teradata; Hadoop, etc
path – location (physical path)
Note: Location is relative to where SAS is running
Note: After setting up the library:
proc contents data=mylib.class;
run;
-Various libraries:
Work – temporary for session
SASHELP – same tables and other files
options validvarname=v7; – conforms standards within SAS of Excel formatting
libname xlstorm xlsx “s:/filepathname/storm.xlsx”;– xlstorm library and xlsx engine and path
proc contents data=xlstorm.storm_summary;
run;
libname xlstorm clear; — clears library
Importing Data into SAS (including unstructured)
-Unstructured needs to be structured first before going into SAS program
-You enter the type of data file
-You can accept default of SAS guessing first 20 rows to resolve column name and lengths, or tell it to review all records.
-If the source XCL file changes, you just re-run the proc import step