Matlab & Simulink Advanced Programming

MATLAB for Data Processing and Visualization

1. Getting Started

Summary: Getting Started with the Data

The readtable function creates a table in MATLAB from a data file.

Additional inputs can help import irregularly formatted files.
hurrs = readtable("hurricaneData1990s.txt", ...
    "NumHeaderLines",5,"CommentStyle","##");
You can use dot notation to access variables in a table. The scatter function creates a scatter plot of two vectors.
scatter(hurrs.Windspeed,hurrs.Pressure)
The categorical function creates a categorical array from data.
hurrs.Country = categorical(hurrs.Country);
By default, the readtable function may import certain variables in the table as datetime.

In this example, hurrs.Timestamp is a datetime.
t = hurrs.Timestamp;
The hour function returns the hour numbers of the input datetime values.
h = hour(t);
histogram(h)
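
The dot-notation and hour steps can be tried without the hurricane file. The following is a minimal sketch, assuming a small hand-made table; the values are invented, and only the variable name Timestamp follows the example above.

```matlab
% Hypothetical stand-in for the imported hurricane table
Timestamp = datetime(1995,8,1,[3;15;15;21],0,0);   % four observations on one day
Windspeed = [40; 65; 80; 55];                      % made-up values
hurrs = table(Timestamp,Windspeed);

% Dot notation returns the table variable as an ordinary array
t = hurrs.Timestamp;

% hour extracts the hour of day (0 to 23) from each datetime value
h = hour(t);
```

Passing h to histogram, as in the summary, would then show how the observations are distributed over the day.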

2. Preprocessing Data

Review – Preprocessing Data

Missing Data

When you import data into MATLAB, missing numerical values are replaced with NaN, which stands for Not a Number.
data = readtable("myfile")
data = 4×2 table
    Var1    Var2
    ____    ____
      7     0.81
      1      NaN
      9     0.13
     10     0.91
When you calculate statistics on arrays that contain NaNs, the result is also NaN.
v = mean(data.Var2)
v = NaN
To ignore NaNs in the calculation, use the "omitnan" flag.
v = mean(data.Var2,"omitnan")
v = 0.6167
You can delete rows containing missing data with rmmissing.
cleaned = rmmissing(data)
cleaned = 3×2 table
    Var1    Var2
    ____    ____
      7     0.81
      9     0.13
     10     0.91
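
The missing-data steps above can be reproduced without a file. This is a sketch that assumes the table is built by hand from the values shown; the ismissing call is an addition used here only to show which row rmmissing will drop.

```matlab
% Rebuild the example table by hand (no file needed)
Var1 = [7; 1; 9; 10];
Var2 = [0.81; NaN; 0.13; 0.91];
data = table(Var1,Var2);

% Flag rows that contain at least one missing entry
rowHasNaN = any(ismissing(data),2);

mAll  = mean(data.Var2);             % NaN propagates through the calculation
mOmit = mean(data.Var2,"omitnan");   % mean of the three valid values

cleaned = rmmissing(data);           % removes the flagged row
```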

Categories and Sets

Categorical arrays use less memory than text arrays and work with many plotting functions.
x = categorical(["medium" "large" "large" "red" "small" "red"]);
Use the categories function to get a list of unique categories.
c = categories(x)
c = 4×1 cell array
    {'large' }
    {'medium'}
    {'red'   }
    {'small' }
Merge different categories with the mergecats function.
x = mergecats(x,["small" "medium" "large"],"size")
x = 1×6 categorical array
    size    size    size    red    size    red
Rename categories with the renamecats function.
x = renamecats(x,"red","color")
x = 1×6 categorical array
    size    size    size    color    size    color

Discretizing Continuous Data

Ranges in continuous data can represent categories. Categorize continuous data into discrete bins with the discretize function.

>> y = discretize(X,edges,"Categorical",cats)
y: If the "Categorical" option is set, y is a categorical array. Otherwise, y is numeric bin values.
X: Array of continuous data. X is usually numeric or datetime.
edges: Consecutive elements in edges form discrete bins. There will be one fewer bins than the number of edges specified. You can use inf in edges to create a bin that is unbounded on one side.
"Categorical",cats: Optional input for the name of each bin category.

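As an illustration of the discretize syntax, here is a minimal sketch with made-up data. The values in X, the bin edges, and the category names are all hypothetical, and the names are given as a cell array of character vectors as a conservative choice.

```matlab
% Hypothetical continuous data and bin edges
X = [10 35 62 80 120];
edges = [0 39 74 Inf];             % four edges define three bins; Inf leaves the last bin open
cats = {'low','medium','high'};    % one name per bin

binNum = discretize(X,edges);                      % numeric bin index for each element
binCat = discretize(X,edges,"Categorical",cats);   % same bins as a categorical array
```

Each bin includes its left edge, so a value of exactly 39 would fall in the second bin.
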
3. Graphics Formatting Functions

Review – Graphics Formatting Functions

plot(x,y)

As a reminder, here is a plot with default properties. There are no markers and the line color is blue.


4. Review Project 1


5. Importing Data from Multiple Files

Review – Importing Data from Multiple Files

A datastore is a reference to a file or set of files. The datastore function records where the files are located; the data itself is imported only when you call read or readall.

Code                         Description
ds = datastore(filename)     Reference a single file
ds = datastore(directory)    Reference a folder of files
data = read(ds)              Read data incrementally
data = readall(ds)           Read all data referenced in datastore

If your data isn’t formatted the way datastore expects, you can set the datastore properties. Examples of common properties are shown below. You can find all the properties in the documentation.

>> ds = datastore(filename,"Delimiter","-","TextscanFormats","%D%C%f","SelectedVariableNames",var)
ds: Reference to a collection of data.
filename: File location.
"Delimiter","-": Delimiter is one or more characters that separate data values in the file.
"TextscanFormats","%D%C%f": Import variables using the output class in the format specification string.
"SelectedVariableNames",var: Import only the variables listed in var.

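Incremental reading with read is usually done in a loop. The sketch below assumes a small CSV file written to a temporary location; the file contents, the variable names id and value, and the ReadSize of 2 are chosen purely for illustration.

```matlab
% Write a small hypothetical CSV file to a temporary location
fname = [tempname '.csv'];
T = table((1:6)',(10:10:60)','VariableNames',{'id','value'});
writetable(T,fname);

ds = datastore(fname);    % reference only; no data is read yet
ds.ReadSize = 2;          % number of rows returned by each call to read

total = 0;
while hasdata(ds)
    chunk = read(ds);                   % next block of rows, returned as a table
    total = total + sum(chunk.value);   % accumulate a running sum
end

delete(fname);            % clean up the temporary file
```

Processing one block at a time keeps memory use bounded, which is the main reason to prefer read over readall for large collections.
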
Merging Data

Once you read in multiple tables, you may want to join them together. You can join two tables in many ways. The various join functions are listed in the table below.

Function                         Notes
join                             Key1 in Tright must have unique values and contain every key in Tleft.
innerjoin                        Keeps only the rows whose key values appear in both tables.
outerjoin                        Keeps the rows from both tables. Two key variables are created.
outerjoin with "MergeKeys" on    Keeps the rows from both tables. The key variables are merged into one.
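
The differences between the join functions are easier to see on a small case. This is a sketch with two hypothetical tables; the names Tleft, Tright, and Key1 follow the notes above, while Lval, Rval, and all values are invented.

```matlab
% Hypothetical tables sharing the key variable Key1
Tleft  = table([1;2;4],["a";"b";"d"],'VariableNames',{'Key1','Lval'});
Tright = table([1;2;3],[10;20;30],'VariableNames',{'Key1','Rval'});

% join requires every key in the left table to exist in the right table,
% so only the rows of Tleft with keys 1 and 2 are passed in
Tj = join(Tleft(1:2,:),Tright);

Tin  = innerjoin(Tleft,Tright);                    % keys present in both tables
Tout = outerjoin(Tleft,Tright);                    % all keys, one key variable per input
Tmrg = outerjoin(Tleft,Tright,"MergeKeys",true);   % all keys, single Key1 variable
```

Calling join(Tleft,Tright) on the full tables would fail here because key 4 has no match in Tright.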

6. Analyzing Groups within Data

Review – Analyzing Groups within Data

The table petdata has two categorical variables, Species and Color.

Using these two variables, there are five potential groups: orange cat, orange fish, black cat, black fish, and white cat.
petdata = readtable("petdata.txt","Format","%C%C%f")
petdata = 5×3 table
    Species    Color     Weight
    _______    ______    ______
    cat        orange      12
    fish       orange    0.68
    cat        black       14
    cat        white        8
    fish       black     0.54
The findgroups function will return a group number for each element in an array.

The second output is the name associated with each group number. Here, the value 1 means cat.
[grpS,speciesVals] = findgroups(petdata.Species)
grpS =
     1
     2
     1
     1
     2
speciesVals = 2×1 categorical array
     cat
     fish
The splitapply function will perform a calculation on each group.

You can interpret this code as “What is the average weight of each species?”
splitapply(@mean,petdata.Weight,grpS)
ans =
   11.3333
    0.6100
findgroups and splitapply are commonly used together. This code answers “What is the minimum weight of each color?”

Notice that grpC has values 1, 2, and 3 because there are three different colors in the data. colorVals contains the meaning for each value.
[grpC,colorVals] = findgroups(petdata.Color)
splitapply(@min,petdata.Weight,grpC)
grpC =
     2
     2
     1
     3
     1
colorVals = 3×1 categorical array
     black
     orange
     white
ans =
    0.5400
    0.6800
    8.0000
accumarray calculates a value for every combination of species and color, including combinations that do not appear in the data.

The first input is an array containing both group numbers. The first vector (grpS) corresponds to the output rows, and the second vector (grpC) corresponds to the output columns.

Notice that the element in the second row, third column (white fish) is 0 because there’s no data in that group.
maxWeight = accumarray([grpS grpC],petdata.Weight,[],@max)
maxWeight =
   14.0000   12.0000    8.0000
    0.5400    0.6800         0
The output of accumarray can be difficult to interpret on its own, but the format is convenient for visualizations or further processing.

For example, the output can be passed directly to the bar function.
bar(maxWeight)
xticklabels(speciesVals)
ylabel("Weight")
legend(colorVals)
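
As a complement to accumarray, findgroups also accepts several grouping variables at once and numbers only the combinations that occur in the data. The following sketch rebuilds petdata by hand from the values shown above; the names grp, sp, col, maxW, and result are introduced here for illustration.

```matlab
% Rebuild the petdata table by hand (no file needed)
Species = categorical(["cat";"fish";"cat";"cat";"fish"]);
Color   = categorical(["orange";"orange";"black";"white";"black"]);
Weight  = [12; 0.68; 14; 8; 0.54];
petdata = table(Species,Color,Weight);

% One group number per species/color combination that appears in the data
[grp,sp,col] = findgroups(petdata.Species,petdata.Color);

% Maximum weight within each combination
maxW = splitapply(@max,petdata.Weight,grp);

% Collect the group labels and results in one table
result = table(sp,col,maxW);
```

Unlike the accumarray output, result has no entry for white fish, because that combination never occurs in petdata.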

7. Customizing Graphics Objects

Review – Customizing Graphics Objects

All graphics objects are part of a hierarchy. Most plots consist of a figure window, containing one or more axes, which contain any number of plot objects.

You can use the graphics object hierarchy to modify specific graphics objects after a plot is created.

If you stored a handle to Figure, you could use the Children properties to modify the Bar plot.
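
Walking down the hierarchy through the Children property might look like the sketch below. The bar data and the property values are arbitrary, and the indexing assumes a fresh figure that contains a single axes with a single Bar object.

```matlab
f = figure;                    % top of the hierarchy; keep the handle
bar([3 1 2]);                  % creates an axes and a Bar object inside the figure

ax = f.Children(1);            % the axes is a child of the figure
b  = ax.Children(1);           % the Bar object is a child of the axes

b.FaceColor = [0.8 0.2 0.2];   % modify the bars after the plot is created
ax.YGrid = "on";               % modify the axes the same way
```

In a figure with more objects, for example one that also has a legend, f.Children contains several entries, so storing the handles when the objects are created is often more reliable than indexing.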


8. Review Project 2


9. Images and 3-D Surface Plots

Review – Images and 3-D Surface Plots

Images or 3-D plots generally begin with x, y, and z data. In many cases, the x and y data are not evenly spaced on a grid.
data = readtable("my3Ddata")
plot3(data.x,data.y,data.z,'.')
        x            y            z
    _________    ________    _________
       2.2506    -0.30105     0.012974
      -1.3443    -0.79976     -0.11638
      0.53421    -0.92891      0.16945
    -0.070088    -0.67461    -0.044245
        …           …            …
To interpolate the data onto a grid, start by defining the grid points. Here, yvec is denser than xvec.
xvec = -2:.2:2;
yvec = -2:.05:2;
The meshgrid function will convert your vectors into the grid expected by surf and pcolor.
[xgrid,ygrid] = meshgrid(xvec,yvec);
Then use the griddata function to interpolate your data onto the grid.

Consistent naming of your variables from previous steps will make the griddata syntax easier.
zgrid = griddata(data.x,data.y,data.z,xgrid,ygrid);
Once your x, y, and z data is gridded, you can visualize it in a variety of ways. surf creates a surface plot.

Notice the difference between the x and y axes. This is because xvec and yvec had a different number of grid points.
surf(xgrid,ygrid,zgrid);
You can also visualize your 3-D data as a pseudocolor image.
im = pcolor(xgrid,ygrid,zgrid);
im.EdgeAlpha = 0;
This scaled image contains the same data, but the first two inputs are the vectors of grid points instead of the output from meshgrid.

If you inspect the right yellow shape, you can see that the imagesc plot is vertically flipped from the pcolor plot.
imagesc(xvec,yvec,zgrid);
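
The vertical flip comes from imagesc reversing the direction of the y-axis. The sketch below uses a synthetic surface in place of the interpolated data (zgrid here is not the original data) and shows one way to restore the usual orientation with axis xy.

```matlab
xvec = -2:.2:2;
yvec = -2:.05:2;
[xgrid,ygrid] = meshgrid(xvec,yvec);

% Synthetic surface standing in for the gridded data
zgrid = xgrid .* exp(-xgrid.^2 - ygrid.^2);

imagesc(xvec,yvec,zgrid);
ax = gca;
dirBefore = ax.YDir;   % imagesc sets the y-axis direction to 'reverse'

axis xy                % y increases upward again, matching pcolor
dirAfter = ax.YDir;
```

Note that zgrid has one row per element of yvec and one column per element of xvec, which is the layout meshgrid produces.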

10. Importing Unstructured Data

Review – Importing Unstructured Data

To import data from files where the formatting changes and must be inferred from the data itself, you can use functions that allow you to interact directly with files.

Open the file and store the file identifier. You’ll use fid with the other low-level import functions.
fid = fopen("myfile");
You can import files line-by-line using fgetl.

There is a file position indicator that keeps track of where you’re located in the file, so calling fgetl twice will return the first two lines.
firstLine = fgetl(fid)
firstLine = '09/12/2005 Level1 12.34 45 1.23e10 inf'
secondLine = fgetl(fid)
secondLine = '10/12/2005 Level2 23.54 60 9e19 -inf 0.001'
To return to the beginning of the file, you can rewind the file position indicator.
frewind(fid)
If you know the format of the data, you can pass a format specification string to textscan.
formatSpec = "%{MM/dd/uuuu}D %s %f32 %d8 %u %f";
myData = textscan(fid, formatSpec)
myData = 1×6 cell array
    {3×1 datetime}    {3×1 cell}    {3×1 single}    {3×1 int8}    {3×1 uint32}    {3×1 double}
When you’re finished importing, make sure you close the file connection.
fclose(fid);
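
Put together, the low-level workflow might look like the following sketch. It assumes a small text file written to a temporary location; the file contents and the shortened format specification are made up for illustration.

```matlab
% Create a small hypothetical text file to read back
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'09/12/2005 Level1 12.34\n10/12/2005 Level2 23.54\n');
fclose(fid);

% Read the file line by line until fgetl signals the end of the file
fid = fopen(fname,'r');
lines = {};
tline = fgetl(fid);
while ischar(tline)           % fgetl returns the number -1 at end of file
    lines{end+1} = tline;     %#ok<AGROW>
    tline = fgetl(fid);
end

% Rewind and parse the same file with a format specification
frewind(fid);
vals = textscan(fid,'%{MM/dd/uuuu}D %s %f');

fclose(fid);                  % always close the file connection
delete(fname);                % clean up the temporary file
```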

11. Conclusion
