MATLAB for Data Processing and Visualization

1. Getting Started

Summary: Getting Started with the Data

2. Preprocessing Data

Review – Preprocessing Data

Missing Data

When you import data into MATLAB, missing numerical values are replaced with NaN, which stands for Not a Number.

>> data = readtable("myfile")
data = 4×2 table
    Var1    Var2
    ____    ____
     7      0.81
     1       NaN
     9      0.13
    10      0.91

When you calculate statistics on arrays that contain NaNs, the result is also NaN.

>> v = mean(data.Var2)
v = NaN

To ignore NaNs in the calculation, use the `"omitnan"` flag.

>> v = mean(data.Var2,"omitnan")
v = 0.6167

You can delete rows containing missing data with `rmmissing`.

>> cleaned = rmmissing(data)
cleaned = 3×2 table
    Var1    Var2
    ____    ____
     7      0.81
     9      0.13
    10      0.91
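
Because the file `"myfile"` is not included here, the same steps can be tried on a table built by hand. This is a minimal sketch: the values are copied from the output shown above, and counting the gaps with `ismissing` is an addition for illustration rather than part of the original summary.

```matlab
% Build the same 4x2 table by hand instead of reading it from a file
% (assumption: values copied from the example output above).
data = table([7; 1; 9; 10], [0.81; NaN; 0.13; 0.91], ...
    'VariableNames', {'Var1', 'Var2'});

% Count missing entries in Var2 before deciding how to handle them.
nMissing = sum(ismissing(data.Var2));

% Statistics: NaN propagates unless it is explicitly ignored.
vAll  = mean(data.Var2);             % NaN
vOmit = mean(data.Var2, "omitnan");  % mean of the three valid values

% Drop every row that contains a missing value.
cleaned = rmmissing(data);
```

Checking `nMissing` first is a cheap way to decide whether dropping rows is acceptable or whether too much data would be lost.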

Categories and Sets

Categorical arrays typically use less memory than the equivalent text arrays and work with many plotting functions.

>> x = categorical(["medium" "large" "large" "red" "small" "red"]);

Use the `categories` function to get a list of unique categories.

>> c = categories(x)
c = 4×1 cell array
    {'large' }
    {'medium'}
    {'red'   }
    {'small' }

Merge different categories with the `mergecats` function.

>> x = mergecats(x,["small" "medium" "large"],"size")
x = 1×6 categorical array
    size    size    size    red    size    red

Rename categories with the `renamecats` function.

>> x = renamecats(x,"red","color")
x = 1×6 categorical array
    size    size    size    color    size    color
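
As a small illustration of how categorical arrays work with plotting functions, the sketch below counts the members of each category and plots them. `countcats` and `histogram` are standard MATLAB functions, but pairing them here is an assumption made for the example, not something the original summary prescribes.

```matlab
x = categorical(["medium" "large" "large" "red" "small" "red"]);

% Category names come back in sorted order: large, medium, red, small.
names  = categories(x);
counts = countcats(x);   % one count per category, in the same order

% Many plotting functions accept categorical input directly.
histogram(x)
```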

Discretizing Continuous Data

Ranges in continuous data can represent categories. Categorize continuous data into discrete bins with the `discretize` function.

>> y = discretize(X,edges,"Categorical",cats)

`y`	If the `"Categorical"` option is set, `y` is a categorical array. Otherwise, `y` is numeric bin values.

`X`	Array of continuous data. `X` is usually numeric or datetime.
`edges`	Consecutive elements in `edges` form discrete bins, so there is one fewer bin than the number of edges specified. You can use `inf` or `-inf` in `edges` to create a bin that is unbounded on one side.
`"Categorical",cats`	Optional input for the name of each bin category.
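
The call described above can be sketched as follows. The temperature readings, bin edges, and category names are hypothetical values chosen for illustration.

```matlab
% Hypothetical temperature readings in degrees Celsius.
temps = [12 25 31 18 40 30];

% Four edges define three bins: [-Inf,15), [15,30), [30,Inf].
edges = [-inf 15 30 inf];
cats  = {'cold', 'mild', 'hot'};

% Without the "Categorical" option, the output is the numeric bin index.
binIdx = discretize(temps, edges);

% With the option, the output is a categorical array named from cats.
binCat = discretize(temps, edges, "Categorical", cats);
```

Each bin includes its left edge, so the reading of exactly 30 falls in the third bin.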

3. Graphics Formatting Functions

Review – Graphics Formatting Functions

plot(x,y)

As a reminder, here is a plot with default properties. There are no markers and the line color is blue.
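
The specific formatting functions reviewed in this section are not listed in the text, so the following is only a sketch using common ones (`title`, `xlabel`, `ylabel`, `legend`, `grid`). It shows how the default appearance described above can be changed; the data and labels are hypothetical.

```matlab
x = linspace(0, 2*pi, 25);
y = sin(x);

% Line specification: red, dashed, circle markers; thicker line.
p = plot(x, y, "r--o", "LineWidth", 1.5);

% Annotate the axes.
title("Sine wave")
xlabel("x (radians)")
ylabel("sin(x)")
legend("sin(x)", "Location", "southwest")
grid on
```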

4. Review Project 1

5. Importing Data from Multiple Files

Review – Importing Data from Multiple Files

A datastore is a reference to a file or set of files. The `datastore` function tells MATLAB where to find the files.

Code	Description
ds = datastore(filename)	Reference a single file
ds = datastore(directory)	Reference a folder of files
data = read(ds)	Read data incrementally
data = readall(ds)	Read all data referenced in datastore
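
The calls in the table can be tried without existing data files. The sketch below first writes two small CSV files into a temporary folder; the file names, variable names, and values are all hypothetical and exist only to make the example self-contained.

```matlab
% Create a temporary folder holding two small CSV files
% (hypothetical data, written only so the example is self-contained).
folder = tempname;
mkdir(folder);
writetable(table([1; 2], [10; 20], 'VariableNames', {'ID', 'Value'}), ...
    fullfile(folder, 'part1.csv'));
writetable(table([3; 4], [30; 40], 'VariableNames', {'ID', 'Value'}), ...
    fullfile(folder, 'part2.csv'));

% Reference the whole folder; no data is read yet.
ds = datastore(folder);

% Read incrementally until the datastore is exhausted.
nRows = 0;
while hasdata(ds)
    chunk = read(ds);
    nRows = nRows + height(chunk);
end

% Go back to the start and read everything in one call.
reset(ds);
allData = readall(ds);
```

Reading in a loop with `hasdata` keeps only one chunk in memory at a time, which matters when the files are too large for `readall`.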

If your data isn’t formatted the way `datastore` expects, you can set the datastore properties. Examples of common properties are shown below. You can find all the properties in the documentation.

>> ds = datastore(filename,"Delimiter","-","TextscanFormats","%D%C%f","SelectedVariableNames",var)

`ds`	Reference to a collection of data.

`filename`	File location.
`"Delimiter","-"`	Delimiter is one or more characters that separate data values in the file.
`"TextscanFormats","%D%C%f"`	Import variables using the output class in the format specification string.
`"SelectedVariableNames",var`	Import only the variables listed in `var`.

Merging Data

Once you read in multiple tables, you may want to join them together. You can join two tables in many ways. The various join functions are listed in the table below.
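
The table of join functions is not reproduced in this text. As a hedged illustration, the sketch below uses two of MATLAB's table join functions, `innerjoin` and `outerjoin`, on two small hypothetical tables that share the key variable `ID`.

```matlab
A = table([1; 2; 3], ["a"; "b"; "c"], 'VariableNames', {'ID', 'Name'});
B = table([2; 3; 4], [10; 20; 30],    'VariableNames', {'ID', 'Score'});

% Keep only rows whose key (the shared variable ID) appears in both tables.
inner = innerjoin(A, B);

% Keep all rows from both tables; unmatched entries are filled with
% missing values. "MergeKeys" combines the two ID columns into one.
outer = outerjoin(A, B, "MergeKeys", true);
```

Here `inner` keeps the two IDs present in both tables, while `outer` keeps all four IDs and fills the gaps with `NaN` or `<missing>`.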

6. Analyzing Groups within Data

Review – Analyzing Groups within Data

7. Customizing Graphics Objects

Review – Customizing Graphics Objects

All graphics objects are part of a hierarchy. Most graphics consist of a figure window containing one or more axes, which in turn contain any number of plot objects.

You can use the graphics object hierarchy to modify specific graphics objects after a plot is created.

If you store a handle to the figure, you can follow the `Children` property of each object down the hierarchy to reach and modify the bar plot.
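
A minimal sketch of that traversal, assuming a freshly created figure that contains a single bar plot (the data values and title text are hypothetical):

```matlab
% Keep a handle to the figure, then create a bar plot inside it.
f = figure;
bar([3 5 2]);

% Walk down the hierarchy: figure -> axes -> bar.
ax = f.Children(1);    % the axes created by bar
b  = ax.Children(1);   % the Bar object inside those axes

% Modify the objects after the plot has been created.
b.FaceColor = [1 0 0];                        % RGB triplet for red
ax.Title.String = 'Modified through the hierarchy';
```

In a figure with several axes or plot objects, `Children` returns an array, so the index would need to select the object of interest.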

8. Review Project 2

9. Images and 3-D Surface Plots

Review – Images and 3-D Surface Plots

10. Importing Unstructured Data

Review – Importing Unstructured Data

To import data from files where the formatting changes and must be inferred from the data itself, you can use functions that allow you to interact directly with files.

Open the file and store the file identifier. You’ll use `fid` with the other low-level import functions.

>> fid = fopen("myfile");

You can import files line-by-line using `fgetl`. A file position indicator keeps track of where you are in the file, so calling `fgetl` twice returns the first two lines.

>> firstLine = fgetl(fid)
firstLine = '09/12/2005 Level1 12.34 45 1.23e10 inf'
>> secondLine = fgetl(fid)
secondLine = '10/12/2005 Level2 23.54 60 9e19 -inf 0.001'

To return to the beginning of the file, rewind the file position indicator.

>> frewind(fid)

If you know the format of the data, you can pass a format specification string to `textscan`. The result is a cell array with one cell per conversion specifier.

>> formatSpec = "%{MM/dd/uuuu}D %s %f32 %d8 %u %f";
>> myData = textscan(fid,formatSpec)
myData = 1×6 cell array
    {3×1 datetime}    {3×1 cell}    {3×1 single}    {3×1 int8}    {3×1 uint32}    {3×1 double}

When you’re finished importing, make sure you close the file connection.

>> fclose(fid);
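
The whole open, read, rewind, scan, close sequence can be sketched end to end. The example writes its own two-line file first; the file contents are hypothetical and use a simpler four-column layout than the lines shown above.

```matlab
% Write a small two-line text file so the example is self-contained
% (hypothetical contents, modeled loosely on the lines shown above).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '09/12/2005 Level1 12.34 45\n');
fprintf(fid, '10/12/2005 Level2 23.54 60\n');
fclose(fid);

% Open for reading and check that the file was found.
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);

% Read the first line, then rewind so textscan starts from the top.
firstLine = fgetl(fid);
frewind(fid);

% One conversion specifier per column: datetime, text, double, int32.
C = textscan(fid, '%{MM/dd/uuuu}D %s %f %d');

% Always release the file identifier.
fclose(fid);
```

Checking the identifier returned by `fopen` catches a missing or unreadable file before the later calls fail with a less helpful error.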

11. Conclusion
