West Lab Journal Club: June 2011

Wednesday, June 29, 2011

Plot Tutorial

Monday, June 27, 2011

RuCCS - Matlab/Psychtoolbox Course for Beginners

For individuals who are highly self-motivated, I have located the link for a Matlab Course that the Cognitive Science Department at Rutgers University offered last summer.

There are videos, PowerPoints, and Matlab scripts for most, if not all, of the lectures under the "Course Materials". These lectures provide a well-rounded introduction to Matlab before getting into some of the specifics of the Psych Toolbox. While many of the exercises using the Psych Toolbox are somewhat removed from what you might encounter during your daily data analysis, they are a fun and challenging way to practice working with programming loops and logic.

RuCCS - Matlab/Psychtoolbox Course for Beginners - HOME/Announcements

A special Thanks to Hristiyan (Chris) Kourtev and Xiaotao Su Ph.D. at RUCCS for putting the course together.

NOTE: You will need to download the PsychToolbox to be able to complete some of the exercises in the course. This can be found at http://psychtoolbox.org/wikka.php?wakka=HomePage. You will need an internet connection to install the toolbox.

Sunday, June 26, 2011

PivotTable reports 101 - Excel - Office.com

One component of Microsoft Excel analysis that we will not be covering (at least not in the near future) is the Pivot Table function.

Microsoft Office has a quick tutorial for those of you that might be interested in learning more about these tables:

Pivot Table reports 101 - Excel - Office.com

Pivot Tables require some practice to understand. For this reason, I would recommend that you practice using them with a small data set and comparing your results to others that you plot, sum, count etc. on your own using your growing mastery of functions.

Microsoft Excel: Advanced Techniques for Data analysis

Many of the techniques to be discussed in the following lesson need to be practiced in order to be fully understood. Still, realizing that these tools exist, and learning the basics of their use is an important first step.

LOGICAL FUNCTIONS:

=IF( Logical test, Value if TRUE, value if False)
Checks whether a condition is met, and returns one value if the condition is true and other value if the condition is false.

=AND (Logical 1, Logical 2,…Logical n)
Checks whether all of the input arguments are TRUE, and returns “TRUE” if so, or “FALSE” if not.

= NOT()
Changes FALSE to TRUE and TRUE to FALSE (inverts the logic).

=OR(Logical 1, Logical 2,…Logical n)
Returns true if ANY of the input arguments are TRUE.

Understanding the use of logical functions takes practice. The best method for learning these logical functions is to use simple math to examine how the function is working. Still, you should remember that you can also use strings, or combinations of numbers and strings, (See NOT example below). Here are a few examples of the main logical functions:

NOTE: Most logical functions automatically produce a “TRUE” or “FALSE” output. However, with the “=IF( )” function, the values that are returned are custom. Here I have chosen to use true and false, but I could have just as easily used any other phrases I desired. Fortunately, if you wish to return Excel's true or false logic, you can simply nest the functions =true and =false inside the if function. This would simply be accomplished by removing the "'s.

CONDITIONAL FORMATTING:

One fun feature of Excel that can be used for a wide variety of tasks is conditional formatting. Conditional formatting can be accessed under the “Home” tab of MSExcel 2007-2010 under the “Styles” section.

Essentially, conditional formatting uses logic to highlight or color code certain bits of data. This can be useful for visually separating certain cells from others. A few ways one might use this is:

1) To highlight errors or odd values in the data (i.e. checking data)
2) To separate categories of a certain variable for easy viewing.

The menu for conditional formatting gives you a number of methods for deciding which data to color code or highlight. For instance, you could color code all numbers greater than 50, or all cells where the strings match your input string. In the example below, the conditional formatting has been used to color code the different experimental subjects by using the “Conditional Formatting”/”Highlight Cell Rules”/”Text that contains” option. In some cases, you will need to create multiple “rules”.

SORT OPTIONS:
The next function that is useful for data analysis is the “sort” function. In Excel 2007-2010 this can be found under the “Data” tab (1) under the “Sort & Filter section” (2). Under this section, there are two quick sort buttons (A->Z; Z->A) as well as a Sort button/icon that will display a pop-up menu to create more complex sorting (3). Importantly, the quick sort buttons will either sort a single column of data or sort a larger data set by the first column of your data (if the entire data set is selected). Still, for files with multiple columns, the pop-up menu is a far more versatile and prudent option.

Once the pop-up menu has been opened (3), you can “Add Level(s)” to the filtering. This effectively allows you to sort the data by as many variables as you choose. In the above example, the data is first sorted by the column named “Type Code” and then by the column named “Call Type”.

FILTERING:

Another option for organizing data and extracting meaningful information is Excel’s filtering tool. This tool can be found under the “Data” tab in the “Sort and Filter” section.

After selecting the filter option, a pull-down menu will appear at the top of each column. When the pull-down menu for a given column is selected, the menu will display a set of Sort and Filtering options.

While it is worth exploring the sub-menus, most filtering can be done by using the check boxes that appear under the pull-down menu. These checkboxes generally display all of the “unique” values that occur in the column and allow you to select one or more of these values to display. For example, when the check box for “TRILL” is selected above, the filter hides the rows containing the other two values in the column(FF and FM) and displays only the data that are categorized as a TRILL(see below).

Note: The filtering is NOT case sensitive.

Importantly, the other data are not removed from the worksheet, but instead removed from the display. Thus, the row numbers for the filtered cells (and un-filtered cells) are maintained. Unfortunately, this also means that when using functions and formulas, one must either select the individual cells or copy the post-filter results to another spreadsheet in order to safely work with the data. While Excel is designed to allow you to work with the data in the filtered worksheet, small errors and glitches can prove costly if you neglect to copy the filtered data into its own worksheet.

CALCULATING STANDARD ERROR OF THE MEAN:

When examining group data in our laboratory, we often need to calculate two parameters of the data: 1) a measure of central tendency (generally the mean) and 2) a measure of the variance (commonly the standard error).

Measures of central tendency can be easily derived in excel using functions for the mean, median, mode, etc. Furthermore, one measure of variance—the standard deviation—can also be calculated using the MSExcel function “=Stdev()”.

Still, given that the standard error of the mean is normalized by the number of subjects/observations, many consider it to be a more appropriate measure of variance. Fortunately, the standard error (which cannot be calculated using an Excel function) and the standard deviation (which can be calculated using the Stdev function) are intimately related. The formula for the standard error is as follows:

SEM= Standard deviation/sqrt(n)

Where:
SEM- is the standard error of the mean
Sqrt- is the square root of the value in the parentheses
n- is the number of subjects or observations

Knowing this, we can easily calculate the SEM in excel with a nested function:

=stdev(A1:A10)/sqrt(n)

OR

=stdev(A1:A10)/sqrt(count(A1:A10))

The first example involves manually inputting the number of subjects or observations, while the second example counts the number of cells for which you are taking the standard deviation (which is generally the number of observations you will use for n). Thus, by nesting the =stdev, =sqrt, and =count functions in excel, you can easily calculate the standard error of the mean.

NOTE: There are many instances where Excel is able to “calculate” the standard error automatically. The most common instance of this is when adding error bars to plots. While each instance is different, Microsoft Excel is often using different numbers to calculate the standard error than those that should be used. Therefore, the prudent researcher will calculate these values by hand to ensure their accuracy.

PLOTTING:

Plotting in Excel is useful for a quick look at data. We will spend some time in the future discussing plotting in general and plotting in other programs (E.g. Matlab and Sigma Plot). However, here are a few key points:

1) Graphs are not “one size fits all”. Certain plots are more appropriate for certain types of data. For instance, the categorical counts on the top left are best fit by a bar graph, but would be poorly represented by the scatter plot on the right, whereas the pairs of data on the bottom left would not fit well into a bar plot.

2) If data is set up correctly, Microsoft Excel is able to accurately label the graph automatically (see the upper left data and the bar plot).

3) Nearly every component of a plot can be selected and modified. In the Excel 2007-2010, most of the options for charts are contained under a series of “Chart Tools” tabs that appear whenever a chart is selected.

THE CANONICAL FORM: PREPARING DATA FOR MATLAB

Canonical- “Conforming to orthodox or well-established rules or patterns, as of procedure.”

For our purpose, this means that we will prepare data to be easily transferred to Matlab, SPSS, SAS, or any other program. Most programs can handle numerical data, but not all programs can adequately handle text/strings. Moreover, while some programs can handle strings, the use of strings often slows computations. Thus, the canonical form for data analysis is defined as an array of data wherein all of the data is numerical with the possible exception of one row of headers. Most often, this is accomplished by incorporating one observation per row.

This is best accomplished by creating your own “Rosetta Stone” for the data. Specifically, you can create one Excel spreadsheet containing the data with both the strings and their respective numerical coding and a second sheet that is free of all strings. For some programs (e.g. Matlab and SPSS) it is appropriate to leave the column headers in the second sheet.

The creation of the first sheet is most easily accomplished by inserting rows into Excel and subsequently using Excel functions to convert strings to data. The second sheet can be easily created by using “Paste Special/Values” in a new sheet and then deleting columns that contain strings.

Some commands that might prove helpful for converting your data to the canonical form are:
=IF,=Left,=Right,=Middle, or =Replace.

Note: Some text commands need to be nested within the =Value command in order to transform numbers in text form to numerical form.

CHALLENGE:

Find a set of data that has at least 3 subjects and at least one other variable with 2 or more levels. Then, complete the following steps, making sure to take a screen shot of your success at each stage:

1) use the Sort function to sort the data by subject, 2) use filtering to filter out one level of the independent variable of interest, 3) plot the data you have filtered using a graph of your choice and finally 4) repeat this process for the second level of the independent variable.
2) Once your plots have been created convert your dataset to the canonical form using one or more new excel sheets.

DON'T FORGET TO SUBMIT YOUR COMPLETED CHALLENGES SO THAT THEY CAN BE SHARED ON THE BLOG!

Wednesday, June 22, 2011

Barker's Code

Following Suit with Sisi, I thought it would be best to provide an example that works with text rather than numbers.

As we have said a number of times now, strings are often more difficult to work with, and yet they are often more intuitive than if you were to use a series of numbers to describe your data. For instance, I could describe two groups of rats as "0" and "1" or "low-dose" and "high-dose". To an individual than knows the code, both are essentially the same. However, to the naive onlooker, the amount of information provided by the second example is far greater.

Learning to work with strings is a powerful tool for data analysis. Moreover, as we continue to work with Matlab and Excel, you will likely realize that Matlab is not as "string-friendly" as excel, and that tasks involving strings are often better accomplished in excel. In fact, we will often use excel to convert strings to numbers before importing the data into Matlab.

The nested code I have written for you is as follows:

'=IF(AND(ISTEXT(A5),ISTEXT(B5)),CONCATENATE(A5," ",B5),"ERROR!")'

Lets break this down into a few pieces.

1) The deepest layer of this function is the "ISTEXT" command. Like many logical commands in excel this is designed to return either TRUE or FALSE, depending on whether or not a given criterion has been met.

In this case, ISTEXT is simply asking "Is the content of the referenced cell text?" and returning TRUE if the answer is YES, and FALSE if the answer is NO.

2) The two instances of ISTEXT in the above formula are embedded within the "AND" function. This is another logical function that returns TRUE if both values inside of the function are TRUE, and FALSE if either one, or both of the functions are FALSE.

To give you a more colloquial example, the function (in my formula) is asking "Was the answer to BOTH the first AND second intext command TRUE?".

3) While the "IF" function is the outermost layer of my nested function, it needs to be explained before the last function in order to make the whole thing easier to comprehend.

The IF function has 3 inputs (i.e."=IF(INPUT1, INPUT2, INPUT3)").

The first input is your logical criterion. In my example, the logical criteria is the the AND function described above. Thus, if both cells A5 and B5 contain text, the value would be "TRUE" and the criterion has been met.

The second input is the value you want the function to return when the function is TRUE. In this case, I have designated that the concatenate function be called when the function is true. The result of this function is described below.

The last output is the value to return when the criterion has NOT been met (i.e. is FALSE).

For example, if I created the function "=IF(5=4,"TRUE","FALSE")", the formula would return the string FALSE, because the numbers 5 and 4 are not equal. However if I modified the formula to "=IF(5=5,"TRUE","FALSE")", the string TRUE would be returned.

Returning to the main example, the value I have designated if cells A5 and B5 are NOT both strings is "ERROR!".

4)Finally, the fourth function in my nested function is the "CONCATENATE" function. This function is used to combine strings together. Therefore, when the criterion for the IF function is TRUE, the CONCATENATE function combines the strings in A5 and B5. To make things prettier I have even inserted a space by concatenating A5," ", and B5. As you can probably see, the middle string is nothing more than a space.

Sunday, June 19, 2011

Sisi's Code and Some Tips

Hi, Guys,

This is Sisi. I am posting my code, "The Awesome Detector" (Find a specific string, if the string is detected return one text, if not return another text).

Generally speaking, when I am making a program, I always ask myself three questions:

1, What information/data do I have at hand? i.e. What are my inputs?

2, What do I want from those information/data? i.e. What should be my outputs?

3, How can I get from the inputs to the outputs?

The answer to the first question sets your current location, the answer to the second question sets your goal/destination. The answer to the third question defines the route connecting the current location to the destination, which in most cases, is not unique. It is nice if we can find a fast (require less time) and cheap (less computing resource involved) way, but for most things we do in the lab, I would say, as long as your can get there!

Answering the third question require some thoughts. You can either go from the inputs, gradually approaching the outputs; or, start from outputs, and ask what do I need to get there, and back track to the inputs; or, starts from both ends, and hope to meet in the middle...

Take the following task as an example (essentially a text finding task ):

Inputs: sentences. Excel consider this type of input as the format of text (in other programs, they can be defined as string format or character format). The format of your input is important, since it restrict the operation you can perform. We will keep coming back to this point later.

Output: There are two possible outputs: (1) Awesome. If "sisi" appears in the input (2) :(. If "sisi" does not appear in the input. This is way I call this program "The Awesome Detector"

Algorithm (route connecting Input and Output) :

(1) Make the input upper case. We will search for the word "sisi" in the input, since most of the text search functions are case sensitive, we force the input to be uppercase to make the search step easier. function Upper is used.

When you Click enter, SISI LOVE CANDY (should be loves, excuse the foreigner), will appear in A5.

(2) Check if "SISI" appears in the converted input string. I used find function as following: =FIND("SISI",C5,1), when click enter output will showup as 1, indicating the text "SISI LOVE CANDY" matches "SISI" from the first position.

(3) get rid of some bugs,the find function will return "#VALUE!" error, if "SISI" did not appear in the input text, which looks bad. we'll fix it using iferror function. IFERROR fuction takes the index of the cell as argument, return true if an error indeed appear, otherwise will return false as output.

In the figure on the left hand side, cell E1 contains the ISERROR function (check out the fx bar on top right), it tests if D1 contains an error, which does, so in E1 is output is TRUE. Similarly, Cell E2 contains uses function "=ISERROR(D2)". D2 contains the number 6 (which means "SISI" appears at the 6th position in C2), so E2 shows up as FALSE, meaning D2 does not contain a error.

(4) Final move. In the figure above Column A contains the inputs (one per row), Column C makes text in Column A in to upper case by UPPER function (in the corresponding row, same in the following columns). Column D test if "SISI" appears in column C, return the position of "SISI" if it appears, return "#VALUE!" if it does not appear. Column E check if the corresponding row of Column D contains error, if have error return TRUE, otherwise return FALSE. Finally, remember the goal, we want to return AWESOME if "SISI" appears in the input (in this situation column E should have the value FALSE). We can simply use a IF function : =IF(E1,":(","AWESOME"). As shown in the following figure. the first argument in if function is a logical value (TRUE or FALSE), second argument is the output when the logic value is true, third argument is the output when the logic value is false. Therefore in our case F1 returns :(, since E1 is TRUE, whereas F2 returns AWESOME since E2 is FALSE.

Packaging: So, we completed our goal in the above 4 steps. Now, it is the time to package the process into a nice compact form, using function nesting as the following: =IF(ISERROR(FIND("SISI",UPPER(A1),1)),":(","AWESOME"), A1 is the cell where the input is written.

To summarize, getting output step by step makes trouble shooting easier, however, packing using nested functions gives you compact results and a clearer spreadsheet.

Dave's Code

Microsoft Excel Basics for Data Analysis:

First and foremost, it should be said that the current tutorial is meant to provide instructions that allow an individual to conduct basic data analysis. Most of the instructions are specifically tailored to this purpose. Excel tutorials for a wide range of topics can be found through Microsoft:

http://office.microsoft.com/en-us/excel/

http://office.microsoft.com/en-us/excel-help/

I would encourage anyone who wants to get better at excel to sit down with a few of these tutorials.

Microsoft excel is broken down into rows and columns. Rows are horizontal and designated by numbers, while columns are vertical and represented by letters. Together these create a coordinate system that allows you to reference a single cell or group of cells. For example the selected cell in the screenshot above would be cell “J9”. You can think of these coordinates like a game of battleship.

A cell is represented by a single box (Single coordinate) and can contain multiple types of data: numbers, strings, or functions.
A number is any numeric entity- e.g. zero, negative, positive, fraction, etc.
A string is a series or characters or symbols chosen from a given alphabet. While a string can contain numeric entities, it can also contain letters and other symbols. For example, “AWESOME” and “AW3$0ME” would both be examples of strings.

A function uses either mathematical formulas or logic to produce an output/return. In excel these can be mathematical equations (e.g. “=2+4”), but many are also designated by a string (e.g. “=ABS( )” to calculate the absolute value of a given input; “ABS” being the string). In essence, both are calling some basic mathematical function, logical function, or a combination of the two. Note: The inputs to a function are called arguments/inputs, while the value(s) returned by a function are called outputs/returns.

A complete list of Excel functions/formulas can be found by clicking the fx button. Mathematical functions/formulas follow standard notation (+,-,*,/,^) and order of operations.

You should also note that the blank section to the right of the fx symbol is used for editing functions/formulas. Alternatively, functions/formulas can also be edited by typing directly into a cell. In fact, newer versions of excel have pop-up windows to help you remember the names of functions/formulas and enter them correctly(see below). This window even provides a brief description of the selected function.

While numbers and strings can be directly inserted into a cell, functions/formulas always begin with an equals sign “=”. Microsoft excel will also tell you what values should be inserted into functions/formulas.

To put it another way, when considering numbers and strings, “what you see is what you get”. On the other hand, the value that you see for a cell with a function is the final output or returned value of that function/equation.

Note that in a functions/formula, the inputs can either be directly entered (e.g. =2+4) or they can be references to other cells. For example, cells B2 (column B, row 2) and C2 (column C, Row 2) are being referenced and added together in the above example for “Math”. Alternatively, a series of cells can be referenced by indicating the range of cells. This can be seen in “Excel functions” where cells B4:C4 (read B4 through C4) are being referenced. Finally, a list of discontinuous cells can be inserted into a function/formula by using a comma separator (e.g. “=AVERAGE(C5,D7,J22)”).
Finally, functions can be nested within other functions. There can be many layers to this “nesting”, but a good rule of thumb is to avoid nesting more than 7 functions at once. The limit to the number of functions that one can nest depends on the version of Excel.

An example of nesting functions would be to find the average of a series of negative numbers (e.g. -24, -25, and -26, the average of which is -25) and then take the absolute value of the average(+25). This could be accomplished by nesting the AVERAGE function within the ABS function.

=ABS(AVERAGE(-24,-25,-26))

Note: When nesting functions, only the first function is preceded by an equals sign.

On some occasions, it becomes necessary to fix a row, column, or both in a formula. This can be accomplished by “pinning”. This is accomplished by using the “$” before either the row number, column letter, or both.
For example, if I wanted to multiply all of the cells in the second row by a value in A1 (e.g. the drug dose of an infusion), I could used a fixed reference for A1 and a moving reference for values in the second row. In Microsoft excel, the fixed references are known as “absolute references”, while the moving references are known as “Relative references”.

Finally, you may find an instance where you need to reference another worksheet. This is accomplished by referencing the worksheet name, followed by “!”, and then the cell(s) coordinate(s) you want to reference in that sheet.

E.g. “=ABS(Sheet1!A9)”

WEEKLY CHALLENGE- CREATE A NESTED FUNCTION THAT CONTAINS A TOTAL OF FOUR FUNCTIONS
e.g. =Function(Function(Function(Function)))

We will begin working with Matlab in the weeks to come. For those of you that are interested in learning more about Matlab, I recommend two books:

For beginners:

More than you could ask for:

Wednesday, June 15, 2011

Common Excel Functions, Tips, and HotKeys

Common Functions for Microsoft Excel:

MATH FUNCTIONS:

=SUM( Number 1,Number 2,…Number N)

Computes the sum of a group of numbers

=COUNT(Value1, …Value N)

Counts the number of cells that contain a number

=MAX(Value 1,…Value N)

Calculates the maximum value out of a set of numbers

=MIN(Value 1,…Value N)

Calculates the minimum value out of a set of numbers

=AVERAGE(Value 1,…Value N)

Returns the (arithmetic) mean of a set of numbers.

=MEDIAN(Value 1,…Value N)

Returns the median of a set of numbers

=MODE(Value 1,…Value N)

Returns the mode (most frequently occurring number) of a set of numbers

=ABS(Value)Returns the absolute value of a given number

=ROUND(Value, # of decimal places)

Rounds a given number to the nearest specified decimal place; Round(Number, 0) will round to the nearest integer.

=ROUNDUP(Value, # of decimal places)

Rounds away from zero to the next specified decimal place; Negative numbers round lower and positive numbers round higher.

=ROUNDDOWN(Value, # of decimal places)

Rounds towards zero to the next specified decimal place

=EXP(VALUE)

Returns e raised to the power of a given number

=LN or LOG or LOG10(Value, Base)

Returns the natural log, log base 10, or log with a specified base

=STDEV(Value 1,…ValueN)

Returns the standard deviation of a set of numbers

=SQRT(value)

Returns the square root of a given value

=COUNTIF(range, criteria)

Counts the number of cells within a given range that meet the specified criteria

STRING FUNCTIONS:

=CONCATENATE(Value1,…Value N)

Combines strings together into one string

=COUNTIF(range, criteria)

Counts the number of cells within a given range that meet the specified criteria

=LEFT(text, # of characters)

Returns the leftmost X number of characters in a string

=RIGHT(text, # of characters)

Returns the rightmost X number of characters in a string

=VALUE(text)

Converts a number in string format to its numerical value; for instance "2"(string) can be converted to 2 (numerical).

=UPPER(Text)

Converts a string to upper case

=LOWER(Text)

Converts a string to lower case

LOGICAL FUNCTIONS:

=IF( Logical test, Value if TRUE, value if False)

=AND

= NOT

=OR

We will discuss the logical functions more in a later lesson. If you are curious you can look at the excel help and try to figure them out on your own.

Tips for working with data:

When selecting continuous data, the shift [SHIFT] key can be held to highlight a series of continuous rows/columns. On the other hand, discontinuous cells can be selected by holding control [CTRL] and then selecting each section of the spreadsheet.
Using the end key [END] and arrow keys you can easily navigate to the extreme top, bottom, left or right of a dataset. This can be combined with the shift key to select all data from one point to the start/end of a row or column [SHIFT+END+ARROW].
Finding values or finding and replacing values can be accomplished by clicking [CTRL+F] and using either the find or replace tab. This can be very useful for navigating the data.

Some other hotkeys:

CTRL+C- Copy

CTRL+X- Cut

CTRL+V Paste

CTRL+ALT+V- Opens paste special dialogue

CTRL+D- Fills a formula down

CTRL+Z –undo

CTRL+Y- Redo

CTRL+F- find/replace

CTRL+P- Print

F4- repeats last command or action

Still having trouble? Excel 2007 For Dummies

Pages - Menu