An exciting month!

Posted in Statistics

Hi all!

It has been quite an exciting month. I promise to post soon on cleaning microfiber couches, safety tips for a joiner after my emergency room visit :-), and an old video I made as an undergraduate. In the meantime, for those statistically minded folks, I’m posting a link to my most recent publication:

http://www.tandfonline.com/eprint/tasmrCgvdJ92eM8fbE7d/full

I’m thrilled to see this finally in print. It took three years and about seven rounds of revisions, but it’s finally here! The link above gives free access to the first 50 people who use it. Enjoy!

Color-coding groups for plots: the string.to.colors function in fifer

Posted in Statistics

I used to hate color-coding plots. 'Twas a big pain. Let's say we're trying to plot the relationship between awesomeness and R know-how. First, let's read in the R know-how and awesomeness dataset.

Let's peek under the “head,” shall we?

require(fifer)
# stringsAsFactors = TRUE keeps Club a factor (the default before R 4.0.0)
d = read.csv("Awesomeness_Rknowhow.csv", stringsAsFactors = TRUE)
head(d)
## Number.of.Friends R.Know.how Club
## 1 0.46841 -0.7202 Mat Black Labs
## 2 0.01932 0.3678 Mat Black Labs
## 3 0.68488 -0.9006 Mat Black Labs
## 4 -1.15628 1.8706 Mat Black Labs
## 5 -0.76576 -0.1406 Mat Black Labs
## 6 0.15211 -1.4046 Mat Black Labs

Nice! And let's peek under the “tail.” (Okay, bad joke.)

tail(d)
## Number.of.Friends R.Know.how Club
## 95 -0.6766 -1.4407 Rs-R-Us
## 96 -0.3809 -1.0437 Rs-R-Us
## 97 1.5243 2.4358 Rs-R-Us
## 98 0.3377 -0.3047 Rs-R-Us
## 99 -1.3506 -1.0900 Rs-R-Us
## 100 -1.4359 -1.6769 Rs-R-Us

Let's say we're at an uber geek convention where SAS, R, and Matlab users alike meet to…er…mingle and speak of common interests. Being the research-minded student you are, you decide to measure three traits of the convention participants: how many friends they have (okay, I know you can't have negative friends and you can't have a fraction of a friend. Stop being so critical!), how much they know about R, and which club they belong to: the Mat Black Labs or the Rs-R-Us(es). You then plot the relationship betwixt the two quantitative traits:

# The first column (Number.of.Friends) goes on the x-axis, the second on the y-axis
plot(d[,1:2], xlab="Number of Friends", ylab="R Know-How", xaxt="n", yaxt="n")

What a jumbled mess! Then you remember that you also recorded which club each person belongs to…but how do you show the two groups on the plot? Why, let's color-code them!

This is where the string.to.colors function comes in. It takes a vector of strings as input (plus an optional vector of colors, one for each unique group) and outputs a vector of colors the same length as the original vector. Let's take a look:

#### let's look at that vector of strings (or factors)
d$Club
## [1] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [5] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [9] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [13] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [17] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [21] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [25] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [29] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [33] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [37] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [41] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [45] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
## [49] Mat Black Labs Mat Black Labs Rs-R-Us Rs-R-Us
## [53] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [57] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [61] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [65] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [69] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [73] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [77] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [81] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [85] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [89] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [93] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## [97] Rs-R-Us Rs-R-Us Rs-R-Us Rs-R-Us
## Levels: Mat Black Labs Rs-R-Us

And now let's see what string.to.colors does:

string.to.colors(d$Club, col=c("red", "blue"))
## colors colors colors colors colors colors colors colors colors colors
## "red" "red" "red" "red" "red" "red" "red" "red" "red" "red"
## colors colors colors colors colors colors colors colors colors colors
## "red" "red" "red" "red" "red" "red" "red" "red" "red" "red"
## colors colors colors colors colors colors colors colors colors colors
## "red" "red" "red" "red" "red" "red" "red" "red" "red" "red"
## colors colors colors colors colors colors colors colors colors colors
## "red" "red" "red" "red" "red" "red" "red" "red" "red" "red"
## colors colors colors colors colors colors colors colors colors colors
## "red" "red" "red" "red" "red" "red" "red" "red" "red" "red"
## colors colors colors colors colors colors colors colors colors colors
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
## colors colors colors colors colors colors colors colors colors colors
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
## colors colors colors colors colors colors colors colors colors colors
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
## colors colors colors colors colors colors colors colors colors colors
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
## colors colors colors colors colors colors colors colors colors colors
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"

So all it does is replace all the values of “Rs-R-Us” with “blue” and all the values of “Mat Black Labs” with “red.”
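If you want to see the idea without fifer, the same group-to-color lookup can be sketched in base R. The function name group_colors below is hypothetical (it is not part of fifer), and it assumes colors are handed out in factor-level order (alphabetical for character input). That happens to reproduce the red/blue result above, but fifer's own ordering rule may differ.

```r
# Hypothetical base-R sketch of a group-to-color lookup (not fifer's code).
# Assumes one color per unique group, assigned in factor-level order.
group_colors <- function(groups, col = c("red", "blue")) {
  groups <- factor(groups)          # character input gets alphabetical levels
  if (length(col) < nlevels(groups)) {
    stop("need at least one color per unique group")
  }
  # Indexing the color vector by the integer level codes
  # yields one color per element of 'groups'
  col[as.integer(groups)]
}

clubs <- c("Mat Black Labs", "Mat Black Labs", "Rs-R-Us")
group_colors(clubs)   # returns c("red", "red", "blue")
```

The trick is that a factor is stored as integer codes, so those codes can index any lookup vector (colors, symbols, line types) directly.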

Now we can put that into the plot to tell R how we wanna display it:

plot(d[,1:2], xlab="Number of Friends", ylab="R Know-How", xaxt="n", yaxt="n",
     col = string.to.colors(d$Club, col=c("orange", "purple")))
legend("topleft", legend=c("Mat Black Labs", "Rs-R-Us"),
       text.col=c("orange", "purple"), bty="n")

We can also “cheat” and use the string.to.colors function to use different symbols!

plot(d[,1:2], xlab="Number of Friends", ylab="R Know-How", xaxt="n", yaxt="n",
     pch = as.numeric(string.to.colors(d$Club, col=c(11, 16))))
legend("topleft", legend=c("Mat Black Labs", "Rs-R-Us"), pch=c(11, 16), bty="n")

Neato, eh?
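For readers without the Awesomeness_Rknowhow.csv file, here's a sketch on simulated data (random numbers standing in for the real dataset, so the picture won't match). It uses plain factor indexing instead of fifer, and it builds the legend from levels() so the legend labels, colors, and symbols can't drift out of order relative to the points.

```r
# Simulated stand-in for the convention data (NOT the real dataset)
set.seed(42)
d <- data.frame(
  Number.of.Friends = rnorm(100),
  R.Know.how        = rnorm(100),
  Club              = factor(rep(c("Mat Black Labs", "Rs-R-Us"), each = 50))
)

club_cols <- c("orange", "purple")  # one color per level, in level order
club_pch  <- c(11, 16)              # one plotting symbol per level

# Integer level codes (1 or 2) pick the color and symbol for each row
point_cols <- club_cols[as.integer(d$Club)]
point_pch  <- club_pch[as.integer(d$Club)]

plot(d$Number.of.Friends, d$R.Know.how,
     xlab = "Number of Friends", ylab = "R Know-How",
     col = point_cols, pch = point_pch)

# Legend labels come straight from the factor levels, so they stay in sync
legend("topleft", legend = levels(d$Club),
       col = club_cols, pch = club_pch, bty = "n")
```

Using both col and pch at once is handy when a plot might end up printed in black and white.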

The contents() function in fifer

Posted in Statistics
Introduction

I’m ashamed to admit it. I often forget the functions in my own package. You’d think that after spending hours creating it, hours more debugging it, and even more hours getting it to pass CRAN’s militant package checklist, I’d remember. But no, sadly. Was it string.to.color? Or strings.to.colors? Or StringsToColors? At first I assigned aliases (e.g., string.to.color and string.to.colors all point to the same documentation), but sometimes that wasn’t enough. I’d say something like, “I remember creating a function that….er…maybe it like rotates a plot or something? What was its name?”

I used to type

ls("package:fifer")

But that got annoying (and I could never remember the command). So I created a function called contents().

How to use it

To know what functions a package contains, type the following:

contents("packagename")

For example, to see what’s in MASS, type

contents("MASS")

To see what’s in fifer, type

contents("fifer")

or just

contents()

And bam! Yer done.
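If you're curious how something like this can be done in base R, getNamespaceExports() returns the exported names of a package, loading its namespace if needed but without attaching it (so, unlike ls("package:fifer"), you don't have to call library() first). The wrapper name list_exports is hypothetical; this is a sketch, not how contents() is actually written.

```r
# Hypothetical contents()-style helper built on base R (not fifer's implementation)
list_exports <- function(pkg = "fifer") {
  # requireNamespace() returns FALSE instead of erroring if pkg isn't installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed")
  }
  # Exported object names, sorted so they're easy to scan
  sort(getNamespaceExports(pkg))
}

head(list_exports("stats"))
```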

The clear() function

Posted in Statistics
Introduction

Here was a common issue I had in grad school. I’d sit hunched before my computer, eyes twitching from looking at the screen too long, toilets flushing in the bathroom next door to my office. And to top it off, the AC’s broken, so I’m sweating like a swimming pool. And my R program JUST. WON’T. WORK!

Debug, debug, debug. My stomach growls, protesting another missed meal. My wife is texting, asking when I’m coming home. “As soon as I get this program working,” I say. “Kids are screaming, hurry up,” she says.

And I’m about to scream too. But then…the moment comes. I make a change and everything works. Wahoo!

Or is it wahoo?

The next day, I return to the same humid office to find that the program no longer works.

But-but-but-but…it worked yesterday!!!! How could it work yesterday, and not today? That lying sack of computer code!

The Problem

Let’s say you’ve got the following code:

d = rnorm(100)
#### insert gobbledy goop that shouldn't change d ###
#### and now we return to the bottom of the page ###
mean(d)   ### should be close to zero, eh? ####

But mean(d) says about 1000! Say, what??? How could that be? Well, it seems you’ve forgotten one little thing. While jumping to a source file to edit a function (or something like that), you accidentally ran this little line of code:

d = rnorm(100, mean=1000)

and then forgot about it. R originally assigned d a bunch of random numbers with a mean of zero, but later replaced them with a new set of random numbers with a mean of 1000. When you calculate the mean, R doesn’t remember what you originally assigned to d; it only knows the most recent assignment.

How often does it happen? Not often, but often enough that it’s a big hassle (especially if I’m sourcing functions and whatnot).

The Solution

To solve this problem, I used to include one line of code at the beginning of every R script:

rm(list=ls())

That clears every object out of R’s workspace from the get-go, so any debugging is entirely contained within your current script. It’s a clean slate. A fresh start. Turning over a new leaf. Like January 1st or the end of an AA meeting: past sins forgotten, nuttin’ but clean programming ahead.
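One caveat: ls() skips names that start with a dot, so rm(list=ls()) quietly leaves objects like .hidden behind. The sketch below demonstrates this in a throwaway environment (so running it won't wipe your real workspace); passing all.names = TRUE to ls() catches the dotted names too.

```r
# Use a scratch environment so this demo doesn't touch your workspace
scratch <- new.env()
assign("d", rnorm(100), envir = scratch)
assign(".hidden", 42, envir = scratch)

# The usual one-liner misses dot-prefixed names
rm(list = ls(envir = scratch), envir = scratch)
ls(envir = scratch, all.names = TRUE)   # ".hidden" is still there

# all.names = TRUE includes them, so everything is removed
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
ls(envir = scratch, all.names = TRUE)   # character(0)
```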

But…I got sick of typing all that just to wipe R’s memory. So I built a function called clear(). The Gilderoy Lockhart of programming, that function.

How to use it

The easiest way is to just wipe everything:

clear()

Or you can remove only the object named dobby:

clear("dobby", keep=FALSE)

Or you can remove all objects but dobby (since, you know, dobby’s pretty much the man):

clear("dobby", keep=TRUE)

Neato, eh?
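For the curious, here's a rough sketch of how a clear()-like helper might work. The name clear_sketch is hypothetical, and the argument semantics are guessed from the three usages above (no name: wipe everything; keep = FALSE: remove just the named objects; keep = TRUE: remove everything except them). fifer's real clear() may differ. It targets the global environment by default but takes an envir argument so it can be tried out safely.

```r
# Hypothetical sketch of a clear()-like helper (not fifer's actual code)
clear_sketch <- function(x = NULL, keep = TRUE, envir = globalenv()) {
  all_objs <- ls(envir = envir)
  if (is.null(x)) {
    to_remove <- all_objs                # no names given: wipe everything
  } else if (keep) {
    to_remove <- setdiff(all_objs, x)    # keep = TRUE: remove all but x
  } else {
    to_remove <- intersect(all_objs, x)  # keep = FALSE: remove only x
  }
  rm(list = to_remove, envir = envir)
  invisible(to_remove)                   # quietly return what was removed
}

# Try it in a scratch environment
e <- new.env()
assign("dobby", "free elf", envir = e)
assign("lockhart", "obliviated", envir = e)
clear_sketch("dobby", keep = TRUE, envir = e)
ls(envir = e)   # [1] "dobby"
```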

Practice

Posted in Introduction to R, Statistics

For each of the questions, remember that you can always look at the documentation to find out more about the functions. Also, I recognize that it would be nearly impossible to figure out how to complete these exercises with only the information I’ve provided. Consequently, I recommend becoming quite familiar with Google.

First, you’ll need to download the following files:

avdata

tdata

zdata

1. Using the dataset zdata.csv, do a z-test to determine whether the sample mean differs from the population mean. Assume μ = 100 and σ = 15. Be sure to report z_obt and p_obt and to state your conclusion.

2. Using the dataset tdata.csv, conduct an independent two-sample t-test.

3. Using that same dataset, conduct a dependent (paired) two-sample t-test.

4. Using the dataset avdata.csv, conduct an ANOVA to determine whether the three groups differ. Be sure to report the F statistic and its p value.
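For orientation, here's a sketch of the relevant calls on simulated data (made-up numbers, not the practice files, so you'll still need to read in the real datasets, whose column layouts may differ). Base R's stats package has no one-sample z-test function, so the z statistic is computed by hand as z = (x̄ − μ) / (σ / √n) with a two-tailed p value; t.test() and aov() handle the rest.

```r
set.seed(1)

## 1. One-sample z-test (sigma known), computed by hand
x     <- rnorm(30, mean = 105, sd = 15)   # simulated sample
mu    <- 100
sigma <- 15
z_obt <- (mean(x) - mu) / (sigma / sqrt(length(x)))
p_obt <- 2 * pnorm(-abs(z_obt))           # two-tailed p value

## 2. Independent two-sample t-test
g1 <- rnorm(20, mean = 50, sd = 10)
g2 <- rnorm(20, mean = 55, sd = 10)
ind_test <- t.test(g1, g2)   # Welch by default; var.equal = TRUE gives Student's

## 3. Dependent (paired) t-test: the same subjects measured twice
pre  <- rnorm(20, mean = 50, sd = 10)
post <- pre + rnorm(20, mean = 3, sd = 5)
paired_test <- t.test(pre, post, paired = TRUE)

## 4. One-way ANOVA with three groups
av <- data.frame(
  score = c(rnorm(15, 10), rnorm(15, 12), rnorm(15, 11)),
  group = factor(rep(c("A", "B", "C"), each = 15))
)
fit <- aov(score ~ group, data = av)
summary(fit)                 # table includes the F value and Pr(>F)

c(z_obt = z_obt, p_obt = p_obt)
```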