Question



Data Frame subsetted with logical types FALSE, TRUE

In section 2.3.3.3 of your book you show a data frame consisting of an integer vector and a factor vector. Next you subset the data frame with
the statement df[,c(FALSE,TRUE)] and the second column (i.e. the factor vector) results. I don't understand why this works as it does or why
you would subset the data frame in this way.





Answers and follow-up questions





Answer or follow-up question 1

Dear student,

Thank you for your question. Using a logical vector is one way of subsetting a data frame. Note that the logical vector comes after the
comma, and therefore we are subsetting the columns as opposed to the rows. The length of the logical vector should be equal to the number of
columns in the data frame, and only the columns corresponding to the positions of the TRUE values in the logical vector will be retained.

Some examples:

Only the second value is TRUE, so only the second column will be retained:
df[,c(FALSE,TRUE)]

Both values are TRUE, so both columns will be retained:
df[,c(TRUE,TRUE)]

Only the first value is TRUE, so only the first column will be retained:
df[,c(TRUE,FALSE)]

No values are TRUE, so no columns will be selected:
df[,c(FALSE,FALSE)]
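
If you want to run the four cases above yourself, here is a minimal sketch. Note that df below is a made-up stand-in for the data frame in section 2.3.3.3: the column names x and f and their values are my assumptions, not taken from the book.

```r
# Hypothetical stand-in for the book's data frame:
# one integer column (x) and one factor column (f)
df <- data.frame(x = 1:3, f = factor(c("a", "b", "a")))

second <- df[, c(FALSE, TRUE)]   # only column 2; a single column comes back as a vector (here a factor)
both   <- df[, c(TRUE, TRUE)]    # both columns; still a data frame
first  <- df[, c(TRUE, FALSE)]   # only column 1; comes back as an integer vector
none   <- df[, c(FALSE, FALSE)]  # a data frame with 3 rows and 0 columns

# drop = FALSE keeps a single selected column as a one-column data frame
second_df <- df[, c(FALSE, TRUE), drop = FALSE]
```

When exactly one column is selected this way, R simplifies the result to a plain vector by default; add drop = FALSE if you need a data frame back.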

So why would we use a logical vector, if we can use an integer vector (which is less typing)? In this case:
df[,c(TRUE,FALSE)] gives the same result as df[, 1] and df[, c(1)]
Well, flexibility is a good thing. If you want to select columns (or rows) that satisfy a certain condition, this comes in very handy.

For example, if you want to select only columns that are integers, you could first write some code (involving the is.integer function) to
find out which columns are integers. The result of that code would be a logical vector. You could then use that logical vector to subset
the data frame.
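
The integer-column idea described above could be sketched as follows. This is only one possible way to do it; the data frame and the names df, int_cols and ints are hypothetical and chosen for illustration.

```r
# Hypothetical data frame: two integer columns and one factor column
df <- data.frame(x = 1:3, f = factor(c("a", "b", "a")), y = c(10L, 20L, 30L))

# Apply is.integer to every column; the result is a named logical vector
# with one TRUE/FALSE per column (is.integer returns FALSE for factors)
int_cols <- sapply(df, is.integer)

# Use the logical vector after the comma to keep only the integer columns;
# drop = FALSE guarantees a data frame even if only one column qualifies
ints <- df[, int_cols, drop = FALSE]
```

The point is that the logical vector is computed rather than typed by hand, so the same code keeps working when the number or order of columns changes.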

As for rows, if you only want to select rows with values bigger than a given number for a given variable you could write code to do that,
and the output would be a logical vector. You could then use that logical vector to subset the rows of that data frame. Here is a code
example, in which we select only rows with a value bigger than 5 for variable 1.

> (a <- data.frame(var1=1:10,var2=11:20))
   var1 var2
1     1   11
2     2   12
3     3   13
4     4   14
5     5   15
6     6   16
7     7   17
8     8   18
9     9   19
10   10   20
> (logicalvector <- a$var1 > 5)
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> a[logicalvector,]
   var1 var2
6     6   16
7     7   17
8     8   18
9     9   19
10   10   20
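
As a small variation on the row example, the condition can also be written directly inside the brackets, and which() shows the integer positions that the logical vector corresponds to. This is just a sketch; the names big, pos and same are made up for illustration.

```r
a <- data.frame(var1 = 1:10, var2 = 11:20)

# Condition written directly before the comma; no intermediate variable needed
big <- a[a$var1 > 5, ]

# which() converts the logical vector into the matching integer positions
pos <- which(a$var1 > 5)

# Subsetting with those integer positions selects the same rows
same <- a[pos, ]
```

Both forms select rows 6 to 10, which ties back to the earlier point: logical and integer vectors are interchangeable here, but the logical one falls out naturally from a condition.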

Michel Ballings



