What is the difference between characters, numerics, factors, and integers?


I am having a difficult time retaining the difference between characters, numerics, factors, and integers. I also am unsure of when it is
appropriate to use each. I think it would be helpful to be able to refer to the answer here in the future.

Thank you

Answers and follow-up questions

Answer or follow-up question 1

Difference between characters, numerics, factors, and integers:

First some terminology to make sure we're on the same page:
We have alphabetic characters and numeric characters. The former are letters or strings
and the latter are numbers. A number can have an integer part and a fractional part. The
integer part comes before the dot (called the decimal separator) and the fractional part
comes after the dot. For example, in the number 5.637, 5 is the integer part and
637 is the fractional part (also called the decimals).
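
As a small illustration of the two parts, here is a sketch in R. The variable names are made up for this example, and the fractional part is only approximately 0.637 because of floating-point rounding.

```r
x <- 5.637

int_part  <- trunc(x)       # integer part: 5
frac_part <- x - trunc(x)   # fractional part: approximately 0.637

int_part
frac_part
```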

In addition we have categorical variables (order is meaningless; also called nominal variables)
and non-categorical variables (order is meaningful; ordinal, interval and ratio variables).
For example, if you have blue, green, red (or 1, 2, 3) as options, you have a categorical variable because
blue is not bigger than green or red (and 3 is not better than 1).
If we are talking about income or temperature, we have a non-categorical variable. An income
of $10 is smaller than an income of $20.
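
In R, the distinction above could look roughly like the following sketch. The variable names colour and income are just illustrative.

```r
# Nominal variable: the categories have no inherent order.
colour <- factor(c("blue", "green", "red", "green"))
levels(colour)       # "blue" "green" "red" (sorted alphabetically, not a ranking)
is.ordered(colour)   # FALSE

# Non-categorical variable: comparisons between values are meaningful.
income <- c(10, 20)
income[1] < income[2]   # TRUE
```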

The numeric type (see numeric() and as.numeric()) is for numbers that can have a fractional part (always non-categorical).
The integer type (see integer() and as.integer()) is for numbers without a fractional part (always non-categorical).
If you convert a number with a fractional part with as.integer(), it will only
store the integer part.
The character type (see character() and as.character()) is for text, such as letters or strings (typically categorical).
The function factor() is for categorical variables whose values can be alphabetic or numeric labels (always categorical).
Internally a factor stores one integer code per value plus the set of unique labels (called levels).
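
To make the four types concrete, here is a minimal sketch in R. The vector names are invented for the example; class(), as.integer(), levels() and factor() are base R functions.

```r
x_num <- c(5.637, 2.5)                       # numeric: can have a fractional part
x_int <- c(5L, 2L)                           # integer: the L suffix marks whole numbers
x_chr <- c("blue", "green")                  # character: text strings
x_fct <- factor(c("blue", "green", "blue"))  # factor: categories

class(x_num)   # "numeric"
class(x_int)   # "integer"
class(x_chr)   # "character"
class(x_fct)   # "factor"

# Converting to integer keeps only the integer part.
as.integer(5.637)   # 5

# A factor holds integer codes plus a set of labels (levels).
as.integer(x_fct)   # 1 2 1
levels(x_fct)       # "blue" "green"
```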

You can store almost everything as a factor or a character, but you cannot do arithmetic
on the values until you convert them back to numbers.
In terms of storage footprint (smaller is better):
-integer < numeric < character; a factor costs about as much as an integer vector plus its set of labels.
-if you have many unique values: character < factor
-if you have few unique values: character > factor

Therefore use a factor in case of few unique values, and a character in case
of many unique values.
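
One way to check these footprint claims yourself is with object.size() from base R. The sketch below uses made-up data and an arbitrary seed. Exact byte counts depend on your R version and platform, so none are shown; the comparisons are what I would expect on a typical 64-bit installation.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1e5

int_vec <- sample.int(100L, n, replace = TRUE)   # integer vector
num_vec <- as.numeric(int_vec)                   # same values stored as numeric

# Few unique values: three colours repeated many times.
few_chr <- sample(c("blue", "green", "red"), n, replace = TRUE)
few_fct <- factor(few_chr)

# Many unique values: every element is different.
many_chr <- paste0("id_", seq_len(n))
many_fct <- factor(many_chr)

object.size(int_vec)  < object.size(num_vec)    # integer smaller than numeric
object.size(few_fct)  < object.size(few_chr)    # few unique values: factor smaller
object.size(many_chr) < object.size(many_fct)   # many unique values: character smaller
```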

Michel Ballings
