## Question

What is the difference between characters, numerics, factors, and integers?

Professor,

I am having a difficult time retaining the difference between characters, numerics, factors, and integers. I also am unsure of when it is appropriate to use each. I think it would be helpful to be able to refer to the answer here in the future.

Thank you

## Answers and follow-up questions

**Answer or follow-up question 1**

Difference between characters, numerics, factors, and integers:

First, some terminology to make sure we're on the same page.

We have alphabetic characters and numeric characters. The former are letters or strings, and the latter are numbers. A number can have an integer part and a fractional part. The integer part comes before the dot (called the decimal separator) and the fractional part comes after it. For example, in the number 5.637, 5 is the integer part and 637 is the fractional part (also called the decimals).
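As a small illustration of the integer and fractional parts, here is a sketch in R. The variable names are made up for this example, and the fractional part is only approximately 0.637 because of floating-point arithmetic.

```r
x <- 5.637

# trunc() drops everything after the decimal separator
integer_part <- trunc(x)

# What is left after removing the integer part
fractional_part <- x - integer_part

print(integer_part)      # 5
print(fractional_part)   # approximately 0.637
```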

In addition, we have categorical variables (order is meaningless; also called nominal variables) and non-categorical variables (order is meaningful; ordinal, interval, and ratio variables). For example, if you have blue, green, red (or 1, 2, 3) as options, you have a categorical variable, because blue is not bigger than green or red (and 3 is not better than 1).

If we are talking about income or temperature, we have a non-categorical variable: an income of $10 is smaller than an income of $20.
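The distinction can be seen directly in R. The sketch below uses made-up variables (colour, income): comparing two incomes gives a meaningful answer, while asking whether one colour is "smaller" than another is not defined for an unordered factor.

```r
colour <- factor(c("blue", "green", "red"))  # categorical (nominal)
income <- c(10, 20)                          # non-categorical

levels(colour)           # the distinct categories

income[1] < income[2]    # TRUE: order is meaningful for income

# For an unordered factor, R returns NA (with a warning) because
# "less than" has no meaning for nominal categories.
suppressWarnings(colour[1] < colour[2])
```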

The function numeric() is for numbers that can have a fractional part (always non-categorical).

The function integer() is for numbers without a fractional part (always non-categorical). If you convert a number with a fractional part using as.integer(), only the integer part is stored.

The function character() is for text strings (always categorical).

The function factor() is for alphanumeric (alphabetic and numeric) variables with a fixed set of possible values, called levels (always categorical).
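To check which type a value has, class() and typeof() are handy. The following is a minimal sketch with made-up variable names (n, i, s, f), not an exhaustive tour of the types.

```r
n <- 5.637                                # numeric (double)
i <- 5L                                   # integer; the L suffix marks an integer literal
s <- "blue"                               # character
f <- factor(c("blue", "green", "blue"))   # factor with levels "blue" and "green"

class(n)   # "numeric"
class(i)   # "integer"
class(s)   # "character"
class(f)   # "factor"

# Converting a number with a fractional part keeps only the integer part
as.integer(n)   # 5

# Internally, a factor is a vector of integer codes plus its levels
typeof(f)       # "integer"
levels(f)       # "blue" "green"
```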

You can store everything as a factor or a character, generally without loss of information, but you cannot do arithmetic on the values until you convert them back to numbers.
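Here is a short sketch of that round trip, using example values I picked for illustration. One thing to watch for: calling as.numeric() directly on a factor returns the internal integer codes, not the original numbers, so convert to character first.

```r
x <- c(10.5, 20.25, 10.5)

x_chr <- as.character(x)   # "10.5" "20.25" "10.5"
x_fct <- factor(x)         # levels "10.5" and "20.25"

# Converting back recovers the original values
as.numeric(x_chr)                  # 10.5 20.25 10.5
as.numeric(as.character(x_fct))    # 10.5 20.25 10.5

# Pitfall: this returns the level codes, not the values
as.numeric(x_fct)                  # 1 2 1
```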

However, in terms of storage footprint (smaller is better):

- integer < numeric < character and factor
- if you have many unique values: character < factor
- if you have few unique values: character > factor

Therefore, use a factor when there are few unique values and a character when there are many.
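You can check the storage footprint yourself with object.size(). The sketch below is only an illustration: size_bytes() is a helper I made up for this example, the data are simulated, and the exact byte counts depend on your platform and R version, so compare the sizes rather than relying on specific numbers.

```r
# Hypothetical helper: size of an object in bytes as a plain number
size_bytes <- function(x) as.numeric(object.size(x))

n <- 1e5
int_vec <- integer(n)    # n integers
num_vec <- numeric(n)    # n doubles

# Few unique values: three colours repeated many times
few <- sample(c("blue", "green", "red"), n, replace = TRUE)

# Many unique values: every element is different
many <- paste0("id_", seq_len(n))

size_bytes(int_vec)        # integer
size_bytes(num_vec)        # numeric: larger than integer

size_bytes(few)            # character, few unique values
size_bytes(factor(few))    # factor: smaller here

size_bytes(many)           # character, many unique values
size_bytes(factor(many))   # factor: larger here
```

The factor wins with few unique values because it stores compact integer codes plus a short list of levels. With many unique values it has to store the codes in addition to all the strings.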

Michel Ballings
