Question



Big Project Articles Text mining: Merging 2 data sets

Dr. Ballings,

I have an issue with merging after aggregating the Companies data. Here is the situation:

I have successfully aggregated the COMPANIES data set (named AGG.COMPANIES) by consolidating all the Story_IDs according to their Company_ID,
so it looks like this:

Company_ID  TimeSTAMP_UTC  Story_ID
1           1/1/1990       B8094E11C1279FAF1A231FE2161FA5DB
2           1/1/1995       3F064F7FCB18A86B39A7B1DAD7F353E9, 3BF76F95D39120A64F689AEC98AA6F55.....
3           1/3/2000       0D2922F8ECDD8FBB8E08857A5595B2FD, 9E98E0308356707B67248724EF9E8ECE.....

PROBLEM:
When I run MERGE <- merge(AGGRG.COMPANIES, STORIES, by = "STORY_ID"), it only merges the rows that have a single Story_ID;
it does not merge the rows with multiple aggregated Story_IDs (e.g. Companies 2 and 3 above). I tried aggregating the STORIES data set
by Story_ID and time, but every story has a unique STORY_ID, so aggregating by STORY_ID changes nothing. This leads to the question:
how do I merge by Story_ID when some rows contain multiple unique identifiers?

How do I merge the two data sets so that the rows with multiple STORY_IDs (Companies 2 and 3 above) are fully merged, together with
all the relevant information for each STORY_ID?


Please advise,

Jian





Answers and follow-up questions





Answer or follow-up question 1

Dear Jian,

Indeed you cannot do it like that.

You want to create a document-term matrix (dtm) for each story and then aggregate them (sum) by stock and day.

Only then can you merge with the NYSE data.
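
A minimal sketch of that order of operations, in base R only. Everything in it is an assumption made for illustration: the toy rows, the short Story_ID values, the Text column in STORIES, and the hand-built term counts (in practice a text mining package would build the dtm). It starts from the un-aggregated COMPANIES table, with one Story_ID per row, because that is what makes the merge key match.

```r
# Toy data; all values and the Text column are hypothetical.
COMPANIES <- data.frame(
  Company_ID    = c(1, 2, 2, 3),
  TimeSTAMP_UTC = as.Date(c("1990-01-01", "1995-01-01", "1995-01-01", "2000-01-03")),
  Story_ID      = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)
STORIES <- data.frame(
  Story_ID = c("S1", "S2", "S3", "S4"),
  Text     = c("profit rises", "profit falls",
               "merger announced profit", "merger fails"),
  stringsAsFactors = FALSE
)

# 1. Document-term matrix: one row per story, one column per term.
tokens <- strsplit(tolower(STORIES$Text), "[^a-z]+")
terms  <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
))
colnames(dtm) <- terms
dtm_df <- data.frame(Story_ID = STORIES$Story_ID, dtm, stringsAsFactors = FALSE)

# 2. Merge while each row still holds exactly one Story_ID.
merged <- merge(COMPANIES, dtm_df, by = "Story_ID")

# 3. Sum the term counts by company and day.
agg <- aggregate(
  merged[, terms],
  by  = list(Company_ID = merged$Company_ID,
             TimeSTAMP_UTC = merged$TimeSTAMP_UTC),
  FUN = sum
)
print(agg)
```

The result has one row per company and day, with the term counts of all that company's stories for the day added together. That table has a single key per row, so it can then be merged with the NYSE data on stock and date.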

Michel Ballings


