Export and merge data in R in CDMS
There are two ways to export data from your study to R:
- Export data in CSV format and read the CSV file into R (see below)
- Use the Castor API R package called castoRedc to retrieve data
Read your CSV file into R
Skip this step if you have retrieved data with the Castor API R package. Follow this step if you have exported your data in CSV format.
- Read a .csv file in R using the following code:
studydata <- read.csv(file="C:\\yourfilepath\\exportstudydata.csv",header=TRUE, sep=",")
Note: It is possible that the values in your .csv file are separated with semicolons rather than commas. You can check this by opening the .csv file with any text application (e.g. Excel or Notepad). If this is the case, replace the comma separator (sep=",") with a semicolon:
studydata <- read.csv(file="C:\\yourfilepath\\exportstudydata.csv", header=TRUE, sep=";")
- In the code above, replace:
C:\\yourfilepath\\exportstudydata.csv
with your own file path.
- Replace single backslashes (\) in your file path with two backslashes (\\) as shown in the code above.
- Run the code to read your file into R. You have now created "studydata".
- View your data by running:
View(studydata)
Merging study and report/survey data
Once you have your data in R, you can choose between two ways to merge your study and report/survey data:
- Merge in long format
- This means that every report/survey instance has its own row, so one record can have multiple rows in the dataset.
- Merge in wide format
- This means that every report/survey instance will have new columns, so one record will have one row in the dataset.
Merge study and report/survey data in R in long format
After you have imported your data, you can merge your data. Follow these steps if you want to merge your study and report/survey data in a long format. This means that for every report instance, a new row is added for a record.
Let's say you have imported your study data and report data into R, which are called "studydata" and "report".
- Make the column name of the record ID identical, so that you can merge the data based on the record ID:
colnames(studydata)[1] <- "recordID" colnames(report)[1] <- "recordID"
- Merge the files by running the following line of code:
myfulldata = merge(studydata,report,by="recordID",all=TRUE)
- View your merged dataset by running:
View(myfulldata)
Merge study and report/survey data in R in wide format
After you have exported your data, you can merge your data. Follow these steps if you want to merge your study and report/survey data in a wide format. This means that for every report instance, new columns are added.
Let's say you have imported your study data and report data into R, which are called "studydata" and "report".
-
Make the record ID column names identical so you can merge on recordID:
colnames(report)[1]<-"recordID" colnames(studydata)[1]<-"recordID"
-
We'll use the data.table package to cast the report data to a wide format (i.e., every report instance has its own columns). If you have never installed packages before, use:
install.packages("data.table")
See also this tutorial on how to install packages in R. If you have done this (before), load the package:
library(data.table)
-
Define which variables you want to include:
-
All variables (i.e., from report creation date to the last variable in your report):
allVariables <- colnames(report)[2:ncol(report)]
- Specific variables, based on the variable name:
allVariables <- c("Report.Creation.Date","Report.Name.Custom","Report.Parent", "yourVariable1", "yourVariable2","yourVariable3")
- A number of variables, based on their variable/column position. The record ID has position 1, record creation date has position 2, the custom report name has position 3, etc. So for example, if you want to include report creation date, custom report name, parent report, and your own first 2 variables, you can use:
allVariables <- colnames(report)[2:6]
-
- After defining the variables you want to include, cast your report data to a wide format with the code below. Every column name will have a suffix “_ReportX”, with X being the number of the report instance.
castedReport <- dcast(setDT(report), recordID~rowid(recordID, prefix="Report"), value.var=allVariables)
For surveys you can choose to set the suffix to "Survey":castedReport <- dcast(setDT(yourSurveyData), recordID~rowid(recordID, prefix="Survey"), value.var=allVariables)
- View your castedReport with:
View(castedReport)
-
Merge the casted report with the study data with:
myfulldata <- merge(studydata, castedReport, by="recordID", all=TRUE)
-
View your merged dataset with:
View(myfulldata)