Open Refine can be downloaded here.


In this intro for Open Refine we will clean and save recipe of this census CSV file for California county population.


To Start

Open the downloaded Open Refine application. The application will open in your default browser. Choose a new file to create a project.


openrefine

Click next and choose CSV as data format.


openrefine

In the preview on the right.

I ignore the first 3 lines.
Parse next 1 line as column headers.
Discard initial 1 rows.
Unlick store blank rows.

Create the project.


We still have some extra blank unnecessary footers at the end.


openrefine

Here we do a blank facet to remove the footers. Click on any of the column that does not contain a footer. For example, I use Census column here.

openrefine

There are 6 footers to be removed.

Select the true field on the left. Footers will show.
Click All > Edit Rows > Remove all matching rows. Cross the left mini box to see the results.

I will remove the empty columns and rename the county column.


openrefine

To remove the “.” In County column. Click Edit cell > Transform.


Use GREL Expression.

value.replace(".", "")

openrefine

Since I already have the column named County. I want to remove the “County” next to the county name.

We do another transformation using GREL.

value.replace("County", "")

Final Transformation will look like this.


openrefine


Copy the recipe you have created by going to Extract under Undo/Redo. Copy the JSON to be applied on a new dataset.

openrefine

Finally, we will see if the recipe created here will work on a similar format data. Choose any of the CSV files here.


We used California in the previous one.



I will apply the recipe on New York data.

Download the CSV to local machine.
Open new project in Open Refine.

Change the format like the previous one.

Parse data as CSV
Ignore first 3 lines
Parse next 1 line as column header
Discard initial 2 rows
Unclick store blank rows

Create the project.

Now apply (paste) the copied JSON recipe under Undo/Redo.


openrefine

Click Perform Operations to see the changes.


The result should look like this:

openrefine

There many other operations you can do clean the data. The work can be exported in many formats.


To export as JSON format. Under Export click > Templating.. > Export.