Importing CSV Data to a Heroku Database


I was recently asked to migrate data from a CSV file into a Heroku database that was already in use by a Rails app. For a couple of years my company had been using a Google form as the way for people to apply to our program, but we recently deployed a Rails form, so we needed to migrate all of the Google form data into the Heroku database. My first thought was, “No big deal,” but I quickly found out this task was more complex than it appeared.

Baby Steps

Google Forms stores all of the submitted data in a spreadsheet, so the first step was to make sure the spreadsheet's column headers matched the column names in the Heroku database. For instance, “Timestamp” on the Google form had to be changed to “created_at” to match our database column, and I had to convert all of the timestamps from the default format, 7/1/2014 9:8:42, to a format Postgres accepts: 2014-07-01 09:08:42. Once all of the data and columns were fixed, I exported the spreadsheet to CSV and stored it locally.
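That timestamp conversion is tedious to do cell by cell, so it's worth scripting. Here's a minimal Ruby sketch, assuming the two formats shown above (this helper isn't part of the migration itself, just an illustration):

require 'date'

# Convert a Google Forms timestamp like "7/1/2014 9:8:42"
# into the "2014-07-01 09:08:42" format Postgres accepts.
def postgres_timestamp(google_timestamp)
  DateTime.strptime(google_timestamp, '%m/%d/%Y %H:%M:%S').strftime('%Y-%m-%d %H:%M:%S')
end

postgres_timestamp('7/1/2014 9:8:42') # => "2014-07-01 09:08:42"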

Rake Task

My plan was to write a rake task that would parse the CSV data with Ruby's CSV library and create new Application records from it. In application.rb I added require 'csv' so I'd have access to the CSV library throughout the Rails app.
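It's just one line near the top of the file; a rough sketch (the surrounding requires will vary with your Rails version):

# config/application.rb
require File.expand_path('../boot', __FILE__)

require 'rails/all'
require 'csv' # gives the whole app access to Ruby's CSV library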

Next, I created a rake task that read the CSV file I had stored locally and called Application.create, using the slice method with an array of column names to specify which columns I wanted to import:

namespace :acltc_website do
  desc 'Import applicant data from CSV to database'
  task :import_applicant_data => :environment do
    # :headers => true treats the first row as column names, so each row
    # can be converted to a hash keyed by those headers.
    CSV.foreach('/Users/desktop/application-responses.csv', :headers => true) do |row|
      # Only keep the columns we actually want to import.
      Application.create(row.to_hash.slice(*%w[first_name last_name email phone programming_experience preferred_work_location capstone_idea created_at]))
    end
  end
end
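Running it locally is just a matter of invoking the task:

bundle exec rake acltc_website:import_applicant_data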

This worked flawlessly in development on localhost: all of the applicant data from the CSV was migrated to the correct Postgres columns. I thought I was home free until I tried to push this code to Heroku.

Heroku is not Local

I quickly realized I couldn't implement this strategy because Heroku does not have access to my local environment. Sure, I could have pushed this CSV data to GitHub and grabbed it from there, but that would've exposed the personal data of hundreds of people to the world. What I needed was a place to store the CSV data that was both secure AND accessible to Heroku. Enter Amazon S3.

Amazon S3 Saves the Day

I created a bucket in my Amazon S3 account, uploaded the CSV file, and added gem 'aws-sdk', '~> 2' to my Gemfile. I stored my AWS credentials in my .env file and pushed them to Heroku as config vars (there's a sketch of that setup after the walkthrough below). After a lot of trial and error I was able to get the rake task to work with the following code:

namespace :acltc_website do
  desc 'Import applicant data from CSV to database'
  task :import_applicant_data => :environment do
    # Credentials come from environment variables so they never end up in the repo.
    s3 = Aws::S3::Resource.new(
      region: 'region', # your bucket's AWS region, e.g. 'us-east-1'
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
    # Fetch the CSV object from the bucket; 'key' is the file's name in S3.
    obj = s3.bucket(ENV['S3_BUCKET_TITLE']).object('key').get
    # obj.body holds the raw CSV contents, so parse it rather than using CSV.foreach.
    CSV.parse(obj.body, :headers => true) do |row|
      Application.create(row.to_hash.slice(*%w[first_name last_name email phone programming_experience preferred_work_location capstone_idea created_at]))
    end
  end
end

There is a lot going on in that rake task, so I'll walk you through it. I create a new instance of Aws::S3::Resource and pass in my credentials. Then I set the obj variable to the object I fetch from my S3 bucket by its “key,” which is just Amazon's term for the name of my CSV file within the bucket. The interesting thing is that obj isn't a file sitting on disk, so I can't use CSV.foreach (which takes a file path) to read the data. Instead I call CSV.parse and pass obj.body as the argument, because that's where the raw CSV contents are stored.
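Before that task will run on Heroku, the gem and the config vars it references have to be in place. Here's a minimal sketch of that setup, assuming the environment variable names used in the task above (the values shown are placeholders for your own credentials and bucket name):

# Gemfile
gem 'aws-sdk', '~> 2'

# after bundle install, set the same keys from .env as Heroku config vars
heroku config:set AWS_ACCESS_KEY_ID=your-access-key-id \
                  AWS_SECRET_ACCESS_KEY=your-secret-access-key \
                  S3_BUCKET_TITLE=your-bucket-name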

Now I can run heroku run bundle exec rake acltc_website:import_applicant_data and boom—all of the CSV data is migrated to the Heroku database.