June 6, 2016

Handling non-UTF-8 characters with PostgreSQL

If your PostgreSQL database uses UTF-8 encoding, you can't save text that isn't valid UTF-8 in it. If you try to do so with the pg gem, it raises a PG::CharacterNotInRepertoire exception (PostgreSQL reports an "invalid byte sequence for encoding "UTF8"" error).

And that's exactly what happened in our CSV import system, because users are very good at finding edge cases ;). I saw this error in our exception logging system and tried to find the best way to handle it.

I was hesitant to use String#encode because I thought it would be error-prone. We don't know the file's original encoding, so converting it correctly could get complicated.

Instead, I decided to simply tell users that they have to convert their CSV file to UTF-8.

This is a simplified version of the code:

class CsvImportsController < ApplicationController
  # Turn PostgreSQL's encoding error into a friendly message for the user
  rescue_from PG::CharacterNotInRepertoire, with: :handle_incorrect_encoding

  # ...

  def create
    @csv_import = CsvImport.from_uploaded_io(params[:csv_import][:csv_file])

    if @csv_import.save
      redirect_to csv_imports_path, notice: "CSV file has been successfully uploaded."
    else
      # Show the form again so validation errors can be displayed
      render "new"
    end
  end

  private

  # Called instead of a 500 error when the uploaded file isn't valid UTF-8
  def handle_incorrect_encoding
    redirect_to new_csv_import_path, alert: "Incorrect CSV file encoding. Please save it with UTF-8 encoding and try again."
  end
end
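The controller above relies on PostgreSQL to detect the bad encoding. A possible variation, not what the original code does, is to check the uploaded bytes in Ruby before anything reaches the database. The helper name `utf8_upload?` is hypothetical; in the controller the bytes would presumably come from reading the uploaded file (for example `params[:csv_import][:csv_file].read`).

```ruby
# Hypothetical helper (not part of the original code): true only if the
# uploaded bytes form valid UTF-8, so the app could show its
# "please save as UTF-8" message before calling save.
def utf8_upload?(raw_bytes)
  raw_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Example uploads: the same CSV saved as UTF-8 and as Latin-1 (binary strings)
utf8_file   = "name,city\nJosé,Zürich\n".b
latin1_file = "name,city\nJosé,Zürich\n".encode(Encoding::ISO_8859_1).b

puts utf8_upload?(utf8_file)   # prints "true"
puts utf8_upload?(latin1_file) # prints "false"
```

Keeping the `rescue_from` as a safety net would still make sense, since this check only proves the bytes are valid UTF-8. It can't catch a file in another encoding whose bytes happen to form valid UTF-8, although that is rare for typical accented text.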

This might not be the best solution for users; instead, we could try to handle every encoding on our side. But as I said, that could be error-prone and would take much more time to get right. Also, this feature is used only occasionally and the issue didn't come up very often. So I quickly implemented the solution described above and moved on to something else.

Hey there!

My name is Patrik Bóna and I am the only programmer at Memberful. This blog is kind of dead, but I just started my own Ruby on Rails screencast. Follow me on Twitter if you want to be notified about my newest videos.