Experimenting with inherited tables in Ecto

I’ve been working on a personal project with some mates for the past few months, and I’ve been mostly responsible for building our backend with Phoenix.

It’s come to a time where we need to implement some sort of "comments & likes" functionality for the app, so I set out to start thinknig how to go about it.

Note: this is just me experimenting, not trying to say that this is the right way to do it. I’m still learning about these things, so if you think I’m missing something, by all means, let me know at oscar@swanros.com

Note 2: the example code for this post is on GitHub.

When I started digging around, most of the answers I found about how to implement such functionality pointed to using table inheritance so that I could have a ActionableEntities table, and EntityComments and EntityLikes tables. Then, Posts, Photos, Events whould inherit from ActionableEntities gaining the ability of being commented or ‘liked’.

"Simple enough," I said to myself. Then I started digging. This is what I found.

Implementing table inheritance with Ecto

I couldn’t find concrete examples of how to do this, but did read the official Ecto documentation, though, and found that you can pass an :options parameter to specify extra attributes that you want your table to have, such as WITH, INHERITS or ON COMMIT. So, the migration looks like this:

def change do
    create table(:actionable_entities) do
        timestamps()
    end
    
    create table(:entity_comments) do
        add :content, :string
        add :entity_id, references(:actionable_entities)
        
        timestamps()
    end
    
    create table(:posts, options: "INHERITS (actionable_entities)") do
        add :content, :string
    end
    
    # photos and all other tables follow the same structure as the posts one.
end

Now the modules that’re going to be using each of these tables are defined as follows:

defmodule Inh.ActionableEntity do
  use Ecto.Schema

  schema "interactive_entities" do
    has_many :comments, Inh.EntityComments.Comment

    timestamps()
  end
end

defmodule Inh.EntityComments.Comment do
  use Ecto.Schema

  schema "entity_comments" do
    field :content, :string
    belongs_to :entity, Inh.ActionableEntity

    timestamps()
  end
end

defmodule Inh.Posts.Post do
  use Ecto.Schema

  schema "posts" do
    field :content, :string

    timestamps()
  end
end

"This ought to work right here," I thought to myself. And it does for the most part: I can create a post, and get a list of posts. I can even query for a Post‘s comments eventhough there are none on the database.

However, the following error on the database arises when trying to create a comment for a given post:

ERROR: insert or update on table "entity_comments" violates foreign key constraint "entity_comments_actionable_entity_id_fkey" DETAIL: Key (entity_id)=(1) is not present in table "actionable_entities".

The code used for trying to insert a new comment looks something along the lines of

def create_for_post(id, c_params) when is_integer(id) do
    Repo.get(ActionableEntity, id)
    |> Ecto.build_assoc(:comments, c_params)
    |> Repo.insert()
end

First get the post I want to comment, build the corresponding association and then insert it into the database. Here’s the problem, though:

indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children.

So, if I add a record to posts or to photos, the information from the inherited table bubbles up to teh parent table. But then, technically, the information is a post, not an actionable_entity. 🤔

Postgres is right to be telling me that there’s no ActionableEntity with the given id.

I could get more of a sense that something is really wrong here by creating a new photos table that too inherits from actionable_entity and adding some records to it:

At this point, there’s no data consistency as the database can’t distinguish between Photos and Posts.

One way to solve this is to create a trigger on the database to check every time we try to inset a new EntityComment for a Post, that the Post indeed exists on the database. To do this, the foreign key constraint needs to be removed from the database. So first, update the entity_comments migration:

create table(:entity_comments) do
  add :content, :string
  add :entity_id, :integer

  timestamps()
end

entity_id is now just a simple :integer, there’s not an explicit reference to the actionable_entities table.

Then, create the trigger:

execute """
  CREATE OR REPLACE FUNCTION internal_post_check() RETURNS TRIGGER AS $$
  BEGIN
    IF NOT EXISTS(SELECT 1 FROM posts WHERE id = new.entity_id) THEN
      RAISE EXCEPTION 'Post does not exist:  %', new.entity_id;
    END IF;
    RETURN new;
  END;
  $$ language plpgsql;
"""

execute """
  CREATE TRIGGER CheckEntityExists BEFORE INSERT OR UPDATE ON entity_comments
  FOR EACH ROW EXECUTE PROCEDURE internal_post_check();
"""

This trigger will run every time a new record wants to be inserted on the entity_comments table and will manually check if a post with the value on entity_id as id exists.

It may seem that the problem is now solved, but then again, what would happen when I have a Post and want to retrieve its comments? If I have Posts and Photos and potentially N number of "interactive entities" on my database, how would I be able to query just for those?

I solved this by adding a new type:String column on the actionable_entity table. Posts would have the value post in that column, photos would have the value photo, and so on for every type of actionable entity I eventually add to the database.

This way, now I can query for the comments of a specific photo with a given id:

def comments_for_post(id) when is_integer(id) do
    q = from entity in ActionableEntity,
        where: entity.id == ^id and entity.type == ^"post",
        left_join: comments in assoc(entity, :comments),
        preload: [comments: comments]
        
    post = Repo.one(q)
    post.comments
end

Final notes

Although this works it does require to bypass the database’s integrity checks to handle those on my own. This is very error prone and would require me to constantly run tests to verify that I’m not missing addig a new trigger for photos, events, or any other "actionable entity" that I want to add to my system.

Also, if I decide to add another kind of action to these entities, such as "likes" or "claps," I’d have add another set of checks for those too.

This is very soon becoming a maintainability nightmare.

I asked friend whose really experienced with backend development and he just sent me this link to a post called Three Reasons Why You Shouldn’t Use Single Table Inheritance. Read it.

In the end, this was just me trying to implement a solution that I thought would make for a good one, but it seems that the compromises that need to be made here are not worth it.

What I went with was the simplest approach: add to the entity_comments table the columns of the entities that I want to enable comments for:

id | content | post_id | photo_id | event_id | inserted_at | updated_at
-----------------------------------------------------------------------
1    Hey!      1                               -----         -----
2    nice!               1                     -----         -----
3    Good!     4                               -----         -----

As stated in schema.ex#L806:

Unless you have dozens of columns, this is simpler for the developer,
more DB friendly and more efficient in all aspects.

So there’s that! Thanks for reading.

Filtering has_many relationships in Ecto

This is sot-of a follow up to my last post, about self-referencing many_to_many relationships using Ecto.

I find myself in a scenario where I have a User:

# User model
defmodule MyApp.User do
  use MyApp.Web, :model

  alias MyApp.Contact

  schema "users" do
    has_many :_internal_contacts, MyApp.Contact
    has_many :contacts, through: [:_internal_contacts, :contact]

    timestamps
  end
end

and an association model Contact through which the User has many contacts (User instances):

defmodule MyApp.Contact do
  use MyApp.Web, :model

  alias MyApp.User

  schema "contacts" do
    field :status, :integer

    belongs_to :user, User, foreign_key: :user_id
    belongs_to :contact, User, foreign_key: :contact_id

    timestamps
  end

  def status_code(status) do
    case status do
      :accepted ->
        1
      :pending ->
        0
      :rejected ->
        -1
      _ ->
        0
    end
  end

I created an endpoint on my app that’s supposed to get me only the user’s contacts that have the :accepted status (1). How can this be accomplished?

The Repo module has a set of handy functions that let you preload associations on your models.

user = User |> Repo.get(1) |> Repo.preload(:contacts)

The above will preload all contacts on the user. Pay attention to the fact that it is of type has_many and it goes through the :_internal_contacts property. This is the important cue.

Turns out that the preloads parameter on Repo.preload/3 can be accompanied by a query.

import Ecto.Query

#1
query = from c in Contact, where: c.status == 1 and c.user_id == 1

u = User
    |> Repo.get(1) #2
    |> Repo.preload(_internal_contacts: query) #3
    |> Repo.preload(:contacts) #4

The code above will:

  1. Create a query that will ask for Contacts with status 1 and with user_id 1.
  2. Get a User instance.
  3. Preload the _internal_contacts on that instace using our query (these are Contact instances, not User instances)
  4. Preload the :contacts on our user. And since :contacts goes through: _internal_contacts, :contacts has only valid contacts now (User instances).

Notes:

Honestly, I don’t know if this is the right approach here (let me know!). I did find this thread on GitHub where the general issue that we’re facing here is described. It seems that there’s an interest to have the ability to filter the relationships on the declaration itself:

It would look something like this:

has_many :comments, MyApp.Comment, foreign_key: :commentable_id,  fn (query) -> 
  from c in query, where: c.commentable_type == "articles"
end

However, right now, the saner approach seems to be the use of composable queries.

Have anything to add to this article or did I miss something? Please let me know on Twitter.