
Analyze Your Instagram Comments’ Vibe with Apify and Haystack


Author: Jiri Spilka (Apify)
Idea: Bilge Yücel (deepset.ai)

Ever wondered if your Instagram posts are truly vibrating with your audience? In this cookbook, we’ll show you how to use the Instagram Comments Scraper Actor to download comments from your Instagram post and analyze them using a large language model. Everything runs within the Haystack ecosystem using the apify-haystack integration.

We’ll start by using the Actor to download the comments, clean the data with the DocumentCleaner, and then use the OpenAIGenerator to discover the vibe of the Instagram posts.

Install dependencies

!pip install apify-haystack==0.1.4 haystack-ai

Set up the API keys

You need an Apify account and an APIFY_API_TOKEN.

You also need an OpenAI account and an OPENAI_API_KEY.

import os
from getpass import getpass

os.environ["APIFY_API_TOKEN"] = getpass("Enter YOUR APIFY_API_TOKEN")
os.environ["OPENAI_API_KEY"] = getpass("Enter YOUR OPENAI_API_KEY")
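Before building the pipeline, it can help to fail early if a key was left empty. This is a small standard-library check; it assumes, without confirming it here, that ApifyDatasetFromActorCall and OpenAIGenerator read APIFY_API_TOKEN and OPENAI_API_KEY from the environment. The helper name missing_env_vars is hypothetical.

```python
import os

# The two variables set in the previous cell. Assumption: the Apify and OpenAI
# components pick them up from the environment when no key is passed explicitly.
REQUIRED_ENV_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Please set: {', '.join(missing)}")
else:
    print("All API keys are set.")
```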

Use the Haystack Pipeline to Orchestrate Instagram Comments Scraper, Comments Cleanup, and Analysis Using LLM

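Now, let’s decide which post to analyze. We can start with these two posts, which might reveal some interesting insights:

  • @tiffintech on "How to easily keep up with tech?" (https://www.instagram.com/p/C_a9jcRuJZZ/)
  • @kamalaharris on the Affordable Care Act (https://www.instagram.com/p/C_RgBzogufK/)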

We’ll download the comments using the Instagram Comments Scraper Actor. But first, we need to understand the Actor’s output format.

The output is in the following format:

[
  {
    "text": "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
    "timestamp": "2024-09-02T16:27:09.000Z",
    "ownerUsername": "codingmermaid.ai",
    "ownerProfilePicUrl": "....",
    "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
  },
  {
    "text": "Will check it out🙌",
    "timestamp": "2024-09-02T16:29:28.000Z",
    "ownerUsername": "author.parijat",
    "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
  }
]
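Not every item carries every field: in the sample above, the second comment has no ownerProfilePicUrl. That is why reading fields with dict.get() is safer than indexing. The standard-library check below parses the sample output shown above and lists which keys each comment is missing.

```python
import json

# The sample Actor output from above ("...." is the placeholder used in the original).
sample_output = """
[
  {"text": "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
   "timestamp": "2024-09-02T16:27:09.000Z",
   "ownerUsername": "codingmermaid.ai",
   "ownerProfilePicUrl": "....",
   "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"},
  {"text": "Will check it out🙌",
   "timestamp": "2024-09-02T16:29:28.000Z",
   "ownerUsername": "author.parijat",
   "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"}
]
"""

items = json.loads(sample_output)

# Union of all keys seen in any item, then report what each item lacks
all_keys = set().union(*(item.keys() for item in items))
for item in items:
    missing = sorted(all_keys - set(item))
    print(item["ownerUsername"], "missing:", missing)

# Indexing items[1]["ownerProfilePicUrl"] would raise KeyError; .get() returns None
print(items[1].get("ownerProfilePicUrl"))
```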

We convert each item of this JSON into a Haystack Document using the following dataset_mapping_function:

from haystack import Document

def dataset_mapping_function(dataset_item: dict) -> Document:
    return Document(content=dataset_item.get("text"), meta={"ownerUsername": dataset_item.get("ownerUsername")})
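If you want to keep more of the Actor’s fields, one option is to copy them into meta as well. The sketch below uses only the standard library so it can be tried without Haystack: the hypothetical helper to_content_and_meta returns the (content, meta) pair you would pass to Document(content=..., meta=...). Which extra fields to keep is an illustrative choice, not something the cookbook prescribes.

```python
# Hypothetical helper: builds the arguments for Document(content=..., meta=...)
# without importing Haystack. The extra meta fields are an illustrative choice.
META_FIELDS = ("ownerUsername", "timestamp", "postUrl")


def to_content_and_meta(dataset_item: dict) -> tuple:
    content = dataset_item.get("text")
    # Skip any field the Actor omitted for this particular comment
    meta = {key: dataset_item[key] for key in META_FIELDS if key in dataset_item}
    return content, meta


item = {
    "text": "Will check it out🙌",
    "timestamp": "2024-09-02T16:29:28.000Z",
    "ownerUsername": "author.parijat",
    "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/",
}
content, meta = to_content_and_meta(item)
print(content)
print(meta)
```

In the notebook, a variant of dataset_mapping_function could then return Document(content=content, meta=meta).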

Once we understand the Actor output format and have the dataset_mapping_function, we can set up the Haystack component that connects Haystack and Apify.

First, we need to provide the actor_id and the dataset_mapping_function, along with the input parameters in run_input.

We can define the run_input in several ways:

  • i) when creating the ApifyDatasetFromActorCall class
  • ii) as pipeline inputs passed to pipe.run()
  • iii) as arguments to the run() method when calling ApifyDatasetFromActorCall.run() directly
  • iv) as a combination of i) and ii), as shown in this cookbook.
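Option iv) relies on the constructor’s run_input and the run-time run_input being combined into one Actor input. The sketch below models this as a shallow dictionary merge in which run-time keys override constructor keys. That precedence is an assumption for illustration, not confirmed apify-haystack behavior, and combine_run_input is a hypothetical helper.

```python
from typing import Optional


# Hypothetical model of combining run_input values; not apify-haystack's own code.
def combine_run_input(constructor_input: dict, runtime_input: Optional[dict] = None) -> dict:
    # Shallow merge: keys passed at run time win over constructor keys (assumed precedence)
    return {**constructor_input, **(runtime_input or {})}


constructor_input = {"resultsLimit": 50}  # i) set once, when creating the component
runtime_input = {"directUrls": ["https://www.instagram.com/p/C_a9jcRuJZZ/"]}  # ii) per run

print(combine_run_input(constructor_input, runtime_input))
```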

For a detailed description of the input parameters, visit the Instagram Comments Scraper page.

Let’s set up the ApifyDatasetFromActorCall:

from apify_haystack import ApifyDatasetFromActorCall

document_loader = ApifyDatasetFromActorCall(
    actor_id="apify/instagram-comment-scraper",
    run_input={"resultsLimit": 50},
    dataset_mapping_function=dataset_mapping_function,
)

Next, we’ll define a prompt for the LLM and connect all the components in the Pipeline.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentCleaner

prompt = """
Analyze these Instagram comments to determine if the post is generating positive energy, excitement,
or high engagement. Focus on sentiment, emotional tone, and engagement patterns to conclude if
the post is 'vibrating' with high energy. Be concise.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Analysis:
"""

cleaner = DocumentCleaner(remove_empty_lines=True, remove_extra_whitespaces=True, remove_repeated_substrings=True)
prompt_builder = PromptBuilder(template=prompt)
generator = OpenAIGenerator(model="gpt-4o-mini")


pipe = Pipeline()
pipe.add_component("loader", document_loader)
pipe.add_component("cleaner", cleaner)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("loader", "cleaner")
pipe.connect("cleaner", "prompt_builder")
pipe.connect("prompt_builder", "llm")
🚅 Components
  - loader: ApifyDatasetFromActorCall
  - cleaner: DocumentCleaner
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - loader.documents -> cleaner.documents (list[Document])
  - cleaner.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)
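To get a feel for what the cleaner options do to comment text, here is a rough standard-library approximation of remove_empty_lines and remove_extra_whitespaces. It is a sketch, not Haystack’s implementation, and it leaves out remove_repeated_substrings. The sample comment is made up for illustration.

```python
import re


def remove_empty_lines(text: str) -> str:
    # Drop lines that contain only whitespace
    return "\n".join(line for line in text.split("\n") if line.strip())


def remove_extra_whitespaces(text: str) -> str:
    # Collapse any run of two or more whitespace characters into one space, then trim
    return re.sub(r"\s\s+", " ", text).strip()


comment = "Will check   it out🙌\n\n\n  So useful!  "
print(remove_extra_whitespaces(remove_empty_lines(comment)))
```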

After that, we can run the pipeline. The execution and analysis will take approximately 30-60 seconds.

# @tiffintech on "How to easily keep up with tech?"
url = "https://www.instagram.com/p/C_a9jcRuJZZ/"

res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]
'Overall, the Instagram comments on the post reflect positive energy, excitement, and high engagement. The use of emojis such as 😂, 😍, 🙌, ❤️, and 🔥 indicate enthusiasm and excitement. Many comments express gratitude, appreciation, and eagerness to explore the resources mentioned in the post. There are also interactions between users tagging each other and discussing their interest in the topic, further increasing engagement. Overall, the post seems to be generating high energy and positive vibes from the audience.'
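The expression res.get("llm", {}).get("replies", ["No response"])[0] falls back to "No response" when the llm or replies key is missing, but it would still raise an IndexError if replies were an empty list. A small hypothetical helper (not part of Haystack) also covers that case; it is shown here with mock result dictionaries rather than a real pipeline run.

```python
def first_reply(result: dict, component: str = "llm", default: str = "No response") -> str:
    """Return the first reply produced by `component`, or `default` if there is none."""
    replies = result.get(component, {}).get("replies") or []
    return replies[0] if replies else default


# Mock results standing in for pipe.run(...) output
print(first_reply({"llm": {"replies": ["Positive vibes."]}}))
print(first_reply({"llm": {"replies": []}}))
```

With the real pipeline, call first_reply(res) after res = pipe.run(...).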

Now, let’s run the same analysis, this time on the @kamalaharris post.

# @kamalaharris on the Affordable Care Act
url = "https://www.instagram.com/p/C_RgBzogufK/"

res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]
'The comments on this post are highly polarized, with strong opinions expressed on both sides of the political spectrum. There is a mix of negative and positive sentiment, with some users expressing excitement and support for the current administration (e.g., emojis like 💙💙💙💙, Kamala 👏👏) while others criticize past policies and individuals associated with them (e.g., Trump 2024, lack of education). Overall, the engagement on this post is high, with users actively debating and defending their viewpoints. Despite the divisive nature of the comments, the post is generating a high level of energy and engagement.'
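Since only directUrls changes between the two runs, the calls can also be written as a loop. The sketch below assumes pipe is the pipeline built above. analyze_posts is a hypothetical helper, and FakePipeline is a stand-in with a run method so the snippet runs without calling Apify or OpenAI.

```python
def analyze_posts(pipe, urls):
    """Run the pipeline once per post URL and collect the first LLM reply for each."""
    analyses = {}
    for url in urls:
        res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
        analyses[url] = res.get("llm", {}).get("replies", ["No response"])[0]
    return analyses


class FakePipeline:
    """Stand-in for the real Pipeline: echoes the requested URL instead of calling any API."""

    def run(self, data):
        url = data["loader"]["run_input"]["directUrls"][0]
        return {"llm": {"replies": [f"analysis of {url}"]}}


urls = [
    "https://www.instagram.com/p/C_a9jcRuJZZ/",  # @tiffintech
    "https://www.instagram.com/p/C_RgBzogufK/",  # @kamalaharris
]
for url, analysis in analyze_posts(FakePipeline(), urls).items():
    print(url, "->", analysis)
```

With the real pipeline, analyze_posts(pipe, urls) triggers one Actor run and one LLM call per URL, so expect the runtime to grow with the number of posts.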

The analysis shows that the first post, "How to easily keep up with tech?", is vibrating with high energy:

The Instagram comments reveal a strong level of engagement and positive energy. Emojis like 😍, 😂, ❤️, 🙌, and 🔥 are frequently used, indicating excitement and enthusiasm. Commenters express gratitude, excitement, and appreciation for the content. The tone is overwhelmingly positive, supportive, and encouraging, with many users tagging others to share the content. Overall, this post is generating a vibrant and highly engaged response.

However, the post by @kamalaharris on the Affordable Care Act is (not surprisingly) sparking a lot of controversy with negative comments.

The comments on this post are generating negative energy but with high engagement. There’s a strong focus on political opinions, particularly concerning insurance companies, the Affordable Care Act, Trump, and Biden. Many comments express frustration, criticism, and disagreement, with some users discussing party affiliations or support for specific politicians. There are also mentions of misinformation and conspiracy theories. Engagement is high, with numerous comment threads delving into various political issues. Overall, this post is vibrating with intense energy, driven by political opinions, disagreements, and active discussions.

💡 You might receive slightly different results, as the comments may have changed since the last run.