Description

This project started as a way to bring art from a closed-source, isolated community like Discord to an open platform like the Fediverse - a fairly unknown, open-source, federated alternative to microblogging social media (like Twitter).
Discord communities tend to form around particular subjects - games, TV shows, celebrities and so on - where fans gather to discuss and create content, particularly art to share with each other. Before Discord became popular, these subjects would be discussed in forums where content could typically be viewed publicly forever. Unfortunately, much content is now walled behind problematic gateways such as CloudFlare, IPv4-only networking, account creation and the whims of a venture-capital-funded startup quickly running out of funding. Discord currently allows an unlimited number of image uploads for free users, within a file-size limit of 25 MB; such an unsustainable business model is surely the first thing to go, with old uploads being purged as soon as it’s no longer viable to investors.

With the above in mind I set out to create the following:

  • A dedicated Fediverse instance for hosting fan-created content of a specific community.
  • A simple and intuitive interface for artists to immediately upload their content as a Fediverse post.
  • A best-effort attempt at implementing take-down and removal controls if users change their mind in future.
  • Automatic AI Generated Alt Text for visually impaired viewers.

The completed project can be found on my Gitea site here.

Python and Asynchronous Code

At this point I was already very familiar with writing Python, having used it in a range of personal and professional projects before, some of them small scripts and others entire automated applications I still use to this day; but one hurdle that had evaded me time and time again was asynchronous coding: a style of coding that sits between traditional line-by-line synchronous execution and the chaos and optimisation hell of threaded execution. Asynchronous code can pause execution while awaiting a result and run other parts of the program in the meantime, which is particularly useful when dealing with multiple network requests. With synchronous code, a list of 10 network requests would execute one after the other; if each request takes 10 seconds, the whole list takes 100 seconds to complete, even though most of that time is spent waiting for a response from a remote server. With asynchronous code, execution moves on to the next request as soon as each one has been started, resulting in a runtime of roughly 11 seconds: 1 second to issue all 10 requests, and 10 seconds awaiting the responses. This method of coding in Python is fundamentally different and can take some time to understand; key concepts such as retrieving results, working with the event loop and knowing where exactly async can be used can easily trip up a Python beginner.
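The difference described above can be seen in a small, self-contained sketch. It is only an illustration: `fake_request` is a made-up stand-in that sleeps instead of touching the network, and the delays are shortened so it finishes quickly.

```python
import asyncio
import time


async def fake_request(delay: float) -> float:
    # Stand-in for a network call: the coroutine yields control while "waiting".
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(count: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(count):
        await fake_request(delay)  # each wait finishes before the next starts
    return time.perf_counter() - start


async def run_concurrently(count: int, delay: float) -> float:
    start = time.perf_counter()
    # gather() schedules every coroutine at once, so the waits overlap.
    await asyncio.gather(*(fake_request(delay) for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(run_sequentially(10, 0.1)):.2f}s")
    print(f"concurrent: {asyncio.run(run_concurrently(10, 0.1)):.2f}s")
```

The sequential run takes roughly the sum of all the delays, while the concurrent run takes roughly as long as a single delay.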

Luckily, with the vast number of blogs, video tutorials, API documentation and the helpful community of discord.py (the API library I was using), I conquered the asyncio library and was well on my way to forming the Discord API interface.

import asyncio

import discord

# DS_GUILD_ID and DS_TOKEN are defined elsewhere in the bot (not shown here).

# The actual Discord bot.
class FediverseBot(discord.Client):
    def __init__(
        self,
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.guild = discord.Object(DS_GUILD_ID)

async def main():

    # Define bot intents.
    intents = discord.Intents.default()
    intents.message_content = True

    # Start bot. start() runs until the bot is closed, so its result is not kept.
    async with FediverseBot(intents=intents) as bot:
        await bot.start(DS_TOKEN)

# BEGIN
asyncio.run(main(), debug=False)

Fediverse API

discord.py (my Discord Python API library of choice) is in active development, has thousands of contributors, and pages of documentation to read. The Fediverse, however, does not have such great Python support. My instance software of choice is Akkoma, a fork of Pleroma, which implements a Mastodon-compatible API (Mastodon makes up the majority of Fediverse instances but is known to be the least featured). I found one person’s Python library for Akkoma, simply called Akkoma.py, made by spla, last committed two years ago, and unfinished. However, the Akkoma API hasn’t changed much in two years and most of the groundwork I needed was already in place, including authentication, creating posts, and uploading images.

Akkoma.py media_post function:

###
# Writing data: Media
###
@api_version("1.0.0", "2.9.1", __DICT_VERSION_MEDIA)
def media_post(self, media_file, mime_type=None, description=None, focus=None):
    """
    Post an image, video or audio file. `media_file` can either be image data or
    a file name. If image data is passed directly, the mime
    type has to be specified manually, otherwise, it is
    determined from the file name. `focus` should be a tuple
    of floats between -1 and 1, giving the x and y coordinates
    of the images focus point for cropping (with the origin being the images
    center).

    Throws a `AkkomaIllegalArgumentError` if the mime type of the
    passed data or file can not be determined properly.

    Returns a `media dict`_. This contains the id that can be used in
    status_post to attach the media file to a toot.
    """
    if mime_type is None and (isinstance(media_file, str) and os.path.isfile(media_file)):
        mime_type = guess_type(media_file)
        media_file = open(media_file, 'rb')
    elif isinstance(media_file, str) and os.path.isfile(media_file):
        media_file = open(media_file, 'rb')

    if mime_type is None:
        raise AkkomaIllegalArgumentError('Could not determine mime type'
                                            ' or data passed directly '
                                            'without mime type.')

    random_suffix = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))
    file_name = "akkomapy_upload_" + str(time.time()) + "_" + str(random_suffix) + mimetypes.guess_extension(mime_type)

    if focus != None:
        focus = str(focus[0]) + "," + str(focus[1])

    media_file_description = (file_name, media_file, mime_type)
    return self.__api_request('POST', '/api/v1/media',
                                files={'file': media_file_description},

In the end, this API was almost fully formed; the only changes I needed to make were fixing a couple of typos in error messages, fixing a bug in mime-type detection and adding a single function to delete posts:

    def status_delete(self, id):
        return self.__api_request('DELETE', f'/api/v1/statuses/{id}')
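For reference, the call that `status_delete` wraps is the Mastodon-compatible `DELETE /api/v1/statuses/{id}` endpoint. The following is a minimal sketch of the same request built with only the standard library, assuming Bearer-token authentication. `build_delete_request`, the instance URL, the token and the post ID are all placeholders for illustration and are not part of Akkoma.py.

```python
import urllib.request


def build_delete_request(instance_url: str, token: str, status_id: str) -> urllib.request.Request:
    # Hypothetical helper: nothing is sent until the request is passed
    # to urllib.request.urlopen().
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/v1/statuses/{status_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_delete_request("https://fedi.example", "hypothetical-token", "12345")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the method and URL before anything is deleted.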

The final API I used in this bot is available here.

Configuration

Whenever I’m working with API keys, or any form of saved credential, adding a configuration file is one of the first things I do, in order to avoid committing my API key accidentally. Configs also make it much easier for other people to use your software and generally make the whole thing look a lot neater and more professional.

I’m a big fan of Python’s configparser library, which uses INI files for configuration. INI is a nice, simple format, making use of [headers] and simple key = value pairs below them.

The bot’s first configuration file section, default values:

# General config options.
[general]
log_file_location = ${data_directory}/FediverseBot.log
# Log level, one of DEBUG, INFO, WARNING, ERROR, CRITICAL
# See table for explanations https://docs.python.org/3/howto/logging.html
log_level = INFO
# Appended to the end of a post description.
post_tail = example tail text
# Stores database, credentials files, emoji exports etc.
data_directory = data
# Where to store the sqlite database.
database_location = ${data_directory}/db.sqlite
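The `${data_directory}` references in that file rely on configparser's extended interpolation, which is not the default. A minimal sketch of loading such a section follows. It assumes the parser is created with `ExtendedInterpolation`, and reads from an inline string rather than the bot's real config file.

```python
from configparser import ConfigParser, ExtendedInterpolation

SAMPLE = """
[general]
log_file_location = ${data_directory}/FediverseBot.log
log_level = INFO
data_directory = data
database_location = ${data_directory}/db.sqlite
"""

# ExtendedInterpolation enables the ${option} syntax; the default
# BasicInterpolation expects %(option)s instead.
config = ConfigParser(interpolation=ExtendedInterpolation())
config.read_string(SAMPLE)

general = config["general"]
print(general["log_file_location"])
print(general["database_location"])
print(general.get("log_level", "INFO"))
```

Because interpolation is resolved when a value is read, `data_directory` can be defined after the options that refer to it.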

Alt Text

One of the standout differences between the Fediverse and other social media is the overwhelming level of support for and engagement with media alt-text, which is generally used to help visually impaired people understand pictures and video without being able to view the contents. Many members are very vocal about this, encouraging newcomers to add alt-text wherever possible and providing guidance and support on how to describe media effectively; some will even vocally refuse to engage with posts that do not provide alt-text.

Knowing how important this is, and wanting to support it, as well as not wanting to find my instance ostracized by certain circles, I set out to find a solution. The easiest and most obvious was to pass through the alt-text from Discord to the Fediverse post:

A function within the bot that uploads media to the Akkoma Instance:

# Uploads a post to the Akkoma instance, using the provided Discord button press interaction.
async def fedi_upload(interaction):
    logger = logging.getLogger('discord')
    first_message = await interaction.channel.fetch_message(interaction.channel.id)
    def convert_emojis(text):
        matches = set(emoji_pattern.findall(text))
        for match in matches:
            text = text.replace(match[0], f":{match[2]}:")
        return text
    media_ids = []
    # Save all attachments to a temporary directory for uploading.
    with TemporaryDirectory(ignore_cleanup_errors=True) as tempdir:
        for media in first_message.attachments:
            sensitive = False
            if media.content_type and media.content_type.split("/")[0] in DS_ACCEPTED_CONTENT_TYPES:
                if media.filename.startswith("SPOILER_"):
                    sensitive = True
                filename = tempdir + "/" + media.filename
                await media.save(filename)
                media_ids.append(akkoma.media_post(filename, description=media.description)["id"])

This solution presented two problems:

  • A lot of artists would not upload art directly as an attachment, but link it from a URL, usually from Twitter. Discord has no support for adding image descriptions inside link embeds.
  • During testing, not a single artist added alt-text to their attachments.

I didn’t want to alienate artists by pushing them to cater to people on a platform they’d never heard of, but I also didn’t want to abandon the people of the Fediverse, so I set out to find a new solution, and a big one had just walked around the corner: GPT-4o. OpenAI had announced its latest model just a week previously, and it now had the ability to see and describe images. While I had previously looked into self-hosting a large language model for image descriptions, most projects I found seemed abandoned, relied on specific outdated Python versions, and most importantly required a GPU to work, something my rented server didn’t have and I didn’t want to pay for.

# Uploads the provided URL to ChatGPT and returns the AI Generated text.
def get_ai_image_description(image_url):
    logger = logging.getLogger('discord')
    logger.info(f"Starting OpenAI Image Description request for URL \"{image_url}\"")
    try:
        response = openai_client.chat.completions.create(
            model = "gpt-4o",
            max_tokens = 300,
            messages = [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": OPENAI_MESSAGE_PROMPT
                        },
                        {
                            "type": "image_url",
                            "image_url": 
                                {
                                    "url": image_url,
                                    "detail": "low"
                                },
                        },
                    ],
                }
            ],
        )
    except Exception as e:
        # Not every exception has a .message attribute, so log the exception itself.
        logger.error(f"OpenAI Image Description request failed for URL \"{image_url}\". Message: \"{e}\".")
        return None

    # If the generation was completed return the response.
    if response.choices[0].finish_reason == "stop":
        return "AI Generated Description: " + response.choices[0].message.content
    else:
        # Otherwise return nothing.
        return None
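Since `get_ai_image_description` is an ordinary blocking function, calling it directly from a coroutine would hold up the event loop until the response arrives. One possible way around that (an assumption here, not necessarily how the bot does it) is `asyncio.to_thread`, sketched below with a hypothetical `slow_describe` stand-in instead of a real API call.

```python
import asyncio
import time


def slow_describe(image_url: str) -> str:
    # Hypothetical stand-in for a blocking call such as an HTTP request.
    time.sleep(0.2)
    return f"description of {image_url}"


async def heartbeat(ticks: list) -> None:
    # Represents other work the event loop should keep servicing.
    for _ in range(3):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    # to_thread() runs the blocking function in a worker thread and returns
    # an awaitable, so heartbeat() keeps running in the meantime.
    description, _ = await asyncio.gather(
        asyncio.to_thread(slow_describe, "https://example.com/cat.png"),
        heartbeat(ticks),
    )
    return description, len(ticks)


result = asyncio.run(main())
print(result)
```

`asyncio.to_thread` is available from Python 3.9 onwards.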

However, using OpenAI presented new problems: one of cost, and a much more pressing one of ethics. The artist community has spent the last year watching their decades of hard work being chewed up by faceless corporations and spat out as sub-par imitations of their love, causing widespread repercussions to their reputation and lifestyle. While OpenAI insists that data sent through their API is not used for training further AIs, I would not be shocked to find an artist that doesn’t trust this. I did not want to be seen as secretly feeding everyone’s drawings back into the OpenAI maw, not without their permission at least. So I set out to implement a novel idea for AI: consent.

A handy button, and a disclaimer of what it does:
Screenshot of a Discord Post, a description of what the Fediverse is, what AI Alt Text is, if you want to upload your media and a link to a Terms of Service and Privacy Policy. Then four buttons, Confirm, Cancel, AI Alt Text with a green tick next to it, Refresh and Remove Credits. Within the post is a photo of my cat Mango, basking in a sunbeam.
Screenshot of a Discord Post, a description of what the Fediverse is, what AI Alt Text is, if you want to upload your media and a link to a Terms of Service and Privacy Policy. Then four buttons, Confirm, Cancel, AI Alt Text with a red x next to it, Refresh and Remove Credits. Within the post is a photo of my cat Mango, basking in a sunbeam.

Clicking the purple button switches it from a green tick to a red X. I experimented with some other icons to represent on and off, hoping to avoid the psychological implications and bias of presenting green as good and red as bad, but ultimately none were clear enough in their intention.

Overall I was very happy to have come to an ideal compromise whereby detailed image descriptions were available to Fediverse users, and artists on Discord would not find their work unwillingly submitted directly to their unfaithful competitor. ChatGPT described my cat here as follows:

AI Generated Description: The image features a calico cat lying on a wooden floor. The cat has a mix of white, black, and orange fur and is resting comfortably in a patch of sunlight streaming through a window. The light highlights the texture of the cat’s fur, making it appear soft and warm. Behind the cat is a wooden cabinet with several drawers, each adorned with a round wooden knob. The overall scene appears calm and serene, capturing a moment of relaxation.

The final hurdle then was cost. OpenAI API usage is measured in tokens, and for images the count depends on the resolution it has to analyse. Luckily OpenAI includes an option to lock the analysis to “low detail”, which instructs the model to process any image at 512x512, keeping usage at 85 tokens, or $0.000425 per image. For my $20 I can process about 47,000 images, far above my current rate of 5 per week. Overall the API’s pricing is currently very competitive and well within my means.
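The arithmetic behind those figures can be checked in a few lines. This sketch uses only the numbers quoted above and counts image tokens alone. The prompt text and the generated description (capped by `max_tokens = 300` in the request) are billed on top, so the real per-image cost will be somewhat higher.

```python
TOKENS_PER_LOW_DETAIL_IMAGE = 85
COST_PER_IMAGE_USD = 0.000425
BUDGET_USD = 20
IMAGES_PER_WEEK = 5

# Implied input price: cost per image divided by tokens per image.
price_per_token = COST_PER_IMAGE_USD / TOKENS_PER_LOW_DETAIL_IMAGE
images_in_budget = int(BUDGET_USD / COST_PER_IMAGE_USD)
years_of_budget = images_in_budget / IMAGES_PER_WEEK / 52

print(f"${price_per_token * 1_000_000:.2f} per million tokens")
print(f"{images_in_budget:,} images")
print(f"about {years_of_budget:.0f} years at the current rate")
```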

My application would not be the first art bot on the Fediverse; many exist and follow specific themes, but most if not all scrape their images from dodgy message boards or the host’s own collection, and none are submitted to by the artists themselves. I was determined to create what I like to call the first ethical art bot, by allowing the artists to choose whether they want to submit their work, allowing them to add their own messages, link to their own social media and, most importantly, delete the post in future if they ever want to.

After some planning and testing, I settled on adding a new command to the bot, /delete_fedi. This command checks the thread it’s executed in to see if a submission exists and, assuming it’s the original author calling the command, sends a delete request to the Fediverse instance. I was concerned, however, about the scenario where a thread might no longer exist, or the artist might no longer be a member of the Discord server, so I added an optional parameter to /delete_fedi which can be either the Fediverse post ID or the URL, and allowed the command to be executed within the bot’s direct messages. I also added an About page to the Fediverse instance, listing additional ways to request a removal, by email or direct message to myself.
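Accepting either a post ID or a URL means the argument has to be normalised before any lookup. A rough sketch of one way to do that is below. `extract_post_id` is a hypothetical helper, and it assumes the post ID is the final path segment of the URL, which may not hold for every instance's URL layout.

```python
from urllib.parse import urlparse


def extract_post_id(argument: str) -> str:
    """Return the post ID from either a bare ID or a post URL (hypothetical helper)."""
    argument = argument.strip()
    parsed = urlparse(argument)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Assumption: the ID is the final non-empty path segment of the URL.
        segments = [part for part in parsed.path.split("/") if part]
        if not segments:
            raise ValueError(f"No post ID found in URL: {argument}")
        return segments[-1]
    return argument


print(extract_post_id("https://fedi.example/notice/AbCdEf123"))
print(extract_post_id("AbCdEf123"))
```

Both calls yield the same ID, so the rest of the command only ever has to deal with one form.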

A screenshot of a discord message reading “Your Fediverse post has been deleted”.

With this implementation I was happy I had achieved my goal of providing a simple and intuitive interface to interact with the post in future.

Database

The features now available within this application required me to implement another new facet of programming that I’d so far been able to avoid - a database. For each post I needed to keep track of four things:

  • The Discord thread the post was submitted from.
  • The Discord user that submitted the post.
  • The ID of the acknowledgement message that would appear on the thread after submission, it contains a link to the Fediverse post.
  • The ID of the Fediverse post itself.

With this information, I can correlate data to authorise users to delete their posts. If someone uses the delete command within a thread, I can look up the thread ID in the database, and compare the command issuer’s user ID with the one just retrieved; I can also delete the acknowledgement message by ID and issue a delete command for the Fediverse post ID. Should someone use the delete command within the bot’s direct messages, I can look up the provided submission ID and check it against the user ID in the same way.

My database of choice was SQLite. With its low complexity, widespread use, good documentation and ability to store more data than I could ever possibly reach, it was an ideal candidate.

Creating the database table happens at startup using the handy IF NOT EXISTS clause:

# Connect to database.
self.db = sqlite3.connect(DB_LOCATION)
# Initialise db cursor.
db_cursor = self.db.cursor()
# Create initial table
db_cursor.execute("CREATE TABLE IF NOT EXISTS fedi_submissions(thread_id, author_id, ack_message_id, submission_id)")
self.db.commit()

At the start of the submission process, a database entry is created to keep track of submissions that haven’t completed yet, with only the thread ID filled in. It was important to track this to prevent people from starting multiple submissions in one thread. In all SQLite executions, I’ve used the parameter substitution provided by the sqlite3 library (the ? placeholders) to avoid the SQL injection exploits that would be created by using standard Python string formatting.

# At this point we're definitely going to return a working embed so we'll add a DB entry.
db_cursor = self.db.cursor()
values = (str(thread.id),)
db_cursor.execute("INSERT INTO fedi_submissions (thread_id) VALUES(?)", values)
self.db.commit()
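The difference between placeholder binding and ordinary string formatting is easy to demonstrate against an in-memory database. This is only an illustration: the table layout mirrors the one above, but the row and the malicious input are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fedi_submissions(thread_id, author_id, ack_message_id, submission_id)")
db.execute("INSERT INTO fedi_submissions (thread_id, author_id) VALUES (?, ?)", ("111", "alice"))

malicious = "nothing' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the query's meaning.
unsafe_sql = f"SELECT * FROM fedi_submissions WHERE thread_id = '{malicious}'"
unsafe_rows = db.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder binds the input as a value, never as SQL.
safe_rows = db.execute(
    "SELECT * FROM fedi_submissions WHERE thread_id = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The formatted query matches every row in the table, while the bound query matches none, because no thread has that literal ID.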

After a confirmed submission, all appropriate data is added to the row:

values = (str(interaction.channel.id), str(interaction.channel.owner_id), str(ack_message.id), str(fedipost.id), str(interaction.channel.id))
interaction.client.db.execute("UPDATE fedi_submissions SET thread_id = ?, author_id = ?, ack_message_id = ?, submission_id = ? WHERE thread_id = ?", values)

During the deletion process, we look up a row by either the Fediverse post ID or the Discord thread ID, depending on whether the command was given an argument.

# Do a Database lookup for an existing submission based on the url provided.
db_cursor = interaction.client.db.cursor()
db_results = (db_cursor.execute("SELECT * FROM fedi_submissions WHERE submission_id == ?", (post_id,))).fetchone()

# If we've reached this point then there are no arguments.
# Do a Database lookup for an existing submission based on the current thread.
db_cursor = interaction.client.db.cursor()
db_results = (db_cursor.execute("SELECT * FROM fedi_submissions WHERE thread_id == ?", (str(interaction.channel_id),))).fetchone()

Later, the row is removed if the post is deleted:

db_cursor.execute("DELETE FROM fedi_submissions WHERE thread_id == ?", (db_results[0],))
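Putting the lookup, the author check and the row removal together, the authorisation flow described earlier might look roughly like the sketch below. `delete_if_author` is a hypothetical function written for this example and is not taken from the bot. It uses an in-memory database with the same table layout and leaves out the Discord and Fediverse calls entirely.

```python
import sqlite3


def delete_if_author(db: sqlite3.Connection, thread_id: str, user_id: str):
    """Delete the row for thread_id if user_id submitted it; return the Fediverse post ID or None."""
    row = db.execute(
        "SELECT author_id, submission_id FROM fedi_submissions WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    if row is None or row[0] != user_id:
        return None  # no submission in this thread, or not the original author
    db.execute("DELETE FROM fedi_submissions WHERE thread_id = ?", (thread_id,))
    db.commit()
    return row[1]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fedi_submissions(thread_id, author_id, ack_message_id, submission_id)")
db.execute("INSERT INTO fedi_submissions VALUES (?, ?, ?, ?)", ("111", "42", "222", "AbC"))

print(delete_if_author(db, "111", "99"))  # someone else: nothing deleted
print(delete_if_author(db, "111", "42"))  # original author: row removed
```

Returning the Fediverse post ID lets the caller go on to issue the delete request to the instance only when the check has passed.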

Overall, using SQLite was straightforward and the hurdle I’d envisioned was just an illusion.