Image Alternative Text using Positron

susannecm
Posts: 114
Joined: Wed Mar 17, 2010 1:04 pm

Image Alternative Text using Positron

Post by susannecm »

I am currently testing AI Positron. One use case I am interested in is adding Alternative Image Texts to DITA documentation.
I know selecting an image and using the respective command is possible. However, I want to select the topics, for example, in the DITA maps manager, and then create alternative texts for all image elements. Is this possible at all? And is there a way to see the prompts you are using?
sorin_carbunaru
Posts: 416
Joined: Mon May 09, 2016 9:37 am

Re: Image Alternative Text using Positron

Post by sorin_carbunaru »

Hello,

As you mentioned, the current action only works on the selection. What you need is a custom action of a different type, one that lets the AI analyze all images in the document, not only the selected one. This type is called "update-entire-document-based-on-images" (available in the content completion window for the "type" property).

Here is an example of such an action:


{
  "id": "generate.img.alternate.text.all",
  "title": "Generate Alternate Text for All Images",
  "categoryId": "Accessibility",
  "type": "update-entire-document-based-on-images",
  "framework": "*DITA*",
  "input-type": "markup",
  "description": "Generate alternate text for all images.",
  "context": "Generate alternate text for all images in the given content and for each of them add the generated text in an <alt> element (not the attribute), which in turn is added as a child of the corresponding <image> element.",
  "parameters": {
    "engine": {"name": "gpt-4o-2024-11-20"}
  }
}
Please note that the instructions (see the "context" property) are not perfect. I wrote them in a minute or so, for demo purposes.

You can then use this action with "AI Positron XML Refactoring" to perform it on a collection of files.

By the way, I also created issue OPA-3058 to perhaps add this functionality (an action that works on all images in a document) by default in Oxygen.

Alternatively, you can use the same instructions from the "context" property in the "AI Positron XML Refactoring" dialog box, in the "Custom instructions" text area. But I personally prefer actions, at least because they are easier to use.

As for the prompts we use for our actions, they are not for sharing :mrgreen:.

All the best wishes,
Sorin Carbunaru
Oxygen XML Editor
susannecm
Posts: 114
Joined: Wed Mar 17, 2010 1:04 pm

Re: Image Alternative Text using Positron

Post by susannecm »

Thank you very much for the instructions, Sorin. They will be very helpful.
As for the prompts, I have a specific reason for the question.
I usually work with German documents but sometimes get an English alternative text from the Assistant. How is the language of the source document taken into account? Does the system perform OCR on the image? What happens if the image contains no text, such as a button with a checkmark?
sorin_carbunaru
Posts: 416
Joined: Mon May 09, 2016 9:37 am

Re: Image Alternative Text using Positron

Post by sorin_carbunaru »

Hi,

The Generate Image Alternate Text action uses the vision capabilities of the underlying AI model (for example GPT-4o or Claude, depending on the Positron distribution and configuration you use).

In the prompt we don't give specific instructions about the language of the generated text, so the AI decides on its own. I have added issue OPA-3064 to specifically ask the AI to respect the language of the document. We'll update this thread once the improvement is implemented.
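
In the meantime, one possible workaround is to state the language explicitly in the "context" of a custom action. Here is a rough sketch that reuses the properties from the example action earlier in this thread. The "id", "title", "description", and the added German sentence in "context" are made up for illustration, and I have not tested how reliably the model follows the instruction:

```json
{
  "id": "generate.img.alternate.text.all.de",
  "title": "Generate German Alternate Text for All Images",
  "categoryId": "Accessibility",
  "type": "update-entire-document-based-on-images",
  "framework": "*DITA*",
  "input-type": "markup",
  "description": "Generate German alternate text for all images.",
  "context": "Generate alternate text for all images in the given content and for each of them add the generated text in an <alt> element (not the attribute), which in turn is added as a child of the corresponding <image> element. Always write the alternate text in German, even if the image itself contains English text.",
  "parameters": {
    "engine": {"name": "gpt-4o-2024-11-20"}
  }
}
```

Only the wording of "context" differs in substance from the earlier example, so the same sentence could also be pasted into the "Custom instructions" text area of the "AI Positron XML Refactoring" dialog box.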

A colleague of mine also pointed out to me that the XML Refactoring tool in Oxygen already has an operation for generating alternate texts for a collection of documents. You can test it by going to the Tools main menu > XML Refactoring > Generate alternate text for images in DITA XML topics. Now, this refactoring operation is not as powerful as the Positron action (it has simpler instructions), but you can use it as an example for a custom operation. In your custom operation you could also give instructions about the language.

The operation is defined by a pair of files called "ai-refactor-image-alt.xml" and "ai-refactor-image-alt.xsl". You can find them at a location such as "C:\Users\[USER]\AppData\Roaming\com.oxygenxml\extensions\v27.1\plugins\oxygen.ai.positron.enterprise.addon\oxygen-ai-positron-enterprise-addon-4.1.0\refactoring" on Windows. On macOS the location starts with "[user_home_directory]/Library/Preferences/com.oxygenxml/...". Read https://www.oxygenxml.com/doc/versions/ ... tions.html to find out how to add a custom XML Refactoring operation.

By the way, you can also read more about the ai:transform-content() function used in the XSL file at https://www.oxygenxml.com/doc/ug-addons ... 1x_jk3_qxb.