Fast way to get total count of found items, resources, and sources

natemyersibm
Posts: 3
Joined: Fri Apr 07, 2023 1:24 am

Fast way to get total count of found items, resources, and sources

Post by natemyersibm »

Hello!
In OxygenXML Author 25.0, I'm looking for a fast and simple way to see the total number of search results, together with the number of files and the number of sources they span. I searched the oxygenxml website and forum, and I watched a related video, but I didn't quite see what I'm looking for.

Details:
  • Open/Find Resource gives a count of matching resources, but not an obvious total count of found items when 'In content' is selected.
  • Find/Replace in Files gives a count of found items, but not an obvious total count of matching resources.
  • In Preferences, Open/Find Resource prefs include a mandatory search results limiter. Fortunately, in Open/Find Resource a count still appears of total matching resources despite the limiter, but only the limited number appear in the result set. This limit can be out of proportion with web-based search results that hit the same doc collection.
So my hoped-for functionality is this:
  • One string that gives a total count of found items in a total count of resources. Example: "452 items found in 126 files".
  • An option similar to the above that includes multiple linked sources, along with a details option that could offer a breakdown:
    • Example: 17,362 items found in 887 files across 3 sources (Details)
      • 13,002 items found in 686 files in source MyProject
      • 3,300 items found in 100 files in source OtherTeam'sProject
      • 1,060 items found in 101 files in source OtherDepartment'sProject
  • In Open/Find Resources, a string that shows how many times an item is found in each matching resource.
  • An option (maybe a checkbox) associated with the results limiter in Open/Find Resource prefs to page large found sets if they exceed the limit. Example: "Use multiple pages to display all results". Even better would be to just do this by default. That way, you can keep the Open/Find Resource window performant and responsive by displaying the first X results (where X is a settable integer, as it is today) and then lazy-loading subsequent found items across additional 'pages' (separated lists) within the Open/Find Resource window.
Thank you!
Radu
Posts: 9286
Joined: Fri Jul 09, 2004 5:18 pm

Re: Fast way to get total count of found items, resources, and sources

Post by Radu »

Hi,

When using the 'In content' search from the Open/Find Resource dialog, Oxygen indeed returns the files that contain the searched string, whether it occurs once or multiple times. For each of those resources, Oxygen does not show how many times the searched string occurs inside the file.

Please see some more remarks below:
In Preferences, Open/Find Resource prefs include a mandatory search results limiter. Fortunately, in Open/Find Resource a count still appears of total matching resources despite the limiter, but only the limited number appear in the result set. This limit can be out of proportion with web-based search results that hit the same doc collection.
We impose this limit on the returned search results for performance reasons, and you can indeed control it from the preferences page. I do not understand the "This limit can be out of proportion" part. Searching may behave differently in different places: a web search (depending on the search engine) may, for example, split the results into pages, and it may also limit the total number of found items. We chose not to display the "Open/Find Resource" results in pages and instead to impose this global limit on the returned search results.

About this request:
In Open/Find Resources, a string that shows how many times an item is found in each matching resource.
The indexer we are using does not have this capability; it just returns the resources that contain the searched string one or more times. Indexers generally work like this because they are optimized for speed. For example, when you search for a word on Google, it returns a link to a web page, but it does not tell you how many times the word occurs in that page.
The intended purpose of the "Open/Find Resources" dialog is to find resources containing certain words. You seem to want to create some kind of report using its functionality, but I'm afraid we cannot change the "Open/Find Resources" dialog to also display the number of matches per file.
An option (maybe a checkbox) associated with the results limiter in Open/Find Resource prefs to page large found sets if they exceed the limit. Example: "Use multiple pages to display all results". Even better would be to just do this by default. That way, you can keep the Open/Find Resource window performant and responsive by displaying the first X results (where X is a settable integer, as it is today) and then lazy-loading subsequent found items across additional 'pages' (separated lists) within the Open/Find Resource window.
This sounds like an interesting improvement request. I added an internal issue based on it; the issue ID is pasted below for future reference:
EXM-53082 Open/Find Resources - present results in pages instead of imposing returned results limit
Find/Replace in Files gives a count of found items, but not an obvious total count of matching resources.
Yes, but here you may have more flexibility to create a custom report. For example, after Find/Replace in Files shows the matches in the Results view, you can select all of them, right-click, and choose "Save results as XML". Once you have the XML document containing all matches, you can create an XSLT stylesheet that is applied over the XML to produce a report.

Searching with Find/Replace in Files is of course not identical to searching with Open/Find Resources: "Open/Find Resources" can find resources that contain a set of words even if the words are not adjacent to each other in the original document...

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
chrispitude
Posts: 918
Joined: Thu May 02, 2019 2:32 pm

Re: Fast way to get total count of found items, resources, and sources

Post by chrispitude »

If you save the results as XML as Radu suggested, here is a stylesheet that computes the total number of matches and the files with the most matches:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.oxygenxml.com/ns/report"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <!-- remember matches by their file name -->
  <xsl:key name="matches-by-file" match="incident" use="systemID"/> 

  <xsl:template match="report">
    <results>

      <!-- show total match count -->
      <total-count><xsl:value-of select="count(incident)"/></total-count>

      <!-- show matches by file, sorted by number of matches in file -->
      <count-by-file>
        <xsl:for-each-group select="incident" group-by="systemID">
          <xsl:sort select="count(key('matches-by-file', current-grouping-key()))" order="descending"/>
          <file href="{current-grouping-key()}">
            <count><xsl:value-of select="count(key('matches-by-file', current-grouping-key()))"/></count>
          </file>
        </xsl:for-each-group>
      </count-by-file>
    </results>
  </xsl:template>
  
</xsl:stylesheet>
You could even paste the results XML and the stylesheet above into this site:

.NET XSLT Fiddle

and experiment interactively with different ways of querying the data.
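
As one example of such a variation, here is a minimal sketch that prints the one-line summary asked for in the first post ("452 items found in 126 files"). It assumes the saved results have the same shape the stylesheet above relies on: a report root element with incident children, each carrying a systemID child, all in the http://www.oxygenxml.com/ns/report namespace. I have not checked it against every Oxygen version, so treat it as a starting point rather than a finished report.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.oxygenxml.com/ns/report"
    version="3.0">

  <!-- plain-text output: a single summary line -->
  <xsl:output method="text"/>

  <xsl:template match="/report">
    <!-- every incident element is one found item -->
    <xsl:variable name="item-count" select="count(incident)"/>

    <!-- each distinct systemID value is one matching file -->
    <xsl:variable name="file-count" select="count(distinct-values(incident/systemID))"/>

    <xsl:value-of select="$item-count"/>
    <xsl:text> items found in </xsl:text>
    <xsl:value-of select="$file-count"/>
    <xsl:text> files</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The file count uses distinct-values() on systemID, so a file with many matches is counted only once. A per-source breakdown would need some extra rule for mapping each systemID to a source (for example, by path prefix), which the saved results do not provide directly as far as I can tell.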
natemyersibm
Posts: 3
Joined: Fri Apr 07, 2023 1:24 am

Re: Fast way to get total count of found items, resources, and sources

Post by natemyersibm »

Thanks Radu for the thorough reply, and thanks chrispitude for the sample XML stylesheet!

I used Open/Find Resource as a possible place to display roll-up data because it's available any time as a View, but maybe I can simplify my request by focusing on the Find/Replace in Files function and the Results window. Currently, when no 'Grouped by' option is selected, the Results window shows several column headers: Description - (number of items), Resource, System ID, and Location. The Results window contains all the information needed to calculate and display a roll-up count, so it would be great, especially for find operations across large collections, if the OxygenXML UI could present that roll-up data somewhere rather than requiring an export, stylesheet, or other custom report.

FWIW, it would also be great if the Results window could behave as other Views, and perhaps retain recent find operation tabs.

I'm making this request because the advent of LLMs seems to be quickly changing content development to become much more collaborative and business-oriented, and may give rise to a more DevOps-style culture (think "DocOps") that emphasizes content administration over authorship. In such an environment, instantaneous on-screen real-time roll-up data that can be used to rapidly assess and respond to content needs across large doc collections (e.g., in an online meeting) could become more important, and I'd like to be ready.
Last edited by natemyersibm on Fri May 05, 2023 9:41 pm, edited 1 time in total.