
Friday, June 29. 2018

LDC #91: Find and Replace HTML Script Template

A pretty common task in any document editing environment is running find and replace operations. If a character is repeated throughout a document and you want to replace every instance of it with a different character, running a find and replace is the fastest way to do it. But what happens if you need to run the same operations many times on different documents? You could open Find and Replace and type the information in each time, but it’s often much easier to write a small Legato function that you can run from the Tools menu to perform a common find and replace operation. In previous blog posts, I’ve written similar scripts to replace Wingdings characters with character entities and to replace certain inline tags with other inline tags. For this week’s blog script, I took those previous scripts and made a more generic version that anyone can easily modify to perform different find and replace operations.


You don’t need to know the ins and outs of Legato to modify this script, but we’re going to go over how it works in depth anyway. Modifying the script to perform any find and replace you want is easy, but it’s always a good idea to understand, at least in a general sense, what’s going on. This script iterates over the HTML elements in your document and, for each element it finds, runs a find and replace operation on the content of that element. This means it only looks at the content of block tags such as paragraphs, tables, and divisions. It does not find and replace the tags themselves, so if you want to replace all paragraphs with divisions, this script will not work as written (though it could be modified to do so). If you want to replace some characters with others, or Wingdings characters with character entities, this script is a great starting point, because that only modifies the content of paragraphs that already exist. Let’s take a look at the script, starting with the setup function.


This setup function is like the others we’ve talked about. In this case, though, the Code, MenuText, and Description values in the item array are just placeholders and should be replaced with values that describe what the function actually does. Other than that, this function can be left alone.



                                                                        /****************************************/
int setup() {                                                           /* Called from Application Startup      */
                                                                        /****************************************/
    string              fnScript;                                       /* Us                                   */
    string              item[10];                                       /* Menu Item                            */
    int                 rc;                                             /* Return Code                          */
                                                                        /*                                      */
    item["Code"] = "EXTENSION_REPLACE_EXAMPLE";                         /* Function Code                        */
    item["MenuText"] = "&Replace Example";                              /* Menu Text                            */
    item["Description"] = "<B>Replace Example</B> ";                    /* Description (long)                   */
    item["Description"]+= "\r\rExample of Replace Function";            /*  * description                       */
    fnScript = GetScriptFilename();                                     /* Get the script filename              */
    MenuAddFunction(item);                                              /* add the function to the menu         */
    MenuSetHook(item["Code"], fnScript, "run");                         /* Set the Test Hook                    */
    return ERROR_NONE;                                                  /* Return value (does not matter)       */
    }                                                                   /* end setup                            */


The run function is the main function, called by the menu hook from the Tools menu. Like all the run functions we’ve written so far, it checks the mode parameter to make sure it’s running in preprocess mode, then gets on with executing the script. It starts by getting the active Edit Window with GetActiveEditWindow and checks the window type to make sure it is an HTML Page View window, displaying an error if it isn’t. Once we have an HTML window, we can get the Edit Object and create an SGML parser from that object.



                                                                        /****************************************/
void run(int f_id, string mode){                                        /* call from hook                       */
                                                                        /****************************************/
    string              find, replace;                                  /* segment of text                      */
    int                 replaced;                                       /* number of replaced items             */
    handle              window;                                         /* window handle                        */
    handle              sgml;                                           /* sgml parser                          */
    dword               w_type;                                         /* window type                          */
    handle              edit_obj;                                       /* current object of text               */
                                                                        /*                                      */
    if (mode != "preprocess") {                                         /* check mode                           */
      return;                                                           /* return                               */
      }                                                                 /*                                      */
                                                                        /*                                      */
    window = GetActiveEditWindow();                                     /* get the active edit window           */
    w_type = GetEditWindowType(window);                                 /* get the window type                  */
    w_type &= EDX_TYPE_ID_MASK;                                         /* mask to the type ID                  */
    if (w_type != EDX_TYPE_PSG_PAGE_VIEW){                              /* if it's not page view                */
      MessageBox('x',"This function can only be run on an HTML file."); /* display error                        */
      return;                                                           /* return                               */
      }                                                                 /*                                      */
    edit_obj = GetEditObject(window);                                   /* get current selected edit object     */
    sgml = SGMLCreate(edit_obj);                                        /* create SGML object                   */


Now that our objects are created, we can run our find and replace operations. I marked this section with comments to show where you can change what is found and what it is replaced with. Define a “find” and a “replace” string, then call the find_replace function. The function returns the number of elements it edited, so we add the returned value to our running total. After the two replace operations (or more, if you add them) have run, a message box tells the user how many objects were edited, or that nothing was found to replace.


    

    /* ******************************************* begin edit area **********************************************/
    find = "&nbsp;";                                                    /* set find string                      */
    replace = "&#160;";                                                 /* set replace string                   */
    replaced += find_replace(edit_obj, find, replace, sgml);            /* execute a find / replace             */
                                                                        /*                                      */
    find = "<FONT STYLE=\"font-family: Wingdings\">x</FONT>";           /* set find string                      */
    replace = "&#9746;";                                                /* set replace string                   */
    replaced += find_replace(edit_obj, find, replace, sgml);            /* execute a find / replace             */
                                                                        /*                                      */
    /* ******************************************* end edit area ************************************************/
    if (replaced != 0){                                                 /* if replaced isn't zero               */
      MessageBox('i',"Edited %d objects in the file.",replaced);        /* display message                      */
      }                                                                 /*                                      */
    else{                                                               /* if there is nothing replaced         */
      MessageBox('i',"Found nothing to replace.");                      /* display message                      */
      }                                                                 /*                                      */
    }                                                                   /*                                      */


The find_replace function does most of the work in the script. It takes an edit object handle, the find and replace strings, and a handle to the SGML parser as inputs, and executes the find and replace operation on the file. Note that it returns the number of elements that were edited, not the total number of replacements made.


The first thing the function does is set the SGML parser’s position back to the start of the file, in case something used the parser previously. Then it grabs the first element with SGMLNextElement and enters a while loop, which runs until there are no more elements. If the element contains “<HTML” or “<BODY”, we grab the next element and continue, because we don’t want to process the HTML or BODY element as a whole; we want the actual block elements inside the body.



                                                                        /****************************************/
int find_replace(edit_obj, find, replace, sgml){                        /* execute a find and replace           */
                                                                        /****************************************/
    int                 ix, ex, ey, sx, sy;                             /* counters                             */
    string              contents,segment;                               /* string segment                       */
                                                                        /*                                      */
    SGMLSetPosition(sgml,0,0);                                          /* reset position                       */
    segment = SGMLNextElement(sgml);                                    /* get the next element                 */
    while(segment!=""){                                                 /* while not at the end of the doc      */
      if (FindInString(segment,"<HTML")>(-1)){                          /* if this is an HTML tag               */
        segment = SGMLNextElement(sgml);                                /* get the next element                 */
        continue;                                                       /* go back for next tag                 */
        }                                                               /*                                      */
      if (FindInString(segment,"<BODY")>(-1)){                          /* if this is a BODY tag                */
        segment = SGMLNextElement(sgml);                                /* get the next element                 */
        continue;                                                       /* go back for next tag                 */
        }                                                               /*                                      */


Now we can get the start position of the area to replace. Since we’re only replacing the content, the content begins where the current tag ends, so we grab the end positions of the current SGML tag. Then we use the SGMLFindClosingElement function to advance the parse position to the closing tag and to get the content of the element. If the content contains the string we’re looking for, we get the end position of the content, which is the start position of the closing tag. Then we call ReplaceInString on the content to actually perform the replacement. All that’s left is to use WriteSegment to write out the new content, reset the parser position with SGMLSetPosition, and increment the number of elements we’ve edited.


      
      sx = SGMLGetItemPosEX(sgml);                                      /* get start x                          */
      sy = SGMLGetItemPosEY(sgml);                                      /* get start y                          */
      contents = SGMLFindClosingElement(sgml, SP_FCE_CODE_AS_IS);       /* get the content of the element       */
      if (FindInString(contents, find)>(-1)){                           /* if the target exists in the string   */
        ex = SGMLGetItemPosSX(sgml);                                    /* get end x                            */
        ey = SGMLGetItemPosSY(sgml);                                    /* get end y                            */
        contents = ReplaceInString(contents,find,replace);              /* get new content                      */
        WriteSegment(edit_obj, contents, sx,sy,ex,ey);                  /* write new string out                 */
        SGMLSetPosition(sgml,ex,ey);                                    /* set position                         */
        ix++;                                                           /* increment counter                    */
        }                                                               /*                                      */
      segment = SGMLNextElement(sgml);                                  /* get the next element                 */
      }                                                                 /*                                      */
    return ix;                                                          /* return edited count                  */
    }                                                                   /*                                      */


Here's a complete copy of our script file:



//
//
//      GoFiler Legato Script - Find Replace
//      ------------------------------------------
//
//      Rev     06/29/2018

    void                run                             (int f_id, string mode);
    int                 find_replace                    (handle edit_obj,
                                                         string find,
                                                         string replace,
                                                         handle sgml);

                                                                        /****************************************/
int setup() {                                                           /* Called from Application Startup      */
                                                                        /****************************************/
    string              fnScript;                                       /* Us                                   */
    string              item[10];                                       /* Menu Item                            */
    int                 rc;                                             /* Return Code                          */
                                                                        /*                                      */
    item["Code"] = "EXTENSION_REPLACE_EXAMPLE";                         /* Function Code                        */
    item["MenuText"] = "&Replace Example";                              /* Menu Text                            */
    item["Description"] = "<B>Replace Example</B> ";                    /* Description (long)                   */
    item["Description"]+= "\r\rExample of Replace Function";            /*  * description                       */
    fnScript = GetScriptFilename();                                     /* Get the script filename              */
    MenuAddFunction(item);                                              /* add the function to the menu         */
    MenuSetHook(item["Code"], fnScript, "run");                         /* Set the Test Hook                    */
    return ERROR_NONE;                                                  /* Return value (does not matter)       */
    }                                                                   /* end setup                            */
                                                                        /****************************************/
void run(int f_id, string mode){                                        /* call from hook                       */
                                                                        /****************************************/
    string              find, replace;                                  /* segment of text                      */
    int                 replaced;                                       /* number of replaced items             */
    handle              window;                                         /* window handle                        */
    handle              sgml;                                           /* sgml parser                          */
    dword               w_type;                                         /* window type                          */
    handle              edit_obj;                                       /* current object of text               */
                                                                        /*                                      */
    if (mode != "preprocess") {                                         /* check mode                           */
      return;                                                           /* return                               */
      }                                                                 /*                                      */
                                                                        /*                                      */
    window = GetActiveEditWindow();                                     /* get the active edit window           */
    w_type = GetEditWindowType(window);                                 /* get the window type                  */
    w_type &= EDX_TYPE_ID_MASK;                                         /* mask to the type ID                  */
    if (w_type != EDX_TYPE_PSG_PAGE_VIEW){                              /* if it's not page view                */
      MessageBox('x',"This function can only be run on an HTML file."); /* display error                        */
      return;                                                           /* return                               */
      }                                                                 /*                                      */
    edit_obj = GetEditObject(window);                                   /* get current selected edit object     */
    sgml = SGMLCreate(edit_obj);                                        /* create SGML object                   */
    /* ******************************************* begin edit area **********************************************/
    find = "&nbsp;";                                                    /* set find string                      */
    replace = "&#160;";                                                 /* set replace string                   */
    replaced += find_replace(edit_obj, find, replace, sgml);            /* execute a find / replace             */
                                                                        /*                                      */
    find = "<FONT STYLE=\"font-family: Wingdings\">x</FONT>";           /* set find string                      */
    replace = "&#9746;";                                                /* set replace string                   */
    replaced += find_replace(edit_obj, find, replace, sgml);            /* execute a find / replace             */
                                                                        /*                                      */
    /* ******************************************* end edit area ************************************************/
    if (replaced != 0){                                                 /* if replaced isn't zero               */
      MessageBox('i',"Edited %d objects in the file.",replaced);        /* display message                      */
      }                                                                 /*                                      */
    else{                                                               /* if there is nothing replaced         */
      MessageBox('i',"Found nothing to replace.");                      /* display message                      */
      }                                                                 /*                                      */
    }                                                                   /*                                      */
                                                                        /****************************************/
int find_replace(edit_obj, find, replace, sgml){                        /* execute a find and replace           */
                                                                        /****************************************/
    int                 ix, ex, ey, sx, sy;                             /* counters                             */
    string              contents,segment;                               /* string segment                       */
                                                                        /*                                      */
    SGMLSetPosition(sgml,0,0);                                          /* reset position                       */
    segment = SGMLNextElement(sgml);                                    /* get the next element                 */
    while(segment!=""){                                                 /* while not at the end of the doc      */
      if (FindInString(segment,"<HTML")>(-1)){                          /* if this is an HTML tag               */
        segment = SGMLNextElement(sgml);                                /* get the next element                 */
        continue;                                                       /* go back for next tag                 */
        }                                                               /*                                      */
      if (FindInString(segment,"<BODY")>(-1)){                          /* if this is a BODY tag                */
        segment = SGMLNextElement(sgml);                                /* get the next element                 */
        continue;                                                       /* go back for next tag                 */
        }                                                               /*                                      */
      sx = SGMLGetItemPosEX(sgml);                                      /* get start x                          */
      sy = SGMLGetItemPosEY(sgml);                                      /* get start y                          */
      contents = SGMLFindClosingElement(sgml, SP_FCE_CODE_AS_IS);       /* get the content of the element       */
      if (FindInString(contents, find)>(-1)){                           /* if the target exists in the string   */
        ex = SGMLGetItemPosSX(sgml);                                    /* get end x                            */
        ey = SGMLGetItemPosSY(sgml);                                    /* get end y                            */
        contents = ReplaceInString(contents,find,replace);              /* get new content                      */
        WriteSegment(edit_obj, contents, sx,sy,ex,ey);                  /* write new string out                 */
        SGMLSetPosition(sgml,ex,ey);                                    /* set position                         */
        ix++;                                                           /* increment counter                    */
        }                                                               /*                                      */
      segment = SGMLNextElement(sgml);                                  /* get the next element                 */
      }                                                                 /*                                      */
    return ix;                                                          /* return edited count                  */
    }                                                                   /*                                      */
                                                                        /****************************************/
int main(){                                                             /* main method                          */
                                                                        /****************************************/
    setup();                                                            /* run the setup                        */
    return ERROR_NONE;                                                  /* return                               */
    }                                                                   /*                                      */


This script is meant to be a template. From here, you can modify the file to add any find and replace operations that are a normal part of your editing process, so your users can simply run a function instead of performing find and replace operations manually in Code View. The script itself could also be modified further to run find and replace operations on the block tags themselves instead of just their content, but this is a good starting point.


 


Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Science in Software Engineering at RIT and MCC.

Additional Resources

Novaworks’ Legato Resources

Legato Script Developers LinkedIn Group

Primer: An Introduction to Legato 



Posted by
Steven Horowitz
in Development at 14:49