A guide to working with external data
- Last Modified: 23 May 2019
- User Level: Power User
Content management isn't just limited to the content in TERMINALFOUR. You may have databases, XML or CSV files that contain data you would like to publish as content on your site.
Since there's more than one kind of external data, there's more than one way of integrating third-party content in TERMINALFOUR. The purpose of this article is to help you choose the one that works best for you.
Below we'll look at the different options available and list the pros and cons of each. Before you choose an integration method, though, there are six factors to consider:
Content can come in a variety of formats – whether it's directly from the database or through an interchange format like JSON.
Here are a few examples of formats, with the supported integration methods:
- CSV: External Content Syncer
- Excel: Excel to HTML T4 Tag
- XML (including web services that return XML): Data Object for RSS, or External Content Syncer
- Database: Data Object for Database, or External Content Syncer
- HTTP/HTTPS/HTML: Import URL, Web Object, or Packages
- API or Web Service (e.g., JSON): Custom code within the content or Content Layouts
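The list above can be read as a simple lookup table. The Python sketch below is illustrative only: the dictionary and the `methods_for` helper are hypothetical names (not part of TERMINALFOUR), and the keys are shortened labels for the formats listed.

```python
# Format-to-method mapping copied from the list above.
# The structure and names are illustrative, not a TERMINALFOUR API.
INTEGRATION_METHODS = {
    "csv": ["External Content Syncer"],
    "excel": ["Excel to HTML T4 Tag"],
    "xml": ["Data Object for RSS", "External Content Syncer"],
    "database": ["Data Object for Database", "External Content Syncer"],
    "html": ["Import URL", "Web Object", "Packages"],
    "json": ["Custom code within the content or Content Layouts"],
}


def methods_for(source_format):
    """Return the integration methods listed for a format (case-insensitive)."""
    return INTEGRATION_METHODS.get(source_format.strip().lower(), [])


print(methods_for("CSV"))
```

An unknown format returns an empty list, which is a reasonable cue to consider the custom code route or to ask for guidance.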
If you need real-time content updates, it's best to use either web services or server-side scripting to interact with the API of the external system or wrap the external application with the website header and footer.
Update content on website publish
If the external content update schedule is similar to, or the same as, the website publish schedule, then consider the Import URL, Web Object, or Data Object.
All of these retrieve the content at the time of publishing and put the content in the published page.
Bear in mind that retrieving large amounts of content at publish can negatively impact publish times. You can configure the Web Object to cache the output and only retrieve updated content less frequently.
Daily or twice daily updates
If the content is only updated once or twice a day, the External Content Syncer can be scheduled to run at specified intervals to update the content within TERMINALFOUR. In this case, the sync interval is independent of the publish schedule.
For one-off content imports you can use:
- Packages to bulk upload Media Items or import content as HTML
- The External Content Syncer, configured to import content just once
The volume of content may affect the terms of your TERMINALFOUR license and can also affect publish times.
The External Content Syncer will create a Content Item for each record within the source database/XML/CSV. If your TERMINALFOUR system license is based on the number of Content Items within the system, this can have implications for your license. If this is a concern, check the Content tab on the About page, or if you're still not sure whether it will be an issue for you, contact your Account Manager to check.
The Import URL will only include one page/URL per Content Item, requiring a content item for each page/URL to be published. In comparison, a Web Object can be configured to follow links within the target URL (that match a particular pattern or depth), allowing multiple pages/URLs to be published for one content item.
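To make the idea of following links by pattern and depth concrete, here is a minimal Python sketch. It is a conceptual model, not how the Web Object is implemented: the in-memory `PAGES` dictionary, the `collect_pages` function, and its parameters are all hypothetical.

```python
import re

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
PAGES = {
    "/courses": ("Course listing", ["/courses/maths", "/courses/physics", "/about"]),
    "/courses/maths": ("Maths", ["/courses/maths/modules"]),
    "/courses/physics": ("Physics", []),
    "/courses/maths/modules": ("Maths modules", []),
    "/about": ("About us", []),
}


def collect_pages(start, pattern, max_depth):
    """Walk outwards from start, following only links that match pattern,
    for at most max_depth hops. Returns the URLs in the order found."""
    link_re = re.compile(pattern)
    seen = [start]
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for link in PAGES[url][1]:
                if link in PAGES and link not in seen and link_re.match(link):
                    seen.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


print(collect_pages("/courses", r"^/courses/", 1))
```

With a depth of 1, one starting URL yields the listing plus the two course pages; raising the depth to 2 also picks up the modules page, while `/about` is never followed because it does not match the pattern.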
If using the Import URL, Web Object or Data Object, the content is retrieved at the time of publishing and put on the published page. This can affect publish performance for large volumes of content, although the Web Object can be configured to cache its output so that updated content is retrieved less often than the site publishes.
Consider how and where you intend the external content to appear on your website.
Web Objects and Import URL tags retrieve the raw HTML of the target URLs and incorporate it into the published page. This is treated as one block of HTML, so the content within it cannot be seen by Navigation Objects and cannot be re-used elsewhere on the site. For the same reason, the content cannot be published in any other formats (e.g. XML or JSON). The same applies to integrations that use custom code or wrap the application in the website header and footer.
External Content Syncer
The External Content Syncer will create a Content Item for each record within the source database/XML/CSV. This content is published using Content Layouts, so you have full control over the format and structure. You can also display the data on other parts of the site by retrieving it through Navigation Objects. Since the content is in TERMINALFOUR, it can be published in multiple formats, if it is to be re-used by other applications on the site. This would often be the case for Course content (for a course search) or Event content (for an Event Calendar).
Consider the nature of the content being displayed, and the impact on operations if that data were to become unavailable on your site due to routine system maintenance, or due to unplanned outages.
Low impact solution
When the External Content Syncer attempts to sync with an unavailable data source, the most recent version of the content remains in TERMINALFOUR and continues to publish on the website. As a result, the only impact on your site is out-of-date content displaying until the next successful scheduled or manual sync. Once the source database/XML/CSV becomes available again, an Administrator can perform a manual sync to refresh the content in TERMINALFOUR.
In comparison, if the source data is unavailable when the Import URL, Web Object, or Data Object connects to refresh the data, the object will not retrieve the data and will not generate any output. Unlike the External Content Syncer, these objects cannot fall back on a previous version of the content when the source is unavailable.
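The difference between the two behaviours can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not TERMINALFOUR's implementation: `SourceUnavailable`, `sync`, and `publish_time_fetch` are hypothetical names, and records are plain dictionaries.

```python
class SourceUnavailable(Exception):
    """Raised by a fetcher when the external source cannot be reached."""


def sync(stored, fetch):
    """Syncer-style update: replace the stored records on success,
    keep the last synced copy if the source is down."""
    try:
        return fetch()
    except SourceUnavailable:
        return stored


def publish_time_fetch(fetch):
    """Publish-time retrieval: there is no stored copy to fall back on,
    so a failure produces no output."""
    try:
        return fetch()
    except SourceUnavailable:
        return []


def offline():
    raise SourceUnavailable("source is down")


stored = [{"id": 1, "name": "Ada"}]
print(sync(stored, offline))        # previous content is kept
print(publish_time_fetch(offline))  # nothing to publish
```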
If custom code is used, or the application is wrapped in the website header and footer and linked to from your website, visitors will most likely see an error message or a blank page when the application or data becomes unavailable. Custom code should be written with error handling for this case to limit the impact on the visitor.
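As an illustration of that kind of error handling, the sketch below shows a fallback message in place of a blank page. It is written in Python purely for illustration (the same idea applies in any server-side language); `render_block` and the `fetch` callable are hypothetical, with `fetch` standing in for whatever call retrieves the external data.

```python
import html


def render_block(fetch, fallback="<p>This information is temporarily unavailable.</p>"):
    """Render externally sourced text, or a fallback message if retrieval fails."""
    try:
        text = fetch()
    except OSError:
        # Network failures and timeouts are OSError subclasses in Python.
        return fallback
    # Escape the text so external data cannot inject markup into the page.
    return f"<p>{html.escape(text)}</p>"


def broken_fetch():
    raise OSError("connection refused")


print(render_block(lambda: "Fees & <dates>"))
print(render_block(broken_fetch))
```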
No connection to source data
If using the External Content Syncer, Import URL, Web Object or Data Object, the content that is retrieved is embedded within the HTML of the published page(s). In this case, the end website does not contain any link back into the source data. This helps to maintain data security since website visitors cannot use that connection to hack into the source data to retrieve additional sensitive data or to update the data.
Consider security when using custom code
Where some custom code is written or the application is wrapped within the website, the application (or parts of the application) will need to be made available externally (or publicly) to allow the data to be retrieved or displayed. This may require the IT team that manages the data source to open firewalls to make this possible, which may raise security concerns. It is essential that any code that is written to interact with these applications is written securely to prevent malicious attacks on the data.
The following options are the most common ways that we integrate with other sources of data or external systems to display that content on the web.
The External Content Syncer is used to import and update content from an external data source like a database, XML file, or even a CSV file. When content is synced, each row in the data source is added to TERMINALFOUR as a Content Item, which can be published on the site and displayed via Navigation Objects, as with any standard Content Item. Syncing can ensure that the link between the external data source and the synced content persists. This means that updates made to content in the external data source are reflected in the synced content in TERMINALFOUR.
For instance, if your staff profile information is stored in an external database, that content can be synced with TERMINALFOUR so it can be published on your website. The external database may not include photos, but these can be managed directly within TERMINALFOUR.
Content edits are synced in one direction only, from the data source to TERMINALFOUR. Synced content can be edited in TERMINALFOUR, but those edits will not update content back in the data source, and they will be overwritten by changes to content in the data source when the next sync occurs.
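A rough Python model of this one-way behaviour is shown below. It is a sketch, not the Syncer's actual logic: `apply_sync` and the field names are hypothetical, and it assumes that fields not mapped to the data source (such as a photo managed in TERMINALFOUR) are left untouched, in line with the staff profile example.

```python
def apply_sync(item, source_row, synced_fields):
    """Return a copy of item with every synced field overwritten from the source row.
    The source row itself is never modified (the sync is one-way)."""
    updated = dict(item)
    for field in synced_fields:
        updated[field] = source_row[field]
    return updated


# A Content Item whose name was edited locally and that has a locally managed photo.
item = {"name": "A. Lovelace (edited)", "email": "ada@example.org", "photo": "ada.jpg"}
row = {"name": "Ada Lovelace", "email": "ada@example.org"}

result = apply_sync(item, row, ["name", "email"])
print(result)
```

The local edit to `name` is replaced by the source value, while `photo` survives because it is not one of the synced fields.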
| Pros | Cons |
| --- | --- |
| No impact on publish performance, apart from the additional publish time for the extra content. | If there is a large amount of data, and your TERMINALFOUR license is based on the number of content items, this can impact on the license. |
| External data is like standard TERMINALFOUR content that can be displayed elsewhere on the site, using Navigation Objects. | Out of the box, each Content Syncer that is configured either imports all content into one Section (which is not ideal for large amounts of content), or creates a flat structure, with one Section for each Content Item, which can create a large number of Sections, all with the same parent Section. For any other structure, a custom Site Structure Creator plugin is required. |
| The format of the published content is entirely customizable through standard Content Layouts (or Programmable Layouts) and can be published in multiple formats, if required. | |
| The schedule that runs to update content is entirely configurable and is independent of any other scheduled tasks in TERMINALFOUR (e.g. publish). | |
| If the external data source is unavailable at the time of the sync, the content items within TERMINALFOUR are left as-is and will continue to publish to the site. They will be updated and refreshed when the content is next set to sync (or by performing a manual sync). | |
| It is possible to force a manual update of the content if required. | |
| There is no connection from the end published website to the external data, ensuring the security of the source data. | |
The Import URL T4 Tag allows content to be taken from a web page (HTTP or HTTPS) and added to the HTML of the published page. Each time the site publishes, the content is taken from the HTTP or HTTPS location and embedded within the HTML of the published page. The Import URL tag is added to a Content Layout and then the content is added to the site using the Content Type.
| Pros | Cons |
| --- | --- |
| A straightforward way to include an external page within a site. | Each Content Item will only include that one page/URL, requiring a Content Item for each page/URL to be published. |
| Content on the published site is refreshed regularly, with each site publish. | The entire HTML page of the target URL is included in the page, including the |
| It is possible to force a manual update of the content, if required, by running a publish. | Possible publish performance impact for large amounts of data since the URL is contacted and content retrieved on every publish. |
| There is no connection from the published website to the external data, ensuring the security of the source data. | The HTML for the content on the published page is controlled by the HTML of the target source/system. Any changes that are required to the format of the content need to be made within the target source/system and cannot be controlled through TERMINALFOUR. |
| | If the external URL is not available when the broker is called at publish time, the published page will display blank or empty (it cannot use a previous version of the external page). |
A Web Object T4 Tag is a broker similar to the Import URL Broker. However, the Web Object can be configured to follow links on the target URL to the linked web pages, and to cache the external data so that it uses the cache rather than retrieving the data on every publish. A Web Object T4 Tag is added to a Content Layout and then content is added to the site using the Content Type.
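The caching idea can be pictured as a time-based cache placed in front of the retrieval step. The Python sketch below is a conceptual model only: `CachedFetcher`, `max_age_seconds`, and the injectable `clock` are hypothetical and do not correspond to actual Web Object settings.

```python
import time


class CachedFetcher:
    """Re-fetch only when the cached copy is older than max_age_seconds."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


calls = []


def fetch_page():
    calls.append(1)
    return f"<p>fetch number {len(calls)}</p>"


now = [0.0]  # fake clock so the example does not need to wait
cached = CachedFetcher(fetch_page, max_age_seconds=3600, clock=lambda: now[0])
first = cached.get()   # first publish: content is retrieved
second = cached.get()  # next publish within the hour: served from the cache
now[0] = 3600.0
third = cached.get()   # an hour later: content is retrieved again
print(first, second, third, sep="\n")
```

The trade-off is the one described in this article: publishes that hit the cache are faster, but the published content can lag behind the source until the cache expires.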
| Pros | Cons |
| --- | --- |
| Content on the published site is refreshed regularly and can be configured to cache the content, preventing a refresh with every site publish. | Possible publish performance implications for large amounts of data when the Web Object updates the content (instead of using the cache). |
| Can be configured to follow links on the target page that match a specific pattern to include associated linked pages. A single content item pointing at a listing could, therefore, generate multiple pages on the published site. | The HTML for the content on the published page is controlled by the HTML of the target source/system. Any changes that are required to the format of the content are made within the target source/system and cannot be controlled through TERMINALFOUR. |
| Can be configured only to include the HTML within the target page | It is possible to force a manual update of the content, if required, by running a publish, but if the object is configured to cache the output, there is no simple way to force an update of that cache. |
| There is no connection from the end published website to the external data, ensuring the security of the source data. | If the external URL is not available at publishing time, when the broker is called, the published page will be blank or empty (it will not use a previous version of the external page). |
| | The content is included into the published page but cannot be re-used or displayed on other pages of the site (for example using Navigation Objects) since the HTML content is not content within TERMINALFOUR. |
The Data Object for Database or Data Object for RSS is a T4 Tag that connects to an external database or RSS feed each time the site publishes. On publish, the content is pulled from the database/feed and inserted into the HTML of the published page. The presentation of the content can be controlled within the Content Layouts. A Data Object T4 Tag is added to a Content Layout and content is added to the site using the Content Type.
| Pros | Cons |
| --- | --- |
| A straightforward way to include external data within a site. | Possible publish performance implications for large amounts of data since the database/RSS is contacted and content retrieved on every publish. |
| It is possible to force a manual update of the content, if required, by running a publish. | If the external database/URL is not available at publish time, when the broker is called, the published page will appear blank or empty (it will not use a previous version of the data). |
| There is no connection to the external data from the published website, ensuring the security of the source data. | The content is included into the published page but cannot be re-used or displayed on other pages of the site (for example using Navigation Objects) since the HTML content is not content within TERMINALFOUR. |
| The HTML for the content on the published page is controlled within TERMINALFOUR, allowing flexibility around the layout of the content. The content could be published in any format that is required (e.g., JSON or XML). | |
| A single Data Object can retrieve multiple rows/items, which are published to the site, but it will only count as one Content Item towards the TERMINALFOUR system license (if your license is based on the number of Content Items). | |
The Excel to HTML Broker T4 Tag takes an Excel spreadsheet from within the Media Library and publishes the data within that file as an HTML table on the page. The Excel to HTML tag is added to a Content Layout (to create the table) and to a Page Layout (to format the table) and the content is added to the site using the Content Type.
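Conceptually, the output is a direct row-by-row rendering of the spreadsheet. The Python sketch below shows that transformation on rows that have already been read from a file; it is not the Excel to HTML tag itself. `rows_to_html_table` is a hypothetical helper, and treating the first row as a header is an assumption made for the example.

```python
import html


def rows_to_html_table(rows):
    """Render rows (first row treated as the header) as an HTML table,
    escaping every cell value."""
    header, *body = rows
    head = "".join(f"<th>{html.escape(str(cell))}</th>" for cell in header)
    lines = [f"<table><tr>{head}</tr>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = rows_to_html_table([["Course", "Credits"], ["Maths", 60]])
print(table)
```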
| Pros | Cons |
| --- | --- |
| Easy for users to create complex tables in Excel that can be published onto your website as HTML tables. | The file must exist in the Media Library, and there is no way (out-of-the-box) to update it automatically. This does not allow constant updates to the data without refreshing the file in the Media Library. |
| | There is no control over the format of the content - it is an HTML table that matches the layout in the Excel file. |
With Packages, you can batch import content in HTML format or Media items into TERMINALFOUR. This is useful for once-off imports, not for continuous updates and data syncing.
| Pros | Cons |
| --- | --- |
| Large batches of content like text or media can be imported into TERMINALFOUR, saving the time and energy involved in manually migrating the content. | Provides a once-off import and does not allow for updates to existing content/Media items. |
| | Only works with importing content in HTML format or Media items. |
Code (e.g. PHP) can be added as content, or within Content Layouts, that connects to external APIs or Web Services to retrieve data and display it on the page. The data is retrieved when the website visitor loads the webpage and so does not get imported into TERMINALFOUR. This may have implications for page load times but will provide a real-time view of the data.
A potential use case might be a Course Details Content Type that allows a content author to add marketing information about the course and link it to a Course ID. The Content Layout could contain code that uses the Course ID to retrieve content about that course and display it on the page (for example, the title, credits, and location).
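The Course ID use case might look something like the sketch below. It is written in Python for illustration only (on a real site this logic would live in whatever server-side language the site uses, such as PHP). Everything named here is an assumption: `render_course`, the `fetch_json` callable that stands in for the call to the external course system, and the `title`, `credits`, and `location` fields of the JSON payload.

```python
import html
import json


def render_course(course_id, fetch_json):
    """Return an HTML fragment for one course, built from a JSON payload."""
    data = json.loads(fetch_json(course_id))
    # Escape every value so external data cannot inject markup into the page.
    title = html.escape(str(data["title"]))
    credit_count = html.escape(str(data["credits"]))
    location = html.escape(str(data["location"]))
    return f"<h2>{title}</h2><p>{credit_count} credits, {location}</p>"


def fake_fetch(course_id):
    # Stand-in for an HTTP request to the external course system.
    return json.dumps({"id": course_id, "title": "Maths & Statistics",
                       "credits": 60, "location": "Main Campus"})


print(render_course("MA101", fake_fetch))
```

Passing the fetcher in as a parameter keeps the rendering logic testable without a live connection to the external system.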
| Pros | Cons |
| --- | --- |
| The external content is retrieved when the website visitor visits the page. This allows content updates within the external system to be updated on the website in real time. | Requires skills to develop the code to connect to the data source and retrieve the relevant content for display. |
| Does not affect publish performance. The external content is only retrieved when the website visitor loads the page and is not retrieved when the site publishes. | Since the website is connecting to the external data source, security measures must be considered to ensure that website visitors can't hack into the external data source to retrieve additional sensitive data or update the data in any way. |
| The layout of the content is entirely customizable within the code that is retrieving and displaying the content. This can be managed in TERMINALFOUR, allowing updates to the format to be made within TERMINALFOUR. | The data source must be available externally (or publicly) to allow the website to connect to the data source and retrieve the content. You might need to contact the administrator of the data source to open firewalls to make this possible. |
| | If the external system is unavailable (due to system maintenance or downtime), the website will be affected and the data on the website will become blank. This can be detrimental if essential webpages are left for long periods of time. |
An external application can be wrapped with a header and footer to appear to your website visitors as if it is part of the website, although it is a separate application. This would be common for more complex applications that come with a web interface and allow website visitors to interact with the application to submit forms, search content, etc.
| Pros | Cons |
| --- | --- |
| The external content is retrieved when the website visitor visits the page. This allows content updates within the external system to be updated on the website in real time. | The data source will need to be available externally (or publicly) to allow website visitors to use the application. The IT team that manages the data source may need to open firewalls to make this possible. |
| There are no additional publish performance or system license implications since the external content is only retrieved when the website visitor loads the page and is not retrieved when the site publishes. | Since the external application needs to be available externally (or publicly), security needs to be considered to ensure that website visitors are not able to hack into the external data source to retrieve additional sensitive data or update the data in any way. |
| | If the external system is unavailable (due to system maintenance or downtime), the website will be affected and the data on the website will become blank. For core website pages, this can be detrimental if left for long periods. |
Sometimes integration solutions can seem overwhelming, and so if you're still unsure, or need a bit of guidance, reach out to your Account Manager, ask our support team, or even ask on our Community Slack Channel, and someone will be able to help.