The Benefits of Applying Semantic Visions’ Screening and Monitoring Services
Contents:
- Ontology and the team
- The semantic web (system), its hierarchy and extent
- Machine Learning and Natural Language Processing
- Scenarios and their creation
- Screening and Monitoring
- Alternative data
- Semantic Visions and its services
Hello, Anna, nice to meet you. So first, tell me about your career at Semantic Visions and your current position. What is your professional background?
I am Italian and my background is in linguistics; I graduated in Slavic languages and Italian. I am also fluent in Czech, have an advanced level of Spanish and French, and know several other languages. I have worked in many jobs, among others as a professor, translator and interpreter; I have co-authored a dictionary, designed an online language course and worked for some years as director of studies at the Italian Institute in Prague.
My lifelong passion for languages and systematization brought me to Semantic Visions (SV), where I started working as an Ontology Specialist for Italian. Today I work as the Head of the Ontology Department, where we deal with natural language processing (NLP), the creation of what we call scenarios, and the quality assurance of our products. All of this across 12 languages.
Did I hear correctly that your department covers 12 languages?
Yeah, we cover 12 languages. Our working language is English, of course, but my team consists of 12 experts who come from all over the world: we have colleagues from Colombia, Iraq, Portugal, Russia, Japan, Korea, China, the Czech Republic… Sometimes it can be a challenge. But it is a pleasure to lead such a team, where each member contributes and everyone can present their original point of view.
Moreover, there is the added value of having such a range of different cultural backgrounds.
That sounds exciting. So what qualifications and skills does an ontologist need?
Each ontologist is an expert in their own language, frequently with an advanced knowledge of additional languages. The team shares the responsibility for the structure and the content of the (so-called) topical categories and event types. Each of these categories and events exists across all languages with synchronized meaning.
So, you are basically referring to a network of categories and event types that serves to describe the world of business operations?
Well, I guess you could put it that way.
But we cover much more than just business. We have created an interconnected web of over 10,000 topics that operate across languages and countries. Each topic corresponds to a category that has its own expert rule and includes all the results from the subtopics (subcategories) through aggregation.
At the top level, it branches into Events, Business, Politics, Science and Technology, Health, Society, Entertainment, Sports, and World.
From there, the branches drill down from the most generic level into very specific topics such as Reputation Risks, Workplace Health & Safety Negligence, Heart Rate Monitors, Nanotechnology, Resource Sustainability, Islamic Revolutionary Guard Corps, Population Trends, Space Flights, Nuclear Weapons, Humanitarian Aid, Oxfam, Biopharmaceuticals, National Congress Party, Security Camera Systems, Aviation Accidents, Product Recalls… you name it…
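To make the aggregation idea concrete, here is a minimal, hypothetical Python sketch of such a category tree, in which each node carries its own rule and also counts as matched when any of its subtopics matches. The keyword rules and the tiny slice of the hierarchy are illustrative assumptions only, not Semantic Visions' actual expert rules.

```python
# Hypothetical sketch of a topical-category tree: each node carries its own rule
# and aggregates hits from its subcategories. Illustration only.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Category:
    name: str
    rule: Callable[[str], bool]               # expert rule deciding whether a text matches this node
    subtopics: List["Category"] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        # A category fires on its own rule or on any of its subtopics (aggregation).
        return self.rule(text) or any(child.matches(text) for child in self.subtopics)


def keyword_rule(*keywords: str) -> Callable[[str], bool]:
    # Simplified stand-in for a real expert rule: case-insensitive keyword match.
    return lambda text: any(kw.lower() in text.lower() for kw in keywords)


# A tiny, made-up slice of the hierarchy, using category names mentioned in the interview.
taxonomy = Category(
    name="Science and Technology",
    rule=keyword_rule("technology", "research"),
    subtopics=[
        Category("Nanotechnology", keyword_rule("nanotech", "nanoparticle")),
        Category("Space Flights", keyword_rule("space flight", "rocket launch")),
    ],
)

if __name__ == "__main__":
    article = "The startup announced a successful rocket launch this morning."
    print(taxonomy.matches(article))               # True, aggregated from "Space Flights"
    print(taxonomy.subtopics[0].matches(article))  # False, no nanotech mention
```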
Wow. It looks like a massive system that would require an ongoing process of refining and maintaining to keep it up to date. Is that right?
Absolutely! That’s why we dedicate a substantial part of our time to Quality Assurance, performed on a daily basis. We continuously check the results of our processing line and fine-tune our topical category and event type rules. Thanks to this meticulous Quality Assurance, we guarantee that the data results are reliable at all times and remain comparable despite all linguistic and cultural peculiarities.
For some languages we are also training new language models for NER (Named Entity Recognition), which is done in close collaboration with our Machine Learning (ML) team.
Machine Learning sounds much like Artificial Intelligence… Do you also make use of it in your department?
Sure, at SV we are systematically looking at ways we can apply AI and are implementing it in various processes. We have a dedicated team that focuses on AI implementation and testing. Earlier I spoke about the topical categories and event types. These are extracted through natural language processing (NLP) and become part of the articles’ metadata.
So, that means that some part of your work is automated, correct?
Correct. Natural Language Processing is a field of AI concerned with the interaction between computers and humans through natural language. We develop algorithms and techniques that enable computers to understand, interpret, and generate human language in a meaningful way.
Regarding language understanding, the NLP we apply involves tasks such as parsing, part-of-speech tagging, named entity recognition, and syntactic analysis. We break down human language into its constituent parts in order to understand their roles and relationships within a sentence or document. The point is to interpret the meaning behind words, phrases and sentences.
We also have to deal with issues like word sense disambiguation, that is, resolving the meaning of ambiguous words based on their context, and sentiment analysis, which determines the sentiment conveyed in a piece of text.
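To give a flavour of what tasks like part-of-speech tagging and named entity recognition look like in practice, here is a small sketch using the open-source spaCy library. The interview does not say which tools Semantic Visions actually uses, so this is purely an example of the kind of processing described.

```python
# Minimal illustration of part-of-speech tagging and named entity recognition with spaCy.
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Semantic Visions was founded in Prague in 2011.")

# Part-of-speech tags and syntactic dependencies for each token.
for token in doc:
    print(f"{token.text:12} {token.pos_:6} {token.dep_}")

# Named entities with their types (organizations, locations, dates, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)
```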
Sounds like a rather sophisticated and complex job. So, to what end does your company employ these algorithms and techniques?
In general, I can say that our NLP involves extracting structured information from unstructured text data. This includes tasks like named entity recognition (NER), which identifies and classifies entities mentioned in text (such as people, organizations, and locations), and relation extraction, which identifies relationships between these entities or between entities and our topical categories.
In fact, Semantic Visions’ NLP includes tasks such as text classification, where documents are categorized into predefined classes or categories based on their content, and text clustering, where documents are grouped together based on their similarity.
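As a toy illustration of these two tasks, classification into predefined categories and clustering by similarity, here is a short scikit-learn sketch. The example documents, labels, and model choices are assumptions for illustration; this is not Semantic Visions’ pipeline.

```python
# Toy text classification and clustering with scikit-learn. Illustration only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "The central bank raised interest rates again.",
    "The company recalled thousands of faulty heart rate monitors.",
    "A new rocket completed its first crewed space flight.",
    "Regulators fined the supplier over workplace safety negligence.",
]
labels = ["Business", "Health", "Science and Technology", "Business"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Text classification: learn to map documents onto predefined categories.
classifier = LogisticRegression(max_iter=1000).fit(X, labels)
print(classifier.predict(vectorizer.transform(["The regulator fined another bank."])))

# Text clustering: group similar documents without predefined labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(list(zip(docs, clusters)))
```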
So far, you have presented the metadata extraction. Can you familiarize us with the concept of what you earlier called “scenario creation”?
Semantic Visions’ main added value lies in the final part of our data processing, in which we create a so-called scenario by clustering articles that carry the same signal, that is, the same connection between an entity and an event type, within a given timeframe and across all 12 languages.
As I already explained, within a certain timeframe (hourly or daily) our process continuously clusters (groups) articles from different sources and languages that contain the same event type and the same entity, guaranteeing a relation between them. Scenarios are the result of this process.
They can concern different entities, like companies, commodities, industries or geo-locations. The nature of the entity type also influences the list of related event types, because not all events work properly for all entity types. Just to give you an example, event types like ‘War’ or ‘Military Invasion’ are suitable only for geo-locations, while ‘Business Shutdown’ or ‘Deteriorating Financial Situation’ are company-bound event types.
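A highly simplified sketch of the grouping idea described here: articles that share the same entity and event type within the same time window end up in one "scenario", regardless of language. The field names and the hourly bucketing are assumptions for illustration only, not how Semantic Visions actually keys its scenarios.

```python
# Hypothetical sketch: group articles into scenarios by (entity, event type, time bucket).
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    entity: str          # e.g. a company or geo-location extracted by NER
    event_type: str      # e.g. "Product Recall", "Business Shutdown"
    published: datetime
    language: str


def build_scenarios(articles):
    # Key a scenario by (entity, event type, hourly time bucket); language is ignored,
    # so articles in all languages land in the same scenario.
    scenarios = defaultdict(list)
    for a in articles:
        bucket = a.published.replace(minute=0, second=0, microsecond=0)
        scenarios[(a.entity, a.event_type, bucket)].append(a)
    return scenarios


articles = [
    Article("Acme Corp", "Product Recall", datetime(2024, 5, 1, 9, 15), "en"),
    Article("Acme Corp", "Product Recall", datetime(2024, 5, 1, 9, 40), "de"),
    Article("Acme Corp", "Business Shutdown", datetime(2024, 5, 1, 9, 50), "en"),
]

for key, group in build_scenarios(articles).items():
    print(key, "->", len(group), "article(s)")
```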
Are you saying that a scenario with a meticulous set-up guarantees relevant results?
Oh, definitely! The use of scenarios greatly increases the accuracy of the data we provide to clients (a precision of 95%), also thanks to the fact that they are highly configurable. Multiple semantic and statistical parameters, generally set up based on the clients’ interests and needs, make it possible to generate the right scenario, whether the goal is to prevent or mitigate risks, on the one hand, or to signal opportunities, on the other.
Depending on the concrete use case, scenarios can be the result of data monitoring or screening.
And finally, could you explain to our readers the difference between screening and monitoring?
For sure. Depending on our clients’ needs, the processes deal either with historical data or with near real-time data. The latter delivers continuous data monitoring, while the former delivers historical screening.
Data screening can be utilized whenever the analysis of historical data is required in order to gain a retrospective view of risks and opportunities related to entities (e.g. companies). Ultimately, you may obtain a data model that helps predict future developments. Let me illustrate this: a company receives a historical data feed that encompasses compliance-related events and incidents from the past. This historical data assists the company in assessing its compliance with the new legislation, identifying past issues, and taking steps to address them.
Data monitoring of OSINT (Open-Source Intelligence) sources is paramount for our clients to stay abreast of pivotal market trends and events that could influence their operations. Our data feeds are meticulously crafted to align with our clients’ specific needs, spanning three crucial domains: Risk Management, Research and Other Data Requirements, and Opportunity Identification. Within each domain, an array of categories empowers our clients to address specific challenges, including Supplier Management, Third Party Risk Management, ESG Compliance, Alternative Data Processing and Innovation Opportunities.
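For readers who like to see the distinction spelled out, here is a minimal sketch: screening runs once over a fixed historical window, while monitoring repeatedly picks up whatever arrived since the last check. The data source and field names are hypothetical, chosen only to contrast the two modes just described.

```python
# Hypothetical contrast between historical screening and continuous monitoring.
from datetime import datetime


def screen(articles, start: datetime, end: datetime):
    # Screening: one retrospective pass over a historical time window.
    return [a for a in articles if start <= a["published"] <= end]


def monitor(articles, last_seen: datetime):
    # Monitoring: return only articles newer than the last processed timestamp,
    # then advance the cursor; in practice this would run continuously.
    fresh = [a for a in articles if a["published"] > last_seen]
    new_cursor = max((a["published"] for a in fresh), default=last_seen)
    return fresh, new_cursor


articles = [
    {"title": "Supplier fined for safety breach", "published": datetime(2023, 3, 10)},
    {"title": "Acme Corp announces recall", "published": datetime(2024, 5, 1)},
]

print(screen(articles, datetime(2023, 1, 1), datetime(2023, 12, 31)))
fresh, cursor = monitor(articles, last_seen=datetime(2024, 1, 1))
print(fresh, cursor)
```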
I guess these automated tools can save time, manpower and money. Now, how would a business benefit from screening?
Of the many instances where it is used, I will mention just a couple. Imagine a business that plans to expand and needs to close new deals. Screening during onboarding strengthens the onboarding process: we gather different types of events related to the potential partner, in the vast context of business operations… Remember the categories I talked about earlier? It is a thorough checkup from many different angles.
So we help to establish a robust foundation for the business to make informed decisions when engaging with new entities, thereby minimizing potential risks associated with uncharted partnerships.
Another example of screening is the technology’s efficacy in retrospectively identifying and assessing past events. By evaluating historical data, the company can gain valuable insights into potential risks, providing a practical validation of the solution’s effectiveness. This process not only instills confidence in the adopted technology but also lays the groundwork for future risk assessments and proactive mitigation strategies.
Can you give us an example, a use case, of how SV monitoring is put into practice to serve different industries?
Let me give you a couple of examples of how SV’s data monitoring may be of use.
The first use case is related to the monitoring of supply chain components. Certainly, it is crucial for industries involved in manufacturing and consumer goods to stay informed in real time about potential risks and opportunities associated with their supply chain components. Our solution also provides continuous monitoring of the supply chain to track compliance with the new EU sustainability legislation. This includes real-time alerts for regulatory incidents or potential breaches, giving the company the ability to respond promptly to emerging compliance issues.
This enables them to make informed business decisions, hopefully mitigate the risks, and adhere to regulations in the jurisdictions where they operate.
Another example concerns investment management firms and hedge funds, for which real-time data on companies and products is crucial. In this respect, Semantic Visions, by monitoring 90 percent of online news in 12 languages, can provide robust alternative data to guide investment strategies and decisions.
I’ve heard the term before, but I am not quite sure what alternative data represents. What does it stand for?
Basically, alternative data refers to information gathered from non-traditional or unconventional sources, for example open sources (OSINT), and its analysis in relation to specific companies or industries.
By utilizing alternative data sources, investors can potentially gain a competitive advantage by uncovering unique insights that may not be reflected in traditional financial metrics or analyst reports.
Thank you for the explanation. Now, would you briefly describe the services Semantic Visions delivers?
Let me first rewind the clock and give you some background. Semantic Visions was founded in Prague in 2011 with the idea of processing open-source data and turning it into structured, topic-based data. As a result, after all these years, SV has over 10 years of historical data.
Today, our solutions include source collection, source analysis, natural language processing, and proprietary artificial intelligence. We process over one million documents on a daily basis, extracting entities and detecting more than 10,000 topical categories and over 520 event types.
Well, to be honest, that sounds like a colossal job and a big ambition: staying abreast of global events, spanning the globe and detecting all these events, moreover, in real time!
Yeah, totally. I believe that our company has the expertise, the right data, and a great team to provide this kind of service. And it is my team’s mission to be part of all this!