A Hybrid Approach to Fake News Detection

Author: Deepak Battula

In the current information landscape, there has been a notable spike in fake news and misleading online content. Frustrated by how hard it is to tell genuine news from unreliable sources, I began researching a solution.

That research convinced me that I could build a working model, and that realisation is what started this specific project. Along the way, I also realised that a cybersecurity engineer would be a great asset to the team.

Building the Team

As I was planning this project, reconnecting with an old friend, Mallikarjuna, proved invaluable. During a discussion about our career goals, he mentioned his deep interest in cybersecurity and his recent internship in the field. It was a perfect match.

I reached out to Mallikarjuna with an invitation to collaborate, outlining his potential role and the cybersecurity aspects of the project. His enthusiasm and keen interest in building a collaborative project have been a driving force. We are now defining our roles and structuring the project workflow around our respective skills.

A screenshot of the project invitation sent to Mallikarjuna.

The project brief sent to Mallikarjuna, detailing the cybersecurity responsibilities.

The Technical Game Plan

Our solution attacks the problem from two angles: analyzing the content itself with NLP, and validating the source with cybersecurity checks.

Part 1: The Content Analysis (NLP & AI)

This is the core of the detector, focusing on the news article's text. Our model will analyze the content and provide explainable insights into its authenticity.

Explainability: Using tools like SHAP, the model won't just give a "fake" or "real" score; it will explain why it made its decision.
Keyword Analysis: The user will be presented with keywords that most strongly influenced the model's conclusion.
Confidence Estimation: We will report how confident the model is in its verdict, for example as the predicted probability that the article is fake.
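To make these three outputs concrete, here is a minimal sketch of how a verdict, a confidence value, and the most influential keywords can come out of one model. It is not our final pipeline: it swaps in a multinomial Naive Bayes classifier instead of a SHAP-explained model, because Naive Bayes log-odds decompose exactly into a per-word sum, so each word's term is already an additive explanation. SHAP generalises this idea of additive attributions to models without such a decomposition. The class name ExplainableNB, the tokenizer, and the four training snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes; a real pipeline would do more.
    return re.findall(r"[a-z']+", text.lower())


class ExplainableNB:
    """Hypothetical multinomial Naive Bayes whose per-word log-odds act as explanations."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.counts = {"fake": Counter(), "real": Counter()}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts["fake"]) | set(self.counts["real"])
        self.totals = {c: sum(self.counts[c].values()) for c in self.counts}
        docs = Counter(labels)
        # Prior log-odds of "fake" vs "real" from class frequencies.
        self.prior_log_odds = math.log(docs["fake"]) - math.log(docs["real"])
        return self

    def _log_prob(self, word: str, label: str) -> float:
        # Smoothed log P(word | label).
        denom = self.totals[label] + self.alpha * len(self.vocab)
        return math.log((self.counts[label][word] + self.alpha) / denom)

    def explain(self, text: str, top_k: int = 3):
        # Each known word adds log P(w|fake) - log P(w|real) to the log-odds.
        contrib = Counter()
        for word in tokenize(text):
            if word in self.vocab:
                contrib[word] += self._log_prob(word, "fake") - self._log_prob(word, "real")
        log_odds = self.prior_log_odds + sum(contrib.values())
        p_fake = 1 / (1 + math.exp(-log_odds))
        verdict = "fake" if p_fake >= 0.5 else "real"
        confidence = max(p_fake, 1 - p_fake)  # posterior probability of the verdict
        # Keywords ranked by the size of their push toward either label.
        keywords = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
        return verdict, confidence, keywords


# Toy, made-up training data purely for illustration.
TRAIN_TEXTS = [
    "shocking miracle cure doctors hate this secret",
    "you won't believe this shocking secret",
    "government confirms budget figures in official report",
    "officials publish annual report on economy",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]

model = ExplainableNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
verdict, confidence, keywords = model.explain("Shocking secret cure revealed")
print(verdict, round(confidence, 2), keywords)
```

Note that the confidence here is the model's own probability for a single article, which is different from its accuracy; the latter has to be measured separately on held-out labelled data.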

Part 2: The Source Analysis (Cybersecurity Shield)

An article's authenticity is heavily tied to its source. Mallikarjuna's expertise will be crucial here: we will programmatically check the technical trustworthiness of the URL.

Domain Age: Recently created domains are often a red flag for misinformation campaigns.
Website Ranking: We will use established ranking metrics to gauge the site's authority and reputation.
SSL Certificate Validity: We will check that the site's SSL certificate is valid and identify its type: Domain Validated (DV), Organization Validated (OV), or Extended Validation (EV). The type indicates how much identity verification the website owner has undergone.
Security Hazard Warnings: The tool will issue clear warnings for URLs flagged as unsafe, such as those known for data-loss risks or other security hazards.
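A minimal sketch of how some of these source checks could be wired together is below, assuming a few things we have not built yet: the domain's creation date comes from a WHOIS or RDAP lookup, the blocklist result comes from some URL-reputation service, and the 180-day threshold is a placeholder. The certificate level is read from the certificate's policy OIDs using a simplified mapping that only recognises the CA/Browser Forum's reserved identifiers. Fetching a live certificate needs network access and the third-party cryptography package, so that helper imports it lazily. Website ranking is left out because it depends on an external data source. All function names here are hypothetical.

```python
from datetime import date

# CA/Browser Forum reserved policy OIDs that mark a certificate's validation level.
CABF_LEVELS = {
    "2.23.140.1.1": "EV",    # Extended Validation
    "2.23.140.1.2.2": "OV",  # Organization Validated
    "2.23.140.1.2.1": "DV",  # Domain Validated
}


def certificate_level(policy_oids):
    """Return the strongest validation level asserted by the given policy OIDs."""
    levels = {CABF_LEVELS[oid] for oid in policy_oids if oid in CABF_LEVELS}
    for level in ("EV", "OV", "DV"):
        if level in levels:
            return level
    return "unknown"


def fetch_policy_oids(host, port=443):
    """Fetch a server's certificate and list its policy OIDs (needs network + `cryptography`)."""
    import ssl
    from cryptography import x509

    pem = ssl.get_server_certificate((host, port))
    cert = x509.load_pem_x509_certificate(pem.encode())
    try:
        policies = cert.extensions.get_extension_for_class(x509.CertificatePolicies).value
    except x509.ExtensionNotFound:
        return []
    return [p.policy_identifier.dotted_string for p in policies]


def source_report(created, today, policy_oids, on_blocklist, min_age_days=180):
    """Combine hypothetical source signals into a small report with warnings."""
    age = (today - created).days
    warnings = []
    if age < min_age_days:
        warnings.append(f"domain registered only {age} days ago")
    if on_blocklist:
        warnings.append("URL appears on a security blocklist")
    return {"domain_age_days": age, "certificate": certificate_level(policy_oids), "warnings": warnings}


# Example with made-up inputs: a one-month-old domain with a DV certificate.
report = source_report(date(2024, 5, 1), date(2024, 6, 1), ["2.23.140.1.2.1"], on_blocklist=False)
print(report)
```

A DV certificate is deliberately not treated as a warning: many legitimate sites use free DV certificates, so the level is reported as context rather than as a red flag.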

The final output will be a clear, tiered ranking (e.g., S-Tier, A-Tier for trustworthiness) that combines both the content and source analysis to give the user a comprehensive and understandable verdict on the news they are reading.
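One simple way to combine the two halves into a tier is a weighted blend mapped onto score bands, sketched below. Everything numeric here is an assumption for illustration: the 60/40 content-to-source weighting, the cut-offs, and the tiers below A (only S-Tier and A-Tier come from our plan). The inputs are assumed to be a content probability that the article is real and a source score normalised to the range 0 to 1.

```python
# Hypothetical cut-offs on a 0-100 trust score, checked from best to worst.
TIERS = [(90, "S"), (75, "A"), (60, "B"), (40, "C"), (0, "D")]


def trust_tier(p_real, source_score, blocklisted=False, content_weight=0.6):
    """Blend content and source evidence into one tier (weights are assumptions)."""
    if not (0.0 <= p_real <= 1.0 and 0.0 <= source_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if blocklisted:
        return "D", 0.0  # a known-unsafe URL is capped at the lowest tier
    score = 100 * (content_weight * p_real + (1 - content_weight) * source_score)
    # The first band whose cut-off the score reaches; the 0 cut-off always matches.
    tier = next(name for cutoff, name in TIERS if score >= cutoff)
    return tier, round(score, 1)


print(trust_tier(0.95, 0.90))                    # strong content and source signals
print(trust_tier(0.50, 0.50))                    # undecided on both fronts
print(trust_tier(0.99, 0.99, blocklisted=True))  # safety override
```

The blocklist override is a design choice: a convincing article on a known-malicious URL should never reach a high tier, no matter how the weighted average works out.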

Our team is excited to begin development and will document our progress in a series of blog posts. Stay tuned.