Intro to NotebookLM
One tool I found recently and keep using more each day is NotebookLM from Google Labs. NotebookLM is a great tool for learning new topics, researching large amounts of data, and summarizing it. The data is organized into notebooks, and each notebook can contain multiple sources.
You can upload data in various formats (web URLs, Slides, PDFs, text files, audio data, YouTube videos, …) and then use the tool to analyze them.
I usually use it to ask questions about the data, summarize it, or extract pieces of information.
The most useful feature for me is that when you ask a question, the answer comes with numbered links to the sources, so you can double-check whether it is correct.
Here, I open the notebook Introduction to NotebookLM and ask the question What is the maximum number of words a notebook can contain? You can see that it answered with a link to the paragraph that lists the source limitations (each source can contain up to 500,000 words).
A WordPress hack
A few days ago I had the idea of checking whether it’s possible to analyze WordPress logs with NotebookLM (or with LLMs in general). That happened after a friend’s blog was hacked and I spent a lot of time looking at the logs trying to make sense of them. I was thinking there must be an easier way to do this, since LLMs are great at analyzing structured data.
So I set up a test WordPress blog and made it public on the internet for a few days to collect some background internet noise in the logs (to make it as realistic as possible). Then I hacked my test blog with the same exploit that was used against my friend’s blog (to reproduce the situation). The exploit is CVE-2023-6961, which affects the WordPress plugin WP Meta SEO. It is well described in this blog post from Fastly.
This is a stored XSS vulnerability via the Referer header: you send an HTTP request with an XSS payload in the Referer header.
GET /index.php/2024/10/20/973498739847943/ HTTP/1.1
Referer:
Host: blog.thx.bz
Accept-Encoding: gzip, deflate, br
Accept: */*
Accept-Language: en-US;q=0.9,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.100 Safari/537.36
Connection: close
Cache-Control: max-age=0
When the administrator logs into the WP Admin dashboard and visits the WP Meta SEO 404 & Redirects page, the XSS payload gets executed. For the payload I used some JS code that creates a new WP admin user, similar to what happened in my friend’s case.
If you want to see the exact logs that I uploaded into NotebookLM, you can find them in this Kaggle dataset.
Great, now we have the WordPress Hack Apache Access logs. Let’s load them into NotebookLM and see what we can do with them.
What I uploaded to NotebookLM is a file named apache_access_log.txt (as it only accepts text files for this kind of data) that contains 1076 lines of access logs recorded over 3 days. It’s possible to upload much more data: the Gemini 1.5 Pro model used by NotebookLM supports a context window of up to 2 million tokens.
178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] "GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"
167.99.55.110 - - [19/Oct/2024:00:13:56 +0000] "POST /wp-cron.php?doing_wp_cron=1729469636.1745829582214355468750 HTTP/1.1" 200 259 "-" "WordPress/6.6.1; http://blog.thx.bz"
143.110.222.166 - - [19/Oct/2024:00:13:55 +0000] "GET / HTTP/1.1" 200 15340 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/certificates/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
162.158.158.139 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/customize/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/SimplePie/plugins.php HTTP/1.1" 404 489 "-" "-"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/css/colors/blue/plugins.php HTTP/1.1" 404 489 "-" "-"
...
1076 lines of logs
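To double-check what the LLM says, it helps to be able to parse these lines locally. Here is a minimal sketch, assuming the file uses Apache's standard "combined" log format (which is what the sample lines above look like). The names `LOG_PATTERN` and `parse_line` are made up for this illustration.

```python
import re
from typing import Optional

# Assumed layout (Apache "combined" format):
# ip identity user [time] "request" status size "referer" "user-agent"
# Quoted fields may contain backslash-escaped characters such as \".
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Split 'GET /path HTTP/1.1' into method and path.
    method, _, rest = entry["request"].partition(" ")
    entry["method"] = method
    entry["path"] = rest.rsplit(" ", 1)[0] if " " in rest else rest
    entry["status"] = int(entry["status"])
    return entry


# First line of the sample shown above.
sample = ('178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] '
          '"GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"')
entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["path"], entry["status"])
```

Lines that don't fit the pattern come back as None, so they can be counted and inspected separately instead of being silently dropped.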
Analyze WordPress logs with NotebookLM
Now that we have the logs uploaded into NotebookLM, let’s try to analyze the data. Let’s start with an “easy” question.
What is the IP address of the WordPress administrator?
I ask for the IP address of the WordPress administrator to see whether NotebookLM can understand the data and extract information from it:
Great answer. Not only did it correctly determine the IP address of the WP admin (80.97.26.93), it also figured out that the user initially logged in from another IP (138.199.53.226) and then switched to the final one (80.97.26.93).
That’s pretty impressive. I was curious how it knew to correlate these two IP addresses.
So next I asked:
How do you know that these 2 IP addresses (80.97.26.93 and 138.199.53.226) belong to the same user?
Again a great answer: it noticed the identical User-Agent and the sequential activity.
That’s pretty useful already. Let’s ask more complicated questions to try to identify which HTTP requests could be related to the creation of a new WP admin account (this is what we know happened in my friend’s case: a new WP user was created).
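The "identical User-Agent" part of that reasoning can be roughly cross-checked by hand. This is only a sketch under assumptions: it treats a 200 response on a /wp-admin/ PHP page as a sign of a logged-in session, which is a crude heuristic (some admin URLs also answer anonymous visitors). The function name and the sample lines are invented for illustration. The sample lines use documentation IP ranges and are not taken from the dataset.

```python
import re
from collections import defaultdict

# Assumed Apache "combined" log layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)


def shared_admin_agents(lines):
    """Map each User-Agent to the IPs that got a 200 on a /wp-admin/ PHP page,
    keeping only the agents that were seen from more than one IP."""
    ips_by_agent = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue
        parts = match["request"].split(" ")
        path = parts[1].split("?")[0] if len(parts) > 1 else ""
        # admin-ajax.php also answers anonymous visitors, so it is skipped.
        if (path.startswith("/wp-admin/")
                and path.endswith(".php")
                and not path.endswith("admin-ajax.php")
                and match["status"] == "200"):
            ips_by_agent[match["agent"]].add(match["ip"])
    return {agent: sorted(ips)
            for agent, ips in ips_by_agent.items() if len(ips) > 1}


# Invented sample lines, NOT from the real logs.
sample_lines = [
    '203.0.113.10 - - [20/Oct/2024:09:00:00 +0000] "GET /wp-admin/index.php HTTP/1.1" 200 1000 "-" "ExampleBrowser/1.0"',
    '198.51.100.20 - - [20/Oct/2024:09:05:00 +0000] "GET /wp-admin/plugins.php HTTP/1.1" 200 1000 "-" "ExampleBrowser/1.0"',
    '192.0.2.30 - - [20/Oct/2024:09:06:00 +0000] "GET /wp-admin/index.php HTTP/1.1" 302 0 "-" "OtherBot/2.0"',
]
shared = shared_admin_agents(sample_lines)
print(shared)
```

A shared User-Agent is weak evidence on its own, because common browsers produce identical strings. That is why the timing of the requests matters as well.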
List all the IP addresses and logs that generated HTTP requests that could have resulted in a new WP admin user creation
Interesting. It figured out that our own WP admin IP address was used to try to create a new WP admin user.
This is pretty interesting, as it hints at a stored XSS vulnerability.
The most obvious way our own IP address could be used to create a new admin user is if we visited an administrative page where attacker JS code had been injected, and our own browser (from our own IP address) executed it. Let’s ask a more complicated question to pinpoint the WP plugin that was involved in the exploit.
What WP plugin could have been exploited to create a new WP admin user?
I’ve also added the following additional information to the question to help the LLM answer the question (as we already know what WP plugins we have installed):
What WP plugin could have been exploited to create a new WP admin user?
Take into consideration the following known facts:
The following WordPress plugins are installed in my WordPress installation:
akismet
wp-fail2ban
wp-meta-seo
hello.php
I’ve basically asked it to identify the WP plugin that could have been used to create a new WP admin user and provided a list of installed WP plugins.
Wow, it was able to identify the vulnerable WP plugin (WP Meta SEO) that was used in the exploit.
Not only that, it also identified the WP Meta SEO admin page where the exploit happened.
The answer contains the following section:
These attempts originated from pages related to the WP Meta SEO plugin, specifically the “metaseo_broken_link” page
metaseo_broken_link is the vulnerable page where the XSS payload executed.
It quoted the following logs:
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://blog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "http://blog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/users.php?update=add&id=2 HTTP/1.1" 200 12205 "http://blog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
That’s great. We see a POST /wp-admin/user-new.php that returns a 302 redirect (which is how WordPress responds after a successful user creation) with a Referer of http://blog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link.
The redirect leads to GET /wp-admin/users.php?update=add&id=2, which tells us that the newly created WP user has id=2 (that is correct).
metaseo_broken_link is clearly the culprit.
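The same check can be reproduced without an LLM. This minimal sketch runs over the three quoted log lines and reports every POST to /wp-admin/user-new.php that was answered with a 302, together with its Referer. It assumes the Apache "combined" format, and `find_user_creation` is a name made up for this example.

```python
import re

# Assumed Apache "combined" log layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)

AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) '
         'Gecko/20100101 Firefox/127.0')
REFERER = 'http://blog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link'

# The three log lines quoted above.
quoted_lines = [
    f'80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "{REFERER}" "{AGENT}"',
    f'80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "{REFERER}" "{AGENT}"',
    f'80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/users.php?update=add&id=2 HTTP/1.1" 200 12205 "{REFERER}" "{AGENT}"',
]


def find_user_creation(lines):
    """Return (ip, time, referer) for each redirected POST to user-new.php."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue
        if (match["request"].startswith("POST /wp-admin/user-new.php")
                and match["status"] == "302"):
            hits.append((match["ip"], match["time"], match["referer"]))
    return hits


hits = find_user_creation(quoted_lines)
for ip, time, referer in hits:
    print(ip, time, referer)
```

A Referer that points at a plugin page rather than at user-new.php itself is the suspicious part, since an admin normally submits that form from the form page.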
Let’s ask one more question:
Please list all the log entries where the Referrer header contains HTML code
It correctly identified the request that I used to inject the XSS payload that triggered the stored XSS vulnerability.
As you can see, NotebookLM helped us quickly get an idea of how the WordPress blog was compromised and which plugin was potentially vulnerable.
Of course, it doesn’t work this well every time, but it can still save a lot of time.
If you are interested in the patch for this vulnerability, it’s available here (the Referer header is now HTML-encoded).
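A similar filter is easy to approximate locally. The sketch below flags log lines whose Referer looks like it contains an HTML tag, after URL-decoding it. It assumes the Apache "combined" format. The `HTML_TAG` pattern is a simple heuristic of mine, and the second and third sample lines are invented placeholders with a harmless payload, not the request I actually sent.

```python
import re
from urllib.parse import unquote

# Assumed Apache "combined" log layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)

# Heuristic: "<" followed by a letter, "/" or "!" looks like the start of a tag.
HTML_TAG = re.compile(r"<\s*[a-zA-Z/!]")


def referer_has_html(line: str) -> bool:
    """True if the (URL-decoded) Referer field appears to contain an HTML tag."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return False
    return bool(HTML_TAG.search(unquote(match["referer"])))


lines = [
    # Real line from the sample above.
    '178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] "GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"',
    # Invented placeholder lines, NOT from the dataset.
    '203.0.113.7 - - [20/Oct/2024:10:00:00 +0000] "GET /example/ HTTP/1.1" 404 490 "<script>alert(1)</script>" "ExampleAgent/1.0"',
    '203.0.113.7 - - [20/Oct/2024:10:00:01 +0000] "GET /example/ HTTP/1.1" 404 490 "%3Cscript%3Ealert(1)%3C/script%3E" "ExampleAgent/1.0"',
]
flags = [referer_has_html(line) for line in lines]
print(flags)
```

This will miss obfuscated payloads, so it works as a quick sanity check on the LLM's answer rather than as a detector.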
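The real patch lives in the plugin's PHP code. The Python below only illustrates what HTML-encoding a header value does, using the standard library's `html.escape`. The `encode_referer` name and the payload string are placeholders made up for this sketch.

```python
import html


def encode_referer(referer: str) -> str:
    """HTML-encode a Referer value so a browser shows it as text instead of markup."""
    return html.escape(referer, quote=True)


# Harmless placeholder payload, not the one used in the test.
encoded = encode_referer('<script>alert(1)</script>')
print(encoded)  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Once encoded, the stored value is displayed on the admin page as literal characters, so the browser no longer executes it.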