I Archived The Entire Subreddit And Coded A Simple Website To Read It (self.TheRedPill)
submitted 10 months ago * by dream-hunter
Preview: https://i.imgur.com/BXXQOke.png & https://i.imgur.com/niDdoEW.png
As a web developer who discovered TRP 1 year ago and is very grateful for the subreddit, I've always wanted to contribute here, but I never knew how, until now. After TRP was quarantined, I feared it would get banned one day. So I decided to figure out a way to scrape the entire subreddit and make it viewable on a simple website.
I saw TRP's current backup of the subreddit and I wasn't happy with its design, its hard-to-use website, or its lack of posts. So I decided to spend 8 hours figuring out how to scrape the entire subreddit and then code a website to view the posts as simply as possible.
Edit: Thank you for the amazing feedback everyone. I just finished scraping RedPillWomen, RedPillParenting and ThankTRP and added them all to the website. Up to 1.5k posts added.
Edit 2: Almost every single post (around 64k) ever posted on TRP, from all the way back to 2012 till now, can now be viewed on the website.
Edit 3: Option to search through entire archive added; search through titles, posts, authors and even comments.
Edit 4: Every single post from askTRP, becomeaman and RedPillWomen (altTRP + GEOTRP) has been archived; that counts up to 60k posts since 2012. We are now at a total of 160k archived posts!
[–]Modredpillschool[M] [score hidden] 10 months ago* stickied comment (5 children)
Thanks. So did we. https://www.forums.red/i/theredpill
We will always welcome more backup points, so thank you for helping out.
Our backup includes 3,176 Posts between TRP and AskTRP. With over 519,619 individual comments.
Counting RPW and ThankTRP, our total archive is 4,573 posts.
Instant search can be found here
In case of emergency we will release the entire database as torrent. FYI.
Forums opening soon with new mobile-friendly design.
[–]ReeZoX 52 points 10 months ago (3 children)
Really like it!
The only thing I would like is to be able to sort/search the results by category (flair) and then sort those by most upvoted :)
[–]dream-hunter[S] 16 points 10 months ago* (2 children)
Doing that is possible. Click the Category column in the table twice and you'll see the most upvoted posts for that flair (or type the category's full name into the search function). You can even see the most upvoted posts of each subreddit; just click the Subreddit column.
Thank you for the feedback!
[–]ReeZoX 1 point 10 months ago (1 child)
The search function works, but I can't switch the category on the column, there's either "Science" or none/unspecified displaying for me there ;)
[–]dream-hunter[S] 2 points 10 months ago (0 children)
Refresh page > click Column tab two times then you'll see the posts from each category in order of most upvoted.
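For anyone curious how the sort described above could work under the hood, here is a minimal sketch. It is not the site's actual code: the post records, the field names (`category`, `score`) and the `top_in_category` helper are all assumptions made up for illustration.

```python
from operator import itemgetter

# Hypothetical records; the real archive's schema is not shown in this thread.
POSTS = [
    {"title": "A", "category": "Science", "score": 120},
    {"title": "B", "category": "Field Report", "score": 300},
    {"title": "C", "category": "Science", "score": 450},
    {"title": "D", "category": "", "score": 999},  # unflaired post
]


def top_in_category(posts, category):
    """Return posts whose category matches (case-insensitive), highest score first."""
    wanted = category.strip().lower()
    matching = [p for p in posts if p["category"].strip().lower() == wanted]
    return sorted(matching, key=itemgetter("score"), reverse=True)


if __name__ == "__main__":
    for post in top_in_category(POSTS, "science"):
        print(post["score"], post["title"])
```

Filtering first and sorting second keeps unflaired posts out of the result no matter how highly they scored.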
[–]izzyinjurious 25 points 10 months ago (12 children)
What language did you use to scrape it? It's awesome btw good work.
[–]dream-hunter[S] 20 points 10 months ago (11 children)
PHP, my favorite language. :) Thank you!
[–]TheTriviaMan 14 points 10 months ago (1 child)
I write PHP myself, and even though you've already done the scraping, I would suggest that any future developers use a Python library known as "Beautiful Soup" (https://pypi.org/project/beautifulsoup4/). It's made specifically for web scraping.
[–]SpiderAlpha33 3 points 10 months ago (0 children)
Yeah, BeautifulSoup is fast and reliable, and for dynamically generated websites I use Selenium with headless Firefox.
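As a rough illustration of the kind of extraction being discussed, here is a minimal sketch. To keep it runnable without installing anything, it uses Python's standard-library `html.parser` rather than Beautiful Soup itself. The sample markup, the `title` class and the `extract_titles` helper are assumptions for illustration, not the markup of any real page.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a listing page; real pages differ.
SAMPLE_HTML = """
<div class="thing"><a class="title" href="/r/example/1">First post</a></div>
<div class="thing"><a class="title" href="/r/example/2">Second post</a></div>
<a class="author" href="/user/someone">someone</a>
"""


class TitleLinkParser(HTMLParser):
    """Collect (href, text) for every <a> element carrying the class "title"."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_title = False  # True while inside a matching <a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._in_title = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_title = False


def extract_titles(html):
    """Parse an HTML string and return a list of (href, title) tuples."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, title in extract_titles(SAMPLE_HTML):
        print(href, title)
```

With Beautiful Soup installed, the same lookup would be roughly `BeautifulSoup(html, "html.parser").select("a.title")`, which is a large part of why people recommend it.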
[–]1McDrMuffinMan 8 points 10 months ago (0 children)
Wow... You're either a masochist or something else.
Godspeed, man!
[–]ThePantsThief 3 points 10 months ago (5 children)
Why scrape it when they have an API? Fellow developer here. Curious in case the scraping solution is better somehow
[–]Modredpillschool 4 points 10 months ago (0 children)
The https://forums.red/i/TheRedPill archive was done via API
[–]needz 1 point 10 months ago (3 children)
Sometimes if you already have your favorite tool and it works, there's no need to learn another tool (or in this case, API).
[–]SilkTouchm 5 points 10 months ago (2 children)
Scraping is always harder than just using the api.
[–]needz 1 point 10 months ago (1 child)
Never say never or always.
So if I've been using BeautifulSoup since it came out, have an entire framework and dev environment dedicated to scraping websites and have little to no experience with APIs, which is gonna be harder?
[–]SilkTouchm 2 points 10 months ago (0 children)
which is gonna be harder?
Scraping, by far. It doesn't matter how much experience you have; you still have to do the legwork. An API is just a few pre-packaged methods, so you don't need experience with it.
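To make the comparison concrete, here is a rough sketch of the API side. The payload below is made-up sample data shaped like the `Listing` JSON that Reddit's API returns (posts under `data.children`, each with its own `data` object). `posts_from_listing` is a hypothetical helper, and no network request is made.

```python
import json

# Invented sample payload in the shape of a Reddit "Listing" response.
SAMPLE_LISTING = json.dumps({
    "kind": "Listing",
    "data": {
        "after": None,
        "children": [
            {"kind": "t3", "data": {"title": "First post", "author": "alice", "score": 42}},
            {"kind": "t3", "data": {"title": "Second post", "author": "bob", "score": 7}},
        ],
    },
})


def posts_from_listing(raw_json):
    """Pull title, author and score out of a Listing-shaped JSON string."""
    listing = json.loads(raw_json)
    return [
        {
            "title": child["data"]["title"],
            "author": child["data"]["author"],
            "score": child["data"]["score"],
        }
        for child in listing["data"]["children"]
    ]


if __name__ == "__main__":
    for post in posts_from_listing(SAMPLE_LISTING):
        print(post["score"], post["author"], post["title"])
```

The fields arrive already named and typed, so there is no markup to reverse-engineer and nothing that breaks when the page layout changes.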
[–]the-dan-man 1 point 10 months ago (1 child)
Out of curiosity, why is PHP your favourite language, and where did you learn it?