On Monday, 7 October 2019

PDF Searching Programmes

I am trying to write some programmes that search the PDFs in a directory and its subdirectories for a given string.

I am hoping to use:

Bash
Python
Perl
Ruby

I have written the first two and am now timing them (running from ~/Desktop/Study/.../2018/Tri\ 2/ and searching for "allport"):

Run       Bash (s)   Python (s)
1         49.80      119.56
2         42.87      118.58
3         43.82      124.00
Average   45.50      120.71

Note that I interleaved the runs (Bash → Python → Bash → ...) rather than running Bash three times and then Python three times.
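Interleaving means slow-changing effects such as disk-cache warm-up or background load are less likely to favour one script over the other. A minimal Python sketch of that schedule, assuming each searcher can be launched as a command; `time_interleaved` is a hypothetical helper name, not part of any library:

```python
import subprocess
import time


def time_interleaved(commands, rounds=3):
    """Run each command once per round, in a fixed alternating order, and
    return {label: [seconds for round 1, round 2, ...]}.

    commands maps a label (e.g. "Bash") to an argv list for subprocess.run.
    """
    times = {label: [] for label in commands}
    for _ in range(rounds):
        # One pass over every command per round: A -> B -> A -> B -> ...
        for label, argv in commands.items():
            start = time.perf_counter()
            # Discard the searcher's own output; check=True raises on failure
            subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
            times[label].append(time.perf_counter() - start)
    return times
```

For example, `time_interleaved({"Bash": ["bash", "pdfsearch.sh", "allport"], "Python": ["python3", "pdfsearch.py", "allport"]})`, where both script names are hypothetical. The times are wall-clock and include interpreter start-up, which is part of what you wait for when running either script.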

On average, my Bash script was about 2.65 times as fast as my Python one (120.71 s ÷ 45.50 s ≈ 2.65).
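As a check on the arithmetic, the averages and the ratio can be recomputed from the three runs in the table with Python's standard `statistics` module:

```python
from statistics import mean

# Wall-clock times (seconds) from the three interleaved runs in the table
bash_times = [49.80, 42.87, 43.82]
python_times = [119.56, 118.58, 124.00]

bash_avg = mean(bash_times)
python_avg = mean(python_times)
speedup = python_avg / bash_avg  # how many times longer Python took

print(f"Bash average:   {bash_avg:.2f} s")    # 45.50 s
print(f"Python average: {python_avg:.2f} s")  # 120.71 s
print(f"Speed-up:       {speedup:.2f}x")      # 2.65x
```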

A note from the future: once I had worked out the logic for a task like this, reimplementing it became my first good practice in different languages, a way of learning each one's syntax.

More notes on different languages, from mid-October 2019

I want to write more PDF-searchers, for my own edification more than for their functionality. Here are some notes on the strengths of different languages, followed by my method for writing the PDF-searchers.

Why use Perl?

Text manipulation and data wrangling are easy and fast!
Good for "glue" projects connecting two disparate systems
The most complete language

Why use Ruby?

Automation and scripting
Data scraping and general crawling
- Mechanize, Cucumber, Capybara, SitePrism, Selenium, Faker, Pry, Watir
Server management
General purpose! Even AI: game bots; social moderation

Why use LISP?

The most programmable language; specialised in being general purpose!

Why use Elixir?

Elixir runs on the Erlang VM (BEAM)
- Very self-contained
Functional programming

Why use Rust?

Memory-safe systems programming!
Cargo makes managing crates easy!
Could replace C++

Why use Lua?

Lua is faster to develop in than C++ and doesn't need a separate compilation step!

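How I am writing the PDF-searchers:

Convert a PDF to text, with the file and search string hard-coded
Search the text for the search string
Print whether or not it was found
Make the search string a command-line argument
Add a loop that walks the directory and its subdirectories
Print every PDF in which the string was found; count the matches and print how many times it was found across how many PDFs
Time the script
Print help output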
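The steps above can be combined into one Python script. This is a minimal sketch, not necessarily how my timed scripts work: it assumes the poppler-utils `pdftotext` command is installed and on PATH (its `-` output argument writes the text to stdout), and it matches case-insensitively, which is also an assumption. The helper names `pdf_to_text`, `count_matches` and `search_tree` are hypothetical.

```python
#!/usr/bin/env python3
"""Search every PDF under a directory tree for a string (sketch).

Assumes the poppler-utils `pdftotext` command is installed and on PATH.
"""
import argparse
import os
import subprocess
import time


def pdf_to_text(path):
    """Convert one PDF to text; `pdftotext FILE -` writes the text to stdout."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, errors="replace")
    # Treat a PDF that pdftotext cannot read as containing no text
    return result.stdout if result.returncode == 0 else ""


def count_matches(text, needle):
    """Count non-overlapping, case-insensitive occurrences of needle in text."""
    return text.lower().count(needle.lower())


def search_tree(root, needle, extract=pdf_to_text):
    """Walk root and all its subdirectories; return {pdf_path: hit_count}
    for every PDF whose text contains needle at least once."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                n = count_matches(extract(path), needle)
                if n:
                    hits[path] = n
    return hits


def main(argv=None):
    # argparse also provides -h/--help automatically
    parser = argparse.ArgumentParser(
        description="Search every PDF under a directory tree for a string.")
    parser.add_argument("needle", nargs="?",
                        help="string to search for (case-insensitive)")
    parser.add_argument("root", nargs="?", default=".",
                        help="directory to search (default: current directory)")
    args = parser.parse_args(argv)
    if args.needle is None:
        # No search string given: print the help output instead of searching
        parser.print_help()
        return 1

    start = time.perf_counter()
    hits = search_tree(args.root, args.needle)
    elapsed = time.perf_counter() - start

    for path, n in hits.items():
        print(f"{path}: {n}")
    print(f"Found {sum(hits.values())} times in {len(hits)} PDFs")
    print(f"Took {elapsed:.2f} seconds")
    return 0


if __name__ == "__main__":
    main()
```

Running it with no arguments (or with `-h`) prints the help text; `python3 pdfsearch.py allport .` (file name hypothetical) lists each matching PDF with its hit count, then the totals and the elapsed time. The `extract` parameter of `search_tree` lets the conversion step be swapped for another tool without touching the walking and counting logic.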