Is this Shakespeare project I am contemplating feasible/possible?
I am interested in Shakespeare’s works and would like to learn more about the deeper significance behind his works. I thought of a way that I could do this and I am wondering if it would be feasible/possible.
Let’s take the quote, “To be, or not to be: that is the question:” I want to make a program (can be software or online) that would allow me to check that quote against the original manuscript. It would also be nice to be able to check that quote against contemporary sources that discuss its significance.
Assumptions
1. Please assume that all IP issues will be taken care of. No need to raise copyright or patent issues here.
2. I want to just do Hamlet for now as a test of the feasibility/possibility of the program.
3. I do not have a technical background but I have a friend who is willing to help. My friend knows Visual Basic and Javascript. I live close to local universities and might be able to recruit interns/part-time help if necessary.
4. I do not know if such a program currently exists. However, even if it does, I still want to know if this project is feasible/possible because I might want to do it for other literature/plays/etc.
8 Answers
If you type it into Google Books in quotes, it will show you every book that mentions it. (I would search for “To be or not to be” and “that is the question” as two separate phrases, because the way the line is broken up and punctuated varies.)
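To illustrate the point about line breaks and punctuation, here is a minimal Python sketch of a punctuation-insensitive phrase check. The function names (`normalize`, `contains_quote`) are made up for this example, and the sample passage is typed in by hand; it is only meant to show the idea, not how any existing search engine works.

```python
import re

def normalize(text: str) -> str:
    """Lowercase the text, turn punctuation and line breaks into spaces,
    and collapse runs of whitespace into single spaces."""
    text = text.lower().replace("’", "'")
    text = re.sub(r"[^a-z0-9']+", " ", text)
    return " ".join(text.split())

def contains_quote(quote: str, passage: str) -> bool:
    """Return True if the quote appears in the passage as whole words,
    ignoring case, punctuation, and where the lines are broken."""
    # Padding with spaces keeps the match on whole-word boundaries.
    return f" {normalize(quote)} " in f" {normalize(passage)} "

# Hand-typed sample passage (an assumption for the demo, not a scanned source).
passage = "To be, or not to be,\nthat is the Question:\nWhether 'tis Nobler in the mind to suffer"
quote = "To be, or not to be: that is the question:"

print(contains_quote(quote, passage))
```

The same normalization would have to be applied to every text you compare against, otherwise a colon in one edition and a comma in another would count as a mismatch.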
There are some issues with the “original manuscript”, as it were. I think for all of his plays, the best we have are the copies (yes, plural, and that’s important) of the First Folio. So we technically don’t have the original manuscript. But this is all off of memory, so I could be wrong.
My first thought was that the first copies would likely be written by hand, which is very difficult to scan with any existing software. Even searchable PDFs of early printed works are very sketchy, to say the least.
I don’t know anything about programming or whether it’d be possible from that standpoint, but I know from experience that combining software with text from before the 1800s or so can be very problematic.
I think a large part of the problem is that many resources like that aren’t just scanned in and uploaded to the internet for all to see – at best, most of them are scanned in and available with a subscription to the proper database, which can cost 100K a year. Professional historians can’t get a hold of these resources without paying out the nose or traveling to the institution where they’re kept.
From a technical point of view, your biggest problem is that the internet is BIG. If you want to go to every website and find the phrase, it will take a LONG time (even with a lot of parallel machines).
One option you have is to do a search on Google and go through the search results. It may not be perfect, but it will reduce the size of your search, since Google will do a lot of the work for you. In fact, you can probably use their cached pages to speed things up.
The other big problem is what you do with the information. You will get a LOT of it, far more than you can go through by hand. A search for “to be or not to be” returns 4,720,000 results. There are algorithms that can try to classify your results, etc… and you may be able to modify them to give some useful information, but this will be quite hard. Even so, with so much data, I’m skeptical that you can find a way to reduce this to meaningful results.
Finally, if you want results that actually use the phrase in a meaningful way, you need to find a way to filter so that you only get meaningful results. Again, there are algorithms that might work (and you may need to modify them or make up your own).
It’s not impossible, but it is VERY difficult. I suggest finding someone in the computer science department of your local university who is willing to help you out. You probably want either a professor or a PhD candidate, since this is the kind of thing that academics dig but has very little practical application (at least right now).
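As a rough idea of what the simplest possible filter could look like, here is a Python sketch that keeps only snippets mentioning at least one context word and ranks them by how many they mention. Everything in it is an assumption made for illustration: the word list, the function names, and the three sample snippets are invented, and a real classifier would need to be far more sophisticated than keyword counting.

```python
import re

# Hypothetical list of words suggesting the snippet is really about the play.
CONTEXT_TERMS = {"hamlet", "shakespeare", "soliloquy", "folio", "quarto"}

def relevance(snippet: str) -> int:
    """Count how many distinct context terms appear in the snippet."""
    words = set(re.findall(r"[a-z]+", snippet.lower()))
    return len(words & CONTEXT_TERMS)

def filter_snippets(snippets, min_score=1):
    """Drop snippets scoring below min_score; return the rest, best first."""
    scored = [(relevance(s), s) for s in snippets]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept]

# Invented sample search snippets, for demonstration only.
snippets = [
    "To be or not to be a vegetarian: that is the question this week.",
    "A lecture on Shakespeare that quotes 'To be, or not to be'.",
    "In Hamlet's soliloquy, 'To be, or not to be' weighs action against endurance.",
]

for s in filter_snippets(snippets):
    print(s)
```

A keyword filter like this will throw away some good results and keep some bad ones, which is part of why the filtering step is hard.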
You may wish to look at the way they’ve made this search engine from Open Source Shakespeare.
I also like how you can search the concordance.
They also document how they created it here; which I think will be really useful when you make your own. You can even download their source-code and database.
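For a sense of what a concordance search does, here is a small Python sketch of a keyword-in-context lookup. It is not taken from Open Source Shakespeare’s code; the function name `concordance` and the `width` parameter are made up for this example, and the sample text is a single hand-typed line.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for every occurrence
    of keyword, with up to `width` words of context on each side."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

sample = "To be, or not to be, that is the question"

for left, word, right in concordance(sample, "be", width=2):
    print(f"{left:>10} [{word}] {right}")
```

Run over the full text of a play instead of one line, the same loop would list every place a word occurs along with its neighbours.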
—-
Also check the number of sites listed on the bottom right of this page
which include various searches, some of them already covering documents from different folios.
This search from RhymeZone is also kind of cool.
—-
Given the availability of the script on the web already, and also the variety of different searches, my question (as a user) would be, what can your search engine offer that the others don’t already?
Also, is the main point to think of this as a CS project (to see if you can code a search engine yourself), or really a project to help other people out?
@lifeflame First, thanks for the link. I’ll check it out.
Frankly, I was not aware of any available software that could do the project I described. However, even if there are already existing programs with those capabilities, I can still enhance the program (if I know how to).
I’ll be checking out the source code. :)