Prof. Dr. Debora Weber-Wulff
Thesis Topics
What is a bachelor's thesis?
- A topic from your internship that you
work on, for example the company had a horrible database or XML
structure, and you sort it out on your own to show them how it
could look.
- A topic from your project that you
continue, for example I had someone taking the E-Learning unit
we produced and making it accessible for blind users.
- A new topic that you agree on with an
IMI professor.
The following topics, more or less
diffuse, are on my current wishlist (and are possible for IC or as
the basis for a Master's Thesis area). In general, I am interested
in web topics, data mining, privacy, Android programming,
plagiarism detection and documentation tools, and E-Learning.
- Text Rewriting Classification
There are papers such as
this one that say that they can find the amount of text
rewriting in an academic paper. I would first like a student
(probably a Master's student) to replicate the experiment in
this paper using different sets of publications. Then I would
like you to look for a better classifier.
- Named Entity Recognition with
Wikidata
I am fascinated by how difficult Named Entity Recognition is in
the German language. One solution that comes to mind is to take
a corpus of bi/tri-grams after parsing and to look them up in
Wikidata. From there you should be able to determine and
classify if they represent names of people, places, things, or
are false positives. This would tend to be a Master's thesis.
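As a starting point, the lookup idea might be sketched as follows. This is a minimal sketch and not a finished design: `TOY_INDEX` is a hypothetical stand-in for a real Wikidata lookup, and the function names are made up for illustration.

```python
from typing import Callable, Iterator, Optional


def ngrams(tokens: list[str], sizes: tuple[int, ...] = (2, 3)) -> Iterator[str]:
    """Yield all bi- and tri-grams of a token list as space-joined strings."""
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def classify(tokens: list[str],
             lookup: Callable[[str], Optional[str]]) -> dict[str, str]:
    """Keep only the n-grams that the lookup recognizes, with their class."""
    result = {}
    for gram in ngrams(tokens):
        label = lookup(gram)
        if label is not None:  # unknown n-grams are treated as false positives
            result[gram] = label
    return result


# Hypothetical stand-in for Wikidata: a tiny local label-to-class index.
TOY_INDEX = {"Angela Merkel": "person", "Frankfurt am Main": "place"}

tokens = "Angela Merkel besuchte Frankfurt am Main".split()
entities = classify(tokens, TOY_INDEX.get)
print(entities)
```

A real version would replace `TOY_INDEX.get` with a function that queries Wikidata and maps the result to a class, which is where most of the work in this topic lies.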
- Retractions in Wikidata
WikiCite is in the process of dumping lots of citations into
Wikidata, but the retractions and the expressions of concern are
not being marked properly. How can you deal with this, get the
retracted articles marked as retracted, and get the retractions
imported as well?
- Finding author clusters on
PubMed Central
There is much open data available on PubMed Central. Given the
name of an author, can you plot the co-author graph? Now pick
two authors who are co-authors, what does their co-author graph
look like? Can you identify research groups by finding k-cliques?
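To make the questions concrete, here is a minimal sketch of building a co-author graph and searching it for k-cliques. The author names and paper lists are invented, and the brute-force search is only an assumption that works for small graphs; real PubMed Central data would need a proper clique algorithm and author disambiguation.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected graph: author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def k_cliques(graph, k):
    """Brute force: every group of k authors who all co-authored pairwise."""
    found = []
    for group in combinations(sorted(graph), k):
        if all(b in graph[a] for a, b in combinations(group, 2)):
            found.append(group)
    return found


# Invented author lists, one list per paper.
papers = [["Ada", "Ben", "Cleo"], ["Ben", "Cleo", "Dev"], ["Dev", "Eve"]]
graph = coauthor_graph(papers)
triangles = k_cliques(graph, 3)
print(triangles)
```

The brute-force search checks every combination of k authors, so it grows very quickly with the size of the graph.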
- Knit your VroniPlag Wiki scarf
The VroniPlag Wiki site sports a barcode representing the
varying levels of plagiarism on the pages of a dissertation. The
barcode is created in JavaScript by accessing Semantic MediaWiki
data. There also exist knitting machines that can knit patterns.
Can you automatically create scarves knitted to match the
patterns of the barcodes (see Knit the Sky)? Can users
choose their own colors? Include a random barcode generator?
More ideas? Knitting social media tie-in?
- Wikidata
There is a list of self-contained
projects dealing with Wikidata that I feel would make
great Bachelor's or even Master's projects, or an IC. I would be
glad to discuss them with you, and send you on to someone at
Wikidata if you want to try your hand at them.
- SPARQL queries for Wikidata
for non-computer scientists
Wikidata has so much data now that querying it with SPARQL can
produce amazing results. But there is a steep learning curve
involved with SPARQL. Can you make a kind of "Google for
SPARQL", a simple query interface that produces (and then runs)
SPARQL on Wikidata?
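One possible first step is a small function that turns a few form fields into a query string. The sketch below only produces the SPARQL text and does not run it; the function name `build_query` and its parameters are made up for illustration, while P31 (instance of), Q5 (human), P19 (place of birth) and Q64 (Berlin) are existing Wikidata identifiers.

```python
def build_query(class_qid: str, filters: dict[str, str], limit: int = 10) -> str:
    """Produce a SPARQL query for items of a class that match property filters."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item wdt:P31 wd:{class_qid} .",  # P31 = instance of
    ]
    for prop, value in filters.items():
        lines.append(f"  ?item wdt:{prop} wd:{value} .")
    # The label service fills in ?itemLabel with English labels.
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }')
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Humans (Q5) whose place of birth (P19) is Berlin (Q64).
query = build_query("Q5", {"P19": "Q64"}, limit=5)
print(query)
```

The hard part of the topic is the step before this one: getting from what a non-computer scientist types to the right identifiers.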
- Discovering Patterns in
Editorial Boards
Journals have editorial boards posted on their web sites;
sometimes there are 100 names and affiliations listed for just
one journal. Some of the names are fabricated and appear on
multiple boards, sometimes the names are slightly different, and
sometimes the people are real but do not know that they are
listed on these boards. I would like to have a data mining tool
that identifies potential board member lists online, extracts
names and affiliations, and then attempts to discover patterns
and connections using machine learning algorithms.
Visualizations of the database with Graphviz would be a very
cool plus.
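For the "slightly different names" problem, a simple baseline could look like the sketch below. It uses plain string similarity from Python's `difflib` rather than machine learning, the board data is invented, and the threshold of 0.85 is an arbitrary assumption that would need tuning.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop periods and commas, and sort the name parts."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(sorted(cleaned.split()))


def similar_names(boards, threshold=0.85):
    """Find near-identical names that appear on different boards."""
    entries = [(journal, name) for journal, names in boards.items() for name in names]
    matches = []
    for (j1, n1), (j2, n2) in combinations(entries, 2):
        if j1 == j2:
            continue  # only compare across journals
        score = SequenceMatcher(None, normalize(n1), normalize(n2)).ratio()
        if score >= threshold:
            matches.append((n1, j1, n2, j2))
    return matches


# Invented example data.
boards = {
    "Journal A": ["Maria Schmidt", "John Q. Public"],
    "Journal B": ["Schmidt, Maria", "Jon Q. Public"],
}
matches = similar_names(boards)
for match in matches:
    print(match)
```

A match here only says that two strings are close; it cannot tell a real person from a fabricated one, so the results would still need checking by hand.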
- Curriculum-based OER guide
In Germany each state publishes Rahmenlehrpläne,
curricula for each subject, grade, and school type that list the
learning goals that are to be achieved during that year. These
documents tend to be wordy PDFs that also include long tables.
How could these be (easily) transformed into a navigational
structure for a Semantic MediaWiki that lets a teacher quickly
find materials that are available online and determine their
licensing model in order to see if they are usable?
- Weka Miner
We are experimenting with the Weka Miner in Semantic Modelling.
I would like someone to use the Weka Miner on the 400 billion
results that I have from a collusion detection search. Can you
come up with a classification that identifies the positives,
i.e. the plagiarism pairs and clusters?
- Citation Miner
A previous master's thesis looked into the extraction of
citation information from scientific papers. I would like to
apply this technique to a largish set of papers from different
fields in order to train a machine learning system to
recognize citations. Then it would be fascinating to run this on
a larger set of unknown papers. It would also be interesting to
measure how the algorithm fares on dissertations (these are much
larger than papers). I would also like to have frequency
distributions of the citations prepared in order to identify the
unused references ("garnish references") and the most frequent
ones.
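The frequency distribution part can be illustrated with a small sketch. It assumes that citations appear in the text as bracketed numbers such as [3] or [1, 3], which is only one of many citation styles; author-year styles would need a different extraction step, and the sample text is invented.

```python
import re
from collections import Counter

# Matches numeric citations such as [3] or [1, 3].
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def citation_counts(body):
    """Count how often each reference number is cited in the text."""
    counts = Counter()
    for group in CITATION.findall(body):
        for number in group.split(","):
            counts[int(number)] += 1
    return counts


def unused_references(body, reference_count):
    """Return the numbers of references that are never cited in the text."""
    counts = citation_counts(body)
    return [n for n in range(1, reference_count + 1) if counts[n] == 0]


# Invented sample: a text with four entries in its reference list.
body = "Earlier work [1, 3] disagrees with [3]. See also [1] and [3]."
counts = citation_counts(body)
unused = unused_references(body, 4)
print(counts.most_common(1), unused)
```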
- SIM_TEXTer-PlusPlus
Recently a student has managed to get the program and text
comparison tool, SIM_TEXT,
that does a wonderful job of comparing two text files and
highlighting the text similarities, working as an online
tool. There are some fiddly bits that need extending, for
example, dealing with multiple copies of text portions and doing
a self-comparison, and inputting PDF (can we use Apache Tika, for
example?). And while we are at it, I'd like to have the
new/old-directory comparison working under a nice GUI. The tool
should enable a teacher to set up their own databases =
directories of student papers (such as lab reports) on a per
class basis and then compare new lab reports with all the old
ones for the same class. The software needs to be browser-based
open software, so that it can be used in schools (who don't have
money for expensive software or other systems). All data must be
stored only locally on a teacher's computer.
- Java API Finder for
beginners
The Java API has all the information you need in it, but it is a
horrible mess, overloaded with information that confuses a
beginner. And you navigate it by using Google with hopefully
fitting search terms. Could there be a better way for beginners?
Is it possible to set up a better search and navigation system
that you don't have to throw away when the next version of the
API appears?
- Collusion Finder
The general case of finding plagiarisms on the open internet is
a difficult one, but finding collusions (multiple students
submitting the same or similar paper) is somewhat easier. There
are a number of systems available, but none are easy to use.
There is an open algorithm that is relatively good at finding
common parts, but quite difficult to use and interpret. I would
like to have a good interface for this system, and some
additional bells and whistles. The software needs to be open
software, so that it can be used in schools (who don't have
money for systems like this). And it should be possible to
integrate it into Moodle, so that all of the solutions to one
exercise can be compared to all others.
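The all-against-all comparison can be sketched in a few lines. Note that this is not the open algorithm mentioned above; as a stand-in it uses Jaccard similarity over word trigrams, and the submissions and the threshold of 0.5 are invented for illustration.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of n-grams that two sets have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_all(submissions, threshold=0.5):
    """Compare every submission with every other and report similar pairs."""
    sets = {name: shingles(text) for name, text in submissions.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 2)))
    return pairs


# Invented solutions to one exercise.
submissions = {
    "anna": "the quick brown fox jumps over the lazy dog",
    "bert": "the quick brown fox jumps over the sleepy dog",
    "cara": "a completely different answer to the same exercise",
}
suspicious = compare_all(submissions)
print(suspicious)
```

The number of comparisons grows with the square of the class size, which is fine for one exercise in one class but would need more thought for larger collections.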
- VroniPlag Wiki Report
Generator
I currently produce PDF reports for VroniPlag Wiki
semi-automatically with a lot of sleight of hand. I would like
to have a system that takes material from a Semantic MediaWiki
and produces both a colored PDF as well as LaTeX code that can
produce the report. There are a number of experiments that have
started, but a user-ready tool has still not emerged. Any
takers?
- Open Research Collaboration
Environment
There are so many tools out there, each one is great for one
small thing, but for helping a research group collaborate you
need to bolt on so many other things. A wiki, a private file
server, an Etherpad for incubating information, a chat for
real-time communication, a ticket system, a to-do list, a
references database. Is it too much to dream of an integrated
system that can do all this and NOT depend on servers in the USA
or proprietary tools?
- SlideSync
MediaEvent Services
is currently developing SlideSync, a self-service platform for
streaming live presentations on the web, which is already in use
by clients such as Lufthansa. Their toolset includes Ruby on
Rails, jQuery, Bootstrap, Amazon Web Services as well as Wowza
Media Server, and exemplary thesis topics could relate to
server-side video and slide processing, analytics, realtime
interaction, mobile devices and scalable server infrastructure.
They use an agile development process (Scrum, Gerrit and
Cucumber). Christian Becker (Telefon +49 6441 87087-22, E-Mail c.becker@mes-info.de)
would be happy to meet with interested students at HTW or their
office in Moabit to discuss the topic further. I will be glad to
be the HTW advisor for this topic.
- Blog Publisher
My blog, Copy, Paste & Shake, has been given an ISSN number.
I would like to have a theme that displays the Volume and Number
for each issue = month, and then produces a proper printable
version with page numbers, etc., for submission to the German
National Library. The publisher should produce PDF or LaTeX
code, and be adaptable for both WordPress and Google blogs.
- Group Commented Bibliography
There are lots of tools out there that let an individual keep a
bibliography with information about literature. But there is no
good, free group bibliography, much less one that will accept
comments from persons on the materials.
- Plagiarism in Journalism
Journalists seem to be constantly taking text from each other.
Can this be visualized? Can you identify the parts of an online
article that come from some other online article that is older
and can you map this to some sort of timeline? This might let us
see how many copies are actually influenced by one important
article.
Real Old Stuff
- Visualizations
With all this data floating around, much of it open, we are now
able to produce visualizations that suggest new information or
connections to other people. How can we automate the production
of such visualizations? Flash is dead, so this has to be in
HTML5. There are numerous possible theses here.
- The Wikipedia Admin Game
The admins in the German Wikipedia see themselves as warriors,
fighting against the evil bands of trolls that try to deface the
Wikipedia. In this game, you are an admin who is trying to save
the Wikipedia from fake edits, edit wars, teenagers, and the
like. School lets out, and you have to be on your guard, as the
bored kids fire up their computers and see what they can deface
today. Are you quick enough to keep the Wikipedia running?
- Wikipedia-Version-Tool
I need a tool that will display for me what the Wikipedia entry
(and all links I follow) was on a particular date at a
particular time, a sort of Way-Back machine for the Wikipedia.
This will help me so that when people say "The Wikipedia said
thus and such on day X" I can check it out. You will use the
history data which is available to determine what the page
actually displayed on that day. When I follow links with your
tool I will see what the page linked to.
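The core lookup, finding which revision was live at a given moment, can be sketched as a binary search over the page history. The sketch assumes the history has already been fetched and sorted by timestamp, oldest first; the revision ids and dates are invented.

```python
from bisect import bisect_right
from datetime import datetime


def revision_at(revisions, moment):
    """Return the id of the latest revision saved at or before `moment`.

    `revisions` is a list of (timestamp, revision_id) pairs, oldest first.
    Returns None if the page did not exist yet at that moment.
    """
    timestamps = [ts for ts, _ in revisions]
    index = bisect_right(timestamps, moment)
    if index == 0:
        return None
    return revisions[index - 1][1]


# Invented page history.
history = [
    (datetime(2019, 1, 5, 10, 0), 101),
    (datetime(2019, 3, 1, 8, 30), 102),
    (datetime(2019, 3, 1, 17, 45), 103),
]
shown = revision_at(history, datetime(2019, 3, 1, 12, 0))
print(shown)
```

The same lookup would have to be repeated for every link that is followed, with the original date and time carried along.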
- Moodle Book
Moodle Book is quite a good authoring system, but there are
problems. Some have been determined in a thesis done last
semester, others are my personal pet peeves. So I want you to
upgrade the Moodle Book module to make it more useful for
real-life E-Learning! For example, all subchapters should remain
collapsed until their chapter is opened, instead of all
subchapters being open at the same time. This will involve
using PHP and CSS.
- SCORM Test
This is the standard for import and export of E-Learning
materials. Except that it doesn't work. Everyone can import the
SCORM that *they* export, but not necessarily the work of
others. In this thesis I want the author to take a number of
different kinds of E-Learning materials, export them from CLIX,
Blackboard, Moodle, etc. and import them in other systems. I
want a module for Moodle to be set up that can cope with all of
the different SCORM problems.
Really old stuff
- Wikipedia User Survey
The Wikimedia
Research Network wants to conduct a user
survey. I would like a thesis to conduct a pilot survey,
constructing the survey software which will have to be easily
localizable in many languages. The Wikimedia
Research Network Privacy Policy must be respected during
this work.
- Diff for InDesign
In Wikis or in programming it is trivial to find the difference
between two texts and report on the difference. It is also easy
to record the history of edits and set up "reverts". InDesign
also has a history, but I am not aware of a possibility to
analyze the difference between two projects, save snapshots of
the history, and revert changes back to a checkpoint. I would
like for someone to investigate making a diff for InDesign (or
Photoshop).
- Wikipedia
There are any number of enhancements possible for Wikipedia,
perhaps working through a tagging mechanism for articles. You
must be willing to have your results subject to the GNU General
Public License.
- Wiki RSS-Reader
One can set a watchlist for single pages in the Wikipedia, and
other Wikis offer a "last changes" RSS feed. But I have to visit
all of these Wikis in order to see if anything I am interested
in has changed. I would like to have a tool, similar to Awasu,
that sucks in the RSS feeds, lets me know what is new, and if I
click on a link, puts me in direct contact with that particular
Wiki. I want this tool to work with many different kinds of
Wikis, and to cope with all sorts of RSS descriptions.
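The "what is new" part might start out like the sketch below. It assumes a plain RSS 2.0 feed without namespaces and uses an invented feed from a hypothetical wiki.example.org; Atom feeds and the many RSS dialects in the wild are exactly what it does not yet handle.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not seen before and remember their links."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        title = item.findtext("title", default="").strip()
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


# Invented "last changes" feed of a hypothetical wiki.
FEED = """<rss version="2.0"><channel><title>Recent changes</title>
<item><title>Page A</title><link>https://wiki.example.org/A?diff=2</link></item>
<item><title>Page B</title><link>https://wiki.example.org/B?diff=7</link></item>
</channel></rss>"""

seen = {"https://wiki.example.org/A?diff=2"}
fresh = new_items(FEED, seen)
print(fresh)
```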
- Wikis for Teaching
Wikis seem to me to be an ideal basis for constructing a
learning management system. Can a synchronous service such as
Chat systems in the Flash Communication Server be integrated
into a Wiki-Collaboration and Annotation system? What does a
Wiki for teaching really need?
- What else interests me? E-Learning,
Web 2.0, social computing, gender questions, e-voting, privacy.
I only take about 6-8 students as thesis
students, as I want to have the time to advise you properly.
Please ask as soon as possible, if you are interested in any of
the topics. I will also sign for 3-4 external projects, but cannot
offer to advise you. I will now stop taking reservations; it is
first come, first served. I find it irritating when people
"reserve" a position, and then give it up at the last minute. I
pay for this by having to teach an extra class every few semesters
to fill up the negative teaching hours I get assigned if I do not
take enough students.
Students who do their thesis with me
should expect to come to my office every week with results from
the previous week. We will spend time in a group on questions, and
we will discuss in the group what you bring with you for us to
read. It is hard work, but tends to bring results, i.e. a finished
thesis within the deadline.
I have some links about topics having to
do with writing a thesis.
Last change: 2020-02-19 12:00