Prof. Dr. Debora Weber-Wulff
What is a bachelor's thesis?
- A topic from your internship that you work on: for example, the company has a horrible database or XML structure, and you sort it out on your own to show them how it could look.
- A topic from your project that you continue: for example, I had one student take the E-Learning unit we produced and make it accessible for blind users.
- A new topic that you agree on with an IMI professor.
The following topics, more or less diffuse, are on my current wishlist (some are also possible as an IC or as the basis for a Master's thesis). In general, I am interested in web topics, data mining, privacy, Android programming, plagiarism detection and documentation tools, and E-Learning.
- Ad hoc peer-to-peer Network & Knowledge Management in medical crisis areas
The NGO Cadus.org is developing mobile hospital systems for use in crisis areas. In addition to all of the medical apparatus that is needed, they also need to be able to build an ad hoc peer-to-peer network using only open-source software. There is also a lot of knowledge management that has to be set up in a distributed and secure manner, as Internet access is not always available. The technology basis is mobile phones. This could be two theses, or potentially a project.
- Telephone Book Diver
What if you had 25 years of German telephone book data available? Sure, there are the usual bits of nastiness: different formats for different years, streets that are sometimes abbreviated and sometimes not, and the usual non-normalized stuff. Could you do things like create a map of where the people live who use a "Dr." in their name, or calculate the "Schmidt" index, that is, the percentage of telephone subscribers in each city who are called "Schmidt"? You would have to clean the data and come up with a privacy-preserving interface that lets people without computer science degrees or Python experience search for "interesting" data and plot it on a map of Germany.
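To make the "Schmidt" index concrete, here is a minimal sketch. It assumes the data has already been cleaned into (surname, city) pairs; that record format, the function name `schmidt_index`, and the `min_entries` cut-off (dropping cities with very few entries, one simple privacy measure) are hypothetical choices for illustration, not part of any existing dataset or tool.

```python
from collections import Counter


def schmidt_index(entries, surname="Schmidt", min_entries=1):
    """Percentage of entries per city whose surname equals `surname`.

    `entries` is an iterable of (surname, city) pairs (hypothetical, already cleaned).
    Cities with fewer than `min_entries` entries are left out, so that tiny
    counts cannot point to individual households.
    """
    totals = Counter()
    matches = Counter()
    target = surname.casefold()
    for name, city in entries:
        city = city.strip()
        totals[city] += 1
        if name.strip().casefold() == target:
            matches[city] += 1
    return {
        city: 100.0 * matches[city] / total
        for city, total in totals.items()
        if total >= min_entries
    }


if __name__ == "__main__":
    sample = [("Schmidt", "Berlin"), ("Meyer", "Berlin"), ("Schmidt", "Köln"),
              ("schmidt ", "Berlin"), ("Weber", "Köln")]
    print({city: round(pct, 1) for city, pct in schmidt_index(sample).items()})
    # prints {'Berlin': 66.7, 'Köln': 50.0}
```

The real work in the thesis is the cleaning step that produces these pairs; the aggregate itself is cheap, which is what makes an interactive map feasible.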
- Retraction Finder
In biomedical science, it is important to know whether a paper you are citing has been corrected or retracted, and many papers cite 30-50 publications. Retractions or issues with a paper can be noted in many places: PubPeer, PubMed, Medline, Web of Science, SCOPUS, CrossCheck. We need a tool similar to the Firefox plugin that alerts a reader to issues with a paper, but it needs to work in Word or LibreOffice and notify the writer if there are issues. Alternatively, this could work together with Zotero or Citavi.
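One core step of such a tool is finding the DOIs in a manuscript and checking them against a list of known problems. A minimal sketch, assuming the problem list (`known_issues`, a hypothetical name) has already been collected from one of the sources listed above; no API of those services is called here. The regular expression is adapted from the one Crossref has suggested for modern DOIs.

```python
import re

# Adapted from Crossref's suggested regular expression for modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return the DOIs mentioned in `text`, lower-cased (DOIs are case-insensitive)."""
    dois = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;:")
        # A trailing ")" without a matching "(" belongs to the prose, not the DOI.
        if doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        dois.append(doi.lower())
    return dois


def flag_references(text, known_issues):
    """`known_issues` (hypothetical) maps lower-cased DOIs to a note such as "retracted"."""
    return {doi: known_issues[doi] for doi in extract_dois(text) if doi in known_issues}


if __name__ == "__main__":
    manuscript = "See Smith (doi:10.1234/abc.5678). Also 10.5555/XYZ-1, and 10.9999/ok."
    print(flag_references(manuscript, {"10.5555/xyz-1": "retracted"}))
```

In a Word or LibreOffice add-in, the same check would run over the document text or the reference manager's records rather than a plain string.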
- Finding author clusters on PubMed Central
There is a lot of open data available on PubMed Central. Given the name of an author, can you plot the co-author graph? Now pick two authors who are co-authors: what does their co-author graph look like? Can you identify research groups by finding k-cliques?
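A minimal sketch of the graph side, assuming the author lists have already been fetched from PubMed Central (no download code here). "k-cliques" is read here as maximal cliques with at least k members, found with the Bron-Kerbosch algorithm with pivoting; the clique percolation method, which merges overlapping k-cliques into communities, would be a natural next step. Function names are hypothetical.

```python
from itertools import combinations


def coauthor_graph(papers):
    """papers: iterable of author-name lists. Returns {author: set of co-authors}."""
    graph = {}
    for authors in papers:
        unique = set(authors)
        for author in unique:
            graph.setdefault(author, set())
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())


def research_groups(papers, k=3):
    """Maximal cliques of at least k authors who have all published with each other."""
    graph = coauthor_graph(papers)
    return [clique for clique in maximal_cliques(graph) if len(clique) >= k]


if __name__ == "__main__":
    papers = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["D", "E", "F"], ["E", "F"]]
    print(sorted(sorted(group) for group in research_groups(papers)))
```

The co-author graph of two co-authors is then just the union of their neighbourhoods in `graph`, which is easy to hand to a plotting library.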
- Knit your VroniPlag Wiki scarf
There is a list of self-contained projects dealing with Wikidata that I feel would make great Bachelor's or even Master's projects, or an IC. I would be glad to discuss them with you and put you in touch with someone at Wikidata if you want to try your hand at them.
- SPARQL queries for Wikidata for non-computer scientists
Wikidata now holds so much data that querying it with SPARQL can produce amazing results. But SPARQL has a steep learning curve. Can you make a kind of "Google for SPARQL": a simple query interface that produces (and then runs) SPARQL queries on Wikidata?
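A minimal sketch of the "produces SPARQL" half, assuming plain words have already been mapped to Wikidata identifiers. The two small lookup tables are hypothetical stand-ins; a real tool would resolve words through Wikidata's search instead. P31 ("instance of") and the `wikibase:label` service are standard Wikidata query idioms; the generated text could then be sent to the public endpoint at https://query.wikidata.org/sparql.

```python
# Hypothetical lookup tables: a real tool would resolve words via Wikidata search.
CLASSES = {"human": "Q5", "city": "Q515"}
PROPERTIES = {"country": "P17"}


def build_query(class_word, limit=10, with_property=None, language="en"):
    """Turn a class word (plus an optional property word) into a SPARQL query."""
    qid = CLASSES[class_word]
    select = "?item ?itemLabel"
    where = [f"?item wdt:P31 wd:{qid} ."]  # P31 = "instance of"
    if with_property:
        pid = PROPERTIES[with_property]
        select += " ?valueLabel"
        where.append(f"?item wdt:{pid} ?value .")
    body = "\n  ".join(where)
    return (
        f"SELECT {select} WHERE {{\n  {body}\n"
        # The label service fills in ?itemLabel / ?valueLabel in the chosen language.
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        f"}}\nLIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_query("city", limit=5, with_property="country"))
```

Templates like this keep the generated queries readable, so users could also learn SPARQL by looking at what the interface produced.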
- Discovering Patterns in Editorial Boards
Journals post their editorial boards on their web sites; sometimes 100 names and affiliations are listed for just one journal. Some of the names are fabricated and appear on multiple boards, sometimes the names are slightly different, and sometimes the people are real but do not know that they are listed on these boards. I would like to have a data mining tool that identifies potential board member lists online, extracts names and affiliations, and then attempts to discover patterns and connections using machine learning algorithms. Visualizations of the database with Graphviz would be a very cool plus.
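A first step toward spotting the same (or a slightly different) name on several boards could be fuzzy string matching. A minimal sketch, assuming the board lists have already been scraped into a dictionary; the title list, the 0.9 threshold, and the function names are hypothetical. It uses `difflib.SequenceMatcher` from the Python standard library and compares every pair, which is fine for a sketch but would need blocking (for example by surname initial) at scale.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

TITLES = {"dr", "prof", "professor", "phd"}  # hypothetical list of titles to ignore


def normalize(name):
    """Lower-case, keep only letter runs, and drop academic titles."""
    words = re.findall(r"[^\W\d_]+", name.casefold())
    return " ".join(word for word in words if word not in TITLES)


def cross_board_matches(boards, threshold=0.9):
    """boards: {journal: [member names]}.

    Returns (score, journal_a, name_a, journal_b, name_b) tuples for similar
    names on different boards, highest score first.
    """
    entries = [(journal, name, normalize(name))
               for journal, names in boards.items() for name in names]
    hits = []
    for (ja, na, ka), (jb, nb, kb) in combinations(entries, 2):
        if ja == jb:
            continue
        score = SequenceMatcher(None, ka, kb).ratio()
        if score >= threshold:
            hits.append((round(score, 3), ja, na, jb, nb))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    boards = {"J1": ["Dr. John Smith", "Maria Rossi"],
              "J2": ["John Smith", "Wei Zhang"],
              "J3": ["Jon Smith"]}
    for hit in cross_board_matches(boards):
        print(hit)
```

Each hit is an edge between two boards, so the result maps directly onto a graph that Graphviz could draw.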
- Daimler Mobility
Dr. Emeterio Navarro, who used to teach in the IMI Master's program, is now with the moovel Group GmbH (a subsidiary of Daimler Mobility Services, https://www.moovel.com/en/DE). At the moment, there are three areas in which students could work with them in the team "moovel Global Core":
1) iOS developer for an SDK, with knowledge of software engineering and serious computer games
2) C++ developer with in-depth knowledge and 3D graphics programming
3) Lua developer for text search / information retrieval
They offer both Bachelor's and Master's theses and are also looking for students to work with the company. Please contact me if you do not have Dr. Navarro's contact data.
- Curriculum-based OER guide
In Germany, each state publishes Rahmenlehrpläne: curricula for each subject, grade, and school type that list the learning goals to be achieved during that year. These documents tend to be wordy PDFs that also include long tables. How could they be (easily) transformed into a navigational structure for a Semantic MediaWiki that lets a teacher quickly find materials that are available online and determine their licensing model, in order to see whether they are usable?
- Weka Miner
We are experimenting with the Weka Miner in Semantic Modelling, and I would like someone to use it on the 400 billion results that I have from a collusion detection search. Can you come up with a classification that identifies the positives, i.e. the plagiarism pairs and clusters?
- Citation Miner
A previous master's thesis looked into extracting citation information from scientific papers. I would like to apply this technique to a largish set of papers from different fields in order to train a machine learning system to recognize citations. Then it would be fascinating to run this on a larger set of unknown papers. It would also be interesting to measure how the algorithm fares on dissertations (these are much longer than papers). I would also like to have frequency distributions of the citations prepared in order to identify the unused references ("garnish references") and the most frequently cited ones.
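Once citations have been extracted, the frequency distribution itself is simple counting. A minimal sketch, assuming each bibliography entry and each in-text citation has already been reduced to a shared key (the extraction from the earlier thesis is not reproduced here; `citation_profile` and the key format are hypothetical). Besides the "garnish references", it also reports citations that point to nothing in the bibliography.

```python
from collections import Counter


def citation_profile(reference_keys, cited_keys, top=3):
    """Frequency profile of in-text citations against a bibliography.

    reference_keys: keys of all bibliography entries.
    cited_keys: one key per in-text citation, in order of appearance.
    """
    counts = Counter(cited_keys)
    return {
        "counts": counts,
        # Listed in the bibliography but never cited: "garnish references".
        "unused": [key for key in reference_keys if counts[key] == 0],
        "most_cited": counts.most_common(top),
        # Cited in the text but missing from the bibliography.
        "not_in_bibliography": sorted(set(counts) - set(reference_keys)),
    }


if __name__ == "__main__":
    refs = ["Ada2001", "Bo2010", "Cy2015", "Di2020"]
    cites = ["Bo2010", "Ada2001", "Bo2010", "Ev1999", "Bo2010"]
    profile = citation_profile(refs, cites, top=1)
    print(profile["unused"], profile["most_cited"], profile["not_in_bibliography"])
```

For dissertations the same profile could be computed per chapter to see where the unused references cluster.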
- Data Mining
I am fascinated by the possibilities that data mining offers. I have some ideas for using Hadoop and data mining technologies to find collusion. There are also many other data or text mining topics I can easily be encouraged to advise.
This semester a student managed to get SIM_TEXT, a program and text comparison tool that does a wonderful job of comparing two text files and highlighting their similarities, working as an online tool. There are some fiddly bits that need extending, for example handling multiple copies of text portions, doing a self-comparison, and accepting PDF input (could we use Apache Tika for this?). And while we are at it, I'd like to have the new/old-directory comparison working under a nice GUI. The tool should enable a teacher to set up their own databases, i.e. directories of student papers (such as lab reports), on a per-class basis and then compare new lab reports with all the old ones for the same class. The software needs to be browser-based open software, so that it can be used in schools (which don't have money for expensive software or other systems). All data must be stored only locally on a teacher's computer.
- Java API Finder for beginners
The Java API has all the information you need in it, but it is a horrible mess, overloaded with information that confuses a beginner. And you navigate it by using Google, hoping to hit on fitting search terms. Could there be a better way for beginners? Is it possible to set up a better search and navigation system that you don't have to throw away when the next version of the API appears?
- Collusion Finder
The general case of finding plagiarism on the open internet is a difficult one, but finding collusion (multiple students submitting the same or similar paper) is somewhat easier. There are a number of systems available, but none are easy to use. There is an open algorithm that is relatively good at finding common parts, but it is quite difficult to use and interpret. I would like to have a good interface for this system, and some additional bells and whistles. The software needs to be open software, so that it can be used in schools (which don't have money for systems like this). It should also be possible to integrate it into Moodle, so that all of the solutions to one exercise can be compared with all the others.
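To illustrate the "compare all solutions to one exercise with all others" part, here is a minimal sketch. It is not the open algorithm mentioned above, nor SIM_TEXT's method; it swaps in a common, simple stand-in: word n-gram "shingles" compared with Jaccard similarity. The values n = 5 and threshold = 0.3 and the function names are hypothetical defaults that would need tuning on real submissions.

```python
import re
from itertools import combinations


def shingles(text, n=5):
    """Set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"\w+", text.casefold())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(submissions, n=5, threshold=0.3):
    """submissions: {student: text}. Compare every submission with every other one."""
    prints = {who: shingles(text, n) for who, text in submissions.items()}
    pairs = []
    for a, b in combinations(sorted(prints), 2):
        score = jaccard(prints[a], prints[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


if __name__ == "__main__":
    subs = {"anna": "a b c d e f g h", "ben": "a b c d e f x y", "cem": "p q r s t u"}
    print(suspicious_pairs(subs, n=3))
```

The score only says that two texts share a lot; the interface would still have to show the shared passages side by side so that a teacher can judge them.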
- VroniPlag Wiki Report Generator
I currently produce PDF reports for VroniPlag Wiki semi-automatically with a lot of sleight of hand. I would like to have a system that takes material from a Semantic MediaWiki and produces both a colored PDF and the LaTeX code that can produce the report. A number of experiments have been started, but a user-ready tool has still not emerged. Any takers?
- 3D Printer
I want to play with 3D printers; we have a few on campus. What kinds of interesting applications could we create around a 3D printer?
- Open Research Collaboration Environment
There are so many tools out there, each one great for one small thing, but to help a research group collaborate you need to bolt on so many other things: a wiki, a private file server, an Etherpad for incubating information, a chat for real-time communication, a ticket system, a to-do list, a references database. Is it too much to dream of an integrated system that can do all this and NOT depend on servers in the USA or proprietary tools?
MediaEvent Services is currently developing SlideSync, a self-service platform for streaming live presentations on the web, which is already in use by clients such as Lufthansa. Their toolset includes Ruby on Rails, jQuery, Bootstrap, Amazon Web Services, and Wowza Media Server, and example thesis topics could relate to server-side video and slide processing, analytics, real-time interaction, mobile devices, and scalable server infrastructure. They use an agile development process (Scrum, Gerrit and Cucumber). Christian Becker (phone +49 6441 87087-22, e-mail firstname.lastname@example.org) would be happy to meet with interested students at HTW or at their office in Moabit to discuss the topic further. I will be glad to be the HTW advisor for this topic.
- Blog Publisher
My blog, Copy, Paste & Shake, has been given an ISSN. I would like to have a theme that displays the volume and number for each issue (one issue = one month) and then produces a proper printable version with page numbers, etc., for submission to the German National Library. The publisher should produce PDF or LaTeX code and be adaptable for both WordPress and Google blogs.
- Group Commented Bibliography
There are lots of tools out there that let an individual keep a bibliography with information about literature. But there is no good, free group bibliography, much less one that lets group members comment on the materials.
- Plagiarism in Journalism
Journalists seem to be constantly taking text from each other. Can this be visualized? Can you identify the parts of an online article that come from some other, older online article, and can you map this onto some sort of timeline? This might let us see how many later articles are actually influenced by one important article.
- Animating BlueJ
BlueJ is a fantastic environment for teaching object-oriented programming. I would like to continue using it for algorithms and data structures, but how can I visualize the more complicated data structures? How can the animation of algorithms be included in BlueJ? I would like someone to look into this.
- Searching Project Gutenberg (taken)
I just looked for something in Project Gutenberg and found the navigation and search capabilities sorely lacking. There need to be much better ways of finding specific material, and a tagging system needs to be made for it. The results of this thesis should be donated to Project Gutenberg. This also has to work for people who are using Tor.
Real Old Stuff
With all this data floating around, much of it open, we are now able to produce visualizations that suggest new information or connections to other people. How can we automate the production of such visualizations? Flash is dead, so this has to be in HTML5. There are numerous possible theses here.
- The Wikipedia Admin Game
The admins in the German Wikipedia see themselves as warriors, fighting against the evil bands of trolls that try to deface the Wikipedia. In this game, you are an admin who is trying to save the Wikipedia from fake edits, edit wars, teenagers, and the like. School lets out, and you have to be on your guard, as the bored kids fire up their computers and see what they can deface today. Are you quick enough to keep the Wikipedia running?
I need a tool that shows me what a Wikipedia entry (and every page I follow links to from it) looked like at a particular date and time, a sort of Wayback Machine for the Wikipedia. Then, when people say "The Wikipedia said thus and such on day X", I can check it out. You will use the available history data to determine what the page actually displayed on that day. When I follow links with your tool, I should see the linked pages as they were at that same time.
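The heart of such a tool is answering "which revision was live at time T?". A minimal sketch, assuming the page history has been loaded as (timestamp, revision id) pairs sorted oldest first; `revision_at` and `revision_query_url` are hypothetical names. The URL builder uses the MediaWiki API's `prop=revisions` module with `rvdir=older`, which starts at `rvstart` and walks backwards, so `rvlimit=1` returns the newest revision at or before that moment. Timestamps are treated as UTC, which is what the API uses.

```python
from bisect import bisect_right
from datetime import datetime
from urllib.parse import urlencode


def revision_at(history, when):
    """history: list of (timestamp, revid), sorted by timestamp ascending.

    Returns the revid that was live at `when`, or None if the page did not exist yet.
    """
    times = [ts for ts, _ in history]
    i = bisect_right(times, when)  # revisions saved exactly at `when` count as live
    return history[i - 1][1] if i else None


def revision_query_url(title, when, lang="de"):
    """MediaWiki API query for the newest revision of `title` at or before `when` (UTC)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvstart": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rvdir": "older",
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    history = [(datetime(2017, 1, 1), 10), (datetime(2017, 6, 1), 20),
               (datetime(2017, 7, 17, 22, 0), 30)]
    print(revision_at(history, datetime(2017, 7, 17, 22, 21)))
    print(revision_query_url("Plagiat", datetime(2017, 7, 17, 22, 21)))
```

Following links "in the past" means repeating the same lookup for each linked title with the original timestamp, not the current one.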
- Moodle Book
Moodle Book is quite a good authoring system, but there are problems. Some were identified in a thesis done last semester; others are my personal pet peeves. So I want you to upgrade the Moodle Book module to make it more useful for real-life E-Learning! For example, subchapters should remain collapsed until their chapter is opened, instead of all subchapters opening at the same time. This will involve using PHP and CSS.
- SCORM Test
SCORM is the standard for the import and export of E-Learning materials. Except that it doesn't work. Everyone can import the SCORM that *they* export, but not necessarily the work of others. In this thesis I want the author to take a number of different kinds of E-Learning materials, export them from CLIX, Blackboard, Moodle, etc., and import them into other systems. I want a module for Moodle to be set up that can cope with all of the different SCORM problems.
Really old stuff
- Wikipedia User Survey
Research Network wants to conduct a user survey. I would like a thesis to conduct a pilot survey, including constructing the survey software, which will have to be easily localizable into many languages.
- Diff for InDesign
In Wikis or in programming it is trivial to find the difference between two texts and report on it. It is also easy to record the history of edits and set up "reverts". InDesign also has a history, but I am not aware of any way to analyze the difference between two projects, save snapshots of the history, and revert changes back to a checkpoint. I would like someone to investigate making a diff for InDesign (or Photoshop).
There are any number of possible enhancements for Wikipedia, perhaps working through a tagging mechanism for articles. You must be willing to have your results subject to the GNU General Public License.
- Wiki RSS-Reader
One can set a watchlist for single pages in the Wikipedia, and other Wikis offer a "last changes" RSS feed. But I have to visit all of these Wikis in order to see if anything I am interested in has changed. I would like to have a tool, similar to Awasu, that sucks in the RSS feeds, lets me know what is new, and, if I click on a link, puts me in direct contact with that particular Wiki. I want this tool to work with many different kinds of Wikis and to cope with all sorts of RSS descriptions.
- Wikis for Teaching
Wikis seem to me to be an ideal basis for constructing a learning management system. Can a synchronous service, such as the chat system in the Flash Communication Server, be integrated into a Wiki-based collaboration and annotation system? What does a Wiki for teaching really need?
- What else interests me? E-Learning, Web 2.0, social computing, gender questions, e-voting, privacy.
I only take about 6-8 thesis students, as I want to have the time to advise you properly. Please ask as soon as possible if you are interested in any of the topics. I will also sign for 3-4 external projects, but cannot offer to advise you. I no longer take reservations; it is first come, first served. I find it irritating when people "reserve" a position and then give it up at the last minute. I pay for this by having to teach an extra class every few semesters to make up for the negative teaching hours I get assigned if I do not take enough students.
Students who do their thesis with me should expect to come to my office every week with results from the previous week. We will work on questions as a group, and we will discuss in the group what you bring with you for us to read. It is hard work, but it tends to bring results, i.e. a finished thesis within the deadline.
I have some links about topics having to do with writing a thesis.
Last change: 2017-07-17 22:21