Projects

Sears Holdings Corporation Internship

Monitoring Framework

This project developed a Java filter class to replace the original Spring interceptor class that monitored the merchandise pick-up (MPU) web application.  The goal was to write a monitoring class that does not depend on the Spring framework, so that non-Spring web applications can also integrate with it and use it.

The diagram below briefly illustrates the whole process:
java_filter_class_diagram

The Java filter class monitors web transactions and provides the following functionality:

  • Monitor incoming requests from clients and outgoing responses from the MPU server
  • Log transaction order information, including
    • Date & Time
    • URL
    • Request processing time
    • etc.
  • Aggregate transaction information periodically
  • Send the aggregated transaction data to data visualization tools such as Graphite
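
The aggregation and reporting steps in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original class: the names `TransactionAggregator`, `record`, and `flush`, the metric names, and the URL sanitizing rule are all hypothetical. In a servlet filter, `record` would typically be called after `chain.doFilter(...)` returns. The output lines follow Graphite's plaintext protocol, which expects one `<metric path> <value> <timestamp>` entry per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator: collects per-URL request timings and renders
// them as Graphite plaintext-protocol lines ("path value timestamp").
class TransactionAggregator {
    // Running totals for one URL within the current aggregation window.
    private static final class Stats {
        long count;
        long totalMillis;
    }

    private final Map<String, Stats> statsByUrl = new TreeMap<>();

    // Called once per completed request with its processing time.
    synchronized void record(String url, long elapsedMillis) {
        Stats s = statsByUrl.get(url);
        if (s == null) {
            s = new Stats();
            statsByUrl.put(url, s);
        }
        s.count++;
        s.totalMillis += elapsedMillis;
    }

    // Called periodically: emits a request count and a mean latency per URL,
    // then clears the window so the next period starts from zero.
    synchronized List<String> flush(String prefix, long epochSeconds) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Stats> e : statsByUrl.entrySet()) {
            String path = prefix + "." + sanitize(e.getKey());
            Stats s = e.getValue();
            lines.add(path + ".count " + s.count + " " + epochSeconds);
            lines.add(path + ".mean_ms " + (s.totalMillis / s.count) + " " + epochSeconds);
        }
        statsByUrl.clear();
        return lines;
    }

    // Graphite treats '.' as a path separator, so other characters in the
    // URL are collapsed to underscores (an assumed naming rule).
    private static String sanitize(String url) {
        String cleaned = url.replaceAll("[^A-Za-z0-9]+", "_");
        return cleaned.replaceAll("^_+|_+$", "");
    }
}
```

The returned lines could then be joined with newlines and written to a socket connected to Graphite's Carbon listener; that part is omitted here to keep the sketch self-contained.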

Selenium Web Automation Testing Framework

This project implemented a test automation framework for MPU-related web applications.  The framework was designed to simulate human interaction with four different web applications, so testers can compose sequences of framework method calls into test cases that validate the behavior of the target web applications.

The tools involved in this project are listed below:

Tool                 Version  Usage
Selenium WebDriver   2.46.0   Webpage manipulation
TestNG               6.9.6    Creating test cases
Log4j 2              2.3.0    Logging debug information
Monte media library  0.7.7    Video recording test execution
Spring               4.1.7    Code decoupling
Maven                3.3.3    Project management
Ant                  1.9.5    Building JavaDoc

The framework supports the following functionality:

  • Simulate human interaction with every web page in the target web applications
  • Support Chrome and Firefox
  • Log debug information (to file/console)
  • Record video of test case execution
  • Generate test reports and statistics automatically

Data Mining

Sentiment Analysis

In this project, we predicted the sentiment value of movie reviews on the Rotten Tomatoes website.

We took our data from Kaggle. The data is composed of a training set, a test set and a validation set. We had two options for feeding it to classifiers:

  • Assume that words are independent and simply count the frequency of each word in each document
  • Make use of structural information such as word order and build a syntax tree for each document

The first option discards the structural information of documents, but it is fast to train. Here the data is in a bag-of-words representation, where each row denotes a document and each column holds the frequency of one word. The second option usually gives more accurate results, but it is slow to train. Here the data takes the form of a syntax tree, where the leaves are single words, the internal nodes are phrases, and the root is the entire document. In the project, we tried both strategies.
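
As a rough illustration of the bag-of-words representation, the sketch below builds a document-term count matrix from raw strings. The class name `BagOfWords` and its tokenizer (lowercase, split on anything that is not a letter) are assumptions made for this example; the pre-processing we actually used is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Builds a document-term count matrix: one row per document, one column per word.
class BagOfWords {
    final List<String> vocabulary; // column i counts vocabulary.get(i)
    final int[][] counts;          // counts[doc][word]

    BagOfWords(List<String> documents) {
        // First pass: tokenize and collect the distinct words in sorted order.
        List<String[]> tokenized = new ArrayList<>();
        TreeSet<String> words = new TreeSet<>();
        for (String doc : documents) {
            String[] tokens = tokenize(doc);
            tokenized.add(tokens);
            for (String t : tokens) {
                words.add(t);
            }
        }
        vocabulary = new ArrayList<>(words);
        Map<String, Integer> column = new HashMap<>();
        for (int i = 0; i < vocabulary.size(); i++) {
            column.put(vocabulary.get(i), i);
        }
        // Second pass: count occurrences per document; word order is discarded.
        counts = new int[documents.size()][vocabulary.size()];
        for (int row = 0; row < tokenized.size(); row++) {
            for (String t : tokenized.get(row)) {
                counts[row][column.get(t)]++;
            }
        }
    }

    // Simplistic tokenizer (an assumption): lowercase, split on non-letters.
    static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().replaceAll("[^a-z]+", " ").trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split(" ");
    }
}
```

For the two documents "Good movie, good!" and "Bad movie.", the vocabulary is `bad`, `good`, `movie` and the rows are `[0, 2, 1]` and `[1, 0, 1]`.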

We pre-processed the data into the bag-of-words representation and applied several classifiers, including naive Bayes, KNN, random forest, adaptive boosting, SVM, and logistic regression. The best result was achieved by the naive Bayes classifier, with an accuracy of 42% for 5-class and 78% for 2-class classification.
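
To show how a naive Bayes classifier works on word counts, here is a minimal sketch. It assumes the multinomial variant with add-one (Laplace) smoothing, which is one common choice for text and not necessarily the exact configuration used in the project. The class name `NaiveBayes` and the toy labels in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal multinomial naive Bayes over word counts with add-one smoothing.
class NaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Adds one labeled, already tokenized document to the counts.
    void train(String label, String[] tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            wordsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // Returns the label maximizing log P(label) + sum of log P(word | label).
    String predict(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerClass.getOrDefault(label, 0);
            for (String t : tokens) {
                int c = counts.getOrDefault(t, 0);
                // Add-one smoothing keeps unseen words from zeroing the score.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Trained on a handful of toy reviews labeled `pos` and `neg`, `predict` picks the class whose word distribution best explains the tokens. Working in log space avoids numeric underflow on long documents.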

To capture the structural information of documents, we applied a recursive neural tensor network (RNTN) to our data set and obtained an accuracy of 44% for 5-class and 84% for 2-class classification.

We learned a lot from the project, including:

  1. Perform feature selection or extraction before feeding data to classifiers when the number of features is huge. If the number of features is greater than the number of training documents, the model may overfit. Reducing the data dimensionality also speeds up training.
  2. Try several classifiers for a given problem. It is often hard to tell intuitively which classifiers will do well or poorly on a given problem, so it is worth trying as many as you can. You may not want to try all of them, though, because some are unlikely to beat what you have already chosen; for example, bagging is no better than random forest. Sometimes a little analysis also shows that a classifier will not perform well on a given problem: in our project we did not choose Gaussian naive Bayes because the assumption it makes does not apply to our problem.
  3. To speed up the work, tune the parameters that influence accuracy the most before the less important ones.
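
As one simple example of the feature selection mentioned in item 1, the sketch below drops words that appear in too few documents. Document-frequency pruning is used here only for illustration and is an assumption, not a record of the method used in the project. The names `FeatureSelection`, `selectByDocumentFrequency`, and `project` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FeatureSelection {
    // Returns the indices of columns whose word occurs in at least minDocs documents.
    static List<Integer> selectByDocumentFrequency(int[][] counts, int minDocs) {
        List<Integer> kept = new ArrayList<>();
        int columns = counts.length == 0 ? 0 : counts[0].length;
        for (int col = 0; col < columns; col++) {
            int docFrequency = 0;
            for (int[] row : counts) {
                if (row[col] > 0) {
                    docFrequency++;
                }
            }
            if (docFrequency >= minDocs) {
                kept.add(col);
            }
        }
        return kept;
    }

    // Projects the count matrix onto the kept columns, reducing its width.
    static int[][] project(int[][] counts, List<Integer> kept) {
        int[][] reduced = new int[counts.length][kept.size()];
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < kept.size(); c++) {
                reduced[r][c] = counts[r][kept.get(c)];
            }
        }
        return reduced;
    }
}
```

Rare words contribute many columns but little signal, so pruning them shrinks the matrix that every classifier then has to train on.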

The code can be found here.

Software Verification & Validation

MDE using Alloy

This project was inspired by our final project in the Automated Software Design class. The goal was to replace Prolog with Alloy and compare the expressiveness of the two languages. The project is visualized below:

meta_model_alloy

Alloy is a modeling language designed for expressing data structures and constraints. We think that UML diagrams can be treated as data structures, and that was the original motivation for our project.

As in the Automated Software Design class project, we defined meta-models for finite state machines and class diagrams in Alloy. We also implemented model-to-model, model-to-text and text-to-model transformations in a different way. We took advantage of Alloy's instance generation feature and found that the constraints in our previous MDE project were incomplete; in fact, many constraints were missing. So we built a tool to detect missing constraints and check the validity of constraints.

We've learned many things in this project:

  1. We compared Alloy and Prolog with respect to their ease of use in modeling. Our observation is that Alloy is more expressive in terms of writing constraints. For example, Alloy has transitive closure, which is equivalent to recursion in Prolog. Recursive programs in Prolog are not easy to write, so we think Alloy is a candidate database language for MDE.
  2. We analyzed the MDE process and concluded that the most difficult part for MDE developers is writing a complete list of constraints that each model should conform to. Our tool can help MDE developers find missing constraints. It can also help MDE users detect constraints that a model violates.
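
To make the transitive-closure comparison in item 1 concrete: in Alloy, `^r` denotes the transitive closure of a relation `r`, whereas in Prolog the same relation is usually written as a recursive rule. The Java sketch below computes the equivalent result iteratively for a state-transition relation. The class `TransitiveClosure` and the example state names are made up for illustration and do not come from the project code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TransitiveClosure {
    // Returns every node reachable from start by following one or more edges.
    static Set<String> reachable(Map<String, Set<String>> edges, String start) {
        Set<String> seen = new TreeSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String node = frontier.remove();
            for (String next : edges.getOrDefault(node, Collections.<String>emptySet())) {
                // seen.add returns false for already visited nodes, so cycles terminate.
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return seen;
    }
}
```

With edges `idle -> running` and `running -> done`, `reachable(edges, "idle")` contains `running` and `done`. The start node itself is included only if it lies on a cycle, which matches the "one or more steps" meaning of transitive closure.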

Our project code is available here.

Automated Software Design

MDE

Model Driven Engineering (MDE) is a software development methodology that focuses on creating and exploiting domain models. In this project, we mainly dealt with finite state machines (FSMs) and class diagrams. The goal was to build a bridge that helps UML designers who use different UML tools validate their diagrams and transform the code representations of UML diagrams between tools.

A brief visualization of our tool is shown below:

meta_model_before

To begin with, we defined meta-models for both finite state machines and class diagrams and used Prolog to store the meta-models and models. We then implemented parsers for model-to-model, model-to-text and text-to-model transformations (shown as arrows in the diagram above). In the final step, we abstracted the meta-models for both finite state machines and class diagrams into a higher level (a meta-meta-model), so that we could automatically generate parts of those parsers.
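
The idea of a model, a conformance constraint, and a model-to-text transformation can be sketched in a few lines. Everything below is illustrative: the class `FsmModel`, the single constraint it checks, and the `state(...)` / `transition(...)` fact names are assumptions, not the schema used in the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory FSM model with one conformance check and a
// model-to-text transformation that emits Prolog-style facts.
class FsmModel {
    final List<String> states = new ArrayList<>();
    final List<String[]> transitions = new ArrayList<>(); // {from, event, to}

    FsmModel state(String name) {
        states.add(name);
        return this;
    }

    FsmModel transition(String from, String event, String to) {
        transitions.add(new String[] {from, event, to});
        return this;
    }

    // Constraint: every transition must start and end at a declared state.
    List<String> violations() {
        List<String> errors = new ArrayList<>();
        for (String[] t : transitions) {
            if (!states.contains(t[0])) {
                errors.add("undeclared source state: " + t[0]);
            }
            if (!states.contains(t[2])) {
                errors.add("undeclared target state: " + t[2]);
            }
        }
        return errors;
    }

    // Model-to-text: one fact per state and per transition (fact names are made up).
    String toFacts() {
        StringBuilder out = new StringBuilder();
        for (String s : states) {
            out.append("state(").append(s).append(").\n");
        }
        for (String[] t : transitions) {
            out.append("transition(").append(t[0]).append(", ")
               .append(t[1]).append(", ").append(t[2]).append(").\n");
        }
        return out.toString();
    }
}
```

A model built with `new FsmModel().state("idle").state("running").transition("idle", "start", "running")` has no violations, while adding a transition to an undeclared state is reported by `violations()`.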

The key things I learned in this project include:

  1. Model abstraction. The project gave us an idea of how to define meta-models for a given model.
  2. Automated program generation. We learned the basic idea of MDE and automated software design, which is often driven by databases. In our case, the data was stored in Prolog, and we used that data to generate Java code.

The project code can be found here.