comp730:MeAndYou Project Software Notes



The COMP730 Software Group was created to bridge the gap between the Database and User Interface groups for the MeAndYou Project. It is responsible for the code that connects the database to the front end, as well as the algorithm that determines the matching quality of searches. The group was formed when the demo team first built the project in Spring 2016.

Diagrams
A. Search finds top hits (1-n).

B. Those top hits now need to see if they have a search that finds the searcher.

Link to UML diagrams: Google Drive Folder

Pseudo Code of the Agent
1. Query the Database to access the unprocessed queries

2. For each unprocessed search

- store list of attributes
- get users that match gender into memory
- filter list down to users that have a score > .75 for the first name
- filter list down to users that have a score > .75 for the last name

3. Now have the short list of Users

- compare all remaining attributes
- get the top two(?) attributes
- throw out user if attributes don't meet threshold

4. Short Candidate List

- For each person in CandidateList
  - For each attribute in MySearchAttributeList (the optional ones 2..n)
    - compare each attribute value with the Candidate's information
      - this data may be in the Person record, or email table, or phone table etc.
    - Score each attribute for the person.
      - this may entail the use of a weight (importance of the attribute from the Attribute Type table) * the score
  - End
  - Sum the Candidate's score / weight = total score
  - if score < crush_min then discard, else keep with score
- End

5. Get top 10 candidates based on the total score

- For each candidate in top 10 list
  - For each Search in getSearches(personId) [Their Searches]
    - For each attribute in Search
      - evaluate each attribute and score
    - End
    - Save the score
  - End
- End

6. Get the highest score. This search is the best match for our search.

7. Send Email Notification
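
Sketch of steps 4 and 5. The Python below is only a minimal illustration (the agent itself is written in C#). It assumes the total score is the weighted average sum(weight * score) / sum(weight), which is one reading of "Sum the Candidate's score / weight". The names ScoredAttribute, total_score and top_candidates are hypothetical, and so is the CRUSH_MIN value of 0.75: the notes name crush_min but do not give its value.

```python
from dataclasses import dataclass

CRUSH_MIN = 0.75  # assumed cutoff; the notes do not state the real value


@dataclass
class ScoredAttribute:
    name: str
    score: float   # similarity of search value vs. candidate value, 0..1
    weight: float  # importance of the attribute (Attribute Type table)


def total_score(attributes):
    """Weighted average of the attribute scores (assumed formula)."""
    weight_sum = sum(a.weight for a in attributes)
    if weight_sum == 0:
        return 0.0
    return sum(a.weight * a.score for a in attributes) / weight_sum


def top_candidates(candidates, crush_min=CRUSH_MIN, limit=10):
    """candidates maps a candidate id to its list of ScoredAttribute.

    Candidates below crush_min are discarded; the rest are returned
    highest score first, capped at `limit` (step 5 uses 10).
    """
    scored = [(cid, total_score(attrs)) for cid, attrs in candidates.items()]
    kept = [(cid, score) for cid, score in scored if score >= crush_min]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

With an email weight of 3 and a phone weight of 1, a candidate scoring 1.0 on email and 0.5 on phone gets (3 * 1.0 + 1 * 0.5) / 4 = 0.875 and is kept.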

REQ: The personal attributes need to be submitted before the Search is entered. So if a user wants to do a search on Crush, they need to submit enough attributes to be found by a crush. This will allow us to do a search when the personal attributes change.

OPEN ISSUE: When a user changes their own attributes (adds, edits, can ignore deletes), we need to do a reverse search. This will allow us to find matches even after the search has been completed.

OPEN ISSUE: Do we need to re-run searches as more people add their information or will a new search prompt a reverse search?

REQ: The agent will monitor the Search table for new or modified records. This will be indicated by a status of Unprocessed.

REQ: If a Search is updated, set the Status of the Search to "Unprocessed".

REQ: If a user's attributes are updated, then set timestamp of profile. Should we do a search on the profile?

Scenario: User A searches for User B. The search does not find User B. User B then updates their data attributes. Since User A's search has already been performed, it should be run again using User B's updated data.

May need to add a status to the User Attributes.

REQ: The agent will select a search by the FIFO method (First in, First out).
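
A minimal sketch of FIFO selection, assuming the Search table carries a status and a last-updated timestamp. It uses Python with an in-memory SQLite database purely for illustration (the project uses MySQL and C#). The table name search, the columns id, status and updated_at, and the "Processed" status value are assumptions, not the real schema.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory stand-in for the Search table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " updated_at TEXT NOT NULL)"  # ISO 8601 text sorts chronologically
    )
    return conn


def next_unprocessed(conn):
    """Return the id of the oldest Unprocessed search, or None (FIFO)."""
    row = conn.execute(
        "SELECT id FROM search WHERE status = 'Unprocessed' "
        "ORDER BY updated_at, id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_processed(conn, search_id):
    """Flag a search as done so it is not picked up again."""
    conn.execute(
        "UPDATE search SET status = 'Processed' WHERE id = ?", (search_id,)
    )
```

Resetting a search's status to Unprocessed and refreshing its timestamp when it is edited would put it back at the end of the queue.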

REQ: The agent will query the attributes table for exact matches between the search criteria and the user profile attributes.

REQ: If a match is found, the Search Id, and the found User Profile Id will be written to the Matches table.

REQ: Nicknames: Use name lookup tables to handle alternate names. For example, Jon Smith’s first name may be Jon, John or Jonathan. It may be possible to use Soundex, a Nearest Neighbor algorithm or another pattern matching program to find similar names. A lookup table would be difficult to maintain.
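
To show what Soundex would and would not catch for the Jon / John / Jonathan example, here is a minimal Python sketch of standard American Soundex. It is an illustration only and not part of the agent; the function name soundex is hypothetical.

```python
# Letter groups of American Soundex; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]
```

Jon and John both encode to J500, but Jonathan encodes to J535, so Soundex alone would not link Jon to Jonathan. Nicknames of that kind would still need a lookup table or a different similarity measure.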

OPEN: What if it finds more than 1 match?

OPEN: Use weighted results for all fields (Name, Phone, Email) in table?

- Use combination of match accuracy for each field together to determine match.
- Set a percent threshold to determine if match table for users should be updated.
- Order match tables to handle greater than 5 matches for users.
- If percent is too low then a lower match can be used and order table with highest first.

Engine
The engine is responsible for making the calls in order to find matches for unprocessed searches. It is dependent on services that perform the lower level work, which are injected upon initialization. This pattern of dependency injection should allow us to add or swap out any of the engine dependencies as the project advances. The engine is also responsible for opening and closing the connection to the database. This way it can leave the connection open in order to execute batch queries. For the specifics, you can reference the comments in the MatchEngine.cs. The functionality of the engine can be broken down into multiple steps. Those steps are as follows:




 * 1. Query database for list of unprocessed searches
 * 2. foreach unprocessed search
 * 3. Retrieve the attributes that the search is looking for
 * First Name, Last Name, Gender, etc...
 * 4. Query database for list of users that match the gender we are looking for
 * This allows us to cut the amount of users in half right away
 * 5. Filter list of users by first name using nearest neighbor algorithm
 * If the first name distance doesn't meet the required threshold, the user is removed from the list
 * If the first name matches, keep track of the score
 * 6. Filter list of users by last name using nearest neighbor algorithm
 * If the last name distance doesn't meet the required threshold, the user is removed from the list.
 * If the last name matches, keep track of the score
 * At this point, we should have a small list of users (or none), which we will refer to as our top candidates
 * 8. foreach candidate
 * 9. Retrieve a list of their optional attributes
 * 10. Compare each attribute to the attributes of the searcher
 * 11. Take the top two attribute scores and add to the existing score
 * 12. See if the total score meets the threshold
 * If not, remove the candidate from the list
 * At this point, we should have an even smaller list of candidates
 * 13. Retrieve a list of searches for each candidate
 * 14. foreach candidate search
 * 15. See if the gender they are searching for matches the gender of the original searcher
 * 16. Repeat steps 5-12 to see if the original searcher's attributes meet the criteria of the candidates search
 * If not, remove the candidate from the list
 * At this point, we should have a two-way match, if any
 * 17. Multiply the two scores together to come up with a final score for the match
 * 18. Insert the match into the database
 * 19. Notify each user that a match has been made
 * 20. Searching is done now, run whatever cleanup is required
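
The two-way check in steps 4-6 and 13-17 can be sketched as follows. This is a simplified Python illustration, not the code in MatchEngine.cs. It leaves out the optional-attribute scoring of steps 8-12, it averages the first and last name scores (the notes do not say how the scores are combined), and it reuses the 0.75 name threshold from the agent pseudo code. The distance function is passed into the constructor to mirror the dependency injection described above. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

NAME_THRESHOLD = 0.75  # from the agent pseudo code: score > .75


@dataclass
class Person:
    person_id: int
    first_name: str
    last_name: str
    gender: str


@dataclass
class Search:
    searcher_id: int
    first_name: str
    last_name: str
    gender: str


def exact_distance(a: str, b: str) -> float:
    """Debug-style distance: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


class MatchEngine:
    def __init__(self, distance: Callable[[str, str], float]):
        self._distance = distance  # injected, so it can be swapped out

    def one_way_score(self, search: Search, person: Person) -> Optional[float]:
        """Steps 4-6: gender filter, then first and last name thresholds."""
        if search.gender != person.gender:
            return None
        first = self._distance(search.first_name, person.first_name)
        if first <= NAME_THRESHOLD:
            return None
        last = self._distance(search.last_name, person.last_name)
        if last <= NAME_THRESHOLD:
            return None
        return (first + last) / 2  # assumed way of combining the scores

    def two_way_score(self, search: Search, searcher: Person,
                      candidate: Person,
                      candidate_searches: List[Search]) -> Optional[float]:
        """Steps 13-17: the candidate must also be searching for the searcher."""
        forward = self.one_way_score(search, candidate)
        if forward is None:
            return None
        reverse_scores = [
            score
            for score in (self.one_way_score(s, searcher)
                          for s in candidate_searches)
            if score is not None
        ]
        if not reverse_scores:
            return None
        return forward * max(reverse_scores)  # step 17: multiply the scores
```

Swapping exact_distance for a Jaro-Winkler function changes the matching behaviour without touching the engine class.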

Configuration Manager
Represents a list of configurations required by the app. Currently these are hard-coded values but eventually we may want to load these values in from an external source.

Exact Distance
Class able to calculate the distance between two strings using an exact string compare. We aren't actually using this class in the engine implementation, but it is useful for debugging.

Jaro Winkler Distance
Class able to calculate the distance between two strings using the Jaro-Winkler Distance algorithm. This returns a value from 0 to 1 depending on how closely the two strings match: 0 indicates that there are no similarities and 1 indicates an exact match.
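
For reference, a compact Python sketch of the Jaro-Winkler similarity is given below. It is not the project's C# class. It uses the commonly cited defaults (prefix scale 0.1, at most 4 prefix characters), which are assumptions here, and the function names jaro and jaro_winkler are hypothetical. The comparison is case sensitive, so callers would normally normalise case first.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    seq1 = [s1[i] for i in range(len1) if matched1[i]]
    seq2 = [s2[j] for j in range(len2) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)
```

For the nickname example used elsewhere in these notes, jaro_winkler("Jon", "John") comes out at about 0.93, which would pass a 0.75 name threshold.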

Notify Manager
The class responsible for sending notifications to users. Currently, notifications are sent by email.

Mail Manager
Class that's able to send email messages over SMTP.

Query Manager
Class to hold all of the queries against the Database. This allows us to store all of the queries that we are going to make in one spot. Other dependencies use this class to search and update the database.

Search Manager
Class responsible for performing searches against the database.

Interface Data Converter
Represents the shell of a data converter. All data converters must implement this interface. This allows us to seamlessly convert MySQL data to usable C# objects.
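
The converter pattern can be sketched as below. This is a Python illustration only; the real interface and converters are C#. It assumes each database row arrives as a mapping of column name to value. The column names id, first_name, last_name and gender, and the class names DataConverter and PersonDataConverter, are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Mapping, Sequence


class DataConverter(ABC):
    """Shell that every converter implements: rows in, objects out."""

    @abstractmethod
    def convert(self, rows: Sequence[Mapping[str, Any]]) -> List[Any]:
        ...


@dataclass
class Person:
    person_id: int
    first_name: str
    last_name: str
    gender: str


class PersonDataConverter(DataConverter):
    """Turns raw rows into Person objects (assumed column names)."""

    def convert(self, rows):
        return [
            Person(
                person_id=int(row["id"]),
                first_name=str(row["first_name"]),
                last_name=str(row["last_name"]),
                gender=str(row["gender"]),
            )
            for row in rows
        ]
```

Because callers depend only on the convert method, a converter for a new table can be added without changing the code that runs the queries.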

Attribute Data Converter
Class that's capable of converting database data to attribute objects.

Match Data Converter
Converts the data into a list of matches.

Person Data Converter
Class that's capable of converting database data to person objects.

Search Data Converter
Class that's capable of converting database data to searchItem objects.

User Notification Data Converter
Converts the data into a list of user notifications.

Data Access Manager
Class that contains all of the logic for connecting to the database.

MySQL Data Reader
Wrapper for the MySQL data reader.

Security
REQ: The agent will have access to read the database tables that contain the searches and the attributes.

REQ: The agent will have access to read and write the matches table.

Scheduling
REQ: The agent will be scheduled to run continuously. As needed, additional agent threads can be started to process requests in parallel.


 * Using the built in Task Scheduler in Windows 2008
 * Runs Daily, every 10 minutes

Non-Functional Requirements
C# language will be used to implement Matching Engine

The search program will have several layers; the initial plan is three levels, for names, phone and email.


 * • Usability: The MeAndYou system must be a user-friendly web application that is easy to learn and operate.


 * Command line application. Runs from the Windows command prompt or through a scheduled service.


 * • Reliability: The MeAndYou system must be able to perform its functions for a specified period of time.


 * Exception handling needs to terminate searches that are causing errors.


 * • Performance: The MeAndYou system must be built using the most effective, up-to-date web development utilities in the UI, Match Engine, and database.


 * The speed of the CRUSH search for a dataset of 20,000 is < 3 seconds.
 * As the database grows, performance will suffer. Application will need to work as a distributed task, and allow multi-threading.


 * • Supportability: The MeAndYou system must have high adaptability and maintainability.


 * Installation
 * Need to turn off the Scheduler when deploying to allow the files to be copied over.
 * Need to schedule deployments to off hours. Recommend early morning or develop 2 scheduled agents.


 * Installation Instructions
 * Agent Software: MeAndYouAgent.exe

The agent software is a command line application that is executed on the server. It simply queries the MySQL database (meandyou2) and processes search records that have a status of InComplete.


 * Pre-requisites/Required Tools:
 * C# Compiler – Visual Studio 2015 or 2017 will work.
 * MySQL Database on the same server
 * SMTP Mail relay available

  
 * App.Config
 * In the application Configuration file, set the following if needed:


 * If a Mail Relay server exists, then uncomment the setting after PROD and set the proper host. If a relay does not exist, then you can use a folder for email notifications (the DEV setting)

     

Login is .\Administrator, with the admin password mj.US730
 * To Install:
 * In VS 2015, change the Configuration to Release, and click on Build – Build Solution
 * This will compile the executable and place it and its supporting files into the bin\release directory.
 * Copy these files to the lamp.unh.edu servers. \\lamp.unh.edu\c$\Software\Program

The Agent needs to run every 10 minutes. In order for this to happen, you need to schedule the agent in Windows Task Scheduler.
 * Scheduling the Agent
 * Steps:
 * From the Start button, search for “Scheduler” or click on Administration Tools – Task Scheduler
 * Click on Task Scheduler Library
 * Right Click and choose “Create Basic Task”
 * Enter the Name: MeAndYou Agent
 * Enter a Description of the agent.
 * Click NEXT
 * Choose Daily for Trigger
 * Click NEXT
 * Choose Start a Program, for Action
 * Click NEXT
 * Browse to c:\Software\Program\MeAndYouAgent.Exe
 * Click NEXT and FINISH
 * To make it run every 10 minutes, right click on the entry and choose Properties
 * Click on the Trigger tab, Change “One Time” to “Daily”, Recur every, 1 days.
 * Check off the Repeat Task Every, and then enter “10 minutes”, for the duration of “12 hours”
 * Check off “Stop Task if it Runs longer than”, “1 Day”
 * Click on OK

Open Note:


 * • User’s entries are validated in UI before being encrypted and transmitted to the database or matching engine.


 * • The Match Engine does not perform any data validation; it assumes that all data received from the UI and the database is valid.

Features to be added
Up to date as of May 2017.


 * Add Old friend or buddy functionality.
 * Add Out-of-contact or lost family member functionality.
 * Add Lost or estranged loved one functionality.
 * Add check to make sure "lost loves" don't match "lost family member" searches.
 * Modify(Add) weights of matches for better or more accurate matching.
 * Weights should not be binary. They should intelligently determine how close a person is.
 * Important weights would be: eMail, Name, or other very personal information to avoid false matches.
 * SVN as source control, but could be improved. (add branching, merging, release versions)
 * Error handling in the engine to check for corrupt data being entered.
 * Add ability to scale up database access. Currently pulls by gender (50% of database). Should be something like "pull 10,000, then pull another 10,000".
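
One way to "pull 10,000, then pull another 10,000" is keyset paging on the primary key, sketched below. This is a Python illustration against SQLite (the project uses MySQL and C#). The table name person, its columns, and the function name iter_person_batches are assumptions, not the real schema.

```python
import sqlite3


def iter_person_batches(conn, gender, batch_size=10_000):
    """Yield lists of (id, first_name, last_name) rows, batch_size at a time.

    Each query resumes after the last id already seen, so no batch has to
    skip over earlier rows and memory use stays bounded by batch_size.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, first_name, last_name FROM person "
            "WHERE gender = ? AND id > ? ORDER BY id LIMIT ?",
            (gender, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
```

The engine could then run its name filters on one batch at a time instead of loading every user of a given gender into memory at once.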

Prior Class Notes (Pre 1/1/2017)

 * Start at fixed time (once a day) or interval (6 hours since last) to run program.


 * Use weighted results for all fields (Name, Phone, Email) in table. Use combination of match accuracy for each field together to determine match.  Set a percent threshold to determine if match table for users should be updated.  Order match tables to handle greater than 5 matches for users.  If percent is too low then a lower match can be used and order table with highest first.


 * Note for GUI team: Only check match table at login for matches.


 * Use the timestamp from the database to determine new or updated users to search against previous searches. This includes users who have made changes and had been part of previous searches.  New user needs to adjust to old table?


 * Load data in ‘chunks’ to data structures in memory. Chunks could apply to both newer tables and the tables searched previously.


 * Consider using name lookup tables to handle alternate names. For example, Jon Smith’s first name may be Jon, John or Jonathan [ update: this may be better for first letters only ].


 * Java language will be used for search program.


 * The search program will have several layers; the initial plan is three levels, for names, phone and email.


 * Pseudo code of initial flow:


 * Get list of updated tables


 * While updated tables not done:


 * Load batch of updated tables


 * While all old tables not searched:


 * Load batch of old tables


 * Perform matching algorithm (this will expand)


 * For one updated user table match to batch of old tables


 * For each field, prepare and perform near match
 * Compile results to determine if match
 * Update match tables


 * Commit to database before next batch (note – add to old table load?)


 * Search diagram without individual entries breakdown:
 * MeAndYouSearchLoop.png
 * Initial Top Level Search Proposal




 * MeAndYou Social Network Site
 * Use Case Test Sheet/Log

Test ID | Test Case Name | Prerequisites | Step | Activity | Expected Result | Actual Result | Priority | Author | Status (Pass/Fail) | Comments


 * Use Case Name: Finding Crush Match
 * Actors: Matching Engine, database
 * Description: This use case describes how the matching engine system finds a matching person and sends the result back to the requesters.
 * Pre-conditions:
 * 1-	 The Matching Engine and database must be live to determine the matches.
 * 2-	More pre-conditions


 * Normal Workflow:
 * 1-	The requester/caller object will provide 8 valid entries for the target crush person, who is already in the database.
 * 2-	The Matching Engine will search for the match inside the database and find a 100% match.
 * 3-	The database will return the search result to the Matching Engine.
 * 4-	The Matching Engine will send the matching result to the caller object/instance.
 * 5-	The use case ends with a success.


 * Alternate Workflow:
 * Open: Assumptions
 * •	We assumed that all data validations are handled by UI, so matching Engine does not handle empty fields, incorrect data types, and other invalid data issues.
 * •	The notification messages/alerts/traps are done by the UI engine
 * •	The Matching Engine system runs every 3 hours; many threads look up the matches and then sleep again.


 * a.	What if the matching result is less than 62.5%?
 * 1.	The Matching Engine must notify the caller object that no match was found for the requested crush target.
 * 2.	The UI/Browser must notify the user about the No Match Found result.
 * 3.	The UI/Browser lets the user know that the search information will be kept for future reference.
 * 4.	The user can be notified if the target user signs up in the future, by looping back to steps 1 and 2.


 * b.	What if no match is found at all? (0% match)
 * 1.	The database responds to the Match Engine that no match was found.
 * 2.	The Match Engine responds to the UI with a "no match found" message.
 * 3.	The UI will let the user know that his/her search information will be kept for future use.

Note: The second person is reported as the match only if the second person has also expressed interest in the first person.
 * c.	What if the targeted match signs up later? (Crush/lost family member, long lost love)
 * 1.	The Matching Engine will send the found match to the requester object
 * 2.	The UI will notify the match requester about the found match via email, text, and message on user notification page.


 * d.	Test the close-match case, where the match is logically correct but actually wrong
 * 1.	Having more than 2 people with identical 8 attributes