IVA Data Matching Guidelines

IVA Matching Overview

Data matching is a difficult process, and IVA has spent a long time developing and refining its matching procedure. If you decide to implement your own matching solution rather than having IVA handle it, IVA’s matching procedure is outlined below by media vertical. Following this procedure, together with the matching guidelines and normalization rules in this document, will help ensure accurate and consistent data matching between IVA and its data partners.


Movies

Step 1. Normalize the third-party title using the Normalization Rules in this document. Then search IVA’s catalogue of normalized titles for titles that match the normalized third-party title.

Step 2. For each potential IVA match from Step 1, validate that the IVA release year and the third-party release year are within one year of each other. For example, if IVA’s release year is 2011 and the third-party release year is 2013, skip this title immediately. A simple check for this, with M1 as the IVA release year and M2 as the third-party release year:

if (Math.abs(M1 - M2) <= 1)  // Subtract the two years and take the absolute value of the result. If the result is less than or equal to 1, proceed to Step 3.

Step 3. Next, validate the performers and directors for the IVA titles that passed Step 2. For each match from Step 2, apply the normalization rules to the cast and director of both the IVA title and the third-party data. An IVA title passes this step if at least one performer OR director matches the third-party cast/director data.
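
The three movie steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Title` record, its field names, and the `match_movie` function are hypothetical and not part of any IVA API, and titles and names are assumed to be normalized already.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """Hypothetical record; field names are assumptions for illustration."""
    normalized_title: str
    release_year: int
    # Normalized performer and director names, pooled into one set.
    people: set[str] = field(default_factory=set)


def years_match(iva_year: int, third_party_year: int) -> bool:
    """Step 2: the release years may differ by at most one year."""
    return abs(iva_year - third_party_year) <= 1


def match_movie(third_party: Title, iva_catalogue: list[Title]) -> list[Title]:
    """Return the IVA titles that pass Steps 1-3 for one third-party title."""
    matches = []
    for iva in iva_catalogue:
        # Step 1: normalized titles must be equal.
        if iva.normalized_title != third_party.normalized_title:
            continue
        # Step 2: release years within one year of each other.
        if not years_match(iva.release_year, third_party.release_year):
            continue
        # Step 3: at least one performer OR director in common.
        if iva.people & third_party.people:
            matches.append(iva)
    return matches
```

Pooling cast and director names into a single set keeps the Step 3 check to one set intersection; a stricter variant could compare directors and performers separately.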

TV Series

Special Note: TV matching follows a hierarchical order. We ALWAYS start with the series, then match seasons, and finally match episodes.

Step 1. Follow Step 1 under Movies. Note: Use the series title specifically.

Step 2. Follow Step 2 under Movies. Note: Use the series release year specifically.

Step 3. Any titles left from Step 2 are considered matches.

TV Season

(Season Sequence number is required for Season matching.)

Step 1. Using the series match from TV Series Step 3, retrieve all IVA seasons under the IVA series record.

Step 2. For each season from Step 1, look for the IVA season with the same season sequence number as the third-party season sequence number.

Step 3. The IVA season with the same season sequence number is the season match.

TV Episode

(Episode Sequence number is required for Episode matching.)

Step 1. Using the season match from TV Season Step 3, retrieve all IVA episodes under the IVA season record.

Step 2. For each episode from Step 1, look for the IVA episode with the same episode sequence number as the third-party episode sequence number.

Step 3. The IVA episode with the same episode sequence number is the episode match.
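
The series, season, and episode hierarchy described above can be sketched as a nested lookup. The `Series`, `Season`, and `Episode` records and the `match_episode` function below are hypothetical names chosen for illustration, and the series title is assumed to be normalized already. Moving on to the next candidate series when a season or episode is missing is a design choice of this sketch, not something the guidelines specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Episode:
    sequence: int
    title: str


@dataclass
class Season:
    sequence: int
    episodes: list[Episode] = field(default_factory=list)


@dataclass
class Series:
    normalized_title: str
    release_year: int
    seasons: list[Season] = field(default_factory=list)


def find_by_sequence(items, sequence):
    """Return the first item whose sequence number matches, else None."""
    return next((item for item in items if item.sequence == sequence), None)


def match_episode(catalogue, title, year, season_seq, episode_seq) -> Optional[Episode]:
    """Match series first, then season, then episode."""
    for series in catalogue:
        # TV Series: same normalized title, release year within one year.
        if series.normalized_title != title or abs(series.release_year - year) > 1:
            continue
        # TV Season: same season sequence number.
        season = find_by_sequence(series.seasons, season_seq)
        if season is None:
            continue
        # TV Episode: same episode sequence number.
        episode = find_by_sequence(season.episodes, episode_seq)
        if episode is not None:
            return episode
    return None
```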


Games

Special Note: Games are NOT matched by platform.

Step 1. Follow Step 1 under Movies.

Step 2. Follow Step 2 under Movies.

Step 3. The title that is left from Step 2 is the game match.

Data Matching

IVA makes it easy to use a third-party data provider because we match IVA content to all the major providers. To support your data matching requirements, IVA has created these Data Matching Guidelines.

Users who choose to perform their own data matching should query IVA’s OData API for the information listed in the tables below. The more data points you have, the better the match will be, with fewer anomalies in the results. If you receive a table from us, you will need to swap your ID for the IVA Published ID. An example of a data query is shown after the last table.

IVA offers a custom data matching service, for which additional fees apply. Contact sales for more information about the cost of this service. If you prefer to have IVA perform the data match, you will need to provide us with a table containing the following data for each media type.


Movies

  • Your ID
  • Title
  • First Released Year (check +/- 1 year, as this field can vary by data provider)
  • Director
  • Cast
  • Media Type (Movie Scene/Clip, Movie Trailer, Movie Alternate Trailer)


TV

  • Your ID
  • Title (Series)
  • Season number
  • Episode Title
  • Sequence (episode sequence number, e.g. Season 3 Episode 5 = 5)
  • Director
  • Cast
  • Year


Games

  • Title
  • Year


Music

  • Title
  • Performer
  • Year
  • Album

Example Queries for Matching

Linq: (From e in EntertainmentPrograms where e.MediaId = 0 select e.Title, e.PublishedId, e.FirstReleasedYear, e.Director.FullName).Skip(5000).Take(100)

URL: http://api.internetvideoarchive.com/2.0/DataService/EntertainmentPrograms()?$filter=MediaId eq 0&$skip=5000&$top=100&$expand=Director&$select=Title,Publishedid,FirstReleasedYear,Director/FullName&developerid={yourID}
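
For paging through results, the example URL above can be built programmatically. The following Python sketch assumes only what the example shows: the endpoint, the query options, and the `developerid` parameter. The function names `build_query_url` and `page_urls` are hypothetical, and no claim is made here about rate limits or maximum page sizes.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.internetvideoarchive.com/2.0/DataService/EntertainmentPrograms()"


def build_query_url(developer_id: str, skip: int = 0, top: int = 100) -> str:
    """Build one page of the example movie query (MediaId eq 0)."""
    params = {
        "$filter": "MediaId eq 0",
        "$skip": skip,
        "$top": top,
        "$expand": "Director",
        # Field list copied verbatim from the example URL.
        "$select": "Title,Publishedid,FirstReleasedYear,Director/FullName",
        "developerid": developer_id,
    }
    # Keep "$", "," and "/" unescaped so the OData options stay readable;
    # spaces are percent-encoded as %20.
    return BASE_URL + "?" + urlencode(params, safe="$,/", quote_via=quote)


def page_urls(developer_id: str, total: int, page_size: int = 100):
    """Yield one URL per page until `total` records are covered."""
    for skip in range(0, total, page_size):
        yield build_query_url(developer_id, skip=skip, top=page_size)
```

Calling `build_query_url("yourID", skip=5000, top=100)` reproduces the example URL with the space in the filter percent-encoded.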

Best Practices for Using the Normalized Title in our Data APIs

Our Data APIs include titles that have been normalized according to the rules listed below. The rules are subject to change without notice. IVA recommends using the normalized title only if you have no other data option. IVA dedicates a lot of resources to matching our titles to third-party databases and IDs, and we work with some of the biggest data providers in the world. If you are already matched to one of our data partners, we suggest using their IDs to match via the Pinpoint API.

Normalization Rules

Note: Lists start with { and end with }. List items are delimited by pipes (|). Spaces are indicated by _.
Single quotes indicate that a value is used literally, for example a list that contains the curly braces:
Ex. { '{' | '}' }


  1. Remove leading and trailing whitespace.
  2. Uppercase the entire string.
  3. Move any of the following trailing articles from the end of the string to the front: {,_A|,_AN|,_THE}. (Note: Only one occurrence is checked.) Ex. DARK KNIGHT, THE becomes THE DARK KNIGHT.
  4. Remove any of the following leading articles: {A_|AN_|THE_}. Ex. THE DARK KNIGHT becomes DARK KNIGHT.
  5. Lowercase the entire string.
  6. For each of the following pairs, if
    1. the string contains both characters of the pair, and
    2. the last occurrence of the left character comes before the last occurrence of the right character,
    3. then return the string from its first character up to, but not including, the last occurrence of the left character.

    The pairs are: 1. [ ] 2. ( ) 3. { }
    Ex. Hi[5][6] returns Hi[5]
    Ex. Hello[] returns Hello

  7. Add a space to the front and back of the string to account for trimming, so that the phrases in the next step can match at the start or end of the string.
  8. Remove the following phrases from the string:
    {_an_imax_3d_experience_|_an_imax_experience_|_the_imax_experience_|_imax_3d_experience_|_imax_3d_}
  9. Replace the character & with the string and.
  10. Remove leading and trailing whitespace.
  11. Uppercase the string.
  12. Repeat step 3.
  13. Repeat step 4.
  14. Lowercase the string.
  15. Replace any occurrence of the following strings
    {_i:|_ii:|_iii:|_iv:|_v:|_vi:|_vii:|_viii:|_ix:|_x:|_xi:|_xii:} with their equivalent counterparts.
  16. Replace any occurrence of the following characters with a blank space:
    { !| @| #| $| %| ^| *| _| +| =| '{'| '}'| [| ]| '|'| <| >| `| :| -| (| )| ?| /| \| &| ~| .| ,| single quote| double quote } **NOTE**: single quote means ' and double quote means " respectively.
    Ex. The string Blink_182 becomes Blink 182.
  17. Replace any numeric text with its word equivalent. For example, 1000 becomes One_Thousand and 6010 becomes Six_Thousand_Ten. A related article: http://stackoverflow.com/questions/794663/net-convert-number-to-string-representation-1-to-one-2-to-two-etc
  18. Lowercase the entire string.
  19. Remove leading and trailing whitespace.
  20. Replace any occurrence of a comma (,) with a blank space. Ex. Hello, John becomes Hello John.
  21. If the string ends with any of the following strings
    {_i|_ii|_iii|_iv|_v|_vi|_vii|_viii|_ix|_x|_xi|_xii}, replace it with its equivalent counterpart.
  22. Repeat step 17.
  23. Lowercase the string.
  24. Replace diacritics in the string with their unaccented counterparts. For example, é, è, ë, ê, É are replaced with e, e, e, e, E respectively.
  25. Remove leading and trailing whitespace.
  26. Collapse any run of blank spaces between two characters into a single space. For example, two____times becomes two_times.
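
The following Python sketch shows how a subset of the rules above might be chained together. It is a partial illustration, not IVA's implementation. It covers rules 1-14, 16, 18-20, and 23-26, and deliberately omits the Roman numeral and number-to-word rules (15, 17, 21, 22), because the exact output format of those rules is not fully specified here. Titles containing digits or Roman numerals will therefore not match IVA's normalized form. All function names are hypothetical, and two choices are assumptions of this sketch: IMAX phrases are replaced with a single space rather than removed outright, and curly quotes are treated like straight quotes.

```python
import re
import unicodedata

ARTICLES = ("A", "AN", "THE")
BRACKET_PAIRS = (("[", "]"), ("(", ")"), ("{", "}"))
IMAX_PHRASES = (
    " an imax 3d experience ", " an imax experience ",
    " the imax experience ", " imax 3d experience ", " imax 3d ",
)
# Rule 16 characters; the four curly quotes at the end are an assumption.
PUNCTUATION = "!@#$%^*_+={}[]|<>`:-()?/\\&~.,'\"\u2018\u2019\u201c\u201d"
PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def move_and_drop_article(text: str) -> str:
    """Rules 2-5 (and 11-14): 'Dark Knight, The' -> 'dark knight'."""
    text = text.upper()
    for article in ARTICLES:
        suffix = ", " + article
        if text.endswith(suffix):            # rule 3: one occurrence only
            text = article + " " + text[: -len(suffix)]
            break
    for article in ARTICLES:
        if text.startswith(article + " "):   # rule 4: drop a leading article
            text = text[len(article) + 1:]
            break
    return text.lower()


def strip_trailing_bracket(text: str) -> str:
    """Rule 6: cut the string at the last opening character of each pair."""
    for left, right in BRACKET_PAIRS:
        last_left, last_right = text.rfind(left), text.rfind(right)
        if last_left != -1 and last_right != -1 and last_left < last_right:
            text = text[:last_left]
    return text


def strip_diacritics(text: str) -> str:
    """Rule 24: decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_title(title: str) -> str:
    text = title.strip()                          # rule 1
    text = move_and_drop_article(text)            # rules 2-5
    text = strip_trailing_bracket(text)           # rule 6
    text = " " + text + " "                       # rule 7
    for phrase in IMAX_PHRASES:                   # rule 8 (space kept, see lead-in)
        text = text.replace(phrase, " ")
    text = text.replace("&", "and")               # rule 9
    text = move_and_drop_article(text.strip())    # rules 10-14
    # Rules 15, 17, 21 and 22 (Roman numerals, number words) are omitted.
    text = text.translate(PUNCT_TABLE)            # rules 16 and 20
    text = strip_diacritics(text.lower())         # rules 18, 23, 24
    return re.sub(r"\s+", " ", text).strip()      # rules 19, 25, 26
```

With this sketch, both `normalize_title("Dark Knight, The")` and `normalize_title("The Dark Knight")` give `dark knight`, so the two spellings compare equal in Step 1 of the matching procedure.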