0:00:18okay are are about the non here one my name's is on you and uh
0:00:23i'm from the national institute of informatics
0:00:25in japan
0:00:26so today this talk is on our recent work
0:00:30in which we use a temporal recurrence hashing algorithm
0:00:33for mining commercials
0:00:35from a video stream
0:00:38so commercial mining is an important preprocessing task
0:00:42uh a cost "'em" at i an can and uh uh market research
0:00:46it aims at detecting and localising
0:00:49duplicate commercial sequences from a large scale
0:00:52video archive
0:00:54in a
0:00:55one month
0:00:56archive there could be
0:00:59thousands of duplicate
0:01:02commercial sequences
0:01:04so manually
0:01:05detecting
0:01:06and localising so many commercial sequences is too time consuming and
0:01:11labour intensive
0:01:13so an automatic
0:01:14commercial mining technique is needed
0:01:18so one direction of commercial mining
0:01:20is known as knowledge based commercial mining it uses the
0:01:25intrinsic
0:01:26characteristics of commercials
0:01:28as a priori knowledge for detecting them
0:01:31for instance
0:01:33in some countries
0:01:34the T V stations may add some
0:01:37monochrome
0:01:38or silence frames to separate neighbouring commercial segments
0:01:44so if we can detect the position of these frames then we can use it to get the
0:01:49duration i mean the location of the commercial sequence
0:01:54so most
0:01:55knowledge based
0:01:56techniques are efficient
0:01:58but not generic enough because of the data dependent
0:02:02knowledge they use
0:02:04because this a priori knowledge
0:02:06may vary with countries and time
0:02:11however there is one piece of a priori knowledge that can be used for detecting commercials but
0:02:16does not vary with countries or time
0:02:19that is
0:02:20commercials are repetitions
0:02:22so this characteristic never changes
0:02:24and it inspired
0:02:26another direction known as repetition based commercial mining
0:02:31so most repetition based techniques are
0:02:35unsupervised
0:02:36and more generic
0:02:38but uh a lot uh can board
0:02:42in this study we propose a simple but very effective
0:02:46algorithm
0:02:47for unsupervised
0:02:50generic and ultra high speed commercial mining
0:02:53so in this study there is no
0:02:55training data provided beforehand and what we have is only a very long
0:03:01stream
0:03:02and the algorithm does not depend on any prior knowledge that may
0:03:07vary with countries or time
0:03:10and also the algorithm is
0:03:12very fast
0:03:13for a ten hour stream
0:03:15the
0:03:16processing time was only four seconds
0:03:19and for one month
0:03:21the time was less than forty two minutes
0:03:23and for a five year video stream the processing time was less than twenty one hours
0:03:28it's very fast
0:03:30so the proposed
0:03:31algorithm is
0:03:33a two stage hashing algorithm
0:03:35and i will explain the two stages one by one
0:03:40and before explaining the first stage i'd like to discuss the
0:03:43difference between commercials
0:03:45and near duplicate videos
0:03:48as you may know by definition near duplicates
0:03:51are approximately
0:03:54identical videos
0:03:57these videos are normally derived from an original
0:04:01by means of various transformations
0:04:03such as uh and coding your train picture
0:04:06and something like that
0:04:08and on the other hand
0:04:11exact duplicates are derived from the original video without any transformation
0:04:16so commercials can be considered as a special case of near duplicate videos
0:04:21so this is the main difference between commercials and near duplicates
0:04:24and in the case of commercials
0:04:26the fragments
0:04:28for example the frames or the shots or sub shots
0:04:30of the videos can be translated into
0:04:33a very compact
0:04:35fingerprint
0:04:36so that identical fragments across the duplicates can be mapped to exactly the same fingerprint
0:04:42so based on this assumption if we insert
0:04:45the fragments
0:04:46into a hash table by regarding the fingerprint as the bucket index
0:04:50a hash collision
0:04:51will occur in the corresponding hash bucket
0:04:55and the duplicate
0:04:57fragments can be easily detected
0:04:59based on a
0:05:01collision detection process
0:05:03so these assumptions
0:05:05are not reasonable in the case of near duplicates
0:05:08but only reasonable for exact duplicates like commercials
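
The first stage described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the speaker's implementation: the fingerprints are stand-in strings, and the function name `find_duplicate_fragments` is hypothetical. It shows how using the fingerprint as the bucket key makes exact duplicates collide in the same bucket.

```python
from collections import defaultdict

def find_duplicate_fragments(fingerprints):
    """Insert fragment indices into a hash table keyed by fingerprint.

    Returns only the buckets where a collision occurred, i.e. where two or
    more fragments share exactly the same fingerprint.
    """
    buckets = defaultdict(list)
    for index, fingerprint in enumerate(fingerprints):
        buckets[fingerprint].append(index)
    return {fp: idxs for fp, idxs in buckets.items() if len(idxs) > 1}

# Hypothetical fingerprints for seven fragments of a stream.
fragment_fingerprints = ["a1", "b7", "c3", "a1", "d9", "c3", "a1"]
print(find_duplicate_fragments(fragment_fingerprints))
```

Because the lookup relies on exact equality of fingerprints, a sketch like this only suits exact duplicates; near duplicates would land in different buckets.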
0:05:13so in this study we propose
0:05:15applying a luminance based fingerprinting strategy to the video stream and also applying
0:05:21an existing audio hashing
0:05:23technique to the audio stream
0:05:25and we didn't
0:05:26test all
0:05:27the existing techniques but we believe that
0:05:31the proposed algorithm's performance
0:05:33is independent of the
0:05:35fingerprinting strategy so any existing one can be used
0:05:39so we apply these two fingerprinting strategies to
0:05:43all frames of the stream
0:05:45and
0:05:46so here the same color indicates
0:05:49frames
0:05:49with the same fingerprint
0:05:52and the contiguous frames
0:05:54with the same fingerprint
0:05:56are grouped into a fragment
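
Grouping contiguous frames that share a fingerprint into fragments can be sketched as follows. This is a minimal illustration assuming one fingerprint per frame; the function name `frames_to_fragments` and the `(fingerprint, start_frame, length)` tuple layout are assumptions made for the example, not something stated in the talk.

```python
from itertools import groupby

def frames_to_fragments(frame_fingerprints):
    """Merge runs of consecutive frames with the same fingerprint.

    Returns a list of (fingerprint, start_frame, length) tuples.
    """
    fragments = []
    start = 0
    for fingerprint, run in groupby(frame_fingerprints):
        length = len(list(run))
        fragments.append((fingerprint, start, length))
        start += length
    return fragments

# Six frames collapse into three fragments.
print(frames_to_fragments(["x", "x", "y", "y", "y", "x"]))
```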
0:06:00so based on the
0:06:03hash collision detection we can detect duplicate fragments
0:06:08but please note that
0:06:09the goal of commercial mining is not to detect
0:06:12these duplicate fragments but to detect
0:06:14duplicate sequences
0:06:17a commercial sequence is normally composed of
0:06:20a chain of a few hundred
0:06:23fragments
0:06:24so here we regard
0:06:26the duplicate fragment pair as the basic unit
0:06:29and project them onto the time axis
0:06:32from this figure we can observe strong temporal consistency
0:06:36among these pairs
0:06:37for instance the
0:06:39positions of the fragments are consecutive
0:06:41and the temporal interval
0:06:43between each two fragments is
0:06:45almost the same
0:06:46so this kind of temporal consistency is very useful for distinguishing
0:06:50duplicate sequences from non duplicate ones
0:06:54and then
0:06:56the commercial mining task can be formulated as
0:06:59searching for
0:07:00duplicate fragment pairs with high temporal consistency
0:07:04so one solution to this is to apply
0:07:07pairwise matching based on temporal information to all
0:07:11pairs of duplicate fragments
0:07:13so given N P denoting the number of pairs
0:07:16the computation cost
0:07:17is linear to the square of N P
0:07:20and we can see N P can be a very very large number
0:07:23and a lower
0:07:25computation cost can be obtained by applying pairwise
0:07:28matching based on temporal information
0:07:31to all
0:07:32sets
0:07:33of duplicate fragments
0:07:35uh so give a and all you noting the that's
0:07:39this this actually
0:07:40and all use than then all the
0:07:43uh a the are of beans in the hash table
0:07:45and the sake even
0:07:47and no you know from this so the condition cost
0:07:50is linear to the square of N O
0:07:52so it is still
0:07:53not
0:07:54efficient enough
0:07:55and
0:07:56besides these two solutions there is another interesting
0:08:00study which applies fragment growing
0:08:04to each duplicate fragment pair
0:08:05and the computation cost is
0:08:08linear to N P
0:08:10which is very efficient
0:08:11but because the single operation cost
0:08:15of the
0:08:16fragment growing strategy is
0:08:18high the overall processing time in this case
0:08:22is longer than that of the previous one
0:08:26in this study we propose applying a second stage hashing to the duplicate fragment pairs
0:08:32so that the computation cost
0:08:34can be linear to
0:08:35N P
0:08:36with a lower single operation cost
0:08:41here again we regard each duplicate fragment pair as the basic unit
0:08:46and we propose
0:08:48two hash functions
0:08:49to translate the temporal information into
0:08:52fingerprints
0:08:54so the first fingerprint is the temporal position
0:08:57of the former fragment
0:08:58and in this case the resolution is set to minutes so that a set of
0:09:03fragments
0:09:04can be mapped to the same or the neighbouring fingerprints
0:09:08and the second fingerprint is the temporal interval
0:09:12between the two fragments
0:09:14and the resolution is set to seconds
0:09:17and based on these
0:09:18two fingerprints
0:09:20all
0:09:22duplicate fragment pairs
0:09:24are inserted into a two dimensional hash table
0:09:27and
0:09:28by doing so the duplicate fragment pairs
0:09:31with high temporal
0:09:32consistency can be automatically assigned to the same bin
0:09:36so that the time consuming pairwise matching can be avoided
0:09:40and to detect the high temporal consistency
0:09:44we build a
0:09:45recurrence hashing histogram
0:09:47from the hash table
0:09:49and because the duplicate
0:09:51fragment pairs with high temporal consistency
0:09:54have been
0:09:55assigned to the same bin they normally
0:09:58form a local maximum
0:10:00in this histogram
0:10:02and in this case the bin value
0:10:04indicates the temporal duration of the duplicate
0:10:07sequence
0:10:09so duplicate
0:10:10sequences can be easily detected by
0:10:12searching for local maxima
0:10:15in the hashing histogram
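
The second stage can be sketched roughly as below. This is an illustrative reading of the description, not the speaker's code: the tuple layout `(former_start, latter_start, duration)`, the one-minute and one-second resolutions as constants, the `min_duration` threshold, and the choice to compare a bin only against its two neighbours along the position axis are all assumptions made for the example.

```python
from collections import defaultdict

POSITION_RESOLUTION = 60  # seconds per position bin (assumed: one minute)
INTERVAL_RESOLUTION = 1   # seconds per interval bin (assumed: one second)

def recurrence_histogram(pairs):
    """Hash duplicate fragment pairs into a 2D table and sum durations.

    pairs: iterable of (former_start, latter_start, duration) in seconds.
    The key is (position of the former fragment, interval between the two).
    """
    histogram = defaultdict(float)
    for former, latter, duration in pairs:
        key = (int(former // POSITION_RESOLUTION),
               int((latter - former) // INTERVAL_RESOLUTION))
        histogram[key] += duration
    return dict(histogram)

def local_maxima(histogram, min_duration):
    """Return bins that are long enough and not smaller than their
    neighbours along the position axis."""
    peaks = []
    for (position, interval), value in histogram.items():
        if value < min_duration:
            continue
        left = histogram.get((position - 1, interval), 0.0)
        right = histogram.get((position + 1, interval), 0.0)
        if value >= left and value >= right:
            peaks.append(((position, interval), value))
    return sorted(peaks)

# A 15 second sequence repeated one hour later, split into five fragments,
# plus one unrelated accidental collision.
pairs = [(120, 3720, 3), (123, 3723, 3), (126, 3726, 3),
         (129, 3729, 3), (132, 3732, 3), (500, 9000, 1)]
print(local_maxima(recurrence_histogram(pairs), min_duration=10))
```

Each pair is touched once when it is inserted, which is where the cost linear in the number of pairs comes from; no pair is ever compared against another pair.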
0:10:17so this is the full explanation of the proposed algorithm
0:10:20and it's very simple but it's
0:10:23because we
0:10:24didn't use pairwise matching in this
0:10:27each you face and here
0:10:29and is that um
0:10:30hash table
0:10:31so the computation cost is linear to N P which is much
0:10:34lower than that of related studies
0:10:38and we
0:10:39evaluated the accuracy of
0:10:42the proposed algorithm by using a ten hour stream
0:10:45and also a one month
0:10:47stream and a five year stream were used for evaluating the efficiency
0:10:52and
0:10:54we evaluated
0:10:55in both sequence
0:10:57and frame level
0:10:58the sequence level evaluates how well the algorithm
0:11:03detects and identifies the commercial sequences
0:11:06and the frame level
0:11:07evaluates
0:11:09how precisely the algorithm
0:11:10can localise
0:11:12a commercial sequence
0:11:13for example
0:11:14from which frame the sequence starts
0:11:16and at which frame the sequence ends
0:11:20the results are summarised in
0:11:22this table
0:11:23we implemented two state of the art studies for comparison
0:11:27the first one for a
0:11:30i five ring of light
0:11:32pairwise matching to each duplicate fragment pair
0:11:35and the second one proposed by green
0:11:39applies
0:11:40pairwise matching to
0:11:43all sets of fragments
0:11:45or we can say all
0:11:46the nonzero the overall
0:11:51and
0:11:52T R here indicates our temporal
0:11:55recurrence hashing algorithm
0:11:57and a B here in the case of video to mine
0:12:01and for the statistics
0:12:03P R F
0:12:05are respectively precision recall and F one score
0:12:08the suffix
0:12:11and
0:12:13the suffix S
0:12:15means the sequence level and F the frame level
0:12:18finally T here indicates the processing time
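
For reference, the three statistics named above follow the standard definitions, which can be written as a short helper. The function name and the count-based arguments are assumptions for illustration; the same helper would be applied to sequence counts for the sequence level and to frame counts for the frame level.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall and F1 from detection counts."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up counts, purely to show the call.
print(precision_recall_f1(90, 10, 30))
```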
0:12:22so from this table we can see that
0:12:24our algorithm
0:12:25outperformed
0:12:26the baselines
0:12:27all right
0:12:29and especially the much shorter processing time
0:12:32demonstrates its efficiency
0:12:35uh oh
0:12:38besides we also implemented an existing
0:12:41audio hashing technique for
0:12:43comparison
0:12:44and again in this case
0:12:45the
0:12:47proposed algorithm outperformed the baseline
0:12:50for all criteria
0:12:53well i forgot to introduce that this baseline applies
0:12:57the frame
0:12:58growing strategy to the duplicate
0:13:00fragment pair
0:13:02and note that
0:13:04up to this point we
0:13:05individually applied the algorithm to the video and audio streams
0:13:09so one question here
0:13:11is how to integrate
0:13:13these two streams
0:13:15uh so
0:13:17the video and audio streams can be integrated
0:13:20uh it
0:13:21frame level uh in the print the able or is
0:13:25and for efficiency and the scalability
0:13:28uh reason we propose in the region
0:13:30uh and integration add to that is a
0:13:33so from this table you can observe that the sequence level recall
0:13:38is almost
0:13:41one hundred percent in both cases
0:13:43so this motivated us to use an intersection operation to combine the detected commercial sequences
0:13:49and that would lower the
0:13:51rate of false alarms
0:13:54without increasing the number of misses
0:13:57and another observation is that in the case of frame level evaluation
0:14:01the precision is higher than the recall in both video and
0:14:05audio
0:14:06and this motivated us to use
0:14:09union
0:14:10to combine the detected
0:14:13frames
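
One way to read the integration described here is sketched below: a sequence is kept only if both the video and the audio detector report it (intersection at the sequence level), and its extent is the union of the two detected frame ranges. This is an assumption-laden illustration; the half-open `(start_frame, end_frame)` representation, the overlap test used to decide that two detections are the same sequence, and the name `integrate` are not from the talk.

```python
def integrate(video_sequences, audio_sequences):
    """Combine video and audio detections.

    Each detection is a half-open frame range (start_frame, end_frame).
    Sequence level: keep a detection only if it overlaps one from the other
    stream. Frame level: the kept range is the union of the two ranges.
    """
    merged = []
    for v_start, v_end in video_sequences:
        for a_start, a_end in audio_sequences:
            if v_start < a_end and a_start < v_end:  # ranges overlap
                merged.append((min(v_start, a_start), max(v_end, a_end)))
    return sorted(merged)

# Only the first sequence is found in both streams, so only it survives,
# and its range grows to cover the frames found by either stream.
print(integrate([(100, 400), (900, 1200)], [(110, 460), (2000, 2300)]))
```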
0:14:14and uh
0:14:18the results are summarised in this table and we can see that
0:14:22the
0:14:23sequence level F one score increased to
0:14:25ninety eight point one percent
0:14:28and the frame level F one score
0:14:31increased to
0:14:32ninety seven one four was
0:14:33so
0:14:34the improvement demonstrated the effectiveness
0:14:38of the proposed integration strategy
0:14:41also we applied
0:14:43the algorithm to the one month video stream
0:14:46and the processing time was less than fifty
0:14:49uh i it's for we do and this then
0:14:52forty two minutes
0:14:53for audio
0:14:54so this again demonstrated the high efficiency
0:15:01finally we applied
0:15:02our algorithm to a five year stream
0:15:06this stream was divided
0:15:08into sixty one month sub streams
0:15:11and our algorithm was individually applied to each sub stream
0:15:16and we
0:15:17performed parallel computing
0:15:19and
0:15:21conducted a the recipes
0:15:22for used fifteen months
0:15:24for spread
0:15:25and the final processing time was less than twenty one hours
0:15:29you be that the person this you will be don't can see
0:15:32the commercial mining
0:15:34commercial detection
0:15:35we just some of the
0:15:37to take cost
0:15:38uh at least to five
0:15:40five years for us to
0:15:43what i want to say
0:15:44our our
0:15:48so we also conducted some
0:15:51for each detected most
0:15:55we hope this that he things could be have
0:15:58market research
0:15:59uh and uh
0:16:01commercial producers and uh uh complete company
0:16:05this example is a beer promotion
0:16:07for this histogram the horizontal axis indicates the time of day
0:16:13and the vertical one indicates the frequency
0:16:16so from this figure we can see that for this commercial there was
0:16:20no broadcast from two em to yeah yeah
0:16:24and actually this very
0:16:25somehow somehow go to stand
0:16:27so i have another one
0:16:29in this case
0:16:31the horizontal axis
0:16:32also indicates the time of day
0:16:34but the vertical one indicates the day of the week from sunday monday
0:16:39to saturday
0:16:40so we can have so of a real data
0:16:43shape from
0:16:45actually when be for long as before
0:16:47a might a
0:16:49but actually know
0:16:51uh this very yeah data
0:16:53shape was also found from all other
0:16:56alcohol related commercials
0:16:58and we looked into this and found the reason
0:17:00actually this is because in japan
0:17:02there is a voluntary restriction
0:17:05which prohibits
0:17:06the broadcast
0:17:07of alcohol
0:17:08alcohol commercials from
0:17:10a five yeah
0:17:12five P M
0:17:13on the
0:17:14and from five em to you have i am
0:17:18so this is one of them
0:17:20and a example is of she's actually
0:17:23and uh we can see that this my thought was
0:17:26a to be row house
0:17:28a a wrong
0:17:29five T M
0:17:31actually a
0:17:32so this i believe D V believe this is because
0:17:37five P M is of fine or the white
0:17:39i i with two
0:17:40what in or
0:17:41and the commercial have them
0:17:43uh no don't need to just by or Z
0:17:45or something
0:17:47and the third example is a
0:17:49car commercial
0:17:51and we can also see the time zones with high
0:17:54broadcast frequency
0:17:56for instance from
0:17:59six am to
0:18:00from six am to
0:18:01eight am when most fathers are
0:18:03having breakfast
0:18:05and the
0:18:06scalable clock we that stop
0:18:09and after six P M
0:18:12when most fathers have finished their work and are watching T V
0:18:18uh actually these observations can also be observed in this figure
0:18:22and the one explanation of this could be the car
0:18:26okay it towards uh
0:18:29uh a rather than you know five
0:18:31and the not the explanation for be
0:18:33i know you most uh uh a japanese and the father Y
0:18:38i mean you in the money so the card either
0:18:40a a pretty for most
0:18:42uh popular T V vol
0:18:44mail than female what
0:18:48so due to time limitation i'm going to skip
0:18:51the conclusion
0:18:54thank you for your kind attention
0:19:10a a i have a as we can jump then use a and not advertising companies going to use your
0:19:16technique to do some marketing research
0:19:20currently no but we are considering contacting some companies
0:19:24to see whether they are interested in our research
0:19:28and a normally i
0:19:29i i i don't think from things are interested T in this but uh
0:19:33uh actually there are more
0:19:35some commercial producers
0:19:37uh who
0:19:40that would interest in this work and the contract cost we are considering each a about the corporation of the
0:19:49maybe we can provide some
0:19:51uh so is
0:19:52uh for
0:19:54so for so and i is or something like that
0:19:56or most a search yeah
0:19:57thank you the set point is stuff i i'm standing are doing the commercial we you pattern and discovery right
0:20:03so uh i think to extend you want to discover some out types also deals
0:20:09something like a
0:20:11suppose we you all some
0:20:13do use we do
0:20:14yeah i and the should be do is a very special types of would be to with a specialised up
0:20:19at patterns
0:20:20uh you mean
0:20:21for detecting is
0:20:23duplicate be deal yeah all other oh i we maybe
0:20:26um of L out to depends on the type of that you could be was that you want to detect
0:20:32for instance
0:20:33uh in case of sports D also uh
0:20:36suppose we deal in "'cause" sparse we do they are some rib or cost of the sports aims for video
0:20:41uh of course
0:20:42a lot of channels that means in this case the the
0:20:46the but also almost same but in the news broadcast
0:20:49the the the the
0:20:49T V then use program produce a make at some
0:20:52uh a oh out or
0:20:55captions into the video so that
0:20:57the fingerprinting based
0:20:59strategy is not suitable in this case
0:21:01and a more robust
0:21:05one is needed so this is why i
0:21:08pointed out that this work is quite different from
0:21:12near duplicate
0:21:13detection it's a kind of exact duplicate detection
0:21:17okay thank you so much
0:21:19thank you