Okay, good afternoon everyone. My name is [inaudible], and
I'm from the National Institute of Informatics in Japan.
Today's talk is on our recent work,
a temporal recurrence hashing algorithm
for mining commercials from a video stream.
Commercial mining is an important preprocessing task
for applications such as market research.
It aims at detecting and localising
duplicate commercial sequences in a large-scale video archive.
For example, in a one-month video archive there could be
tens of thousands of duplicate commercial sequences.
Manually detecting and localising so many commercial sequences
is too time-consuming and labour-intensive,
so an automatic commercial mining technique is needed.
One direction of commercial mining is known as knowledge-based commercial mining.
It uses the intrinsic characteristics of commercials
as prior knowledge for detecting them.
For instance, in some countries
the TV stations may add some
monochrome or silence frames between neighbouring commercial segments.
So if we can detect the position of these frames, then we can use it to get
the location of the commercial sequence.
Most knowledge-based techniques are efficient,
but they are not generic enough because of the data-dependent
prior knowledge they use,
because this prior knowledge may vary with countries and times.
However, there is one piece of prior knowledge that can be used for detecting commercials
and that holds for all countries and all times:
commercials are repetitions.
This characteristic never changes,
and it inspired another direction known as repetition-based commercial mining.
Most repetition-based techniques are unsupervised
and more generic,
but they have a large computational burden.
So in this study we propose a simple but very effective algorithm
for fully unsupervised, generic, and ultra-high-speed commercial mining.
In this study there is no
training data provided beforehand; what we have is only a very long video stream.
The algorithm does not depend on any prior knowledge that may
vary with countries and times.
Also, the algorithm is very fast.
For a ten-hour stream,
the processing time was only four seconds.
For a one-month stream,
the time was less than forty-two minutes.
And for a five-year video stream, the processing time was less than twenty-one hours.
It's very fast.
The proposed algorithm is a two-stage hashing algorithm,
and I will explain the two stages one by one.
Before explaining the first stage, I'd like to discuss the
difference between commercials and near-duplicate videos.
By definition, near-duplicates are videos that are close
to identical.
These videos are normally derived from an original video
by means of various transformations,
such as encoding and the like.
On the other hand,
commercials are exact duplicates, derived from the original video without any transformation.
So commercials can be considered a special case of near-duplicate videos.
This is the main difference between commercials and near-duplicates.
In the case of commercials,
the fragments,
for example the frames, the shots, or the sub-shots
of the videos, can be translated into
a very compact fingerprint,
so that identical fragments across the duplicates are mapped to exactly the same fingerprint.
Based on this assumption, if we insert
the fragments
into a hash table by regarding the fingerprint as the bucket index,
a hash collision
will occur in the corresponding hash bucket.
So the duplicate fragments can be easily detected
by a collision detection process.
These assumptions are not reasonable in the case of near-duplicates,
but only for exact duplicates like commercials.
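As a rough illustration of the first stage described above, the following Python sketch inserts fragment fingerprints into a hash table and reports the buckets in which a collision occurs. The function name `find_duplicate_fragments` and the string fingerprints are assumptions made for this example; the talk does not specify the actual fingerprint format or data structures.

```python
from collections import defaultdict

def find_duplicate_fragments(fingerprints):
    """Group fragment indices by fingerprint (hypothetical helper).

    Each fingerprint acts as the bucket index of a hash table. A bucket
    that ends up holding two or more fragments is a hash collision,
    i.e. a set of duplicate fragments.
    """
    table = defaultdict(list)  # fingerprint -> list of fragment indices
    for index, fingerprint in enumerate(fingerprints):
        table[fingerprint].append(index)
    # Keep only the buckets in which a collision occurred.
    return {fp: idxs for fp, idxs in table.items() if len(idxs) > 1}

# Toy fingerprints: fragments 0, 3, 6 are identical, as are 2 and 5.
fingerprints = ["a1", "b7", "c3", "a1", "d9", "c3", "a1"]
duplicates = find_duplicate_fragments(fingerprints)
```

Because exact duplicates map to exactly the same fingerprint, a plain dictionary lookup is enough here; near-duplicates would need a more tolerant comparison.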
In this study we propose
applying a luminance-based fingerprinting strategy to the video stream, and also applying
an audio hashing technique to the audio stream.
We didn't test all
the existing techniques, but we believe that
the proposed algorithm's performance
is largely independent of the
fingerprinting strategy, so any existing one can be used.
We apply these two fingerprinting strategies to
all frames of the stream.
Here, the same colour indicates
frames with the same fingerprint,
and the continuous frames
with the same fingerprint
are merged into a fragment.
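The merging step just described, in which continuous frames with the same fingerprint become one fragment, can be sketched as a run-length grouping. This is only an illustration: the function name `merge_frames_into_fragments` and the `(fingerprint, start_frame, length)` tuple layout are assumptions, not details given in the talk.

```python
from itertools import groupby

def merge_frames_into_fragments(frame_fingerprints):
    """Merge runs of consecutive identical fingerprints into fragments.

    Returns a list of (fingerprint, start_frame, length) tuples
    (a hypothetical representation chosen for this example).
    """
    fragments = []
    start = 0
    for fingerprint, run in groupby(frame_fingerprints):
        length = sum(1 for _ in run)  # number of frames in this run
        fragments.append((fingerprint, start, length))
        start += length
    return fragments

# Six frames collapse into three fragments.
fragments = merge_frames_into_fragments(["x", "x", "y", "y", "y", "x"])
```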
So, based on
the hash collision detection, we can detect duplicate fragments.
But please note that
the goal of commercial mining is not to detect
these duplicate fragments but to detect
duplicate sequences.
A commercial sequence is normally composed of
tens or a few hundred fragments.
Here we regard
the duplicate fragment pair as the basic unit
and project the pairs onto the time axis.
From this figure we can observe strong temporal consistency
among these pairs.
For instance, the
positions of the fragments are consecutive,
and the temporal intervals
between each two fragments
are almost the same.
This kind of temporal consistency is very useful for distinguishing
duplicate sequences from non-duplicate ones.
So the commercial mining task can be formulated as
searching for
duplicate fragment pairs with high temporal consistency.
One solution to this is to apply
pairwise matching based on temporal information to all
pairs of duplicate fragments.
Let N_P denote the number of pairs.
The computational cost
is linear to the square of N_P,
and we can see that N_P can be a very, very large number.
A lower
computational cost can be obtained by applying pairwise
matching based on temporal information
to all sets
of duplicate fragments.
Let N_O denote the number of sets;
actually, N_O is the number
of nonzero bins in the hash table.
Then the computational cost
is linear to the square of N_O.
It is still
not efficient enough.
Besides these two solutions, there is another interesting
study, which applies fragment growing
to each duplicate fragment pair.
Its cost is
linear to N_P,
which is very efficient.
But because the single-operation cost
of the
fragment growing strategy is
high, the overall processing time in this case is
almost
the same as in the previous ones.
So in this study we propose applying a second-stage hashing to the duplicate fragment pairs,
so that the computational cost
can be linear to N_P
with a lower single-operation cost.
Here we regard each duplicate fragment pair as the basic unit,
and we propose
two hash functions
to translate the temporal information into
fingerprints.
The first fingerprint is the temporal position.
In this case the resolution is set to minutes, so that a set of
fragments
can be mapped to the same or the neighbouring fingerprint.
The second fingerprint is the temporal interval
between the two fragments,
and its resolution is seconds.
Based on these
two fingerprints,
all
duplicate fragment pairs
are inserted into a two-dimensional hash table.
By doing so, the duplicate fragment pairs
with high temporal
consistency are automatically assigned to the same
bin,
so that the time-consuming pairwise matching can be
avoided.
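The second-stage hashing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: each duplicate fragment pair is represented as `(t_first, t_second)` in seconds, the position key is the first timestamp quantised to minutes, the interval key is the time difference quantised to seconds, and all names (`second_stage_hash`, `POSITION_RESOLUTION_S`, `INTERVAL_RESOLUTION_S`) are hypothetical.

```python
from collections import defaultdict

POSITION_RESOLUTION_S = 60  # assumed: one position bin per minute
INTERVAL_RESOLUTION_S = 1   # assumed: one interval bin per second

def second_stage_hash(pairs):
    """Insert duplicate fragment pairs into a two-dimensional hash table.

    pairs: iterable of (t_first, t_second) timestamps in seconds.
    Returns a dict keyed by (position_bin, interval_bin).
    """
    table = defaultdict(list)
    for t_first, t_second in pairs:
        position_bin = int(t_first // POSITION_RESOLUTION_S)
        interval_bin = int((t_second - t_first) // INTERVAL_RESOLUTION_S)
        table[(position_bin, interval_bin)].append((t_first, t_second))
    return table

# Three pairs from one repeated sequence (same interval, nearby positions)
# plus one unrelated pair that lands in a different bin.
pairs = [(125.0, 3725.0), (127.5, 3727.5), (130.0, 3730.0), (140.0, 9000.0)]
table = second_stage_hash(pairs)
```

Each pair is touched once, so the cost of this step grows linearly with the number of pairs and no pair is ever compared against another pair.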
To detect the high temporal consistency,
we
build a
recurrence hashing histogram
from the hash table.
Because the duplicate
fragment pairs with high temporal consistency
have been
assigned to the same bin, they normally
form a local maximum
in this histogram.
In this case the bin value
indicates the temporal duration of the duplicate
sequence.
So duplicate
sequences can be easily detected by
searching for local maxima
in the
hashing histogram.
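The peak search just described can be illustrated with a small sketch. It assumes the histogram is stored as a dictionary from `(position_bin, interval_bin)` to an accumulated value, treats the eight surrounding bins as neighbours, and uses a hypothetical `min_value` threshold; none of these details are given in the talk.

```python
def find_local_maxima(histogram, min_value):
    """Return bins that are strict local maxima of a sparse 2D histogram.

    histogram: dict mapping (position_bin, interval_bin) -> value.
    min_value: assumed threshold below which bins are ignored.
    Missing bins count as zero. Ties between neighbours are not reported.
    """
    peaks = []
    for (p, i), value in histogram.items():
        if value < min_value:
            continue
        neighbours = [
            histogram.get((p + dp, i + di), 0)
            for dp in (-1, 0, 1)
            for di in (-1, 0, 1)
            if (dp, di) != (0, 0)
        ]
        if all(value > n for n in neighbours):
            peaks.append(((p, i), value))
    return sorted(peaks)

# Toy histogram: two strong bins surrounded by weak or empty ones.
histogram = {
    (2, 3600): 14.5,
    (2, 3601): 1.0,
    (3, 3600): 2.0,
    (2, 8860): 0.5,
    (40, 120): 29.0,
}
peaks = find_local_maxima(histogram, min_value=5.0)
```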
So this is the full explanation of the proposed algorithm.
It's very simple, but it's efficient,
because we
didn't use any pairwise matching in it.
Only hashing is used here,
with a
hash table,
so the computational cost is linear to N_P, which is much
lower than that of the related studies.
We evaluated the accuracy
of
the proposed algorithm by using a ten-hour stream.
A one-month
stream and a five-year stream were also used for evaluating the efficiency.
We evaluated
at both the sequence level
and the frame level.
The sequence-level evaluation shows how precisely
our algorithm
detects and identifies the commercial sequences,
and the frame-level evaluation
shows how precisely
our algorithm can localise
a commercial sequence,
for example
from which frame the sequence starts
and at which frame the sequence ends.
The results are summarised in
this table.
We implemented two state-of-the-art studies for comparison.
The first one
applies
pairwise matching to each duplicate fragment pair,
and the second one
applies
pairwise matching to
all sets of fragments,
or we can say to all
the nonzero bins of the
hash table.
In the table, TRH indicates our temporal
recurrence hashing algorithm,
and V here indicates video.
For the statistics,
P, R, and F
are respectively precision, recall, and F1 score.
The suffix
S
means the sequence level and F the frame level.
Finally, the T here indicates the processing time.
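For readers unfamiliar with the three statistics named above, here is a short sketch of how precision, recall, and F1 score could be computed at the frame level. Representing detections and ground truth as sets of frame indices, and the function name `precision_recall_f1`, are assumptions for this example; the talk does not describe the evaluation code.

```python
def precision_recall_f1(detected, ground_truth):
    """Compute precision, recall, and F1 over two collections of frame indices."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy case: the true commercial spans frames 100..199, while the detector
# reports frames 110..189, so every reported frame is correct but 20 are missed.
truth = range(100, 200)
found = range(110, 190)
p, r, f1 = precision_recall_f1(found, truth)
```

The same formulas apply at the sequence level if whole detected sequences, rather than frames, are counted as hits or misses.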
From this table we can see that
our algorithm
outperformed
the baselines
in all criteria,
and especially that it was much shorter in terms of processing time.
Besides, we also implemented an existing
audio hashing technique for
comparison.
Again, in this case
the
proposed algorithm outperformed the baseline
for all criteria.
I forgot to mention that this baseline applies
the frame
growing strategy to each duplicate
fragment pair.
Please note that
up to this point we
individually applied the algorithm to the video and audio streams.
So one question here is
what we would get if we integrate
these two streams.
The video and audio streams can be integrated
at different levels,
for example at the frame level or at the fingerprint level.
For efficiency and scalability
reasons, we propose
integrating the detection results.
From this table you can observe that the sequence-level recall
was almost
one hundred percent in both cases.
This motivated us to use an intersection operation to combine the detected commercial sequences,
which would lower the
rate of false alarms
without increasing the number of misses.
Another observation is that, in the case of frame-level evaluation,
the precision is higher than the recall in both the video and
audio streams.
This motivated us to use
a union
operation
to combine the detected
commercials
at the frame level.
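One possible reading of the integration strategy above is sketched below: a sequence is kept only if both the video and the audio detector report it (intersection at the sequence level), and its extent is the union of the two detected frame ranges (union at the frame level). The interval representation as half-open `(start, end)` frame ranges, the overlap test, and the function name `integrate` are assumptions for illustration, not the speaker's stated method.

```python
def integrate(video_seqs, audio_seqs):
    """Combine video and audio detections.

    video_seqs, audio_seqs: lists of half-open (start, end) frame ranges.
    A detection survives only if the two streams agree on it (they overlap);
    the surviving range covers the frames reported by either stream.
    """
    merged = []
    for v_start, v_end in video_seqs:
        for a_start, a_end in audio_seqs:
            if v_start < a_end and a_start < v_end:  # the ranges overlap
                merged.append((min(v_start, a_start), max(v_end, a_end)))
    return merged

# The third video detection has no audio counterpart and is dropped.
video = [(100, 550), (2000, 2440), (9000, 9100)]
audio = [(90, 540), (2010, 2450)]
result = integrate(video, audio)
```

The nested loop is quadratic in the number of detections, which is acceptable for a sketch; sorted inputs would allow a linear merge.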
The results are summarised in this table, and we can see that
the
sequence-level F1 score was improved to
ninety-eight point one percent,
and the frame-level F1 score was
improved to
above ninety-seven percent.
So
this demonstrated the effectiveness
of the proposed integration strategy.
We also applied
the algorithm to the one-month video stream,
and the processing time was roughly fifty
minutes for video and less than
forty-two minutes
for audio.
So this again demonstrated the high efficiency
of
our algorithm.
Finally, we applied our algorithm to a five-year stream.
This stream was divided
into sixty one-month sub-streams,
and our algorithm was individually applied to each sub-stream.
We
used parallel computing
and
conducted the experiments
fifteen ways in parallel
for speed-up,
and the final processing time was less than twenty-one
hours.
If a person did this
commercial mining,
this commercial detection, manually,
just watching the video
would take
at least
five years.
So
what I want to say is that
our algorithm
is very fast.
We also computed some
statistics
for each detected commercial.
We hope these statistics could be helpful for
market research,
and for
commercial producers and companies.
This example is a beer promotion.
For this histogram, the horizontal axis indicates the time of day
and the vertical axis indicates the broadcast
frequency.
From this figure we can see that, for this commercial, there was
no broadcast during certain hours of the day.
Actually this is
somewhat hard to understand at first.
So here is another one.
In this case
the
horizontal axis
also indicates the time of day,
but the vertical axis indicates the day of the week, from Sunday and Monday
to Saturday.
We can observe a very regular
shape here.
And actually,
this very regular
shape was also found for all other
alcohol-related commercials.
We looked into this and finally found the reason.
Actually this is because in Japan
there is a voluntary restriction
which prohibits
the broadcast
of alcohol
commercials from
five am
to
five pm
on weekdays,
and from five am to eleven am
on weekends.
So this is one of the examples.
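The time-of-day and day-of-week statistics discussed above could be gathered with a simple counter over the airing timestamps of one detected commercial. This is a hedged sketch: the function name `broadcast_histogram` and the sample timestamps are invented for illustration, and Python's `weekday()` numbers Monday as 0, whereas the figure in the talk runs from Sunday to Saturday.

```python
from collections import Counter
from datetime import datetime

def broadcast_histogram(timestamps):
    """Count airings per (day_of_week, hour_of_day) cell.

    day_of_week follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = Counter()
    for ts in timestamps:
        counts[(ts.weekday(), ts.hour)] += 1
    return counts

# Invented sample airings of one commercial (not data from the talk).
airings = [
    datetime(2010, 3, 1, 18, 5),   # Monday evening
    datetime(2010, 3, 1, 18, 40),  # Monday evening
    datetime(2010, 3, 6, 12, 15),  # Saturday noon
]
counts = broadcast_histogram(airings)
```

Empty cells in such a table, for example weekday daytime hours for alcohol commercials, are what make a restriction like the one described above visible.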
Another example is this one.
We can see that this commercial was
most
frequently broadcast
around
five pm.
We believe this is because
five pm is a time that suits housewives,
and the commercial reminds them
not to forget to buy the product
or something like that.
The third example is a
car commercial.
We can observe two time zones with a high
broadcast frequency:
for instance,
from six am to
eight am, when most fathers are
having breakfast,
and after six pm,
when most fathers have finished their work and are watching TV.
Actually, these observations can also be seen in this figure.
One explanation of this could be that the car
is
targeted towards
fathers rather than housewives.
Another explanation could be that,
you know, in most Japanese families the father
is the one earning the money, so the car dealers
prefer the TV programmes that are
more popular
with male than with female viewers.
So, due to the time limitation, I'm going to skip
the conclusion.
Thank you for the talk.
Are any Japanese advertising companies going to use your
technique to do some marketing research?
Currently no, but we are considering contacting some companies
to see whether they are interested in our research.
Normally
I don't think companies are interested in this, but
actually there are
some commercial producers
who
showed interest in this work and contacted us, and we are considering a
collaboration.
Maybe we can provide some
services
for analysis or something like that,
or for market research, yeah.
Thank you. The second point is, as I understand it, you are doing commercial video pattern discovery, right?
So I think, to extend it, you may want to discover some other types of videos,
something like,
suppose we have some
news videos.
The commercial video is a very special type of video, with specialised
patterns.
You mean
for detecting
duplicate videos other than commercials?
Well, it depends on the type of duplicate videos that you want to detect.
For instance,
in the case of sports videos,
there are some rebroadcasts of the sports games, and the video
is broadcast across
a lot of channels. That means in this case
the videos are almost the same. But in news broadcasts,
the TV news programme producers may add some
logos or
text
captions into the videos, so
the fingerprinting-based
strategy is not suitable in this case,
and a more robust
technique
is needed. So this is why
I pointed out that this work is quite different from
near-duplicate
detection. It's a kind of exact duplicate detection.
Okay, thank you so much.
Thank you.