The GPU is halted when it hits a MMU exception, so there is no point in
waiting for the job timeout to expire or try to work out if the GPU is
still making progress in the timeout handler, as we know that the GPU
won't make any more progress.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Now that it is only used to track the driver internal state of
the MMU global and cmdbuf objects, we can get rid of this property
by making the free/finit functions of those objects safe to call
on an uninitialized object.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Instead of only tracking if the FE is running, use a enum to better
describe the various states the GPU can be in. This allows some
additional validation to make sure that functions that expect a
certain GPU state are only called when the GPU is actually in that
state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Nothing in this callpath actually touches the GPU, so there is no reason
to get it out of suspend state here. Only if runtime PM isn't enabled at
all we must make sure to enable the clocks, so the GPU init routine can
access the GPU later on.
This also removes the need to guard against the state where the driver
isn't fully initialized yet in the runtime PM resume handler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Currently the FE is spinning way too fast when polling for new work in
the FE idleloop. As each poll fetches 16 bytes from memory, a GPU running
at 1GHz with the current setting of 200 wait cycle between fetches causes
80 MB/s of memory traffic just to check for new work when the GPU is
otherwise idle, which is more FE traffic than in some GPU loaded cases.
Significantly increase the number of wait cycles to slow down the poll
interval to ~30µs, limiting the FE idle memory traffic to 512 KB/s, while
providing a max latency which should not hurt most use-cases. The FE WAIT
command seems to have some unknown discrete steps in the wait cycles so
we may over/undershoot the target a bit, but that should be harmless.
If the GPU core base frequency is unknown keep the 200 wait cycles as
a sane default.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Sui Jingfeng <suijingfeng@loongson.cn>
Tested-by: Sui Jingfeng <suijingfeng@loongson.cn>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
The fence lock currently protects two distinct things. It protects the fence
IDR from concurrent inserts and removes and also keeps drm_sched_job_arm and
drm_sched_entity_push_job in one atomic section to guarantee the fence seqno
monotonicity. Split the lock into those two functions.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
The MMU tells us the fault status. While the raw register value is
already printed, it's a bit more user friendly to translate the
fault reasons into human readable format.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
There is no reason to use page based mappings, as the established
mappings are special driver mappings anyways and should not be
handled like normal pages.
Be consistent with what other drivers do and use raw PFN based
mappings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When a idle BO, which is held open by another process, gets freed by
userspace and subsequently referenced again by e.g. importing it again,
userspace may assign a different softpin VA than the last time around.
As the kernel GEM object still exists, we likely have a idle mapping
with the old VA still cached, if it hasn't been reaped in the meantime.
As the context matches, we then simply try to resurrect this mapping by
increasing the refcount. As the VA in this mapping does not match the
new softpin address, we consequently fail the otherwise valid submit.
Instead of failing, reap the idle mapping.
Cc: stable@vger.kernel.org # 5.19
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
The same logic is already used in two different places and now
it will also be needed outside of the compilation unit, so split
it into a separate function.
Cc: stable@vger.kernel.org # 5.19
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Track the pid per submit, so we can print the name and cmdline of
the task which submitted the batch that caused the gpu to hang.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Right now the only point where softpin mappings get removed from the
MMU context is when the mapped GEM object is destroyed. However,
userspace might want to reuse that address space before the object
is destroyed, which is a valid usage, as long as all mapping in that
region of the address space are no longer used by any GPU jobs.
Implement reaping of idle MMU mappings that would otherwise
prevent the insertion of a softpin mapping.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Acked-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The flush sequence is a marker that the page tables have been changed
and any affected TLBs need to be flushed. Move the flush_seq increment
a little further down the call stack to place it next to the actual
page table manipulation. Not functional change.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Acked-by: Guido Günther <agx@sigxcpu.org>
This makes it a little more clear that the mapping holds a reference
to the context once the buffer has been successfully mapped into that
context and simplifies the error handling a bit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Acked-by: Guido Günther <agx@sigxcpu.org>
The STLB and the first command buffer (which is used to set up the TLBs)
has a 32 bit size restriction in hardware. There seems to be no way to
specify addresses larger than 32 bit. Keep it simple and restict the
addresses to the lower 4 GiB range for all coherent DMA memory
allocations.
Please note, that platform_device_alloc() will initialize dev->dma_mask
to point to pdev->platform_dma_mask, thus dma_set_mask() will work as
expected.
While at it, move the dma_mask setup code to the of_dma_configure() to
keep all the DMA setup code next to each other.
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
etnaviv_iommu_find_iova has it so etnaviv_iommu_insert_exact and
lockdep_assert_held should have it as well.
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[ Upstream commit 20faf2005ec85fa1a6acc9a74ff27de667f90576 ]
gpu->mmu_context is the MMU context of the last job in the HW queue, which
isn't necessarily the same as the context from the bad job. Dump the MMU
context from the scheduler determined bad submit to make it work as intended.
Fixes: 17e4660ae3d7 ("drm/etnaviv: implement per-process address spaces on MMUv2")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>